But your <list> is not chosen randomly enough. When benchmarking, you have to try to anticipate real-world needs. The best way to do this is to allow as much randomization as possible and then run the randomized tests several thousand times, because you don't know what the real use will be. You might think your benchmarks are more accurate, but in the benchmarking classes I've taken, you're taught a few major rules: #1 make sure both functions are tested under the EXACT same circumstances, #2 make sure the input to the functions is sufficiently random, #3 only benchmark the code that needs to be benchmarked. Your benchmark doesn't follow #2 or #3.

It doesn't randomize. For example, "34 34 68 29 09 34 563 653 354356 3563 3 35356 366 36 36 36 3452 653 236", which is what you chose, will work very fast if the sorting algorithm mIRC uses is radix sort, which is optimized for inputs where the numbers contain similar digits. So if mIRC does use radix sort, you've only shown that it is fast in that instance, not when each number has different digits. When testing a sorting algorithm, there are generally 5 cases that must be tested:
1.) the input is already sorted
2.) the input is sorted in reverse order
3.) the input is completely random
4.) the input is partially sorted (1 2 3 5748 32154 7843)
5.) the input is partially reverse sorted (3 2 1 5748 32154 7843)

Your tests don't really take any of those into account.
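
To make the five cases concrete, here is a rough sketch of how they could be generated from one random list. It is written in Python rather than mIRC script purely for illustration, and the name make_cases, its parameters, and the half-and-half split for the "partially" cases are my own assumptions, not anything from your benchmark.

```python
import random

def make_cases(n, seed=None, max_value=1_000_000):
    """Build the five test inputs from one random list of n integers.

    Hypothetical helper: the name, the parameters and the choice of
    ordering only the first half for the 'partially' cases are
    illustrative assumptions.
    """
    rng = random.Random(seed)  # a seed makes a run repeatable
    values = [rng.randint(0, max_value) for _ in range(n)]
    half = n // 2
    return {
        # 1.) already sorted
        "sorted": sorted(values),
        # 2.) sorted in reverse order
        "reversed": sorted(values, reverse=True),
        # 3.) completely random
        "random": values,
        # 4.) first half ascending, remainder left random
        "partially_sorted": sorted(values[:half]) + values[half:],
        # 5.) first half descending, remainder left random
        "partially_reversed": sorted(values[:half], reverse=True) + values[half:],
    }
```

Every case holds the same numbers in a different order, so both functions can be fed identical data, and " ".join(map(str, case)) turns any of them into a space-separated list.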

As for the third rule, you are benchmarking code that you don't care about. What is actually being timed is while %i { $2- | dec %i }, when all you really care about is the $2-. Granted, the same loop overhead is included for both functions, so it doesn't skew the comparison between them, but you should still benchmark only the function in question, not other code as well.
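
One way to approach all three rules at once, sketched in Python for illustration only (not mIRC script): time an empty loop as a baseline, subtract it, and feed every function the same fresh random input on each trial. The names time_loop, noop and compare, and the default trial counts, are assumptions of mine.

```python
import random
import time

def time_loop(func, data, iterations):
    """Time `iterations` calls of func(data), loop overhead included."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(data)
    return time.perf_counter() - start

def noop(data):
    """Does nothing; timing it measures only the loop and call overhead."""
    return None

def compare(candidates, trials=200, size=50, iterations=100, seed=None):
    """Return total seconds per candidate with the baseline subtracted.

    candidates maps a label to a function taking one sequence. Each
    trial draws new random input and gives the same input to every
    candidate (rules #1 and #2); subtracting the noop timing removes
    the loop overhead (rule #3). Small negative totals are possible
    when a candidate costs less than the timer noise.
    """
    rng = random.Random(seed)
    totals = {name: 0.0 for name in candidates}
    for _ in range(trials):
        # A tuple, so no candidate can alter the input for the next one.
        data = tuple(rng.randint(0, 1_000_000) for _ in range(size))
        baseline = time_loop(noop, data, iterations)
        for name, func in candidates.items():
            totals[name] += time_loop(func, data, iterations) - baseline
    return totals
```

For example, compare({"ascending": sorted, "descending": lambda d: sorted(d, reverse=True)}) returns one total per label. The absolute figures depend on the machine, so only the relative difference is worth reading.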