mIRC Home    About    Download    Register    News    Help

Print Thread
#270985 05/11/22 01:25 PM
Joined: Jan 2004
Posts: 1,944
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,944
When extremely large range sizes are of certain bit lengths, the max random number returned appears to come from a range whose size is a 1+ bits smaller.

This appears different than the 2019 biases in the old RNG where numbers at the bottom of the range had preference. The granularity issue discovered back then at 24bits is something that would be more difficult to detect each time the range size doubles. I didn't try to look further because this kind of bias can overshadow everything else.

//var %bits 512, %hex %bits / 4, %a.bf $base($rands(0,$calc(2^%bits -1)),10,16,%hex) | if ($len(%a.bf) < %hex) var -s %a.bf $str(0,$calc($v2 - $v1)) $+ %a.bf

I was testing the string manipulation work-around to the 100 digit limit for $base padding, by generating a random 512-bit number, changing it to hex, then padding it to 128 digits. I kept on coming up with the 1st digit being zero, but it wasn't an error calculating the padding length. It was returning random numbers that didn't exceed 1% of 2^512. So when I tested range size 2^1024 I found the 1st 3 digits were always zero, so this was starting to look like an effect that just got wider as the numbers got larger.

However that wasn't the case. It seems that there were ranges where max_rand/range_size was close to 100%, but the next higher 2^n it was 50% then 25% etc. The max_rand would stall out as coming from the same 2^n range as the requested range got larger, and then eventually a few bits longer it would wake up and be able to return numbers larger than 99% again. But would eventually reach another plateau that lasts for several more bit lengths. And as the bit lengths get longer, and the plateaus continue across a larger group of bitlengths.

--

This next alias is just a quick and dirty script that fetches 99 random numbers from a range size of each bit length, and reports the largest number as a percentage of the range size and gives log2(max-rand), so typically that's more than enough rolling the dice to get a number very close to the top of the range.

But this shows that there are many different range sizes where it's the equivalent of rolling a 100-sided dice 99 times and never coming up with a number larger than 1, or less.

The first plateau happens from range sizes having bit lengths from 53 through 64 which don't want to return a number larger than 2^53. But at larger bit lengths the plateau can be a lot broader than 11.

--

This brief test isn't demonstrating that the ranges with max being 99% are ok, just that they didn't demonstrate this type of bias. With random numbers you can't prove they're random, you can only find bias that shows they're not.

//var %i 10 | while (%i isnum 1-2048) { var %range.bf $calc(2^%i -1) , %max.bf 0 , %j 99 | while (%j) { var %r.bf $rands(0,%range.bf) | if (%r.bf > %max.bf) var %max.bf $v1 | dec %j } | echo -a $left( $calc(100* %max.bf / %range.bf ),5) $+ % range 2^ $+ %i : log2(max)= $round($log2(%max.bf),1) | inc %i }

--

Regarding the 2^64 size value from the RNG being translated into a 2^53 double in past versions. In the rand bias thread from 2019, you were talking about finding various ways to cast to a double without having biases, such as the one beta where the range size being odd or even had a bias on the output number itself being more/less likely to be even. It seems strange that there's not an easy way to translate data from integer to doubles without needing to have manipulation that can cause bias.

It would seem that it should be possible to have the 64bit RNG output be contained in 2 different uint32's. Since that range size has exactly 11 bits too many to fit into the [0,2^53-1] range, that it could be a simple matter of the compiler being able to plunk the [0,2^32-1] unsigned integer into a double where each of the 2^32 outputs are possible, and the other 32bit value could discard the top 11 bits, before turning that uint32 into a different double. Something on an assembler level like

mov eax,[jsf64_low]
mov ebx,[jsf64_high]
call eax_unsigned_int_to_double#1
mov eax,ebx
and eax,001FFFFFxD
call eax_unsigned_int_to_double#2

....
Rand53double = double#2 * 2^32 + double#1

Then it seems it would be a precise calculation to multiply the Double#2 by 2^32 and then add them together, and you have 2^53 different outputs that each map to 1 input, and there wouldn't need to be any rounding or anything which could ever cause the doubles output to have bias, as long as the %throwaway_above or %throwaway_below is done appropriately.

maroon #270988 05/11/22 07:41 PM
Joined: Dec 2002
Posts: 5,225
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,225
Thanks this has been fixed for the next beta.


Link Copied to Clipboard