#241770 27/05/13 07:17 PM
Joined: Oct 2012
Posts: 1
Bitgod Offline OP
Mostly harmless
I've noticed for years that $rand() seems biased when it comes to the first and last entries of its range.

See below:

[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> O
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> O
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> G
[03:05pm] <@IRContagi> O
[03:05pm] <@IRContagi> O
[03:05pm] <@IRContagi> O
[03:05pm] <@IRContagi> I
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> N
[03:05pm] <@IRContagi> O

I know you will say it's not broken, or that an old script won't work if it changes, but a truly random $rand is needed. Thank you.

Bitgod #241771 27/05/13 07:20 PM
Joined: Jul 2006
Posts: 4,149
Hoopy frood
From what I saw on IRC, it looks like you are using the following:
Code:
timercall 0 5 msg $chan $!gettok(B I N G O G G O O B B B B O N G N B B O I O N G I B I N G O B O, $!rand(1,32), 32)
The results you are seeing are correct. What do you think is wrong with them?

Last edited by Wims; 27/05/13 07:26 PM.

#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Bitgod #241773 27/05/13 07:41 PM
Joined: Mar 2010
Posts: 57
Babel fish
$rand() is not broken. You are likely a victim of the clustering illusion.

The number of samples you've taken (29 in your paste) is far too small to support any statistical claim. That being said, $rand() probably uses C's rand() internally, which is not the best generator, but it does just fine for most purposes.
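
To put a rough number on "too small", here is a minimal sketch in Python (not mIRC script; it only does the arithmetic). It assumes an ideal generator and the usual binomial model for how often a single value comes up. The function name `relative_spread` and the 1-in-5 example value are made up for illustration.

```python
import math

def relative_spread(samples: int, p: float) -> float:
    """One standard deviation of a value's count, relative to its expected count.

    Assumes every draw independently hits the value with probability p,
    so the count follows a Binomial(samples, p) distribution.
    """
    expected = samples * p
    std_dev = math.sqrt(samples * p * (1 - p))
    return std_dev / expected

# A value that should come up 1 time in 5, over 29 draws vs. a million draws.
for n in (29, 1_000_000):
    print(f"{n} draws: count typically off by about {relative_spread(n, 0.2):.1%}")
```

Under these assumptions, with 29 draws the count of a 1-in-5 value is typically off by around 37% of its expected value even for a perfect generator; with a million draws that shrinks to about 0.2%.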

Bitgod #241774 27/05/13 07:51 PM
Joined: Mar 2010
Posts: 57
Babel fish
Seeing the line you were using to generate those results:

Quote:
$!gettok(B I N G O G G O O B B B B O N G N B B O I O N G I B I N G O B O, $!rand(1,32), 32)


Firstly, "biased" is a very ambiguous term. For our purposes I assume you meant it in the sense of a probability distribution, i.e. you expected $rand(1,32) to follow a discrete uniform distribution, with each of the 32 positions equally likely.

Code:
Given the following string:

B I N G O G G O O B B B B O N G N B B O I O N G I B I N G O B O

We have:

B = 9
I = 4
N = 5
G = 6
O = 8
-------
+ = 32

Expected distribution (given a LARGE ENOUGH sample):

B = 9/32 = 28.125% 
I = 4/32 = 12.5%
N = 5/32 = 15.625%
G = 6/32 = 18.75%
O = 8/32 = 25%
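
The tally above can be double-checked mechanically. A small sketch in Python (not mIRC script), assuming the token string quoted above is exactly what the timer used; `expected_percentages` is a hypothetical helper name.

```python
from collections import Counter

TOKENS = "B I N G O G G O O B B B B O N G N B B O I O N G I B I N G O B O".split()

def expected_percentages(tokens):
    """Share of positions held by each letter, in percent.

    This is the chance of drawing that letter when every position
    is equally likely to be picked.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {letter: 100 * n / total for letter, n in counts.items()}

for letter, pct in expected_percentages(TOKENS).items():
    print(f"{letter} = {pct}%")
```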


Code:
alias rand_demo {

  ; large enough to make sense of the randomness distribution
  var %x = 1000000

  var %total = %x
  var %b = 0, %i = 0, %n = 0, %g = 0, %o = 0

  ; each pass draws one letter via $exp and increments the matching counter (%b, %i, %n, %g or %o)
  while (%x) {
    inc % $+ $exp
    dec %x
  }

  echo -a B = %b / %total =  $calc(%b / %total * 100)
  echo -a I = %i / %total =  $calc(%i / %total * 100)
  echo -a N = %n / %total =  $calc(%n / %total * 100)
  echo -a G = %g / %total =  $calc(%g / %total * 100)
  echo -a O = %o / %total =  $calc(%o / %total * 100)
}

alias -l exp return $gettok(B I N G O G G O O B B B B O N G N B B O I O N G I B I N G O B O, $rand(1,32), 32)


With 1 million samples, here are my results:

Code:
B = 281972 / 1000000 = 28.1972
I = 125320 / 1000000 = 12.532
N = 156013 / 1000000 = 15.6013
G = 186661 / 1000000 = 18.6661
O = 250034 / 1000000 = 25.0034


Compare to the expected:

Code:
28.1972/28.125 = 100.256711%
12.532/12.5 = 100.256%
15.6013/15.625 = 99.84832%
18.6661/18.75 = 99.552533%
25.0034/25 = 100.0136%


Ratios between 99.55% and 100.26% of the expected values are perfectly fine; the results are as expected. Of course, if you increase the sample size even further, you should see the ratios converge toward 100%.
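
For a more formal check than eyeballing ratios, one common option is Pearson's chi-square goodness-of-fit test. This is a sketch in Python (not mIRC script) using the counts from the 1,000,000-sample run; the test itself is my addition, not something used in the posts above, and the 5% significance level is an arbitrary but conventional choice.

```python
# Observed counts from the 1,000,000-sample run and the letter weights out of 32.
observed = {"B": 281972, "I": 125320, "N": 156013, "G": 186661, "O": 250034}
weights = {"B": 9, "I": 4, "N": 5, "G": 6, "O": 8}

def chi_square(observed, weights):
    """Pearson's chi-square statistic: sum of (observed - expected)^2 / expected."""
    total = sum(observed.values())
    weight_sum = sum(weights.values())
    stat = 0.0
    for key, count in observed.items():
        expected = total * weights[key] / weight_sum
        stat += (count - expected) ** 2 / expected
    return stat

# 5 categories -> 4 degrees of freedom; 9.488 is the standard 5% critical value.
CRITICAL_5_PERCENT_DF4 = 9.488

stat = chi_square(observed, weights)
print(f"chi-square = {stat:.3f}, reject at 5%: {stat > CRITICAL_5_PERCENT_DF4}")
```

The statistic comes out at roughly 6.79, below the 9.488 cutoff, so at that level the run gives no reason to reject the expected distribution.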

Last edited by Wiz126; 27/05/13 08:05 PM.
Bitgod #241775 27/05/13 08:06 PM
Joined: Oct 2003
Posts: 3,918
Hoopy frood
I also don't see any bias:

Code:
alias testrand {
  var %buckets = $iif($1,$1,5), %picks = $iif($2,$2,5000)
  var %i = 1
  if ($hget(testrand)) hfree testrand
  hmake testrand $calc(%buckets * 2)
  while (%i <= %picks) {
    hinc testrand n $+ $rand(1,%buckets) 
    inc %i
  }
  var %output
  %i = 1
  while (%i <= %buckets) {
    %output = $addtok(%output, $+(%i,=,$hget(testrand,n $+ %i)), 32)
    inc %i
  }
  echo -a Testing $!rand(1, %buckets $+ ) in %picks samples
  echo -a Frequencies for $!rand(1, %buckets $+ ) in %picks samples: %output
  hfree testrand
}


Running /testrand 5 5000 (default args):

Frequencies for $rand(1, 5) in 5000 samples: 1=990 2=1010 3=1000 4=991 5=1009

Running /testrand 100 5000:

Frequencies for $rand(1, 100) in 5000 samples: 1=57 2=51 3=55 4=50 5=56 6=54 7=53 8=42 9=50 10=49 11=41 12=49 13=48 14=54 15=42 16=37 17=52 18=37 19=44 20=49 21=38 22=44 23=42 24=47 25=49 26=51 27=57 28=44 29=46 30=45 31=52 32=54 33=60 34=60 35=51 36=39 37=60 38=51 39=51 40=57 41=51 42=49 43=55 44=56 45=49 46=51 47=51 48=55 49=56 50=48 51=58 52=41 53=52 54=44 55=48 56=44 57=42 58=57 59=48 60=62 61=46 62=52 63=52 64=60 65=59 66=56 67=50 68=40 69=41 70=66 71=47 72=50 73=48 74=42 75=48 76=48 77=36 78=46 79=45 80=48 81=52 82=43 83=53 84=54 85=58 86=45 87=57 88=57 89=54 90=54 91=55 92=57 93=43 94=44 95=56 96=46 97=55 98=47 99=55 100=50

I don't see any bias in either of these runs. The first run might look biased to the first bucket, so for good measure I did some more runs:

Frequencies for $rand(1, 5) in 10000 samples: 1=2067 2=1966 3=1995 4=2033 5=1939
Frequencies for $rand(1, 5) in 10000 samples: 1=2055 2=2018 3=2008 4=2002 5=1917
Frequencies for $rand(1, 5) in 10000 samples: 1=2031 2=1963 3=1966 4=2000 5=2040
Frequencies for $rand(1, 5) in 10000 samples: 1=2056 2=1988 3=1990 4=2005 5=1961
Frequencies for $rand(1, 5) in 10000 samples: 1=2005 2=2014 3=2000 4=2008 5=1973
Frequencies for $rand(1, 5) in 10000 samples: 1=1997 2=2085 3=2026 4=1960 5=1932
Frequencies for $rand(1, 5) in 10000 samples: 1=1996 2=1945 3=2029 4=1984 5=2046
Frequencies for $rand(1, 5) in 10000 samples: 1=1971 2=1994 3=2020 4=2047 5=1968
Frequencies for $rand(1, 5) in 10000 samples: 1=1977 2=2010 3=2048 4=1922 5=2043
Frequencies for $rand(1, 5) in 10000 samples: 1=2002 2=1927 3=2093 4=1969 5=2009
Frequencies for $rand(1, 5) in 10000 samples: 1=2071 2=1937 3=2010 4=2031 5=1951
Frequencies for $rand(1, 5) in 10000 samples: 1=2054 2=1965 3=2061 4=1949 5=1971

I'm no statistician, but at a quick glance I don't see anything too far out of the ordinary. For what it's worth, the relative standard deviation of this data comes to 0.924%, i.e. less than 1% of the mean. That's not what you'd expect to see if the data set were biased. (*)

(*) Wiz's comparison against the expected probability distribution is a much better metric than %RSD. As I said, I am not a statistician.
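
For anyone wanting to compute %RSD themselves, a minimal sketch in Python (not mIRC script). The post doesn't say which run the 0.924% figure was computed from, or whether the sample or population standard deviation was used, so this just takes the first 5-bucket run and assumes the sample standard deviation; `percent_rsd` is a hypothetical helper name.

```python
import statistics

def percent_rsd(counts):
    """Relative standard deviation in percent: sample std. deviation / mean * 100."""
    return statistics.stdev(counts) / statistics.mean(counts) * 100

# Bucket counts from the "/testrand 5 5000" run above.
first_run = [990, 1010, 1000, 991, 1009]
print(f"%RSD = {percent_rsd(first_run):.3f}%")
```

Under those assumptions it gives about 0.951%, in the same ballpark as the figure quoted above.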


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
