Joined: Jul 2007
Posts: 1,124
Hoopy frood
OP
mIRC's /filter command has switches for numeric sort and descending sort. I was wondering if a random-sort switch could be implemented to put all the lines in random order.
Joined: Jan 2003
Posts: 2,125
Hoopy frood
You can use /filter -a for that, e.g. /filter -ffa in.txt out.txt randomsort, with this alias:
alias randomsort return $calc($rand(1,3) - 2)
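The alias works as a comparator: /filter -a hands it two lines and orders them by the value it returns, and $calc($rand(1,3) - 2) just returns -1, 0 or 1 at random. Here is a rough Python analogue, as a sketch only. It assumes mIRC treats the alias like any other comparison function, and the names random_compare and random_sort are made up for illustration. Python's own sort (Timsort) is not mIRC's, so only the idea carries over.

```python
import functools
import random


def random_compare(a, b):
    # Hypothetical stand-in for the randomsort alias:
    # mirrors $calc($rand(1,3) - 2) and ignores the actual values of a and b.
    return random.randint(1, 3) - 2


def random_sort(lines):
    # Sort with the random comparator. The result is always a permutation
    # of the input, but the orders it produces are not equally likely.
    return sorted(lines, key=functools.cmp_to_key(random_compare))


if __name__ == "__main__":
    print(random_sort(["one", "two", "three", "four", "five"]))
```

Every call is a coin toss that ignores the lines themselves. The sort still produces a valid permutation, but not an evenly distributed one, and that is the point Horstl's test further down measures.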
Joined: Jul 2007
Posts: 1,124
Hoopy frood
OP
Thank you, qwerty. That does the trick.
Joined: Nov 2006
Posts: 1,552
Hoopy frood
It should "sufficiently" shuffle your lines. However, if you're pedantic about the degree of "randomness", you have to send it through the alias several times, due to the way it works.
; randomtest <S> <L> <I> <R>
; S = size of the set (number of lines, N)
; L = line whose final positions are counted (N, between 1 and S)
; I = number of iteration runs (N)
; R = number of times the set is randomized per run (N)
alias randomtest {
  if ($1 isnum 1-) && ($2 isnum 1- $+ $1) && ($3 isnum 1-) && ($4 isnum 1-) {
    echo -sg $3 iterations on a set of $1 lines, randomized $4 time(s). Line $qt($2) found at position:
    var %s = 1, %l = 1, %i = 1, %result
    ; create set S
    window -h @rand.in
    while (%s <= $1) { aline @rand.in $v1 | inc %s }
    ; I iteration runs
    hmake randomize
    window -h @rand.out
    while (%i <= $3) {
      ; randomize set R times
      var %r = 1
      while (%r <= $4) { filter -wwca $iif((%r == 1),@rand.in,@rand.out) @rand.out randomsort | inc %r }
      ; count positions of L
      hinc randomize $fline(@rand.out,$2,1)
      inc %i
    }
    ; distribution result
    while (%l <= $1) { %result = $iif(%result,$v1 ---) %l : $hget(randomize,%l) $+ x | inc %l }
    echo -sg %result
    window -c @rand.in | window -c @rand.out
    hfree randomize
  }
}
A single run shows noticeable "clusters":
/randomtest 5 1 5000 1
5000 iterations on a set of 5 lines, randomized 1 time(s). Line "1" found at position:
1 : 1434x --- 2 : 1347x --- 3 : 1079x --- 4 : 787x --- 5 : 353x
/randomtest 5 3 5000 1
5000 iterations on a set of 5 lines, randomized 1 time(s). Line "3" found at position:
1 : 1128x --- 2 : 873x --- 3 : 1209x --- 4 : 958x --- 5 : 832x
/randomtest 5 5 5000 1
5000 iterations on a set of 5 lines, randomized 1 time(s). Line "5" found at position:
1 : 543x --- 2 : 729x --- 3 : 739x --- 4 : 1021x --- 5 : 1968x
With three runs, they're almost gone:
/randomtest 5 1 5000 3
5000 iterations on a set of 5 lines, randomized 3 time(s). Line "1" found at position:
1 : 1010x --- 2 : 1021x --- 3 : 1067x --- 4 : 935x --- 5 : 967x
/randomtest 5 3 5000 3
5000 iterations on a set of 5 lines, randomized 3 time(s). Line "3" found at position:
1 : 984x --- 2 : 952x --- 3 : 1026x --- 4 : 1046x --- 5 : 992x
/randomtest 5 5 5000 3
5000 iterations on a set of 5 lines, randomized 3 time(s). Line "5" found at position:
1 : 919x --- 2 : 984x --- 3 : 992x --- 4 : 996x --- 5 : 1109x
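The same kind of measurement as /randomtest can be sketched in Python. It shuffles a set of S numbers with the random comparator R times, repeats that I times, and counts where line L lands. This is an illustrative re-creation with hypothetical function names, not Horstl's script. Python sorts with Timsort (binary insertion sort for lists this short), so the counts it prints will not match the mIRC numbers above. With a perfectly uniform shuffle every position would get about I/S hits, i.e. roughly 1000 of 5000 for S = 5.

```python
import functools
import random
from collections import Counter


def random_compare(a, b):
    # Same idea as the randomsort alias: return -1, 0 or 1 at random.
    return random.randint(1, 3) - 2


def comparator_shuffle(items, repetitions):
    # Re-sort the list `repetitions` times with the random comparator,
    # like running /filter -a on its own output several times in a row.
    out = list(items)
    for _ in range(repetitions):
        out = sorted(out, key=functools.cmp_to_key(random_compare))
    return out


def position_distribution(set_size, line, iterations, repetitions):
    # Count at which 1-based position `line` ends up after each run,
    # analogous to /randomtest <S> <L> <I> <R>. Each run starts from the
    # original ordered set, as the mIRC script does with @rand.in.
    items = list(range(1, set_size + 1))
    counts = Counter()
    for _ in range(iterations):
        shuffled = comparator_shuffle(items, repetitions)
        counts[shuffled.index(line) + 1] += 1
    return [counts[pos] for pos in range(1, set_size + 1)]


if __name__ == "__main__":
    for reps in (1, 3):
        dist = position_distribution(5, 1, 5000, reps)
        print(f"R={reps}:", " --- ".join(f"{p} : {n}x" for p, n in enumerate(dist, 1)))
```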
Just for your information 
Last edited by Horstl; 13/05/10 11:55 AM.
Joined: Jul 2007
Posts: 1,124
Hoopy frood
OP
Thank you, Horstl, I appreciate your time and detailed input.
Joined: Dec 2002
Posts: 294
Pan-dimensional mouse
Quote:
It should "sufficiently" shuffle your lines. However, if you're pedantic about the degree of "randomness", you have to send it through the alias several times, due to the way it works.
Actually, even with three iterations you're not going to achieve reasonable randomness, and this is only with a set of 5 lines. It would scale terribly as the number of lines increased. It's just a flawed method to begin with (although at first glance I thought it might work, too). The only really valid method is to randomize the data manually, without /filter, and since that option exists, there's really no point in adding this functionality to /filter. For example, if you want to randomize the lines of @input and put them in @output, you can do something as simple as this:
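; move every line of @input to @output in random order (this empties @input)
while ($line(@input,0) > 0) {
  ; $ifmatch holds the number of lines still left in @input
  var %i = $rand(1,$ifmatch)
  aline @output $line(@input,%i)
  dline @input %i
}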
which would only require one iteration.
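The loop above is essentially the Fisher-Yates shuffle. At every step each remaining line is equally likely to be picked, so, given a fair $rand, each of the N! possible orders is equally likely after a single pass. The Python sketch below shows both the pick-and-delete form used above and the usual in-place swap form; the function names are illustrative. Python's standard library already provides the swap form as random.shuffle.

```python
import random


def pick_and_remove_shuffle(lines):
    # Same idea as the @input/@output loop: repeatedly pick a random
    # remaining line, append it to the output and delete it from the input.
    remaining = list(lines)  # copy, so the caller's list is left untouched
    output = []
    while remaining:
        i = random.randrange(len(remaining))  # 0-based, like $rand(1,N) - 1
        output.append(remaining.pop(i))
    return output


def fisher_yates_shuffle(lines):
    # In-place variant (Durstenfeld): swap the picked item to the end
    # instead of deleting it, so each step is O(1).
    out = list(lines)
    for i in range(len(out) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive on both ends
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    data = ["one", "two", "three", "four", "five"]
    print(pick_and_remove_shuffle(data))
    print(fisher_yates_shuffle(data))
```

Deleting from the middle of a Python list costs O(N) per step, so the first version is O(N^2) overall while the swap version is O(N). How expensive dline is on an mIRC window is not something this thread measures.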
Joined: Oct 2003
Posts: 3,641
Hoopy frood
What about performance? I'd say a switch could be useful for improving performance: I imagine calling an alias N times will start to get slow, though I haven't done any benchmarks.
Joined: Jan 2003
Posts: 2,125
Hoopy frood
Sorting with an alias (which is called O(N log N) times) is indeed significantly slower than internal sorting, even when the alias is very simple.
The random sort method is also not a good randomizer, as has been pointed out. It's just a quick-and-dirty workaround (and not an argument against the suggestion) for when the file is not very large and the randomness is not needed for any "serious" application.
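To put a number on "called O(N log N) times", the Python sketch below counts how often a random comparator is invoked while sorting N items. It relies on Python's Timsort rather than mIRC's internal sort, so the counts are indicative only. In mIRC each of those calls would be a scripted alias invocation, which is where the cost comes from. The function name count_comparator_calls is hypothetical.

```python
import functools
import math
import random


def count_comparator_calls(n):
    # Sort n items with a random -1/0/1 comparator (like the randomsort
    # alias) and count how many times the comparator is invoked.
    calls = 0

    def random_compare(a, b):
        nonlocal calls
        calls += 1
        return random.randint(1, 3) - 2

    sorted(range(n), key=functools.cmp_to_key(random_compare))
    return calls


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        c = count_comparator_calls(n)
        print(f"n={n}: {c} calls, n*log2(n) = {n * math.log2(n):.0f}")
```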