argv0 (OP) - Hoopy frood - Joined: Oct 2003 - Posts: 3,918
I know everyone loves hash tables when dealing with arrays in mIRC because it's "faster" and more "efficient". My question is: has anyone ever actually benchmarked this time difference, and if so, could they post the results?

I'm really not buying the whole "hash tables are oh so much better" belief, since most speedups in mIRC script are generally not noticeable. My hypothesis is that any array with under a couple hundred elements would be accessed just as quickly as a hash table with as many elements. Can someone prove/disprove this? shocked


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
starbucks_mafia - Hoopy frood - Joined: Dec 2002 - Posts: 2,962
What kind of arrays are you talking about? Associative arrays (mappings) will be more efficient, since that is what a hash table is designed for. Accessing a hash table by numeric index, however, is particularly inefficient, since that's not what they were designed for; in that case variables would in all likelihood be as fast or faster. I haven't got the time/inclination to create and run benchmarks at the moment, so why not run the benchmarks yourself? If you do, post the benchmark code you use as well.
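For illustration, a minimal sketch of the two access patterns (the table and variable names here are made up for the example):

Code:
alias demo_assoc {
  ; associative lookup by key: the case hash tables are designed for
  hadd -m users JohnDoe op
  echo -a JohnDoe's level: $hget(users,JohnDoe)
  hfree users
}

alias demo_index {
  ; numeric-index access: a dynamic %user_N variable, evaluated with [ ] brackets
  set %user_1 op
  var %i = 1
  echo -a user %i level: %user_ [ $+ [ %i ] ]
  unset %user_*
}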


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Stranger - Ameglian cow - Joined: Oct 2004 - Posts: 38
Hi! I hope I've understood correctly what you're asking; here's what I get:

Code:
 
alias hash {
  ; fill a 1000-item hash table and time it
  hmake test 10
  var %x = 0
  var %ms = $ticks
  while (%x < 1000) {
    hadd test %x 0
    inc %x 1
  }
  echo -s end(hadd test) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
  hfree test
}

alias _var {
  ; same fill loop using dynamic %test_N variables via /set
  var %x = 0
  var %ms = $ticks
  while (%x < 1000) {
    set %test_ $+ %x 0
    inc %x 1
  }
  echo -s end(setvar test) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
  unset %test_*
}

alias file {
  ; same fill loop appending lines to a file with /write
  var %x = 0
  var %ms = $ticks
  while (%x < 1000) {
    write test.txt 0
    inc %x
  }
  echo -s end(filewrite test) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
  write -c test.txt
}

 


Results are as follows:
hash tables: 0.1 - 0.12 seconds
variables: 0.35 - 0.38 seconds
file (/write): ~8.5 seconds

It shows that the hash tables really are faster (about 3x faster than the /set command, and /write totally sucks).
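One caveat worth noting: /hmake's size parameter is the number of slots, and if I remember the help file right, the rule of thumb is roughly one slot per ten expected items, so 10 slots is on the small side for 1000 items (this mostly affects lookups rather than /hadd). A variant sketch, if you want to compare:

Code:
alias hash2 {
  ; same fill test, but with a slot count closer to the suggested items/10
  hmake test 100
  var %x = 0
  var %ms = $ticks
  while (%x < 1000) {
    hadd test %x 0
    inc %x 1
  }
  echo -s end(hadd test, 100 slots) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
  hfree test
}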

Relinsquish - Vogon poet - Joined: Mar 2004 - Posts: 175
You could also write tests for file handling (/fwrite) and INI files (/writeini). Additionally, you could write tests for reading from the different formats ($hget, $read, $fread, $var, $readini).
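If someone wants to try, read-side timings in the same style as the write tests above might look like this (a sketch; it assumes the 'test' hash table and test.txt are still filled, so you'd skip the /hfree and /write -c cleanup lines first):

Code:
alias _hashread {
  ; time 1000 keyed lookups from the hash table
  var %x = 0
  var %ms = $ticks
  while (%x < 1000) {
    !.echo -q $hget(test,%x)
    inc %x
  }
  echo -s end(hget test) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
}

alias _fileread {
  ; time 1000 line reads with $read (the n switch stops mIRC evaluating the line)
  var %x = 1
  var %ms = $ticks
  while (%x <= 1000) {
    !.echo -q $read(test.txt,n,%x)
    inc %x
  }
  echo -s end(read test) (time elapsed: $calc(($ticks - %ms) / 1000) secs)
}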


- Relinsquish
argv0 (OP) - Hoopy frood - Joined: Oct 2003 - Posts: 3,918
Quote:

I haven't got the time/inclination to create and run benchmarks at the moment, so why not run the benchmarks yourself? If you do, post the benchmark code you use as well.


The idea was that I was hoping someone else had already done the bench, since I, like you, am pretty lazy cool

Anyway, modifying the code above (thanks, Stranger), I ended up with these results, running the test on 6 arrays with sizes ranging from 100 to 600 elements. I used 100-600 because, according to my hypothesis, variable arrays should start becoming slower after 300 elements. I ran the test three times for good measure.

Note that while all times are reported in seconds, $ticks is only an approximation of real time: it advances in discrete steps, which is why the values below cluster around multiples of roughly 0.015-0.016. For simplicity, however, I'll just call them seconds from here on in.
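(If you want to see that step size directly, here's a quick sketch that spins until $ticks changes; it should typically report jumps of roughly 15-16 ms, which would explain the 0.015/0.016/0.031 values below:)

Code:
alias tickres {
  ; busy-wait until $ticks advances, then report the size of the jump
  var %t = $ticks
  while ($ticks == %t) { var %spin = 1 }
  echo -a $ticks advanced by: $calc($ticks - %t) ms
}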

Results:
Code:
Test results for hadd,setv,file tests:
 
size:  100 	200 	300 	400 	500 	600
-----------------------------------------------------
hadd:  0	0	0	0.015	0.016	0.031
setv:  0	0.016	0	0.016	0.031	0.031
file:  0.047	0.078	0.125	0.156	0.188	0.25
 
Test results for hadd,setv,file tests:
 
size:  100 	200 	300 	400 	500 	600
-----------------------------------------------------
hadd:  0	0	0.015	0.016	0.015   0.031
setv:  0	0.015	0.016	0.031   0.032   0.032
file:  0.047	0.063	0.109	0.141   0.187   0.234
 
Test results for hadd,setv,file tests:
 
size:  100 	200 	300 	400 	500 	600
-----------------------------------------------------
hadd:  0        0       0       0.015   0.016   0.015
setv:  0        0.016   0.016   0.016   0.031   0.047
file:  0.046	0.078   0.125   0.156   0.188   0.219


Discussion:
First off, let's conclude that file handling is by far the slowest; this was never under debate, but I left it in as a 'control', I suppose.

Interestingly, though, with an iterative array this test shows that /set only seems to be slower at sizes 200 and 500, and sometimes 400. I ran extra tests using the code (following this report) and realized that the slowdown in /set seems to be more or less 'random'.

Keep in mind, though, that according to my hypothesis /set should start slowing down at or around 300 elements, and this seems to be exactly the case. Before 300, the difference is 'unmeasurable'.

However, even after 300, the difference between var and hadd seems to max out at around 0.03 seconds, or 30 milliseconds, for the array sizes used (spread over 600 elements, that's roughly 0.05 ms per /set call). This is a truly small number, and I think it's safe to say that 30 milliseconds will not create any noticeable time difference in any script process.

Conclusion:
Let me just say first off that I was never claiming variable arrays are faster than hash tables for arrays with many elements (1000 or more). I'll leave the tests up to you, but the results I saw with the test code (given below) show that hash tables are truly better for any array greater than 1000 elements.

However, my original assumption was that variables and hash tables are more or less equivalent in 'efficiency' and 'speed' for arrays of around 300 elements or fewer. The test results support this, even up to 600 elements.

Therefore, I think it's fair to say that hash tables are not always necessary when dealing with arrays, at least for reasonable purposes (clone scanners, nick completion, flood detection, anything dealing with nicknames). I think the results shown above support this argument.

Code:

I didn't change the vital organs of the code given by Stranger; I just updated the way it reports the results to make it cleaner. I also made a /bench alias which tests each array size, incrementing by 100 from $$1 to $$2. Finally, a minor adjustment: I renamed the aliases, because 'hash' and 'file' conflict with the built-in $hash and $file identifiers.

Code:
alias _hash {
  ; fill a hash table with $$1 items, return elapsed time in seconds
  hmake test 10
  var %x = 0
  var %ms = $ticks
  while (%x < $$1) {
    hadd test %x 0
    inc %x 1
  }
  %x = $calc(($ticks - %ms) / 1000)
  hfree test
  return %x
}

alias _var {
  ; same fill loop using dynamic %test_N variables
  var %x = 0
  var %ms = $ticks
  while (%x < $$1) {
    set %test_ $+ %x 0
    inc %x 1
  }
  %x = $calc(($ticks - %ms) / 1000)
  unset %test_*
  return %x
}

alias _file {
  ; same fill loop appending lines to a file with /write
  var %x = 0
  var %ms = $ticks
  while (%x < $$1) {
    write test.txt 0
    inc %x
  }
  %x = $calc(($ticks - %ms) / 1000)
  write -c test.txt
  return %x
}

; usage: /bench <START> <END>
alias bench {
  var %h, %v, %f, %s, %i = $$1
  while (%i <= $$2) {
    ; $chr(9) is a tab, used to line up the result columns
    %s = %s %i $+ $chr(9)
    %h = %h $_hash(%i) $+ $chr(9)
    %v = %v $_var(%i) $+ $chr(9)
    %f = %f $_file(%i) $+ $chr(9)
    inc %i 100
  }
  window -a -t4 @Test-results
  aline -p @Test-results Test results for hadd,setv,file tests:
  aline -p @Test-results $chr(3)
  aline -p @Test-results size: $chr(3) %s
  aline -p @Test-results -----------------------------------------------------
  aline -p @Test-results hadd: $chr(3) %h
  aline -p @Test-results setv: $chr(3) %v
  aline -p @Test-results file: $chr(3) %f
  aline -p @Test-results $chr(3)
}


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Q - Hoopy frood - Joined: Jan 2003 - Posts: 2,523
The factor that affects performance the most is the position of a variable in the Variables list. Referencing a variable that happens to be 10th in the list is very fast; however, this isn't the case when that var is the 3000th in the list. This is also true for setting a var: if Variables holds a couple of thousand items, changing the value of the 10th variable is fast, but doing the same to the 3000th var or, even worse, setting a new var (which is then appended to the end of the list) is much slower. None of this is true for hash tables: the number of items affects neither /hadd nor $hget() (but it does affect $hfind() and $hget().item, as starbucks_mafia hinted in a previous post).

I have benchmarked all these things, but I didn't write separate code for each case I examined; I just modified the code for the next measurement each time. So I cannot post a full benchmark that proves it (too lazy to do that), but I can post the 4 basic aliases I used and point out which parts I changed. By passing the appropriate numbers to them, one can test vars vs hash tables for any number of items.
Code:
alias varset {
  unset %test_*
  var %i = $$1, %a = $str(b,$$2), %t = $ticks
  while %i { set %test_ $+ %i %a | dec %i }
  echo -a VarSet $1-: $calc($ticks - %t)
}

alias varget {
  var %i = $$1, %t = $ticks
  while %i { !.echo -q %test_ [ $+ [ %i ] ] | dec %i }
  echo -a VarGet $1-: $calc($ticks - %t)
}

alias hshset {
  hfree -w test | hmake test
  var %i = $$1, %a = $str(a,$$2), %t = $ticks
  while %i { hadd test _ $+ %i %a | dec %i }
  echo -a HshSet $1-: $calc($ticks - %t)
}

alias hshget {
  var %i = $$1, %t = $ticks
  while %i { !.echo -q $hget(test,_ $+ %i) | dec %i }
  echo -a HshGet $1-: $calc($ticks - %t)
}

The syntax is
/varset <iterations> <length of data>
/varget <iterations>
<length of data> was included in case this makes any significant difference; it doesn't appear to.
I ran these aliases quite a few times with different arguments, changing the %i parts in the set/get lines to a specific number each time. After running
/varset 3000 10
one can easily notice that changing "%i" to "3000" makes any subsequent /varget and /varset calls fast, but changing it to "1" slows them down a lot. That's because %test_1 is at the bottom of the vars list, while %test_3000 is at the top (assuming you don't have other vars in Variables).
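To see the $hget().item effect mentioned above, a fifth alias in the same style could look like this (a sketch; like the others, it assumes the 'test' table has been filled by /hshset):

Code:
alias hshitem {
  ; positional access: $hget(table,N).item walks the table, so it DOES slow down with item count
  var %i = $$1, %t = $ticks
  while %i { !.echo -q $hget(test,%i).item | dec %i }
  echo -a HshItem $1-: $calc($ticks - %t)
}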

The conclusion is more or less the same as yours:
- the more items you're going to use, the better hash tables are for the job
- the difference becomes significant with >1000 items (or perhaps >500 on slower systems)


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
argv0 (OP) - Hoopy frood - Joined: Oct 2003 - Posts: 3,918
FOLLOWUP:

Funny: I ran benches for element counts up to 6000, and it seems /set becomes considerably slower than /write.

Code:
Test results for hadd,setv,file tests:
 
size:  1000	 2000	 3000	
-----------------------------------------------------
hadd:  0.047	 0.078	 0.125	
setv:  0.094	 0.593	 1.984	
file:  0.453	 1.016	 1.39	
fopen:  0.047	 0.109	 0.172	
 
Test results for hadd,setv,file tests:
 
size:  1000	 2000	 3000	
-----------------------------------------------------
hadd:  0.031	 0.078	 0.187	
setv:  0.078	 0.609	 2.016	
file:  0.453	 0.969	 1.625	
fopen:  0.047	 0.094	 0.156	
 
Test results for hadd,setv,file tests:
 
size:  6000	
-----------------------------------------------------
hadd:  0.469	
setv:  9.766	
file:  3.079	
fopen:  0.375
 
Test results for hadd,setv,file tests:
 
size:  6000	
-----------------------------------------------------
hadd:  0.375	
setv:  9.672	
file:  3	
fopen:  0.282		


Notice that I added code for /fopen testing; the tests seem to show that it's actually faster than hash tables at high element counts... weird?

Code:
alias _fopen {
  ; close any stale handle left over from a previous run
  if ($fopen(FILE)) .fclose FILE
  .fopen -no FILE test.txt
  var %x = 0
  var %ms = $ticks
  while (%x < $$1) {
    .fwrite -n FILE 0
    inc %x
  }
  %x = $calc(($ticks - %ms) / 1000)
  .fclose FILE
  write -c test.txt
  return %x
}


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
argv0 (OP) - Hoopy frood - Joined: Oct 2003 - Posts: 3,918
ANOTHER FOLLOWUP

This is a really interesting result... look at /fopen:

Code:
Test results for hadd,setv,file tests:
 
size:  10000	
-----------------------------------------------------
hadd:  1.062	
setv:  26.484	
file:  4.937	
fopen:  0.469	


As array sizes get bigger, hadd barely even competes with fopen, and /set is over 5 times slower than /write.

Maybe /fopen deserves more consideration when dealing with large amounts of data? Presumably /write has to open and close the file on every call, while an /fopen handle stays open for the whole loop.

Keep in mind that this test stresses repeated /fwrite or /hadd commands and does not look at retrieving data from a list. Hash table and file speed depend on how they are used.
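For the retrieval side of /fopen, a hypothetical counterpart could look like this (a sketch: it reads test.txt back sequentially through one open handle, so you'd need to skip the /write -c cleanup in _fopen first):

Code:
alias _fread {
  if ($fopen(FILE)) .fclose FILE
  .fopen FILE test.txt
  var %x = 0
  var %ms = $ticks
  ; $fread returns the next line; $feof becomes true once we read past the end
  var %line = $fread(FILE)
  while (!$feof) {
    inc %x
    %line = $fread(FILE)
  }
  .fclose FILE
  echo -s read %x lines (time elapsed: $calc(($ticks - %ms) / 1000) secs)
}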


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
