From the changelog:

28. Changed $sha1/$sha256/$sha384/$sha512/$hmac() to use larger read buffer with files to improve speed.


I'm seeing a noticeable speed boost for the listed hashes. It isn't clear whether $crc and $md5 were left off the list because they were already considered optimized, or because they just weren't a priority.

I'm curious how large the SHA* disk buffer is now. Saturn's sha2.dll used a 4 KB disk buffer, and increasing it to 16 KB gave me about a 10% speed gain. The disk-read speedup here is larger than that.
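The changelog doesn't give the new buffer size. One way to see how much read size matters outside mIRC is to hash the same file with different chunk sizes. Below is a minimal Python sketch of that. `file_digest` and `time_buffer_sizes` are hypothetical helpers, not part of mIRC or any library. The file is opened unbuffered, so each `read()` corresponds to one OS read request of up to `bufsize` bytes. Timings only mean something when the file isn't already sitting in the OS disk cache, which is why a file larger than the cache is needed.

```python
import hashlib
import time


def file_digest(path, algorithm="sha1", bufsize=16 * 1024):
    """Hash a file using reads of up to `bufsize` bytes each (hypothetical helper)."""
    h = hashlib.new(algorithm)
    # buffering=0 returns a raw FileIO, so each read() is one OS read request
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:  # empty bytes means end of file
                break
            h.update(chunk)
    return h.hexdigest()


def time_buffer_sizes(path, algorithm="sha1",
                      sizes=(4 * 1024, 16 * 1024, 1024 * 1024)):
    """Return {bufsize: seconds} for hashing the same file with each read size."""
    timings = {}
    for size in sizes:
        start = time.perf_counter()
        file_digest(path, algorithm, size)
        timings[size] = time.perf_counter() - start
    return timings
```

For example, `time_buffer_sizes("8gig.dat")` compares 4 KB, 16 KB and 1 MB reads of the same file. The digest is identical for every buffer size. Only the number of read calls changes.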

The alias at the end of this post is slow because it uses an 8 GB file. The file has to be that large to keep the OS disk cache from favoring whichever hash reads it second. The alias compares CRC and MD5 speed in three cases:

- reading from disk
- hashing a text string versus a binvar of the same 8 KB length
- hashing a much longer binvar, where per-call overhead is a smaller percentage of the total time

/compare_crc_md5 2

The '2' parameter reverses the order in which the two hashes read the huge disk file. Without it, CRC goes first. I had to comment out the $hash benchmark because $hash was extremely slow compared to all the other hashes. Hashing an 8 KB string takes about 31 ticks, so 100k repetitions would have taken about 3,100 seconds, nearly an hour.

Since CRC is a much simpler algorithm than MD5, I expected CRC to be significantly faster. However, the MD5 and CRC times for binvars and text were closer than I expected. Also, when reading the 8 GB file from disk, CRC is always slower than MD5, no matter which one runs first, though only by a small percentage.

The benchmark shows CRC slower from disk but very slightly faster from strings and binvars. That suggests two things. First, $crc uses a smaller disk buffer than $md5 does. Second, either $crc is not as optimized as it could be, or $md5 is written in assembler while $crc is written in a higher-level language.

I'm assuming the speed difference between hashing 8 KB of text and an 8 KB binvar is because the binvar doesn't need to keep re-allocating memory the way the text does.

I expected the much more complex MD5 to take more than twice as long as CRC. However, the difference was only around 5% when comparing the long-binvar tests, which have much less overhead than the other tests. Looking at mirc.exe in a hex viewer, I found the 1024-byte CRC lookup table. So the cause isn't $crc using the slower bit-by-bit algorithm without a lookup table. (The lookup table appears twice, so there may be some duplicate code that could be removed.)
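To show where a 1024-byte table comes from, here is a Python sketch of the two classic CRC-32 forms. It assumes $crc is the standard reflected CRC-32 (polynomial 0xEDB88320, as used by zip/zlib). The table size fits that assumption, but the post doesn't confirm it. The bitwise form does 8 shift/XOR steps per byte. The table form does one lookup per byte, using 256 entries of 4 bytes each, which is 1024 bytes. Even the table form still processes one byte per step, each step depending on the previous one. Faster software variants such as slicing-by-8 use several tables to handle multiple bytes per step. Whether mIRC uses anything like that is unknown.

```python
POLY = 0xEDB88320  # reflected form of the CRC-32 polynomial 0x04C11DB7


def make_crc_table():
    """Build the 256-entry lookup table (256 x 4 bytes = 1024 bytes as uint32)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_crc_table()


def crc32_bitwise(data: bytes) -> int:
    """No-lookup CRC-32: 8 shift/XOR steps for every input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

Both functions return the same value. For example, `format(crc32_table(b"123456789"), "08X")` gives the standard CRC-32 check value `CBF43926`.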

SHA* hashes haven't made CRC and MD5 obsolete. CRC is used to verify the integrity of file transfers or to quickly compare two files of the same size. MD5 is sometimes used for the same purpose as a faster alternative to SHA1, when the chance of a CRC collision is considered too high.
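The file-comparison use case above can be sketched in Python as follows. First check the sizes, then compare a CRC-32, or an MD5 when a CRC collision is considered too risky. `crc32_file`, `md5_file` and `probably_same` are hypothetical helpers. `zlib.crc32(data, value)` continues a running CRC across chunks, so large files never need to be loaded whole. A matching hash means "almost certainly identical", not proven identical.

```python
import hashlib
import os
import zlib


def crc32_file(path, bufsize=64 * 1024):
    """Running CRC-32 of a file, read in chunks of `bufsize` bytes."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            crc = zlib.crc32(chunk, crc)  # continue the CRC from the previous chunk
    return crc


def md5_file(path, bufsize=64 * 1024):
    """MD5 hex digest of a file, read in chunks of `bufsize` bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def probably_same(path_a, path_b, strong=False):
    """Cheap size check first, then CRC-32 (or MD5 if strong=True)."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # files of different sizes cannot be identical
    if strong:
        return md5_file(path_a) == md5_file(path_b)
    return crc32_file(path_a) == crc32_file(path_b)
```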

These are the one-liners where I found $hash to be about 300x slower than $crc for text strings. Unless someone needs compatibility with $hash's output, they should do one of two things instead. Either use the crchash alias below, or take up to 13 hex digits from one of the crypto hashes to get up to 52 bits of hash (13 digits × 4 bits each).

Code
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $crc(%string,0) | dec %reps } | echo -a ticks: $calc($ticks - %t)
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $hash(%string,32) | dec %reps } | echo -a ticks: $calc($ticks - %t)

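; $crchash(text,bits) returns the CRC-32 of text reduced modulo 2^bits.
; bits is required ($$2 halts if it is missing); values outside 1-32 fall back to 32,
; and $gettok($2,1,46) drops any fractional part.
alias crchash { return $calc( $base($crc($1,0),16,10) % (2^$iif($$2 isnum 1-32,$gettok($2,1,46),32)) ) }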
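For comparison outside mIRC, here is a Python sketch of the two suggested replacements for $hash. Both functions are hypothetical. It assumes:

- $crc is the standard CRC-32
- mIRC hashes text as UTF-8 bytes, so results may not match mIRC's byte for byte

The 13-digit cap follows the advice above. Presumably it exists because 52 bits still fits exactly in the 53-bit significand of a double, the kind of number $calc and $base appear to work with. That is an inference, not something the post states. Taking a value modulo 2^bits is the same as keeping its low bits.

```python
import hashlib
import zlib


def crchash(text: str, bits: int = 32) -> int:
    """Hypothetical analogue of the crchash alias: CRC-32 reduced modulo 2**bits."""
    if not 1 <= bits <= 32:
        bits = 32  # the alias also falls back to 32 for out-of-range values
    crc = zlib.crc32(text.encode("utf-8"))  # assumption: UTF-8 bytes
    return crc % (2 ** bits)


def truncated_hash(text: str, hex_digits: int = 13, algorithm: str = "sha1") -> int:
    """Leading `hex_digits` hex digits of a crypto hash, i.e. 4 * hex_digits bits."""
    if not 1 <= hex_digits <= 13:
        raise ValueError("hex_digits must be 1-13 (at most 52 bits)")
    digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
    return int(digest[:hex_digits], 16)
```

For instance, `crchash("123456789", 16)` keeps the low 16 bits of CRC-32 `CBF43926`, giving `0x3926`. `truncated_hash("abc")` is always below 2^52.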



This is the slow alias, where each CRC or MD5 disk read takes over a minute. Be sure to delete the 8 GB file (8gig.dat) when done.

Code
alias compare_crc_md5 {
  ; needs room for the 8 GB (2^33 byte) test file plus some margin
  if ($disk($mircdir).free < $calc(1.1*2^33)) { echo -a insufficient free disk space | return }
  ; $1 and $2 become the hash names in test order: "/compare_crc_md5 2" runs MD5 first
  tokenize 32 $iif($1 == 2,md5 crc,crc md5)
  ; create the 8 GB file by writing a single byte at its last position
  bset &string 1 0 | bwrite 8gig.dat $calc(2^33-1) 1 &string
  ; build "$crc(8gig.dat,2)" / "$md5(8gig.dat,2)" (mode 2 = file) and time its evaluation
  var -s %t $ticks , %a $ $+ $1 $+ (8gig.dat,2)
  echo -a disk $1 $eval(%a,2) ticks: $calc($ticks - %t)
  var -s %t $ticks , %a $ $+ $2 $+ (8gig.dat,2)
  echo -a disk $2 $eval(%a,2) ticks: $calc($ticks - %t)
  ; 8 KB text vs 8 KB binvar, 100k repetitions each; &longstring is about 100 MB
  var -s %len 8192 , %reps 100000
  bset &string %len 0 | var %string $str(a,%len) | bset &longstring 99999999 0
  var %i %reps, %t $ticks | while (%i) { noop  $crc(%string,0) | dec %i } | echo -a crc. text $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop  $md5(%string,0) | dec %i } | echo -a md5. text $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop $sha1(%string,0) | dec %i } | echo -a sha1 text $calc($ticks - %t)
  ; $hash disabled: about 31 ticks per 8 KB string would take nearly an hour at 100k reps
  ; var %i %reps, %t $ticks | while (%i) { noop $hash(%string,32) | dec %i } | echo -a hash text $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop  $crc(&string,1) | dec %i } | echo -a crc. bvar $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop  $md5(&string,1) | dec %i } | echo -a md5. bvar $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop $sha1(&string,1) | dec %i } | echo -a sha1 bvar $calc($ticks - %t)
  ; long binvar: only 10 repetitions, so per-call overhead is a tiny fraction of the time
  var -s %reps 10
  var %i %reps, %t $ticks | while (%i) { noop  $crc(&longstring,1) | dec %i } | echo -a crc. long bvar $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop  $md5(&longstring,1) | dec %i } | echo -a md5. long bvar $calc($ticks - %t)
  var %i %reps, %t $ticks | while (%i) { noop $sha1(&longstring,1) | dec %i } | echo -a sha1 long bvar $calc($ticks - %t)
}
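The same measurement pattern (fixed input, fixed repetition count, elapsed time per hash) can be reproduced outside mIRC for a rough point of comparison. Below is a Python sketch of the in-memory part of the alias. `bench` and `compare_crc_md5` are hypothetical names. Python's `zlib.crc32` and `hashlib` are separate C implementations, so their numbers say nothing about mIRC's $crc or $md5. The "text" rows encode the string on every call, which is only a loose analogue of mIRC's text-versus-binvar difference.

```python
import hashlib
import time
import zlib


def bench(func, arg, reps):
    """Call func(arg) `reps` times; return elapsed milliseconds (like a $ticks delta)."""
    start = time.perf_counter()
    for _ in range(reps):
        func(arg)
    return (time.perf_counter() - start) * 1000.0


def compare_crc_md5(length=8192, reps=100_000):
    """Time CRC-32, MD5 and SHA1 on text and bytes of the same length."""
    text = "a" * length     # like $str(a,8192)
    binary = bytes(length)  # zero-filled, like bset &string 8192 0
    tests = {
        "crc text": (lambda s: zlib.crc32(s.encode("ascii")), text),
        "md5 text": (lambda s: hashlib.md5(s.encode("ascii")).digest(), text),
        "crc bvar": (zlib.crc32, binary),
        "md5 bvar": (lambda b: hashlib.md5(b).digest(), binary),
        "sha1 bvar": (lambda b: hashlib.sha1(b).digest(), binary),
    }
    return {name: bench(func, arg, reps) for name, (func, arg) in tests.items()}
```

Calling `compare_crc_md5(length=100_000_000, reps=10)` mirrors the long-binvar test, where per-call overhead becomes negligible.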