I'm concerned that the $hash function doesn't have enough randomness. I first noticed this when benchmarking to see if there was a speed benefit to using local aliases. Basically, I created a whole file of empty, similarly named aliases and noticed that the time to call one of them increased DRAMATICALLY with the number of aliases in the file. Since I presume the same hash function is used internally for everything from variables to aliases, I started testing this hypothesis with the $hash function.
As I understand it, the $hash function takes a string and the number of bits to return. However, assuming hash tables of roughly 10 bits (1024 entries), I was shocked to discover that every single-character string (letter, number, etc.) hashes to the value 1. In fact, it's not until you ask for 17 bits of precision (128K) that the result changes from 1. Nobody is using 128K-sized hash tables, which means that for all intents and purposes every single-character variable or alias name is a hash collision unless the result of the hash function is further modified. Given that $hash takes a bits parameter, my guess is it isn't...
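I don't know what algorithm mIRC actually uses internally, but the symptoms look like a weak accumulator whose top bits are returned. Here's a Python sketch — `toy_hash` is my own invention, NOT the real $hash — that reproduces the same failure mode (a constant result for every short string until the bit count gets large):

```python
def toy_hash(s, bits):
    """Toy hash: weak shift-and-add accumulator, returning the TOP `bits`
    bits of a 32-bit value. NOT mIRC's actual algorithm, just a stand-in
    that shows the same failure mode."""
    h = 0
    for ch in s:
        # Poor mixing: each new character barely moves the high bits.
        h = ((h << 1) + ord(ch)) & 0xFFFFFFFF
    return h >> (32 - bits)

# At 10 bits, every single character collapses to one value, because the
# accumulator never grows large enough to reach the top bits.
print({toy_hash(c, 10) for c in "abcXYZ019"})  # → {0}
```

The constant here is 0 rather than 1, but the shape of the bug is identical: short inputs can never influence the bits that are actually returned.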
For the record, every 2-character combination appears to collide at the value 2 for 10 bits, and only starts to diverge, very slowly, at 11 bits. Still very bad!
At 3 or more characters you start to see some randomness, but it's still limited because the last characters of each string contribute almost nothing to the hash! For example: test-1 through test-9 share the same hash at 10 bits, test-10 through test-99 are the same, and test-100 through test-199 are the same. test-1aa through test-1ZZ are all the same, and only 1 off from the test-100 group! In fact, it's nearly impossible to get the last character of the string to change the hash value for any reasonable bits argument to $hash. I'm pretty sure that's the DEFINITION of a bad hash algorithm.
For the record, 1000 aliases named test-#### suffered a bad performance hit, while 1000 aliases with totally random names had little effect on performance.
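To quantify that, here's a quick Python harness (my own test rig, not mIRC code): bucket 1000 test-NNNN style names into a 1024-entry table using the weak top-bits toy hash from before versus FNV-1a, a standard general-purpose hash, and report the longest bucket:

```python
from collections import Counter

def weak_hash(s, bits):
    # Toy stand-in for a weak hash (NOT mIRC's actual algorithm):
    # shift-and-add accumulator, keeping only the TOP `bits` bits.
    h = 0
    for ch in s:
        h = ((h << 1) + ord(ch)) & 0xFFFFFFFF
    return h >> (32 - bits)

def fnv1a(s, bits):
    # FNV-1a, a well-known general-purpose 32-bit hash; bucket on the low bits.
    h = 2166136261
    for byte in s.encode():
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h & ((1 << bits) - 1)

names = [f"test-{i:04d}" for i in range(1000)]  # test-0000 .. test-0999

def max_bucket(hash_fn, bits=10):
    """Longest chain when the names are dropped into a 2**bits table."""
    return max(Counter(hash_fn(n, bits) for n in names).values())

print("weak:", max_bucket(weak_hash))  # all 1000 names pile into one bucket
print("fnv :", max_bucket(fnv1a))      # names spread out across the table
```

The weak scheme degrades every lookup to a linear scan of all 1000 items, which would explain exactly the kind of slowdown I measured.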
I'm implementing stacks/temporary storage in a hash table keyed by numbers, so I'm guessing I have insane numbers of hash collisions, since the single-digit keys all collide and the two-digit keys all collide. The generally poor parsing performance of mIRC is probably masking this, but why not choose a better hash algorithm...
Check out: General Purpose Hash Functions

Max bucket length would be a good property to use with $hget()...
Do scripted hash tables actually use the size specified in /hmake, or do they scale up to the next power of two? Since the $hash function isn't introducing enough randomness into the high bits, a bigger table won't really help for smallish strings...
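On that last point: growing the table only helps if the extra index bits actually vary. Under a top-bits scheme like my toy hash (again, an assumption about the failure mode, not mIRC's actual code), repeatedly quadrupling the table does nothing for short keys:

```python
def toy_hash(s, bits):
    # Same toy top-bits hash as above (NOT mIRC's real algorithm).
    h = 0
    for ch in s:
        h = ((h << 1) + ord(ch)) & 0xFFFFFFFF
    return h >> (32 - bits)

names = [f"test-{i:04d}" for i in range(1000)]

# Every table size from 1K to 64K entries still lands all 1000 short
# keys in a single bucket, because the high bits never vary.
for bits in (10, 12, 14, 16):
    buckets = {toy_hash(n, bits) for n in names}
    print(f"{bits} bits -> {len(buckets)} distinct bucket(s)")
```

So if mIRC's tables do round up to larger sizes, that alone wouldn't rescue short keys from a hash with this shape.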