Raccoon, your example didn't use buckets=1, which is the only case where items remain in a sequential pattern in the order they were created.

//var %id = testing.hlist | hmake %id 1 | hadd %id a a | hadd %id b b | hadd %id c c | hsave %id %id | loadbuf -api %id | hfree %id | hmake %id 1 | hload %id %id | remove %id | hsave %id %id | loadbuf -api %id | remove %id | hfree %id

It appears the physical order on disk depends on the number of buckets. Buckets=100 leaves the order unchanged across a save/load cycle. Buckets=1 flips the order. In this next example, though, it looks like most other bucket counts keep roughly the same overall order, with most items swapped with a nearby item.

//var %buckets 5 , %id test.dat | hfree -w %id | hmake -s %id %buckets | var %i 10 | while (%i) { hadd %id item $+ %i 1 | dec %i } | hsave %id %id | filter -fs %id *item* | hfree %id | hmake -s %id %buckets | hload %id %id | hsave %id %id | filter -fs %id *item*


Khaled wrote

from the perspective of the scripter, they are random. You should not depend on items in hash tables being stored in any particular order.

I wasn't trying to depend on items being in a specific shuffled order when buckets is greater than 1. I was saying that repeating the same combination of buckets, item names and number of items produces the identical pattern each time it's tried, and the example with a small quantity of similarly named items showed the order doesn't appear very random. But yes, different mIRC versions have different shuffling patterns from each other. For the code in my 2nd post, or when changing the last code in my 1st post to use buckets=101, the current version and 6.35 produce different shuffling orders, but each version consistently shuffles the same things the same way each time.

I had incorrectly left a "not" in my previous post; it should be "especially if the item names are not similar". When changing the buckets from 1 to 100 in the 2nd example of my 1st post, the items look like they're not shuffled very well at all: 8,9 6,7 4,5 2,3 1... with the 5-length and 6-length item names being shuffled as separate groups.

The similar item names appear to be causing the lack of shuffling, as $hash() does not have very "random-looking" output. The article on $hash at WikiChip shows some of the problems with $hash output, including poor distribution of output values, output values heavily dependent on input length, and the fact that changing 1 bit of the input changes very few bits of the output.


Bob Jenkins has a few varieties of hashes which are pretty fast, and one of them might be a good replacement for $hash(). I've created aliases for his "Lookup3" and "One at a time" hashes at https://mircscripts.net/WHkiR and there is C source code around for both functions. "One at a time" is probably the better choice as a replacement for $hash when assigning hash items to buckets, since Lookup3 handles bytes in groups of 12 and might be more suited to longer strings, though it's hard to know without benchmarking them.
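For anyone who wants to compare against the aliases, here is a minimal Python sketch of Jenkins' "One at a time" hash, written from the widely published C version. The bucket_for helper is hypothetical: it assumes a bucket is picked by taking the hash modulo the bucket count, which is only a guess at how a hash table might use it and not a statement about mIRC's internals.

```python
MASK = 0xFFFFFFFF  # keep every step within 32 bits, like uint32_t in the C source


def one_at_a_time(data: bytes) -> int:
    """Bob Jenkins' "One at a time" hash over a byte string."""
    h = 0
    for byte in data:
        h = (h + byte) & MASK
        h = (h + (h << 10)) & MASK
        h ^= h >> 6
    # Final mixing steps
    h = (h + (h << 3)) & MASK
    h ^= h >> 11
    h = (h + (h << 15)) & MASK
    return h


def bucket_for(name: str, buckets: int) -> int:
    """Hypothetical helper: assumes bucket = hash modulo number of buckets."""
    return one_at_a_time(name.encode("utf-8")) % buckets


if __name__ == "__main__":
    print(hex(one_at_a_time(b"a")))  # 0xca2e9442
    for n in range(1, 11):
        print(f"item{n}", bucket_for(f"item{n}", 5))
```

The hash of the single byte "a" should come out as 0xca2e9442, which is a handy check when porting it to another language.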

$hget won't tell us whether these items are getting placed into the same bucket, or just happen to be in nearby buckets. If it's the latter, this is probably fine, since the goal of assigning items to buckets by hash is to spread them across different buckets in a smooth frequency distribution, and to be able to quickly find which bucket something was already placed in. But if the current hashing is not doing a good job of evenly distributing 10k items into 1k buckets, especially with short input strings similar to each other, then perhaps a different hash algorithm is needed.
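Since $hget can't show bucket occupancy, one way to judge a candidate hash is to simulate the bucketing outside mIRC. The Python sketch below does that for 10k similar names in 1k buckets. Everything in it is an assumption for illustration: bucket = hash modulo bucket count, additive_hash is a deliberately weak stand-in (it is not mIRC's $hash), and zlib.crc32 is just a convenient reference hash from the standard library.

```python
import zlib
from collections import Counter


def bucket_counts(names, buckets, hash_fn):
    """Count how many names land in each bucket, assuming bucket = hash % buckets."""
    return Counter(hash_fn(name.encode("utf-8")) % buckets for name in names)


def additive_hash(data: bytes) -> int:
    """Deliberately weak stand-in: the sum of the byte values."""
    return sum(data)


def summarize(label, counts, buckets):
    """Print how many buckets are used and how full the fullest one is."""
    print(f"{label}: {len(counts)} of {buckets} buckets used, "
          f"largest bucket holds {max(counts.values())} items")


names = [f"item{i}" for i in range(1, 10001)]  # short, similar names
buckets = 1000

weak = bucket_counts(names, buckets, additive_hash)
reference = bucket_counts(names, buckets, zlib.crc32)

if __name__ == "__main__":
    summarize("additive", weak, buckets)
    summarize("crc32", reference, buckets)
```

An even spread would put about 10 items in each of the 1000 buckets. The additive stand-in uses only 91 buckets for these names, because it can't tell apart names with the same length and digit sum. That is the kind of clustering to look for when comparing hash candidates.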