In the /hadd_test alias above, the 2230 beta is back to the performance of the 1438 beta for it, which is better than 1897 was, but is still noticeably slower than the no-beta was. Again, I don't believe the differences in performance were caused by the choice of hash function, because it was affecting the time needed to /hadd the items as well as $hget()'ing them later, and I'm assuming the choice of hash wouldn't be responsible for the /hadd delay.
* Regarding the change to forcing the bucket count to be a prime. Would you consider either allowing it to consider N=1 to be a prime, or allowing a new switch that allows the hash table to not be forced away from the requested bucket count? (Other than perhaps requiring it at least be odd like currently.)
It now is not permitting buckets=1 for those wishing to keep items in the original order, because N=1 is now flipped to use buckets=3.
* Could you change the max from 10000 to 10007?
It's now forcing N=9994 thru 10000 to use buckets 10007. This would prevent "hmake table2 $hget(table1).size" from being an error if the .size prop reported 10007 bucket. I'm not familiar enough with hash tables to know whether the largest prime below 2^14 (16381) would be a more appropriate max.
//hfree -sw test | hmake -s test 101 | var %i 20 , %N 1 | while (%i) { hadd test $base($rand(1,4503599627370495),10,36) | dec %i } | while ($hget(test,%N).item) { echo 4 -a $ord(%N) item is $v1 bucket: $calc(1+($fnv1a-32-mod($v1) % 101)) | inc %N }
As for FNV1a being modified, I found a modification mentioned at Brett Mulvey's website, and when using the FNV1a-32-Mod alias below, the calculated buckets increase as you cycle through the sequential $hget(table,N).item's. When I use FNV1a-32 in place of FNV1a-32-mod in this alias, that's the same order as the items were sorted in beta 1897.
However, when I substitute $hash($v1,32) in place of $fnv1a-32-mod($v1), the calculation of the bucket-number doesn't increase, indicating that unmodified $hash() isn't the method used to assign items to buckets.
* Was the $hash() identifier intended to be the same method used to assign items to buckets, or was it intentionally modified for use with hash tables? I'm mentioning this in case this might be a source of uneven assignments to buckets in the past.
* If FNV ends up making the cut, would we end up getting some kind of $fnv() alias to wean people away from using $hash()? It could either be a new identifier, or a new parameter where 0 is the default doing the old hash, while 1 and 2 would be fnv1a with/without the modification. True, it can be simulated, but is significantly slower as an alias.
alias fnv1a-32-mod {
noop $regsubex(temp,$1,,,&temp) | var %len $bvar(&temp,0) , %i 0 , %hash 2166136261
while (%i < %len) {
!inc %i
!var %hash $xor(%hash,$bvar(&temp,%i))
!var %hash $calc(( (%hash % 256) * 16777216 + %hash * 403) % 4294967296 )
}
var %hash $calc((%hash * 8193) % 4294967296)
var %hash $calc(($xor(%hash,$calc(%hash /128)) * 9) % 4294967296)
var %hash $calc(($xor(%hash,$calc(%hash /131072)) * 33) % 4294967296)
if (h isin $2) return $base(%hash,10,16,8) | return %hash
} ; by maroon 2018
; If not for the 2^52 limit, the MUL could have been $calc((%hash * 16777619) % 4294967296)
; because the bits of the product above 2^32 aren't needed. $fnv1a-32(string,[h]) h=hash as hex
; is identical to original FNV1a except adding the following operations after the string is hashed.
; hash += hash << 13; (same as hash * 8193)
; hash ^= hash >> 7;
; hash += hash << 3; (same as hash * 9)
; hash ^= hash >> 17;
; hash += hash << 5; (same as hash * 33)