Re: Multiple ON TEXT's or 1 with multiple if/elseif's?
Yesterday at 05:48 PM
Your example would not fit, because N delimited 8-character strings require a length of (8+1) * N - 1, which for N = 1000 is 8999 characters.
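As a sanity check on that arithmetic, here it is in Python (illustration only; the 1000-item count is from the example above):

```python
# Length of N delimited 8-character items: N*8 name characters plus N-1 delimiters.
N = 1000
items = ["A" * 8] * N          # stand-in for 8-character item names
joined = ".".join(items)
print(len(joined))             # (8+1)*1000 - 1 = 8999
```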
Also, your search should be more efficient if it's case-sensitive. Searching for DEADBEEF should be faster if the routine doesn't need to effectively capitalize both the search string and the string being searched before doing a match. So using the case-sensitive $*tokcs identifiers or the isincs operator should be faster.
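The cost being avoided can be sketched outside mIRC; in Python terms, a case-insensitive find is effectively a case-sensitive find on case-normalized copies of both strings, and those extra normalization passes are what $*tokcs / isincs skip (strings below are made up):

```python
haystack = "cafebabe.deadbeef.0badf00d"
needle = "DEADBEEF"

# Case-sensitive: one pass over the haystack, no temporary copies.
cs_hit = haystack.find(needle)

# Case-insensitive: both strings get normalized first, i.e. two extra
# full passes and two temporary copies before the actual search runs.
ci_hit = haystack.upper().find(needle.upper())

print(cs_hit, ci_hit)          # -1 (case mismatch) vs 9 (found after normalizing)
```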
However, if you want to test finding a substring within a string, you could create a hashtable of item names which are 8 random characters, stopping when you have 900 of them. Then create a string containing all 900 item names delimited by periods. You can then benchmark how long it takes to find all 900 item names, searching either like (deadbeef isin string) vs $istok(string,deadbeef,46).
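A sketch of that benchmark, expressed in Python for illustration (the 900 names and period delimiter are from the setup above; the substring loop stands in for isin and the token loop for $istok, though $istok would re-split on every call):

```python
import random
import string
import time

random.seed(1)
names = set()
while len(names) < 900:                  # stop once we have 900 unique 8-char names
    names.add("".join(random.choices(string.ascii_lowercase, k=8)))
names = list(names)
big = ".".join(names)                    # all 900 names, period-delimited

t0 = time.perf_counter()
found_isin = sum(1 for n in names if n in big)       # ~ (name isin string)
t1 = time.perf_counter()
tokens = big.split(".")
found_istok = sum(1 for n in names if n in tokens)   # ~ $istok(string,name,46)
t2 = time.perf_counter()

print(found_isin, found_istok)           # both 900: every name is in the string
print(t1 - t0, t2 - t1)                  # the two timings to compare
```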
If your delimiter also precedes the 1st item, you could then search for the 9-character string beginning with the delimiter, i.e. searching for .DEADBEEF when the delimiter is the period. Beware that some text-search routines try to be intelligent: if searching for "aqua", they may decide that 'q' is a rare letter, search for that letter, then back up 1 position and try to match the string from there. So it's theoretically possible to get different results depending on how rare the 1st character of the search string (aka the delimiter) is.
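One concrete benefit of anchoring the search with the delimiter shows up if name lengths ever vary: a bare substring test can match inside a longer name, while the delimiter-prefixed search can't. A Python sketch with made-up names (for full-token matching you'd also want the delimiter on the right side, or a trailing delimiter on the string):

```python
# With variable-length names, a bare substring test can match the tail
# of a longer name; prefixing the delimiter prevents that.
big = ".deadbeef.cafebabe"       # delimiter precedes the 1st item too

print("beef" in big)             # True:  matches inside "deadbeef"
print(".beef" in big)            # False: "beef" never starts a token
```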
When benchmarking, you want to remove as many other variables from the test as possible. So you don't want to generate a random number, grab the Nth token from a string, then time how long it takes to find it, because the timing varies depending on whether you happened to choose more numbers closer to 900 or closer to 0.
If your 8-character hashes are made with $hash, then don't use that function. $hash is a legacy identifier with too high a chance that several text strings share the same $hash(string,32). WikiChip also describes several other problems with $hash, including the fact that its output is actually limited to 24 bits: if you translate the $hash output to hex, the result of $base($hash($rand(0,999999999),32),10,16) always ends with 00. You'd be better off using 8 hex digits from $crc, or from $md5, $sha1, etc.
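The $crc / $md5 suggestion looks like this in Python terms (zlib.crc32 plays the role of $crc and hashlib.md5 the role of $md5; the input string is made up):

```python
import hashlib
import zlib

s = b"some item name"
crc_id = format(zlib.crc32(s), "08x")     # 8 hex digits, like $crc
md5_id = hashlib.md5(s).hexdigest()[:8]   # first 8 hex digits of the $md5 output

print(crc_id, md5_id)
print(len(crc_id), len(md5_id))           # both are exactly 8 hex digits
```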
Because of the birthday collision, if you generate N random values of B bits in length, there's a 50% chance of having at least 1 duplicate somewhere in the list once N reaches approximately the square root of 2^B. I.e. a 50% chance of a duplicate within 65536 32-bit numbers. You can switch from hex to base36 and keep the length at 8 while reducing the odds of a duplicate: a 10-digit hex number can be expressed as an 8-digit base-36 number.
//echo -a $base( $str(f,10) ,16,36)
16^10 is 2^40, so reaching a 50% collision chance would need 2^20 = 1048576 items instead of 65536.
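The arithmetic behind those figures, worked in Python (sqrt(2^B) is the rule-of-thumb approximation used above; the more precise 50% point is about 1.18*sqrt(2^B), the same order of magnitude):

```python
import math

# 8 hex digits = 32 bits: ~50% chance of a duplicate near sqrt(2^32)
print(math.isqrt(2**32))         # 65536

# 10 hex digits fit in 8 base-36 digits because 16^10 < 36^8
print(16**10 < 36**8)            # True

# 16^10 = 2^40, so the 50% point moves out to sqrt(2^40) = 2^20
print(math.isqrt(2**40))         # 1048576
```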
mIRC doesn't have big-integer support, so $base and $calc are limited to 2^53 precision, which means you can't use $base to accurately translate a number longer than 13 hex digits. Translating from hex to base36 is also less efficient than you'd think at those lengths: a 13-digit hex number becomes a base36 string that's 3 digits longer than the one produced from a 10-digit hex number.

If RAM is at a premium because your database is getting really large, the index can be squeezed shorter, but at the cost of more time creating the search hash. $base doesn't support bases above 36, so for base64 you'd need to use $base to translate the number to hex, then use /bset and $regsubex to create a &binvar where each pair of hex digits becomes one byte value, then $encode(&binvar,m) to encode it as mime. If RAM is at even more of a premium, you can search the forum for my base85 alias: mime encodes 6 hex digits as 4 text characters, while base85 translates 8 hex digits into 5.
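The /bset + $encode route amounts to packing pairs of hex digits into raw bytes and then base64-encoding them; a Python sketch of the same round trip (mime here means base64, matching $encode(&binvar,m)):

```python
import base64

hex_id = "deadbe"                      # 6 hex digits = 3 raw bytes
raw = bytes.fromhex(hex_id)            # the /bset step: each hex pair -> 1 byte
mime = base64.b64encode(raw).decode()  # the $encode(&binvar,m) step

print(mime)                            # "3q2+": 6 hex digits became 4 mime chars
```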
Another method that would be harder to do, but could potentially be faster, is to store your string-to-be-searched in a &binvar then use $bfind to search it. If your delimiter is the period (ASCII 46), you could search for .DEADBEEF like $bfind(&binvar,1,46 68 69 65 68 66 69 69 70). While this carries the overhead of using $hget(table,item,&binvar) to fetch your binvar, it could allow the string-to-be-searched to be much longer than 8292 characters.
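The $bfind idea maps directly onto a plain byte search; in Python terms (same byte values as the mIRC example above: 46 is the period, and 68 69 65 68 66 69 69 70 spells DEADBEEF):

```python
big = b".DEADBEEF.CAFEBABE"      # the &binvar contents, delimiter-first
pattern = bytes([46, 68, 69, 65, 68, 66, 69, 69, 70])   # ".DEADBEEF"

pos = big.find(pattern)          # ~ $bfind(&binvar,1,46 68 69 65 68 66 69 69 70)
print(pos)                       # 0: found starting at the first byte
```

Note Python's find is 0-based where $bfind reports a 1-based position, but the match logic is the same.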