beer (OP) · Babel fish · Joined: Jul 2007 · Posts: 66
If I write a script with, say, 10 commands, is it better to write 10 ON TEXT triggers (one for each command), or would I want a single ON TEXT with 10 if/elseif checks? Or does it even matter? I wasn't sure if one method would be somehow safer, execute faster, or have other considerations I'm not aware of.

Raccoon · Hoopy frood · Joined: Feb 2003 · Posts: 2,812
mIRC was designed for the former, that is, creating several On TEXT events, each with different match text. These days, however, most people lump all text triggers into a single On TEXT event, especially so they don't have to repeat lines of code that can be shared in one common event.

It's really just up to you. Write it both ways and decide which style you prefer or find convenient. Nobody sticks to just one method.
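A language-neutral way to picture the "one event, many branches" style is a single handler that dispatches on the first word of the message. This is a Python sketch, not mIRC script, and the command names are made up for illustration:

```python
# One handler, many commands: dispatch on the first word of the message.
# (Python analogy of a single On TEXT event with if/elseif branches.)
def on_text(message):
    command, _, rest = message.partition(" ")
    handlers = {
        "!hello": lambda args: "hi there",
        "!time":  lambda args: "it is late",
    }
    handler = handlers.get(command)
    # None means "no matching command", like an On TEXT that falls through
    return handler(rest) if handler else None

print(on_text("!hello world"))  # hi there
```

The equivalent in mIRC would be one On TEXT event whose body branches with if/elseif on $1.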


Well. At least I won lunch.
Good philosophy, see good in bad, I like!
beer (OP) · Babel fish · Joined: Jul 2007 · Posts: 66
Thanks once again Raccoon!

Since it doesn't sound like there's any real advantage to splitting everything into multiple ON TEXT, I'll probably join the crowd that prefers to keep it all in one and try minimizing lines.

On a kinda similar note, is there a better method between having one hash table with long data for each item (say 100-300 characters), or a few hash tables with the data split between them (say 3 hash tables where each item has 100-character data)? I don't know how mIRC manages hash table keys/items, but I'd assume it would be faster to do a single $hget of 300 chars than 3 $hget's of 100 chars each. I don't care about memory, but I do care about execution speed. Nobody likes a slow script. smile

maroon · Hoopy frood · Joined: Jan 2004 · Posts: 2,127
In general, scripts are faster when you reduce the number of commands, because of the overhead involved in preparing the input and output; if you need to use other identifiers to stitch items together, that takes even more time. Splitting data across several items just means there are more things to search past, so the alternative that does not require stitching items together, or using string identifiers to hunt within the item data for a token, is probably faster.

If you have a lot of items in your table, the biggest impact on $hget speed is the number of buckets. If you have 100k items split into 100 buckets, then each bucket has around 1000 items, and $hget needs to search an average of 500 items to find the right one. If you have 1000 buckets, then each bucket has 100 items, and $hget needs to search past an average of 50 items to find the right one. There is probably a little overhead involved with having more buckets, so it may not be wise to just take the max of 10000.
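The bucket arithmetic above can be checked back-of-envelope. This is a Python sketch of the cost model (not mIRC): with items spread evenly across buckets, a successful lookup scans about half a bucket on average.

```python
# Average number of entries scanned per $hget-style lookup,
# assuming items are spread evenly across the buckets.
def avg_scan(items, buckets):
    per_bucket = items / buckets
    return per_bucket / 2

print(avg_scan(100_000, 100))    # 500.0
print(avg_scan(100_000, 1_000))  # 50.0
```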

Buckets probably don't help you with $hfind, since that needs to look at everything rather than looking in a specific bucket for a specific item.

https://en.wikichip.org/wiki/mirc/hash_tables

W · Hoopy frood · Joined: Jul 2006 · Posts: 4,144
2 Things:

-Every time a message is received in a channel/PM, mIRC scans the script files for On TEXT events. Having multiple On TEXT events means that each time such a message arrives, it takes mIRC longer to figure out which one matches. You don't want a lot of script files with a lot of On TEXT events in them; I'd suggest using one single event to avoid making mIRC 'freeze' while looking for your event.

-With a single On TEXT event per file, once the event triggers you have to use multiple if/else combinations to differentiate channel/PM messages and act based on what was said, which slows down the script itself.

Another thing is that mIRC only triggers one event of a given type per file, which often leads to people's events not getting triggered (more info & examples at https://en.wikichip.org/wiki/mirc/on_event), and as such I do not recommend multiple events per file.

In practice the time wasted in both cases is negligible and you shouldn't worry about it too much.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Raccoon · Hoopy frood · Joined: Feb 2003 · Posts: 2,812
Originally Posted by beer
Thanks once again Raccoon!

On a kinda similar note, is there a better method between having one hash table with long data for each item (say 100-300 characters), or a few hash tables with the data split between them (so say 3 hash tables where each item has 100 character data)? Nobody likes a slow script. smile


These days I typically assign one hash table to a single script to handle all of its data storage, then name each item with prefixes/suffixes to identify the type of content being stored. In practice, the only thing that makes hash tables slow is not knowing the item name in advance. It's always best to look up an item directly rather than searching for it with $hfind(), which is slow.

So try not to worry about other aspects like buckets or the number of split-up tables. The time savings would be negligible either way, and will change from one mIRC version to the next as updates are made to algorithms and speed improvements are introduced.

Code
var %reason = User was running his mouth again.
var %table = kickscript
var %item = $+($network,.,$chan,.,$nick)
hinc -m %table $+(%item,.kickcount) 1
var %count = $hget(%table,$+(%item,.kickcount))
hadd -m %table $+(%item,.kickreason.,%count) %reason

; [kickscript]
; efnet.#mirc.raccoon.kickcount == 2
; efnet.#mirc.raccoon.kickreason.1 == User keeps harassing the ops.
; efnet.#mirc.raccoon.kickreason.2 == User was running his mouth again.
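The prefixed-key, one-table scheme above maps naturally onto a flat dictionary. A Python sketch of the same bookkeeping (not mIRC; the function name is made up):

```python
# One flat table, keys namespaced as network.channel.nick.field,
# mirroring the kickscript hash table layout above.
table = {}

def record_kick(network, chan, nick, reason):
    item = f"{network}.{chan}.{nick}"
    count = table.get(f"{item}.kickcount", 0) + 1      # like /hinc -m
    table[f"{item}.kickcount"] = count
    table[f"{item}.kickreason.{count}"] = reason       # like /hadd -m

record_kick("efnet", "#mirc", "raccoon", "User keeps harassing the ops.")
record_kick("efnet", "#mirc", "raccoon", "User was running his mouth again.")
print(table["efnet.#mirc.raccoon.kickcount"])  # 2
```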

Last edited by Raccoon; 16/10/19 12:05 AM.

beer (OP) · Babel fish · Joined: Jul 2007 · Posts: 66
Thanks for the insight guys, I really appreciate the knowledge/experience share! Hash tables are a new area for me and I'd prefer not to step into any design potholes if I don't have to. smile

Just thought of another question. Using the one table design, if I have a hash table (mytable), is it possible to search through the keys/items using a wildcard (such as root.user.*) but also set a max search depth and return only unique results? Using the following hash keys/items in testtable for example:

root.user.adam
root.user.bobby
root.user.bobby.abc
root.user.charlie
root.user.charlie.abc
root.user.charlie.def.jkl

Ideally I just want "adam, bobby, charlie" returned, but not (with or without the prefix) "adam, bobby, bobby.abc, charlie, charlie.abc, charlie.def.jkl". Basically, if "." is the field separator, just search keys/items that have exactly 3 fields. I tried "$hfind(testtable,$root.user.[^.]*$,1-,r)" but that didn't work.

Or would I want to just get the total number of keys/items and loop through them all, checking the first two tokens for "root.user" and, if they match, writing the third token to a variable if it's !isin the variable already? Looping through an entire table seems slow, but I guess that's what a wildcard search over the keys/items does anyway, right?
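The "exactly 3 fields deep, unique third token" filter being described can be sketched language-neutrally. This is Python, not mIRC, looping over the example key list from the question:

```python
keys = [
    "root.user.adam",
    "root.user.bobby",
    "root.user.bobby.abc",
    "root.user.charlie",
    "root.user.charlie.abc",
    "root.user.charlie.def.jkl",
]

def third_fields(keys, prefix="root.user"):
    seen = []
    for k in keys:
        parts = k.split(".")
        # keep only keys exactly 3 fields deep whose first 2 fields match
        if len(parts) == 3 and ".".join(parts[:2]) == prefix:
            if parts[2] not in seen:     # like the !isin dedupe check
                seen.append(parts[2])
    return seen

print(third_fields(keys))  # ['adam', 'bobby', 'charlie']
```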

Last edited by beer; 17/10/19 05:56 PM.
Raccoon · Hoopy frood · Joined: Feb 2003 · Posts: 2,812
Right. As I explained above, $hfind() is going to be horrendously slow, so you will have to design your table around how you will access the data, making sure it can be queried directly without performing iterative searches. This means lots of cross indexing, reverse indexing, sideways indexing... that is, performing multiple /hadd's to cover every lookup direction you'll need.

If you want the list "adam bobby charlie" then you will need to create that list.

Code
var %user = charlie
var %users = $hget(%table,root.users)
; $addtok prevents adding duplicate tokens to a list
var %users = $addtok(%users,%user,32)
hadd %table root.users %users
var %item = baz
var %useritems = $hget(%table,root.users. $+ %user $+ .items)
var %useritems = $addtok(%useritems,%item,32)
hadd %table root.users. $+ %user $+ .items %useritems
hadd %table root.users. $+ %user $+ . $+ %item I'm a little teapot.

; [table]
; root.users == adam bobby charlie
; root.users.adam.foo == Did you miss me?
; root.users.bobby.bar == Not that it matters!
; root.users.charlie.baz == I'm a little teapot.
; root.users.adam.items == foo
; root.users.bobby.items == bar
; root.users.charlie.items == baz
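The listing above leans on $addtok's dedupe behavior to keep the index lists current. A minimal Python sketch of that behavior (not mIRC):

```python
# $addtok(list, token, 32) appends a space-delimited token only if it
# is not already present in the list.
def addtok(toklist, token, sep=" "):
    toks = toklist.split(sep) if toklist else []
    if token not in toks:
        toks.append(token)
    return sep.join(toks)

users = ""
for u in ["adam", "bobby", "charlie", "bobby"]:   # bobby added twice
    users = addtok(users, u)
print(users)  # adam bobby charlie
```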

Last edited by Raccoon; 18/10/19 09:35 AM.

beer (OP) · Babel fish · Joined: Jul 2007 · Posts: 66
Say I have item data that contains a list of 1000 space-separated 8-char hash values and I want to check if a specific hash is present. I'm guessing isin would check at every position except where the current position is within 8 chars of the $len of the data. And $matchtok would build a temp string, reading chars sequentially until it hits the token separator, then compare the hash against the temp string; if they don't match, reset the temp string and continue. If that's the case, wouldn't $matchtok be faster because there would be fewer compares? Am I wrong in my guess of how isin/$matchtok work?

maroon · Hoopy frood · Joined: Jan 2004 · Posts: 2,127
Your example would not fit, because 1000 delimited 8-character strings require a length of (8+1) * 1000 - 1 = 8999 characters, which is longer than mIRC's line length limit.

Also, your search should be more efficient if it's case-sensitive. Searching for DEADBEEF should be faster if mIRC doesn't need to effectively capitalize both the search string and the string being searched before doing a match. So the case-sensitive $*tokcs identifiers or isincs should be faster.

However, if you want to test finding within a string, you could create a hash table whose item names are 8 random characters, stopping when you have 900 of them. Then create a string containing all 900 item names delimited by periods. Then you can benchmark how long it takes to find all 900 item names, searching either like (deadbeef isin string) vs $istok(string,deadbeef,46).
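That benchmark setup can be mocked up in any language. A Python analog (not mIRC; the hash values here are synthetic, not real digests), comparing substring search (like isin) against exact token membership (like $istok with delimiter 46):

```python
import time

# 900 unique 8-hex-digit tokens, period-delimited, as described above.
hashes = [format(i * 2654435761 % 2**32, "08x") for i in range(900)]
haystack = ".".join(hashes)

def find_isin(needle, s):
    return needle in s              # substring scan, like (needle isin s)

def find_istok(needle, s):
    return needle in s.split(".")   # exact token match, like $istok(s,needle,46)

t0 = time.perf_counter()
hits_isin = sum(find_isin(h, haystack) for h in hashes)
t1 = time.perf_counter()
hits_istok = sum(find_istok(h, haystack) for h in hashes)
t2 = time.perf_counter()
print(hits_isin, hits_istok, t1 - t0, t2 - t1)
```

Both approaches find all 900 tokens; which is faster depends on the implementation, which is exactly why measuring beats guessing.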

If your delimiter also precedes the 1st item, you could then search for the 9-character string beginning with the delimiter, i.e. searching for .DEADBEEF when the delimiter is the period. Beware that some text-search routines try to be intelligent: if searching for "aqua", they may decide that 'q' is a rare letter, search for that letter, then back up 1 position and try to match the string from there. So it's theoretically possible to get different results depending on how rare the 1st character of the search string (i.e. the delimiter) is.

When benchmarking, you want to take as many other variables out of the test as possible. Don't create a random number, grab the Nth token from a string, then time how long it takes to find it, because the timing varies depending on whether you happened to choose more numbers closer to 900 or closer to 0.

If your 8-character hashes come from $hash, don't use that function. $hash is a legacy identifier with too high a chance that several text strings share the same $hash(string,32). Wikichip also describes several other problems with $hash, including the fact that its output is actually limited to 24 bits: if you translate the $hash output to hex, the result of $base($hash($rand(0,999999999),32),10,16) always ends with 00. You'd be better off using 8 hex digits from $crc, $md5, $sha1, etc.

Because of the birthday paradox, if you generate N random values of B bits each, there's a 50% chance of at least 1 duplicate somewhere in the list once N reaches approximately the square root of 2^B; i.e. a 50% chance of 1 duplicate within 65536 32-bit numbers. You can switch from hex to base 36 and keep the length at 8 while reducing the odds of a duplicate: a 10-digit hex number (40 bits) can be expressed as an 8-digit base-36 number.
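The square-root rule of thumb is easy to verify numerically. A Python check (not mIRC) of the thresholds quoted above:

```python
import math

# ~50% odds of at least one duplicate once you have about sqrt(2^B)
# random B-bit values.
def birthday_threshold(bits):
    return math.isqrt(2 ** bits)

print(birthday_threshold(32))  # 65536   -> 8 hex digits
print(birthday_threshold(40))  # 1048576 -> 10 hex digits / 8 base-36 digits
```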

//echo -a $base( $str(f,10) ,16,36)

16^10 is 2^40, so reaching 50% would need 2^20 = 1048576 items instead of 65536 items.

mIRC doesn't have big-integer support, so $base and $calc are limited to 2^53, which means you can't use $base to translate a number larger than 13 hex digits. Translating from hex to base 36 is less efficient than you'd think for those large numbers: a 13-digit hex number translates to a base-36 string 3 digits longer than the translation of a 10-digit hex number. If RAM is at a premium because your database is getting really large, the index can be squeezed shorter, but at the cost of more time creating the search hash. $base only supports output bases up to 36, so for base 64 you'd need to use $base to translate a number to hex, then use /bset and $regsubex to create a &binvar where each pair of hex digits is one byte value, then $encode(&binvar,m) to make it mime. If RAM is at even more of a premium, you can search the forum for my base85 alias. Mime encodes 6 hex digits as 4 text characters, while base85 translates 8 hex digits into 5 text characters.

Another method that would be harder to do, but could be potentially faster, is to store your string-to-be-searched in a &binvar, then use $bfind to search within it. If your delimiter is 46 (period), you could search for .DEADBEEF like $bfind(&binvar,1,46 68 69 65 68 66 69 69 70). While this carries the overhead of using $hget(table,item,&binvar) to fetch the binvar, it allows the string-to-be-searched to be much longer than 8292 characters.
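The delimiter-prefixed byte search translates directly to other languages. A Python analog (not mIRC) using bytes.find in place of $bfind, with made-up sample tokens:

```python
# Keep the list as raw bytes, delimited and bracketed by periods, and
# search for ".token." so a match can never start in the middle of a token.
data = b"." + b".".join([b"cafebabe", b"deadbeef", b"0badf00d"]) + b"."

def bfind(haystack, token):
    # returns the offset of the delimiter before the token, or -1 if absent
    return haystack.find(b"." + token + b".")

print(bfind(data, b"deadbeef"))  # 9
print(bfind(data, b"adbeef"))    # -1 (partial token does not match)
```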

beer (OP) · Babel fish · Joined: Jul 2007 · Posts: 66
maroon,

I'm not using $hash, but am using $sha384 and trimming the first 8 chars. I read elsewhere about the birthday problem (though not in as great detail as you provided) and that doing it the way I am now should solve it. But you seem far more knowledgeable on the subject than the page I got that info from. What are your thoughts on what I'm doing currently?

Also, there's plenty of RAM, so no worries there, but I'm always interested in execution speed gains, even if the function(s) are a little more complex. That being the case, I'd like to play with your &binvar idea and see what the result is. I'm new to this stuff, though, so hopefully I can follow along as it gets harder. Thanks for sharing your knowledge! It really helps when trying to understand all this.

maroon · Hoopy frood · Joined: Jan 2004 · Posts: 2,127
For your purposes, stealing hex digits from an MD5 hash would be just as good. All you need are results that are well distributed among all possible outputs, to avoid having too many birthday collisions. It doesn't matter what day your hash is born on, as long as there's a low chance it matches another hash's birthday.

If you're using 8 hex digits to hash a text string, you'll probably see no difference from using $crc(string,0). If FNV-1a were available as an identifier, it would work too. For obtaining a hash digest from a text string, you'll probably not see a speed difference between your choices of $md5 and one of the SHA hashes. SHA-384 is the same hash as SHA-512, except that it begins with a different 512-bit magic constant and hides 128 bits of the output.
https://forums.mirc.com/ubbthreads....arger-disk-buffer-speed-boost#Post265792

If you need to have the hash length be 8, you can squeeze 40 bits into an 8-character base 36 string:

//var %a samplestring7 | echo -a $base($left($sha512(%a),10),16,36,8)

Note that the 8 is needed so it zero-pads shorter strings. With 40 bits in the base-36 value, the birthday paradox reaches 50% around 2^20 items instead of the 2^16 for 8 hex digits. If you don't want to zero-pad the string, your match text needs to begin and end with the delimiter, and your list of hashes also needs to begin and end with the delimiter. But even then, you can't take advantage of the shorter string, because you can't count on any hashes being shorter 100% of the time.
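The $base(...,16,36,8) call above can be mirrored in Python to see exactly what the zero-padding does. A sketch (not mIRC; digit alphabet and helper names are my own):

```python
import hashlib

# Python equivalent of $base($left($sha512(%a),10),16,36,8): take 40 bits
# of a digest and render them as a zero-padded 8-char base-36 string.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n, width):
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out.rjust(width, "0")   # the width arg mirrors the trailing 8

def short_hash(text):
    hex10 = hashlib.sha512(text.encode()).hexdigest()[:10]  # 40 bits
    return to_base36(int(hex10, 16), 8)

print(short_hash("samplestring7"))  # always exactly 8 characters
```

Without the padding, a digest that happens to start with zeros would produce a shorter string, which is why the match text would otherwise need delimiters on both sides.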

//var %i 1 | while (%i isnum 1-13) { echo -a $base($str(F,%i),16,36) %i | inc %i }

As I mentioned earlier, the lack of big-integer support limits the hex string fetched from SHA to 13 digits, but once you encode 9 hex digits as 7 base-36 digits, the range of 10-13 only gains an extra 4 bits per base-36 digit. But if room in your string-to-be-searched matters, you can fit 14% more 7-digit base-36 strings than 8-digit hex strings. If you need more than 9 hex digits encoded as base 36, your most efficient base-36 hashes would do something like combining 9-to-7 with 5-to-4, or 2 pairs of 9-to-7:

//var %a samplestring | echo -a $base($left($sha512(%a),9),16,36,7) $+ $base($right($sha512(%a),9),16,36,7)
//var %a samplestring | echo -a $base($left($sha512(%a),9),16,36,7) $+ $base($right($sha512(%a),5),16,36,4)

