Speeding up searching of hash tables #201020 17/06/08 05:52 AM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
What can I do to help speed up the searching of hash tables? I have one table that has 23,000 items. It's decently fast, but when an item has 50 data values and each of those data values is cross-referenced to matching items with the same data value, it does seem to pause while it's searching for matches. The table's size is set to 2500 (total items / 10). Would it help if I increased the size to 5000? What else can I do?
I registered; you should too.
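As a rough way to reason about the size question, here is a minimal sketch in Python (not mIRC script). It assumes the table's size is the number of buckets in a chained hash table, which the thread does not state, and the function name is made up for illustration.

```python
def average_chain_length(items: int, buckets: int) -> float:
    """Average number of items per bucket, assuming an even spread."""
    return items / buckets

current = average_chain_length(23_000, 2_500)
doubled = average_chain_length(23_000, 5_000)

print(current)  # 9.2
print(doubled)  # 4.6
```

Under that assumption, doubling the size would halve the work of finding one item by name. It would likely not help a search that has to inspect the data of every item, since such a search visits all 23,000 items whatever the bucket count.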

Re: Speeding up searching of hash tables AWEstun #201023 17/06/08 08:11 AM
Joined: Aug 2006 Posts: 183 T Thrull Vogon poet
If it has 23k items and it's being cross-referenced with the other 22,999, that's about 529 million operations going on. You'd really need to shrink the item count down to something smaller if you can. How long does it take, and how often are you going to use it?
Last edited by Thrull; 17/06/08 08:11 AM.
Yar

Re: Speeding up searching of hash tables Thrull #201024 17/06/08 08:31 AM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
Depending on how busy the rooms are, the table gets accessed about 2-3 times every 5 seconds. The average item has 5 data values, with the smallest having 1 data value and the largest having 81. The table tracks user names and all the IPs seen with that user name (item). I had thought about breaking the table up into individual rooms, but that would mean that some of the same people would be in multiple tables. Cross-referencing is done to see who else uses the same IPs that the item has on file. On average it takes about 1-2 seconds, with larger searches taking 3-4 seconds. Though I guess if over 500 million operations are being done per search, I can't really complain.

Re: Speeding up searching of hash tables AWEstun #201027 17/06/08 09:11 AM
Joined: Aug 2006 Posts: 183 T Thrull Vogon poet
The easiest solution is breaking it up into multiple tables. You're going through WAY too much useless data per search. If you were to make 4 tables at 1/4 the size, you're only looking at about 33 million operations per table. That should run more than ten times as fast (16 times as fast in theory, but the other tables will generate some overhead). That should keep any lookup under 0.5 seconds. Your best bet is optimizing everything as best you can. Making it so it looks "per channel" as opposed to "global" would vastly improve the script. Yes, this will generate some extra baggage, but it will go much quicker.
Last edited by Thrull; 17/06/08 09:14 AM.
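The figures quoted in this exchange can be checked with a few lines of arithmetic. This is a minimal sketch in Python (not mIRC script), assuming, as the posts do, that a search costs roughly the square of the item count. The function name is made up for illustration.

```python
def pairwise_operations(item_count: int) -> int:
    """Rough cost model from the thread: every item compared against every item."""
    return item_count * item_count

full_table = pairwise_operations(23_000)          # one table of 23,000 items
quarter_table = pairwise_operations(23_000 // 4)  # one of four tables of 5,750 items

print(full_table)                   # 529000000
print(quarter_table)                # 33062500
print(full_table // quarter_table)  # 16
```

The 16x figure applies only if a search touches a single quarter-size table. Searching all four tables would cost four times as much.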

Re: Speeding up searching of hash tables Thrull #201062 18/06/08 01:12 AM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
Originally Posted By: Thrull
    Your best bet is optimizing everything as best you can. Making it so it looks "per channel" as opposed to "global" would vastly improve the script. Yes, this will generate some extra baggage, but it will go much quicker.

Yup, smaller tables would make it slightly faster, but the same person with different IPs might not visit the same rooms. For example, Person X might visit local rooms while at home, but when they travel and get online from some 2-star motel with each door painted a different color, they will be on a different IP and might just join the local rooms where the fleabag roach-infested motel is. Thus, the out-of-town IPs the person uses wouldn't be captured for their name.

Re: Speeding up searching of hash tables AWEstun #201109 20/06/08 01:57 AM
Joined: Aug 2006 Posts: 183 T Thrull Vogon poet
Well, you'll need to choose the lesser of two evils then. There isn't a better way to speed it up aside from getting a whole new processor in your computer, and even that may not do a whole lot depending on how new your current rig is. You COULD cross-reference the different tables, but I wouldn't recommend doing it whenever anyone joins. It could be done once an hour, or once every 6 hours, or once a day. This would keep them in sync for the most part without adding a ton of extra processing every time someone joins.
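One way to picture the periodic cross-reference is a sync pass that merges the per-room tables and copies the combined IP sets back. This is a minimal sketch in Python (not mIRC script). The table layout, room names, and function names are assumptions made for illustration, and the scheduling (hourly, daily) is left out.

```python
def merge_room_tables(room_tables):
    """Combine per-room name -> IP sets into one global name -> IP set view."""
    merged = {}
    for table in room_tables.values():
        for name, ips in table.items():
            merged.setdefault(name, set()).update(ips)
    return merged

def sync_rooms(room_tables):
    """Copy the merged IP sets back into every room that already tracks the name."""
    merged = merge_room_tables(room_tables)
    for table in room_tables.values():
        for name in table:
            table[name] = set(merged[name])

# Hypothetical rooms: person_x was seen on a different IP in each room.
rooms = {
    "#home": {"person_x": {"1.1.1.1"}},
    "#motel": {"person_x": {"2.2.2.2"}, "person_y": {"3.3.3.3"}},
}
sync_rooms(rooms)
print(sorted(rooms["#home"]["person_x"]))  # ['1.1.1.1', '2.2.2.2']
```

Between sync passes each room only searches its own smaller table, at the cost of out-of-town IPs being invisible until the next pass.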

Re: Speeding up searching of hash tables AWEstun #201110 20/06/08 02:27 AM
Joined: Dec 2002 Posts: 2,962 Norwich, UK S starbucks_mafia Hoopy frood
You'd be better off posting some actual examples of data and your relevant search code. It's quite possible things can be sped up by either streamlining the search pattern or restructuring the table. If all else fails you might be able to use an SQLite database instead (there are DLLs available to use with mIRC).
Spelling mistakes, grammatical errors, and stupid comments are intentional.

Re: Speeding up searching of hash tables starbucks_mafia #201119 20/06/08 11:05 AM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
Examples of data:

    sam 1.2.3.4 2.3.4.5 3.4.5.6 4.5.6.7
    clark 4.4.4.4 4.5.6.7
    jeffery 0.0.0.0
    ryan 1.2.3.4 2.3.4.5
    bryan 7.7.7.7

sam joins the room; a trace and match on all his IPs is done. sam is 3.4.5.6.

    1.2.3.4 matches to sam
    2.3.4.5 matches to sam, ryan
    3.4.5.6 matches to sam
    4.5.6.7 matches to sam, clark

3 total name matches for sam: sam, ryan, clark
3 total ip matches for sam: 3.4.5.6, 2.3.4.5, 4.5.6.7
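The matching described here can be sketched as a scan over the whole table for each IP on file. This is a minimal illustration in Python (not mIRC script). It assumes the script walks every item for every IP, which the thread implies but never shows, and the names `seen_ips` and `match_by_scanning` are made up.

```python
# Sample data from the post: user name -> IPs seen with that name.
seen_ips = {
    "sam": ["1.2.3.4", "2.3.4.5", "3.4.5.6", "4.5.6.7"],
    "clark": ["4.4.4.4", "4.5.6.7"],
    "jeffery": ["0.0.0.0"],
    "ryan": ["1.2.3.4", "2.3.4.5"],
    "bryan": ["7.7.7.7"],
}

def match_by_scanning(name):
    """For each IP on file for `name`, scan every item for the same IP."""
    matches = {}
    for ip in seen_ips[name]:
        matches[ip] = [other for other, ips in seen_ips.items() if ip in ips]
    return matches

matches = match_by_scanning("sam")
for ip, names in matches.items():
    print(ip, "matches to", ", ".join(names))

# Unique names across all of sam's IPs.
matched_names = sorted({n for names in matches.values() for n in names})
print(matched_names)  # ['clark', 'ryan', 'sam']
```

On this data the scan also reports ryan for 1.2.3.4, since ryan's entry lists that IP. The cost grows with (IPs on file) x (items in the table), which is why a user with many IPs makes the search pause.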

Re: Speeding up searching of hash tables AWEstun #201123 20/06/08 01:33 PM
Joined: Oct 2003 Posts: 3,918 Montreal, QC, Canada A argv0 Hoopy frood
Cache, cache, cache. When you add items to the table, insert the cross-references then. It will take (barely) longer to insert and make the db larger, but it will save you time when doing searches. The method for doing this efficiently would be to add an entry for both the nickname and the IP, as such... (Note the IP and name can easily be stored in two actual tables; my example lists them side by side for convenience.)

When "sam" enters with "1.2.3.4", add:

    1.2.3.4 => sam
    sam => 1.2.3.4

When "barry" enters with "1.2.3.4" followed by "2.3.4.5", the table will be:

    1.2.3.4 => sam, barry
    2.3.4.5 => barry
    sam => 1.2.3.4
    barry => 1.2.3.4, 2.3.4.5

You can then do lookups on the IP "1.2.3.4" to find sam and barry, and then do lookups on sam and barry to get their other IPs.

You can even add caching for the names by doing a search on barry's IPs when he enters and then gathering the unique names for each IP (lookup 1.2.3.4, add "sam") for all names involved. The new items would be added as:

    !barry_names => barry, sam
    !sam_names => sam, barry

Making the table:

    1.2.3.4 => sam, barry
    2.3.4.5 => barry
    sam => 1.2.3.4
    barry => 1.2.3.4, 2.3.4.5
    !barry_names => barry, sam
    !sam_names => sam, barry

You would need to do more work to manage insertion and deletion, but if you do more search lookups than inserts/deletes then your table will be optimized for search. There's plenty more cache-based optimization you can do. Each one slows down the insert time, so you'd have to strike a balance between insert and search speeds. Hopefully this makes sense to you.

Using your data the table would be:

    1.2.3.4 => sam, ryan
    2.3.4.5 => sam, ryan
    3.4.5.6 => sam
    4.5.6.7 => sam, clark
    4.4.4.4 => clark
    0.0.0.0 => jeffery
    7.7.7.7 => bryan
    sam => 1.2.3.4, 2.3.4.5, 3.4.5.6, 4.5.6.7
    clark => 4.4.4.4, 4.5.6.7
    jeffery => 0.0.0.0
    ryan => 1.2.3.4, 2.3.4.5
    bryan => 7.7.7.7
    !sam_names => sam, ryan, clark
    !clark_names => clark, sam
    !jeffery_names => jeffery
    !ryan_names => ryan, sam
    !bryan_names => bryan

- argv[0] on EFnet #mIRC - "Life is a pointer to an integer without a cast"
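The two-way indexing described in this post can be sketched as a pair of dictionaries that are both updated at insert time. This is a minimal illustration in Python rather than mIRC script. The names `ips_by_name`, `names_by_ip`, `record_join`, and `related_names` are made up for the example and do not come from the thread.

```python
from collections import defaultdict

ips_by_name = defaultdict(set)  # name -> IPs seen with that name
names_by_ip = defaultdict(set)  # IP -> names seen on that IP

def record_join(name, ip):
    """Update both indexes at insert time so searches need no full scan."""
    ips_by_name[name].add(ip)
    names_by_ip[ip].add(name)

def related_names(name):
    """All names sharing at least one IP with `name`, including `name` itself."""
    related = set()
    for ip in ips_by_name.get(name, ()):
        related |= names_by_ip[ip]
    return related

# The sam/barry walk-through from the post.
record_join("sam", "1.2.3.4")
record_join("barry", "1.2.3.4")
record_join("barry", "2.3.4.5")

print(sorted(names_by_ip["1.2.3.4"]))  # ['barry', 'sam']
print(sorted(related_names("sam")))    # ['barry', 'sam']
```

A search now costs one lookup per IP on file instead of a pass over every item. The trade-off is the one the post names: every insert writes to two places, and deletions have to clean up both.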

Re: Speeding up searching of hash tables argv0 #201125 20/06/08 02:51 PM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
I see what you are saying about it being 3 tables: one for name -> ip, another for ip -> name, and lastly one for xname_total -> all x's names. Right now the table is set up with the user name as the item name and all the IPs that the item/user has been seen on as its data. However, user names get purged/banned by admins for abuse, so that item/user name won't ever be seen again. So my question is, should I be tracking by the user name or the IP as the item name? I'm thinking I might need to insert a $ctime as data1 so that if the item/name hasn't been seen in, say, 6-12 months, it gets purged from the tables. Tables include: name -> ip, name -> BIO, name -> last seen, name -> last said, name -> on_chan_join_ctime (not saved to file; if a user has no on-join ctime, their join ctime is set to my join ctime). Actually, I'm already using a $ctime with the seen table. So for each name in the main table, I can check the $ctime for the same item in the seen table. If the duration is over 3-4 months, I can purge the item from all tables. Likewise, I can do a reverse purge: if there is item data in the other tables and that item isn't in the main table, it can be purged from the child tables.
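The purge and reverse purge described here might look like the following. This is a minimal sketch in Python (not mIRC script), with plain dictionaries standing in for the hash tables. The function names, the 120-day cutoff, the sample timestamp, and the choice to treat a name with no last-seen entry as stale are all assumptions for illustration.

```python
DAY = 24 * 60 * 60  # seconds, matching a $ctime-style timestamp

def purge_stale(main, last_seen, child_tables, now, max_age_days=120):
    """Remove names not seen within max_age_days from every table."""
    cutoff = now - max_age_days * DAY
    # Assumption: a name with no last-seen entry counts as stale.
    stale = [name for name in main if last_seen.get(name, 0) < cutoff]
    for name in stale:
        del main[name]
        last_seen.pop(name, None)
        for table in child_tables:
            table.pop(name, None)
    return stale

def purge_orphans(main, child_tables):
    """Reverse purge: drop child entries whose name is no longer in main."""
    removed = 0
    for table in child_tables:
        for name in [n for n in table if n not in main]:
            del table[name]
            removed += 1
    return removed

now = 1_213_970_000  # hypothetical current timestamp
main = {"sam": ["1.2.3.4"], "old_user": ["9.9.9.9"]}
last_seen = {"sam": now - 2 * DAY, "old_user": now - 200 * DAY}
bio = {"sam": "bio text", "old_user": "bio text", "ghost": "bio text"}

print(purge_stale(main, last_seen, [bio], now))  # ['old_user']
print(purge_orphans(main, [bio]))                # 1
print(sorted(bio))                               # ['sam']
```

Running the reverse purge after the stale purge keeps the child tables from accumulating entries for names the main table no longer holds.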

Re: Speeding up searching of hash tables AWEstun #201126 20/06/08 02:59 PM
Joined: Oct 2003 Posts: 3,918 Montreal, QC, Canada A argv0 Hoopy frood
As I said, you should track both. If you do lookups on names and IPs, you should have both values indexed as keys.

Re: Speeding up searching of hash tables argv0 #201127 20/06/08 03:53 PM
Joined: May 2008 Posts: 329 A AWEstun OP Fjord artisan
The only time matching is done is when a user joins the channel, plus the occasional manual lookup. So recording matches wouldn't benefit me much. Also, the recorded matches would have to be checked and updated each time the person joins a channel, so you would be running a match anyway, just like I am to begin with. I just purged around 5,000 items in the main table, so that should speed things up by a fraction of a second.
