AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
What can I do to speed up searching hash tables? I have one table with 23,000 items. It's decently fast, but when an item has 50 data values and each of those values is cross-referenced against other items with the same value, it noticeably pauses while it searches for matches. The table's size is set to 2500 (total items / 10). Would it help if I increased the size to 5000? What else can I do?


I registered; you should too.
Thrull, Vogon poet. Joined: Aug 2006. Posts: 183
If it has 23k items and each one is being cross-referenced with the other 22,999, that's about 529 million operations going on. You'd really need to shrink the item count down to something smaller if you can.
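
As a quick check of that figure, here is the arithmetic written out in Python (used only as a calculator here; the variable names are made up for the illustration):

```python
# Rough count of pairwise comparisons for a full cross-reference.
items = 23_000                       # item count from the first post
comparisons = items * (items - 1)    # every item checked against every other item
print(f"{comparisons:,}")            # 528,977,000, i.e. about 529 million
```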

How long does it take and how often are you going to use it?

Last edited by Thrull; 17/06/08 08:11 AM.

Yar
AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
Depending on how busy the rooms are, the table gets accessed about 2-3 times every 5 seconds. The average item has 5 data values, with the smallest having 1 data value and the largest having 81. The table tracks user names and all the IPs seen with that user name (item). I had thought about breaking the table up into individual rooms, but that would mean that some of the same people would be in multiple tables. Cross-referencing is done to see who else uses the same IPs that the item has on file. On average it takes about 1-2 seconds, with larger searches taking 3-4 seconds. Though I guess if over 500 million operations are being done per search, I can't really complain.


Thrull, Vogon poet. Joined: Aug 2006. Posts: 183
The easiest solution is breaking it up into multiple tables. You're going through WAY too much useless data per search. If you were to make 4 tables at 1/4 the size, you're only looking at about 33 million operations per table. That should run more than ten times as fast. (16 times as fast in theory, but the other tables will generate some overhead.) That should keep any lookup under 0.5 seconds.
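
The numbers work out like this (a back-of-the-envelope Python calculation that assumes the cost grows with the square of the item count, as in the estimate above; variable names are illustrative):

```python
total_items = 23_000
tables = 4

per_table = total_items // tables      # 5,750 items in each smaller table
ops_per_table = per_table ** 2         # 33,062,500, about 33 million
full_ops = total_items ** 2            # 529,000,000 for the single big table
speedup = full_ops / ops_per_table     # 16.0 in theory, before any overhead

print(per_table, ops_per_table, speedup)
```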

Your best bet is optimizing everything as best you can. Making it so it looks "per channel" as opposed to "global" would vastly improve the script. Yes, this will generate some extra baggage, but it will go much quicker.

Last edited by Thrull; 17/06/08 09:14 AM.

AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
Originally Posted By: Thrull
Your best bet is optimizing everything as best you can. Making it so it looks "per channel" as opposed to "global" would vastly improve the script. Yes, this will generate some extra baggage, but it will go much quicker.


Yup, smaller tables would make it slightly faster, but the same person with different IPs might not visit the same rooms. For example, Person X might visit local rooms while at home, but when they travel and get online from some 2-star motel with each door painted a different color, they will be on a different IP and might just join the local rooms where the fleabag roach-infested motel is. Thus, the out-of-town IPs the person uses wouldn't be captured for their name.


Thrull, Vogon poet. Joined: Aug 2006. Posts: 183
Well, you'll need to choose the lesser of two evils then. There isn't a better way to speed it up aside from getting a whole new processor in your computer. And even that may not do a whole lot, depending on how new your current rig is.

You *COULD* cross-reference the different tables, but I wouldn't recommend doing it whenever anyone joins. It could be done once an hour, or once every 6 hours, or once a day. This would keep them in sync for the most part without adding a ton of extra processing every time someone joins.
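
A rough sketch of what such a periodic sync might look like, written in Python rather than mIRC script. The per-channel layout, the channel names, and `merge_tables` are all assumptions made for the illustration; the idea is only that the merge runs on a timer instead of on every join:

```python
def merge_tables(per_channel):
    """Merge per-channel {name: set_of_ips} tables into one combined table."""
    merged = {}
    for table in per_channel.values():
        for name, ips in table.items():
            merged.setdefault(name, set()).update(ips)
    return merged

# Hypothetical per-channel tables: the same name can appear in several of them.
per_channel = {
    "#local": {"sam": {"1.2.3.4"}, "ryan": {"1.2.3.4"}},
    "#travel": {"sam": {"4.5.6.7"}},
}

# Run this once an hour (or once a day), not on every join.
merged = merge_tables(per_channel)
print(sorted(merged["sam"]))
```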


S, Hoopy frood. Joined: Dec 2002. Posts: 2,962
You'd be better off posting some actual examples of data and your relevant search code. It's quite possible things can be sped up by either streamlining the search pattern or restructuring the table. If all else fails you might be able to use an SQLite database instead (there are DLLs available to use with mIRC).
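
To give an idea of what the SQLite route could look like, here is a minimal sketch using Python's built-in `sqlite3` module, not one of the mIRC DLLs. The `seen` table, the `seen_by_ip` index, the sample rows, and `shared_names` are all made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (name, ip) pair; the primary key prevents duplicate pairs.
conn.execute(
    "CREATE TABLE seen (name TEXT NOT NULL, ip TEXT NOT NULL, PRIMARY KEY (name, ip))"
)
# A second index so lookups by ip do not have to scan the whole table.
conn.execute("CREATE INDEX seen_by_ip ON seen (ip)")

rows = [("sam", "1.2.3.4"), ("ryan", "1.2.3.4"), ("bryan", "7.7.7.7")]
conn.executemany("INSERT OR IGNORE INTO seen VALUES (?, ?)", rows)

def shared_names(conn, name):
    """Return every name (including `name` itself) that shares an ip with `name`."""
    query = """
        SELECT DISTINCT other.name
        FROM seen AS mine
        JOIN seen AS other ON other.ip = mine.ip
        WHERE mine.name = ?
        ORDER BY other.name
    """
    return [row[0] for row in conn.execute(query, (name,))]

print(shared_names(conn, "sam"))
```

The database does the cross-referencing through its indexes, so the script never loops over every item itself.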


Spelling mistakes, grammatical errors, and stupid comments are intentional.
AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
Examples of data:

sam 1.2.3.4 2.3.4.5 3.4.5.6 4.5.6.7
clark 4.4.4.4 4.5.6.7
jeffery 0.0.0.0
ryan 1.2.3.4 2.3.4.5
bryan 7.7.7.7

sam joins the room; a trace and match on all his IPs is done.

sam is 3.4.5.6
1.2.3.4 matches to sam, ryan
2.3.4.5 matches to sam, ryan
3.4.5.6 matches to sam
4.5.6.7 matches to sam, clark
3 total name matches for sam: sam, ryan, clark
3 total ip matches for sam: 3.4.5.6, 2.3.4.5, 4.5.6.7
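
For reference, a brute-force version of this kind of match can be sketched in Python. The actual mIRC script was not posted, so `scan_matches` and the dict layout are assumptions; the point is only that every IP on file triggers a pass over every item:

```python
# Sample data from this post: item name -> IPs seen with that name.
table = {
    "sam": ["1.2.3.4", "2.3.4.5", "3.4.5.6", "4.5.6.7"],
    "clark": ["4.4.4.4", "4.5.6.7"],
    "jeffery": ["0.0.0.0"],
    "ryan": ["1.2.3.4", "2.3.4.5"],
    "bryan": ["7.7.7.7"],
}

def scan_matches(table, name):
    """For each IP on file for `name`, scan every item for the same IP."""
    matches = {}
    for ip in table[name]:
        matches[ip] = [other for other, ips in table.items() if ip in ips]
    return matches

matches = scan_matches(table, "sam")
names = sorted({n for found in matches.values() for n in found})
print(matches)
print(names)
```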


A, Hoopy frood. Joined: Oct 2003. Posts: 3,918
Cache, cache, cache.


When you add items to the table, insert the cross references *then*. It will take (barely) longer to insert and make the db larger but it will save you time when doing searches.

The method for doing this efficiently would be to add an entry for *both* the nickname *and* the ip, as such...

(Note the ip and name can easily be stored in two *actual* tables, my example lists them side by side for convenience)

when "sam" enters with "1.2.3.4", add:

1.2.3.4 => sam
sam => 1.2.3.4

When "barry" enters with "1.2.3.4" followed by "2.3.4.5", the table will be:

1.2.3.4 => sam, barry
2.3.4.5 => barry
sam => 1.2.3.4
barry => 1.2.3.4, 2.3.4.5
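
In Python terms (just to illustrate the idea; this is not mIRC script, and `record`, `ip_to_names`, and `name_to_ips` are made-up names), keeping both directions up to date at insert time could look like:

```python
from collections import defaultdict

ip_to_names = defaultdict(list)   # ip   -> names seen on that ip
name_to_ips = defaultdict(list)   # name -> ips seen for that name

def record(name, ip):
    """Store the pair in both directions, skipping duplicates."""
    if name not in ip_to_names[ip]:
        ip_to_names[ip].append(name)
    if ip not in name_to_ips[name]:
        name_to_ips[name].append(ip)

record("sam", "1.2.3.4")
record("barry", "1.2.3.4")
record("barry", "2.3.4.5")

print(dict(ip_to_names))
print(dict(name_to_ips))
```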

You can then do lookups on the ip "1.2.3.4" to find sam and barry, and then do lookups on sam and barry to get their other ips. You can even add caching for the names: when "barry" enters, search his ips and gather the unique names for each ip (lookup 1.2.3.4, add "sam") *for all names involved*. The new items would be added as:

!barry_names => barry, sam
!sam_names => sam, barry

Making the table

1.2.3.4 => sam, barry
2.3.4.5 => barry
sam => 1.2.3.4
barry => 1.2.3.4, 2.3.4.5
!barry_names => barry, sam
!sam_names => sam, barry
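
A sketch of how those cached `!x_names` entries might be refreshed when someone enters, again in Python purely for illustration. `related_names`, `refresh_cache`, and `names_cache` are hypothetical names, and the tables are filled in by hand to match the example above:

```python
ip_to_names = {"1.2.3.4": ["sam", "barry"], "2.3.4.5": ["barry"]}
name_to_ips = {"sam": ["1.2.3.4"], "barry": ["1.2.3.4", "2.3.4.5"]}
names_cache = {}

def related_names(name):
    """Collect `name` plus every name that shares at least one ip with it."""
    found = [name]
    for ip in name_to_ips[name]:
        for other in ip_to_names[ip]:
            if other not in found:
                found.append(other)
    return found

def refresh_cache(name):
    """Recompute the cache entry for `name` and for every name involved."""
    for involved in related_names(name):
        names_cache["!" + involved + "_names"] = related_names(involved)

refresh_cache("barry")
print(names_cache)
```

Only the names that share an ip with the person who just entered are touched, so the rest of the table is never scanned.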

You would need to do more work to manage insertion and deletion, but if you do more search lookups than inserts/deletes, your table will be optimized for search. There's plenty more cache-based optimization you can do. Each one slows down the insert time, so you'd have to strike a balance between insert and search speeds.

Hopefully this makes sense to you.

Using your data the table would be:

1.2.3.4 => sam, ryan
2.3.4.5 => sam, ryan
3.4.5.6 => sam
4.5.6.7 => sam, clark
4.4.4.4 => clark
0.0.0.0 => jeffery
7.7.7.7 => bryan
sam => 1.2.3.4, 2.3.4.5, 3.4.5.6, 4.5.6.7
clark => 4.4.4.4, 4.5.6.7
jeffery => 0.0.0.0
ryan => 1.2.3.4, 2.3.4.5
bryan => 7.7.7.7
!sam_names => sam, ryan, clark
!clark_names => clark, sam
!jeffery_names => jeffery
!ryan_names => ryan, sam
!bryan_names => bryan
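
If the existing name -> ip table is already populated, the ip -> name side can be built from it in a single pass. A Python sketch of that one-time conversion (illustrative only; `names_sharing_ips` is a made-up helper):

```python
# Existing layout: item name -> IPs seen with that name (sample data from above).
name_to_ips = {
    "sam": ["1.2.3.4", "2.3.4.5", "3.4.5.6", "4.5.6.7"],
    "clark": ["4.4.4.4", "4.5.6.7"],
    "jeffery": ["0.0.0.0"],
    "ryan": ["1.2.3.4", "2.3.4.5"],
    "bryan": ["7.7.7.7"],
}

# One pass over the existing table builds the reverse index.
ip_to_names = {}
for name, ips in name_to_ips.items():
    for ip in ips:
        ip_to_names.setdefault(ip, []).append(name)

def names_sharing_ips(name):
    """Direct lookups only: no scan over the whole table."""
    found = [name]
    for ip in name_to_ips[name]:
        for other in ip_to_names[ip]:
            if other not in found:
                found.append(other)
    return found

print(names_sharing_ips("sam"))
print(names_sharing_ips("clark"))
```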


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
I see what you are saying about it being 3 tables: one for name -> ip, another for ip -> name, and lastly one for xname_total -> all x's names.

Right now the table is set up with the user name as the item name and all the IPs that the item/user has been seen on as its data. However, user names get purged/banned by admins for abuse, so that item/user name won't ever be seen again. So my question is: should I be tracking by the user name or by the ip as the item name?

I'm thinking I might need to insert a $ctime as data1 so that if the item/name hasn't been seen in, say, 6-12 months, I can purge it from the tables. Tables include: name -> ip, name -> BIO, name -> last seen, name -> last said, name -> on_chan_join_ctime (not saved to file; if there is no on-join ctime for a user, the user's join ctime = my join ctime).

Actually, I'm already using a $ctime with the seen table. So for each name in the main table, I can check the $ctime for the same item in the seen table. If the duration is over 3-4 months, I can purge the item from all tables. Likewise, I can do a reverse purge: if there is item data in the other tables and that item isn't in the main table, it can be purged from the child tables.
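
Roughly what I have in mind, sketched in Python instead of mIRC script. `purge_stale` and the table names are made up for the sketch, and it assumes a name with no entry in the seen table counts as stale:

```python
import time

SECONDS_PER_DAY = 86_400

def purge_stale(main, seen, children, max_age_days, now=None):
    """Drop names not seen within max_age_days, then reverse-purge the child tables."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    # Assumption: a name with no last-seen timestamp is treated as stale.
    stale = [name for name in main if seen.get(name, 0) < cutoff]
    for name in stale:
        main.pop(name, None)
        seen.pop(name, None)
        for child in children:
            child.pop(name, None)
    # Reverse purge: child entries whose name is no longer in the main table.
    for child in children:
        for name in [n for n in child if n not in main]:
            del child[name]
    return stale

now = 1_000_000_000  # fixed timestamp so the example is repeatable
main = {"sam": ["1.2.3.4"], "bryan": ["7.7.7.7"]}
seen = {"sam": now - 10 * SECONDS_PER_DAY, "bryan": now - 200 * SECONDS_PER_DAY}
bio = {"sam": "bio text", "bryan": "bio text", "ghost": "orphaned entry"}

stale = purge_stale(main, seen, [bio], max_age_days=120, now=now)
print(stale, sorted(main), sorted(bio))
```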


A, Hoopy frood. Joined: Oct 2003. Posts: 3,918
As I said, you should track both. If you do lookups on names *and* ip's, you should have both values indexed as keys.


AWEstun (OP), Fjord artisan. Joined: May 2008. Posts: 329
The only time matching is done is when a user joins the channel, plus the occasional manual lookup. So recording matches wouldn't benefit me much. Also, the recorded matches would have to be checked and updated each time the person joins a channel. Thus, you are running a match just like I am to begin with.

I just purged around 5,000 items in the main table, so that should speed things up by a fraction of a second.

