Large hash tables for a seen database #114753 17/03/05 02:04 PM
Joined: Dec 2002
Posts: 14
namnori Offline OP
Pikka bird
I have a large hash table I use as a !seen database (records information useful for stalkers).

In my first-generation version, I actually tried to trim it by running a timer in the background to clean it up.

Then about 4 years ago, I gave up and just let it grow to whatever size.

It's been hovering around half a million entries and 50 megs now, and I'm wondering what I should do with it. Saving it takes 6 seconds that I'd rather not waste, and it has become slow to use (but REALLY informative with that much data).

How do you maintain such a huge hash table?

One option I've been contemplating is to split it up: put the more important data, like when they were last seen, in one table, and put the descriptions (kick messages, quit messages) in another. But that really wouldn't solve my problem.

Ideally there'd be a large data structure of a defined size with a least-recently-used tag for each entry.
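
To make the fixed-size, least-recently-used idea concrete, here is a minimal sketch. It is written in Python rather than mIRC script so that it can be run as-is; the class name `BoundedSeenTable` and the sample entries are made up for illustration, and mIRC hash tables do not offer this eviction behaviour out of the box as far as I know.

```python
from collections import OrderedDict


class BoundedSeenTable:
    """Fixed-capacity table that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def put(self, key, value):
        # Re-inserting an existing key counts as a use, so move it to the end.
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Evict from the front (least recently used) once over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        # A successful lookup also counts as a use.
        self._entries.move_to_end(key)
        return self._entries[key]

    def __len__(self):
        return len(self._entries)


table = BoundedSeenTable(capacity=2)
table.put("alice!a@host%net", "Quit: bye")
table.put("bob!b@host%net", "Kicked: flood")
table.get("alice!a@host%net")             # alice becomes the most recently used
table.put("carol!c@host%net", "Joined")   # evicts bob, the least recently used
```

The table never grows past its capacity, so save time stays bounded, at the cost of forgetting whoever has gone longest without being seen or looked up.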

Re: Large hash tables for a seen database #114754 17/03/05 08:17 PM
Joined: Aug 2004
Posts: 237
LethPhaos Offline
Fjord artisan
Maybe you could use a MySQL DLL or something like that and use a real database?
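
As a rough idea of what the "real database" route buys you, here is a small sketch. It uses SQLite through Python's built-in `sqlite3` module purely as a stand-in for MySQL, and the table name, column names and sample rows are all assumptions made for the example, not anything from an actual mIRC DLL.

```python
import sqlite3

# In-memory database for the sketch; a real bot would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seen ("
    " mask TEXT PRIMARY KEY,"       # nick!ident@address%network
    " last_seen INTEGER NOT NULL,"  # Unix timestamp
    " detail TEXT)"                 # quit or kick message
)
# An index on the timestamp lets the database find old rows without a full scan.
conn.execute("CREATE INDEX seen_by_time ON seen (last_seen)")


def record(mask, when, detail):
    # INSERT OR REPLACE keeps one row per mask, overwriting the older sighting.
    conn.execute(
        "INSERT OR REPLACE INTO seen (mask, last_seen, detail) VALUES (?, ?, ?)",
        (mask, when, detail),
    )


def find(pattern):
    # LIKE with % wildcards stands in for a wildcard search over the masks.
    rows = conn.execute(
        "SELECT mask, last_seen, detail FROM seen"
        " WHERE mask LIKE ? ORDER BY last_seen DESC",
        (pattern,),
    )
    return rows.fetchall()


record("alice!a@example.org%net", 1111068240, "Quit: bye")
record("alice!a@example.org%net", 1111090000, "Kicked: flood")
record("bob!b@example.org%net", 1111070000, "Quit: ping timeout")
matches = find("alice!%")
```

Each update touches a single row on disk, so there is no multi-second "save the whole table" step.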

Re: Large hash tables for a seen database #114755 17/03/05 08:19 PM
Joined: Sep 2004
Posts: 40
us3rX Offline
Ameglian cow
You could purge all entries older than xx days/months.
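
A minimal sketch of such a purge, in Python rather than mIRC script so it can be run directly. It assumes each entry stores its last-seen timestamp alongside the description; that layout, the function name `purge_older_than` and the sample data are made up for the example.

```python
SECONDS_PER_DAY = 86400


def purge_older_than(table, max_age_days, now):
    """Delete entries whose last-seen timestamp is older than the cutoff.

    `table` maps a mask to a (last_seen, detail) tuple; that layout is an
    assumption made for this sketch.
    """
    cutoff = now - max_age_days * SECONDS_PER_DAY
    # Collect the keys first: deleting while iterating over a dict raises an error.
    stale = [mask for mask, (last_seen, _detail) in table.items() if last_seen < cutoff]
    for mask in stale:
        del table[mask]
    return len(stale)


now = 1_111_068_240
seen = {
    "old!o@host%net": (now - 400 * SECONDS_PER_DAY, "Quit: gone"),
    "new!n@host%net": (now - 2 * SECONDS_PER_DAY, "Quit: brb"),
}
removed = purge_older_than(seen, max_age_days=90, now=now)
```

Note that this is a full pass over the table, so it is probably best run once at startup or shutdown rather than on a timer.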

us3rX

Re: Large hash tables for a seen database #114756 17/03/05 10:22 PM
Joined: Sep 2003
Posts: 4,230
DaveC Offline
Hoopy frood
From what I can see, you don't seem overly concerned about having a 50 meg hash table in memory, so based on this...

I would break it down into two data structures: a complete table (hash1) and a table of users seen in the last X days (hash2).

When checking, scan the new listings first (hash2), then the complete list (hash1) [unless accessing the complete table is slow].

When updating someone's information, update hash1 in memory without saving it to file [unless just adding to or altering it is slow as well], and update hash2 with their new info and save it to file.
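
The lookup and update rules above can be sketched like this. It is Python rather than mIRC script, so treat it as an illustration of the idea only; the class name `TwoTierSeen`, the `update_complete` flag and the sample entries are assumptions made for the example.

```python
class TwoTierSeen:
    """A complete table plus a small table of recently updated entries."""

    def __init__(self, complete=None, update_complete=True):
        self.complete = dict(complete or {})  # plays the role of hash1
        self.recent = {}                      # plays the role of hash2
        self.update_complete = update_complete

    def update(self, mask, info):
        # Always write to the small table; it is the one saved to disk often.
        self.recent[mask] = info
        # Optionally mirror the write into the big in-memory table.
        if self.update_complete:
            self.complete[mask] = info

    def lookup(self, mask):
        # Check the recent entries first, then fall back to the complete table.
        if mask in self.recent:
            return self.recent[mask]
        return self.complete.get(mask)


db = TwoTierSeen(complete={"alice!a@host%net": "Quit: bye"}, update_complete=False)
db.update("alice!a@host%net", "Kicked: flood")
db.update("bob!b@host%net", "Quit: ping timeout")
```

With `update_complete=False` the big table is left stale during the session, which is why the lookup has to consult the recent table first.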

Each time you start the program, it runs through a reconciliation process where it applies the entries of hash2 to hash1 and then saves hash1. This might take longer than 6 seconds, but it happens only once. You may also wish to trigger a save of hash1 periodically, or after hash2 has gone unchanged for a set amount of time.

You can also make this change with little to no alteration of your current code by creating a set of layer aliases that handle both databases. For example:
/hadd originalhashtable $nick blah blah blah blah blah blah

alias -l _hadd {
  ; Write to the complete table (remove this line if just doing this is too slow)
  hadd $reptok($1-,originalhashtable,hash1,1,32)
  ; Write to the recent table
  hadd $reptok($1-,originalhashtable,hash2,1,32)
}

The other aliases depend more on whether you are altering hash1 as well as hash2.

Also, the reconciliation process could be sped up by simply loading the hash1 table file, then loading the hash2 table file into hash1, then saving hash1 back to file.
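
A rough sketch of that load, overlay and save sequence, in Python rather than mIRC script. JSON files stand in for the saved hash table files, and the file names, helper functions and sample entries are all made up for the example.

```python
import json
import os
import tempfile


def load_table(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_table(table, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(table, handle)


def reconcile(complete_path, recent_path):
    """Load the complete file, overlay the recent file, and save the result."""
    merged = load_table(complete_path)
    # Entries from the recent file win, because they are newer.
    merged.update(load_table(recent_path))
    save_table(merged, complete_path)
    return merged


workdir = tempfile.mkdtemp()
complete_file = os.path.join(workdir, "hash1.json")
recent_file = os.path.join(workdir, "hash2.json")
save_table({"alice!a@host%net": "Quit: bye", "bob!b@host%net": "Quit: zzz"}, complete_file)
save_table({"alice!a@host%net": "Kicked: flood"}, recent_file)
merged = reconcile(complete_file, recent_file)
```

Loading the second file on top of the first does the merge in bulk, with no per-entry scripted loop.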


[edit]
Was I right in reading that you felt just accessing the large hash table was slow? Or was the only real problem the saving?

Last edited by DaveC; 17/03/05 10:23 PM.
Re: Large hash tables for a seen database #114757 17/03/05 11:25 PM
Joined: Dec 2002
Posts: 14
namnori Offline OP
Pikka bird
Quote:

[edit]
Was I right in reading that you felt just accessing the large hash table was slow? Or was the only real problem the saving?


Both accessing (lots of $hfind().data) and saving are slow. I'd like to optimize the parts of my script that take a while, because otherwise my script has to play catch-up (a lot can happen in 6 seconds). So anything that has to periodically scan the table is unacceptable. But I'll look into doing that either at the beginning or at the end.

I'll also look into splitting up the database into a main table (item = nick!ident@address%network), and a date lookup table (item = ctime, data = nick!ident@address%network).
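
Roughly what I have in mind, sketched in Python rather than mIRC script so it can be run directly. The class name `IndexedSeen` and the sample data are made up, and storing a set of masks per timestamp is an assumption I added because two people can be seen in the same second.

```python
class IndexedSeen:
    """Main table keyed by mask, plus a lookup table keyed by timestamp."""

    def __init__(self):
        self.main = {}     # mask -> (ctime, detail)
        self.by_time = {}  # ctime -> set of masks seen at that second

    def record(self, mask, ctime, detail):
        previous = self.main.get(mask)
        if previous is not None:
            # Drop the stale index entry so each mask is indexed only once.
            old_time = previous[0]
            self.by_time[old_time].discard(mask)
            if not self.by_time[old_time]:
                del self.by_time[old_time]
        self.main[mask] = (ctime, detail)
        self.by_time.setdefault(ctime, set()).add(mask)

    def seen_at(self, ctime):
        # Direct lookup by timestamp, with no scan over the main table.
        return sorted(self.by_time.get(ctime, ()))


db = IndexedSeen()
db.record("alice!a@host%net", 100, "Quit: bye")
db.record("bob!b@host%net", 100, "Quit: zzz")
db.record("alice!a@host%net", 200, "Kicked: flood")
```

Both tables have to be updated together on every sighting, otherwise the date table ends up pointing at entries that have since moved.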

@LethPhaos - I haven't seriously scripted anything in mIRC in a couple of years, and I didn't know there was a mysql.dll. I'll definitely have to look into that.