
#131312 28/09/05 12:47 PM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
At what point does it become laggy and/or inefficient to use a hash table? For example, using $hfind on a table with 40,000 entries and 4000 slots (~10%, as suggested by the mIRC help file), or even bigger tables: would single tables of this size be unwise to use? Would they cause noticeable speed/lag problems in the script?

What is the relevance of the slots value when creating a hash table? What exactly is it used for, and why does it affect how efficiently/quickly you can access/search a hash table?

What sort of memory requirements do hash tables have? Obviously the data is stored in some manner within mIRC for quick and easy access, but what sort of ratio is there between the size of the table and the memory used to store it, as well as the processing time consumed when accessing/searching it?

Using a while loop to go through the entries in a hash table is obviously an inefficient use of them, and with such large tables is very likely to cause script 'pausing' issues. What sort of method would be better when it is necessary (or just desired) to, for example, list the information in the table inside mIRC (a combination of /hsave then /fopen, maybe?)


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131313 28/09/05 02:52 PM
M
Hoopy frood
Joined: Apr 2004
Posts: 759
All I know for certain is that

var %a = 1
while ($hget(HashTable,%a).item) {
  var %item = $v1
  if (*wildcard* iswm %item) {
    ; ...
  }
  inc %a
}

is a lot better than looping with $hfind. I'd say the slots are for allocating memory; having too much or too little for the data you're going to be using doesn't help.


$maybe
#131314 28/09/05 04:08 PM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
The mention of $hfind and the while loop question were completely unrelated. I would probably have to disagree with you: I think $hfind is a better method of searching a hash table than using a while loop with iswm as your example shows.
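For context, a single $hfind call can answer quite a few queries without any loop at all; with the w switch, N = 0 returns the number of matching items and N = 1 the first match (table name and wildcard here are just placeholders):

```
; count the items whose names match the wildcard
//echo -a matches: $hfind(HashTable,*wildcard*,0,w)
; return the first matching item name
//echo -a first match: $hfind(HashTable,*wildcard*,1,w)
```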


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131315 28/09/05 04:31 PM
M
Hoopy frood
Joined: Apr 2004
Posts: 759
You were talking about inefficient use of hash tables :/, and I thought I'd point it out.
Quote:

var %a = 1
while ($hfind(HashTable,*wildcard*,%a,w)) {
  ; Do stuff here
  inc %a
}

var %a = 1
while ($hget(HashTable,%a).item) {
  var %item = $v1
  if (*wildcard* iswm %item) {
    ; Do stuff here
  }
  inc %a
}
Although the first loop looks much neater and exploits the $hfind identifier to find all the matching items for you, it can still take considerably longer to resolve.

That happens because $hfind tries each time to find the next matching item, but it starts each time from the beginning of the hash table. So if you have a lot of items that match the wildcard, it will go over the entire hash table over and over.

The second loop, however, goes over the hash table only once and finds all the matches.
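For what it's worth, this sort of claim is easy to measure yourself with $ticks; a rough sketch (the table name and wildcard are placeholders, and the table must already be populated):

```
alias timeloops {
  ; time the repeated-$hfind approach
  var %t = $ticks, %a = 1
  while ($hfind(HashTable,*wildcard*,%a,w)) { inc %a }
  echo -a hfind loop took $calc($ticks - %t) ms for $calc(%a - 1) matches
  ; time the single-pass $hget/iswm approach
  var %t = $ticks, %a = 1, %m = 0
  while ($hget(HashTable,%a).item) {
    if (*wildcard* iswm $v1) { inc %m }
    inc %a
  }
  echo -a hget loop took $calc($ticks - %t) ms for %m matches
}
```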



I shall shut up now.


$maybe
#131316 28/09/05 05:14 PM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
Ah, I guess I misunderstood you, and yes, I agree that using $hfind repeatedly like that is inefficient. To clarify my while loop comment: I was referring more to using $hget in a while loop to echo the contents of a hash table, which is why I misunderstood.


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131317 29/09/05 01:56 PM
D
Hoopy frood
Joined: Sep 2003
Posts: 4,230
Quote:
At what point does it become laggy and/or inefficient to use a hash table? For example, using $hfind on a table with 40,000 entries and 4000 slots (~10%, as suggested by the mIRC help file), or even bigger tables: would single tables of this size be unwise to use? Would they cause noticeable speed/lag problems in the script?


Since hash tables reside in memory (and assuming you have that memory to use), I don't see that any size of hash table will be laggy compared to any other method of accessing data. You used the example of $hfind: if you're limiting the lagginess to repeated $hfinds producing a dataset from your whole hash table, then I guess other methods, such as a file and /filter, or an external SQL call, may become faster; but for a single $hfind, a hash table would beat anything else available.

Quote:
What is the relevance of the slots value when creating a hash table? What exactly is it used for, and why does it affect how efficiently/quickly you can access/search a hash table?


I have no exact knowledge of the way it's used. I do, however, know of other tabling systems, also called hash tables, where the performance of the name/data lookup is directly related to the initial size of the table; these also used 10% as a default starting point. Those tables used a form of tree structure where similarly named items were placed in a tree below each other. It's somewhat hard to explain, and of course might not be the method used here at all, so me explaining it wouldn't really help with anything, since I'm just theorising on what mIRC uses. Ask if you want and I'll try to explain it briefly.

Quote:
What sort of memory requirements do hash tables have? Obviously the data is stored in some manner within mIRC for quick and easy access, but what sort of ratio is there between the size of the table and the memory used to store it, as well as the processing time consumed when accessing/searching it?


Really hard to answer; I haven't looked at what mIRC consumes, since I have other things running in it all the time. You would have to test on a plain mIRC, not connected, doing nothing else.

Quote:
Using a while loop to go through the entries in a hash table is obviously an inefficient use of them, and with such large tables is very likely to cause script 'pausing' issues. What sort of method would be better when it is necessary (or just desired) to, for example, list the information in the table inside mIRC (a combination of /hsave then /fopen, maybe?)


An /hsave and a /filter pops into mind! Also the previously mentioned loop using iswm. However, for not pausing mIRC, I suggest something like this

* this is one of the few times I think using gotos is OK
Code:
alias aliasname {
  if ($1 == reentry) { goto $2 }
  ; initialising code: schedule the first re-entry, with the loop counter in $3
  .timer -m 1 1 aliasname reentry label1 1
  return
  :label1
  if ($hget(HashTable,$3).item) {
    var %item = $v1
    ; schedule the next iteration before doing the work
    .timer -m 1 1 aliasname reentry label1 $calc($3 + 1)
    if (*wildcard* iswm %item) {
      ;Do stuff here with %item etc
    }
  }
}


Simply put: you run the alias; it didn't have parameters starting with "reentry", so it flows past the goto and runs the initialising code. This simply sets a timer to run the alias again, with a re-entry point saved in $2 and anything else you might want to use in $3- (namely the loop counter).
The code is then called again on the timer; $1 is "reentry", so it gotos to $2, being "label1", which then checks whether there's a hash table item numbered $3 ("1") and, if so, goes into the if.
The first thing it does inside the if is set off a timer for the next loop around, changing "1" to "2", or n to n+1.
It then performs the iswm, does something with a matching item, and exits.
The next timer goes off, and loops 2, 3, 4, 5, 6, etc occur.

mIRC, however, does not pause or freeze up the way doing 40,000 reps in one go would. Channels keep getting text, DLs and ULs keep going, etc.

This method does have a downside: it's possible for the table to be altered by other scripts in the meantime, causing a missed item. It's better suited to an on-the-fly dataset such as matching addresses or something.
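As for the /hsave and /filter idea, a minimal sketch of listing matching entries without looping in script (table, file, and window names are placeholders; check in the help file that /hsave's default output layout suits what you want to filter on):

```
alias listtable {
  ; dump the table to a temporary file
  hsave HashTable tabledump.txt
  ; open a results window and filter matching lines into it
  window @results
  filter -fw tabledump.txt @results *wildcard*
  ; clean up the temporary file
  .remove tabledump.txt
}
```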

#131318 29/09/05 04:55 PM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
Thanks for the information. I will have to look into hash tables in general a bit more and do a little testing with large sets to find my respective answers.

Good idea with the timer and alias. I haven't started scripting anything yet; I like to plan the basic workings and order of coding before I actually start the code, in any language. I did consider using SQL tables due to the potentially large number of entries, and even though I always use MySQL for other things myself, I decided this requirement would act as nothing more than a deterrent for others who may wish to use the script.

Thanks again for the information. I have used hash tables in the past, but nothing of this size; I just wanted to do a little research into possible limitations and memory usage.

I am also considering using separate tables for different categories of data, such as mytable1 and mytable2, where 1 and 2 represent something in the data that can be checked with an if statement to determine which table to search with $hfind. Of course I would prefer not to take this route, but that depends entirely on the results of speed and CPU testing for very large tables. I assume the memory usage would in fact be less for a single table, since the same data is there but with less bookkeeping (details of each table etc), so that should not be a factor in this decision.


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131319 29/09/05 05:42 PM
K
Hoopy frood
Joined: Apr 2003
Posts: 701
A hash table is basically an array of small lists. The array is numbered 0 to size-1. For each key, the corresponding array item number is calculated using a hash function: a function that gives a seemingly random number to a key. How it is computed is not important; it's just important that it always returns the same number for the same key, and that it maps different keys to different numbers as much as possible.

As long as each key has its own space, lookups and writes are fast: computing the hash index from a key (some bit shifts and a modulo operation) is fast, and getting the n'th item from an array is fast too. Basically this always takes the same amount of time, irrespective of the number of items involved...

The problems begin when multiple keys map to the same array item. Then a little list is made inside that array item holding all the entries that map to that number. For a lookup, those entries are scanned sequentially: check the first item, then the second, etc, until the correct item is found. The time taken depends on the number of items in that list. Generally, going over 10 or 20 items doesn't take much time, so that's why you can put 10 times as many items in a hash table as its size.

However, if you make the table way too small (or have the hash function return the same number(s) for about all keys), then each array item has a list of 1000s of entries, and that may take a noticeable time to go through...

So, how does this hash function work? Usually it takes the word as a number, then does a bitshift or two, and then takes the remainder after division by a prime number (preferably close to the size of the hash table). You should realize that if the hash function

#131320 29/09/05 05:53 PM
Hoopy frood
Joined: Apr 2005
Posts: 1,009
Quote:
You should realize that if the hash function


Where's the end?


IceCapped
#131321 29/09/05 08:05 PM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
Thanks Kelder, that cleared a lot of things up. So technically there is no relationship between the number of entries in a hash table and the amount of time it takes to search it with something like $hfind? For example, it should take no longer to search a 4000-slot table with 40,000 items than to search a 40-slot table with 400 items?

And yeah, was there a second part to the last sentence in your post, "You should realize that if the hash function..."?


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131322 30/09/05 12:02 AM
D
Hoopy frood
Joined: Sep 2003
Posts: 4,230
Quote:
it should take no longer to search a 4000-slot table with 40,000 items than to search a 40-slot table with 400 items?


Kelder's explanation of a hash table was/is the same as my understanding of them.
However, it IS going to take longer to search a 40,000-item table at 4000 slots than a 400-item table at 40 slots, since there's 100 times the data.
If you meant "is it going to take the same time to search a 40,000-item table at 4000 slots as at 40 slots", I would guess it's likely the same; it will be a sequential search down the array.

One note to add, to my understanding (and this is really not an issue with Kelder's explanation, which I found very good): if two items both have the same hashed index value, rather than a mini list being created, there is a linking value in the already-used item which links to the next item, held later in the array. Even this is a bit hard to explain (for me).
Say you do /hmake BLAH 10, and then:
/hadd blah fred alpha (let's assume this hashes to 4)
/hadd blah bill beta (let's assume this hashes to 7)
/hadd blah dave charlie (let's assume this hashes to 4)
/hadd blah pete delta (let's assume this hashes to 4)
/hadd blah mike echo (let's assume this hashes to 9)

Code:
slot ref# link# itemname itemvalue
0    4      -    -        - 
1    -      -    -        -
2    -      -    -        -
3    -      -    -        -
4    7      11   fred     alpha
5    -      -    -        -
6    -      -    -        -
7    9      -    bill     beta
8    -      -    -        -
9    11     -    mike     echo
10   -      -    -        -
11   12     12   dave     charlie
12   -      -    pete     delta
13   -      -    -        -
14   -      -    -        -
15   -      -    -        -


The ref# may or may not exist; it's just a pointer to the next item in array order (sometimes there's a backward pointer to the previous one as well). This was done in file-stored hash tables more than memory ones, and was designed to reduce disk reads when moving through sequentially, since you wouldn't have to read all the blanks to find the next item.

More to the point I was trying to get at: in this example, three items (fred, dave, pete) hashed to slot 4. So when fred filled slot 4 and dave was then added, a link# to a free slot (beyond the table size of 10) was added, at which dave and its data were stored. When pete was added at 4, fred was there at 4, linking to dave at 11, and then another link was added to 12, at which pete was stored.

* A note here: I'm not saying this is how mIRC actually stores them; it's just my understanding of how a hash table works in its simplest form. I believe there is some more advanced calculation for how it actually picks the free slot positions (i.e. 11 & 12) than just assigning the next free one, but here I just used the next available.

I believe this is the reason for the apparent randomness of the order of entries in a hash table if you try to view them sequentially.

---
Just to add to the confusion, I have seen a hash table system that uses a bisecting tree structure to store items
Code:
fred-+-bill-+-bill
     |      |
     |      \-dave
     |
     \-fred-+-fred
            |
            \-pete-+-mike
                   |
                   \-pete

* I might have got some of that table wrong, and I wouldn't even like to go into how it's worked out, as it gave me a sore head years ago when I had to code one.

#131323 30/09/05 08:48 AM
Om3n OP
Fjord artisan
Joined: Jul 2003
Posts: 655
I guess I am confused again, then; that is not at all the impression I got from Kelder's explanation. My understanding has always been that the reason hash tables are so much faster is that they literally don't need to go through and check every item.

Going by what Kelder said, I assumed that it would somehow determine a hash key (or keys) in order to know which slots to look in. So you could conclude that the size of the table itself does not affect the speed, whereas in fact it is the ratio of slots to items that determines this (since it takes very little time to check through the ~10 items stored in each slot).

More specifically, comments such as the following, though it is quite possible that I misunderstood:

"Basically this always takes the same amount of time, irrespective of the number of items involved..."
"The time taken depends on the number of items in that list. Generally, going over 10 or 20 items doesn't take much time, so that's why you can put 10 times as many items in a hash table as its size." - by 'list' here I got the impression he meant an array list, aka array slot, aka slot.


"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
#131324 30/09/05 09:30 AM
K
Hoopy frood
Joined: Apr 2003
Posts: 701
Yes, there seems to be stuff missing, but what? I think I kinda moved it into the paragraph before... I guess I intended something like this:

So, how does this hash function work? Usually it takes the word as a number, then does a bitshift or two, and then takes the remainder after division by a prime number (preferably close to the size of the hash table). You should realize that if the hash function always returns the same number, or a very limited set of numbers, for all keys, performance goes down the drain and you end up with a really bulky implementation of an unordered list you have to search through one by one...
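To make the idea concrete, here is a toy hash function written in script form. This is not mIRC's real hash, just an illustration: it turns the key into a number by summing its character codes, then takes the remainder by the table size to pick a slot; keys landing on the same slot would share a bucket (a collision):

```
alias toyhash {
  ; $1 = key name, $2 = number of slots (both placeholders)
  var %sum = 0, %i = 1
  while ($mid($1,%i,1) != $null) {
    var %sum = $calc(%sum + $asc($v1))
    inc %i
  }
  ; the remainder picks the slot number, 0 to $2 - 1
  return $calc(%sum % $2)
}
```

e.g. //echo -a $toyhash(fred,10) always gives the same slot for the same key, which is the one property a hash function must have.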

#131325 30/09/05 09:47 AM
K
Hoopy frood
Joined: Apr 2003
Posts: 701
Yes, some implementations just put collisions (2 or more keys mapping to the same number) in the first free space after it. However, once you get to 'size' items in your hash table, it's full and there is no next space anymore...

Here are some more links:
Hash table explanation
Very elaborate hash explanation with animation! <= good read, and if you go to the index, there's plenty of other stuff to learn :)

