Leaving aside the issues of interpreted scripting having inherent overhead, I suspect that the biggest overhead with hash tables is iterating over the items in the slot (or if you are indexing by position or using $hfind iterating over the whole table).

Clearly the fastest lookup is a hash lookup where every occupied slot has only a single entry, avoiding iteration. In mathematics there is a well know "birthday problem" which says that it only takes 23 people for the probability that 2 or more of them share a birthday to be > 50%. This is equivalent to saying that for a hash table of 23 items, you would need 366 slots for the probability of every slot having at most one item to be > 50%. On the other hand, since the maximum 10,000 slots only takes 40KB of memory, on modern computers it is not that big a problem to use that value. The only downside is that in a non-hashed lookup you are going to be iterating over a lot of empty slots which has its own overhead.

Alternatively, it is easy to calculate the probability that, for a table of M items and N slots, every slot is used. ((N-1)/N)^M It turns out that you need 1.444 slots per item for there to be a 50% probability of all slots being used.

So my advice would be:

a. If you are only doing non-hashed lookups, use a slots value of 1.

b. If you are only hashed lookups, then use a slots value of 10,000.

c. If you are doing both hashed lookups and non-hashed lookups use slots = estimated-items * 1.444.


If we are looking for optimising non-hashed table lookups, then my suggestion would be to allow the scripter to choose an alternative table implementation using flags to create a hash table stored internally:

a. An array of pointers to the item/data sequenced on item-name using binary search to locate the item.
b. An array of pointers to the item/data sequenced on data using binary search.
c. Use a. as an alternative to the current implementation for the list for each slot.
d. As previously suggested, enhance the existing implementation by storing the first and last items of the linked list in each slot (using 8 bytes instead of 4) and add new items to the end of the list instead of the beginning - so that slots=1 stores the table exactly sequentially.

And aside from the overheads of adding or deleting an item from three lists (and the memory of course), you could even allow hashing, and the above two additional methods to be used in combination.

This would allow mIRC to have an arsenal of optimisation techniques to improve the speed of non-hashed lookups. For example a wild-match on item which doesn't have */? as the first character could be optimised to use 2 binary searches to find the beginning and end of the section of the sequenced array that holds the relevant items, and then search only within that section. Or a search with items as the wild-match could do something similar to look for table entries that start with */?/first character and search only those parts of the table.

As far as I can see, this would be fully functionally backwards and forwards compatible, and scripts written for the new functionality would only have a performance issue on older mIRC versions.

What do people think?