Originally Posted By: Khaled
Originally Posted By: maroon
It is now not permitting buckets=1 for those wishing to keep items in the original order, because N=1 is now flipped to use buckets=3.

If I do this, it means that all future optimizations I make will need to cater for this behaviour, even though it is not meant to be a feature of hash tables. This is why I keep stating that you should not expect hash tables to preserve order.

In fact, one of the optimizations I was going to make, which repositions items closer to the top of the list when they are retrieved to make future retrieval faster, would undermine order even in a one slot hash table. I had planned to do this for the next beta.

1. Changing how the linked list is ordered will break backwards compatibility for any scripts which rely on order.

2. Disallowing slots = 1 will break backwards compatibility for any scripts which rely on slots = 1 being ordered, e.g. for queues / stacks.

3. If you move an item within the linked list, you will also change its position when iterating, so e.g. the item at $hget(table,11).item will suddenly become $hget(table,1).item, and this will break backwards compatibility.

4. In real-world use of hash tables, is there any evidence that moving the most recently accessed item to the beginning of the linked list will improve performance?

As an alternative, how about caching, for each table, the last $hget / $hfind parameters and linked-list location, and intelligently deciding whether those can be used to speed things up?
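To make this concrete, here is a rough sketch in C of the kind of per-table cache I mean. All of these names and layouts are invented purely for illustration; mIRC's real internals are not public and will certainly look different.
Code:
/* Invented names and layout, purely to illustrate a per-table   */
/* cache of the most recent positional lookup.                   */
typedef struct hitem {
    char *name;            /* item name                          */
    char *data;            /* item data                          */
    struct hitem *next;    /* next item in the linked list       */
} hitem;

typedef struct htable {
    hitem *head;           /* first item in the linked list      */
    int    count;          /* number of items in the table       */

    /* cache of the most recent positional lookup                */
    int    cache_valid;    /* cleared on any add/change/delete   */
    int    cache_index;    /* 1-based position of the cached item*/
    hitem *cache_node;     /* linked-list node at that position  */
} htable;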

In the real world you commonly get both the item and the data, so $hget(table,n).item followed by $hget(table,n).data would be faster because you can go straight to the cached linked-list location. Similarly, it is common to iterate through the table with $hget(table,n).item or $hfind, so $hget(table,n).data followed by $hget(table,n + m).data would be faster because you can start iterating from the point you got to last time. The same would work for $hfind.
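Continuing the sketch above, the positional lookup could then resume from the cached location whenever the requested index is at or beyond it (again hypothetical code, and it treats the table as a single list for simplicity):
Code:
/* Return the item at 1-based position n, resuming from the cached */
/* position when that saves part of the walk. $hget(table,n).item  */
/* followed by $hget(table,n).data becomes a cache hit, and        */
/* $hget(table,n).data followed by $hget(table,n + m).data only    */
/* has to walk the extra m links.                                  */
hitem *hget_nth(htable *t, int n)
{
    hitem *p = t->head;
    int i = 1;

    if (t->cache_valid && t->cache_index <= n) {
        p = t->cache_node;          /* resume from the cached node  */
        i = t->cache_index;
    }
    while (p && i < n) {            /* walk the remaining links     */
        p = p->next;
        i++;
    }
    if (p) {                        /* remember where we ended up   */
        t->cache_valid = 1;
        t->cache_index = n;
        t->cache_node  = p;
    }
    return p;                       /* NULL if n is past the end    */
}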

You might also want to separately cache the parameters and results of any $hget(table,0).item or $hfind(...,0) so that loops which keep asking for these values would not need to iterate the whole list every time they are called.
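The counting case could be cached in the same spirit. Here is another made-up sketch, with the actual table scan passed in as a function so only the caching wrapper is shown:
Code:
#include <string.h>

/* Hypothetical cache for the last counting query (e.g. a count    */
/* form of $hfind): remember the parameters and the result, and    */
/* return the remembered count if the same query is repeated       */
/* before the table changes.                                        */
typedef struct {
    int  valid;            /* cleared whenever the table changes    */
    char mask[256];        /* parameters of the cached query        */
    int  result;           /* count that was returned last time     */
} count_cache;

/* 'count_matches' stands in for whatever routine actually walks    */
/* the table; the caching wrapper is the point here.                */
int cached_count(count_cache *c, const char *mask,
                 int (*count_matches)(const char *mask))
{
    if (c->valid && strcmp(c->mask, mask) == 0)
        return c->result;                    /* cache hit           */

    int n = count_matches(mask);             /* full scan           */
    c->valid = 1;
    strncpy(c->mask, mask, sizeof(c->mask) - 1);
    c->mask[sizeof(c->mask) - 1] = '\0';
    c->result = n;
    return n;
}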

These cached parameters / locations / results would be invalidated any time you add, change or delete records in the hash table.
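Invalidation is then just a matter of clearing those flags whenever the table changes; continuing the made-up names from the sketches above:
Code:
/* Sketch: called from the internal add/change/delete paths so     */
/* that stale positions and counts are never reused.               */
void hcache_invalidate(htable *t, count_cache *c)
{
    t->cache_valid = 0;    /* positional cache from the first sketch */
    c->valid = 0;          /* count cache from the second sketch     */
}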

These optimisations would work for all loops which iterate forwards i.e.
Code:
var %i 1 | while (%i <= $hget(table,0).item) { echo $hget(table,%i).item = $hget(table,%i).data | inc %i }
var %i 1 | while ($hget(table,%i).item) { echo $v1 = $hget(table,%i).data | inc %i }


Personally, I tend to code differently i.e.
Code:
var %i $hget(table,0).item | while (%i) { echo $hget(table,%i).item = $hget(table,%i).data | dec %i }

and these optimisations would not work for a descending loop like this. But they would be unlikely to slow it down noticeably, and for loops where the sequence is unimportant, recoding to take advantage of the performance improvements would be simple.

Indeed, the only use case where you couldn't recode is when you want to access items in the exact sequence they were added with slots = 1. Because mIRC currently adds to the beginning of the linked list, and presumably doesn't hold a pointer to the last item (and possibly each item only has a forward pointer), it would be more difficult to optimise decreasing indexes. If you wanted to optimise decreasing loops, you would need an /hmake or /hadd flag which tells mIRC to hold a pointer to the end of each linked list and a backward pointer in each item, so that it can iterate over the list in both directions. That would let mIRC decide, e.g. for $hget(table,n).item followed by $hget(table,n - m).item, whether to iterate backwards from the cached location or to iterate from scratch from the beginning of the linked list. (This choice could e.g. be based on whether n - m is closer to 0 or to n, but I suspect the scripter might want to influence this choice, so we would need syntax for that.)
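If backward iteration were ever wanted, the extra pointers would look roughly like this (again a made-up layout, not mIRC's), with the direction chosen by whichever starting point is nearer the requested index:
Code:
/* Hypothetical doubly-linked layout: an extra 'prev' pointer per  */
/* item and a 'tail' pointer per table would allow walking in      */
/* either direction.                                                */
typedef struct ditem {
    char *name;
    char *data;
    struct ditem *next;    /* forward pointer (needed anyway)       */
    struct ditem *prev;    /* extra backward pointer                */
} ditem;

typedef struct {
    ditem *head;
    ditem *tail;           /* extra pointer to the end of the list  */
    int count;
} dtable;

/* One possible way to pick a starting point for index n: walk     */
/* backward from a cached position at index c when n is closer to  */
/* c than to the start of the list, otherwise walk forward from    */
/* the head.                                                        */
ditem *dget_nth(dtable *t, ditem *cached, int c, int n)
{
    if (cached && n <= c && (c - n) < (n - 1)) {
        ditem *p = cached;
        for (int i = c; i > n && p; i--) p = p->prev;
        return p;
    }
    ditem *p = t->head;
    for (int i = 1; i < n && p; i++) p = p->next;
    return p;
}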

All that said, in the real world I suspect that hash tables used e.g. as a FIFO queue, where you want to iterate in the sequence items were added or fetch the oldest entry with $hget(table,$hget(table,0).item).data, would tend to be relatively small compared to other types of table where you want to iterate but don't care about sequence, so personally I wouldn't worry about optimising decreasing loops.