mIRC Home    About    Download    Register    News    Help

Print Thread
Page 3 of 3 1 2 3
Joined: Dec 2002
Posts: 5,486
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,486
Yes, I have already researched this. A prime can be used as a precaution in case the hash function is not reliable. Thanks for looking into it though.

Joined: Feb 2003
Posts: 2,812
Hoopy frood
Offline
Hoopy frood
Joined: Feb 2003
Posts: 2,812
Do you have a table of pre-calculated primes, or is it discovering them on demand? And is there a potential for slow-down when working with 600,000+ item lists?


Well. At least I won lunch.
Good philosophy, see good in bad, I like!
Joined: Dec 2002
Posts: 5,486
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,486
No, there is no point in using a table of pre-calculated primes in this situation as it has little effect on performance.

Joined: Dec 2002
Posts: 5,486
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,486
Quote:
Has anyone looked at Paul Hsieh's SuperfastHash http://www.azillionmonkeys.com/qed/hash.html or Bob Jenkins even later and faster lookup3 both of which claim to be both fast and very evenly distributed.

I tested these as well as a large number of other hash functions. I even downloaded and compiled SMHasher and performed tests. When testing the hash tables in mIRC using direct calls to the hash table routines, FNV1-A-mod was better in both distribution and speed than any of the hashes tested so far. It has its flaws, eg. zero sensitivity, but that is unlikely to be an issue in practice. It's also satisfyingly simple :-) That is why I have chosen it.

Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Protopia
Originally Posted By: Khaled
Originally Posted By: Protopia
1. To change how the linked_list is ordered will break backwards compatibility for any scripts which rely on order.

Which is why I have stated many times that you should not depend on the order of hash table items.

In the real world, people have relied on this order even if you stated that they should not rely on it in some places (though NOT in the help file). So in the real world this change will break scripts.

Khaled - Upon consideration I would actually go further than I did above and say that there is a pressing need for scripters to be able to store unlimited data in a sequenced list.

Think about the difference between Python dicts (access by name) vs. lists (access by position) or Javascript Objects (access by name) vs. Arrays (access by position). If these languages have gone to the trouble of having entirely different ways of storing both types of data, there is clearly a strong need in any language to provide both types.

If you take away the current implementation whereby -m1 is sequenced, then you will leave mSL without any means of storing large amounts of sequenced data.

Joined: Jul 2006
Posts: 4,180
W
Hoopy frood
Offline
Hoopy frood
W
Joined: Jul 2006
Posts: 4,180
Quote:
If you take away the current implementation whereby -m1 is sequenced, then you will leave mSL without any means of storing large amounts of sequenced data.
Although it doesn't support binvar, a (hidden) @window is an ordered list data structure, this is what scripters should be using if they want ordered list . If you need one or more item in the list to be a binvar, you could always try to mark those lines specifically (if your data is known and never start with & for example, you can name the line &item where item is an item name to a known hash table having the binvar (could always put the name of the hash in there to make it generic)) and get a working list. It's not the same performance but it can be done. It is true that mIRC lacks data structure, specifically array, but it's also true that hash table are unordered, it's only unfortunate that the implementation + it not being documented properly resulted in some people relying on this to get order.

Breaking the bucket=1 behavior is not nice, ideally a new set of identifers and commands like /hmake2 could be added to preserve the current hash tables behavior but that's not practical in itself. I'm for the change though, let's have proper unordered associative array. But yes, an array support is needed, then.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Wims
Although it doesn't support binvar, a (hidden) @window is an ordered list data structure, this is what scripters should be using if they want ordered list.

1. No such thing as a hidden window - hidden windows still appear in the window list.

2. How does hidden window performance compare with hash tables -m1?

Joined: Jul 2006
Posts: 4,180
W
Hoopy frood
Offline
Hoopy frood
W
Joined: Jul 2006
Posts: 4,180
Hidden window is just a state, it's faster to display to an hidden window, you don't need such window to be visible if it's used for a list data structure is what I meant.
I don't know about the perf.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Wims
It is true that mIRC lacks data structure, specifically array, but it's also true that hash table are unordered, it's only unfortunate that the implementation + it not being documented properly resulted in some people relying on this to get order.

I think if you look back over the years, a lot of architecturally and aesthetically poor decisions have been made in the interests of preserving backwards compatibility. Why should this be different?

Originally Posted By: Wims
Breaking the bucket=1 behavior is not nice,

Exactly. Backwards compatibility is a watchword of mIRC's history.

Originally Posted By: Wims
Ideally a new set of identifers and commands like /hmake2 could be added to preserve the current hash tables behavior but that's not practical in itself.

How about a new set of commands and identifiers for a new better implementation of hash tables, keeping the existing ones for backward compatibility - or as previously proposed use the same commands and identifiers but with additional hmake flags to defined alternative internal processing.

Originally Posted By: Wims
let's have proper unordered associative array.

We already have an unordered associative array - the hash table. Aside from performance issues (which as I have already said several times are down to use of a linked-list per bucket much more than a slow or uneven hashing algorithm), what we have is fully functional.

We also have a proper, reasonably well performing ordered array solution for small amounts of data - tokenised strings.

What we don't have are:

1. A well performing ordered array solution that is not size limited.

2. Ability to do as JSON does and nest associative or ordered tables inside one another.

My wish list for an ordered list equivalent to existing hash tables would be:

a. Commands to insert and sort.
b. Nice to be able to slice and to transfer slices to / from token strings in a single statement without coded loops.
c. Nice to have ability to keep it auto sorted by data values.
d. Would be nice to have mixed associative / ordered tables which could be positional or auto-sorted on item OR data.

In terms of implementation, might be worth referring to the ancient IBM VSAM "database" which (if memory serves me) worked as follows:

1. Index was a linked list of fixed size array blocks each of which that could hold a UP TO a certain number of pointers to data (or name / data) records. (Each block held links to next and previous blocks and a counter of how many records were included in the block.) Deleting a record meant moving the pointers down one, inserting a record meant moving them up one. If block was full, then it would be split into two blocks each half full. There was also a command to reorganise the index if you had a lot of almost empty blocks in order to make all the index blocks x% full.

2. Whether you inserted by position or allowed it to determine position based on a sort order was a user choice.

3. Aside from the need to move (on average 1.5k of) memory up or down by a few bytes every time you delete or insert, this would be relatively well performing.

For positional access, the correct block would be found by iterating over the linked list of blocks and summing the number of items in each block. Once you have the right block, the memory position in the block can be calculated. Since you can probably store c. 1000 pointers in a 4k block this would be c. 1000x faster than iterating over a linked list of single items (as per single bucket hash list). (Of course they don't have to be 4K blocks - you could use any size blocks (say 1k or 16k) depending on the performance characteristics you were looking for.)

For auto-sorted access, you find the right block by looking at the last entry in each block, and can use a binary search within the correct block to find the exact value - so also pretty well performing.

Wildcard searches might still need to iterate over the whole table, but leading non-wild characters could allow you to subset which parts of the table to iterate over.

4. This algorithm can be used not only at a table-wide level, but you can also use it at a bucket level for buckets >=3 to store records in either item or data sequence and so do a binary search rather than an iteration to find the correct item.

NOTE: Since HADD has to do an HGET in order to determine whether to replace an existing record or add a new one, the performance improvements would likely happen for most types of hash table command / identifier.

5. Subject to the overhead of storing and maintaining the indexes, you could actually combine hashed, item-sorted, data-sorted and positional access methods on a single table, using hmake flags to indicate which types of access to create internally.

The only new commands needed would be hinsert and hsort.

6. This would create extended functionality with significantly better performance and WITHOUT losing backwards compatibility.

Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Wims
Hidden window is just a state, it's faster to display to an hidden window, you don't need such window to be visible if it's used for a list data structure is what I meant.
I don't know about the perf.

Currently writing scripts which need unlimited sequenced data - I might try a hidden window technique rather than hash -m1.

Joined: Jul 2006
Posts: 4,180
W
Hoopy frood
Offline
Hoopy frood
W
Joined: Jul 2006
Posts: 4,180
Quote:
How about a new set of commands and identifiers for a new better implementation of hash tables, keeping the existing ones for backward compatibility
That's what I meant, it was said in the sense that a new set allows to preserve the old one.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Wims
Quote:
How about a new set of commands and identifiers for a new better implementation of hash tables, keeping the existing ones for backward compatibility
That's what I meant, it was said in the sense that a new set allows to preserve the old one.

Yes - but even better would be to NOT create a competing set of commands / identifiers which then results in having to maintain both sets for ever more, but instead to leave the existing set as they are and extend them by use of flags (which is what has generally been done in the past).

But of course, all this is dependent on Khaled being willing to develop any changes, and I think he has made his position quite clear on what he is and isn't prepared to do at the present time.

Last edited by Protopia; 04/11/18 06:30 PM.
Joined: Dec 2002
Posts: 5,486
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,486
Okay, I have extended the hash table code to allow custom options per table, so different rules can now be applied to different tables. For backwards compatibility, the next beta will allow single slot hash tables and will disable any optimizations that affect the order of items.

Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
Originally Posted By: Khaled
Okay, I have extended the hash table code to allow custom options per table, so different rules can now be applied to different tables.

I would be very interested to hear details of these changes if you are willing to spend the time to describe them.

Originally Posted By: Khaled
For backwards compatibility, the next beta will allow single slot hash tables and will disable any optimizations that affect the order of items.

I think that this is a good decision.

Last edited by Protopia; 07/11/18 09:25 AM. Reason: speeling
Joined: Aug 2003
Posts: 320
P
Pan-dimensional mouse
Offline
Pan-dimensional mouse
P
Joined: Aug 2003
Posts: 320
I am not suggesting another attempt to improve hash tables should be made now, but I just wanted to note for future consideration that Facebook have just open-sourced their F14 hashing algorithm C++ code which is supposed to minimise hash clashes, be extremely fast and have 14 access types.

Page 3 of 3 1 2 3

Link Copied to Clipboard