It is true that mIRC lacks data structures, specifically arrays, but it's also true that hash tables are unordered. It's just unfortunate that the implementation, plus the fact that it was never properly documented, led some people to rely on it to get ordering.
I think if you look back over the years, a lot of architecturally and aesthetically poor decisions have been made in the interests of preserving backwards compatibility. Why should this be different?
Breaking the bucket=1 behavior is not nice.
Exactly. Backwards compatibility is a watchword of mIRC's history.
Ideally a new set of identifiers and commands like /hmake2 could be added, preserving the current hash table behavior, but that's not really practical in itself.
How about a new set of commands and identifiers for a new, better implementation of hash tables, keeping the existing ones for backward compatibility - or, as previously proposed, use the same commands and identifiers but with additional hmake flags to define alternative internal processing?
Let's have a proper unordered associative array.
We already have an unordered associative array - the hash table. Aside from performance issues (which, as I have already said several times, are down to the use of a linked list per bucket far more than to a slow or uneven hashing algorithm), what we have is fully functional.
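To make the performance point concrete, here is a minimal sketch of the shape being described - a bucket array where each bucket is a singly linked list. mIRC's actual internals are not public, so this is only an illustration of why buckets=1 degrades every lookup to a full list walk no matter how good the hashing algorithm is (all names here are invented for the sketch):

/* Bucketed hash table where each bucket is a singly linked list. */
#include <stdio.h>
#include <string.h>

#define NBUCKETS 1                       /* the pathological buckets=1 case */

typedef struct node {
    const char *item, *data;
    struct node *next;                   /* next entry in the same bucket */
} node;

static node *buckets[NBUCKETS];

/* Toy hash - with one bucket it is irrelevant, which is the point. */
static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void ht_add(node *n) {            /* prepend to the bucket's list */
    unsigned b = hash(n->item);
    n->next = buckets[b];
    buckets[b] = n;
}

static const char *ht_get(const char *item) {
    /* O(n / NBUCKETS) on average; a full table scan when NBUCKETS == 1 */
    for (node *n = buckets[hash(item)]; n; n = n->next)
        if (strcmp(n->item, item) == 0)
            return n->data;
    return NULL;
}

int main(void) {
    node a = { "nick1", "op",    NULL };
    node b = { "nick2", "voice", NULL };
    ht_add(&a);
    ht_add(&b);
    printf("%s\n", ht_get("nick1"));     /* prints "op" after walking the list */
    return 0;
}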
We also have a proper, reasonably well performing ordered array solution for small amounts of data - tokenised strings.
What we don't have are:
1. A well performing ordered array solution that is not size limited.
2. Ability to do as JSON does and nest associative or ordered tables inside one another.
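Point 2 is really a question of what a stored value is allowed to be. A purely illustrative sketch (none of these types exist in mIRC today) of a value that can be either a plain string, as now, or another table, JSON-style:

/* A value is either a leaf string or a nested table. */
typedef enum { VAL_STRING, VAL_TABLE } val_kind;

struct table;                       /* associative or ordered container */

typedef struct value {
    val_kind kind;
    union {
        char *string;               /* leaf value, as hash tables hold now */
        struct table *table;        /* nested table, JSON-style            */
    } u;
} value;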
My wish list for an ordered list equivalent to existing hash tables would be:
a. Commands to insert and sort.
b. Nice to be able to slice, and to transfer slices to/from token strings, in a single statement without coded loops.
c. Nice to have the ability to keep it auto-sorted by data values.
d. Would be nice to have mixed associative / ordered tables which could be positional or auto-sorted on item OR data.
In terms of implementation, might be worth referring to the ancient IBM VSAM "database" which (if memory serves me) worked as follows:
1. The index was a linked list of fixed-size array blocks, each of which could hold up to a certain number of pointers to data (or name / data) records. (Each block held links to the next and previous blocks and a counter of how many records were in the block.) Deleting a record meant moving the pointers down one; inserting a record meant moving them up one. If a block was full, it was split into two blocks, each half full. There was also a command to reorganise the index if you had a lot of almost-empty blocks, in order to make all the index blocks x% full.
2. Whether you inserted by position or allowed it to determine position based on a sort order was a user choice.
3. Aside from the need to move (on average ~1.5k of) memory up or down by a few bytes every time you delete or insert, this would perform relatively well.
For positional access, the correct block would be found by iterating over the linked list of blocks and summing the number of items in each block. Once you have the right block, the memory position within the block can be calculated. Since you can probably store c. 1000 pointers in a 4K block, this would be c. 1000x faster than iterating over a linked list of single items (as in a single-bucket hash table). (Of course they don't have to be 4K blocks - you could use any block size, say 1K or 16K, depending on the performance characteristics you were looking for.)
For auto-sorted access, you find the right block by looking at the last entry in each block, and can use a binary search within the correct block to find the exact value - so also pretty well performing.
Wildcard searches might still need to iterate over the whole table, but leading non-wild characters could allow you to subset which parts of the table to iterate over.
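Putting points 1-3 together, here is a rough sketch of what such a blocked index might look like. Everything here - the names, the 1000-slot capacity, the assumption that blocks are never left empty - is my own illustration of the description above, not anything mIRC actually does:

/* VSAM-style index: a doubly linked list of fixed-size blocks, each holding
 * a count and up to CAP pointers to records kept in key order. */
#include <stddef.h>
#include <string.h>

#define CAP 1000                    /* ~1000 pointers per ~4 KB block */

typedef struct record { char *item; char *data; } record;

typedef struct block {
    struct block *prev, *next;
    int count;                      /* how many slots are in use      */
    record *slot[CAP];              /* sorted pointers to the records */
} block;

/* Positional access: find the Nth record (0-based) by walking the block
 * list and summing counts - roughly CAP times fewer steps than walking
 * individual nodes - then indexing directly into the right block. */
static record *index_nth(block *b, size_t n) {
    for (; b; b = b->next) {
        if (n < (size_t)b->count)
            return b->slot[n];      /* offset within the block */
        n -= b->count;
    }
    return NULL;                    /* position past end of table */
}

/* Keyed access: choose the block by looking at its last entry, then binary
 * search within it.  Assumes every block holds at least one record. */
static record *index_find(block *b, const char *item) {
    while (b && strcmp(b->slot[b->count - 1]->item, item) < 0)
        b = b->next;
    if (!b) return NULL;
    int lo = 0, hi = b->count - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int c = strcmp(b->slot[mid]->item, item);
        if (c == 0) return b->slot[mid];
        if (c < 0) lo = mid + 1; else hi = mid - 1;
    }
    return NULL;
}

Insertion would use the same search to find the slot, shift the tail of the block up by one, and split the block into two half-full blocks when it is already at capacity, exactly as described in point 1.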
4. This algorithm could be used not only at a table-wide level but also at a bucket level (for buckets >= 3), storing each bucket's records in either item or data sequence so that a binary search, rather than an iteration, finds the correct item.
NOTE: Since HADD has to do an HGET in order to determine whether to replace an existing record or add a new one, the performance improvement would likely apply to most types of hash table command / identifier.
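To illustrate point 4 and the NOTE: if each bucket held a sorted array of items rather than an unsorted list, a single lower-bound style binary search (sketched below with invented names) would tell HGET whether the item exists and would simultaneously give HADD the position at which to replace or insert:

#include <string.h>

/* Returns the index of `item` in the sorted array `items[0..count-1]`,
 * or the position where it should be inserted, with *found telling which. */
static int bucket_lower_bound(char **items, int count, const char *item,
                              int *found) {
    int lo = 0, hi = count;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (strcmp(items[mid], item) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = (lo < count && strcmp(items[lo], item) == 0);
    return lo;
}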
5. Subject to the overhead of storing and maintaining the indexes, you could actually combine hashed, item-sorted, data-sorted and positional access methods on a single table, using hmake flags to indicate which types of access to create internally.
The only new commands needed would be hinsert and hsort.
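As a sketch of point 5 (the flag names below are invented; nothing like them exists today), a table could simply record which optional indexes it maintains over the same underlying set of records:

/* Hypothetical hmake flags selecting which internal indexes a table keeps. */
typedef enum {
    IDX_HASHED      = 1 << 0,   /* classic bucketed hash lookup          */
    IDX_ITEM_SORTED = 1 << 1,   /* blocked index ordered by item name    */
    IDX_DATA_SORTED = 1 << 2,   /* blocked index ordered by data value   */
    IDX_POSITIONAL  = 1 << 3    /* blocked index kept in insertion order */
} index_flags;

typedef struct {
    unsigned flags;             /* which of the above this table maintains */
    void *hash_index;           /* present if IDX_HASHED                   */
    void *item_index;           /* present if IDX_ITEM_SORTED              */
    void *data_index;           /* present if IDX_DATA_SORTED              */
    void *pos_index;            /* present if IDX_POSITIONAL               */
} table;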
6. This would create extended functionality with significantly better performance and WITHOUT losing backwards compatibility.