#162421 17/10/06 07:22 PM
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hi all,

I need some help.
I have a hash table with data stored in it, around 3000 items.
It is stored like:
item token1 token2 token3 token4 token5 token6 token7 token8 token9

Now, my problem is this.
I want to take a piece of text, and find that piece of text back in the hash table.
I'm using something like this at the moment:
var %match = $hget(Table,$hfind(Table,* $+ %matchtext $+ *,1,w).data)

Now, I DON'T want to search all tokens in the table, but ONLY token5.
And I also don't want wildmatches, I want only EXACT matches.

Example:
2231 Green Blue Silver Orange Purple Black Yellow Red
2232 Blue Silver Orange Purple Black Yellow Red Green

If I search for Orange, it should return nothing.
If I search for Black, it should return the data of line 2232
If I search for Purple, it should return the data of line 2231
If I search for Red, it should return nothing.

Can this be done with the above mentioned line?
var %match = $hget(Table,$hfind(Table,* $+ %matchtext $+ *,1,w).data)

Thx a lot in advance!

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Not using the code you've got there. Firstly, is the number of tokens always the same? If it is, you can use this (assuming 7 tokens every time):
Code:
$hfind(Table,* * * * %matchtext * *,1,w).data


If not, you'll have to use regular expressions to search like so:
Code:
$hfind(Table, $+(/^(?:\S+\s){4}, $re_escape(%matchtext), (?:\s|$)/), 1, r).data

You'll also need the following alias to use that:
Code:
alias re_escape return $regsubex($1, /([^\w\s])/g, \\t)


I have no idea what the performance of a regular expression search is on a hash table with 3000 items, my guess is "not good". But you'll have to try it and see.


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, the first option looks the best imo.
Yes, the number of tokens is always the same, but they are separated by the tab character.

So, instead of using the * * * *
$hfind(Table,* * * * %matchtext * *,1,w).data

Should we change it?
Including the $chr(9) somehow?

Thx starbucks

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Ah. In that case:
Code:
$hfind(Table,$instok($replace($str(*.,7),.,$chr(9)),%matchtext,5,9),1,w).data


That should do it.

Last edited by starbucks_mafia; 17/10/06 09:48 PM.

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
That looks awesome!

Thanks a lot man. I hope it works :D

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, it didn't really work.
It freezes mIRC for like 60 seconds, and returns nothing.

There are actually 14 tokens.
And we indeed need to search only token 5 of each item.

Would that make a difference?

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
If there are 14 tokens you'd need to change the code to:
Code:
$hfind(Table,$instok($replace($str(*.,14),.,$chr(9)),%matchtext,5,9),1,w).data


But that's gonna make it take even longer to search. You could try the regular expression code I gave, there's a slight possibility it'll run faster than regular wildcards for this. Beyond that I'm afraid there's not much to be done about the speed beyond changing the way you're storing the data.


Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
For some reason, seeing it take 60 seconds to search 3000 items seems like something is wrong. A hash table should be able to do that in under 5 seconds and probably under 1. I'll have to test this at home.

Think about it... if it takes 60 seconds to search 3000 items, then that's only checking 50 items per second. I'm sure you could search a text file (reading each line) faster than that.


Invision Support
#Invision on irc.irchighway.net
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, the 60 seconds are possibly because it is part of a loop.
Basically it takes a piece of text, and searches the hash table.

In total it searches for like 300 items in that loop.
But still, with the original code, it takes like 5 seconds. The only problem is that it matches wildmatch stuff, on several tokens. So I get wrong results.

I want to only return exact matches located in token 5.
I tried the updated version from starbucks, but it returns nothing, and takes also like 2 minutes or something.

Perhaps I should copy only the relevant tokens (1,2,3,5) to a hidden window, and search for it from there...

Wouldn't know how, but ok.

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Tried the regex string as well, but doesn't return anything as far as I can see. What should it return? Only the matched part of the string? Or the entire data line?

I need the entire data line :(

Anyone with ideas?

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Is the table private? If not, and you can /hsave it to a file for me to load up, I can test it out some tonight. If you want to do that, join the channel listed below and send me the file and then leave me a message. (*Note that it should accept the file if it's txt. I'm not sure if I have ini blocked*). I'll be home a bit later and can check it out.

Using yours would prevent me needing to try and create a 3000 entry table to work with.


Invision Support
#Invision on irc.irchighway.net
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Oww yeah, that would be awesome.
I'm making a ZIP package, and will deliver it soon-ish.
I'll include a little description, so you understand exactly what the purpose is.

Thx for giving it a try in any case :)

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
I replied to your first message, as I think you were steered down a bad path.

Quote:

2231 Green Blue Silver Orange Purple Black Yellow Red
2232 Blue Silver Orange Purple Black Yellow Red Green

If I search for Orange, it should return nothing.
If I search for Black, it should return the data of line 2232
If I search for Purple, it should return the data of line 2231
If I search for Red, it should return nothing.

var %match = $hget(Table,$hfind(Table,& & & & %matchtext *,1,w).data)

& stands for exactly one word, delimited by spaces.

FYI: the reason using * * * * etc. was causing such delays is that, with that matching method, it runs backward and forward all over each data item trying to match this complex string of lots of *s. I wrote about this in some other thread once. Using & will allow it to know it needs 4 words, then your match text, then anything else after it.

PS: watch out if you're matching the last token; in that case don't put the " *" on the end.
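
To see the difference between the two wildcards, here is a small sketch built on the space-separated sample entry from the first post. The alias name amp.demo and the throwaway table AmpDemo are made up for the illustration. With &, the search text should only be accepted as the fifth word; with *, it should be accepted anywhere from the fifth word onwards, because a single * can swallow several words.

```mirc
; hypothetical demo alias, not part of the original script
alias amp.demo {
  ; throwaway table holding one sample entry (space-separated tokens)
  hadd -m AmpDemo 2231 Green Blue Silver Orange Purple Black Yellow Red
  ; & stands for exactly one word, so $1 has to be the fifth word
  echo -a ampersand pattern: $iif($hfind(AmpDemo,& & & & $1 *,1,w).data,match,no match)
  ; * may span several words, so $1 only needs at least four words in front of it
  echo -a star pattern: $iif($hfind(AmpDemo,* * * * $1 *,1,w).data,match,no match)
  hfree AmpDemo
}
```

Try /amp.demo Purple (fifth word) and /amp.demo Black (sixth word) and compare the two lines each call prints.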

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Thanks DaveC.

It's a little more complicated.
The tokens themselves contain spaces. The only separator is the tab character. So I guess the & character wouldn't work.

Also, there are like 14 tokens, but that should be no problem. I could edit your line for that.

Anyway, I mailed the file to Riamus, and he's taking a look at it now. So let's hope he can find a solution for it.

If we don't get it sorted, I'll post another message here.
Thx in any case!

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
If that's the case, your main problem is the speed of a search that uses multiple *'s to cover each token. I assume it's $chr(9) (tab) between them, not a space like you showed?

Solution one (keep original hash data): [I'm going to assume you're actually using TAB/$chr(9) to separate the data]

Search for items by simply using $+(*,%matchtext,*) and then compare each result using IF ($+(*,$chr(9),*,$chr(9),*,$chr(9),*,$chr(9),%matchtext,$chr(9),*,$chr(9),*,$chr(9),etc etc) iswm %var)
ex
Code:
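; first pass: loose wildcard search; second pass: exact check on token 5 (tab-separated)
var %n = 1, %c = 0
while ($hfind(TABLE,* $+ %matchtext $+ *,%n,w).data) {
  ; $v1 is the item name of the Nth loose match
  var %match = $hget(TABLE,$v1)
  if ($gettok(%match,5,9) == %matchtext) {
    inc %c
    echo -a MATCH $+($chr(35),%c) BEING %match
  }
  inc %n
}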

You will actually find this is a lot faster than letting a complex wildmatch using * loose on the whole hash table. You're essentially going "ok, take only the lines with XYZ, and then wildmatch them".


Solution two (alter hash data):
Pick a character that won't ever appear in the token data, replace all spaces with this character when storing, then swap it back as you pull the data out of the hash table.
ex
Code:
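; spaces inside tokens are stored as $chr(255) and swapped back on retrieval
alias spaces.to.255 { return $replace($1,$chr(32),$chr(255)) }
alias spaces.from.255 { return $replace($1,$chr(255),$chr(32)) }
; & matches one space-delimited word, so this assumes the stored tokens are separated by single spaces
var %match = $spaces.from.255($hget(TABLE,$hfind(TABLE,& & & & $spaces.to.255(%matchtext) *,1,w).data))
; the data also has to be added with: /hadd TABLE itemname $spaces.to.255(%normaldata)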

I prefer the first way (solution one), unless I was just starting out on the project and thus wouldn't need to worry about patching existing code.

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
You're assuming the subset of matches from the hashtable will be small enough to warrant the slowdown caused by manually looping over it instead of using $hfind()'s efficient built-in loop. Based on the examples OrionsBelt has given, each token is based on a relatively small set and will frequently occur in the table at some point, just not necessarily as the fifth token. If that's true for the entire table then making a vague match and a second pass will only make things slower.

Ultimately the only solution that's likely to provide acceptable results is a reworking of the data into something easier to index.
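
One way such a reworking could look, as a rough sketch only: keep the original table and build a second table keyed on token 5, so that a lookup becomes a plain $hget instead of a wildcard scan. The names Token5Index, build.token5.index and find.token5 are made up here, and the sketch assumes the data is tab-separated with the search field as the fifth token. Spaces in the key are swapped for $chr(255) because item names cannot contain spaces.

```mirc
; hypothetical helper: (re)builds a lookup table keyed on token 5 of every entry in Table
alias build.token5.index {
  if ($hget(Token5Index)) hfree Token5Index
  hmake Token5Index 300
  var %i = 1, %total = $hget(Table,0).item
  while (%i <= %total) {
    var %item = $hget(Table,%i).item
    ; token 5 of the tab-separated data, with spaces swapped for $chr(255)
    var %key = $replace($gettok($hget(Table,%i).data,5,9),$chr(32),$chr(255))
    ; several entries may share the same token 5, so keep a space-separated list of item names
    if (%key != $null) hadd Token5Index %key $addtok($hget(Token5Index,%key),%item,32)
    inc %i
  }
}
; hypothetical helper: returns the item name(s) whose token 5 is exactly the given text
alias find.token5 return $hget(Token5Index,$replace($1-,$chr(32),$chr(255)))
```

After running /build.token5.index once, $hget(Table,$gettok($find.token5(%matchtext),1,32)) would give the full data line of the first exact match. The index has to be rebuilt (or updated alongside every /hadd) whenever Table changes, and a very common token 5 could produce a list too long for one hash entry.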


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
You're not considering the complexity of the search it's having to make. I don't know exactly how efficient or 'intelligent' the matching code is for wildcards but it's entirely possible that it's performing thousands of searches for each item in order to make or rule out a match. The speed of accessing the hash table (or even a file) is negligible compared to the processing time required to match.


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Strange that it's not matching. The piece of code I gave you should return the item name of the first match. Perhaps you should post the relevant code you're using and maybe a few items of the actual hash table data.


Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Actually, I did test it out and I have it working in about 1.5 seconds.

After loading his hash table into mIRC, and calling it Orion, this is the format I used. It handles the data in a short time and works well.

//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ $str($chr(9) $+ *,9),1,w).data).data

There still is that 1.5s pause, but that's not too bad... especially compared to 60s.


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Is the 1.5 seconds for a match though? Remember it'll only search to the first match, so if it's matching near the 'start' of the hash table it'll be a lot faster than searching for something not in the table at all.
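
A quick way to check that would be to time a hit and a miss separately. The sketch below wraps the pattern posted above in a small alias; the alias name time.token5 is made up, and the table name orion and the 14-token tab-separated layout are taken from the earlier posts.

```mirc
; hypothetical helper: times one token-5 search against the table "orion"
alias time.token5 {
  ; 4 wildcard tokens, the search text as token 5, then 9 more wildcard tokens, all tab-separated
  var %pattern = $str(* $+ $chr(9),4) $+ $1- $+ $str($chr(9) $+ *,9)
  var %start = $ticks
  var %item = $hfind(orion,%pattern,1,w).data
  echo -a Search for $1- took $calc($ticks - %start) ms and returned: $iif(%item != $null,%item,nothing)
}
```

Running it once with a value known to be in the table and once with something that is not (for example /time.token5 NoSuchValue) shows how much of the 1.5 seconds is the worst case, where every item has to be ruled out.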

