#162421 17/10/06 07:22 PM
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hi all,

I need some help.
I have a hash table with data stored in it, around 3000 items.
It is stored like:
item token1 token2 token3 token4 token5 token6 token7 token8 token9

Now, my problem is this.
I want to take a piece of text, and find that piece of text back in the hash table.
I'm using something like this at the moment:
var %match = $hget(Table,$hfind(Table,* $+ %matchtext $+ *,1,w).data)

Now, I DON'T want to search all tokens in the table, but ONLY token5.
And I also don't want wildcard matches; I want only EXACT matches.

Example:
2231 Green Blue Silver Orange Purple Black Yellow Red
2232 Blue Silver Orange Purple Black Yellow Red Green

If I search for Orange, it should return nothing.
If I search for Black, it should return the data of line 2232
If I search for Purple, it should return the data of line 2231
If I search for Red, it should return nothing.

Can this be done with the above mentioned line?
var %match = $hget(Table,$hfind(Table,* $+ %matchtext $+ *,1,w).data)

Thx a lot in advance!
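
To pin down the matching rule being asked for, here is a minimal sketch in Python (not mIRC script, so it can be run outside the client). The dict `table` stands in for the hash table, `find_exact` is a hypothetical helper name, and the two rows are the example items above, split on spaces as they are shown in the post.

```python
# Hypothetical stand-in for the hash table: item name -> data string.
table = {
    "2231": "Green Blue Silver Orange Purple Black Yellow Red",
    "2232": "Blue Silver Orange Purple Black Yellow Red Green",
}

def find_exact(table, text, token=5, sep=" "):
    """Return the data of the first item whose Nth token equals text exactly."""
    for item, data in table.items():
        tokens = data.split(sep)
        # token is 1-based, like mIRC's token identifiers
        if len(tokens) >= token and tokens[token - 1] == text:
            return data
    return None

for word in ("Orange", "Black", "Purple", "Red"):
    print(word, "->", find_exact(table, word))
```

Only Black and Purple sit in position 5 of a row, so only those two searches return data; a substring or a hit in another position returns nothing.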

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Not using the code you've got there. Firstly, is the number of tokens always the same? If it is, you can use this (assuming 7 tokens every time):
Code:
$hfind(Table,* * * * %matchtext * *,1,w).data


If not, you'll have to use regular expressions to search like so:
Code:
$hfind(Table, $+(/^(?:\S+\s){4}, $re_escape(%matchtext), (?:\s|$)/), 1, r).data

You'll also need the following alias to use that:
Code:
alias re_escape return $regsubex($1, /([^\w\s])/g, \\t)


I have no idea what the performance of a regular expression search is on a hash table with 3000 items, my guess is "not good". But you'll have to try it and see.
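
The regular expression above can be tried outside mIRC. This is a rough Python sketch, assuming space-separated tokens that contain no spaces themselves; `token5_regex` and `search` are hypothetical helper names, and `re.escape` stands in for the `re_escape` alias.

```python
import re

def token5_regex(text):
    # Skip exactly four "word + whitespace" groups from the start of the
    # data, then require text followed by whitespace or the end of the string.
    return re.compile(r"^(?:\S+\s){4}" + re.escape(text) + r"(?:\s|$)")

rows = {
    "2231": "Green Blue Silver Orange Purple Black Yellow Red",
    "2232": "Blue Silver Orange Purple Black Yellow Red Green",
}

def search(text):
    rx = token5_regex(text)
    return [item for item, data in rows.items() if rx.search(data)]

print(search("Black"))   # items whose 5th token is Black
print(search("Orange"))  # Orange is never the 5th token in these rows
```

The trailing `(?:\s|$)` is what rules out partial matches such as searching for "Pur" and hitting "Purple".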


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, the first option looks the best imo.
Yes, the number of tokens is always the same, but they are separated by the tab character.

So, instead of using the * * * *
$hfind(Table,* * * * %matchtext * *,1,w).data

Should we change it?
Including the $chr(9) somehow?

Thx starbucks

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Ah. In that case:
Code:
$hfind(Table,$instok($replace($str(*.,7),.,$chr(9)),%matchtext,5,9),1,w).data


That should do it.

Last edited by starbucks_mafia; 17/10/06 09:48 PM.

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
That looks awesome!

Thanks a lot man. I hope it works.

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, it didn't really work.
It freezes mIRC for like 60 seconds, and returns nothing.

There are actually 14 tokens.
And we indeed need to search only token 5 of each item.

Would that make a difference?

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
If there are 14 tokens you'd need to change the code to:
Code:
$hfind(Table,$instok($replace($str(*.,14),.,$chr(9)),%matchtext,5,9),1,w).data


But that's gonna make it take even longer to search. You could try the regular expression code I gave, there's a slight possibility it'll run faster than regular wildcards for this. Beyond that I'm afraid there's not much to be done about the speed beyond changing the way you're storing the data.


Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
For some reason, seeing it take 60 seconds to search 3000 items seems like something is wrong. A hash table should be able to do that in under 5 seconds and probably under 1. I'll have to test this at home.

Think about it... if it takes 60 seconds to search 3000 items, then that's only checking 50 items per second. I'm sure you could search a text file (reading each line) faster than that.


Invision Support
#Invision on irc.irchighway.net
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Hmm, the 60 seconds are possibly because it is part of a loop.
Basically it takes a piece of text, and searches the hash table.

In total it searches for like 300 items in that loop.
But still, with the original code, it takes like 5 seconds. The only problem is that it does wildcard matching across several tokens, so I get wrong results.

I want to only return exact matches located in token 5.
I tried the updated version from starbucks, but it returns nothing, and takes also like 2 minutes or something.

Perhaps I should copy only the relevant tokens (1,2,3,5) to a hidden window, and search for it from there...

Wouldn't know how, but ok.

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Tried the regex string as well, but it doesn't return anything as far as I can see. What should it return? Only the matched part of the string? Or the entire data line?

I need the entire data line.

Anyone with ideas?
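
One possible reason the regex finds nothing (an assumption, since the real data is not shown in the thread): `\S+\s` only skips tokens that contain no whitespace, so it fails if the tab-separated tokens have spaces inside them. The Python sketch below uses a made-up row and hypothetical helper names to compare the posted pattern with a variant that skips four tab-terminated fields.

```python
import re

# Made-up row: tab-separated tokens, some containing spaces.
row = "\t".join(["Dark Green", "Blue", "Silver", "Orange",
                 "Light Purple", "Black", "Yellow"])

def space_regex(text):
    # Pattern from the thread: four whitespace-free words, then text.
    return re.compile(r"^(?:\S+\s){4}" + re.escape(text) + r"(?:\s|$)")

def tab_regex(text):
    # Variant: four tab-terminated fields, then text up to a tab or the end.
    return re.compile(r"^(?:[^\t]*\t){4}" + re.escape(text) + r"(?:\t|$)")

print(bool(space_regex("Light Purple").search(row)))
print(bool(tab_regex("Light Purple").search(row)))
print(bool(tab_regex("Purple").search(row)))
```

In this sketch the space-based pattern counts "Dark" and "Green" as two separate words and so looks at the wrong position, while the tab-based variant lands on field 5 and still rejects the partial text "Purple". Either way, the search identifies an item, so the full data line would still be fetched from that item afterwards.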

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Is the table private? If not, and you can /hsave it to a file for me to load up, I can test it out some tonight. If you want to do that, join the channel listed below and send me the file and then leave me a message. (*Note that it should accept the file if it's txt. I'm not sure if I have ini blocked*). I'll be home a bit later and can check it out.

Using yours would prevent me needing to try and create a 3000 entry table to work with.


Invision Support
#Invision on irc.irchighway.net
Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Oww yeah, that would be awesome.
I'm making a ZIP package, and will deliver it soon-ish.
I'll include a little description, so you understand exactly what the purpose is.

Thx for giving it a try in any case.

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
I replied to your first message, as I think you were steered down a bad path.

Quote:

2231 Green Blue Silver Orange Purple Black Yellow Red
2232 Blue Silver Orange Purple Black Yellow Red Green

If I search for Orange, it should return nothing.
If I search for Black, it should return the data of line 2232
If I search for Purple, it should return the data of line 2231
If I search for Red, it should return nothing.

var %match = $hget(Table,$hfind(Table,& & & & %matchtext *,1,w).data)

& matches exactly one space-delimited word.

FYI: the reason * * * * etc. caused such delays is that, with that matching method, it runs backward and forward all over each data item trying to match a complex string full of *'s. I wrote about this in some other thread once. Using & lets it know it needs four words, then your match text, then anything else after it.

PS: if you're matching the last token, watch out that you don't have the " *" on the end.
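
mIRC's actual wildcard matcher is not documented in this thread, so the following is only an illustration of the "backward and forward" behaviour described above, not a model of mIRC itself. It is a deliberately naive backtracking matcher in Python (`wild_match` is a hypothetical name) that counts how many positions it tries on one made-up 14-token row that does not match.

```python
def wild_match(pattern, text, counter):
    """Naive backtracking matcher where '*' matches any run of characters."""
    counter[0] += 1  # count every attempted (pattern, text) position
    if not pattern:
        return not text
    if pattern[0] == "*":
        # Try every possible length for this '*', shortest first.
        return any(wild_match(pattern[1:], text[i:], counter)
                   for i in range(len(text) + 1))
    return (bool(text) and pattern[0] == text[0]
            and wild_match(pattern[1:], text[1:], counter))

# Made-up 14-token, tab-separated row that does NOT contain the search text.
row = "\t".join("value %d" % i for i in range(1, 15))
pattern = "\t".join(["*"] * 4 + ["nomatch"] + ["*"] * 9)

steps = [0]
print(wild_match(pattern, row, steps), steps[0])

# Splitting on the delimiter needs a single pass over the row.
print(row.split("\t")[4] == "nomatch")
```

Because each `*` may also swallow tabs, the naive matcher has to try every way of lining up the first four pattern tabs with the row's thirteen tabs before it can give up, which is why a non-matching row is the expensive case.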

Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Thanks DaveC.

It's a little more complicated.
The tokens themselves contain spaces. The only separator is the tab character. So I guess the & character wouldn't work.

Also, there are like 14 tokens, but that should be no problem. I could edit your line for that.

Anyway, I mailed the file to Riamus, and he's taking a look at it now. So let's hope he can find a solution for it.

If we don't get it sorted, I'll post another message here.
Thx in any case!

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
If that's the case, your main problem is the speed of searching with multiple *'s to cover each token position. I assume it's $chr(9) for the tab between them? (Not a space like you showed.)

Solution one: (keep original hash data) [I'm going to assume you're actually using TAB/$chr(9) to separate data]

Search for items by simply using $+(*,%matchtext,*) and then compare the result using IF ($+(*,$chr(9),*,$chr(9),*,$chr(9),*,$chr(9),%matchtext,$chr(9),*,$chr(9),*,$chr(9),etc etc) iswm %var)
Example:
Code:
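var %n = 1, %c = 0
; walk every item whose data contains %matchtext anywhere
while ($hget(TABLE,$hfind(TABLE,* $+ %matchtext $+ *,%n,w).data)) {
  var %match = $v1
  ; keep only items whose 5th tab-delimited token is an exact match
  if ($gettok(%match,5,9) == %matchtext) {
    inc %c
    echo -a MATCH $+($chr(35),%c) BEING %match
  }
  inc %n
}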

You will actually find this is a lot faster than letting a complex wildcard match using * loose on the whole hash table. You're essentially going "ok, take only lines with XYZ, and then wildmatch them".


Solution two (alter hash data)
Isolate a character that won't ever appear in the token data, replace all spaces with this character, then swap it back as you pull the data out of the hash table.
Example:
Code:
alias spaces.to.255 { return $replace($1,$chr(32),$chr(255)) }
alias spaces.from.255 { return $replace($1,$chr(255),$chr(32)) }
var %match = $spaces.from.255($hget(TABLE,$hfind(Table,& & & & %matchtext *,1,w).data))
;and also have to use /HADD TABLE itemname $spaces.to.255(%normaldata)

I prefer the other way unless I was just starting out on the project, and thus wouldn't need to worry about patching existing code.

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
You're assuming the subset of matches from the hashtable will be small enough to warrant the slowdown caused by manually looping over it instead of using $hfind()'s efficient built-in loop. Based on the examples OrionsBelt has given, each token is based on a relatively small set and will frequently occur in the table at some point, just not necessarily as the fifth token. If that's true for the entire table then making a vague match and a second pass will only make things slower.

Ultimately the only solution that's likely to provide acceptable results is a reworking of the data into something easier to index.
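
As a rough illustration of what "easier to index" could mean, here is a Python sketch (not mIRC script) that builds a lookup keyed on token 5 once, so that each later search is a single key lookup instead of a scan. The names `build_index`, `table` and `index` are hypothetical, and the rows are made up in the shape of the earlier example.

```python
from collections import defaultdict

def build_index(table, token=5, sep="\t"):
    """Map each distinct Nth token to the item names that carry it."""
    index = defaultdict(list)
    for item, data in table.items():
        tokens = data.split(sep)
        if len(tokens) >= token:
            index[tokens[token - 1]].append(item)
    return index

# Made-up rows standing in for the real 3000-item table.
table = {
    "2231": "Green\tBlue\tSilver\tOrange\tPurple\tBlack\tYellow\tRed",
    "2232": "Blue\tSilver\tOrange\tPurple\tBlack\tYellow\tRed\tGreen",
}

index = build_index(table)            # one pass over the table
for text in ("Black", "Orange"):      # each lookup is then a dict access
    items = index.get(text, [])
    print(text, "->", [table[i] for i in items])
```

In mIRC terms this would presumably be a second hash table keyed on token 5 and kept in step with the first; that mapping is an assumption and is untested here.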


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
You're not considering the complexity of the search it's having to make. I don't know exactly how efficient or 'intelligent' the matching code is for wildcards but it's entirely possible that it's performing thousands of searches for each item in order to make or rule out a match. The speed of accessing the hash table (or even a file) is negligible compared to the processing time required to match.


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Strange it's not matching. The piece of code I gave you should return the item name of the first match. Perhaps you should post the relevant code you're using and maybe a few items of the actual hash table data.


Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Actually, I did test it out and I have it working in about 1.5 seconds.

After loading his hash table into mIRC, and calling it Orion, this is the format I used. It handles the data in a short time and works well.

//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ $str($chr(9) $+ *,9),1,w).data).data

There still is that 1.5s pause, but that's not too bad... especially compared to 60s.


Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Is the 1.5 seconds for a match though? Remember it'll only search to the first match, so if it's matching near the 'start' of the hash table it'll be a lot faster than searching for something not in the table at all.


Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
Quote:
//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ $str($chr(9) $+ *,9),1,w).data).data


The $hfind returns an ITEMNAME; from what I read above this happens to be a number. $hget().data is a direct index access using the offset into the hash table, not the ITEMNAME returned by the $hfind. I believe that if it is still matching the correct data, it is doing so by sheer chance, and that chance is likely to fail as soon as some data is altered.

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
Well, I did give the alter-the-hash-table option as well, although now reading it I didn't actually mention that the TABS were meant to be replaced by spaces, as well as the spaces being replaced by 255, so it would have screwed up anyway. This should suit him well enough, and it's pretty simple to patch into existing code even.

alias -l data.IN { return $replace($1,$chr(32),$chr(255),$chr(9),$chr(32)) }
alias -l data.OUT { return $replace($1,$chr(32),$chr(9),$chr(255),$chr(32)) }
;
/VAR %match = $data.OUT( $hget(TABLE,$hfind(Table,& & & & $data.in(%matchtext) *,1,w).data) )
/HADD TABLE itemname $data.IN(%normaldata)
; The only thing to watch is that %matchtext also has its spaces replaced with $chr(255), as shown.
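
The two-way swap can be checked outside mIRC. This is a small Python sketch of the same idea, with `data_in` and `data_out` as hypothetical stand-ins for the two aliases and `"\xff"` playing the role of $chr(255); it assumes that character never occurs in the real data.

```python
SPACE, TAB, PLACEHOLDER = " ", "\t", "\xff"  # "\xff" plays the role of $chr(255)

def data_in(text):
    # Protect embedded spaces first, then turn the tab delimiters into spaces.
    return text.replace(SPACE, PLACEHOLDER).replace(TAB, SPACE)

def data_out(text):
    # Reverse order: spaces back to tabs, then placeholders back to spaces.
    return text.replace(SPACE, TAB).replace(PLACEHOLDER, SPACE)

# Made-up row: tab-separated tokens, some containing spaces.
row = "Dark Green\tBlue\tSilver\tOrange\tLight Purple\tBlack"
stored = data_in(row)

print(stored.split(" ")[4] == data_in("Light Purple"))  # 5th word is the encoded token
print(data_out(stored) == row)                          # round trip restores the row
```

After encoding, every token is a single space-delimited word, which is what lets the & wildcard count tokens; the search text has to go through the same encoding before it is compared.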

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Hm... that shouldn't have had .data at the end.


Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Quote:
Is the 1.5 seconds for a match though? Remember it'll only search to the first match, so if it's matching near the 'start' of the hash table it'll be a lot faster than searching for something not in the table at all.


It's the same time regardless of the location in the hash table, or if there are no results.

Here is the code used (fixed thanks to the comment by DaveC):

//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ *,1,w).data)

The only change is the removal of .data at the end of the original and I also removed the second string of $chr(9)'s since it wasn't needed. Even with that, it was still 1.5s or so.


Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
Quote:
//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ *,1,w).data)

I would change that to
//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ $chr(9) $+ *,1,w).data)

To ensure no accidental partial matches: with %string = "fred" the pattern becomes *<tab>*<tab>*<tab>*<tab>fred<tab>*, which won't mistakenly match, say, fredrick.

One thing about the 1.5s: was it on every match, not just ,1, ?
I'm also unsure about the order of results from $hfind (I have never checked): do they come in the order they would appear in an hsave file? I would think it likely.

Since you have the sample data, try changing it to search for something in, say, the 14th token and see if there is an increased delay. I have a feeling there will be.

Side note: I had this same type of problem with /FILTER. I was filtering on different tokens, around token numbers 18 to 22, and the delay was horrible. I finally added an extra token at position 18 that was a unique value (can't remember what it was, let's say it was ¥), so then I could filter on *¥~*~*~*~*~ $+ matchtext $+ ~* (for a match on token 22, now token 23).
In other words, I let it jump over all the leading tokens and align to where it's meant to be, using a unique marker on each line to align with.
(This of course means adjusting the data, which isn't always possible.)

How often is the code triggered? Even 1.5s is a pretty long pause for mIRC if you're doing it a lot, or you're looping.
A /hsave -n followed by a /filter -n on the file, giving you a result set with the first word as a back-reference number to $hget(table,N).data, might be faster (if there are a lot of matches to find).
^-- that idea is the same principle as the sorting-hash-table-values thing using /filter -n (hope I'm making myself clear enough here)

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Quote:
I would change that to
//echo -a $hget(orion,$hfind(orion,$str(* $+ $chr(9),4) $+ %string $+ $chr(9) $+ *,1,w).data)

- This won't work correctly. You must explicitly specify every token delimiter in order to ensure it only matches the correct token. Using your code it could match any item which has %string as token 5-13.
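
This objection can be checked with Python's `fnmatch`, whose `*` likewise matches any run of characters, tabs included. It is an approximation of mIRC's `w` matching, not a copy of it, and `row_with` plus the `t1`..`t14` tokens are made up for the test.

```python
from fnmatch import fnmatchcase

def row_with(text, position, total=14):
    """Build a made-up tab-separated row with text as the Nth token (1-based)."""
    tokens = ["t%d" % i for i in range(1, total + 1)]
    tokens[position - 1] = text
    return "\t".join(tokens)

short_pattern = "*\t" * 4 + "fred" + "\t*"       # delimiters after token 5 left open
full_pattern = "*\t" * 4 + "fred" + "\t*" * 9    # all 13 delimiters spelled out

for pos in (5, 7):
    row = row_with("fred", pos)
    print(pos, fnmatchcase(row, short_pattern), fnmatchcase(row, full_pattern))
```

With "fred" in position 7 the short pattern still matches, because the leading `*`s can absorb extra tabs; only the pattern that accounts for all 13 delimiters pins the text to token 5.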


Joined: Oct 2005
Posts: 1,741
G
Hoopy frood
Offline
I'm not sure why the regular expression $hfind was dismissed, but it seems to work best (most accurate) out of all the examples given. I wrote up a test alias to show the regex working:

Code:
alias re_hfind {
  if ($1 == $null) { echo -a Syntax: /re_hfind <matchtext> | return }

;;;;;; Create a table for testing ;;;;;;
  if ($hget(test)) hfree test

  echo -a Making Table
  hmake test 100
  var %c = 0, %cc = 3000
  while (%c < %cc) {
    inc %c
    hadd test %c $+($rand(100,999),$chr(9),$rand(100,999),$chr(9),$rand(100,999),$chr(9),$rand(100,999),$chr(9),$rand(100,999),$chr(9),$rand(100,999),$chr(9),$rand(100,999))
  }
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


  var %ticks = $ticks
  var %matchtext = $re_escape($1)
  var %regex = /^(?:\S+\s+){4}( $+ %matchtext $+ )/i
  echo -a %regex
  var %h, %c = 0, %cc = $hfind(test,%regex,0,r).data
  echo 0> $calc($ticks - %ticks) ms

  echo -a $hfind(test,*,0,w) items; %cc matches

  while (%c < %cc) {
    inc %c
    %h = $hfind(test,%regex,%c,r).data
    echo -a Match: Ticks- $calc($ticks - %ticks) $+ ms Item- %h Data- $hget(test,%h)

  }
  echo X> $calc($ticks - %ticks) ms
}

alias re_escape return $regsubex($1,/([^\w\s])/g,\\t)


It creates a test hash table (with numbers as values) each time the alias is called (to provide different test conditions). Simply pass a matchtext as $1 when calling the alias, and various bits of data will be shown on the screen.

I ran the code, and it seemed to take about 50ms per $hfind call.

-genius_at_work

Joined: Apr 2004
Posts: 759
M
Hoopy frood
Offline
Hehehe, I was reading this thinking the figures don't add up. Seeing as you proved the regex $hfind is not slow (at least not as slow as 60 seconds on 3000 items), I won't anymore.

The first thing I noticed is that no one asked how the OP was using the regex find in his loop.
Quote:

Hmm, the 60 seconds are possibly because it is part of a loop.
Basically it takes a piece of text, and searches the hash table.

In total it searches for like 300 items in that loop.
But still, with the original code, it takes like 5 seconds. The only problem is that it matches wildmatch stuff, on several tokens. So I get wrong results.


which should be the reason for the large execution time.


$maybe
Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
DaveC: That won't matter because the search param always has quotes around it and nothing else does. However, if he did want to search other tokens, you're correct that it would need that extra $chr(9).

starbucks_mafia: Same thing... because there are quotes around it, it's not going to be a problem. However, if he does want to search other params, it should include all tokens.


In either case, the search returns the result in 1.5s regardless which way it is handled. I'll throw back in the remaining tokens as I originally had it.


Joined: Apr 2006
Posts: 464
O
Fjord artisan
OP Offline
Well, the delay has been solved in my opinion.
If searching for 1 item takes 1.5 seconds, searching for 50 items will take about 75 seconds, which is 1 minute 15 seconds.

And that is exactly what my script does.
Just repeating the same find command several times.

For now I'll use:
$hget(Table,$hfind(Table,$str(* $+ $chr(9),4) $+ $chr(34) $+ %matchtext $+ $chr(34) $+ $str($chr(9) $+ *,9),1,w).data)
As that actually returns the results I was looking for.

Thank you all for thinking along.
