mIRC Home    About    Download    Register    News    Help

Print Thread
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Code:
alias testfilter {
  var %ticks = $ticks
  var %w = @ChannelList1 | window %w | clear %w
  aline -p %w Network1×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network2×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network3×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network4×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network5×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network6×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  window @TMP | clear @TMP | filter -wwc %w @TMP $makewm1(23,$+(1 *,$chr(215),15 ON,$chr(215),21 ON))
  echo -s $calc(($ticks - %ticks) / 1000)
}
alias makewm1 {
  var %t = $chr(215), %a = $asc(%t), %total = $1, %toks = $2-
  var %i = 1, %wm | while (%i < %total) { var %wm = $+(%wm,*,%t) | inc %i }
  var %wm = $+(%wm,*), %total = $numtok(%toks,%a), %i = 1 | while (%i <= %total) { tokenize 32 $gettok(%toks,%i,%a) | var %wm = $puttok(%wm,$2-,$1,%a) | inc %i }
  return %wm
}

The second alias makes a wildcard string with searchtokens on certain positions.

The example takes 39 secs.

When changing the wildcard string to
$makewm1(23,$+(1 *,$chr(215),4 ON,$chr(215),6 ON))
it takes 0.065 secs.

Don't know if this can be called a bug as such, but there is some issue with /filter

Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Hoopy frood
S
Joined: Dec 2002
Posts: 2,962
Of course it's going to be slow for extremely convoluted wildcards like that. It takes a hell of a lot of processing to correctly match against a string with 23 wildcards in it. The code could probably be optimised a bit but the key issue here is that's not the best way to do what you want.

Instead of using a massive wildcard why not use /filter's -k switch and let an alias process each line. That way it'll take a fraction of the time.

Code:
alias testfilter {
  [color:red]set -u0 %_testfilter 1 *×15 ON×21 ON[/color]
  var %ticks = $ticks
  var %w = @ChannelList1 | window %w | clear %w
  aline -p %w Network1×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network2×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network3×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network4×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network5×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network6×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  window @TMP | clear @TMP | [color:red]filter -wk %w matchwm1[/color]
  echo -s $calc(($ticks - %ticks) / 1000)
}

[color:red]alias matchwm1 {
  var %i = 1
  while $gettok(%_testfilter,%i,215) {
    if ($gettok($v1,2,32) !iswm $gettok($1-,$gettok($v1,1,32),215)) return
    inc %i
  }
  aline -p @TMP $1-
}[/color]


Notice the first highlighted line. A global variable called %_testfilter just needs to be set with the same info as you were previously giving as the second parameter to $makewm1() before the actual filter is called.

This method takes well under a second.


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Thanks for the reply.

I use this matching way all over the place in the script, and until now, without any trouble or surprising performance loss, in a 900k script, this makewm alias is called from over 100 places in every script part.

IYour way requires a second alias for every alias that does such filter. And I have to pass parameters and return them using script-duration-set variables instead of alias parameters and return values.
That is quite unacceptable, I consider it very messy for little gain.
I tried that filter -fk way in the past, and regarding performance gain, the difference wasn't noticable and surely not worth above drawbacks.

In fact, in the past I used simply a while loop on all lines, a full tokenize (slow!) or some $gettok()'s (according to how many tokens to match) and actually, I started using /filter because it was faster in doing this

You did see my second wildcard test?
It contained the same amount (23) tokens
$makewm1(23,$+(1 *,$chr(215),4 ON,$chr(215),6 ON)) and it took 0.065 secs.

It doesn't make sense.
I ran another test, with only 3 (!!!) lines to filter:
Code:
alias testfilter {
  var %w = @ChannelList1 | window %w | clear %w
  aline -p %w Network1×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network2×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  aline -p %w Network3×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DESCRIBE×OFF
  window @TMP
  var %i = 1 | while (%i <= 23) { testfilter2 $makewm1(23,%i OFF) | inc %i }
}
alias testfilter1 { var %ticks = $ticks | clear @TMP | filter -wwc @ChannelList1 @TMP $makewm1(23,$1 ON) | echo -s $1 $calc(($ticks - %ticks) / 1000) }
alias testfilter2 { 
  var %ticks = $ticks | clear @TMP
  var %total1 = $line(@ChannelList1,0), %i1 = 1
  while (%i1 <= %total1) {
    var %rec = $line(@ChannelList1,%i1), %i2 = 1, %ok = 0
    while (%i2 <= 23) {
      if ($gettok($1-,%i2,215) iswm $gettok(%rec,%i2,215)) { inc %ok }
      inc %i2
    }
    if (%ok == 23) { aline @TMP %rec }
    inc %i1
  }
  echo -s $calc(($ticks - %ticks) / 1000)
}

alias makewm1 {
  var %t = $chr(215), %a = $asc(%t), %total = $1, %toks = $2-
  var %i = 1, %wm | while (%i < %total) { var %wm = $+(%wm,*,%t) | inc %i }
  var %wm = $+(%wm,*), %total = $numtok(%toks,%a), %i = 1 | while (%i <= %total) { tokenize 32 $gettok(%toks,%i,%a) | var %wm = $puttok(%wm,$2-,$1,%a) | inc %i }
  return %wm
}

The testfilter alias now has a while loop that shifts up one token to match (22/23 are just a * wildcard) from position 1 to position 23, and calls everytime the testfilter1 or testfilter2 aliases.
I wrote alias testfilter so that it has to check all of the 23 token positions, to mimic filter in some degree.

These are the results:

tesffilter1:testfilter2

1 0.005 : 0.01
2 0.01 : 0.01
3 0.045 : 0.01
4 0.182 : 0.009
5 0.474 : 0.01
6 0.884 : 0.01
7 1.215 : 0.015
8 1.54 : 0.01
9 2.12 : 0.01
10 3.425 : 0.01
11 5.415 : 0.01
12 0.005 : 0.01
13 10.285 : 0.009
14 12.29 : 0.01
15 13.645 : 0.015
16 14.436 : 0.01
17 14.86 : 0.01
18 15.02 : 0.01
19 15.056 : 0.01
20 15.065 : 0.01
21 15.079 : 0.009
22 15.09 : 0.01
23 15.08 : 0.01

You see what happens? Filter is as speedy as the scripted way for the first two tokens and then becomes thousands times slower, there is a weird exception somewhere around the middle position, where it is suddenly back as fast as the scripted one..

The conclusion is simple:
A full scripted loop is 1000's times faster than /filter except from the first two onwards and the middlest token.

Remember this is about 3 lines to filter on one position?
15 seconds?
Matching ONE line of a @custom window needs 5 full seconds.
A SINGLE string, that is.

So I'll just abandon using filter for this. For some reason, its extremely bad for this.

Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
Hoopy frood
D
Joined: Sep 2003
Posts: 4,230
The real problem is with the real bad way your using * to try and match to tokens sepeatered by "×", the matching system is placed under huge stress to try and match up what you want becuase of what "*" means, it doesnt stop at the next "×" but tries matching for none or all of the remaining text.

You should have made your data matchable using SPACE as the token sperator (using "×" to replace spaces in any token if need be) this would have allowed you to match using "&" for any word match.

here is an example.
Code:
alias testfilter2 {
  var %ticks = $ticks
  var %w = @ChannelList1 | window %w | clear %w
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  aline -p %w Network1 #channel1 <NOKEY> OFF OFF OFF OFF OFF OFF OFF OFF ON OFF OFF OFF OFF OFF OFF OFF OFF OFF DESCRIPTION×OF×SOMETHING OFF
  window @TMP | clear @TMP
  ; var -s %filter = $makewm2(23,$+(1 *,$chr(215),15 ON,$chr(215),21 OFF)) | ; aka 1 *×15 ON×21 ON
  ; ^ I dont see the purpose of setting token one to * as thats what it was to begin with anyway. So i dont do it (but if it was done speed isnt overly different)
  var %filter = $makewm2(23,$+(    $chr(215),15 ON,$chr(215),21 OFF)) | ; aka    ×15 ON×21 ON
  ///echo -s filter -wwc %w @TMP %filter
  filter -wwc %w @TMP %filter
  echo -s $calc(($ticks - %ticks) / 1000)
}
alias makewm2 {
  var %t = $chr(215), %a = $asc(%t), %total = $1, %toks = $2-
  var %i = 1, %wm = * | while (%i < %total) { var %wm = & %wm | inc %i }
  var %total = $numtok(%toks,%a), %i = 1 | while (%i <= %total) { tokenize 32 $gettok(%toks,%i,%a) | var %wm = $puttok(%wm,$2-,$1,32) | inc %i }
  return %wm
}



However I can see that trhis might be hard to do considering it would appear you have alot of script already completed.
here is another option.
Code:
alias testfilter3 {
  var %ticks = $ticks
  var %w = @ChannelList1 | window %w | clear %w
  aline -p %w Network1×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×D SCRIBE×OFF
  aline -p %w Network2×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DE CRIBE×OFF
  aline -p %w Network3×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×OFF×DES RIBE×OFF
  aline -p %w Network4×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×OFF×DESC IBE×OFF
  aline -p %w Network5×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×OFF×DESCR BE×OFF
  aline -p %w Network6×#channel1×<NOKEY>×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×OFF×ON×OFF×OFF×OFF×OFF×DESCRI E×OFF
  window @TMP | clear @TMP
  var %filter = $makewm1(23,$+(1 *,$chr(215),15 OFF,$chr(215),21 OFF)) | ; aka 1 *×15 ON×21 ON
  ///echo -s filter3 -wwc %w @TMP %filter
  filter3 -wwc %w @TMP %filter
  echo -s $calc(($ticks - %ticks) / 1000)
}
alias filter3 {
  if (-* iswm $1) { var %switches = $1, %source = $2, %destination = $3, %filter = $+(×,$4-,×) }
  else            { var %switches,      %source = $1, %destination = $2, %filter = $+(×,$3-,×) }
  var %filter = $replacex($left($right($replace(%filter,×*×,×&×,×*×,×&×),-1),-1),$chr(32),$chr(215),$chr(215),$chr(32))
  var %filter = $+($iif(($right(%filter,1) == &),$left(%filter,-1),%filter),*)
  ;
  var %tempspace = $+(@temp.,$ticks,$mid($r(1000000000,1999999999),2))
  savebuf %source %tempspace
  bread %tempspace 0 $file(%tempspace).size &binvar
  breplace &binvar 32 215 215 32
  bwrite %tempspace 0 -1 &binvar
  window -h %tempspace
  loadbuf -r %tempspace %tempspace
  filter $replace($+(-,%switches,$iif(c !isin %switches,c)),--,-) %tempspace %tempspace %filter
  savebuf %tempspace %tempspace
  if ($file(%tempspace).size) {
    bunset &binvar
    bread %tempspace 0 $file(%tempspace).size &binvar
    breplace &binvar 32 215 215 32
    bwrite %tempspace 0 -1 &binvar
    loadbuf -r %tempspace %tempspace
  }
  filter $remove($+(-ww,$iif(c isin %switches,c),$iif(p isin %switches,p)),x) %tempspace %destination
  .remove %tempspace 
  window -c %tempspace 
}


The above uses a altered filter system that does the following.

Adjusts the original filter of ex "*×*×SOMETHING×*×*" to "& & SOMETHING & *"
Creates a workspace window/file name
Saves the source window to a file
loads that file as a binvar
swaps " " for "×" and viseversa in the binvar
saves the binvar back to the file
loads the file into a temp window
filters the tempwindow into itself to get the result set
saves the temp window to a file
if the file isnt zero in size then
.. loads the file as a binvar'
.. swaps " " for "×" and viseversa in the binvar (returning it to the original)
.. saves the binvar back to the file
.. loads the file into the temp window
finally filters the temp window into destination window
and cleans up

** I dont believe this couldnt be fooled but i think it would help u out on the long delay ones **

Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Thanks for the effort to figure out this kinda workaround but chosing such messy way just to save work (change a script in a later stage of progress) is something I would never do, I would change the script then, regardless amount of work.

The problem is that I don"t have the choice.

Script has, besides hashes, alot custom windows (around 25), both configuration and session, for data that needs to keep its order and fast accessible (I filter upon start disk>@window), and I really do need an identical token separator, without that I would simply have to abandon it completely, for example, if a certain script part needs to pass its parameters to another part (for example, a queue) that is used by other parts as well, using different token separators.
ChannelList does not need separator 215 because its current parameters can not have spaces, while UserData cannot use separator 32 due to the storage of channels and full name.
Such problems would rise everywhere.
A week ago I tried an idea of using a hash format systemname tokenseparator, and I quickly run into problems when aliases pass parameters between the different parts.

I will just abandon using /filter (and also $fline) when I have to filter on token positions that are too far, and use scripted loops and tokenize or $gettok() again, which in fact is like I did it in the past, until I found out filter was abit faster on average, but the most important reason was that I could decrease the script size with /filter due to the lesser conditions, loops, builtin sorting on token position, etc, since it's already 920k in 13 files with a careful chosen tradeoff readability-maintainability/size.

Now, what is relevant in this section, if I can assume, like you state, that this extreme slowness is due to the way wildcard matching works, with no way of improving it, then this is not a bug (it's just required to get the job done) and the discussion (and bug report) stops here, and I use other ways with better performance.
But, it surprises me that /filter, well known as fastest method, becomes 1000 times slower than a fully scripted approach (using tokens) due to certain wildcard string content, especially because in fact, in my tests, the wildcard string length remained the same all the time, just one string 'ON' shifted up left to right with huge impact on speed.
Starbucks_mafia said:
Quote:

Of course it's going to be slow for extremely convoluted wildcards like that. It takes a hell of a lot of processing to correctly match against a string with 23 wildcards in it.

The amount wildcards never changed. It remained the same in both fast and extremely slow results. Moving the 'ON' string to position 1,2 and 12 was fast while really slow on most others.

Last edited by RRX; 29/07/06 01:11 PM.
Joined: Sep 2003
Posts: 4,230
D
Hoopy frood
Offline
Hoopy frood
D
Joined: Sep 2003
Posts: 4,230
Quote:
Thanks for the effort to figure out this kinda workaround but chosing such messy way just to save work (change a script in a later stage of progress) is something I would never do, I would change the script then, regardless amount of work.


The second method i showed should allow a seemless adjustment, the main scripts simply call FILTER3 (call it something better maybe smile ) instead of FILTER, and you could even add code to the front of that alias to make a choice if u think a regular filter well be faster, and just call it, rather than reprocessing the soure file and filter.

IMO it is always going to be better to have created a master routine to deal with things like your filtering data in your scripts than just using a filter itself, this way should a problem arrise as it appears, you can alter the master filtering routine, once, rather than having to eal with indervidual filters scattered through the code. aka Modulization of commands to reduce debugging.

I however yeild to your greater knowledge on your own script as to the ability to intergrate what I admit is only a cludge to solve manifesting problem.

Quote:
The amount wildcards never changed. It remained the same in both fast and extremely slow results. Moving the 'ON' string to position 1,2 and 12 was fast while really slow on most others.


I thinkyou have missed the importantce of how and what the wild card must match to.
The earlier in the filter a exact MATCH/MISMATCH can be detected the faster that line can accepted/abandoned and the next moved on to.
Amagine a filter like this *-*-3-* processing a line like 1-2-3-4
This is a possable order of comparsions taking 14 steps before matching, this is based loosly on the idea that the * consumes as many as possable characters first then starts reducing leaving chracters for the remaining parts of the filter.
Code:
[color:blue]*       | - | *       | -3- | *[/color]
1-2-3-4 |   |         |     |   [color:red]failed[/color]
1-2-3-  | 4 |         |     |   [color:red]failed[/color]
1-2-3   | - | 4       |     |   [color:red]failed[/color]
1-2-    | 3 | -4      |     |   [color:red]failed[/color]
1-2     | - | 3-4     |     |   [color:red]failed[/color]
1-2     | - | 3-      | 4   |   [color:red]failed[/color]
1-2     | - | 3       | -4  |   [color:red]failed[/color]
1-2     | - |         | 3-4 |   [color:red]failed[/color]
1-      | 2 | -3-4    |     |   [color:red]failed[/color]
1       | - | 2-3-4   |     |   [color:red]failed[/color]
1       | - | 2-3-    | 4   |   [color:red]failed[/color]
1       | - | 2-3     | -4  |   [color:red]failed[/color]
1       | - | 2-      | 3-4 |   [color:red]failed[/color]
1       | - | 2       | -3- | 4 [color:blue]matched[/color]


Now this second example is based on the idea that * consumes as few as possable characters and then is added to untill a match is found. And takes just 3 comparisons.
Code:
*       | - | *       | -3- | *
        | 1 |         | -2- | 3-4 [color:red]failed[/color]
1       | - |         | 2-3 | -4  [color:red]failed[/color]
1       | - | 2       | -3- | 4   [color:blue]matched[/color]


*I dont of course know how the wildmatching is done exactly, and if my examples are close or miles off, but they show how a complexe filter with many *'s might be slow because before its validated or not there are 1000's of comparasions to have to check. And this is where i said using the & (1 word) matcher would be much quicker, this of course requieres the tokenseperators to be space however.

Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Originally Posted By: DaveC

*I dont of course know how the wildmatching is done exactly, and if my examples are close or miles off, but they show how a complexe filter with many *'s might be slow because before its validated or not there are 1000's of comparasions to have to check. And this is where i said using the & (1 word) matcher would be much quicker, this of course requieres the tokenseperators to be space however.

I dont know either, but it may be due to a performance wasting code method, and that's why I reported this, because if that is the case, some thinking and recoding may resolve this issue.

Today I fixed a performance problem, and the cause was again this.
I had an alias:
Code:
alias SC_NetQueue_CountIdentDBusy {
  SC_dl SC_NetQueue_CountIdentDBusy $1-
  filter -wk @SC_NetQueue return $SC_makewm(18,18 IDENTDBUSY) | return $filtered
}

(that $SC_makewm() produces here a wildcard string *×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×IDENTDBUSY)
This is the output, notice the timestamps:
Quote:

20070527-102843 (Beavis) NETQUEUE: (GameSurge) Added task CONNECT - Server gigenet.il.us.gamesurge.net port 6667 (if enabled: delay 0 sec.)
20070527-102843 (Beavis) NETQUEUE: (UnderNet) Added task CONNECT - Server oslo1.no.eu.undernet.org port 6664 (if enabled: delay 0 sec.)
20070527-102844 (Beavis) NETQUEUE: (Abjects) Added task CONNECT - Server wolfpac.ca.us.abjects.net port 6667 (if enabled: delay 0 sec.)
20070527-102845 (Beavis) NETQUEUE: (DalNet) Added task CONNECT - Server soho.ix.us.dal.net port 6667 (if enabled: delay 0 sec.)
20070527-102847 (Beavis) NETQUEUE: (EFNet) Added task CONNECT - Server irc.igs.ca port 6666 (if enabled: delay 0 sec.)
20070527-102849 (Beavis) NETQUEUE: (Rizon) Added task CONNECT - Server irc.xstaticsolutions.com port 6667 (if enabled: delay 0 sec.)
20070527-102851 (Beavis) NETQUEUE: (IRCHighWay) Added task CONNECT - Server hiroes.de.eu.irchighway.net port 6667 (if enabled: delay 0 sec.)
20070527-102854 (Beavis) NETQUEUE: (SPEpisodes) Added task CONNECT - Server irc.spepisodes.com port 6667 (if enabled: delay 0 sec.)
20070527-102857 (Beavis) NETQUEUE: (SwiftIRC) Added task CONNECT - Server griffin.mo.us.swiftirc.net port 7000 (if enabled: delay 0 sec.)
20070527-102900 (Beavis) NETQUEUE: (freenode) Added task CONNECT - Server sterling.freenode.net port 6667 (if enabled: delay 0 sec.)
20070527-102904 (Beavis) NETQUEUE: (bunker7) Added task CONNECT - Server dethfyre.bunker7.net port 6667 (if enabled: delay 0 sec.)
20070527-102909 (Beavis) NETQUEUE: (ChatSpike) Added task CONNECT - Server stitch.chatspike.net port 6667 (if enabled: delay 0 sec.)
20070527-102913 (Beavis) NETQUEUE: (IRC3K) Added task CONNECT - Server sonic.irc3k.net port 6667 (if enabled: delay 0 sec.)
20070527-102918 (Beavis) NETQUEUE: (AustNet) Added task CONNECT - Server restored.il.us.austnet.org port 6667 (if enabled: delay 0 sec.)
20070527-102924 (Beavis) NETQUEUE: (Azzurra) Added task CONNECT - Server wmgitalia.azzurra.org port 6665 (if enabled: delay 0 sec.)

I changed above alias to:
Code:
alias SC_NetQueue_CountIdentDBusy {
  SC_dl SC_NetQueue_CountIdentDBusy $1-
  filter -wk @SC_NetQueue return *IDENTDBUSY) | return $filtered
}

and the output was:
Quote:

20070527-105609 (Beavis) NETQUEUE: (SwiftIRC) Added task CONNECT - Server griffin.mo.us.swiftirc.net port 7000 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (Azzurra) Added task CONNECT - Server wmgitalia.azzurra.org port 6666 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (IRC3K) Added task CONNECT - Server sonic.irc3k.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (Rizon) Added task CONNECT - Server irc.xstaticsolutions.com port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (AustNet) Added task CONNECT - Server restored.il.us.austnet.org port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (EFNet) Added task CONNECT - Server irc.igs.ca port 6666 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (bunker7) Added task CONNECT - Server dethfyre.bunker7.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (UnderNet) Added task CONNECT - Server oslo1.no.eu.undernet.org port 6664 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (freenode) Added task CONNECT - Server sterling.freenode.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (DalNet) Added task CONNECT - Server soho.ix.us.dal.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (SPEpisodes) Added task CONNECT - Server irc.spepisodes.com port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (GameSurge) Added task CONNECT - Server gigenet.il.us.gamesurge.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (ChatSpike) Added task CONNECT - Server stitch.chatspike.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (IRCHighWay) Added task CONNECT - Server hiroes.de.eu.irchighway.net port 6667 (if enabled: delay 0 sec.)
20070527-105609 (Beavis) NETQUEUE: (Abjects) Added task CONNECT - Server wolfpac.ca.us.abjects.net port 6667 (if enabled: delay 0 sec.)

(note: the network addition order is randomized)

With the first alias, the addition of these jobs (about 15 alias executions) took 41 secs, and with the second alias it all happenend in the same second.

Regarding your proposal of using a space as token separator (which is no option in my case since the tokens may contain spaces), in the end, a space is just another character choice, like my token separator (ascii code 215) or any other.

The only difference is that there is a wildcard symbol & that handles a space as separator, which apparently delivers some performance gains in the filtering method.
Why wouldnt it be possible to tell it to see another character as separator, so those performance gains can also be achieved for other than space-separated token strings. Maybe a new wildcard symbol like &<asciicodehere>.

Anyways, that last was just in case the currently used matching method cannot be made better, which is rather hard to believe, considering even a scripted loop with $gettok($line(@window,%loopcounter),18,215) also finishes in the same second.

By messing around with this, I noticed a weird thing:
If I type this on commandline:
Code:
//var %ctime = $ctime | SC_NetQueue_CountIdentDBusy | echo -s $calc($ctime - %ctime)

It takes around 5 sec (the @window has 15 lines) but it echoes 6 (6 millisec), what explains this?




Joined: Dec 2002
Posts: 2,962
S
Hoopy frood
Offline
Hoopy frood
S
Joined: Dec 2002
Posts: 2,962
I don't understand why you're so confused about why the first alias is slower than the second. Matching against over a dozen wildcards versus matching a single wildcard anchored at the end of a string is obviously going to produce vastly different results.

Your logic comparing matching wildcards versus using tokens doesn't make sense. Using tokens the algorithm knows that there is a token separator and what it is. Using wildcards it knows no such thing, all it knows is that there are given characters separated by any number of characters (including that character), so it's not at all like providing tokens. I have no idea if mIRC's internal wildcard algorithm attempts to match * greedily or ungreedily, I'd guess it's greedy and maybe if that were changed it might improve your results somewhat. But then it might also make other wildcard matches far slower. Of course this is all pure speculation on my part.

Quote:
If I type this on commandline:
Code:

//var %ctime = $ctime | SC_NetQueue_CountIdentDBusy | echo -s $calc($ctime - %ctime)



It takes around 5 sec (the @window has 15 lines) but it echoes 6 (6 millisec), what explains this?

That's not 6 milliseconds, that's six seconds. $ctime returns a value in seconds since the "UNIX epoch" (00:00:00 January 1st 1970 GMT). If you want a value in milliseconds use $ticks (although that still only has an accurate resolution of approximately 10ms).


Spelling mistakes, grammatical errors, and stupid comments are intentional.
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Originally Posted By: starbucks_mafia
I don't understand why you're so confused about why the first alias is slower than the second. Matching against over a dozen wildcards versus matching a single wildcard anchored at the end of a string is obviously going to produce vastly different results.

Your logic comparing matching wildcards versus using tokens doesn't make sense. Using tokens the algorithm knows that there is a token separator and what it is. Using wildcards it knows no such thing, all it knows is that there are given characters separated by any number of characters (including that character), so it's not at all like providing tokens. I have no idea if mIRC's internal wildcard algorithm attempts to match * greedily or ungreedily, I'd guess it's greedy and maybe if that were changed it might improve your results somewhat. But then it might also make other wildcard matches far slower. Of course this is all pure speculation on my part.

It's not the fact that the first alias is slower than the second that 'confuses' me. What -surprises- me is the difference in speed.
How does it come that a wildcard match, even with 100 * wildcard symbols and 1000 chars long, takes 6 sec to go through 15 lines?
How many cpu cycles is that on todays average cpu's?

And indeed, I used $ctime instead of $ticks and missed that completely.


Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Hoopy frood
Q
Joined: Jan 2003
Posts: 2,523
Believe it or not, it's normal. For example, you have a string W containing 5 * chars and you want to compare it against a 20-char string A. Each * can match from 0 to 20 chars (at most), so the number of ways W can match A is of the order of 20^5 (it's smaller, even in the worst case, but still grows exponentially with the number of *'s). In the worst case, mirc will have to do several thousands char-by-char comparisons to determine if those innocent-looking strings W and A match! So while it's possible that the wildcard matching routine could be optimised a bit, don't expect any dramatic improvements in general; in certain cases, early termination of the matching process is possible but this cannot be generalised to apply to every case.

Last edited by qwerty; 27/05/07 06:39 PM.

/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Originally Posted By: qwerty
Believe it or not, it's normal. For example, you have a string W containing 5 * chars and you want to compare it against a 20-char string A. Each * can match from 0 to 20 chars (at most), so the number of ways W can match A is of the order of 20^5 (it's smaller, even in the worst case, but still grows exponentially with the number of *'s). In the worst case, mirc will have to do several thousands char-by-char comparisons to determine if those innocent-looking strings W and A match! So while it's possible that the wildcard matching routine could be optimised a bit, don't expect any dramatic improvements in general; in certain cases, early termination of the matching process is possible but this cannot be generalised to apply to every case.

I can't believe it and I do not understand your 20^5 exponential calculation.
Take my case.
-> wildcard search string:
*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×IDENTDBUSY
-> text string:
99×QUEUE×DalNet×riga-r.ca.us.dal.net×6668×1×IRCConnect×1×ijhsvlrz×1180326780×67×2×0×CONNECT×<unknown>×<none>×DNSFAIL×IDENTDNOTBUSY

Search routine loops through characters of wildcard string.
1) search string position 1: "*" -> wildcard symbol for any char so move on to next
2) search string position 2: "×" -> loop through text string from position 1 till character "×" found, store the position of the character after this character (pointer).
3) search string position 3: "*" -> wildcard symbol for any char so move on to next
4) search string position 4: "×" -> loop through text from previous position onwards till character "×" found, update the pointer to the position after this character.

Seach result 'false' (discarding moment) happens when the current no-wildcard symbol * being character is not found in the text string on the position of the pointer.

The outer loop (wildcard search text) does in my example case 44 iterations (the length of it), and every one of them requires a single character check on the pointer position.
I can't see any exponential component in this method.
The amount character comparisons is just the length of the wildcard search string minus the * wildcard symbols in it, i.e. the total characters that require a literal match.

I could be overlooking something of course.
I saw /filter (and also $fline() and iswm) needing 5 sec for a single line in a custom window, whereas the line had 23 tokens so the wildcard search text had 22 * wildcard symbols (see earlier in this thread).

Try for example this:
Code:
alias testiswm1 {
  var %ticks = $ticks, %i = 1
  while (%i <= 15) {
    if (*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×IDENTDBUSY !iswm 99×QUEUE×DalNet×riga-r.ca.us.dal.net×6668×1×IRCConnect×1×ijhsvlrz×1180326780×67×2×0×CONNECT×<unknown>×<none>×DNSFAIL×IDENTDNOTBUSY) { echo -ag !ISWM %i }
    inc %i
  }
  echo -ag $calc($ticks - %ticks)
}
alias testiswm2 {
  var %ticks = $ticks, %i = 1
  while (%i <= 15) {
    if (*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×*×IDENTDBUSY iswm 99×QUEUE×DalNet×riga-r.ca.us.dal.net×6668×1×IRCConnect×1×ijhsvlrz×1180326780×67×2×0×CONNECT×<unknown>×<none>×DNSFAIL×IDENTDBUSY) { echo -ag ISWM %i }
    inc %i
  }
  echo -ag $calc($ticks - %ticks)
}

The first takes 5436 ms, the second 2 ms.
The only difference is that in the first alias, the last token is NOT matched, while it is matched in the second alias, the fast (or should I say 'normal'?) one - which is also remarkable, because you only know if there is a match after checking ALL characters, while a check for a non-match may be able to discard at an earlier position, so less characters compared.

Doesn't that blow up the entire theory of alot wildcard symbols as reason?

Last edited by RRX; 28/05/07 06:30 AM.
Joined: Jul 2003
Posts: 655
Fjord artisan
Offline
Fjord artisan
Joined: Jul 2003
Posts: 655
i may be wrong, but if i remember correctly wildcard matching is greedy and possible backwards-processing.

Use of a good regex will improve the speed significantly... (based on your aliases)
testiswm1 = 4109
testregex1 = 922
testregex2 = 797
When actually implemented correcting into your makewm to compile a proper regex pattern there will be much less wildcard tokens in the regex so it should improve more than the above. You should make sure you use the ^ and $ boundaries in the regex also otherwise it will become much slower, proper use of greey/ungreedy matching with regex will improve it also.

EDIT :: ok so i looked at your original post a little more closely, i guess regex isnt the best solution since it cant be used in filter. i personally would probably keep full list in hidden window and iterate each line through a compiled regex and writing matches to visible window or something similar.

Last edited by Om3n; 28/05/07 08:00 AM.

"Allen is having a small problem and needs help adjusting his attitude" - Flutterby
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Hoopy frood
Q
Joined: Jan 2003
Posts: 2,523
Quote:
Search routine loops through characters of wildcard string.
1) search string position 1: "*" -> wildcard symbol for any char so move on to next
2) search string position 2: "×" -> loop through text string from position 1 till character "×" found, store the position of the character after this character (pointer).
3) search string position 3: "*" -> wildcard symbol for any char so move on to next
4) search string position 4: "×" -> loop through text from previous position onwards till character "×" found, update the pointer to the position after this character.

Seach result 'false' (discarding moment) happens when the current no-wildcard symbol * being character is not found in the text string on the position of the pointer.

The key idea you're missing here is backtracking; when a particular way of matching each * with a substring fails, a different combination is tried, and the number of such combinations is exponential to the number of *'s.

Quote:
The first takes 5436 ms, the second 2 ms.
The only difference is that in the first alias, the last token is NOT matched, while it is matched in the second alias, the fast (or should I say 'normal'?) one - which is also remarkable, because you only know if there is a match after checking ALL characters, while a check for a non-match may be able to discard at an earlier position, so less characters compared.

Doesn't that blow up the entire theory of alot wildcard symbols as reason?
Not at all, the problem mostly occurs when the pattern does not match the string in any way, so that the routine has to try all possible "assignments" of *'s.

You should probably Google these issues, as I suspect there's not much I could say to convince you. The problem is central to regular expressions implementations too (pcre.txt, section "ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS" describes a similar case). An analogous situation in regex is this:

//echo -ag $regex($str(a,39),/^.*.*.*.*.*.*\d$/)

\d matches a digit (0 to 9), so this pattern will never match a string of a's. Even a single call to the above takes a noticeably long time to complete. Removing a ".*" makes it much faster; adding a ".*" makes it return -8, which is the error code meaning "too many attempts to match". Regular expressions in mirc are supported through a 3rd-party library, PCRE (my point being, different author). PCRE contains an optimisation, ie if you change \d to a literal, say b, $regex fails very quickly. This is the sort of 'early termination' checks I mentioned in my previous post. Khaled could do something like that in his wildcard matching routine, but of course that would only cover a subset of problem cases.

In my opinion, it is the user who should come up with a 'better' pattern (or an entirely different method altogether), not the programmer who should come up with a better algorithm.


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Originally Posted By: qwerty

The key idea you're missing here is backtracking; when a particular way of matching each * with a substring fails, a different combination is tried, and the number of such combinations is exponential to the number of *'s.

Not at all, the problem mostly occurs when the pattern does not match the string in any way, so that the routine has to try all possible "assignments" of *'s.

You should probably Google these issues, as I suspect there's not much I could say to convince you. The problem is central to regular expressions implementations too (pcre.txt, section "ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS" describes a similar case). An analogous situation in regex is this:

//echo -ag $regex($str(a,39),/^.*.*.*.*.*.*\d$/)

\d matches a digit (0 to 9), so this pattern will never match a string of a's. Even a single call to the above takes a noticeably long time to complete. Removing a ".*" makes it much faster; adding a ".*" makes it return -8, which is the error code meaning "too many attempts to match". Regular expressions in mirc are supported through a 3rd-party library, PCRE (my point being, different author). PCRE contains an optimisation, ie if you change \d to a literal, say b, $regex fails very quickly. This is the sort of 'early termination' checks I mentioned in my previous post. Khaled could do something like that in his wildcard matching routine, but of course that would only cover a subset of problem cases.

I'm unsure why you now talk about regular expressions in a case where only simple wildcards symbols are used, actually only 1, of the 3 existing,a "*".
The only symbols that the parser gives information how to handle it are those wildcards, so how does that 'backtracking' apply then? To use your words: why does it "try a different combination"? And what does that "combination" combine?
At any stage in the method I described, there is a pointer in the text to match (in my case what is right of the iswm operator), everything before the pointer is 'matched', so why would it need to re-use text that is before that pointer position? Let alone so many times that mIRC needs 5 sec to know a 44 characters wide wildcard text does not match a literal text?

Last edited by RRX; 28/05/07 10:06 AM.
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Hoopy frood
Q
Joined: Jan 2003
Posts: 2,523
I mentioned regular expressions because a similar problem arises there (in regex, ".*" is the equivalent of the wildcard character "*"). I hoped you'd understand that performance issues associated with multiple occurrences of variable-length subpatterns (* in wildcards, .* in regex) are universal.


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Dec 2002
Posts: 5,411
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,411
The wildcard routine in mIRC is greedy and performs backtracking. In the case of a non-match, it will recursively backtrack and try every possible permutation exhaustively.

The question is whether mIRC needs to perform backtracking in a wildcard on plaintext matching. I'll experiment a little to see what I can come up with.

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Hoopy frood
Q
Joined: Jan 2003
Posts: 2,523
Quote:
At any stage in the method I described, there is a pointer in the text to match (in my case what is right of the iswm operator), everything before the pointer is 'matched', so why would it need to re-use text that is before that pointer position? Let alone so many times that mIRC needs 5 sec to know a 44 characters wide wildcard text does not match a literal text?
Since we now know that the algorithm is greedy, we know it *has* to reuse text before the pointer, ie it has to backtrack (I hope that much is clear by now).

An interesting question however is: could your non-greedy idea work in general? Consider the case where
pattern = *X*X
input = AXBXCZ
According to the algorithm you described, the first * would match A (the substring until the first X in input). Then the second * would match B (the substring from the previous position to the next X). So we've gone through the entire pattern without encountering any mismatches. Does that mean we can return with "success"? Clearly, *X*X doesn't match AXBXCZ.

However, there's a way around this: check whether the last *-delimited token in the pattern is a suffix of the input, which is essentially what I said before about 'early termination', PCRE etc. A similar trick could be applied for the first *-delimited token: mirc could check if it's a prefix of the input. So by changing the algorithm to
1) be non-greedy
2) perform prefix/suffix checks (as above)
its running time would greatly improve.

Unless we're both missing a special case where all this would fail, this may be a good idea indeed. smile


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
Hoopy frood
Joined: Oct 2004
Posts: 8,330
Originally Posted By: qwerty
An interesting question however is: could your non-greedy idea work in general? Consider the case where
pattern = *X*X
input = AXBXCZ
According to the algorithm you described, the first * would match A (the substring until the first X in input). Then the second * would match B (the substring from the previous position to the next X). So we've gone through the entire pattern without encountering any mismatches. Does that mean we can return with "success"? Clearly, *X*X doesn't match AXBXCZ.


I don't think that's the best example as I think his idea would be valid in that it would fail because there is more after the *X*X match unless a * follows that: *X*X*. Using his idea for that example, it would still seem to work properly.

However, here's another example where it would fail.
pattern = *X*X
search = AXBXCX

In this example, his idea is to use * until the first X match, in which case it would be just the A. The second star would match up until the next X, which would be after the B. Because it doesn't have a * after the last X, it would fail because there's more in the search after AXBX. Obviously, it *should* match. Of course, this needs to check in multiple ways (forward and backward)... If the first * included AXB instead of just A, it would match. Or, if the first * was A and the second star included BXC instead of just B, it would also match. That example should help to show why RRX's suggestion won't work.


Invision Support
#Invision on irc.irchighway.net
Joined: Jan 2004
Posts: 162
R
RRX Offline OP
Vogon poet
OP Offline
Vogon poet
R
Joined: Jan 2004
Posts: 162
Originally Posted By: qwerty

An interesting question however is: could your non-greedy idea work in general? Consider the case where
pattern = *X*X
input = AXBXCZ
According to the algorithm you described, the first * would match A (the substring until the first X in input). Then the second * would match B (the substring from the previous position to the next X). So we've gone through the entire pattern without encountering any mismatches. Does that mean we can return with "success"? Clearly, *X*X doesn't match AXBXCZ.

Yes it went through the wildcard search text but the pointer in the text that should match didn't reach the end so that indicates then no match.
A solution for a * at the start, which is indeed something overlooked, might be chosing the loop direction in the wildcard search string.
- If it starts with a *, loop from the end till the beginning and decrease the text pointer of course.
- If it ends with a *, loop from beginning till end and increase, as in the original example.
- If both beginning and ending *, perform both.

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Hoopy frood
Q
Joined: Jan 2003
Posts: 2,523
Quote:
I don't think that's the best example as I think his idea would be valid in that it would fail because there is more after the *X*X match unless a * follows that: *X*X*.
This is exactly why I said the stuff about prefix/suffix checks. In this example, *X*X succeeds. To determine if this is right, check whether the substring after the last * in the pattern is a suffix of the text. If it isn't, return 'failure', otherwise 'success'.

I implemented all of this and so far it seems to work correctly. (I've tested it with a lot of pattern/text pairs. I'll run some more tests to make sure but it looks good.)

There are other approaches too (for example the routine used in Saturn's io.dll, which I saw just today) that avoid trying all possible combinations (ie recursive backtracking). Any of these can certainly be used in mirc, so RRX was right, his example string doesn't have to be slow.


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com

Link Copied to Clipboard