#196932 25/03/08 11:06 PM
firefox (OP) | Fjord artisan | Joined: Sep 2007 | Posts: 202
If I originally have:

if (Sometext isin %var) { goto section }

and I want to convert it to a regex, would this be correct? (I know how to write the regex pattern; I just want to be sure of the correct way to use it in a script.)

if ($regex(/^Sometext*/i) == 1) && ($regml(1) isin %var) { goto section }

hixxy | Hoopy frood | Joined: Sep 2005 | Posts: 2,881
Code:
if ($regex(%var,/^Sometext/i) == 1) { goto section } 

argv0 | Hoopy frood | Joined: Oct 2003 | Posts: 3,918
No...

$regex takes 2 parameters, the text to search and then the pattern:

$regex(%var, /sometext/)

It returns a nonzero value if the regex matches against %var, and 0 otherwise.

There is no need to look at $regml unless you need to pull data out of %var.


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
firefox (OP) | Fjord artisan | Joined: Sep 2007 | Posts: 202
OK, I get it now, thanks.

firefox (OP) | Fjord artisan | Joined: Sep 2007 | Posts: 202
Follow-up question:

Is this CPU/RAM intensive if, say, I have 100 regex checks like these?

if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
etc.


hixxy | Hoopy frood | Joined: Sep 2005 | Posts: 2,881
Depends on what %var contains and the regexes in question..

Riamus2 | Hoopy frood | Joined: Oct 2004 | Posts: 8,330
And unless you absolutely need to use GOTO, you should avoid it.


Invision Support
#Invision on irc.irchighway.net
firefox (OP) | Fjord artisan | Joined: Sep 2007 | Posts: 202
Originally Posted By: hixxy
Depends on what %var contains and the regexes in question..
say if var is on average 30 chars and the regex is average 15

Originally Posted By: Riamus2
And unless you absolutely need to use GOTO, you should avoid it.
why is that?

argv0 | Hoopy frood | Joined: Oct 2003 | Posts: 3,918
gotos usually imply bad code, and while a usage like that *could* be justified, this specific example is not. Instead:

Code:
var %v = Sometext|someothertext|blah|blah|blah|blah
if ($regex(%var,/^( $+ %v $+ )/i)) {
  ; do your thing here
}


That would be a much simpler and probably less CPU-intensive solution.
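The same idea of folding many literal checks into one alternation can be sketched outside mIRC. The block below is only an illustration in Python, whose `re` module stands in for the regex engine behind `$regex`; the word list and all function names are hypothetical. It builds one anchored, case-insensitive pattern from a list of words and offers a per-word check to compare against.

```python
import re

def build_prefix_pattern(words):
    """Join literal alternatives into one case-insensitive, start-anchored regex."""
    # re.escape keeps characters such as '.' or '|' inside a word literal.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile("^(?:" + alternation + ")", re.IGNORECASE)

def matches_any_separately(words, text):
    """Reference behaviour: one anchored check per word, like one if-line per word."""
    return any(re.search("^" + re.escape(w), text, re.IGNORECASE) for w in words)

# Hypothetical word list standing in for the %v variable in the post above.
WORDS = ["Sometext", "someothertext", "blah"]
COMBINED = build_prefix_pattern(WORDS)

def matches_combined(text):
    """Single lookup against the combined alternation."""
    return COMBINED.search(text) is not None
```

Both functions should agree on every input; the combined version simply does the work in one call. Escaping matters if any of the words contains regex metacharacters, which the mIRC snippet above does not handle.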


hixxy | Hoopy frood | Joined: Sep 2005 | Posts: 2,881
Again, it depends on the string and the regex. Some regexes are very fast to match and others are very slow; it's not the size that determines this.

argv0 | Hoopy frood | Joined: Oct 2003 | Posts: 3,918
Not really. The comparison here is one regex vs. many, so we can assume that the lookup cost of each individual regex carries over to the combined regex, e.g.:

m/match1/, m/match2/ and m/match3/

vs.

m/match1|match2|match3/

The lookup times for the individual regexes in the first list should sum to *about* the lookup time for the combined regex. The only potential difference would be in the efficiency of the combined DFA built by the regex library. However, that is unlikely to be the limiting factor, as most regex libraries are highly efficient at combining DFAs.

The important point, however, is that this does not account for the compile-time overhead of turning a regular expression into a DFA. Compiling three separate expressions will likely cost more than compiling one, since the combined expression's complexity is only the sum of the individual expressions' complexities (plus any difference in efficiency as noted above). This also leaves out the *significant* overhead in the mIRC parser itself of parsing three $regex identifiers and sending them through the interpreter.

I would contend that for any %s, the overhead of "$regex(%s, /a/) $regex(%s, /b/) $regex(%s, /c/) ..." will likely overshadow that of $regex(%s, /a|b|c|.../) and eventually incur a significantly larger execution time once there are some N $regex calls, regardless of the patterns or string values in question. The value of N would depend on the complexity of the patterns used, but not on the string value %s. If all the patterns are string literals (like a|b|c|...), N should be very low.
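A claim like this is easiest to settle by measuring. The harness below is a rough sketch in Python rather than mIRC script, so its numbers say nothing about mIRC's parser overhead, and Python caches compiled patterns, so it mostly measures per-call and matching cost rather than compilation. The word list, function names and repeat count are all made up for illustration.

```python
import re
import timeit

# Hypothetical literal alternatives; a real script would use its own list.
WORDS = ["word%d" % i for i in range(100)]
SEPARATE = ["^" + re.escape(w) for w in WORDS]
COMBINED = "^(?:" + "|".join(re.escape(w) for w in WORDS) + ")"

def check_separately(text):
    """N pattern lookups, analogous to N consecutive $regex calls."""
    return any(re.search(p, text, re.IGNORECASE) for p in SEPARATE)

def check_combined(text):
    """One lookup using a single alternation."""
    return re.search(COMBINED, text, re.IGNORECASE) is not None

def compare(text, repeats=200):
    """Return (seconds for the separate checks, seconds for the combined check)."""
    t_sep = timeit.timeit(lambda: check_separately(text), number=repeats)
    t_comb = timeit.timeit(lambda: check_combined(text), number=repeats)
    return t_sep, t_comb
```

Calling `compare("no match here")` exercises the worst case, where every separate pattern has to be tried. An equivalent mIRC benchmark would loop both variants and compare `$ticks` before and after.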


hixxy | Hoopy frood | Joined: Sep 2005 | Posts: 2,881
This may be a fairly unique occurrence, but here is a regex which takes about a second to fail on my machine, straight out of man.txt:

Code:
//echo -a $regex(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,/(\D+|<\d+>)*[!?]/)


Change it to a possessive quantifier and it fails immediately:

Code:
//echo -a $regex(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,/(\D++|<\d+>)*[!?]/)


Just one example of how a seemingly very similar regex can take a lot longer to compute :)
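The effect is not specific to mIRC; any backtracking regex engine can show it. Here is a small sketch in Python, used purely as a stand-in for mIRC's engine. Possessive quantifiers are not available in every Python version, so the second pattern emulates `\D++` with a lookahead plus a backreference, which is an assumption of this sketch and not what the post above uses. The function name is hypothetical.

```python
import re
import time

# The nested-quantifier pattern from the post above.
SLOW = re.compile(r"(\D+|<\d+>)*[!?]")

# Emulation of the possessive \D++: the lookahead grabs the longest run of
# non-digits once, and the backreference consumes exactly that run, so the
# engine cannot retry shorter splits of the same run.
FAST = re.compile(r"(?:(?=(?P<run>\D+))(?P=run)|<\d+>)*[!?]")

def timed_search(pattern, text):
    """Return (whether the pattern was found, elapsed seconds)."""
    start = time.perf_counter()
    found = pattern.search(text) is not None
    return found, time.perf_counter() - start
```

Running `timed_search(SLOW, "a" * 20)` and then `timed_search(FAST, "a" * 20)` should show the gap, and the slow pattern's time should climb steeply as more characters are added. Keep the test string short at first; the 52-character string from the post may take an impractically long time with the slow pattern in this engine.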

argv0 | Hoopy frood | Joined: Oct 2003 | Posts: 3,918
Right, but that's not relevant: to be comparable to the code posted by the OP, the script would still execute both matches (especially if they fail). Following the example posted above, this would be the equivalent code:

Code:
var %x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
if ($regex(%x, /(\D+|<\d+>)*[!?]/)) goto end
if ($regex(%x, /(\D++|<\d+>)*[!?]/)) goto end
echo -a fail.
:end


Notice that you're still executing the slower regex, but you're also compiling each regex separately, which is far less efficient because of the overhead of compiling each one.

My point is that it will probably be faster to compile and process a single regex when the sub-regexes in question are relatively similar in complexity, e.g.:

Code:
var %x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
if ($regex(%x, /(\D++|\D+|<\d+>)*[!?]/)) goto end
echo -a fail.
:end


I bet that, if benchmarked, the second code block would win. That probably holds for any N $regex calls, assuming the complexity of each expression is similar.


hixxy | Hoopy frood | Joined: Sep 2005 | Posts: 2,881
I was only replying to his "say if var is on average 30 chars and the regex is average 15" comment. My point was that it's not black and white like that: you cannot say whether a regex will be CPU/RAM intensive without knowing what the regex is. A short regex isn't necessarily efficient, and a long one isn't necessarily inefficient.

You seem to have moved on to another debate.

firefox (OP) | Fjord artisan | Joined: Sep 2007 | Posts: 202
I am afraid what argv0 is talking about now is beyond me.

Thank you all for your help

