OP
Fjord artisan
Joined: Sep 2007
Posts: 202 |
If I originally have:
if (Sometext isin %var) { goto section }
and I wanted to convert it to a regexp, would this be correct? (I know how to create the regexp pattern; I just want to be sure of the correct way to use it in a script.)
if ($regex(/^Sometext*/i) == 1) && ($regml(1) isin %var) { goto section }
|
Hoopy frood
Joined: Sep 2005
Posts: 2,881 |
if ($regex(%var,/^Sometext/i) == 1) { goto section }
|
Hoopy frood
Joined: Oct 2003
Posts: 3,918 |
no...
$regex takes 2 parameters:
$regex(%var, /sometext/)
It returns nonzero if the regex matches against %var, and 0 otherwise.
There is no need to look at $regml if you don't need to pull data out of %var.
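A minimal sketch of both uses, assuming a hypothetical alias name (regexdemo) and made-up sample text. The first test only asks whether the pattern matches; the second wraps parts of the pattern in parentheses so that $regml can return the captured text.

```mirc
; Hypothetical alias for illustration, run with /regexdemo
alias regexdemo {
  var %var = Sometext followed by other words

  ; Only a yes/no answer is needed: test the return value of $regex directly.
  if ($regex(%var,/^Sometext/i)) { echo -a matched }

  ; The matched text is needed: capture it with ( ) and read it back with $regml.
  if ($regex(%var,/^(Sometext)\s+(\w+)/i)) {
    echo -a first capture: $regml(1) / second capture: $regml(2)
  }
}
```

The second form is only worth the extra captures when the script actually uses the captured text afterwards.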
- argv[0] on EFnet #mIRC - "Life is a pointer to an integer without a cast"
|
OP
Fjord artisan
Joined: Sep 2007
Posts: 202 |
Follow-up question:
Is this CPU/RAM intensive if, say, I have 100 regexp checks like:
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
if ($regex(%var,/^Sometext/i) == 1) { goto section }
etc.
|
Hoopy frood
Joined: Sep 2005
Posts: 2,881 |
Depends on what %var contains and the regexes in question.
|
Hoopy frood
Joined: Oct 2004
Posts: 8,330 |
And unless you absolutely need to use GOTO, you should avoid it.
Invision Support #Invision on irc.irchighway.net
|
OP
Fjord artisan
Joined: Sep 2007
Posts: 202 |
Quote: "Depends on what %var contains and the regexes in question.."
Say %var is on average 30 chars and the regex is on average 15.
Quote: "And unless you absolutely need to use GOTO, you should avoid it."
Why is that?
|
Hoopy frood
Joined: Oct 2003
Posts: 3,918 |
Gotos usually imply bad code, and while a usage like that *could* be justified, that specific example is not. Instead:
var %v = Sometext|someothertext|blah|blah|blah|blah
if ($regex(%var,/^( $+ %v $+ )/i)) {
  ; do your thing
}
This would be a much simpler and probably less CPU-intensive solution.
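If the script also needs to know which alternative matched (the job the separate goto labels were doing), the capture group can be read back with $regml(1). This is a rough sketch only, assuming a hypothetical alias name (checkstart) and placeholder keywords; the pattern is built the same way as in the snippet above.

```mirc
; Hypothetical alias for illustration, run with /checkstart some text here
alias checkstart {
  var %v = Sometext|someothertext|blah
  if ($regex($1-,/^( $+ %v $+ )/i)) {
    ; $regml(1) holds whichever alternative matched at the start of the text
    var %hit = $regml(1)
    if (%hit == Sometext) { echo -a handle Sometext }
    elseif (%hit == someothertext) { echo -a handle someothertext }
    else { echo -a handle %hit }
  }
  else { echo -a no keyword at the start }
}
```

The if/elseif chain keeps each case next to its test, so no labels or jumps are needed.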
|
Hoopy frood
Joined: Sep 2005
Posts: 2,881 |
Again, it depends on the string and the regex. Some regexes are very fast to match but others are very slow; it's not the size that determines this.
|
Hoopy frood
Joined: Oct 2003
Posts: 3,918 |
Not really. The comparison here is one regex vs. many, so we can assume the actual lookup of each individual regex will correlate with a combined regex, e.g.:
m/match1/, m/match2/ and m/match3/
vs.
m/match1|match2|match3/
The lookup times for the individual regexes in the first list should sum to *about* the lookup time for the combined match. The only potential difference would be in the efficiency of the combined DFA built by the regex library. However, that is unlikely to be the limiting factor, as most regex libraries are highly efficient at combining DFAs.
The important point to note, however, is that this does not consider the compile-time overhead of compiling a regular expression into the DFA. It's likely that the overhead of compiling three separate expressions will outweigh that of compiling one, since in this case the combined expression's complexity is only the sum of the individual expression complexities (plus any delta in efficiency as noted above). This also does not include the *significant* overhead in the mIRC parser itself to actually parse out three $regex identifiers and send them through the interpreter.
I would contend that for any %s, the overhead of "$regex(%s, /a/) $regex(%s, /b/) $regex(%s, /c/) ..." will likely overshadow that of $regex(%s, /a|b|c|.../) and eventually incur a significantly larger execution time for some number N of $regex calls, regardless of the matches or string values in question. The value of N would depend on the complexity of the matches used, but not on the string value %s. If all the matches are string literals (like a|b|c|...), N should be very low.
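One way to check this claim rather than argue it is to time both forms with $ticks. The following is only a sketch under assumptions: the alias name (regexbench), the sample string, the three literal patterns and the loop count are all made up for illustration, and the numbers will vary by machine and mIRC version.

```mirc
; Hypothetical alias for illustration, run with /regexbench
alias regexbench {
  var %s = Sometext followed by other words
  var %n = 10000

  ; %n passes of three separate $regex calls (none of them match %s)
  var %i = 1, %hits = 0, %t = $ticks
  while (%i <= %n) {
    if ($regex(%s,/^alpha/i)) { inc %hits }
    if ($regex(%s,/^beta/i)) { inc %hits }
    if ($regex(%s,/^gamma/i)) { inc %hits }
    inc %i
  }
  echo -a separate: $calc($ticks - %t) ms

  ; %n passes of one combined $regex call
  var %i = 1, %hits = 0, %t = $ticks
  while (%i <= %n) {
    if ($regex(%s,/^(alpha|beta|gamma)/i)) { inc %hits }
    inc %i
  }
  echo -a combined: $calc($ticks - %t) ms
}
```

Both loops carry the same counter and loop overhead, so the difference between the two echoed times is roughly the cost of the extra $regex calls.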
|
Hoopy frood
Joined: Sep 2005
Posts: 2,881 |
This may be a fairly unique occurrence, but here is a regex which takes about a second to fail on my machine, straight out of man.txt:
//echo -a $regex(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,/(\D+|<\d+>)*[!?]/)
Change it to a possessive quantifier and it fails immediately:
//echo -a $regex(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,/(\D++|<\d+>)*[!?]/)
Just one example of how a seemingly very similar regex can take a lot longer to compute.
|
Hoopy frood
Joined: Oct 2003
Posts: 3,918 |
Right, but that's not relevant, because the script would still execute both matches (especially if they fail) to be comparable to the code posted by the OP. According to the example posted above, this would be the equivalent code:
var %x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
if ($regex(%x, /(\D+|<\d+>)*[!?]/)) goto end
if ($regex(%x, /(\D++|<\d+>)*[!?]/)) goto end
echo -a fail.
:end
Notice that you're still executing the longer-to-process regex, but you're also compiling each regex separately, which is far less efficient because of the overhead of actually compiling the regex. My point is that it will probably be faster to compile and process only one regex where the sub-regexes in question are relatively similar in complexity, e.g.:
var %x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
if ($regex(%x, /(\D++|\D+|<\d+>)*[!?]/)) goto end
echo -a fail.
:end
I bet that, if benchmarked, the second code block would win. And that probably holds true for any N $regex calls, assuming the complexity of each expression is similar.
|
Hoopy frood
Joined: Sep 2005
Posts: 2,881 |
I was only replying to his "say if var is on average 30 chars and the regex is average 15" comment. My point was that it's not black and white like that: you cannot say whether a regex will be CPU/RAM intensive without knowing what the regex is. A short regex isn't necessarily efficient and a long one isn't necessarily inefficient.
You seem to have moved on to another debate.
|
OP
Fjord artisan
Joined: Sep 2007
Posts: 202 |
I am afraid what argv0 is talking about now is beyond me.
Thank you all for your help.
|