#160160 23/09/06 09:48 PM
jaytea (OP), Fjord artisan
Joined: Feb 2006, Posts: 546
there seems to be a new limitation on recursive matching (?N) in mirc 6.2:

Code:
//echo -ag $regex(216.223.2.249,/^(?:(1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?1)$/)


the match fails because of that (?1), but it works in mirc 6.17. here is what Msmo said on IRC to help diagnose the problem:

[17:42:46] (@Msmo) I can only guess why it's happening
[17:43:05] (@Msmo) I'd say it's something like first partial match in recursion becomes atomic
[17:43:26] (@Msmo) 2[0-4]\d has a partial match in 250 (the 2 matches the 2)

edit:

[17:49:12] (@Msmo) ok, looks like they made (?1) atomic
[17:49:21] (@Msmo) that's it

perhaps an intentional change to make recursive matching less intensive? if so, i don't suppose there's any way you could reconsider changing it back to the way it was, or at least document this change in case others wonder about it :P


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Hoopy frood
Joined: Jan 2003, Posts: 2,523
mirc uses the PCRE library for its regex identifiers, so it has no control over this. This is indeed the source of the problem: all recursive items ((?1) and (?R)) are atomic, as the PCRE changelog for version 6.5 (01-Feb-06) states:
Quote:
3. A nasty bug was discovered in the handling of recursive patterns, that is,
items such as (?R) or (?1), when the recursion could match a number of
alternatives. If it matched one of the alternatives, but subsequently,
outside the recursion, there was a failure, the code tried to back up into
the recursion. However, because of the way PCRE is implemented, this is not
possible, and the result was an incorrect result from the match.

In order to prevent this happening, the specification of recursion has
been changed so that all such subpatterns are automatically treated as
atomic groups. Thus, for example, (?R) is treated as if it were (?>(?R)).

This also applies when (?1) is used non-recursively, i.e. as a subroutine, according to the PCRE manual:
Quote:
Like recursive subpatterns, a "subroutine" call is always treated as an
atomic group. That is, once it has matched some of the subject string,
it is never re-entered, even if it contains untried alternatives and
there is a subsequent matching failure.
I agree that it's inconvenient, if not crippling.
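To see the effect outside mIRC, here is a small Python sketch (an emulation, not PCRE itself). Python's `re` has no `(?1)` subroutine calls, so the last octet is written out in full, and the atomic group is emulated with the common lookahead-plus-backreference idiom `(?=(X))\1`, which acts like `(?>X)` because a lookahead that has succeeded is never re-entered. The names `OCTET`, `PLAIN`, `ATOMIC` and `matches` are illustrative only.

```python
import re

# Octet alternation from the original post, in its original order.
OCTET = r"1?\d?\d|2[0-4]\d|25[0-5]"

# Last octet as an ordinary group: the engine may backtrack into it and try
# the later alternatives (the old, non-atomic behaviour).
PLAIN = rf"^(?:(?:{OCTET})\.){{3}}(?:{OCTET})$"

# Last octet as an emulated atomic group (?=(X))\1, i.e. (?>X): once the
# lookahead has picked an alternative, that choice is never revisited.
ATOMIC = rf"^(?:(?:{OCTET})\.){{3}}(?=({OCTET}))\1$"


def matches(pattern: str, ip: str) -> bool:
    return re.match(pattern, ip) is not None


print(matches(PLAIN, "216.223.2.249"))   # True: backtracks to 2[0-4]\d for "249"
print(matches(ATOMIC, "216.223.2.249"))  # False: 1?\d?\d commits to "24", $ fails
```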


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
jaytea (OP)
oh haha, thanks! should've figured, i need to start following those updates
Hoopy frood
Joined: Oct 2005, Posts: 1,741
I can't say that I know why it's doing that, but here is what my testing has shown me.

I modified the regex slightly so that I could see what was being matched:

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^((?:(1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?2)$)/) - $regml(a,1) : $regml(a,2)

If you remove the final $, you can see what the regex actually matches, and why it fails.

I used these IPs (with the $ removed):
121.122.123.1 displays 1 - 121.122.123.1 : 123
121.122.123.12 displays 1 - 121.122.123.12 : 123
121.122.123.124 displays 1 - 121.122.123.124 : 123
121.122.123.234 displays 1 - 121.122.123.23 : 123
121.122.123.254 displays 1 - 121.122.123.25 : 123

It looks like the 1? at the beginning is what is messing it up. If you take out the ?, the examples ending in .12, .124, .234 and .254 will match (with the $ in place). Obviously that breaks matching of most numbers under 100 (the .1 example no longer matches at all), but it demonstrates the problem. I don't know much about atomic grouping, but from what I read, it is possible that (?N) calls have been made atomic.
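The same prefixes can be reproduced in a Python sketch, assuming the atomic (?2) call is emulated with the lookahead-plus-backreference idiom `(?=(X))\3` (Python's `re` has no subroutine calls). With the final $ removed, the atomic last octet keeps whatever 1?\d?\d matched first. `DEBUG` and `show` are illustrative names.

```python
import re

OCTET = r"1?\d?\d|2[0-4]\d|25[0-5]"

# Debugging regex from the post with the final $ removed; (?2) is emulated
# as the atomic group (?=(X))\3.
#   group 1 = whole match, group 2 = last repeated octet, group 3 = last octet
DEBUG = re.compile(rf"^((?:({OCTET})\.){{3}}(?=({OCTET}))\3)")


def show(ip: str):
    """Return ($regml(a,1), $regml(a,2))-style captures, or None if no match."""
    m = DEBUG.match(ip)
    return None if m is None else (m.group(1), m.group(2))


for ip in ("121.122.123.1", "121.122.123.12", "121.122.123.124",
           "121.122.123.234", "121.122.123.254"):
    print(ip, "->", show(ip))
```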

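The easiest solution I found for this specific regex is to rearrange the alternation order so that 2[0-4]\d and 25[0-5] are tried before 1?\d?\d. That way, whichever alternative succeeds first already consumes the whole octet, so the atomic (?2) never needs to backtrack. The following code should work with all IPs.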

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^((?:(2[0-4]\d|25[0-5]|1?\d?\d)\.){3}(?2)$)/) - $regml(a,1) : $regml(a,2)

(the same thing without the extra brackets and debugging output)

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^(?:(2[0-4]\d|25[0-5]|1?\d?\d)\.){3}(?1)$/)
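A Python sketch can check the reordered alternation under the same atomic treatment, again emulating the (?1) call with the `(?=(X))\1` idiom; `REORDERED`, `IP_RE` and `is_ip` are illustrative names, and this is an emulation rather than mIRC's own engine.

```python
import re

# Reordered alternation from the post: 2[0-4]\d and 25[0-5] before 1?\d?\d.
REORDERED = r"2[0-4]\d|25[0-5]|1?\d?\d"

# The final octet is an emulated atomic group (?=(X))\1, standing in for (?1).
IP_RE = re.compile(rf"^(?:(?:{REORDERED})\.){{3}}(?=({REORDERED}))\1$")


def is_ip(s: str) -> bool:
    return IP_RE.match(s) is not None


# Every last octet from 0 to 255 is accepted despite the atomic group...
assert all(is_ip(f"1.2.3.{n}") for n in range(256))
# ...and 256 to 999 are still rejected.
assert not any(is_ip(f"1.2.3.{n}") for n in range(256, 1000))
print(is_ip("216.223.2.249"))  # True
```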

Edit: obviously the above posts have pinpointed the reason for this change.

-genius_at_work

Last edited by genius_at_work; 23/09/06 11:21 PM.
Fjord artisan
Joined: Sep 2003, Posts: 261
I've always used g to match recursively.


We don't just write the scripts, we put them to the test! (ScriptBusters)
Hoopy frood
Joined: Feb 2004, Posts: 2,019
Very strange: someone asked me on IRC today about exactly the same problem, also with an IP-matching regex, and I also suggested that he change the order of the alternatives (what g_at_work suggested). Now tonight I see this thread; kind of freaky. Do you know this guy Jensen on Dalnet or something?!

Btw, 1?\d?\d will allow something like 04.05.06.07; are you okay with that? I don't actually know whether zero-padded octets count as a valid IP address.
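If zero-padded octets should be rejected, one option is a stricter alternation that is still ordered so the first successful alternative consumes the whole octet. The sketch below is a Python emulation with the atomic last octet written as `(?=(X))\1`; `STRICT`, `STRICT_RE` and `is_strict_ip` are hypothetical names. Assuming PCRE handles it the same way, the alternation should drop into the mIRC one-liner as /^(?:(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?1)$/.

```python
import re

# Hypothetical stricter octet with no leading zeros, ordered so it still works
# when the final group is atomic (as (?1) is in mIRC 6.2).
STRICT = r"25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d"

STRICT_RE = re.compile(rf"^(?:(?:{STRICT})\.){{3}}(?=({STRICT}))\1$")


def is_strict_ip(s: str) -> bool:
    return STRICT_RE.match(s) is not None


print(is_strict_ip("4.5.6.7"))      # True
print(is_strict_ip("04.05.06.07"))  # False: "0" is taken, then "." or $ fails
```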


Gone.
jaytea (OP)
ya, that's the obvious workaround here; too bad there isn't a more general solution ;>

lol foptics, no i don't know that guy, that is quite a strange coincidence!

Scorpwanna, that has nothing to do with what we're discussing here :P

Last edited by jaytea; 24/09/06 01:28 AM.
Fjord artisan
Joined: Mar 2004, Posts: 210
Quote:
Btw the 1?\d?\d will allow for something like 04.05.06.07 are you okay with that? I don't actually know what's really a valid IP address regarding zero padding of digits.


That's only a human-readable representation of a long (32-bit) integer. I doubt you'll actually see zero padding unless the IP is typed by hand. (Arithmetic has no concept of "zero padding".)
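A minimal Python sketch of that point: packing the four parts into one 32-bit integer and unpacking it again drops any zero padding. It assumes every part is read as decimal; note that C's `inet_addr()` as specified by POSIX reads a part with a leading 0 as octal, so "010" would mean 8 there. `dotted_to_int` and `int_to_dotted` are hypothetical helper names.

```python
def dotted_to_int(ip: str) -> int:
    """Pack a dotted quad into one 32-bit integer (assumption: parts are decimal)."""
    parts = [int(p, 10) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted quad: {ip!r}")
    a, b, c, d = parts
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_dotted(n: int) -> str:
    """Unpack a 32-bit integer into the usual human-readable form."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = dotted_to_int("216.223.2.249")
print(n)                 # 3638493945
print(int_to_dotted(n))  # 216.223.2.249
# Zero padding disappears once the address is a number:
print(int_to_dotted(dotted_to_int("04.05.06.07")))  # 4.5.6.7
```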

