Posted By: jaytea recursive matching in regex - 23/09/06 09:48 PM
there seems to be a new limitation on recursive matching (?N) in mirc 6.2:

Code:
//echo -ag $regex(216.223.2.249,/^(?:(1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?1)$/)


the match fails because of that (?1), yet it works in mirc 6.17. to help diagnose the problem:

[17:42:46] (@Msmo) I can only guess why it's happening
[17:43:05] (@Msmo) I'd say it's something like first partial match in recursion becomes atomic
[17:43:26] (@Msmo) 2[0-4]\d has a partial match in 250 (the 2 matches the 2)

edit:

[17:49:12] (@Msmo) ok, looks like they made (?1) atomic
[17:49:21] (@Msmo) that's it

perhaps an intentional change to make recursive matching less intensive? if so, i don't suppose there's any way you could reconsider changing it back to the way it was, or at least document this change in case others wonder about it :P
Posted By: qwerty Re: recursive matching in regex - 23/09/06 10:58 PM
mirc uses the PCRE library for its regex identifiers, so it has no control over this. This is indeed the source of the problem: all recursive items ((?1) and (?R)) are atomic, as the PCRE changelog for version 6.5 (01-Feb-06) states:
Quote:
3. A nasty bug was discovered in the handling of recursive patterns, that is,
items such as (?R) or (?1), when the recursion could match a number of
alternatives. If it matched one of the alternatives, but subsequently,
outside the recursion, there was a failure, the code tried to back up into
the recursion. However, because of the way PCRE is implemented, this is not
possible, and the result was an incorrect result from the match.

In order to prevent this happening, the specification of recursion has
been changed so that all such subpatterns are automatically treated as
atomic groups. Thus, for example, (?R) is treated as if it were (?>(?R)).

This also applies when (?1) is used non-recursively, i.e. as a subroutine, according to the PCRE manual:
Quote:
Like recursive subpatterns, a "subroutine" call is always treated as an
atomic group. That is, once it has matched some of the subject string,
it is never re-entered, even if it contains untried alternatives and
there is a subsequent matching failure.
I agree that it's inconvenient, if not crippling.
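A minimal Python sketch of what the changelog describes. Python's `re` module has no `(?1)` syntax, so this assumes the call can be stood in for by pasting group 1's body back in: either plain (roughly the old behaviour, where the engine may back up into it) or as an atomic group (the PCRE 6.5+ behaviour). To stay portable it emulates `(?>X)` with the lookahead-plus-backreference idiom `(?=(X))\2`, which works because Python never re-enters a lookahead once it has matched; Python 3.11+ also accepts `(?>X)` directly. `ip_pattern` and `atomic_call` are hypothetical names.

```python
import re

# Body of capture group 1 in jaytea's pattern: one octet, short branch first.
OCTET = r"1?\d?\d|2[0-4]\d|25[0-5]"


def ip_pattern(octet, atomic_call):
    # Stand-in for ^(?:(octet)\.){3}(?1)$
    if atomic_call:
        # Atomic emulation: once the lookahead has matched, its choice is fixed;
        # \2 then consumes exactly what the lookahead captured.
        call = rf"(?=({octet}))\2"
    else:
        # Plain copy of the group body: the engine may backtrack into it.
        call = rf"(?:{octet})"
    return re.compile(rf"^(?:({octet})\.){{3}}{call}$")


addr = "216.223.2.249"
print(bool(ip_pattern(OCTET, atomic_call=False).match(addr)))  # True
print(bool(ip_pattern(OCTET, atomic_call=True).match(addr)))   # False
```

With the atomic stand-in, the call settles on `24` for the last octet (the first branch, `1?\d?\d`), `$` then fails on the `9`, and the engine is not allowed to go back and try `2[0-4]\d`.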
Posted By: jaytea Re: recursive matching in regex - 23/09/06 11:12 PM
oh haha, thanks! should've figured, i need to start following those updates
Posted By: genius_at_work Re: recursive matching in regex - 23/09/06 11:18 PM
I can't say that I know why it's doing that, but here is what my testing has shown me.

I modified the regex slightly so that I could see what was being matched:

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^((?:(1?\d?\d|2[0-4]\d|25[0-5])\.){3}(?2)$)/) - $regml(a,1) : $regml(a,2)

If you take out the last $ you can see what the regex is trying to match, and you can see why it fails.

I used these IPs (with the $ removed):
121.122.123.1 displays 1 - 121.122.123.1 : 123
121.122.123.12 displays 1 - 121.122.123.12 : 123
121.122.123.124 displays 1 - 121.122.123.124 : 123
121.122.123.234 displays 1 - 121.122.123.23 : 123
121.122.123.254 displays 1 - 121.122.123.25 : 123

It looks like the 1? at the beginning is what is messing it up. If you take out the ?, the above examples will work (with the $ in place). Obviously that will break the ability to match any number under 100, but it demonstrates the problem. I don't know much about atomic grouping, but from what I have read, it is possible that (?N) calls have been made atomic.
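The same diagnosis can be reproduced outside mIRC with a small Python sketch. It assumes the atomic `(?2)` call can be emulated with `(?=(X))\2` (a Python lookahead is never re-entered once it has matched), and, like the test in the post, it leaves off the final `$` so you can see how much the call consumed. `PREFIX` is a hypothetical name.

```python
import re

# Octet alternation in its original order (short 1?\d?\d branch first).
OCTET = r"1?\d?\d|2[0-4]\d|25[0-5]"

# No trailing $, as in the post. The lookahead + \2 pair emulates the atomic
# (?2) call: it keeps whatever its first successful branch consumed.
PREFIX = re.compile(rf"^(?:({OCTET})\.){{3}}(?=({OCTET}))\2")

for addr in ["121.122.123.1", "121.122.123.12", "121.122.123.124",
             "121.122.123.234", "121.122.123.254"]:
    m = PREFIX.match(addr)
    # group(0): the text matched; group(1): last of the three repeated octets
    print(addr, "->", m.group(0), ":", m.group(1))
```

This prints the same prefixes as the post, e.g. `121.122.123.234 -> 121.122.123.23 : 123`.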

The easiest solution I found for that specific regex situation is to rearrange the alternation order. The following code should work with all IPs.

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^((?:(2[0-4]\d|25[0-5]|1?\d?\d)\.){3}(?2)$)/) - $regml(a,1) : $regml(a,2)

With the extra brackets removed:

//echo -ag $regex(a,xxx.xxx.xxx.xxx,/^(?:(2[0-4]\d|25[0-5]|1?\d?\d)\.){3}(?1)$/)
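A quick Python sketch to check the reordered alternation against every possible last octet. It assumes `(?=(X))\2` is a fair stand-in for PCRE's atomic `(?1)`; the helper name `accepted_last_octets` and the fixed `10.20.30.` prefix are arbitrary choices for illustration.

```python
import re

ORIGINAL = r"1?\d?\d|2[0-4]\d|25[0-5]"
REORDERED = r"2[0-4]\d|25[0-5]|1?\d?\d"


def accepted_last_octets(octet):
    # ^(?:(octet)\.){3}(?1)$ with (?1) emulated atomically as (?=(octet))\2
    ip = re.compile(rf"^(?:({octet})\.){{3}}(?=({octet}))\2$")
    return [n for n in range(1000) if ip.match(f"10.20.30.{n}")]


print(accepted_last_octets(REORDERED) == list(range(256)))  # True
print(accepted_last_octets(ORIGINAL) == list(range(200)))   # True
```

Under that emulation the original order silently rejects 200 to 255 as the final octet, because `1?\d?\d` grabs only the first two digits and the call is never re-entered; putting the three-digit branches first avoids that.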

Edit: obviously the above posts have pinpointed the reason for this change.

-genius_at_work
Posted By: Scorpwanna Re: recursive matching in regex - 24/09/06 12:02 AM
I've always used g to match recursively.
Posted By: FiberOPtics Re: recursive matching in regex - 24/09/06 01:15 AM
Very strange: someone asked me on IRC today about this exact problem, also with IP-matching regex code, and I also suggested changing the order of the alternatives (what g_at_work suggested). Now tonight I see this thread, kind of freaky. Do you know this guy Jensen on Dalnet or something?!

Btw, the 1?\d?\d will allow something like 04.05.06.07; are you okay with that? I don't actually know what counts as a valid IP address regarding zero padding of digits.
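To pin down exactly which octet strings are accepted, here is a small Python sketch using `re.fullmatch`. `LOOSE` is the octet alternation from this thread; `STRICT` is a hypothetical variant, assuming you want to reject leading zeros, in which a two- or three-digit octet must not start with `0`.

```python
import re

LOOSE = r"1?\d?\d|2[0-4]\d|25[0-5]"          # octet from the thread
STRICT = r"25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d"  # hypothetical: no leading zeros

samples = ["4", "04", "00", "004", "099", "104"]
for name, octet in [("loose", LOOSE), ("strict", STRICT)]:
    # re.fullmatch requires the alternation to cover the whole sample
    print(name, {s: bool(re.fullmatch(octet, s)) for s in samples})
# loose {'4': True, '04': True, '00': True, '004': False, '099': False, '104': True}
# strict {'4': True, '04': False, '00': False, '004': False, '099': False, '104': True}
```

So the loose octet tolerates zero padding only when the padded octet is two characters long (`04`, `00`), not `004` or `099`.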
Posted By: jaytea Re: recursive matching in regex - 24/09/06 01:25 AM
ya, that's the obvious workaround here; too bad there isn't a more general solution ;>

lol foptics, no i don't know that guy, that is quite a strange coincidence!

Scorpwanna, that has nothing to do with what we're discussing here :P
Posted By: FNar Re: recursive matching in regex - 24/09/06 03:34 AM
Quote:
Btw, the 1?\d?\d will allow something like 04.05.06.07; are you okay with that? I don't actually know what counts as a valid IP address regarding zero padding of digits.


That's only a human-readable representation of a (32-bit) long integer. I doubt you'll actually see zero padding unless the IP is typed by hand. (Arithmetic doesn't have a concept of "zero padding".)
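A Python sketch of the point about the underlying integer. `dotted_quad_to_int` is a hypothetical helper that assumes each dot-separated part is a decimal byte, most significant first, so padding does not change the value.

```python
def dotted_quad_to_int(addr):
    # a.b.c.d == a*2**24 + b*2**16 + c*2**8 + d, each part read as decimal
    octets = [int(part, 10) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted quad: {addr!r}")
    value = 0
    for o in octets:
        value = (value << 8) | o  # shift in one byte at a time
    return value


print(dotted_quad_to_int("4.5.6.7"))      # 67438087
print(dotted_quad_to_int("04.05.06.07"))  # 67438087
```

The decimal reading is an assumption: POSIX `inet_addr()` treats a part with a leading `0` as octal, so there `010` means 8, not 10 (single digits like `04` to `07` come out the same either way).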
© mIRC Discussion Forums