#179774 28/06/07 12:02 AM
Joined: May 2007
Posts: 89
TropNul Offline OP
Babel fish
Hello everyone.

While testing the $regsubex identifier with a friend, we ran into this problem:

Why does
Quote:
//echo -a > $regsubex(Ga is Ga is Ga,/(is) (Ga)/g, > \t <)

return
Quote:
> Ga > is < > Ga <

instead of returning
Quote:
> Ga > is Ga < > is Ga <

?

It returns the expected result when using
Quote:
//echo -a > $regsubex(Ga is Ga is Ga,/(is Ga)/g, > \t <)


Our questions are:

What's the difference between /(is) (Ga)/ and /(is Ga)/? (We know that in the second case only \1 exists.)

Why does it search differently in these two cases?

Does the regexp library treat them in some special way? If so, how?

What's the exact definition of the special markers of the $regsubex identifier?

Quote:
Subtext can also contain special markers where \0 = number of matches, \n = match number, \t = match text, \a = all match items, and \A which is a non-spaced version of \a


Can someone please describe them in detail? They are not covered in the pcre.txt document.

Thanks in advance :)


tropnul
TropNul #179807 28/06/07 10:17 AM
Joined: Jun 2007
Posts: 933
5
Hoopy frood
Isn't this pretty obvious? The input string is processed from left to right, and the regex first processes the values in parentheses.
This means (is) is matched first and then (Ga), resulting in your echo (and each is only returned once since you use \t).
For the second example, the entire string between the parentheses is matched, thus giving you that particular echo.

As for the last question, pretty clear to me...

\0 = how many matches were found
\n = what number a given match was (e.g. the 6th match found)
\t = the text that was matched by the regex (\n in text form)
\a = all of \t in one string
\A = \a without spacing out the individual matches

TropNul #179810 28/06/07 10:36 AM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
First of all, the \0, \a etc. sequences (when used in the <sub> parameter) are handled by mIRC itself, not PCRE. Before explaining what each sequence is, let's first consider $regml(). $regml(N) returns the Nth captured substring, taking into account all /g iterations. To see what I mean, consider $regex(ab@c@d,/([a-z])|@/g)

The pattern matches a lowercase letter or a '@'. If it's a letter, it is captured, otherwise it's not. So in this case,
$regml(1) = a
$regml(2) = b
$regml(3) = c
$regml(4) = d
and
$regml(0) = 4

So the number of captured substrings in total is $regml(0) = 4.
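
To make the flat numbering concrete, here is a minimal Python sketch that models the behaviour described above. It is not mIRC code: the helper name `regml_model` is hypothetical, and Python's `re` module stands in for PCRE. It assumes, as in the example, that a group which takes no part in an iteration adds nothing to the list.

```python
import re

def regml_model(pattern, text):
    # Hypothetical stand-in for $regml(): one flat list of captures,
    # collected across every global (/g) iteration.
    captures = []
    for m in re.finditer(pattern, text):
        # Groups that did not participate in this iteration are None; skip them.
        captures.extend(g for g in m.groups() if g is not None)
    return captures

caps = regml_model(r"([a-z])|@", "ab@c@d")
print(len(caps), caps)  # 4 ['a', 'b', 'c', 'd']
```

The two "@" iterations match but capture nothing, which is why there are six iterations and only four entries in the list.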

Now most of the special sequences can be explained in terms of $regml():
  • \n is the number of the Nth iteration when /g is used (to see what it is exactly, type //echo -ag $regsubex(ab@c@d,/([a-z])|@/g,<\n>) - if no /g is used, \n is always 1)
  • \0 is the same as $regml(0)
  • \t is the same as $regml(\n) (note that you can use $regml() directly inside the <sub> parameter in $regsub/$regsubex)
  • \a is a space-separated list of all $regml(N) items, ie all the captured substrings. A piece of code that would replicate \a would be:
    Code:
    var %i = $regml(0), %a
    while %i {
      %a = $regml(%i) %a
      dec %i
    }
    ; %a is the same as \a now
  • \A is the same as \a except there are no spaces between the captures. The same piece of code as above would generate \A if you changed the 2nd line to %a = $regml(%i) $+ %a
  • \1, \2, \3 etc cannot be expressed in terms of $regml(); \1 returns the captured substring of the first () pair in the pattern, \2 the capture of the second () pair and so on. However, it may be that a ()-enclosed subpattern doesn't always match (eg when it's an alternative subpattern). The example mentioned before is exactly such a case:
    $regsubex(ab@c@d,/([a-z])|@/g,<\1>)
    In the first /g iteration, ([a-z]) matches "a", so \1 would be "a".
    In the second /g iteration, ([a-z]) matches "b", so \1 would be "b".
    In the third /g iteration, ([a-z]) doesn't match "@" but @ does. Here \1 takes the value of the next successful \1 (in this case "c"). A similar thing happens if \2 is used and there's only one () pair in the pattern; \2 takes the value of the next \1.
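
The marker rules above can be sketched as a small Python model. This is an illustration under stated assumptions, not how mIRC is implemented: `regsubex_model` is a hypothetical name, Python's `re` replaces PCRE, only \0, \n, \t, \a and \A are handled, and \t is assumed to be empty when the flat capture list has no entry for the current iteration. The \1, \2 fallback behaviour described in the last bullet is not modelled.

```python
import re

def regsubex_model(text, pattern, sub):
    # Hypothetical model, not mIRC: handles only the \0 \n \t \a \A markers.
    matches = list(re.finditer(pattern, text))
    # Flat capture list across all /g iterations (what $regml(N) indexes).
    caps = [g for m in matches for g in m.groups() if g is not None]
    out, pos = [], 0
    for n, m in enumerate(matches, start=1):
        markers = {
            "0": str(len(caps)),                         # total number of captures
            "n": str(n),                                 # current iteration number
            "t": caps[n - 1] if n <= len(caps) else "",  # assumption: empty if missing
            "a": " ".join(caps),                         # all captures, space-separated
            "A": "".join(caps),                          # all captures, no spaces
        }
        piece = re.sub(r"\\([0ntaA])", lambda k: markers[k.group(1)], sub)
        out.append(text[pos:m.start()] + piece)
        pos = m.end()
    return "".join(out) + text[pos:]

print(regsubex_model("ab@c@d", r"([a-z])|@", r"<\n>"))            # <1><2><3><4><5><6>
print(regsubex_model("Ga is Ga is Ga", r"(is) (Ga)", r"> \t <"))  # Ga > is < > Ga <
print(regsubex_model("Ga is Ga is Ga", r"(is Ga)", r"> \t <"))    # Ga > is Ga < > is Ga <
```

Under this model the last two calls give the two results from the original question (without the leading "> " that the echo command adds): with two groups per iteration, the second iteration picks up the second entry of the flat list, "Ga", rather than the whole match.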


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
TropNul #179813 28/06/07 12:38 PM
Joined: May 2007
Posts: 89
TropNul Offline OP
Babel fish
Thanks for the help :)

So, if I understood the issue correctly, this is how the regexp library treats them.

//echo -a > $regsubex(Ga is Ga is Ga,/(is) (Ga)/g, > \t <)

The first match would set \t to \1
The second match would set \t to \2
And as \1 and \2 are dissociated, the replacements are correct.

First match > \1 == is ; so replace 'is Ga' with 'is'
Second match > \2 == Ga ; so replace 'is Ga' with 'Ga'

For the second case, it's simpler. Only \1 exists each time. So \t will always be equal to 'is Ga' thus replacement is correct.

Am I right?

Is \t functioning normally in both cases?

Last edited by TropNul; 28/06/07 12:49 PM.

tropnul
TropNul #179814 28/06/07 01:00 PM
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
This is a case where you need to be precise with the terminology. In your first example, the first "/g iteration" (which I assume is what you mean by "match") sets \t to $regml(\n). \n = 1 in the first iteration, so \t is set to $regml(1) = "is". In the next iteration, \t is set to $regml(2) = "Ga". Not sure if that's what you meant, but if you didn't, note that \1 and $regml(1) are quite different when /g comes into play. \1 always refers to the first capture in each iteration whereas $regml(1) refers to the first capture ever. \t is not set to \1, \2, \3 etc., it is set to $regml(1), $regml(2), $regml(3) etc.
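
The distinction can be illustrated with a short Python sketch. It only models the description above and is not mIRC code: `re` stands in for PCRE, `m.group(1)` plays the role of \1 within one iteration, and the hypothetical list `flat` plays the role of $regml().

```python
import re

text, pattern = "Ga is Ga is Ga", r"(is) (Ga)"
matches = list(re.finditer(pattern, text))
# Flat list of captures over all iterations (the $regml() view).
flat = [g for m in matches for g in m.groups() if g is not None]

# (iteration number, first group of that iteration, flat-list entry at that number)
rows = [(n, m.group(1), flat[n - 1]) for n, m in enumerate(matches, start=1)]
for n, backref_1, regml_n in rows:
    print(n, backref_1, regml_n)
# 1 is is
# 2 is Ga
```

In the second iteration the first group is still "is", while the second entry of the flat list is "Ga", which is the value \t takes there according to the explanation above.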


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
qwerty #179819 28/06/07 01:53 PM
Joined: May 2007
Posts: 89
TropNul Offline OP
Babel fish
Thanks a lot :)


tropnul
