regexp problem #179774 28/06/07 12:02 AM
TropNul
Hello everyone. After running some tests with the $regsubex identifier, a friend and I ran into this problem.

Why does

Quote:
//echo -a > $regsubex(Ga is Ga is Ga,/(is) (Ga)/g, > \t <)

return

Quote:
> Ga > is < > Ga <

instead of returning

Quote:
> Ga > is Ga < > is Ga <

?

It returns the expected result when using

Quote:
//echo -a > $regsubex(Ga is Ga is Ga,/(is Ga)/g, > \t <)

Our questions are:

What is the difference between /(is) (Ga)/ and /(is Ga)/? (We know that in the second case only \1 exists.)
Why does it search differently in these two cases?
Is there some unusual way in which the regexp library treats them? If so, how?
What is the exact definition of the markers of the $regsubex identifier?

Quote:
Subtext can also contain special markers where \0 = number of matches, \n = match number, \t = match text, \a = all match items, and \A which is a non-spaced version of \a

Can someone please describe them in detail? They are not covered in the pcre.txt document.

Thanks in advance.

Re: regexp problem #179807 28/06/07 10:17 AM
5618
Isn't this pretty obvious? The input string is processed from left to right, and the regex first processes the values in parentheses. This means (is) is matched first and then (Ga), which results in your echo (and it is only returned once, since you use \t). In the second example the entire string between the parentheses is matched, which gives you that particular echo.

As for the last question, it seems pretty clear to me:

\0 = how many matches were found
\n = what number a given match was (e.g. the 6th match found)
\t = the text that was matched by the regex (\n in text form)
\a = all of \t in one string
\A = \a without spacing out the individual matches

Re: regexp problem #179810 28/06/07 10:36 AM
qwerty (Hoopy frood) Joined: Jan 2003 Posts: 2,125
First of all, the \0, \a etc. sequences (when used in the <sub> parameter) are handled by mIRC itself, not by PCRE.

Before explaining what each sequence is, let's first consider $regml(). $regml(N) returns the Nth captured substring, taking into account all /g iterations. To see what I mean, consider

$regex(ab@c@d,/([a-z])|@/g)

The pattern matches a lowercase letter or a '@'. If it's a letter, it is captured; otherwise it's not. So in this case,

$regml(1) = a
$regml(2) = b
$regml(3) = c
$regml(4) = d
and $regml(0) = 4

So the total number of captured substrings is $regml(0) = 4. Now most of the special sequences can be explained in terms of $regml():

\n is the number of the current /g iteration (to see exactly what it is, type //echo -ag $regsubex(ab@c@d,/([a-z])|@/g,<\n>) - if /g is not used, \n is always 1)
\0 is the same as $regml(0)
\t is the same as $regml(\n) (note that you can use $regml() directly inside the <sub> parameter in $regsub/$regsubex)
\a is a space-separated list of all $regml(N) items, i.e. all the captured substrings. A piece of code that would replicate \a would be:

Code:
var %i = $regml(0), %a
while %i {
  %a = $regml(%i) %a
  dec %i
}
; %a is the same as \a now

\A is the same as \a except that there are no spaces between the captures. The same piece of code as above would generate \A if you changed the assignment inside the loop to %a = $regml(%i) $+ %a

\1, \2, \3 etc. cannot be expressed in terms of $regml(); \1 returns the captured substring of the first () pair in the pattern, \2 the capture of the second () pair and so on. However, it may be that a ()-enclosed subpattern doesn't always match (e.g. when it's an alternative subpattern). The example mentioned before is exactly such a case:

$regsubex(ab@c@d,/([a-z])|@/g,<\1>)

In the first /g iteration, ([a-z]) matches "a", so \1 would be "a".
In the second /g iteration, ([a-z]) matches "b", so \1 would be "b".
In the third /g iteration, ([a-z]) doesn't match "@" but @ does. Here \1 takes the value of the next successful \1 (in this case "c").

A similar thing happens if \2 is used and there's only one () pair in the pattern; \2 takes the value of the next \1.
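
For readers without mIRC at hand, the way $regml() accumulates captures can be approximated outside mIRC. The following is only a rough sketch in Python, assuming that Python's re module matches this simple pattern the same way PCRE does; the helper name regml is made up for illustration and is not an mIRC or Python API.

```python
import re

def regml(text, pattern):
    # Hypothetical helper: collect every captured substring from every
    # /g iteration, in order, skipping groups that did not take part
    # in a given match (such as the iterations that matched '@').
    return [g for m in re.finditer(pattern, text)
            for g in m.groups() if g is not None]

caps = regml("ab@c@d", r"([a-z])|@")

print(caps)            # models $regml(1) .. $regml(4)
print(len(caps))       # models $regml(0) and \0
print(" ".join(caps))  # models \a (space-separated captures)
print("".join(caps))   # models \A (captures with no spaces)
```

Note that the pattern matches six times (a, b, @, c, @, d) but only four of those iterations capture anything, which is why the iteration number \n and the index into $regml() can drift apart.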

Re: regexp problem #179813 28/06/07 12:38 PM
TropNul
Thanks for the help.

So, if I understood the issue correctly, this would be the way the regexp library treats them.

//echo -a > $regsubex(Ga is Ga is Ga,/(is) (Ga)/g, > \t <)

The first match would set \t to \1
The second match would set \t to \2

And as \1 and \2 are dissociated, the replacements are correct.

First match > \1 == is ; so replace 'is Ga' with 'is'
Second match > \2 == Ga ; so replace 'is Ga' with 'Ga'

The second case is simpler. Only \1 exists each time, so \t will always be equal to 'is Ga' and thus the replacement is correct.

Am I right? Is \t functioning normally in both cases?

Last edited by TropNul; 28/06/07 12:49 PM.

Re: regexp problem #179814 28/06/07 01:00 PM
qwerty (Hoopy frood) Joined: Jan 2003 Posts: 2,125
This is a case where you need to be precise with the terminology. In your first example, the first "/g iteration" (which I assume is what you mean by "match") sets \t to $regml(\n). \n = 1 in the first iteration, so \t is set to $regml(1) = "is". In the next iteration, \t is set to $regml(2) = "Ga".

I'm not sure if that's what you meant, but if it isn't, note that \1 and $regml(1) are quite different when /g comes into play. \1 always refers to the first capture in each iteration, whereas $regml(1) refers to the first capture ever. \t is not set to \1, \2, \3 etc.; it is set to $regml(1), $regml(2), $regml(3) etc.
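
The rule "\t is $regml(\n)" can be checked against the two outputs quoted in the first post. This is a minimal Python sketch of that rule, not mIRC's actual implementation: the function regsubex_model is hypothetical, {t} and {n} stand in for the \t and \n markers, and an out-of-range $regml(N) is assumed to be empty.

```python
import re

def regsubex_model(text, pattern, template):
    # Hypothetical model of $regsubex(text, /pattern/g, template).
    regex = re.compile(pattern)
    # Model of $regml(): captures from all /g iterations, in order.
    regml = [g for m in regex.finditer(text)
             for g in m.groups() if g is not None]
    iteration = 0

    def substitute(match):
        nonlocal iteration
        iteration += 1  # plays the role of \n
        # Assumption: $regml(N) beyond the last capture is empty.
        t = regml[iteration - 1] if iteration <= len(regml) else ""
        return template.format(n=iteration, t=t)

    return regex.sub(substitute, text)

two_groups = regsubex_model("Ga is Ga is Ga", r"(is) (Ga)", "> {t} <")
one_group = regsubex_model("Ga is Ga is Ga", r"(is Ga)", "> {t} <")

print(">", two_groups)  # > Ga > is < > Ga <
print(">", one_group)   # > Ga > is Ga < > is Ga <
```

With two () pairs, the capture list is is, Ga, is, Ga, so the second iteration picks up the second capture of the first iteration ("Ga") rather than the first capture of the second iteration. With a single () pair, iteration number and capture index stay aligned, which is why that version gives the expected result.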

Re: regexp problem qwerty #179819 28/06/07 01:53 PM
TropNul
Thanks a lot.
