escaped characters in regsub - mIRC Discussion Forums

hi there,

I've been trying to alter a string's case with $regsub and found that it does not work as expected: escaped characters in the 'subtext' part of $regsub are not given their special meaning.
When trying to apply title case (i.e. All Words Are Capitalized), I used the following:
$regsub(%string,/(\b.)/g,\U\1\E,%string)
\U means 'uppercase until \E'. This did not give the expected result, i.e. uppercasing the character that follows each word boundary. What it does instead is return "U" + \1 (the match) + "E".
I don't know if this is a problem with mIRC or PCRE, or with the implementation of PCRE within mIRC, but I'd like to see it fixed.

perl -e '($a = "this bug") =~ s/\b(.)/\u\1/g; print "$a\n";'
returns This Bug, which is correct. The regex I used is not exactly the same, but s/(\b.)/\U\1\E/g would work as well.

for reference, see http://www.perlpod.com/5.8.4/pod/perlre.html

Hope it's fixable!

PCRE itself doesn't support these escape sequences. This is documented in PCRE's manual, more specifically in section DIFFERENCES FROM PERL:

Code:

     4. The following Perl escape sequences  are  not  supported:
     \l,  \u,  \L,  \U,  \P, \p, and \X. In fact these are imple-
     mented by Perl's general string-handling and are not part of
     its pattern matching engine. If any of these are encountered
     by PCRE, an error is generated.

However, from what I've seen, there are no 'substitute' facilities in PCRE itself (like Perl's s/re/sub/): $regsub() only uses PCRE for pattern matching and capturing. The substitutions, as well as the meaning of special chars and sequences in <subtext>, are handled by mIRC itself. So I guess your report can be viewed as a feature suggestion: support for these escape sequences. In my opinion (and other scripters'), a more flexible solution for mIRC would be the ability to pass \1 in <subtext> to mIRC identifiers. This way you would be able to use $regsub(string,/\b(.)/g,$upper(\1),%var).
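
For comparison, the suggested behaviour resembles the "replacement callback" that some other regex libraries offer. Below is a minimal Python sketch of the idea (not mIRC code; the helper name title_case is made up for illustration). The function passed to re.sub runs once per match and receives the captured text, which is what $upper(\1) would need in $regsub:

```python
import re

def title_case(text):
    # Hypothetical helper, for illustration only.
    # \b(.) mirrors the pattern from the thread: one character after a word boundary.
    # The lambda is called once per match with the match object,
    # so the capture is already known when upper() runs.
    return re.sub(r"\b(.)", lambda m: m.group(1).upper(), text)

print(title_case("this bug"))  # This Bug
```

Note that the pattern also matches the space after each word, but uppercasing a space leaves it unchanged.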

Quote:

The substitutions, as well as the meaning of special chars and sequences in <subtext>, are handled by mIRC itself.

Are you sure about that? If mIRC handles them, I'd have thought you would use the normal method of escaping identifiers ($!identifier(\1)), but instead you have to use \$identifier(\1).

Both solutions look good to me. The implementation of special escapes gets my vote, though I guess it's less likely to happen.
Thank you, qwerty.

The parsing of parameters in <subtext> is the same as with any other mIRC identifier: $!identifier still evaluates to $identifier. However, $ is considered a special char in <subtext>: $1 is the same as \1, $2 is the same as \2, etc.:
//var %a, %b = $regsub(cd,/(.)/g,A$1B,%a) | echo -s %a
result: AcBAdB
Most probably, this feature has its roots in Perl. To escape $ in subtext, you use \$. Note that the \ in \$ident also prevents mIRC from evaluating "$ident", simply because it touches the $, as it does in //echo -s \$me
Also note that
//var %a, %b = $regsub(cd,/(.)/g,A $1 B,%a) | echo -s %a
wouldn't give "A c BA d B", because mIRC still evaluates <subtext>, as it does with all identifier params. So in this case, it would try to evaluate $1, which would be the first param passed to the calling routine (e.g. an alias or an event):
//tokenize 32 TEST | var %a, %b = $regsub(cd,/(.)/g,A $1 B,%a) | echo -s %a
To get it to work like the first example, you need to use $!1.

Ah, I remember reading that somewhere else now.

I still don't understand why you can't use a $replace() there, though. If $1 is evaluated like normal, then surely any other identifier should be.

You can use any identifier; you just can't pass the contents of \1 to it. If you try, the identifier is passed the string "\1" and returns a value, and then mIRC replaces any \1 in that value with the PCRE-captured content:

//var %a, %b = $regsub(a,/(a)/,$str(\1,3),%a) | echo -s %a
"aaa"

$str() indeed worked, returning "\1\1\1" (as it would inside any other identifier, e.g. //echo -a $replace($str(\1,3),1,2) ). Then mIRC replaced every \1 with the captured "a". What we would all like is \1 to be replaced before the standard evaluation of idents/variables in <subtext> (which would have to be repeated as many times as the number of matches).
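
The two evaluation orders can be modelled roughly in Python. This is only a sketch of the behaviour described above, not of mIRC's actual implementation: the function names regsub_current and regsub_proposed are made up, an ordinary Python function stands in for the identifier, and re.sub replaces every match (like the /g flag).

```python
import re

def regsub_current(text, pattern, identifier):
    # Model of today's order: the identifier runs ONCE, on the literal
    # two-character string "\1", before any matching happens.
    evaluated = identifier(r"\1")
    # Only afterwards are the captures spliced into that result.
    return re.sub(pattern, lambda m: m.expand(evaluated), text)

def regsub_proposed(text, pattern, identifier):
    # Model of the suggested order: the capture is substituted first,
    # then the identifier is evaluated once per match.
    return re.sub(pattern, lambda m: identifier(m.group(1)), text)

# $str(\1,3): both orders happen to agree.
print(regsub_current("a", r"(a)", lambda s: s * 3))    # aaa
print(regsub_proposed("a", r"(a)", lambda s: s * 3))   # aaa

# $upper(\1): uppercasing the literal "\1" changes nothing, so only
# the proposed order produces title case.
print(regsub_current("this bug", r"\b(.)", str.upper))   # this bug
print(regsub_proposed("this bug", r"\b(.)", str.upper))  # This Bug
```

The $str() case works under either order because repeating "\1" and then substituting gives the same text as substituting and then repeating; $upper() shows the difference because "\1" has no letters to uppercase.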

Quote:

What we would all like is \1 to be replaced before the standard evaluation of idents/variables in <subtext> (which would have to be repeated as many times as the number of matches).

That would be such a great addition (as suggested many times before). I really hope Khaled will add that in the next version!

Ah yes I see now, great idea.

Having \1 etc. replaced before evaluation flies in the face of all logical evaluation order. I mean, it's a parameter that's passed to the procedure. You could always mark up the output so the \1 etc. return values were easy to identify, by placing some type of tags in front of and behind them.

Not that I'm saying it's not a damn fine idea, but I thought it should use a new command to do that, something in the nature of the looping commands that can call an alias, such as filter or findfile.

Quote:

Having \1 etc. replaced before evaluation flies in the face of all logical evaluation order. I mean, it's a parameter that's passed to the procedure. You could always mark up the output so the \1 etc. return values were easy to identify, by placing some type of tags in front of and behind them.

- So? $findfile() and $finddir() do it already. If someone wants to evaluate identifiers beforehand they can use evaluation brackets. Using 'markup' is very fiddly.

$findfile isn't that different from the proposed behaviour in $regsub. $findfile's [command] parameter is already evaluated very differently from the parameters of other identifiers: it's a mini-environment where you can use expressions with $1- etc. and where the contained code is evaluated each time a new file is found. This behaviour is (to my eyes) almost identical to the hypothetical $regsub, except for two things:
- the parameter wouldn't be treated as a command but as a value to replace certain substrings in the input string.
- $regsub would use \1 instead of $1
These differences are minor details, both from the user's and the developer's perspective; I'm guessing that Khaled wouldn't have to work too hard to make this happen, since he did a similar thing with $findfile.

Edit: too slow once again, starbucks summed it up as I was writing my post