Posted By: whoami regex loop - 12/10/05 10:59 AM
Hi. I've just learned regular expressions, and while I was searching for some code to study, I found this:

alias backsp {
var %m = $1
while ($regsub(%m,(^|.)\ $+ $chr($2),,%m)) !
return %m
}


What I've been trying to do is this:

alias underline {
%a = scripting
.echo -q $regsub(%a,([^gp]),\1,%a)
return %a
}


:crazy: Can anyone explain to me how I can use a loop with $regsub?
Posted By: DaveC Re: regex loop - 12/10/05 11:25 AM
The $regsub identifier evaluates to the number of substitutions it made on the first %m, and it stores the resulting string back into %m. So as long as it made at least one substitution, the result is non-zero, which on its own makes the while condition true. The loop then comes back around and runs $regsub again on the new %m text.

*** HOWEVER, I don't believe the first code works. At least, looking at it here, I think the $regsub line should look like this: while ($regsub(%m,(^|.)\ $+ $chr($2),,%m)) { }. Nothing is needed in the loop body (i.e. { }), as everything is done in the $regsub.

I must say I'm not any good with regsubs, but for the most part I thought they usually didn't require repetitive processing, since a regsub can be instructed to replace any number of occurrences. However, the original example may have something to do with one pass creating newly substitutable sections of the text, which I guess would need multiple passes until no substitutions occur.
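As a rough analogue outside mIRC (Python's re module, not PCRE), re.subn returns both the new string and the number of substitutions made, much like $regsub's return value, so the "repeat until a pass changes nothing" loop can be sketched as below. The helper name and the nested-parentheses input are hypothetical, chosen only because one global pass over that input creates a new match for the next pass.

```python
import re

def strip_until_stable(text, pattern):
    """Apply a global substitution repeatedly until a pass makes no changes.

    re.subn returns (new_string, number_of_substitutions); the loop stops
    once that count is 0, mirroring `while ($regsub(...)) !` in mIRC.
    """
    while True:
        text, count = re.subn(pattern, "", text)
        if count == 0:
            return text

# Hypothetical input: removing "()" pairs. One global pass over "(())x"
# only removes the inner pair; the outer pair then becomes "()" and
# needs a second pass.
print(re.subn(r"\(\)", "", "(())x"))          # ('()x', 1)
print(strip_until_stable("(())x", r"\(\)"))   # x
```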

It's not actually looping
Posted By: FiberOPtics Re: regex loop - 12/10/05 12:25 PM
Before anything else, you should first state what you're trying to do.

In addition, I get the impression that you haven't learned enough regex yet to be able to grasp any explanation at this point.

Your first code seems to want to remove a certain char, specified by its numeric representation, together with its preceding char. $backsp(this is a test,$asc(i)) returns tss a test. You don't need a while loop for that; you can use the /g modifier, which makes the regex engine repeat the pattern on the string until all matches have been made.

/*
Usage: $backsp(string,character)

Example: //echo -a $backsp(this is a test,s)

*/

alias backsp {
var %a, %b = $regsub($1,/(?:^|.)\ $+ $base($asc($2),10,8) /gx,,%a)
return %a
}

Note that I used the x modifier, which ignores unescaped whitespace in the expression, and I made the first pair of brackets non-capturing, since you're not referencing them anywhere. Also note that a lot of characters have a special meaning in a regex. I've changed the regsub so that you can input an actual character instead of its ASCII numeric representation. However, that comes with a danger. Since we have that \ in the expression, if we had used the character "S" as input, for example, it would turn into \S, which means "match a non-whitespace character". To avoid this, I use the octal representation of whatever character you input, which makes sure there will be no conflicts with built-in regex constructs. In theory there could be a conflict with backreferences, but since we're not capturing anything, this is not a problem.
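The same octal-escape idea can be sketched with Python's re module; this is an analogue, not mIRC's regex engine. The helper names octal_escape and backsp_global are hypothetical. One deliberate assumption: the sketch zero-pads the octal code to three digits, because in both Python and PCRE a backslash followed by a single digit 1 to 7 is read as a backreference rather than a character code, and three octal digits avoid that case.

```python
import re

def octal_escape(ch):
    """Return a regex escape for ch as a 3-digit octal code, e.g. 'i' -> '\\151'.

    Python's re reads a backslash plus exactly three octal digits as a
    character code, never as a group reference, so zero-padding also
    covers low control characters such as chr(1) -> '\\001'.
    Only codes up to 0o377 (255) fit in three octal digits.
    """
    code = ord(ch)
    if code > 0o377:
        raise ValueError("character code too large for a 3-digit octal escape")
    return "\\" + format(code, "03o")

def backsp_global(text, ch):
    """Single-pass removal of ch together with its preceding char.

    re.sub replaces every non-overlapping match by default, which is
    what the /g modifier does for mIRC's $regsub.
    """
    return re.sub(r"(?:^|.)" + octal_escape(ch), "", text)

print(octal_escape("i"))                      # \151
print(backsp_global("this is a test", "i"))   # tss a test
print(backsp_global("this is a test", "s"))   # th  a tt
```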

Your second code seems to want to underline any character that is neither a "g" nor a "p". However, it only does one substitution, because without the /g modifier the regex engine stops after the first match. If you want the regex engine to keep doing substitutions, specify the /g modifier, as was done in the backsp code.
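A small Python sketch of the difference between one substitution and all of them. It assumes the goal is to wrap each matched character in mIRC's underline control code (Ctrl+U, character 31); the posted replacement \1 looks as if its control codes may have been stripped by the forum, so that part is a guess. The function name is hypothetical.

```python
import re

UNDERLINE = "\x1f"  # assumption: mIRC's underline control code (Ctrl+U)

def underline_non_gp(text, count=0):
    """Wrap each character that is neither 'g' nor 'p' in underline codes.

    count=1 behaves like $regsub without /g (first match only);
    count=0, re.sub's default, replaces every match, like /g.
    """
    return re.sub(r"([^gp])", UNDERLINE + r"\1" + UNDERLINE, text, count=count)

first_only = underline_non_gp("scripting", count=1)
every_match = underline_non_gp("scripting")
print(repr(first_only))              # '\x1fs\x1fcripting'
print(every_match.count(UNDERLINE))  # 14: seven matching letters, two codes each
```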

Note that some day you are going to run into trouble because you're not using the regex delimiters / / to enclose your regex patterns. This is especially true if you use regex in events with the $ event prefix: I've had patterns not work due to missing / /, even in cases where the expression didn't start with an "m". If you know regex, you'll know what I mean.

The modifiers come after the second regex delimiter /, or you can specify them inline by putting them inside brackets like this: (?modifiers), for example (?x).
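For comparison only (this is Python's re, not mIRC), the x modifier can likewise be given either as a flag argument or inline as (?x) at the start of the pattern. Python has no g modifier: re.sub replaces all matches unless a count is passed. The sketch reuses the octal escape \151 for "i".

```python
import re

# The pattern contains a space. In verbose mode (re.VERBOSE, the x modifier)
# unescaped whitespace is ignored, so that space does not have to match.
pattern = r"(?:^|.) \151"   # \151 is the octal code for "i"
text = "this is a test"

with_flag = re.sub(pattern, "", text, flags=re.VERBOSE)  # modifier as an argument
inline = re.sub("(?x)" + pattern, "", text)               # modifier inside the pattern
literal = re.sub(pattern, "", text)                       # no x: the space is literal

print(with_flag)  # tss a test
print(inline)     # tss a test
print(literal)    # this a test
```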

I think you need to do some more reading and practicing, here's a good tutorial

Here's also a link to the main reference for anything regarding PCRE, which is the regex library that mIRC also uses: pcre.txt, although I don't recommend it at first as it's somewhat hard to read through. I'd go with the tutorial first.
Posted By: Sat Re: regex loop - 12/10/05 02:01 PM
Quote:
Your first code seems to want to remove a certain char specified by its numeric representation, and its preceding char. $backsp(this is a test,$asc(i)) returns tss a test. You don't need a while loop for that, you can use the /g modifier that will make the regex engine repeat the pattern on the string until all matches have been made.

You do need a while loop, as several backspace characters can be next to each other.
Posted By: FiberOPtics Re: regex loop - 12/10/05 02:30 PM
That depends on what you're after, of course.

If the char that's passed to the regsub ($2) is an i, then in the string "thiis", one way to look at it is that we have a "hi" part and an "ii" part, i.e. twice the char to be removed together with its preceding char. One could say the output should then be "ts". If that's the case, then simply putting a + after the $base code will take care of that, no while loop required.

The code for this approach:

alias backsp {
var %a, %b = $regsub($1,/(?:^|.)\ $+ $base($asc($2),10,8) +/gx,,%a)
return %a
}

On the other hand one could look at it and say:

Let's first remove "hi", which leaves "tis", and only then remove the "ti", leaving nothing but "s". For that, one will indeed need a loop.

The code for this approach:

alias _backsp {
var %a = $1, %re = /(?:^|.)\ $+ $base($asc($2),10,8) $+ /
while ($regsub(%a,%re,,%a)) !
return %a
}
Posted By: Sat Re: regex loop - 12/10/05 02:34 PM
Well, in general any backspace functionality follows the second method :tongue:
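To show why the second method matches ordinary backspace behaviour, here is a stack-based Python sketch with no regex at all; the function name and the use of "i" as the backspace character are just for the example. For single-line text, removing the leftmost "char + i" pair each time always pairs the first i with the character right before it, so it should give the same results as this one-pass scan.

```python
def backspace_stack(text, bs="i"):
    """Classic backspace processing: each bs char erases the char typed before it.

    Kept characters live in a list used as a stack; a backspace pops the
    last kept character, or does nothing if there is none (a leading
    backspace simply disappears, like the ^ branch of the regex).
    """
    kept = []
    for ch in text:
        if ch == bs:
            if kept:
                kept.pop()
        else:
            kept.append(ch)
    return "".join(kept)

print(backspace_stack("thiis"))                          # s
print(backspace_stack("iia"))                            # a
print(backspace_stack("ABCDE" + "a" * 16 + "i" * 18))    # ABC
```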
Posted By: FiberOPtics Re: regex loop - 12/10/05 02:47 PM
I have no idea what the "general" thing is, but I believe ya :smile:
Posted By: Kelder Re: regex loop - 12/10/05 04:26 PM
Here's a way to do it without a loop...
Of course, replace i with \ooo, where ooo is the octal character code of your choice; \xhh, with hh the hexadecimal character code, is OK too.
Combining it into one regex, /^i+|(.(?1)*i)/g, can fail in situations like "aii", although "aiii" works.

Code:
alias backspace {
  var %res, %q = $regsub($1-,/(.(?1)*i)/g,,%res) $regsub(%res,/^i+/,,%res)
  return %res
}
Posted By: FiberOPtics Re: regex loop - 12/10/05 04:47 PM
I should start using that recursive feature more often, looks really handy!
Posted By: qwerty Re: regex loop - 12/10/05 04:50 PM
Actually there is a way to avoid the loop, although it's worse than looping. I just mention it for the 'fun' of it and because some interesting things happen when you feed this with a lot of backspaces.

First of all, it gets really slow after some point, for example:

//echo -s $_backsp(ABCDE $+ $str(a,16) $+ $str(i,18),i)

it echoes "ABC" after a few seconds here. Now the weird part. If you type the above command and then type this:
//echo -s $_backsp(ABCDEFG $+ $str(a,26) $+ $str(i,28),i)
it takes a very long time, as expected, but it doesn't echo the correct result (ABCDE), nor $null: it echoes the previous answer (ABC). PCRE has issues with recursion, and there's even a pattern with (?R), which I won't mention here, that crashes mIRC, even though such crashes are supposed to have been fixed. I'm not sure on which side, PCRE or mIRC; the point is that if a recursive pattern gets out of hand, $regsub() normally doesn't crash mIRC anymore. Instead, it returns the original string unaffected, as if it didn't match anything.

Anyway, here's the code:

Code:
alias _backsp {
  var %c = \ $+ $base($asc($2),10,8) 
  !.echo -q $regsub($1,/[^ $+ %c $+ ](?R)+ %c |/gx,,%a) $regsub(%a,/^ %c +/x,,%a)
  return %a
}
Posted By: qwerty Re: regex loop - 12/10/05 04:53 PM
Damn you sneaky people! I posted a good 25 minutes after your post... who's gonna believe me now? :tongue:

It's weird how we came up with exactly the same solution, though... there must be only one :smile:
Posted By: Sat Re: regex loop - 12/10/05 05:11 PM
Woah, I didn't even know that something like that existed! Thanks Kelder and qwerty, I have something new to check out now :smile:

However, a quick glance at the relevant section of the PCRE documentation seems to suggest that the solution to the slowness is atomic grouping. In fact, the following variant of qwerty's regex, which adds one more + to make (?R)+ possessive, seems to be fast enough to be usable and produces the expected result for the second example as well:

Code:
alias _backsp {
  var %c = \ $+ $base($asc($2),10,8) 
  !.echo -q $regsub($1,/[^ $+ %c $+ ](?R)++ %c |/gx,,%a) $regsub(%a,/^ %c +/x,,%a)
  return %a
}

On the other hand, applying the same trick to Kelder's regex results in horribly wrong answers, so I really can't say whether it's fully correct in any case. Thoughts?

(with apologies to whoami for taking this more and more off-topic)
Posted By: qwerty Re: regex loop - 12/10/05 05:54 PM
Ah yes, atomic grouping seems to help indeed! One can never read pcre.txt enough :smile: I'll test more, but so far it works like a charm.

The reason Kelder's doesn't work is that he uses . instead of [^i] just before the (?1) call.
Posted By: whoami Re: regex loop - 13/10/05 01:15 PM
Well, thanks for helping. :grin:
© mIRC Discussion Forums