#199436 14/05/08 04:35 PM
Horstl (OP)
Hoopy frood
Joined: Nov 2006
Posts: 1,559
Hi,
now and then I use literal quoting (\QTheLiteralText\E) in regex, for example to /filter case-sensitively, and/or to /filter -g multiple expressions like /(?:\QexpressionA\E)|(?:\QexpressionB\E)|(?:\QexpressionC\E)/ (schematic only).

Using \Q\E has been handy, as I don't need to escape metachars separately, especially if the regex matchtext isn't static (user input, part of $fulladdress etc).
However, this doesn't work if the escaped text (or what I expect to be escaped) contains metachars AND word boundaries come into play.

No word boundary in regex; all true:
Code:
//var %txt = test | echo -a $regex(%txt,/\Qtest\E/) 
//var %txt = xx test xx | echo -a $regex(%txt,/\Qtest\E/) 

//var %txt = \d^ | echo -a $regex(%txt,/\Q\d^\E/) 
//var %txt = xx [a] xx | echo -a $regex(%txt,/\Q[a]\E/)

Still true, word boundary in regex:
Code:
//var %txt = test | echo -a $regex(%txt,/\b\Qtest\E\b/) 
//var %txt = xx test xx | echo -a $regex(%txt,/\b\Qtest\E\b/)

FALSE, word boundary in regex:
Code:
//var %txt = \d^ | echo -a $regex(%txt,/\b\Q\d^\E\b/) 
//var %txt = xx [a] xx | echo -a $regex(%txt,/\b\Q[a]\E\b/)

...even more irritating: this one IS true:
Code:
//var %txt = xx [a] xx | echo -a $regex(%txt,/\B\Q[a]\E\B/)

I'd really appreciate an explanation of this behaviour (it looks like I misunderstand how \Q\E works) and, furthermore, a workaround.
I'd like to keep using \Q\E instead of an ugly $replacex(string of all the possible metachars to escape).
Thanks!

qwerty
Hoopy frood
Joined: Jan 2003
Posts: 2,523
This is not related to \Q\E, but to the way \b works. \b matches the position between a \w character and a \W character, in either order (or between a \w character and the beginning/end of the string). So /\b@test\b/ would never match "hello @test hi" because the first \b is between two \W characters: the space and @. In contrast, /@\btest\b/ would match the previous string, because the first \b is between a \W (@) and a \w (t). Whether @test is inside \Q\E or not doesn't matter at all.
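
A quick way to check this, as a sketch only: the lines below reuse the "hello @test hi" string from above, and the results stated afterwards are what the definition of \b implies rather than pasted output.

Code:
//echo -ag $regex(hello @test hi,/\b@test\b/)
//echo -ag $regex(hello @test hi,/\b\Q@test\E\b/)
//echo -ag $regex(hello @test hi,/@\btest\b/)

The first two should echo 0 (no \b exists between the space and @, quoted or not) and the third should echo 1.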

What you really mean to match is not a position between \w and \W but between \s and \S. There is no shorthand assertion for this sort of match; you have to use the usual, lengthy assertion construct:

(?<=^|\s)STRINGHERE(?=$|\s)

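Examples (they return 1, 0 and 0 respectively, because only the first string has @test delimited by whitespace or the string ends on both sides):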

//echo -ag $regex(hello @test hi,/(?<=^|\s)@test(?=$|\s)/)

//echo -ag $regex(hello @testhi,/(?<=^|\s)@test(?=$|\s)/)

//echo -ag $regex(hello@test hi,/(?<=^|\s)@test(?=$|\s)/)


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Horstl (OP)
Hoopy frood
Joined: Nov 2006
Posts: 1,559
Thanks, I just didn't realize that my "metachars" are non-word characters themselves, so there is no word boundary next to them.

Now, to get the "boundary effect" in some circumstances nonetheless (e.g. text bordering punctuation, brackets...), I'm tempted to use: /(?<=^|\W)\Qmy expression\E(?=$|\W)/
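
A small check of that idea, as a sketch only: the test strings are made up for illustration, and the results given below are what the lookaround rules imply rather than pasted output.

Code:
//var %txt = xx -[a]- xx | echo -a $regex(%txt,/(?<=^|\W)\Q[a]\E(?=$|\W)/)
//var %txt = xx -[a]- xx | echo -a $regex(%txt,/(?<=^|\s)\Q[a]\E(?=$|\s)/)
//var %txt = xx[a]xx | echo -a $regex(%txt,/(?<=^|\W)\Q[a]\E(?=$|\W)/)

The first should echo 1 (the hyphens on both sides are \W), the second 0 (a hyphen is not whitespace), and the third 0 (the x on either side is a \w character).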

That aside: you may have noticed that I used (?:...) to set "non-capturing" patterns. You're using (?<=...) and (?=...). I never figured out how to use these atomic things properly. Could you please explain why you used the one at the beginning and the other at the end?

qwerty
Hoopy frood
Joined: Jan 2003
Posts: 2,523
(?<=) and (?=) are called assertions ("atomic" elements are an entirely different matter) and they are essentially the same sort of thing as \b, in that they do not consume characters when matching: meaning that another part (earlier or later) of the pattern can match the same characters that were matched by the assertions.

Another way to look at assertions is to treat them as positions, i.e. assertions match a position in the input that is preceded or followed by whatever is inside the assertion. Sort of like the cursor in an editbox: a cursor can be put anywhere in the string, but it's always between characters. In contrast, "normal" regex patterns can be thought of as a selected area of text in an editbox. A selection always covers one or more characters.

Assertions can be a little hard to get at first (although they are straightforward in reality) and this is not the place for a full explanation, examples etc. This tutorial should be helpful.

The reason I used assertions instead of (?:) will become apparent if your expressions use the /g modifier. In such cases, the same part of the input can match an assertion in more than one /g round, for example once in a lookahead, in one round, and a second time in a lookbehind, in the next round. Not sure how much of this makes sense now, but I hope it will once you read the tutorial and play with some examples yourself.
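
To make the /g point concrete, here is a minimal sketch using the made-up input "a a a"; the counts given below follow from the matching rules described above and are not pasted output.

Code:
//echo -ag $regex(a a a,/(?:^|\s)a(?:$|\s)/g)
//echo -ag $regex(a a a,/(?<=^|\s)a(?=$|\s)/g)

The first should echo 2: the (?:) groups consume the spaces, so the space after the first "a" is used up and the middle "a" has no leading delimiter left. The second should echo 3, because the assertions only look at the spaces without consuming them, so each space can serve as the trailing delimiter of one match and the leading delimiter of the next.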

Last edited by qwerty; 14/05/08 07:22 PM.

Horstl (OP)
Hoopy frood
Joined: Nov 2006
Posts: 1,559
Now I have a picture of what assertions can do (at least a vague one). Your "cursor" analogy was particularly helpful. Thanks again :)


