#230611 - 15/03/11 06:40 PM regex match for event
Sherip Offline
Ameglian cow

Registered: 15/03/11
Posts: 23
I think you have a problem in parsing the event if the regex pattern has a colon in it. I can give the hex for the colon if it is part of the pattern, but PCRE also has ?: for non-capturing subpatterns. I don't know of a workaround for that except not to use them (and they are worthwhile). I haven't checked, but I am wondering if there is a similar problem for the percent sign. I am quite familiar with PCRE but new to mIRC scripts. Can you tell me what options the built-in PCRE is compiled with?

#230612 - 15/03/11 07:13 PM Re: regex match for event [Re: Sherip]
Wiz126 Offline
Babel fish

Registered: 22/03/10
Posts: 57
This behavior comes from the fact that the mIRC parser first tokenizes the event parameters by the colons. Only after that does mIRC check the matchtext and such.

As a result something like on *:text:!foo:bar:#:{ will break.

An easy way around it is to use the $() identifier. It lets us put the colon in after mIRC does its parsing, but before the regex match takes place.

for example:

Code:
on $*:text:$(/^!foobar (? $+ $chr(58) $+ \S+)/):#:{
  ;code here.
}


It's a little obscure, but it works.

Alternatively, you can use a variable or an identifier:

Code:
; assume %regex has already been set to /^!foobar (?:\S+)/

on $*:text:%regex:#:{
  ;code here.
}

or

alias re return /^!foobar (?:\S+)/
on $*:text:$($re):#:{
  ;code here
}






Edited by Wiz126 (15/03/11 07:18 PM)

#230614 - 15/03/11 07:57 PM Re: regex match for event [Re: Wiz126]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
Why not escape it:
Code:
/^!foo\:bar\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning. You can also use the hex or octal escape for it: \x3A (hex) or \72 (octal) both represent the colon, aka $chr(58).

#230615 - 15/03/11 08:02 PM Re: regex match for event [Re: Tomao]
Wiz126 Offline
Babel fish

Registered: 22/03/10
Posts: 57
Originally Posted By: Tomao
Why not escape it:
Code:
/^!foobar\:\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning.

This has nothing to do with regex; in fact he wants the colon to work as regex syntax, not to escape it. Like I said in my previous post, it's the way mIRC parses the event. Also, there are no escape sequences for matchtexts.

If you have something like: on $*:text:/^!foobar (?\:\S+)/:#:{

mIRC will parse it into "$*", "text", "/^!foobar (?\", "\S+)/", "#"
Notice that at that point your matchtext is already broken; this is done way before the matchtext is even matched. As a result you simply cannot type a colon directly in the matchtext at all.
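
To make the parsing order concrete, here is a rough Python sketch of the tokenizing step. It is only an illustration: mIRC's real parser is not shown in this thread, so the hypothetical split_event helper simply assumes a naive split on every colon, which is the behaviour described above.

```python
def split_event(definition):
    """Split an event definition on every colon (assumed, simplified behaviour)."""
    # Drop the leading "on " keyword, then cut at each ':' with no escape handling.
    prefix = "on "
    body = definition[len(prefix):] if definition.startswith(prefix) else definition
    return body.split(":")


# A non-capturing group puts a raw colon inside the matchtext field.
broken = split_event(r"on $*:text:/^!foobar (?:\S+)/:#:{")

# A backslash does not help: the split happens before anything reads the regex.
escaped = split_event(r"on $*:text:/^!foobar (?\:\S+)/:#:{")

# A hex escape keeps the field intact, but it means a literal colon, not "(?:".
hexed = split_event(r"on $*:text:/^!foobar\x3A\S+/:#:{")

for tokens in (broken, escaped, hexed):
    print(tokens)
```

In the first two cases the third field ends at "(?" or "(?\", so the regex is already cut in half before it could ever be compiled. Only the third case survives the split, and that pattern no longer contains a non-capturing group.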


Edited by Wiz126 (15/03/11 08:03 PM)

#230616 - 15/03/11 08:04 PM Re: regex match for event [Re: Wiz126]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
Originally Posted By: Wiz126
there is no escape sequences for match texts.
You're correct. I totally overlooked that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/
\x3A will do the same too.

#230617 - 15/03/11 08:09 PM Re: regex match for event [Re: Tomao]
Wiz126 Offline
Babel fish

Registered: 22/03/10
Posts: 57
Originally Posted By: Tomao
Originally Posted By: Wiz126
there is no escape sequences for match texts.
You're correct. I totally overlook that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/


You missed what he was trying to do, he knows about hex, as he stated:

Quote:
Can give the hex for the colon if it is part of the pattern, but pcre also has ?: for noncapturing subpatterns.


He is not trying to use a colon as a character literal; he is trying to use it for a non-capturing group. For example, (?:foobar) matches "foobar" without capturing it, unlike (\x3Afoobar), which matches the literal text ":foobar". In such a case the ':' must be a real ':'; you cannot escape it.
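
The difference can be checked in any regex engine that supports both constructs. The sketch below uses Python's re module as a stand-in for mIRC's PCRE (an assumption made so the example can run outside mIRC); the "!cmd" prefix and the variable names are made up for illustration.

```python
import re

# (?:...) groups without capturing; the colon is part of the group syntax.
non_capturing = re.compile(r"^!cmd (?:foobar)$")

# \x3A is a literal colon inside an ordinary capturing group.
literal_colon = re.compile(r"^!cmd (\x3Afoobar)$")

m1 = non_capturing.match("!cmd foobar")   # matches, and captures nothing
m2 = literal_colon.match("!cmd foobar")   # no match: the text has no ':'
m3 = literal_colon.match("!cmd :foobar")  # matches, group 1 is ":foobar"

print(m1.groups() if m1 else None)
print(m2)
print(m3.group(1) if m3 else None)
```

So swapping the colon for its hex escape changes what the pattern means; it does not give a non-capturing group.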


Edited by Wiz126 (15/03/11 08:10 PM)

#230618 - 15/03/11 08:15 PM Re: regex match for event [Re: Wiz126]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
From what I know, ?: will still match a string but not capture it. ?! will negate a string.

#230621 - 15/03/11 11:33 PM Re: regex match for event [Re: Wiz126]
Sherip Offline
Ameglian cow

Registered: 15/03/11
Posts: 23
Thank you very much, looks like exactly the info I need.

Do you know the answer to my other question, re which options are compiled into mirc's PCRE library?

#230622 - 15/03/11 11:46 PM Re: regex match for event [Re: Tomao]
hixxy Offline
Hoopy frood

Registered: 06/09/05
Posts: 2876
You're right about ?:, but I'm not sure what you mean by "negate a string."

?! is a negative lookahead assertion. It performs a zero-width match.

#230623 - 16/03/11 12:11 AM Re: regex match for event [Re: hixxy]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
Yes hixxy, your answer goes into more depth on the regex. Pardon me for not wording it better.
Code:
/hello\s(?!hixxy)\w+/
This will sort of negate any word after "hello " that starts with the name hixxy. That is what I meant by "negate."
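
As a quick check of that behaviour, here is the same pattern run through Python's re module, used as a stand-in for PCRE (an assumption; the lookahead syntax is the same in both). The helper name greets_someone_else is made up for the example.

```python
import re

# Same pattern as above: "hello", one whitespace character, then a word
# that must not start with "hixxy".
pattern = re.compile(r"hello\s(?!hixxy)\w+")


def greets_someone_else(text):
    """Return True when the text has 'hello <word>' and the word does not start with hixxy."""
    return pattern.search(text) is not None


print(greets_someone_else("hello world"))     # lookahead passes
print(greets_someone_else("hello hixxy"))     # lookahead rejects the word
print(greets_someone_else("hello hixxy123"))  # still rejected: it starts with hixxy

# The assertion is zero-width: it checks what follows without consuming it,
# so the whole word still ends up in the match.
print(pattern.search("hello world").group(0))
```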

#230624 - 16/03/11 12:18 AM Re: regex match for event [Re: Sherip]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
Originally Posted By: Sherip
which options are compiled into mirc's PCRE library?
http://www.pcre.org/

There's a whole heap of info you can find there.

#230630 - 16/03/11 05:20 AM Re: regex match for event [Re: Tomao]
Sherip Offline
Ameglian cow

Registered: 15/03/11
Posts: 23
No. There you can find the options that are available, not the ones that were selected when mirc's pcre library was built.

#230649 - 16/03/11 03:31 PM Re: regex match for event [Re: Sherip]
hixxy Offline
Hoopy frood

Registered: 06/09/05
Posts: 2876
Try messaging Khaled, he usually responds to private messages but may take a while as he gets a lot!

#230653 - 16/03/11 06:30 PM Re: regex match for event [Re: hixxy]
Tomao Offline
Hoopy frood

Registered: 07/07/07
Posts: 1129
Loc: United States
Sending him an email might get a faster response. One time I had a question about my mIRC registration; I sent him an email and received a reply just a few hours later.

#230657 - 16/03/11 07:56 PM Re: regex match for event [Re: Sherip]
jaytea Offline
Fjord artisan

Registered: 23/02/06
Posts: 546
we can determine the state of most of the build-time options - the ones most relevant to scripters - through a bit of testing and reasoning:

  • --enable-utf8: $regex($chr(128), /^.$/) and $regex($chr(128), /(*UTF8)^.$/), returning 0 and 1 respectively, shows this option is enabled.
  • --enable-unicode-properties: $regex($chr(8712), /(*UTF8)^\p{Sm}$/) where $chr(8712) is '∈', a mathematical symbol denoting set membership, and '\p{Sm}' is a unicode property, the use of which is permitted only when that option is enabled.
  • --enable-newline-is-*: all of these options appear to not be enabled, with LF being the default new line indicator supported by mIRC's build of PCRE: $regex($+(a, $cr, a, $crlf), /a$/m) is 0 indicating neither CR nor CRLF signals the end of a line, whereas $regex(a $+ $lf, /a$/m) is 1 confirming LF to be the only acceptable line separator.
  • --enable-bsr-anycrlf: this is not enabled, $regex($chr(133), /\R/) = 1 where $chr(133) is the 'next line' character, a Unicode new line sequence, matched by \R only when this option is not enabled.
  • --with-link-size=2: we, as scripters, are not capable of forcing mIRC to pass PCRE an expression that's more than ~4kb. this is well within the 64KB limit implied by the default option value of '2', so i see no reason why it may have been modified.
  • --with-match-limit and --with-match-limit-recursion: i suspect the first limit is around 1,000,000 since $regex($str(a, 1412), /a+a+[b]/) returns 0, but $regex($str(a, 1413), /a+a+[b]/) returns -8 (PCRE_ERROR_MATCHLIMIT) - the number of backtracks being around 998,990 and 1,000,404 respectively. it may not be as clear cut as that though; $regex($str(a, 18), /(?:a+)+[bc]/) also returns -8 but demands far fewer backtracks than the previous example. as for the -recursion limit: this appears to be 999 as $regex($str(a, 999), /a(?R)|$/) and $regex($str(a, 999), (a)+) match successfully, but changing those 999s to 1000 produces a return value of -21 (PCRE_ERROR_RECURSIONLIMIT).
_________________________
"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde

#230683 - 17/03/11 04:54 PM Re: regex match for event [Re: jaytea]
Sherip Offline
Ameglian cow

Registered: 15/03/11
Posts: 23
Thank you jaytea.

Possibly mIRC's author should consider using a default newline of ANYCRLF in future builds, as well as the ANYCRLF option for backslash-R.

I find those improve PCRE's default behavior in the Windows environment. Coincidentally, they and their pattern override counterparts were implemented in PCRE in the last few years based on my own suggestions.

Of course if multiline processing is rarely if ever used it wouldn't matter very much. And in any event the pattern overrides are available to the user since mirc is using a recent PCRE version.

Regards,
Sheri

#230708 - 18/03/11 05:08 PM Re: regex match for event [Re: Sherip]
Khaled Offline


Planetary brain

Registered: 04/12/02
Posts: 4344
Loc: London, UK
I originally disabled the BSR_ANYCRLF option to be conservative, since I was not sure what the side-effects would be for most users. If everyone thinks enabling it is a good thing, I will go ahead and do that for the next release.
