Posted By: Sherip regex match for event - 15/03/11 06:40 PM
I think you have a problem in parsing the event if the regex pattern has a colon in it. Can give the hex for the colon if it is part of the pattern, but pcre also has ?: for noncapturing subpatterns. I don't know of a workaround for that except not to use them (and they are worthwhile). I haven't checked, but I am wondering whether there is a similar problem with the percent sign. I am quite familiar with PCRE but new to mIRC scripts. Can you tell me what options the built-in PCRE is compiled with?
Posted By: Wiz126 Re: regex match for event - 15/03/11 07:13 PM
This behavior comes from the fact that the mIRC parser first tokenizes the event parameters by the colons. Only after that does mIRC check the matchtext and such.

As a result something like on *:text:!foo:bar:#:{ will break.

An easy way around it is to use the $() identifier. mIRC evaluates it after it has done its parsing of the event line but before the regex match takes place, so we can insert the colon at that point.

For example:

Code:
on $*:text:$(/^!foobar (? $+ $chr(58) $+ \S+)/):#:{
  ;code here.
}


It's a little obscure, but it works.

Alternatively, you can use a variable or an identifier:

Code:
; assume %regex = /^!foobar (?:\S+)/

on $*:text:%regex:#:{
  ;code here.
}

; or

alias re return /^!foobar (?:\S+)/
on $*:text:$($re):#:{
  ;code here
}




Posted By: Tomao Re: regex match for event - 15/03/11 07:57 PM
Why not escape it:
Code:
/^!foo\:bar\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning. You can also use an escape code for it: \x3A (hex) or \72 (octal) both represent the colon, aka $chr(58).
Posted By: Wiz126 Re: regex match for event - 15/03/11 08:02 PM
Originally Posted By: Tomao
Why not escape it:
Code:
/^!foobar\:\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning.

This has nothing to do with regex; in fact he wants the colon to work as regex syntax, not to escape it. Like I said in my previous post, it's the way mIRC parses the event. Also, there are no escape sequences for match texts.

If you have something like: on $*:text:/^!foobar (?\:\S+)/:#:{

mIRC will parse it into "$*", "text", "/^!foobar (?\", "\S+)/", "#"
Notice that at that point your matchtext is already broken. This is done well before the matchtext is even matched against anything. As a result you simply cannot have literal colons in the matchtext at all.
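
To make the splitting described above concrete, here is a rough Python model. It is only a sketch: `split_event` is a hypothetical helper that splits on every colon, which is how the posts above describe the behavior, and it is not mIRC's actual parser.

```python
def split_event(line):
    """Toy model: split an event definition on every colon."""
    return line.split(":")


# A pattern with a non-capturing group written directly in the matchtext.
broken = split_event(r"on $*:text:/^!foobar (?:\S+)/:#:{")

# The $() workaround: the colon is spelled $chr(58), so none appears in the matchtext.
intact = split_event(r"on $*:text:$(/^!foobar (? $+ $chr(58) $+ \S+)/):#:{")

print(broken)  # the regex is cut in two at the colon of (?:
print(intact)  # the matchtext survives as a single field
```

In this model the first line yields one field too many, with the regex split after "(?", while the second keeps the whole $() expression in one field.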
Posted By: Tomao Re: regex match for event - 15/03/11 08:04 PM
Originally Posted By: Wiz126
there are no escape sequences for match texts.
You're correct. I totally overlooked that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/
\x3A will do the same too.
Posted By: Wiz126 Re: regex match for event - 15/03/11 08:09 PM
Originally Posted By: Tomao
Originally Posted By: Wiz126
there are no escape sequences for match texts.
You're correct. I totally overlooked that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/


You missed what he was trying to do. He knows about hex, as he stated:

Quote:
Can give the hex for the colon if it is part of the pattern, but pcre also has ?: for noncapturing subpatterns.


He is not trying to use a colon as a character literal; he is trying to use it for a non-capturing group, for example (?:foobar). That is unlike (\x3Afoobar), which is a capturing group that literally matches ":foobar". In that case the ':' must be a real ':'; you cannot escape it.
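
The difference between the two patterns can be checked quickly. The sketch below uses Python's `re` module rather than mIRC's PCRE, on the assumption that both engines treat `(?:...)` and `\x3A` the same way for these simple cases.

```python
import re

# (?:foobar) is a non-capturing group: it matches "foobar" and creates no capture.
noncapturing = re.compile(r"(?:foobar)")
m1 = noncapturing.search("foobar")

# (\x3Afoobar) is an ordinary capturing group whose first character is a literal colon.
literal_colon = re.compile(r"(\x3Afoobar)")
m2 = literal_colon.search(":foobar")
m3 = literal_colon.search("foobar")  # no colon in the input, so no match

print(noncapturing.groups, m1.group(0))
print(literal_colon.groups, m2.group(1))
print(m3)
```

So replacing the colon with its hex code changes the meaning of the pattern instead of working around the parsing problem.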
Posted By: Tomao Re: regex match for event - 15/03/11 08:15 PM
From what I know, ?: will still match a string but not capture it. ?! will negate a string.
Posted By: Sherip Re: regex match for event - 15/03/11 11:33 PM
Thank you very much, looks like exactly the info I need.

Do you know the answer to my other question, re which options are compiled into mirc's PCRE library?
Posted By: hixxy Re: regex match for event - 15/03/11 11:46 PM
You're right about ?:, but I'm not sure what you mean by "negate a string."

?! is a negative lookahead assertion. It performs a zero-width match.
Posted By: Tomao Re: regex match for event - 16/03/11 12:11 AM
Yes hixxy, your answer goes more in depth on the regex. Pardon me for not wording it better.
Code:
/hello\s(?!hixxy)\w+/
This will sort of negate any word that starts with the name hixxy: the match fails when "hello " is followed by hixxy, and succeeds for other words. That is what I meant by "negate."
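
A quick way to see what the lookahead does is to try the pattern on a few inputs. This is a sketch in Python's `re` module, assumed here to behave like PCRE for this pattern; the sample strings are made up for illustration.

```python
import re

# "hello", one whitespace character, then a word that must not start with "hixxy".
pattern = re.compile(r"hello\s(?!hixxy)\w+")

other_name = pattern.search("hello tomao")
exact_name = pattern.search("hello hixxy")
name_prefix = pattern.search("hello hixxy123")

print(other_name.group(0) if other_name else None)
print(exact_name)
print(name_prefix)
```

The lookahead consumes no characters. It only checks what follows the whitespace and rejects the match if that text begins with "hixxy".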
Posted By: Tomao Re: regex match for event - 16/03/11 12:18 AM
Originally Posted By: Sherip
which options are compiled into mirc's PCRE library?
http://www.pcre.org/

There's a whole lot of info you can find there.
Posted By: Sherip Re: regex match for event - 16/03/11 05:20 AM
No. There you can find the options that are available, not the ones that were selected when mirc's pcre library was built.
Posted By: hixxy Re: regex match for event - 16/03/11 03:31 PM
Try messaging Khaled, he usually responds to private messages but may take a while as he gets a lot!
Posted By: Tomao Re: regex match for event - 16/03/11 06:30 PM
Sending him an email might get a faster response. One time I had a question about my mIRC registration; I sent him an email and received a reply within a few hours.
Posted By: jaytea Re: regex match for event - 16/03/11 07:56 PM
we can determine the state of most of the build-time options - the ones most relevant to scripters - through a bit of testing and reasoning:

  • --enable-utf8: $regex($chr(128), /^.$/) and $regex($chr(128), /(*UTF8)^.$/), returning 0 and 1 respectively, shows this option is enabled.
  • --enable-unicode-properties: $regex($chr(8712), /(*UTF8)^\p{Sm}$/) where $chr(8712) is '∈', a mathematical symbol denoting set membership, and '\p{Sm}' is a unicode property, the use of which is permitted only when that option is enabled.
  • --enable-newline-is-*: all of these options appear to not be enabled, with LF being the default new line indicator supported by mIRC's build of PCRE: $regex($+(a, $cr, a, $crlf), /a$/m) is 0 indicating neither CR nor CRLF signals the end of a line, whereas $regex(a $+ $lf, /a$/m) is 1 confirming LF to be the only acceptable line separator.
  • --enable-bsr-anycrlf: this is not enabled, $regex($chr(133), /\R/) = 1 where $chr(133) is the 'next line' character, a Unicode new line sequence, matched by \R only when this option is not enabled.
  • --with-link-size=2: we, as scripters, are not capable of forcing mIRC to pass PCRE an expression that's more than ~4kb. this is well within the 64KB limit implied by the default option value of '2', so i see no reason why it may have been modified.
  • --with-match-limit and --with-match-limit-recursion: i suspect the first limit is around 1,000,000 since $regex($str(a, 1412), /a+a+[b]/) returns 0, but $regex($str(a, 1413), /a+a+[b]/) returns -8 (PCRE_ERROR_MATCHLIMIT) - the number of backtracks being around 998,990 and 1,000,404 respectively. it may not be as clear cut as that though; $regex($str(a, 18), /(?:a+)+[bc]/) also returns -8 but demands far fewer backtracks than the previous example. as for the -recursion limit: this appears to be 999 as $regex($str(a, 999), /a(?R)|$/) and $regex($str(a, 999), (a)+) match successfully, but changing those 999s to 1000 produces a return value of -21 (PCRE_ERROR_RECURSIONLIMIT).
Posted By: Sherip Re: regex match for event - 17/03/11 04:54 PM
Thank you jaytea.

Possibly mIRC's author should consider using a default newline of ANYCRLF in future builds, as well as the backslash-R option of ANYCRLF.

I find those improve PCRE's default behavior in the Windows environment. Coincidentally, they and their pattern override counterparts were implemented in PCRE in the last few years based on my own suggestions.

Of course if multiline processing is rarely if ever used it wouldn't matter very much. And in any event the pattern overrides are available to the user since mirc is using a recent PCRE version.

Regards,
Sheri
Posted By: Khaled Re: regex match for event - 18/03/11 05:08 PM
I originally disabled the BSR_ANYCRLF option to be conservative, since I was not sure what the side-effects would be for most users. If everyone thinks enabling it is a good thing, I will go ahead and do that for the next release.