#230611 15/03/11 06:40 PM
Joined: Mar 2011
Posts: 23
S
Sherip Offline OP
Ameglian cow
I think you have a problem in parsing the event if the regex pattern has a colon in it. Can give the hex for the colon if it is part of the pattern, but pcre also has ?: for noncapturing subpatterns. I don't know of a workaround for that except not to use them (and they are worthwhile). I haven't checked, but I am wondering if there is a similar problem with the percent sign. I am quite familiar with PCRE but new to mIRC scripts. Can you tell me what options the built-in PCRE is compiled with?

Joined: Mar 2010
Posts: 57
W
Babel fish
Offline
This behavior comes from the fact that the mIRC parser first tokenizes the event definition on colons. Only after that does mIRC look at the matchtext and the other parameters.

As a result something like on *:text:!foo:bar:#:{ will break.
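As a rough illustration outside mIRC, here is a minimal Python sketch of that failure mode. It assumes the parser behaves like a plain split on every colon; split_event is a made-up helper for this post, not anything from mIRC itself.

```python
def split_event(definition):
    """Split an event definition on every colon (a simplified, assumed model)."""
    # Drop the leading "on " keyword and the trailing "{" before splitting.
    if definition.startswith("on "):
        definition = definition[3:]
    body = definition.rstrip("{")
    return [field for field in body.split(":") if field]

# A plain wildcard matchtext with a colon ends up as two separate fields.
plain = split_event("on *:text:!foo:bar:#:{")

# The same happens to a regex matchtext that uses a non-capturing group.
regex = split_event(r"on $*:text:/^!foobar (?:\S+)/:#:{")
```

In both cases the matchtext is cut in two at the colon, so the pieces that follow no longer line up with the fields the event expects.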

An easy way around it is to use the $() identifier. It is evaluated after mIRC has split the event on colons but before the regex match takes place, so we can insert the colon at that point.

For example:

Code:
on $*:text:$(/^!foobar (? $+ $chr(58) $+ \S+)/):#:{
  ;code here.
}


It's a little obscure, but it works.
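To make the ordering concrete, here is a small Python sketch of the idea (not mIRC code). It assumes the event line is split on colons first and the matchtext field is evaluated afterwards; evaluate is a toy stand-in for $() that only understands " $+ " and $chr(58).

```python
import re

# The event line from the example above; the matchtext contains no raw colon.
definition = r"on $*:text:$(/^!foobar (? $+ $chr(58) $+ \S+)/):#:{"

def evaluate(field: str) -> str:
    """Toy stand-in for $(): handles only ' $+ ' joins and $chr(58)."""
    if field.startswith("$(") and field.endswith(")"):
        field = field[2:-1]
    return field.replace(" $+ ", "").replace("$chr(58)", ":")

# Step 1: split on colons first (assumed model of the parser).
fields = definition.split(":")
# Step 2: only now evaluate the matchtext field, which puts the colon back.
pattern = evaluate(fields[2])
# Step 3: strip the /.../ delimiters and run the regex match.
match = re.search(pattern.strip("/"), "!foobar baz")
```

Because the colon only appears in step 2, the split in step 1 still sees exactly one matchtext field.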

Alternatively, you can use a variable or an identifier:

Code:
; assume %regex has already been set to /^!foobar (?:\S+)/

on $*:text:%regex:#:{
  ;code here.
}

; or

alias re return /^!foobar (?:\S+)/
on $*:text:$($re):#:{
  ;code here
}





Last edited by Wiz126; 15/03/11 07:18 PM.
Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
Why not escape it:
Code:
/^!foo\:bar\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning. You can also use an escape code for it: \x3A (hex) or \72 (octal) both represent the colon, aka $chr(58).

Joined: Mar 2010
Posts: 57
W
Babel fish
Offline
Originally Posted By: Tomao
Why not escape it:
Code:
/^!foobar\:\S+/
Then you make the colon sign literal so it won't be associated with regex's special meaning.

This has nothing to do with regex; in fact he wants the colon to keep its regex meaning, not to escape it. Like I said in my previous post, it's the way mIRC parses the event. Also, there are no escape sequences for match texts.

If you have something like: on $*:text:/^!foobar (?\:\S+)/:#:{

mIRC will parse it into "$*", "text", "/^!foobar (?\", "\S+)/", "#"
Notice that at that point your matchtext is already broken. This happens well before the matchtext is matched against anything, so you simply cannot have a literal colon in the matchtext at all.

Last edited by Wiz126; 15/03/11 08:03 PM.
Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
Originally Posted By: Wiz126
there is no escape sequences for match texts.
You're correct. I totally overlook that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/
\x3A will do the same too.

Joined: Mar 2010
Posts: 57
W
Babel fish
Offline
Originally Posted By: Tomao
Originally Posted By: Wiz126
there is no escape sequences for match texts.
You're correct. I totally overlook that. But by using the octal char, it will work:
Code:
/^!foo\72bar\S+/


You missed what he was trying to do. He knows about hex, as he stated:

Quote:
Can give the hex for the colon if it is part of the pattern, but pcre also has ?: for noncapturing subpatterns.


He is not trying to use the colon as a literal character; he is trying to use it in a non-capturing group, for example (?:foobar). By contrast, (\x3Afoobar) is a capturing group that matches the literal text ":foobar". In the non-capturing case the ':' must be an actual ':', so you cannot escape it.
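The difference is easy to check in any regex engine that supports both constructs. Here is a short sketch using Python's re module (a different engine from the PCRE build inside mIRC, but (?:...) and \x3A behave the same way for this purpose); the sample texts are made up.

```python
import re

# (?:...) groups without capturing; the colon is part of the group syntax.
non_capturing = re.compile(r"^!foobar (?:\S+)")

# (\x3A...) is an ordinary capturing group that starts with a literal colon.
literal_colon = re.compile(r"^!foobar (\x3A\S+)")

nc_match = non_capturing.search("!foobar baz")
lit_miss = literal_colon.search("!foobar baz")
lit_match = literal_colon.search("!foobar :baz")
```

The non-capturing pattern matches "!foobar baz" and records no groups, while the \x3A version only matches when the text really contains a colon after the space.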

Last edited by Wiz126; 15/03/11 08:10 PM.
Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
from what I know ?: will still match a string but not capture it. ?! will negate a string.

Joined: Mar 2011
Posts: 23
S
Sherip Offline OP
Ameglian cow
Thank you very much, looks like exactly the info I need.

Do you know the answer to my other question, re which options are compiled into mirc's PCRE library?

Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Offline
You're right about ?:, but I'm not sure what you mean by "negate a string."

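?! is a negative lookahead assertion. It performs a zero-width match: it succeeds only if the subpattern does not match at the current position, and it consumes no characters.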

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
Yes hixxy, your answer goes into the regex in more depth. Pardon me for not wording it better.
Code:
/hello\s(?!hixxy)\w+/
This matches "hello" followed by a word, unless that word starts with the name hixxy. That is what I meant by "negate."
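For anyone who wants to try the lookahead outside mIRC, here is a quick sketch with Python's re module, which treats (?!...) the same way as PCRE for this pattern. The sample strings are just illustrations.

```python
import re

pattern = re.compile(r"hello\s(?!hixxy)\w+")

# The word after "hello" does not start with "hixxy", so the match succeeds.
accepted = pattern.search("hello world")

# Here the lookahead fails right after the space, so there is no match.
rejected = pattern.search("hello hixxy")

# The lookahead only checks the start of the word, so longer names fail too.
rejected_longer = pattern.search("hello hixxy123")
```

The lookahead consumes nothing, so when it succeeds \w+ still starts matching at the first letter after the space.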

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
Originally Posted By: Sherip
which options are compiled into mirc's PCRE library?
http://www.pcre.org/

There's a whole heap of info you can find there.

Joined: Mar 2011
Posts: 23
S
Sherip Offline OP
Ameglian cow
No. There you can find the options that are available, not the ones that were selected when mirc's pcre library was built.

Joined: Sep 2005
Posts: 2,881
H
Hoopy frood
Offline
Try messaging Khaled, he usually responds to private messages but may take a while as he gets a lot!

Joined: Jul 2007
Posts: 1,129
T
Hoopy frood
Offline
Sending him an email might get a faster response. One time I had a question about my mIRC registration; I sent him an email and received a reply just a few hours later.

Joined: Feb 2006
Posts: 546
J
Fjord artisan
Offline
we can determine the state of most of the build-time options - the ones most relevant to scripters - through a bit of testing and reasoning:

  • --enable-utf8: $regex($chr(128), /^.$/) and $regex($chr(128), /(*UTF8)^.$/), returning 0 and 1 respectively, show this option is enabled.
  • --enable-unicode-properties: $regex($chr(8712), /(*UTF8)^\p{Sm}$/) where $chr(8712) is '∈', a mathematical symbol denoting set membership, and '\p{Sm}' is a unicode property, the use of which is permitted only when that option is enabled.
  • --enable-newline-is-*: all of these options appear to not be enabled, with LF being the default new line indicator supported by mIRC's build of PCRE: $regex($+(a, $cr, a, $crlf), /a$/m) is 0 indicating neither CR nor CRLF signals the end of a line, whereas $regex(a $+ $lf, /a$/m) is 1 confirming LF to be the only acceptable line separator.
  • --enable-bsr-anycrlf: this is not enabled, $regex($chr(133), /\R/) = 1 where $chr(133) is the 'next line' character, a Unicode new line sequence, matched by \R only when this option is not enabled.
  • --with-link-size=2: we, as scripters, are not capable of forcing mIRC to pass PCRE an expression that's more than ~4kb. this is well within the 64KB limit implied by the default option value of '2', so i see no reason why it may have been modified.
  • --with-match-limit and --with-match-limit-recursion: i suspect the first limit is around 1,000,000 since $regex($str(a, 1412), /a+a+[b]/) returns 0, but $regex($str(a, 1413), /a+a+[b]/) returns -8 (PCRE_ERROR_MATCHLIMIT) - the number of backtracks being around 998,990 and 1,000,404 respectively. it may not be as clear cut as that though; $regex($str(a, 18), /(?:a+)+[bc]/) also returns -8 but demands far fewer backtracks than the previous example. as for the -recursion limit: this appears to be 999 as $regex($str(a, 999), /a(?R)|$/) and $regex($str(a, 999), (a)+) match successfully, but changing those 999s to 1000 produces a return value of -21 (PCRE_ERROR_RECURSIONLIMIT).
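for readers without mIRC at hand, the newline test in the list above can be mirrored in Python. this is only an analogy: Python's re module is a different engine from PCRE, but its multiline $ likewise recognises only LF, so it gives the same answers as the two $regex() calls in the --enable-newline-is-* item.

```python
import re

# Equivalent of $+(a, $cr, a, $crlf): neither "a" is directly followed by LF
# or by the end of the text.
cr_only = "a\ra\r\n"

# Equivalent of a $+ $lf.
lf_terminated = "a\n"

multiline_end = re.compile(r"a$", re.MULTILINE)

cr_result = multiline_end.search(cr_only)        # no match: CR does not end a line
lf_result = multiline_end.search(lf_terminated)  # match: LF does end a line
```

an engine built with a CR, CRLF or ANYCRLF newline default would be expected to match the first string as well, which is what makes this a usable probe.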


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Joined: Mar 2011
Posts: 23
S
Sherip Offline OP
Ameglian cow
Thank you jaytea.

Possibly mIRC's author should consider using, in future builds, a default newline of ANYCRLF as well as the ANYCRLF option for \R.

I find those improve PCRE's default behavior in the Windows environment. Coincidentally, they and their pattern-override counterparts were implemented in PCRE in the last few years based on my own suggestions.

Of course if multiline processing is rarely if ever used it wouldn't matter very much. And in any event the pattern overrides are available to the user since mirc is using a recent PCRE version.

Regards,
Sheri

Joined: Dec 2002
Posts: 5,411
Hoopy frood
Offline
I originally disabled the BSR_ANYCRLF option to be conservative, since I was not sure what the side-effects would be for most users. If everyone thinks enabling it is a good thing, I will go ahead and do that for the next release.

