Posted By: Lusht Some mIRC bugs with cyrillics - 18/08/07 10:03 AM

Why doesn't the $regex identifier support Cyrillic letters in character classes such as \w?
For example, $regex(тест,/(\w)/g) returns 0, though it should return 4.
* "тест" translates as "test".
Posted By: Bekar Re: Some mIRC bugs with cyrillics - 18/08/07 10:25 AM
Sounds like poor internationalization in the PCRE library.. :-|
Posted By: qwerty Re: Some mIRC bugs with cyrillics - 18/08/07 10:29 AM
mIRC is not fully Unicode-enabled yet. Another thing you should know, however, is that mIRC provides regex support through the PCRE library, whose manual states that \w matches only A-Z, a-z, 0-9 and _ by design; it doesn't match letters in other languages. To match those, one must use the \p{xx} syntax. So even if mIRC fully supported Unicode (and enabled Unicode support in PCRE), you'd have to use something like \p{L} instead of \w.

An alternative would be locales, which PCRE already supports, but using Unicode/UTF-8 and locales together is discouraged. If mIRC used PCRE's locale support, you'd be able to use \w to match Cyrillic letters, but your script wouldn't work correctly on systems with different locales.
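
To see the \w versus \p{L} difference in action outside mIRC, here is a rough sketch in TypeScript rather than mIRC script. This is only an analogy: JavaScript's regex engine is not PCRE, but its \w is likewise limited to A-Z, a-z, 0-9 and _, and it accepts \p{L} once Unicode mode (the "u" flag) is on. The helper name countMatches is made up for this illustration.

```typescript
// Count how many times `pattern` matches in `text`.
// countMatches is a hypothetical helper, not part of mIRC or PCRE.
// `flags` must include "g" so that match() returns every match.
function countMatches(text: string, pattern: string, flags: string): number {
  const matches = text.match(new RegExp(pattern, flags));
  return matches === null ? 0 : matches.length;
}

const word = "тест"; // Russian for "test"

// \w is ASCII-only here, so none of the Cyrillic letters match.
const asciiCount = countMatches(word, "\\w", "g");

// \p{L} means "any Unicode letter"; JavaScript requires the "u" flag for it.
const letterCount = countMatches(word, "\\p{L}", "gu");

console.log(asciiCount, letterCount);
```

The first count is the result the original poster saw from $regex, and the second is the count they expected.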
Posted By: Lusht Re: Some mIRC bugs with cyrillics - 18/08/07 11:17 AM
Thanks a lot.
© mIRC Discussion Forums