
Lusht (OP)
Ameglian cow
Joined: Aug 2007
Posts: 42

Why doesn't the $regex identifier match Cyrillic letters with a character class such as \w?
For example, $regex(тест,/(\w)/g) returns 0, though it should return 4.
*"тест" translates as "test".


"Do, or do not. There is no 'try'." - Yoda ('The Empire Strikes Back')
B
Fjord artisan
Joined: Dec 2002
Posts: 503
Sounds like poorly implemented internationalization in the PCRE library. :-|

Q
Hoopy frood
Joined: Jan 2003
Posts: 2,523
mIRC is not fully Unicode-enabled yet. Something else you should know, however, is that mIRC provides regex support through the PCRE library, whose manual states that, by design, \w only matches A-Z, a-z, 0-9 and _. It doesn't match letters from other languages (the manual explains this). To match those, you have to use the \p{xx} Unicode property syntax. So even if mIRC fully supported Unicode (and enabled Unicode support in PCRE), you'd still have to use something like \p{L} (any letter) instead of \w.
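The difference between an ASCII-only \w and a Unicode-aware one can be reproduced outside mIRC. The sketch below uses Python's standard re module, not PCRE or mIRC, so treat it only as an analogy: the re.ASCII flag approximates the A-Z/a-z/0-9/_ behaviour described above, and because re has no \p{L}, the class [^\W\d_] (word characters minus digits and underscore) stands in for "any letter". The helper count_matches is hypothetical; it just counts matches the way $regex reports them with the g modifier.

```python
import re

WORD = "тест"  # the Cyrillic word from the question ("test")


def count_matches(pattern: str, text: str, flags: int = 0) -> int:
    """Hypothetical helper: count non-overlapping matches, like $regex(text, /pattern/g)."""
    return sum(1 for _ in re.finditer(pattern, text, flags))


# ASCII-only \w: matches only A-Z, a-z, 0-9 and _, so no Cyrillic letter matches.
ascii_word = count_matches(r"\w", WORD, re.ASCII)

# Unicode-aware \w: Python 3's default for str patterns, so every Cyrillic letter matches.
unicode_word = count_matches(r"\w", WORD)

# Letters only. Python's re lacks \p{L}, so [^\W\d_] is used as a stand-in for it.
unicode_letters = count_matches(r"[^\W\d_]", WORD)

print(ascii_word, unicode_word, unicode_letters)  # 0 4 4
```

The first count corresponds to the 0 the question reports; the other two show the 4 you get once the engine treats non-ASCII letters as letters.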

An alternative would be locales, which PCRE already supports, but using Unicode/UTF-8 and locales at the same time is discouraged. If mIRC used PCRE's locale support, \w would match Cyrillic letters, but your script wouldn't work correctly on systems with a different locale.

Last edited by qwerty; 18/08/07 10:46 AM.

/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Lusht (OP)
Ameglian cow
Joined: Aug 2007
Posts: 42
Thanks a lot.


