


[7.64] UTF related issue #268480 21/02/21 05:33 PM
Jigsy (OP), Hoopy frood. Joined: Nov 2004. Posts: 842. I live inside your computer. S...
Again, another issue that's been more noticeable since upgrading to 7.64.

For a while now, I've been puzzled as to why searching for certain Japanese words (/g, /jisho, etc.) would point me to a page with gibberish. I believe there's a weird inconsistency in isalpha and isalnum when it comes to certain UTF characters. For example: 語 is considered by mIRC to be an alpha character, yet ご is not.

Code:
jisho { url -a $+(http://jisho.org/,$iif($1-,$+(search/,$htmlhex($v1)))) }
htmlhex {
  if ($1-) {
    var %i = 1, %x
    while (%i <= $len($1-)) {
      if ($mid($1-,%i,1) isalnum) { var %x = %x $+ $v1 }
      else { var %x = %x $+ $chr(37) $+ $base($asc($mid($1-,%i,1)),10,16,2) }
      inc %i
    }
    return %x
  }
}
; $htmlhex(日本語!) -> 日本語%21
; $htmlhex(にほんご) -> %306B%307B%3093%3054 (however this pointed me to the above image)

What do you do at the end of the world? Are you busy? Will you save us?
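For anyone wondering why the second result lands on gibberish, here is a rough Python sketch (not mIRC script) of how a standard percent-decoder would read that string. It assumes the site follows ordinary URL decoding, where "%" is followed by exactly two hex digits that stand for one byte. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

# Output reported in the post for $htmlhex(にほんご)
reported = "%306B%307B%3093%3054"

# A standard decoder takes "%30" as the byte 0x30 (the digit "0")
# and leaves the following "6B", "7B", ... as literal text.
decoded = unquote(reported)
print(decoded)

# For comparison: percent-encoding the UTF-8 bytes of the same word.
utf8_encoded = quote("にほんご")
print(utf8_encoded)

# The byte-wise form decodes back to the original word.
print(unquote(utf8_encoded))
```

So the four-digit escapes are probably not being read as code points at all, which would fit a search for something other than にほんご.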

Re: [7.64] UTF related issue Jigsy #268481 21/02/21 05:57 PM
maroon, Hoopy frood. Joined: Jan 2004. Posts: 2,127.
This looks like it does what you want, and using $regsubex is probably faster than going through a scripted loop. I assume the definition of alnum you need is the case-insensitive base36 alphabet.

This replicates your "good" example, but I'm not sure either version handles a codepoint in the range 256-4095, which is a 3-digit hex number, or a non-alnum character in the 33-126 range. Instead of this simplistic substitution pattern, it may need to call a $myalias($asc(\t)) to handle different styles in different ranges. If it needs to be encoding each UTF8 character separately, remove the /u flag.

Code:
//tokenize 32 にほんご | var %i 1 | echo -a $regsubex(foo,$1-,/([^0-9A-Za-z])/gu,$chr(37) $+ $base($asc(\t),10,16,2))
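To make the difference between the two substitution styles concrete, here is a minimal Python sketch (again not mIRC script). `encode_codepoints` and `encode_utf8_bytes` are hypothetical names; the first mimics one escape per code point, as the one-liner above produces, and the second emits one escape per UTF-8 byte. Whether dropping /u in mIRC gives exactly the byte-wise behaviour is an assumption, not something tested here.

```python
import re

def encode_codepoints(text):
    # One escape per code point, padded to at least 2 hex digits.
    # Code points above 255 give 3 or more digits.
    return re.sub(r"[^0-9A-Za-z]",
                  lambda m: "%" + format(ord(m.group()), "02X"),
                  text)

def encode_utf8_bytes(text):
    # One escape per UTF-8 byte, always exactly 2 hex digits.
    data = text.encode("utf-8")
    out = re.sub(rb"[^0-9A-Za-z]",
                 lambda m: b"%%%02X" % m.group()[0],
                 data)
    return out.decode("ascii")

print(encode_codepoints("にほんご"))   # same shape as the thread's output
print(encode_codepoints("\u0100"))     # a code point in the 256-4095 range
print(encode_utf8_bytes("にほんご"))
print(encode_utf8_bytes("a!"))
```

The code point version shows the 3-digit case mentioned above: U+0100 comes out as %100, which a two-digit decoder would split after %10. The byte version never has that problem because every escape is a single byte.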
