Hi,

I wanted to write a script that alters a block of UTF-8 text given by the user. Since custom dialogs don't support Unicode, I had to use a custom window for this. (UTF-8 encoding for custom windows must be enabled.) The user pastes a block of text, and the script processes each line and adds it to the window with /aline.

This solution works perfectly for a few short lines of text, but it turns out that if you paste a big block of text, mIRC doesn't encode it in UTF-8. (It also beeps when this happens, perhaps to warn me that the text isn't being UTF-8 encoded? But why isn't it?)

Anyway, here's a simple code snippet to show my point. It displays the values of the last three bytes of each line you enter in @test's editbox.

Code:
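; Open the test window first: /window -e @test
on *:INPUT:@test:{
   ; Take the last three bytes of the entered line
   var %last = $right($1-,3)
   ; Print the numeric value of each of those bytes
   aline -p @test Last three bytes: $asc($left(%last,1)) $&
     $asc($mid(%last,2,1)) $asc($mid(%last,3,1))
}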

Now enable UTF-8 encoding with "/font", and paste the following line in the window:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

The result is 32 194 169. "32" is the space before the copyright sign, and "194 169" is the UTF-8 representation of the copyright sign. Good.

Now try pasting this:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

(The line length and the number of lines are important, because it only happens above a certain threshold.)

Even though this is the same line and UTF-8 is still enabled, you will get a different result from the one above, once for each line: 58 32 169. "58" is the colon, "32" is the space and "169" is a single-byte, non-UTF-8 representation of the copyright sign. Bad.
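
For anyone who wants to check these byte values outside mIRC, here is a minimal Python sketch. It is only an illustration: the helper name last_three_bytes is made up for this example, and Latin-1 is an assumption about which single-byte encoding mIRC falls back to (I only know that the copyright sign comes out as 169).

```python
# The same test line that is pasted into the @test window.
LINE = "This is a line with a non-ASCII character which will be encoded in UTF-8: ©"


def last_three_bytes(text, encoding):
    """Encode text and return the values of its final three bytes."""
    return list(text.encode(encoding)[-3:])


# UTF-8: the copyright sign takes two bytes (194 169), preceded by the space.
print(last_three_bytes(LINE, "utf-8"))    # [32, 194, 169]

# Assumed single-byte fallback (Latin-1): the copyright sign is one byte (169),
# so the colon and the space are the other two bytes.
print(last_three_bytes(LINE, "latin-1"))  # [58, 32, 169]
```

The two outputs match the good and bad results above, which suggests that the pasted text is reaching the script without being UTF-8 encoded, rather than being corrupted in some other way.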

Thanks for reading,

Rotem

