Ameglian cow (OP)
Joined: Jun 2007
Posts: 35
Hi,

I wanted to write a script that alters a block of UTF-8 text supplied by the user. Since custom dialogs don't support Unicode, I had to use a custom window instead (UTF-8 encoding must be enabled for the window). The user pastes a block of text, and the script processes each line and /alines it.

This works perfectly for a few short lines of text, but it turns out that if you paste a large block of text, mIRC doesn't encode it in UTF-8. (It also beeps when this happens; perhaps that's a warning that the text wasn't encoded? But then why isn't it encoded?)

Anyway, here's a simple code snippet to show my point. It simply displays the values of the last three bytes of the text you enter in @test's editbox.

Code:
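; Create the window first with: /window -e @test
on *:INPUT:@test:{
   ; Take the last three bytes of the entered line...
   var %last = $right($1-,3)
   ; ...and print the numeric value of each one
   aline -p @test Last three bytes: $asc($left(%last,1)) $&
     $asc($mid(%last,2,1)) $asc($mid(%last,3,1))
}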

Now enable UTF-8 encoding with "/font", and paste the following line in the window:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

The result is 32 194 169. "32" is the space before the copyright sign, and "194 169" is the UTF-8 representation of the copyright sign. Good.
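For reference, the pair 194 169 follows directly from UTF-8's two-byte pattern (110yyyyy 10xxxxxx) applied to the copyright sign, U+00A9 (decimal 169). The sketch below is plain Python, used only to illustrate the arithmetic outside mIRC; the helper name `utf8_two_byte` is made up for this example.

```python
def utf8_two_byte(code_point):
    """Encode a code point from the two-byte UTF-8 range (U+0080..U+07FF)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point outside the two-byte UTF-8 range")
    lead = 0xC0 | (code_point >> 6)      # 110yyyyy: top 5 bits of the code point
    trail = 0x80 | (code_point & 0x3F)   # 10xxxxxx: low 6 bits of the code point
    return [lead, trail]


copyright_sign = ord("\u00a9")                 # U+00A9, decimal 169
print(utf8_two_byte(copyright_sign))           # [194, 169]
print(list("\u00a9".encode("utf-8")))          # [194, 169], Python's own encoder agrees
```

So 169 >> 6 = 2 gives the lead byte 0xC0 | 2 = 194, and 169 & 0x3F = 41 gives the trail byte 0x80 | 41 = 169.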

Now try pasting this:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

(The line length and number of lines matter, because the problem only appears above a certain threshold.)

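Even though this is the same line and UTF-8 is still enabled, you will get a different result from the one above (once for each line): 58 32 169. "58" is the colon, "32" is the space, and "169" is the copyright sign in a single-byte, non-UTF-8 representation (the value it has in Latin-1/Windows-1252). Because the sign now takes one byte instead of two, the colon shifts into the last three bytes. Bad.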
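Both results can be reproduced outside mIRC with a short Python sketch. It assumes the unencoded paste ends up in the Windows-1252 ("ANSI") code page; that code page is an assumption for illustration, not something confirmed in this thread. `last_three_bytes` is a hypothetical helper that mimics what the @test handler prints.

```python
# The test line from above; "\u00a9" is the copyright sign.
LINE = "This is a line with a non-ASCII character which will be encoded in UTF-8: \u00a9"


def last_three_bytes(text, encoding):
    """Encode text and return the values of its last three bytes, like the @test handler."""
    return list(text.encode(encoding)[-3:])


# Properly encoded: space + two-byte UTF-8 copyright sign.
print(last_three_bytes(LINE, "utf-8"))    # [32, 194, 169]

# Assumption: the skipped-encoding case leaves the text in Windows-1252,
# where the copyright sign is the single byte 169.
print(last_three_bytes(LINE, "cp1252"))   # [58, 32, 169]
```

The only difference between the two runs is the encoding step, which matches the symptom: the same input yields 32 194 169 when it is encoded as UTF-8 and 58 32 169 when it is not.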

Thanks for reading,

Rotem


Desired: right alignment of text; consecutive spaces in /command args; Ctrl+A in custom dialogs.
Fjord artisan
Joined: Dec 2002
Posts: 503
Ooh, mIRC's built-in paste-flood protection running amok!

Very thorough description and test scenarios!

Ameglian cow (OP)
What paste-flooding protection?

If you mean that warning window, it's unrelated...


Hoopy frood
Joined: Dec 2002
Posts: 5,411
Thanks, I was able to reproduce this issue. It is caused by an internal limit on the length of the line that can be encoded. When a large amount of text was pasted, the limit was exceeded and mIRC skipped the encoding. This should be fixed in the next version. This issue appears to be related to (or the same as) the issue reported here.

