No UTF-8 encoding when pasting large text blocks #178741 14/06/07 01:52 AM
alephresh

Hi,

I wanted to write a script that alters a block of UTF-8 text given by the user. Since custom dialogs don't support Unicode, I had to use a custom window for this. (UTF-8 encoding for custom windows must be enabled.) The user pastes a block of text, and the script processes each line and /alines it.

This solution works perfectly for a few short lines of text, but it turns out that if you paste a big block of text, mIRC doesn't encode it in UTF-8. (It also beeps when this happens, perhaps to warn me that there's no UTF-8 encoding? But why not?)

Anyway, here's a simple code snippet to show my point. It simply displays the values of the last three bytes of the text you enter in @test's editbox.

Code:
; /window -e @test
on *:INPUT:@test:{
  var %last = $right($1-,3)
  aline -p @test Last three bytes: $asc($left(%last,1)) $&
    $asc($mid(%last,2,1)) $asc($mid(%last,3,1))
}

Now enable UTF-8 encoding with "/font", and paste the following line in the window:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

The result is 32 194 169. "32" is the space before the copyright sign, and "194 169" is the UTF-8 representation of the copyright sign. Good.
Now try pasting this:

Code:
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©
This is a line with a non-ASCII character which will be encoded in UTF-8: ©

(The line length and number of lines are important, because it only happens after a certain threshold.)

Even though this is the same line and UTF-8 is still enabled, you will get a different result from the one above (one for each line): 58 32 169. "58" is the colon, "32" is the space and "169" is a non-Unicode representation of the copyright sign. Bad.

Thanks for reading,
Rotem
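For readers who want to check the two byte triples outside mIRC, here is a minimal sketch in Python (not mIRC script). It assumes the unencoded paste arrives in a single-byte Windows code page; cp1252 is used here as a guess, and the helper name `last_three_bytes` is made up for illustration. It only reproduces the arithmetic of the report, not mIRC's behaviour.

```python
# Illustration only: Python, not mIRC script.
# Assumption: the non-UTF-8 paste is in a single-byte code page (cp1252 here).
LINE = "This is a line with a non-ASCII character which will be encoded in UTF-8: ©"


def last_three_bytes(text, encoding):
    """Return the values of the last three bytes of text in the given encoding."""
    return list(text.encode(encoding)[-3:])


# UTF-8 stores the copyright sign as two bytes, so the tail is: space, 194, 169.
print(last_three_bytes(LINE, "utf-8"))   # [32, 194, 169]

# cp1252 stores it as one byte, so the tail shifts back by one: colon, space, 169.
print(last_three_bytes(LINE, "cp1252"))  # [58, 32, 169]
```

The one-byte shift in the second result (the colon showing up) is consistent with the copyright sign occupying a single byte instead of two, which is what the large paste appears to produce.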

Entire Thread
Subject	Posted By	Posted
No UTF-8 encoding when pasting large text blocks	Anonymous	14/06/07 01:52 AM
Re: No UTF-8 encoding when pasting large text blocks	Bekar	14/06/07 02:06 AM
Re: No UTF-8 encoding when pasting large text blocks	Anonymous	14/06/07 04:11 AM
Re: No UTF-8 encoding when pasting large text blocks	Khaled	05/09/07 11:59 AM
