


No UTF-8 encoding when pasting large text blocks #178741 14/06/07 01:52 AM
alephresh
Hi, I wanted to write a script that alters a block of UTF-8 text given by the user. Since custom dialogs don't support Unicode, I had to use a custom window for this. (UTF-8 encoding for custom windows must be enabled.) The user pastes a block of text, and the script processes each line and /alines it.

This solution works perfectly for a few short lines of text, but it turns out that if you paste a big block of text, mIRC doesn't encode it in UTF-8. (It also beeps when this happens, perhaps to warn me that there's no UTF-8 encoding? But why not?)

Anyway, here's a simple code snippet to show my point. It displays the values of the last three bytes of the text you enter in @test's editbox.

Code:
    ; /window -e @test
    on *:INPUT:@test:{
      var %last = $right($1-,3)
      aline -p @test Last three bytes: $asc($left(%last,1)) $&
        $asc($mid(%last,2,1)) $asc($mid(%last,3,1))
    }

Now enable UTF-8 encoding with "/font", and paste the following line in the window:

Code:
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©

The result is 32 194 169. "32" is the space before the copyright sign, and "194 169" is the UTF-8 representation of the copyright sign. Good.
Now try pasting this:

Code:
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©
    This is a line with a non-ASCII character which will be encoded in UTF-8: ©

(The line length and the number of lines are important, because the problem only appears past a certain threshold.)

Even though this is the same line and UTF-8 is still enabled, you will get a different result from the one above (once for each line): 58 32 169. "58" is the colon, "32" is the space and "169" is a non-Unicode, single-byte representation of the copyright sign. Bad.

Thanks for reading,
Rotem
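For readers who want to check the two byte sequences outside mIRC, here is a minimal Python sketch. It is not mIRC code; `last_three_bytes` is a hypothetical helper that only mirrors what the test script prints, and Latin-1 is assumed as the single-byte encoding (the thread does not say which code page mIRC falls back to).

```python
# Hypothetical helper, not part of mIRC: mirrors the output of the test script.
def last_three_bytes(line: str, encoding: str) -> list:
    """Return the numeric values of the last three bytes of `line`."""
    return list(line.encode(encoding)[-3:])

LINE = "This is a line with a non-ASCII character which will be encoded in UTF-8: ©"

# UTF-8: the copyright sign is two bytes (194 169), preceded by the space (32).
print(last_three_bytes(LINE, "utf-8"))    # [32, 194, 169]

# Single-byte encoding (Latin-1 is an assumption): the copyright sign is one
# byte (169), so the last three bytes reach back to the colon (58) and space (32).
print(last_three_bytes(LINE, "latin-1"))  # [58, 32, 169]
```

The two results match the "good" and "bad" outputs reported above, which suggests the large paste reaches the script as unencoded single-byte text.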

Re: No UTF-8 encoding when pasting large text blocks #178742 14/06/07 02:06 AM
Bekar (Fjord artisan) Joined: Dec 2002 Posts: 503 Melbourne, Australia
Oo, mIRC's inbuilt paste-flooding protection run amok! Very thorough description and test scenarios!

Re: No UTF-8 encoding when pasting large text blocks Bekar #178751 14/06/07 04:11 AM
alephresh
What paste-flooding protection? If you mean that warning window, it's unrelated...

Re: No UTF-8 encoding when pasting large text blocks #185126 05/09/07 11:59 AM
Khaled (Hoopy frood) Joined: Dec 2002 Posts: 3,854 London, UK
Thanks, I was able to reproduce this issue. It is due to an internal limit on the length of the line that can be encoded. When pasting large amounts of text, the limit was being exceeded, causing mIRC to skip the encoding. This should be fixed in the next version. This issue appears to be related to (or the same as) the issue reported here.
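The failure mode described in this reply can be modelled with a toy Python sketch. This is only an illustration, not mIRC's implementation: the limit value `ASSUMED_LIMIT`, the function `encode_paste`, and the Latin-1 fallback are all assumptions, since the thread states neither the real limit nor the fallback code page.

```python
# Hypothetical value: the thread does not state mIRC's actual internal limit.
ASSUMED_LIMIT = 512

def encode_paste(text: str, limit: int = ASSUMED_LIMIT) -> bytes:
    """Toy model of the reported bug: skip UTF-8 encoding for over-long input."""
    if len(text) > limit:
        # Over the limit: the encoding step is skipped, so the text stays
        # single-byte (Latin-1 is an assumption made for this sketch).
        return text.encode("latin-1")
    return text.encode("utf-8")

line = "UTF-8: ©"
pasted_block = "\n".join([line] * 100)  # 899 characters, above the assumed limit

print(list(encode_paste(line)[-2:]))          # [194, 169]
print(list(encode_paste(pasted_block)[-2:]))  # [32, 169]
```

In this model a short paste ends in the two-byte UTF-8 form of the copyright sign, while a paste above the limit ends in the single byte 169, the same symptom as in the original report.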
