Originally Posted By: bwuser
Originally Posted By: Collective
There is no charset detection. mIRC uses Unicode, in which the first 256 characters are the same as those in ISO-8859-1.

No! Still wrong!
Originally Posted By: http://en.wikipedia.org/wiki/UTF-8
The first 128 characters of the Unicode character set (which correspond directly to the ASCII) use a single octet with the same binary value as in ASCII.

I'm not sure what point you're trying to make by quoting that. I never said that characters in the 128-255 range were sent unencoded. I said that when they were received (UTF-8-encoded or otherwise) they'd be displayed as Unicode, and that mIRC does not seek to override their display based on some codepage detection mechanism.
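To make the distinction between code points and encoded bytes concrete, here is a small Python sketch. It is only an illustration of the character sets involved, not anything taken from mIRC itself, and the helper names `same_as_latin1` and `utf8_length` are made up for this example.

```python
def same_as_latin1(code_point: int) -> bool:
    """True if the ISO-8859-1 byte with this value decodes to the
    Unicode character with the same code point."""
    return bytes([code_point]).decode("latin-1") == chr(code_point)


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs to encode this code point."""
    return len(chr(code_point).encode("utf-8"))


# Code points 0-255 match ISO-8859-1 one for one.
print(all(same_as_latin1(cp) for cp in range(256)))

# As UTF-8 *bytes*, only 0-127 stay a single octet; 128-255 take two.
print(utf8_length(0x41))   # 'A'
print(utf8_length(0xFC))   # 'ü'
print("ü".encode("latin-1"), "ü".encode("utf-8"))
```

So the Wikipedia quote is about the byte-level encoding, while the ISO-8859-1 statement is about which character each code point stands for. Both can be true at once.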

Quote:
Why would it decode UTF-8? It probably turns it into UTF-16 for internal use, but I wouldn't call that decoding.

"Decode" seems a perfectly reasonable term here. Don't make me get out a dictionary.

Quote:
Also, more importantly, ISO-8859-1 encoded umlauts are NOT valid UTF-8. If interpreted as UTF-8 without any conversion (which is what you suggested)

That isn't what he suggested. He suggested that invalid UTF-8 sequences would not be decoded (or "converted", as you put it). This is not entirely true (and hence neither is my assertion above) due to this bug; however, it is true for invalid sequences that the bug doesn't affect. Put another way: bytes that are not part of a valid UTF-8 sequence are treated as characters.
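The behaviour described above can be sketched roughly as follows. This is a guess at the general idea, assuming each stray byte is shown as the character with the same code point; it is not mIRC's actual code, and it does not reproduce the bug mentioned. The function name `decode_lenient` is hypothetical.

```python
def decode_lenient(data: bytes) -> str:
    """Decode valid UTF-8 sequences normally; treat any byte that is not
    part of a valid sequence as the character with that byte's value."""
    parts = []
    pos = 0
    while pos < len(data):
        try:
            # Everything from pos onwards is valid UTF-8: done.
            parts.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice we tried to decode.
            bad_index = pos + err.start
            # Keep the valid part before the offending byte.
            parts.append(data[pos:bad_index].decode("utf-8"))
            # Assumption: the stray byte maps to the same code point,
            # which is how ISO-8859-1 would read it.
            parts.append(chr(data[bad_index]))
            pos = bad_index + 1
    return "".join(parts)


print(decode_lenient("grüße".encode("utf-8")))  # valid UTF-8 input
print(decode_lenient(b"gr\xfc\xdfe"))           # ISO-8859-1 bytes
```

Note that nothing here detects a charset. An ISO-8859-1 byte pair that happens to form a valid UTF-8 sequence (for example `\xc3\xa4`) would still be decoded as UTF-8.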