Originally Posted By: Collective
There is no charset detection. mIRC uses Unicode, in which the first 255 characters are the same as those in ISO-8859-1.

No! Still wrong!
Originally Posted By: http://en.wikipedia.org/wiki/UTF-8
The first 128 characters of the Unicode character set (which correspond directly to the ASCII) use a single octet with the same binary value as in ASCII.
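
To make the 128 vs. 255 point concrete, here is a small Python sketch (the variable names are mine, purely for illustration). It checks that every code point below 128 encodes to the same single byte in UTF-8, and shows that an umlaut such as U+00E4 is one byte in ISO-8859-1 but two bytes in UTF-8.

```python
# Every code point below 128 is encoded in UTF-8 as one byte with the same value.
ascii_same = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))

# U+00E4 (a with diaeresis) sits above 127, so the two encodings differ.
latin1_a_umlaut = "\u00e4".encode("iso-8859-1")  # one byte: 0xE4
utf8_a_umlaut = "\u00e4".encode("utf-8")         # two bytes: 0xC3 0xA4

print(ascii_same, latin1_a_umlaut, utf8_a_umlaut)
```

So the code points may match ISO-8859-1, but the bytes on the wire only match for the ASCII range.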


Originally Posted By: Wims
Oh yeah, my bad. Then it might be because mIRC recognizes an invalid UTF-8 sequence and then doesn't decode it.

Why would it decode UTF-8? It probably turns it into UTF-16 for internal use, but I wouldn't call that decoding.
Also, more importantly, ISO-8859-1 encoded umlauts are NOT valid UTF-8. If interpreted as UTF-8 without any conversion (which is what you suggested), the characters turn into gibberish, not the correct visual representation.
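
A quick way to see this, sketched in Python (the helper `is_valid_utf8` is a name I made up for illustration): the ISO-8859-1 byte for "ü" is rejected by a strict UTF-8 decoder, and a lenient decoder can only substitute the replacement character U+FFFD for it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# "für" encoded as ISO-8859-1: the umlaut becomes the single byte 0xFC.
latin1_bytes = "f\u00fcr".encode("iso-8859-1")

# The same text encoded as UTF-8 for comparison.
utf8_bytes = "f\u00fcr".encode("utf-8")

# A lenient decode replaces the invalid byte with U+FFFD instead of failing.
lenient = latin1_bytes.decode("utf-8", errors="replace")

print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes), repr(lenient))
```

Either way, the umlaut does not survive unless something converts from ISO-8859-1 on purpose.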

So, anybody? Khaled? :(

To clarify: All I want to know is what mIRC does when it encounters an input sequence that is not valid UTF-8.
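
For illustration only, here is one strategy a client could use, sketched in Python. I am not claiming this is what mIRC does (that is exactly what I am asking). The function name `decode_with_fallback` and the choice of ISO-8859-1 as the fallback are my assumptions.

```python
def decode_with_fallback(data: bytes) -> str:
    """Try strict UTF-8 first; if the bytes are not valid UTF-8,
    reinterpret them as ISO-8859-1 (hypothetical fallback choice)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a code point, so this never fails.
        return data.decode("iso-8859-1")

from_utf8 = decode_with_fallback("f\u00fcr".encode("utf-8"))
from_latin1 = decode_with_fallback(b"f\xfcr")
print(from_utf8 == from_latin1)
```

If mIRC did something like this, it would explain why ISO-8859-1 umlauts display correctly even though they are not valid UTF-8. Other options would be dropping the bytes or showing a replacement character.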

Last edited by bwuser; 03/08/10 05:54 PM.