Quote:
Be honest, did you ever type € ?


May I answer this question with a question? Did you ever type ЙЮ,ЁОТ?

Because that text is the exact same byte sequence as what you just posted above, only read through the Cyrillic ANSI codepage instead of the Western one. I don't know Russian or any other Cyrillic-script language, but I'll bet that combination is far more likely to come up for them, and that's just one of many codepages.

The problem here is that your solution only works for people who talk to you in UTF-8 or in plain English rather than in ANSI codepages, and the latter are still common among Russian/Japanese/Chinese IRC'ers. Again, what's "uncommon" to you is not so uncommon to others. UTF-8 doesn't specifically choose "uncommon" character combinations; it just uses the ANSI code space (the bytes above 127) to encode the code points. A lot of people actually *use* the ANSI code space, and not just for fancy themed output, but to actually communicate.
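To make the overlap concrete, here is a small Python sketch (not anything mIRC does internally) that takes the UTF-8 bytes of the euro sign and reads the same bytes through two ANSI codepages. The choice of cp1252 and cp1251 is just my assumption for "Western" and "Cyrillic"; every one of the three decodes succeeds, which is why validity alone can't tell you what the sender meant.

```python
# The UTF-8 encoding of the euro sign is three bytes, all above 127.
raw = "€".encode("utf-8")  # b'\xe2\x82\xac'

# Read the very same bytes through three different encodings.
# cp1252 = Western ANSI, cp1251 = Cyrillic ANSI (assumed for illustration).
views = {enc: raw.decode(enc) for enc in ("utf-8", "cp1252", "cp1251")}

for enc, text in views.items():
    # One character under UTF-8, three characters under either codepage.
    print(f"{enc}: {len(text)} character(s)")
```

None of the decodes raises an error, so the bytes are "valid" text in all three.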

The better way to handle this really is to treat words individually. That way, if one word breaks the encoding, it won't affect the rest of the line, and the result will most accurately represent the line the sender intended. This will give you what you want as well. Neither solution will work 100% of the time, but rather than excluding a whole demographic of non-English speakers from this new feature, decoding on a word-by-word basis would fail less often while still working for everybody. Frankly, I can't even imagine many edge cases where the "word by word" approach would fail: only if the line was one big word, or if the themed output did not space out the input text (which is generally uncommon in itself).
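Roughly what I mean by word-by-word, as a Python sketch. The function name `decode_line` and the cp1251 fallback are made up for the example; a real client would fall back to whatever codepage the user has configured.

```python
def decode_line(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode each space-separated word on its own.

    A word that is valid UTF-8 is decoded as UTF-8; any other word is
    decoded with the fallback ANSI codepage (hypothetical default).
    """
    words = []
    for word in raw.split(b" "):
        try:
            words.append(word.decode("utf-8"))
        except UnicodeDecodeError:
            # This word is not valid UTF-8, so treat it as codepage text.
            # Only this word is affected; the rest of the line is untouched.
            words.append(word.decode(fallback, errors="replace"))
    return " ".join(words)


# A mixed line: one cp1251 word, one UTF-8 word, one plain ASCII word.
line = "привет".encode("cp1251") + b" " + "€".encode("utf-8") + b" hello"
print(decode_line(line))  # -> привет € hello
```

Decoding that whole line as UTF-8 in one go would fail on the first word and take the euro sign down with it. The sketch still guesses wrong when a codepage word happens to form valid UTF-8, which is the "not 100%" part.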


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"