While it looks like the "auto-complete before a UTF-8 string screws up the text by shifting the UTF-8 one byte" bug is fixed (hurray), auto-complete now has an interesting new bug:

When writing something in UTF-8, say UTF-8 encoded Japanese, tab auto-complete on channel names and nicknames only works before any non-ASCII characters. Any ASCII text typed after the first non-ASCII character is not caught by tab auto-complete.

I assume this is because once the 'UTF-8 input' flag is turned on, it is not turned off again when ASCII is input, seeing how the 128 ASCII characters are encoded identically (as single bytes) in UTF-8. However, because the flag is not turned off on 'simple' (single-byte) input, auto-complete behaves counterintuitively right now. A partial implementation of tab auto-complete for the single-byte part of UTF-8 strings should be a trivial extension of the current implementation.
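To illustrate why this should be cheap: every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte can never be part of a Japanese character, and the word under the cursor can be found with a plain backwards byte scan. The sketch below is only my guess at the idea, not how mIRC actually works internally; the function name and the "stop at a space" rule are assumptions.

```python
def ascii_token_start(buf: bytes, cursor: int) -> int:
    """Return the index where the ASCII word ending at `cursor` begins.

    Hypothetical helper: the name and the space-delimiter rule are
    assumptions for illustration, not mIRC's real behaviour.
    """
    i = cursor
    # Walk left while the previous byte is a non-space ASCII byte.
    # Every byte of a multi-byte UTF-8 sequence is >= 0x80, so the scan
    # stops at the first one without needing to decode anything.
    while i > 0 and buf[i - 1] < 0x80 and buf[i - 1] != 0x20:
        i -= 1
    return i


line = "日本語 #mi".encode("utf-8")
start = ascii_token_start(line, len(line))
print(line[start:].decode("ascii"))  # prints "#mi"
```

The same scan also stops correctly when there is no space at all between the Japanese text and the ASCII word, since the Japanese bytes themselves end the scan.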

Alternatively, auto-complete could be extended to work on arbitrary UTF-8 strings, although for languages written without spaces (Chinese, Japanese, etc.) that would require a fairly fancy word-boundary detection algorithm just to figure out what is actually being auto-completed, or else language-specific tokenisers hooked up to mIRC. That would probably make mIRC quite a bit bigger for a feature a large part of the mIRC-using community couldn't care less about... so if I could recommend a proper bugfix: just extend auto-complete to operate on the ASCII block of UTF-8 too.
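For what it's worth, here is roughly what I mean by the ASCII-only fix, as a rough sketch rather than a real patch. The `complete` function, the candidate list, and the "first case-insensitive prefix match wins" rule are all made up for the example; mIRC's real matching rules may well differ.

```python
def complete(buf: bytes, cursor: int, candidates: list[str]) -> bytes:
    """Complete the ASCII word ending at `cursor` inside a UTF-8 buffer.

    Hypothetical sketch: names and matching rules are assumptions.
    """
    # Find the start of the trailing run of non-space ASCII bytes.
    start = cursor
    while start > 0 and buf[start - 1] < 0x80 and buf[start - 1] != 0x20:
        start -= 1

    prefix = buf[start:cursor].decode("ascii").lower()
    if not prefix:
        return buf  # nothing ASCII to complete; leave the line untouched

    # Take the first candidate that matches, ignoring case.
    for name in candidates:
        if name.lower().startswith(prefix):
            return buf[:start] + name.encode("utf-8") + buf[cursor:]
    return buf  # no match; leave the line untouched


line = "こんにちは Pom".encode("utf-8")
result = complete(line, len(line), ["#mirc", "Pomax"])
print(result.decode("utf-8"))  # prints "こんにちは Pomax"
```

Bytes before the ASCII word are copied through untouched, so the non-ASCII part of the line can't get shifted or mangled by the completion.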

- Pomax
nihongoresources.com

Last edited by Pomax; 13/11/07 01:43 PM.