Whoo! Here's a little newbie FAQ I've written to help people who are perhaps a little confused by it all. It's been put up at www.mirc.net/newbie/unicode.php but I figured I'd include a copy here :-)

---
What is Unicode?

Unicode is a standard that gives every character in every writing system its own number, so that all kinds of text can be exchanged and displayed across the Internet. Complex characters, such as those used in Chinese, Japanese and Korean writing, cannot be represented in ASCII, the basic character set that IRC has traditionally relied on. Without Unicode, you'll get a bunch of black boxes or question marks in place of the actual characters. Unicode means that people can communicate on IRC in their own language, or (just for fun) use weird symbols they couldn't use before, both in their messages and in nicknames. With the Internet growing so quickly, and mIRC users being such a diverse bunch, Unicode support is becoming a very popular feature in many programs.

For these characters to appear in nicknames or channel names, the IRC server you're connected to must support Unicode. At this time, very few IRC servers do. This does NOT, however, affect your ability to send/receive text.
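
To see where the question marks and junk characters come from, here is a small Python sketch (Python is just used for illustration here, it has nothing to do with mIRC itself). It takes three Japanese characters and shows what happens when they are forced into ASCII, when their UTF-8 bytes are misread as a single-byte encoding, and when UTF-8 is used on both ends. The variable names are my own.

```python
text = "\u65e5\u672c\u8a9e"  # three Japanese characters

# ASCII has no way to represent these characters, so each becomes "?".
as_ascii = text.encode("ascii", errors="replace").decode("ascii")

# UTF-8 bytes misread as a single-byte encoding turn into unrelated symbols.
as_garbage = text.encode("utf-8").decode("latin-1")

# Encoding and decoding with UTF-8 on both ends gets the original text back.
round_trip = text.encode("utf-8").decode("utf-8")

print(as_ascii)
print(len(as_garbage), "garbage characters instead of", len(text))
print(round_trip == text)
```

The middle case is roughly what a client without UTF-8 display shows when somebody else sends UTF-8 text.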

What is UTF-8?

UTF-8 is the encoding that turns Unicode characters into bytes that can be sent over IRC and displayed on your screen. UTF stands for Unicode Transformation Format. The "8" means that it works in 8-bit units (bytes), using 1-4 of them for each character. There are other encodings for Unicode, such as UTF-7, UTF-16 and UTF-32. UTF-7 is effectively obsolete, and UTF-16 and UTF-32 are rarely used for sending text across the Internet, with UTF-8 being the usual choice. It is also backwards compatible with ASCII, which means that plain ASCII text is already valid UTF-8 and is encoded as exactly the same bytes.
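
As a rough illustration of the "1-4 bytes" point and the ASCII compatibility, the following Python sketch encodes a handful of sample characters and counts the bytes each one needs. The sample characters and the helper name utf8_length are my own choices for the example.

```python
# Sample characters: A, e-acute, euro sign, a Japanese character, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\u65e5", "\U0001F600"]

def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# ASCII compatibility: plain ASCII text produces identical bytes either way.
ascii_text = "Hello, IRC!"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII and UTF-8 bytes identical:", same_bytes)
```

Plain English letters stay at one byte each, which is why ordinary ASCII messages look the same whether or not UTF-8 is in use.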

How does this affect my mIRC?

UTF-8 support was added in mIRC version 6.17. This version, or higher, must be used for these characters to display correctly. Support has been added to status, channel, query and DCC windows, as well as the nicklist, titlebar, switchbar and tool tips. Various other windows have also got support for display/encoding of UTF-8 text.

For the average English-speaking user (or anyone else who uses a language that displays fine with ASCII), there's not a huge change. You probably won't need to enable UTF-8 support. However, others may still try to speak to you in their own language and therefore use Unicode characters. If you want to view these characters correctly, rather than as black boxes, you can enable UTF-8 support. There is no harm in doing so whether you intend to use it or not. If UTF-8 support becomes more widely implemented on IRC servers, this may become a more essential feature in the future.

How can I use this in mIRC?

This feature is enabled by default. However, should you need to enable it for some reason, go to Tools > Options > IRC > Messages and check the box saying 'UTF-8 display'. Individual window support is also available via the Fonts dialog. Right click on the channel name in the switchbar and click on 'Font...' and from the drop down list choose 'Display and encode' or 'Display only'.

Note that mIRC itself is not a Unicode application; it can only display UTF-8 text, or encode outgoing messages as UTF-8, in IRC channel windows, DCC chat windows and so on. Therefore, Unicode characters will not display correctly in various text boxes throughout mIRC, such as the ones found in the Address Book, /uwho or the Options dialog. For characters to display correctly in the editbox, you will need to enable the 'Multibyte editbox' option, again in Tools > Options > IRC > Messages.

It is worth noting that enabling 'UTF-8 display' in the Options dialog will not automatically cause your outgoing messages to be encoded as UTF-8 - you need to enable this separately in the Font dialog, by choosing 'Display and encode' from the dropdown menu.

Will this affect my scripts?

Not really. Protection scripts which ban "$nick" will work as usual, as will any other identifiers, and any scripts which set variables containing nicknames, text or channel names or use such data in other ways.

Has extra support been added in scripting related to UTF-8?

Of course! Three identifiers have been added should you need them - $utfencode, $utfdecode and $isutf.

$utfencode and $utfdecode will encode/decode given text respectively. $isutf returns a numerical value based on the state of the provided text, where 0 means the text is not UTF-8, 1 means it is plaintext (e.g. just normal English characters), and 2 means it is UTF-8 text (e.g. Chinese/Japanese/Korean writing, but could be many other languages too).
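
To make the three $isutf states a bit more concrete, here is a rough Python stand-in for the same idea. This is only my approximation of the behaviour described above, not mIRC's actual implementation, and the function name classify_text is made up for the example.

```python
def classify_text(raw: bytes) -> int:
    """Rough stand-in for the three $isutf states described above.

    0 = not valid UTF-8
    1 = plain text (ASCII only)
    2 = valid UTF-8 containing multi-byte characters
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return 0
    # Valid UTF-8 made only of ASCII bytes counts as plain text.
    return 1 if raw.isascii() else 2

print(classify_text(b"hello"))                              # plain English text
print(classify_text("\u65e5\u672c\u8a9e".encode("utf-8")))  # Japanese in UTF-8
print(classify_text(b"\xe9t\xe9"))                          # Latin-1 bytes, not UTF-8
```

A script could use a check like this to decide whether text needs decoding before it is processed further.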

Okay, so why does it not display right on the network I use?

Text should display fine, provided you have UTF-8 enabled and are using mIRC 6.17 or above. You cannot include Unicode chars in nicknames or channel names, however, unless the server you use supports UTF-8. You will notice that a bunch of jargon text appears when you connect to IRC - part of this is the settings for the server, for example, the max number of channels you can join (MAXCHANS=15) or the max length of topics (TOPICLEN=400). If you see "CHARSET=utf-8", then nicknames and channels should show up fine. If you do not see it (and none of the major networks currently support it), then you are limited to text only. For an example of a network which does support CHARSET=utf-8 though, try /server irc.unilang.org.
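
If you'd rather check for CHARSET=utf-8 with a script than by eye, the idea is simply to split that server settings line into its NAME=value tokens. Here is a minimal Python sketch of that. The sample line, the server name irc.example.net and both function names are made up for illustration, and the line layout assumed is the usual "server 005 nick tokens :trailing text" form.

```python
def parse_isupport(line: str) -> dict:
    """Split the tokens of a server settings (005) line into a name -> value mapping."""
    # Drop the trailing human-readable part (":are supported by this server").
    head = line.split(" :", 1)[0]
    parts = head.split()
    # parts looks like [":server", "005", "nick", token, token, ...]
    tokens = {}
    for tok in parts[3:]:
        name, _, value = tok.partition("=")
        tokens[name] = value
    return tokens

def server_supports_utf8(line: str) -> bool:
    """True if the line advertises CHARSET=utf-8 (case-insensitive)."""
    return parse_isupport(line).get("CHARSET", "").lower() == "utf-8"

# Hypothetical example line, built from the settings mentioned above.
sample = (":irc.example.net 005 Mentality MAXCHANS=15 TOPICLEN=400 "
          "CHARSET=utf-8 :are supported by this server")

print(parse_isupport(sample))
print(server_supports_utf8(sample))
```

Servers may spread their settings across several of these lines, so a real check would need to look at all of them.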

Furthermore, the font you use within mIRC may not be able to display Unicode characters correctly. Fixedsys, mIRC's default font, is a poor choice for Unicode display. There is no specific "Unicode font", but there are a number of current fonts which are deemed acceptable (all of which should be available in your default Font dialog, View > Font):

Lucida Sans Unicode
Times New Roman
Arial
Courier
Impact
Shruti
MS Gothic
Bitstream Cyberbit (not available by default)

Bitstream Cyberbit is available from http://ftp.netscape.com/pub/communicator/extras/fonts/windows/Cyberbit.ZIP, and documentation concerning it can be found at http://ftp.netscape.com/pub/communicator/extras/fonts/windows/ReadMe.htm

Fonts vary in how many Unicode characters they can display, but the ones mentioned above are popular choices, and you should not run into too many problems.

Where can I get more information?

Plenty of information is available on the web, including these good links:

http://www.unicode.org/ - The Unicode homepage.
http://en.wikipedia.org/wiki/Unicode - Wikipedia article on Unicode.
http://en.wikipedia.org/wiki/UTF-8 - Wikipedia article on UTF-8.
http://www.microsoft.com/typography/default.mspx - Microsoft's Typography website.
http://www.google.com - A brilliant search engine!
---

If you have something not too advanced to add, a correction to any wrong information (I'm no guru!) or anything else to say, please send me a private message rather than fill up this thread. Cheers!

Regards,


Mentality/Chris