Codepages support - Reply to Khaled's sticky #227970 01/12/10 10:48 AM
MeiR_ct (OP)
Nutrimatic drinks dispenser
Joined: Jan 2009
Posts: 6
[Quote from: http://forums.mirc.com/ubbthreads.php?ubb=showflat&Number=223676&page=1#Post223676]:
Originally Posted By: Khaled

As some of you may have noticed, the new version of mIRC no longer supports codepages. This is because mIRC is now a Unicode application and will only send and receive UTF-8 encoded text. Ideally, the aim is to make IRC as a whole non-reliant on codepages, so that everyone will be able to view all characters in all languages.

In older versions of mIRC, every user had to manually configure each of their status/channel/query windows with a specific font and codepage to display one particular language, and this only worked for that language. If someone new joined the channel, or someone joined that channel and spoke a different language, text would appear as garbage to them and to others. The use of UTF-8 means that none of that is necessary any more. Everyone on IRC will be able to read everyone else's text, regardless of language, without any fiddling or configuring of codepages.

The issue with codepages is most obvious in the channels list window. The channels list window lists thousands of channel names and topics, many of which use codepage-specific encodings. However the window can only display text using one codepage. This means that channel names or topics not using that codepage will be displayed as garbage. The same issue applies to all channel and query windows where users chat in multiple languages.
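The garbage effect described above can be reproduced in a few lines. The following is a minimal sketch in Python, not mIRC code; the sample word and the choice of Windows-1255 and Windows-1252 are assumptions picked purely for illustration.

```python
# Hypothetical sample: a Hebrew word as a sender configured for
# Windows-1255 would put it on the wire.
text = "שלום"
wire_bytes = text.encode("cp1255")

# A receiver whose window uses the same codepage recovers the text.
same_codepage = wire_bytes.decode("cp1255")

# A receiver whose window uses Windows-1252 gets unrelated Latin letters,
# because the same byte values mean different characters there.
other_codepage = wire_bytes.decode("cp1252")

# With UTF-8 on both ends there is no per-window setting to get wrong.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

print(same_codepage == text)    # the sender's text survives
print(other_codepage == text)   # the mismatched window shows garbage
print(utf8_round_trip == text)  # UTF-8 survives on every client
```

Nothing in the bytes themselves says which codepage produced them, which is why a single window showing many senders cannot get all of them right.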

While it may seem that mIRC v6.35 handles codepages correctly, there are a large number of situations where it cannot do so due to a lack of context, just like in the channels list window, and many users have reported issues, such as corrupted text and incorrect encodings, over the years. It would not be possible to add codepage support to mIRC v7.1 without leading to the same issues that were present in all previous versions of mIRC.

Some IRC clients support "hybrid" encodings that combine both UTF-8 and a specific codepage. Unfortunately if mIRC had done that it would have created even more confusion since the result would not have been valid UTF-8 and the codepage issues would have continued.
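One common reading of "hybrid" is a receiver that tries UTF-8 first and falls back to a codepage when that fails. The sketch below assumes that reading; `hybrid_decode` and the Windows-1252 fallback are hypothetical and do not describe any particular client. It shows where the guess goes wrong: some codepage text is also valid UTF-8.

```python
def hybrid_decode(raw: bytes, fallback: str = "cp1252") -> str:
    """Try strict UTF-8 first; if that fails, reinterpret with the fallback codepage."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


# Both senders typed the same word, one in UTF-8 and one in Windows-1252.
from_utf8_sender = hybrid_decode("naïve".encode("utf-8"))
from_legacy_sender = hybrid_decode("naïve".encode("cp1252"))

# A Windows-1252 sender who really typed the two characters "Ã©" produces
# bytes that are also valid UTF-8, so the guess silently picks the wrong one.
typed_by_legacy_sender = "Ã©"
misread = hybrid_decode(typed_by_legacy_sender.encode("cp1252"))

print(from_utf8_sender == from_legacy_sender)  # the fallback works here
print(misread == typed_by_legacy_sender)       # but not here
```

The fallback only helps on the receiving side. Whatever the client sends in the codepage is still not valid UTF-8 for anyone else.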

During the development of the Unicode version of mIRC, the choice was either to continue to support codepages, along with all the encoding issues that come with them that affect all users, and to perpetuate these problems for many years to come, or to try to resolve the issue once and for all, for IRC as a whole, by moving fully to UTF-8.

I realize this is going to be an issue for some users - there are many networks/servers/channels that use codepage-specific encodings. I am hoping that the benefit of UTF-8, which has been supported by mIRC since 2006, will be enough of an incentive to convince users to move away from codepages.


Dear Khaled,
I'm not trying to sound rude, but I think it is important to understand two things:
1. Clients are supposed to conform with servers, and not the other way around.
2. mIRC is very popular without doubt, but it is not the only client there is.

I really welcome your step, but unfortunately it came too early.
Before mentioning future development and progress of the rest of IRC world, let me please remind the current situation.

About servers:
IRCds don't have (and probably will never have) UTF support for nicknames and channel names, or config directives.
Take the case of a user with mIRC 7.* who connects to a local, country-specific server that allows nicks in the local charset, for example Greek or Turkish.
This user won't be able to use that kind of nick, although others will, since IRCds support only codepages in nicks. He will simply be forced to use an English nick.
The same goes for channels: if channels created by others already exist, he just won't be able to join them.
Bottom line: IRC commands should be sent in codepages.

About clients:
The latest mIRC is entirely Unicode, while other clients still have codepage support as well.
It will take a lot of time and work for developers of other clients to catch up with mIRC, and in the meantime, communities would have to force all users of other clients to switch to UTF if they want mIRC users to be part of the conversation.
More than that, there're still considerable clients which don't have UTF support at all. That mainly includes bots (PHP\perl\etc.), browser-based clients (PJIRC, flash clients) or built-in client windows inside other applications (like SMIRC in eMule).
So even if we demand from users to switch to Unicode, some of them just won't be able, and obviously, no one can force the whole world to use mIRC.

Please seriously consider bringing back codepage support as an option in mIRC, alongside Unicode support.
Each channel, community, or network will then choose how to handle character encoding, preferred language, and client recommendations for its users.


I truly believe that many mIRC users, including me, and of course server admins, will thank you and appreciate it.

As I said above, I believe that your idea is important, but Khaled, please leave it to free choice for the time being.

Thanks a lot in advance,
Meir.

Re: Codepages support - Reply to Khaled's sticky [Re: MeiR_ct] #227971 01/12/10 11:15 AM
Riamus2
Hoopy frood
Joined: Oct 2004
Posts: 8,327
mIRC gave people years of support with both UTF8 and code pages. During that time, anyone who wanted to could have made the switch easily. There was plenty of time. Khaled has decided that mIRC is going to move toward the future now instead of providing support to outdated methods of communication (code pages) even if there are people and servers who have not made the switch. Anyone who absolutely cannot or refuses to switch can continue using mIRC 6.35 without a problem.

Khaled has stated that he's not bringing back code page support and has given explanations multiple times. Bringing it up again is not likely to change his mind.

IRCDs *can* support UTF8 nicks and channels. Many do. It's true that for those that do not yet support that, it may take a lot of work and time to update the software to support it. They have had many years since Unicode became widespread to do so. Just because some have chosen to ignore where everyone else is headed doesn't mean that everyone else has to waste time and effort propping them up on their outdated software.

Also, most clients have supported Unicode for a long time just like mIRC has. Those that do not support it yet are again outdated. Outdated software isn't worth using, imo. There are plenty of choices that support Unicode if someone doesn't want to use mIRC and they can easily switch. Holding mIRC back because some clients don't support Unicode is not a good reason by any means.

Quote:
no one can force the whole world to use mIRC


By that same token, no one is forcing anyone to use mIRC. If someone wants code page support, they are free to use other clients or even older versions of mIRC. They aren't required to use the newer versions of mIRC.


Invision Support
#Invision on irc.irchighway.net
Re: Codepages support - Reply to Khaled's sticky [Re: MeiR_ct] #227982 01/12/10 08:47 PM
argv0
Hoopy frood
Joined: Oct 2003
Posts: 3,918
Dear MeiR_ct,

Originally Posted By: MeiR_ct
1. Clients are supposed to conform with servers, and not the other way around.


Prove it. IRC servers have often conformed to clients, and IRC clients have often conformed to servers. Some servers explicitly disallow certain control codes in user information or channel names ever since colors were introduced *by IRC clients*. Other protocols have had similar push-and-pull (ever used HTTP recently? A lot of it was redefined by de facto usage) and server developers chose the pragmatic approach of implementing support for clients on many occasions. The world is not black and white, and IRCd developers should not choose this, of all fights, to harp on that argument.

Originally Posted By: MeiR_ct
2. mIRC is very popular without doubt, but it is not the only client there is.


You're right. This is why the change was made. Name another client that exclusively uses codepages? I know of none. mIRC added UTF-8 compatibility years ago specifically to be more compatible with Unicode clients, not less. 7.x is the same story. If anything, the only client mIRC is breaking compatibility with... is mIRC. The vast majority of the clients still using codepages are mIRC 6.x clients. Do you have data that offers different statistics?

Originally Posted By: MeiR_ct
I really welcome your step, but unfortunately it came too early.


It came as a surprise, but it certainly did not come "early". Unicode has been the standard encoding on the internet for many years now. Practically speaking, it's been *the* way to communicate on the web for at least the last 6. How are you qualifying "early"? When should codepage support be dropped? Note that this contradicts a claim you made below that "IRC commands should be sent in codepages."-- if you believe that statement, then Unicode should never come at all-- if this is your actual opinion, I don't think we're going to get anywhere.

Originally Posted By: MeiR_ct
About servers:
IRCds don't have (and probably will never have) UTF support for nicknames and channel names, or config directives.


A few IRC servers already do (I can name RusNet, off-hand). Although this already disproves your claim outright, let me elaborate: there is no reason why they can't. It's technically possible, and in fact solves the problem much more effectively than clients can. Since servers can keep track of each user's codepage (and each channel's) with server commands, they can also handle transcoding between these encodings, so that clients don't need to support any encoding besides their local one. This is something the client cannot do unless servers exposed a command to query the remote encoding for a specific user-- much more complicated (and it still requires servers to do extra book-keeping anyway). Therefore implementing server-side encoding awareness seems to me like the ideal solution, something IRCd developers should start contemplating. Again, some already have. So it's possible, it's better, and there's no real reason why not, besides developer laziness, FUD, delegating responsibility ("the client should do it!") and crap about the RFC not explicitly saying it. Do you have an explicit reason why this is not possible?
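The server-side book-keeping described above could look roughly like this. It is a minimal sketch in Python under stated assumptions: `EncodingAwareRelay`, its methods, and the nicknames are hypothetical, and this is not how RusNet or any real IRCd is implemented.

```python
class EncodingAwareRelay:
    """Hypothetical relay that remembers each user's declared encoding."""

    def __init__(self, default: str = "utf-8") -> None:
        self.default = default
        self.encodings = {}  # nick -> encoding name

    def set_encoding(self, nick: str, encoding: str) -> None:
        # Stands in for a server command through which a user declares an encoding.
        self.encodings[nick] = encoding

    def relay(self, sender: str, recipient: str, raw: bytes) -> bytes:
        # Decode with the sender's encoding, re-encode for the recipient.
        text = raw.decode(self.encodings.get(sender, self.default), errors="replace")
        return text.encode(self.encodings.get(recipient, self.default), errors="replace")


relay = EncodingAwareRelay()
relay.set_encoding("nikos", "cp1253")  # legacy Greek codepage user
relay.set_encoding("anna", "utf-8")    # UTF-8-only user
relay.set_encoding("bob", "cp1252")    # Western European codepage user

sent = "καλημέρα".encode("cp1253")
to_anna = relay.relay("nikos", "anna", sent)     # arrives as UTF-8
back_to_nikos = relay.relay("anna", "nikos", to_anna)
to_bob = relay.relay("nikos", "bob", sent)       # Greek has no cp1252 form

print(to_anna.decode("utf-8"))
print(back_to_nikos == sent)
print(to_bob)
```

Each client only ever handles its own encoding. The last case shows the limit of transcoding: a codepage that lacks the characters can only receive placeholders, which is one more argument for UTF-8 on every end.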

Originally Posted By: MeiR_ct
Bottom line: IRC commands should be sent in codepages.


Says who? Are you the IRC god? IRC commands can be (and are) sent in whatever encoding you want. How the data is interpreted is application defined-- in this case the IRCd can choose to handle it or not. Note that this interpretation is as per spec, if you're one of the literalists using the RFC as your bible.

Originally Posted By: MeiR_ct
About clients:


You've made a lot of false claims about clients here, let's look at them:

Originally Posted By: MeiR_ct
It will take a lot of time and work for developers of other clients to catch up with mIRC


Every popular IRC client I know of already has UTF-8 support. More clients use UTF-8 now than codepages, if we're talking about this on a per-client basis (not usage per client). Every OSX client has zero codepage support, because OSX has zero codepage support (everything is Unicode). Every Java IRC client should also have builtin Unicode support, since Unicode is the standard encoding there too. Every popular Linux client I know of uses the system locale to get encoding support, which includes UTF-8. We've already discussed every IRC client for 2 operating systems and one entire language. It's safe to say that mIRC is the one catching up, not the other way around.

Originally Posted By: MeiR_ct
More than that, there're still considerable clients which don't have UTF support at all. That mainly includes bots (PHP\perl\etc.),


Bots, for the most part, don't need unicode support. They perform actions on set commands, which are (again for the most part) made up of purely ASCII characters. More importantly, those clients need to be updated anyway. Just as you argue that mIRC should support ENCODING XYZ where XYZ is a codepage, these bots should support ENCODING XYZ where XYZ is UTF-8.

Originally Posted By: MeiR_ct
browser-based clients (PJIRC, flash clients) or built-in client windows inside other applications (like SMIRC in eMule).


As I mentioned, Java uses Unicode, Flash probably uses the same. These clients should have no problem. Browser based (CGI/AJAX) clients are even easier, because UTF-8 is *THE* standard for web communication. PJIRC supports utf-8, and has for the last 6 years at least, so I don't know where you got this factoid from.

Originally Posted By: MeiR_ct
So even if we demand from users to switch to Unicode, some of them just won't be able, and obviously, no one can force the whole world to use mIRC.


Similarly, you can't force mIRC to support archaic encodings. This is a silly argument. Software moves forwards, not backwards. Nobody is forcing mIRC to be used, and I've mentioned a boatload of clients that support UTF-8 as well as, if not better than, mIRC. Use one of those, if you're having problems communicating with others on IRC. If not, this is a non-issue.



- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Re: Codepages support - Reply to Khaled's sticky [Re: argv0] #228011 03/12/10 01:23 AM
MeiR_ct (OP)
Nutrimatic drinks dispenser
Joined: Jan 2009
Posts: 6
argv0 and Riamus2, thanks for the time you have invested to reply.

I'm taking you back to a key sentence in my post:
Originally Posted By: MeiR_ct
Before mentioning future development and progress of the rest of IRC world, let me please remind the current situation.


All you said is right, and I support moving completely to Unicode.
But, as argv0 said, it came as a surprise! And therefore, there is a complete mess now, especially in local networks of non-English countries.

Every fact or claim that I've mentioned about clients referred to command input, and not to user message input/display.
Even after you enable UTF encode/decode in all those major clients, basic commands like "nick" and "join" will [or should] be sent with codepages, as the way it was in mIRC until 6.35, since the vast majority of servers just won't understand anything else.
Have a look in here: http://searchirc.com/ircd-versions
And about bots etc., it's not at all a given that, for example, a security bot written in PHP was programmed to recognize Unicode nicknames, or to store them in a database, or to UTF-encode its commands to the server.

You said that implementing utf nicks support in ircd is technically possible, and the reason it's not on the go, is developers' laziness.
If talking technically, let's take the simplest examples, such as nick prefixes, or colons. Do you have any idea how much implementation work is needed to redefine structures, parameter syntax, arguments, etc.? And that's only for these few characters!
Well, you talked like it's only re-writing a bunch of few lines in the source code, while actually it's kind of starting from scratch in the majority of methods.

About sticking to 6.35, we should understand something. As soon as it gets to a release of higher version, aka upgrade, users will almost always update their software, and rightly; mIRC 7 is not only a matter of moving to Unicode. There are, and there will be new features, improvements, and of course, bug fixes.
It can barely be controlled by network owners.

So, even assuming that in some past cases servers have conformed to clients, the current mess on the kind of servers I've mentioned should be taken into account.
And even assuming that developers' laziness is the main obstacle on the IRCd side, extra time should be given to them, since the work needed is substantial and far from simple or immediate.

I didn't have much time to fully answer all your replies, but I assume this discussion, even if it brings no benefit, will continue.

My main claim: Although the idea is great and the work should be done, mIRC could have waited for it rather than trying to force the idea on IRCds as it did, leaving users caught between the hammer and the anvil.

Re: Codepages support - Reply to Khaled's sticky [Re: MeiR_ct] #228018 03/12/10 07:39 AM
argv0
Hoopy frood
Joined: Oct 2003
Posts: 3,918
Originally Posted By: MeiR_ct
Even after you enable UTF encode/decode in all those major clients, basic commands like "nick" and "join" will [or should] be sent with codepages, as the way it was in mIRC until 6.35, since the vast majority of servers just won't understand anything else.


Err, no, this is not true. It's more likely that these clients will simply use a single encoding for all of their IO. As I pointed out, all of the clients I've seen on OSX only understand one encoding: Unicode. I highly doubt they behave the way you're describing (or wishing). Furthermore, there's no such rule that "commands must be sent using codepages". It's not in the RFC, it doesn't exist in reality either.

Originally Posted By: MeiR_ct
Have a look in here: http://searchirc.com/ircd-versions


Those statistics are vastly oversimplified. Many networks will use one of those ircds as a base, but perform many customizations on it that don't get detected as a separate version. The number of networks that understand unicode commands is growing daily.

Besides, this isn't even a valid comparison. Even if most servers don't yet understand unicode commands-- what are we comparing this to? Codepaged commands? I don't know of *ANY* server that accepts nicknames using codepages, specifically because it's impossible to know what codepage you want to be using (actually, RusNet might support this, but RusNet has a robust encoding system, so internally it's probably not storing codepage data at all). I think you're confusing this with servers accepting *BINARY DATA* for *CHANNELS* (not nicknames, those are usually strictly defined in the ASCII range). This is indeed an issue, but it's not a server issue. It has to do with the fact that mIRC assumes everything the server is sending is UTF-8. This could be fixed without strictly supporting codepages, mIRC just needs to be more aware about how it reads channel names from a server. Occasionally this is impossible, but for the most part, mIRC could be smarter about the way it does things with channel names. Again, this has nothing to do with supporting codepages, though.
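As an illustration of what "being smarter" about channel names might mean, here is a minimal sketch in Python. It is an assumption about one possible approach, not a description of how mIRC works: `display_name` and `wire_name` are hypothetical helpers, and the technique used is Python's `surrogateescape` error handler, which keeps undecodable bytes recoverable.

```python
def display_name(raw: bytes) -> str:
    """Decode as UTF-8, parking any invalid bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def wire_name(shown: str) -> bytes:
    """Turn a name from display_name() back into the exact original bytes."""
    return shown.encode("utf-8", errors="surrogateescape")


# Hypothetical channel created by a Windows-1251 user: not valid UTF-8.
legacy_channel = "#чат".encode("cp1251")
# The same name created by a UTF-8 user.
utf8_channel = "#чат".encode("utf-8")

try:
    legacy_channel.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# The legacy name will not display correctly, but a JOIN built from it
# carries exactly the bytes the server knows the channel by.
legacy_round_trip = wire_name(display_name(legacy_channel))

print(strict_decode_failed)
print(legacy_round_trip == legacy_channel)
print(display_name(utf8_channel))
```

This needs no knowledge of the original codepage. The client treats the name as an opaque identifier for sending while still assuming UTF-8 for display.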

Originally Posted By: MeiR_ct
You said that implementing utf nicks support in ircd is technically possible, and the reason it's not on the go, is developers' laziness.
If talking technically, let's take the simplest examples, such as nick prefixes, or colons. blah blah blah majority of methods.


Are you an IRCd developer? You don't sound like one. Colons are part of the IRC protocol. Enabling UTF-8 doesn't change the fact that colons are not allowed in nicknames. Furthermore, the ":" is a single byte character in UTF-8, so nothing would change here, even if it was magically allowed in the spec. "Supporting unicode" does not mean "any character is allowed". Also, please don't put words in my mouth, because I never claimed it was easy. I actually know that it's not easy, but I don't really care. Why should I care that it's not easy? Is that some kind of valid excuse? I wouldn't think so. So let's move on.
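The point about ":" being a single byte can be checked directly. In UTF-8, every byte of a multi-byte character is 0x80 or higher, so a byte-level parser that looks for ASCII delimiters never finds one inside a non-ASCII character. The following is a minimal sketch in Python; `split_trailing` is a hypothetical helper and the sample line is invented for illustration.

```python
from typing import Tuple


def split_trailing(line: bytes) -> Tuple[bytes, bytes]:
    """Split an IRC line at the first " :" that introduces the trailing parameter."""
    head, _, trailing = line.partition(b" :")
    return head, trailing


# Hypothetical message with a Greek channel name and Greek text.
line = "PRIVMSG #ελληνικά :γειά σου: καλημέρα".encode("utf-8")
head, trailing = split_trailing(line)

# The colon is the same single byte in ASCII and UTF-8.
colon_is_one_byte = ":".encode("utf-8") == b":"

# No byte of an encoded non-ASCII character falls in the ASCII range,
# so none of them can be mistaken for ":" or a space.
no_ascii_inside = all(
    byte >= 0x80 for char in "ελληνικά" for byte in char.encode("utf-8")
)

print(head.decode("utf-8"))
print(trailing.decode("utf-8"))
print(colon_is_one_byte, no_ascii_inside)
```

The parser works on bytes and never needs to know the text is UTF-8, which is why delimiter handling is unaffected by allowing it.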

Originally Posted By: MeiR_ct
As soon as it gets to a release of higher version, aka upgrade, users will almost always update their software


Ahh, the irony. You were paying attention to my previous response, weren't you? I've gathered data on this very issue before when investigating the number of users still on codepages. The vast majority in channels with codepage encodings are users on mIRC. Remember, 7.x users can't join these codepage channels, so we're talking 6.x. In fact, the data I collected showed that most users on this channel averaged an mIRC version of ~6.2 (I saw enough 5.x's to make you cry). People aren't upgrading. That is precisely the problem. If every 6.x user upgraded to 7.x *TODAY*, this discussion would be over, because there would be no codepage problem. mIRC is virtually the last client to still be using codepages. The other clients that use codepages are doing so for compatibility with mIRC only. If you visited that PJIRC link in my last response you would have seen that the conversation was exactly about trying to communicate with mIRC users who could not use UTF-8 (because PJIRC defaults to UTF-8). The conversation took place in 2004.

For what it's worth, mIRC 6.16+ supports UTF-8, and mIRC 7.x supports disabling UTF-8 outright. That means 2 things:

1) Most 6.x users can start using UTF-8. Tell your friends to enable it.
2) 7.x users can disable UTF-8 if they really want to talk to their 6.x friends.


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Re: Codepages support - Reply to Khaled's sticky [Re: MeiR_ct] #228019 03/12/10 08:33 AM
Voglea
Babel fish
Joined: Nov 2009
Posts: 81
Just make /breplace multibyte-aware (for Unicode symbols) and there will be no problems with codepages; a proxy (mIRC script) could then change the codepage "on the fly". :)

Re: Codepages support - Reply to Khaled's sticky [Re: argv0] #228023 03/12/10 01:13 PM
MeStinkBAD
Fjord artisan
Joined: Apr 2003
Posts: 342
Originally Posted By: argv0
Err, no, this is not true. It's more likely that these clients will simply use a single encoding for all of their IO. As I pointed out, all of the clients I've seen on OSX only understand one encoding: Unicode. I highly doubt they behave the way you're describing (or wishing). Furthermore, there's no such rule that "commands must be sent using codepages". It's not in the RFC, it doesn't exist in reality either.


Every OSX client supports both Unicode and other encoding standards. The user decides what encoding to use. There's a screenshot under the UTF-8 topic started by Jay_tea.

Most people don't care. But mIRC should allow the user to select the proper encoding if so desired. It does not. And people are not going to migrate to version 7 just for Unicode support.


Beware of MeStinkBAD! He knows more than he actually does!
Re: Codepages support - Reply to Khaled's sticky [Re: MeiR_ct] #228024 03/12/10 02:22 PM
Khaled
Hoopy frood
Joined: Dec 2002
Posts: 4,521
While I appreciate your comments on this issue, it has been discussed many times already - please refer to the previous threads on this topic. I am sorry but there are no plans to revert to the old, cumbersome, complex, and unworkable codepage support that was the cause of so many issues in previous versions, as explained in my post. Thanks for your comments everyone.