
#264234 - 11/11/18 07:07 PM Unicode discrepancy
Wims
Planetary brain

Registered: 31/07/06
Posts: 3436
Loc: France
I am aware that mIRC is not 100% unicode compliant.
For example
Code:
//if ( == ) echo -ag ok
is false.

I'm also aware that this particular issue with case folding is on the to-do list; maybe argv0 could help you there.
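For comparison, here is what Unicode-aware case-insensitive comparison looks like outside mIRC: both sides go through full case folding before being compared. This is a minimal Python sketch for illustration only. `ascii_lower` and `equal_ignoring_case` are hypothetical helper names, and the sample words are arbitrary, not the ones from the example above.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def equal_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode full case folding."""
    return a.casefold() == b.casefold()


upper, lower = "\u00C9COLE", "\u00E9cole"            # "ÉCOLE" and "école"
print(ascii_lower(upper) == ascii_lower(lower))      # False: É is not in A-Z
print(equal_ignoring_case(upper, lower))             # True
print(equal_ignoring_case("STRASSE", "stra\u00DFe")) # True: ß folds to "ss"
```

Full case folding (rather than a simple lowercase) is what lets one character fold to several, as with "ß".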

Now in unicode, it's possible to render the same character using different bytes:

caf\u00E9 and cafe\u0301 should both be "café".

With the Consolas font, both are displayed correctly as "café".
With the Fixedsys font (which does have the glyph for 'é', so no font linking should be applied?), the \u0301 version does not show "café", just "cafe". Is this a bug, or am I missing something?


This topic also implies Unicode normalization: to be able to compare the strings caf\u00E9 and cafe\u0301, we need Unicode normalization. Is this also on the to-do list somewhere? I can see you're pretty busy adding multi-language support to mIRC, but multi-language support is related to Unicode, and I believe it would be a good idea to make mIRC more Unicode-compliant first.
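A minimal Python sketch of the normalization step described above, using the standard `unicodedata` module (not anything mIRC provides). The two spellings of "café" differ code point by code point, but they compare equal once both are converted to the same normalization form. NFC is used by default here; NFD works just as well. `same_text` is a hypothetical helper name.

```python
import unicodedata

precomposed = "caf\u00E9"   # é as a single code point, U+00E9
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after converting both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(precomposed == decomposed)                  # False
print(len(precomposed), len(decomposed))          # 4 5
print(same_text(precomposed, decomposed))         # True
print(same_text(precomposed, decomposed, "NFD"))  # True
```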
_________________________
Looking for a good help channel about mIRC? Check #mircscripting @ irc.swiftirc.net

#264236 - 11/11/18 07:50 PM Re: Unicode discrepancy [Re: Wims]
maroon
Hoopy frood

Registered: 12/01/04
Posts: 969
If the \u0301 version is copied to the clipboard and then pasted into the editbox of a channel whose font is Segoe UI Symbol, the \u0301 correctly appears in the editbox as an accent mark over the 'e'.

Code:
//clipboard caf $+ $chr($base(00e9,16,10)) cafe $+ $chr($base(0301,16,10))


But as soon as you press <enter> and the message appears in the channel, the text in the message area looks as Wims described, as if it were a plain 'e'.

I wonder if this is also causing what I see when trying to display chess pieces in #channel or in a @picture window.

Quote:
//clipboard $replace(R N B Q K B N R,R,$chr(9814),N,$chr(9816),B,$chr(9815),Q,$chr(9813),K,$chr(9812))
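To see exactly which characters the `$replace` above produces, here is a small Python sketch (illustration only, not mIRC code) that maps each decimal `$chr` value to its code point and Unicode name. All five are in the Miscellaneous Symbols block, well above 255. `PIECES` is a hypothetical name for the letter-to-value table.

```python
import unicodedata

# Decimal values passed to $chr() in the command above
PIECES = {"R": 9814, "N": 9816, "B": 9815, "Q": 9813, "K": 9812}

for letter, dec in PIECES.items():
    # e.g. "R U+2656 WHITE CHESS ROOK"
    print(letter, f"U+{dec:04X}", unicodedata.name(chr(dec)))

# Same substitution as the $replace(): str.translate maps ordinals to ordinals
row = "R N B Q K B N R".translate({ord(k): v for k, v in PIECES.items()})
print(row)
```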


If I paste the above chess clipboard into the editbox of a #channel whose font is Consolas or most other fonts, the editbox shows the symbols correctly. However, when I press <enter> to display them in #channel, or draw them into a @picture window, the icons are instead displayed as a poor replica, where the queen icon has a few low-resolution vertical spikes like Bart Simpson's haircut. The symbols only look like they do in the editbox when I switch to a few fonts like MS Gothic, or when I paste the chess pieces into Notepad using Consolas.

Looking at /run charmap, it seems that the poor-quality chess symbols exist in Fixedsys Excelsior 3.01, so mIRC is linking to that font instead of getting them from wherever Windows gets them.

So I'm guessing that somewhere in the mIRC font-linking list, there's a font that shows a non-accented glyph for this?

Is it possible to edit the content/sequence of mIRC's own font-linking choices, or do I need to uninstall Excelsior 3.01 to stop that linking? Or if I do uninstall 3.01, is it going to use v2 or Excelsior NoLiga instead of Excelsior Regular? Excelsior are the only fonts that show the color control codes in the scripts editor using helpful symbols like [b] [c] [o].

There is at least one exception where the editbox gets it wrong. When I change the channel font to "Emoji One Color", everything I type in the editbox is compressed on top of each other, but when I press <enter> the text displays normally in the channel.

#264237 - 11/11/18 07:56 PM Re: Unicode discrepancy [Re: Wims]
Raccoon
Hoopy frood

Registered: 18/02/03
Posts: 2501
Plain old Fixedsys will never display anything outside of 1-255, except through font linking where possible.

U+0301 is outside of 1-255. Without font linking, nothing could be displayed.

In Fixedsys Excelsior 3.01 NoLig, it looks goofy/broken because ligaturization is purposefully broken. I think that's why it doesn't work, at least.
_________________________
doin' things a particle can

#264238 - 11/11/18 08:29 PM Re: Unicode discrepancy [Re: Raccoon]
Wims
Planetary brain

Registered: 31/07/06
Posts: 3436
Loc: France
You're misunderstanding: U+0301 is from the Combining Diacritical Marks block, and this is not trying to render that character/glyph on its own: https://en.wikipedia.org/wiki/Combining_character

To quote:

Quote:
combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).

Unicode also contains many precomposed characters, so that in many cases it is possible to use both combining diacritics and precomposed characters, at the user's or application's choice. This leads to a requirement to perform Unicode normalization before comparing two Unicode strings and to carefully design encoding converters to correctly map all of the valid ways to represent a character in Unicode to a legacy encoding to avoid data loss.[1]
The letter e followed by a COMBINING ACUTE ACCENT should render the character é, which is in the Fixedsys font. If you want a font with better Unicode support than Fixedsys, check Segoe UI Symbol: it shows the same issue.
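The distinction can be checked with Python's `unicodedata` module (a sketch for illustration, not mIRC code). U+0301 on its own is outside 1-255, but it is a nonspacing combining mark, and composing it with the preceding 'e' gives U+00E9, which is inside the 1-255 range that plain Fixedsys covers.

```python
import unicodedata

mark = "\u0301"
print(ord(mark))                    # 769, outside 1-255
print(unicodedata.name(mark))       # COMBINING ACUTE ACCENT
print(unicodedata.category(mark))   # Mn: nonspacing mark
print(unicodedata.combining(mark))  # 230: attaches above the base letter

# Canonical composition turns "e" + U+0301 into the single code point U+00E9
composed = unicodedata.normalize("NFC", "e" + mark)
print(composed, ord(composed))      # é 233
```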
#264239 - 11/11/18 08:31 PM Re: Unicode discrepancy [Re: Wims]
Khaled


Planetary brain

Registered: 04/12/02
Posts: 4295
Loc: London, UK
These issues are all on the to-do list. As mentioned in my previous post on this topic, this is a huge project and will involve significant changes throughout mIRC's codebase. It is far more complex, and will take vastly more work, than multilingual support, which itself is a challenge.

#264241 - 11/11/18 08:44 PM Re: Unicode discrepancy [Re: maroon]
Wims
Planetary brain

Registered: 31/07/06
Posts: 3436
Loc: France
I explained on IRC that the editboxes are handled by Windows: Windows is choosing (via font linking) what seems to be MS Gothic here (a lot of fonts look similar), and there's no issue with that.

Quote:
if I paste the above chess clipboard into the editbox of a #channel whose font is Consolas or most other fonts, the editbox shows the symbols correctly. However when I press <enter> to display them in #channel, or draw them into a @picture, the icons are instead displayed with a poor replica, where the queen icon has a few low-resolution vertical spikes like Bart Simpson's haircut. Only when I change to a few fonts like MS Gothic do the symbols look like they do in the editbox, or when I paste those chess pieces into Notepad using Consolas.

In looking at /run charmap, it seems that the poor quality chess symbols exist in Fixedsys Excelsior 3.01, so mIRC is linking from that instead of getting them from where-ever Windows gets them from.


Consolas simply does not have the chess piece characters. When you paste into the channel, it looks like mIRC is telling Windows to do font linking using its own list of fonts, whereas Windows by itself uses a different list of fonts (obviously not including this custom Fixedsys Excelsior).

Quote:
Is it possible to edit the content/sequence of mIRC's own font-linking choices, or do I need to uninstall Excelsior 3.01 to stop that linking? Or if I do uninstall 3.01, is it going to use v2 or Excelsior NoLiga instead of Excelsior Regular? Excelsior are the only fonts that show the color control codes in the scripts editor using helpful symbols like [b] [c] [o].
Yeah, if you're not happy with the glyphs in the font, you should not use it. Adding more fonts or editing the sequence wouldn't solve the problem: you would then find that the new order causes other characters to be rendered in a way you don't like.
#264242 - 11/11/18 08:46 PM Re: Unicode discrepancy [Re: Khaled]
Wims
Planetary brain

Registered: 31/07/06
Posts: 3436
Loc: France
Right, thanks. But I felt like this was an issue with the existing code: can you explain why it's rendered correctly in some fonts?
#264244 - 11/11/18 09:47 PM Re: Unicode discrepancy [Re: Wims]
Protopia
Fjord artisan

Registered: 30/08/03
Posts: 216
Loc: UK
I have started work on a Unicode script that will provide a Unicode database and fully Unicode-compatible identifiers, so that you can do something like:
Code:
//if ($ulower() == $ulower()) echo -ag ok
I have had to stop work on this because we are moving home in a few weeks and then a second time a few weeks later, but I was modelling it largely on the Python unicode package.

It is a pity that mIRC v7 went for UCS-2 rather than UTF-16, but that is history that cannot be rewritten, and I am sure there was a good reason at the time.

But if anyone is interested in picking up where I have stopped, I would be happy to share the beginnings of the code with them.

Indeed, if full Unicode support is a big task for Khaled, then perhaps we could work with him on a backwardly compatible definition of what mIRC will look like when it does fully support Unicode, so that we can create a script which essentially provides the closest possible functionality. Obviously in script you cannot fix:
Code:
//if ( == ) echo -ag ok
however you could ensure that if someone codes:
Code:
//if ($ulower() == $ulower()) echo -ag ok
using the identifiers I am writing then this would be backwardly compatible with the Unicode version of mIRC.

#264245 - 11/11/18 09:54 PM Re: Unicode discrepancy [Re: Protopia]
Wims
Planetary brain

Registered: 31/07/06
Posts: 3436
Loc: France
mIRC uses UTF-16 internally, not UCS-2.
Having a workaround for scripting is a good idea until it's built in, but keep in mind, as Khaled mentioned, that this touches mIRC as a whole: the highlight feature has to support it as well.
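As an illustration of what highlight matching would need, here is a minimal Python sketch of Unicode's canonical caseless comparison (normalize to NFD, case-fold, normalize to NFD again) applied to a substring test. `canonical_caseless` and `highlight_matches` are hypothetical names, not mIRC features. The sketch ignores word boundaries, so it would also report "cafe" as matching inside "café", because the accent is a separate code point after decomposition.

```python
import unicodedata


def canonical_caseless(s: str) -> str:
    """NFD(casefold(NFD(s))): the canonical caseless form from the Unicode standard."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def highlight_matches(message: str, word: str) -> bool:
    """Naive substring test on canonical caseless forms (no word-boundary handling)."""
    return canonical_caseless(word) in canonical_caseless(message)


print(highlight_matches("On se voit au CAFE\u0301 ?", "caf\u00E9"))  # True
print(highlight_matches("On se voit au caf\u00E9 ?", "CAFE\u0301"))  # True
```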
#264247 - 11/11/18 10:26 PM Re: Unicode discrepancy [Re: Wims]
Protopia Offline
Fjord artisan

Registered: 30/08/03
Posts: 216
Loc: UK
Originally Posted By: Wims
mIRC use utf16 internally, not UCS-2.
Having a workaround for scripting is a good idea until it's built-in, but keep in mind, as Khaled mentioned, that this is touching mIRC as a whole, the highlight features has to support that as well.

Perhaps my understanding of UCS-2 was incorrect. But if mIRC used UTF-16, then $len of any Unicode character encoded as a pair of 16-bit values would return 1 rather than 2. The fact that you have to encode the surrogate pairs manually indicates that this is not genuinely the case.
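For reference, here is how UTF-16 stores a character outside the Basic Multilingual Plane, sketched in Python (illustration only; `utf16_units` and `surrogate_pair` are hypothetical helpers). Such a character is one code point but two 16-bit code units, a high surrogate followed by a low surrogate. Whether a length function counts code points or code units is a separate choice from the storage format: JavaScript strings, for example, are UTF-16, yet the length of a single emoji is reported as 2.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units s occupies when encoded as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into its high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


emoji = "\U0001F600"                                 # GRINNING FACE
print(len(emoji))                                    # 1 code point
print(utf16_units(emoji))                            # 2 code units
print([hex(u) for u in surrogate_pair(ord(emoji))])  # ['0xd83d', '0xde00']
```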

However, my offer to share my starting point for a Unicode script still stands.

#264248 - 11/11/18 10:55 PM Re: Unicode discrepancy [Re: Wims]
Raccoon
Hoopy frood

Registered: 18/02/03
Posts: 2501
Originally Posted By: Wims
mIRC use utf16 internally, not UCS-2.

Where did you get this from? My observation is that the reverse is true.
#264249 - 11/11/18 11:13 PM Re: Unicode discrepancy [Re: Raccoon]
Khaled


Planetary brain

Registered: 04/12/02
Posts: 4295
Loc: London, UK
Windows Unicode applications are UTF-16.

Thanks for your comments everyone.
