Special Chars on 7.02 - mIRC Discussion Forums

Chr 127-159 won't be displayed anymore.

Not a bug, see this post for an explanation.

Well uhm what about comparing text from old clients with new scripts? Do i have to add for each thing 2 if requests?

Example:
on *:text:*:#:{ if (mäh == $1-) || (mäh == $utfencode($1-)) { } }

First, ä isn't in that range, so let's look at something that is: —

You won't want to use $utfencode() at all. You can do a $replace of the old character with the unicode version of it (the character isn't gone, it just has a new code) before the check if you need to.

$replace($1-,$chr(151),$chr(8212))

That would be placed on the 7.x client. The 6.35 client should be able to handle just a check of for — or whatever other character.

Or you could just do if ($1- == text $+ $chr(151) $+ text) instead of if($1- == text—text). The downside here is that you would need 2 checks (one for each way). The first example only requires the one check.

I'm sure there are other ways as well.

Chr 128-159 are not being converted as they where before.

Output from a loop through chr 127-255

Code:

utflist {
  var %x = 126
  while (%x < 255) {
    inc %x
    echo * chr( $+ %x $+ ) => $asc($left($utfencode($chr(%x)),1)) $asc($right($utfencode($chr(%x)),1))
  }
}

returned on mIRC 6.35:
http://nopaste.php-q.net/284557

and this on 7.02
http://nopaste.php-q.net/284558

Both mIRC where running on Win7

As mentioned above, you should not be calling $utfencode on data from 7.x as it is already Unicode.

$asc($chr(%x)) should be sufficient in 7.x

Again, this is not a bug, it is an intentional benefit of mIRC's Unicode support

To deal with old scripts, use this alias:

Code:

alias myutfencode { return $iif($version < 7, $utfencode($1-), $1-) }

Though I wonder if maybe mIRC could just no-op the $utfencode/$utfdecode aliases from 7.x on to avoid confusion.

Originally Posted By: argv0

As mentioned above, you should not be calling $utfencode on data from 7.x as it is already Unicode.

It is not unless it's displayed.

So then it's the editbox or what? If i type ALT+0159 i can see Ÿ (Latin Captial Letter Y with diaeresis) and when i send it or echo it, it disappears. Or better it becomes a control code.

Quote:

"The characters 128-159 are not used in ISO 8859-1 and Unicode"

From http://www.pemberley.com/janeinfo/latin1.html

Therefore $chr(159) is meaningless to mIRC, it should not display anything. The issue with the editbox might be an issue with font-linking and Windows handling things a little differently than mIRC's end, which is strictly UTF-8. As Khaled has said, what happens in the editbox is entirely defined by MS's Richedit control behaviour.

Latin Capital Letter Y with Diaeresis is U+0178: http://www.fileformat.info/info/unicode/char/0178/index.htm

Use //echo -a $chr(376) instead (376 is 0178 in base-10).

As mIRC is a Windows Application it should take care of the Microsoft Codepage 1252.

And mIRC did this before thats why the results of $utfencode() differs in 6.35 to 7.02.

Quote:

The official CP1252<->Unicode conversion table is printed in the Unicode 2.0 standard for instance, and is available on <ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/> in the file ucs-map-cp1252. [Now see the file ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT at the official Unicode site.]

mIRC does not handle CP1252 (or any codepages) anymore. This is an intentional design choice for Unicode compatibility. You should no longer be using any codepages in mIRC.

There is also no rule that says to be a Windows application it must handle CP1252. When your web browsers are set to UTF-8 content-types, they do not handle CP1252. Certainly web-browsers are Windows applications.

It should be noted that any CP1252 support in Windows applications is likely legacy only.

And no, the reason $utfencode differs is because you're attempting to $utfencode already encoded data; all data in mIRC has already been encoded in UTF-8, so calling $utfencode will double-encode the data. It would be equivalent to the following code in 6.35:

//echo -a $utfencode($utfencode(A))

You should never use $utf* functions in 7.x, full stop.

Originally Posted By: argv0

And no, the reason $utfencode differs is because you're attempting to $utfencode already encoded data; all data in mIRC has already been encoded in UTF-8, so calling $utfencode will double-encode the data. It would be equivalent to the following code in 6.35:

//echo -a $utfencode($utfencode(A))

You should never use $utf* functions in 7.x, full stop.

You are absolutly wrong. If i would encode it twice the result would be the same all the time, so i would just "encode" chr 194 all the time and this should end up in char 195 and 130 for all of them. All characters (128-159) start with the char 194 and then increase from 128 to 159 in unicode as you can see on the screenshot in my post above. I'm not saying that this is wrong, i'm just saying that mIRC should be compatible as it was in all versions before this mess started.

Originally Posted By: Sephiroth_

So then it's the editbox or what? If i type ALT+0159 i can see Ÿ (Latin Captial Letter Y with diaeresis) and when i send it or echo it, it disappears. Or better it becomes a control code.

When you type Alt+0159, Windows (and not mIRC) produces the Unicode equivalent of the char from the codepage that corresponds to your current input (keyboard) language. The reason you may sometimes see a block is probably because you are not using a Unicode font (such as Arial Unicode MS or Fixedsys Excelsior): in that case, fontlinking kicks in, grabbing a symbol (glyph) from the first font it finds in your system that claims to support that character. In practice this does not always work, so sometimes you get a block. Note that this will never happen if you switch to a Unicode font.

The aforementioned behaviour of Alt+NNNN can be handy in this case. For example, if your old scripts use a hardcoded char 159, you can simply delete it and hit Alt+0159 in the Editor to get the Unicode version of this char.

Quote:

i'm just saying that mIRC should be compatible as it was in all versions before this mess started.

The mIRC 7 beta is specifically *not* meant to be compatible with non-unicode data. This is why Khaled put out a beta, so people could test and get accustomed to the new behaviour before the new version is released. mIRC7 will not be compatible with 6.35 with regards to handling of Unicode, and, for the last time, the behaviour of $utfencode is undefined in mIRC7, so it should not be used. For all intents and purposes, $utf* identifiers do not "exist" in 7.

Please read the explicit disclaimer about codepage support on the beta page: http://www.mirc.com/beta.html

After i did /font and pressing [ok] this 'problem' was solved, i used the mirc.ini from 6.35 to test if my scripts are almost fully compatible with 7.x and a simple upgrade method.

In mIRC 6.35 the [fonts] entry was "fstatus=Lucida Console,412,0,1" after that it changed to "fstatus=Lucida Console,412,0,2,0" and everything worked fine.
The chr 159 was not 'transforming' to a control code anymore. (same for all other characters from 128-159)

In mIRC v6.35, if you have set your window font to use eg. a Greek script/codepage and you then use $utfencode(text) in that window, it will perform the conversion using the Greek codepage. Resetting the fonts also reset your script/codepage for that font, so that is probably why the issue resolved itself.

Also in mIRC v6.35, the characters 128-256 will appear differently to users on IRC depending on their system language or their selected font script in the a window. That means it is not reliable to use this range of characters on IRC - there is no guarantee that users will see the same characters.

That is why mIRC v7.x now uses UTF-8 for all text, to ensure that everyone, everywhere, consistently sees the same characters. In mIRC v7.x, all text is Unicode text. When you use $utfencode() or $utfdecode(), it will convert to/from Unicode/UTF-8. The Unicode/UTF-8 conversion routines check (on a per character basis) if text is already in UTF-8 format and will prevent it from being double-encoded.

If I remember mirc.ini format well enough, setting "2" in your font like you did will enabled the "encode and send" option in 6.35. That is probably related to why it works when you type something. Keep in mind that the change you mention is a change in whether or not it displays in 6.35. You originally were asking about 7.x and not 6.35. If you had said characters disappearing in 6.35 when you typed them (instead of 7.x), we could have given you that answer right away.