Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
Problem identification

The "SJIS/JIS conversion" option in Options/Messages turns on the conversion for everything sent and received, regardless of the encoding the user has set for the current window. It looks for Shift-JIS patterns in every piece of text, converts them to JIS and sends the result to the server.

Reproduction

In any channel, set the encoding to Japanese (you'd need a Japanese font for that, of course) and enable the SJIS/JIS conversion. Now, in some other channel that uses a Cyrillic encoding, type "да" (0xE4 0xE0 in windows-1251; it means "yes") in the input field. On the other side, users with SJIS/JIS conversion off will see "$Bhb(B" (minus the ESC control characters), which is the JIS form of the Shift-JIS character "萵" (also 0xE4 0xE0), meaning "lettuce": something pretty far from "yes".
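
The byte-level collision can be reproduced outside mIRC. This is a minimal sketch in Python, assuming Python's standard `cp1251`, `shift_jis` and `iso2022_jp` codecs are a fair stand-in for mIRC's converter (mIRC's internal routine is not shown in this thread):

```python
# Illustration only: Python's standard codecs stand in for mIRC's converter.
typed = "да"                             # what the user types in the Cyrillic channel
raw = typed.encode("cp1251")             # the windows-1251 bytes of that text
misread = raw.decode("shift_jis")        # the same two bytes misread as one Shift-JIS character
on_wire = misread.encode("iso2022_jp")   # what a blind SJIS -> JIS conversion would then send

print(raw.hex(" "))   # the two windows-1251 bytes
print(misread)        # the character those bytes form in Shift-JIS
print(on_wire)        # ESC $ B ... ESC ( B wrapped around the converted pair
```

The point of the sketch is that nothing in the two bytes themselves says which encoding they belong to; only the per-window setting can tell.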

Possible solution

Since adding an option to the /font dialog that turns SJIS/JIS conversion on might break some user scripts, a "Use SJIS/JIS conversion: on / default" option could be placed there instead, with the default still set in the Options/Messages dialog, as is done with Unicode.

Apparently this bug was missed, though it was confirmed, hence i am writing this post on the advice of one of the ops of the EFNET #mirc channel.

Joined: Nov 2007
Posts: 1
B
Mostly harmless
Offline
confirmed, it annoys me very much, too mad

Joined: Nov 2007
Posts: 2
7
Bowl of petunias
Offline
hello minna-san! ^__^ (i'm just learning Japanese)

one of my friends told me that he had seen a topic regarding the problem i used to have some time ago, and i am writing to say that i'd really appreciate it if you'd fix this! though i don't speak Russian, i do try to learn Greek (along with Japanese ^___^ i am planning to learn French afterwards, too ^____^) and have experienced the bug, too. the other party said i had typed some weird symbols, and i can remember $ and B there.

thanks for fixing in advance!


NEVER KNOWS BEST (FLCL^^)
Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
though confirmed several times, it seems that administration or whoever is looking through these messages missed this one again. so.. just one more try to gain some attention to this serious bug.

Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
As you said yourself, you have mentioned it numerous times and there's no way Khaled has not read one of your reports. There's no point in repeating this/bumping threads, so please don't.


/.timerQ 1 0 echo /.timerQ 1 0 $timer(Q).com
Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
so, like, he did notice this just to think "naah, such a serious mistake in the code forcing users to add extra letters to words just so that others are able to read the text is of no importance, i'll leave it as it is"?

anyway, that little titleless "m" behind your nickname, i guess, means "moderator", so you must have access to bugtracker or whatever there is; you must be able to say if this issue has gained or is going to gain or has got the possibility to gain at least a tiny bit of progress. i am looking forward to your reply.

i wouldn't repeat anything, just no answer can mean either not reading or ignoring, the latter being impossible when we speak of such a matter, that concerns the very idea of the application.

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
FYI: Moderators are moderators of the forum, not mIRC itself. Although some moderators may have access to bug trackers, not all will. Whether or not qwerty does, I have no idea. Just pointing that out.

As far as bugs go, I have yet to see any bugs reported as being on a list of things TO fix or NOT TO fix. About the best you'll see here are responses after it HAS BEEN fixed, with a note that it will be fixed in the next version by Khaled. Until such a response is seen, you can pretty much assume that it's either in the works or else not going to happen. And only Khaled knows that and he doesn't reveal such decisions. Usually the choice will be whether it is possible, whether it's really a bug, and then you have to consider the time to figure out how to fix the problem. I don't think anyone else can give you more than that. If it's reported, it will be seen. If it's verified, it will be noted and will likely be fixed at some point, but like I said, until it's fixed, we won't hear anything about it.

Also keep in mind that unicode is only recently starting to be supported and is not fully supported yet. Eventually it will be fully supported and then you won't have any problems. Until then, there are going to be things that don't work.


Invision Support
#Invision on irc.irchighway.net
Joined: Jan 2003
Posts: 2,523
Q
Hoopy frood
Offline
Quote:
anyway, that little titleless "m" behind your nickname, i guess, means "moderator", so you must have access to bugtracker or whatever there is; you must be able to say if this issue has gained or is going to gain or has got the possibility to gain at least a tiny bit of progress.
Neither I nor anybody else except Khaled has any information on which bugs are being worked on. People are only made aware of bug fixes when a new version comes out (including beta versions, available to beta testers) or a short while earlier, by means of Khaled announcing the fix.

Quote:
i wouldn't repeat anything, just no answer can mean either not reading or ignoring
What Riamus said is pretty much what's going on. Khaled usually responds when the bug has been fixed, although in some cases bugs are fixed without Khaled announcing the fact in the forum. One thing is certain: Khaled sifts through all bug reports, so rest assured that your report has been read and taken into consideration, even if there is no reply (yet).

Last edited by qwerty; 16/12/07 10:52 PM.

Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
problem update:

with Unicode, the bug causes even more problems: whole sentences come out mangled. for example, when mIRC has an ISO-2022-JP-enabled channel open, typing "а зачем вообще нужен был переезд?" ("but why did you need to move, anyway?") in another, UTF-8-enabled window produces "Р° Р·Р°С$B.R(BµРј РІРѕРѕР±С$B2R(Bµ РЅС$B&R(B¶РµРЅ Р±С$B6R(B» переезд?"
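
To see why UTF-8 text trips the converter, here is a rough Python sketch. It assumes mIRC scans the outgoing bytes for anything that looks like a Shift-JIS lead/trail pair and applies the usual Shift-JIS to JIS arithmetic; the function names are made up for illustration and this is not mIRC's actual code.

```python
ESC = b"\x1b"


def sjis_pair_to_jis(lead: int, trail: int) -> bytes:
    """Usual Shift-JIS -> JIS X 0208 arithmetic for one two-byte character."""
    row = lead - (0x71 if lead <= 0x9F else 0xB1)
    row = row * 2 + 1
    if trail >= 0x7F:
        trail -= 1
    if trail >= 0x9E:
        trail -= 0x7D
        row += 1
    else:
        trail -= 0x1F
    return bytes([row, trail])


def is_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail(b: int) -> bool:
    return 0x40 <= b <= 0xFC and b != 0x7F


def blind_sjis_to_jis(data: bytes) -> bytes:
    """Convert every byte pair that merely looks like Shift-JIS (assumed behaviour)."""
    out = bytearray()
    in_jis = False
    i = 0
    while i < len(data):
        if is_lead(data[i]) and i + 1 < len(data) and is_trail(data[i + 1]):
            if not in_jis:
                out += ESC + b"$B"
                in_jis = True
            out += sjis_pair_to_jis(data[i], data[i + 1])
            i += 2
        else:
            if in_jis:
                out += ESC + b"(B"
                in_jis = False
            out.append(data[i])
            i += 1
    if in_jis:
        out += ESC + b"(B"
    return bytes(out)


mangled = blind_sjis_to_jis("зачем".encode("utf-8"))
# View the result the way a windows-1251 reader would, with the ESC bytes dropped.
print(mangled.replace(ESC, b"").decode("cp1251"))
```

In the UTF-8 bytes of "зачем", the second byte of "ч" (0x87) followed by the first byte of "е" (0xD0) happens to fall in the Shift-JIS lead/trail ranges, so two unrelated letters get fused into one JIS pair.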

i'm wondering if i can make a script that adds something to the string i pass to the server to break mIRC's buggy SJIS/JIS conversion engine. any suggestions?

Joined: Oct 2003
Posts: 18
M
Pikka bird
Offline
The JIS/SJIS conversion feature is broken and I'm betting it's never going to be fixed satisfactorily. One of the underlying problems is that when mIRC receives a line of text from the server, it won't know whether to apply the conversion to that particular line before the line is parsed to discover the channel name - but the channel name itself could change depending on whether the conversion is to be applied or not --- Do you see the problem here? It's a vicious cycle ;-)

(Sidenote: It's sort of like web browsers having to parse HTML pages before they come across the meta http-equiv charset setting, at which point they have to re-parse the page from the start.)

I propose that the JIS/SJIS conversion feature is ripped out altogether (because it's never going to work) and UTF-8 output is made the *default setting* in all new installs. Even if the former is not done, I insist the latter is. :-)

Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
You can't do that, though, because many people on IRC channels use SJIS/JIS.. you know, people who aren't on mIRC and will still have that choice even if mIRC gets rid of it.


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Joined: Oct 2003
Posts: 18
M
Pikka bird
Offline
They are a dying breed, and rightly so. I think the function can be ripped out because it's not working correctly at the moment anyway.

Similar functionality can be achieved through some scripting/DLL magic. In fact, that's exactly what I did when I realized JIS/SJIS conversion and UTF-8 were never going to work well together in mIRC, but my DLL is just a quick hack and I'm not planning on releasing it.

Eventually I didn't even need that because I was able to convince my friends to switch to clients that support UTF-8. If you chat on Japanese channels where people insist on using Japanese software (or are simply unable to use mIRC because of the language barrier), tell them to switch to the latest version of an IRC client called Cotton. That one supports UTF-8 just fine.

And I still think mIRC should also output UTF-8 by default. Please, Khaled, I beg you.

Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
I think you're heavily exaggerating the way in which SJIS/JIS conversion is broken.

First off, this is only one bug report. Yes, there have been others, but there are probably as many Unicode-related bugs reported as SJIS/JIS-related ones, if not more. In fact, NineTails commonly reports SJIS/JIS bugs (he's probably the most common bug reporter for that feature), but if you look at his history he's made just as many reports about UTF-8. So, to make the claim that SJIS/JIS is fundamentally broken just isn't right.

Secondly, NineTails is reporting a rather uncommon use case here. While that's no excuse for the bug, I would bet that most users who speak Japanese on IRC don't also speak Russian (or other languages written in Cyrillic) in another channel on the same network; maybe English, but there would be no problem there.

This can and should be fixed, no question about it... but I think you're magnifying the way it affects the userbase; it is likely not nearly as bad as you make it sound. SJIS/JIS is not nearly as broken as you make it sound, or the bug report here would have been "SJIS/JIS BROKEN ALL THE TIME", not "SJIS/JIS not working when I speak one language on one channel and another language in another while eating cheesecake on tuesday" (exaggerated to illustrate my point).

Finally, UTF-8 is not the answer. As I mentioned before, there have been just as many posts about bugs in Unicode as there have been about SJIS/JIS. This entire conversation could have been about some Unicode issue just the same. Would you have suggested trashing UTF-8 in that scenario? No. Because you like UTF-8.. I assume you don't mind ridding the world of SJIS/JIS because you don't like or use SJIS/JIS, and it would not affect you.. well that's not fair to those whom it would affect.

Throwing out a buggy implementation is not the answer. If Khaled respects his users, he'll fix it because they rely on it. That's the real answer.

As an aside let me add that this post actually belongs more in the Feature Suggestions forum than the Bug Reports forum. Khaled had already mentioned that SJIS/JIS is intentionally applied server-wide because of the way the conversion works.. so it's not as if this behaviour is broken, but rather, not even implemented to behave the way NineTails expects. Now, you were right when you said it's probably a difficult task to undertake, and I would guess that Khaled tried and then scaled his solution back waiting for someone to specifically ask for such support (perhaps hoping no one would). However, difficult is not impossible, and there are ways to implement this.. we will see what Khaled chooses to do.


Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
well, as for Unicode. It's not buggy itself. It's that mIRC tries to guess the encoding and of course gets it wrong. another issue is with nicknames/channels/etc. that can be in Unicode or any other encoding, and that interfere with the encodings used by users. but as for SJIS/JIS, i've got a solution. it took me just several hours to make a script, so i believe it must be much, much simpler to fix this in the code of mIRC. my script is not perfect, though.

how to use:
0) DISABLE sjis/jis in Options/IRC/Messages; ENABLE ascii parsing
1) put the code in your remotes (Alt-R)
2) make sure the triggers of the script do not interfere with some other scripts
3) go to the ISO-2022-JP channel and check "Enable JIS" in the channel menu. voila!

*) you can use "・" for actions (e.g. ・死んだ). the script will indicate its engine-parsed messages with double <<>>'s and double **'s. quits/parts/topics/etc. are not parsed.

Code:
alias njau.enable {
  if ( $istok( %njau. [ $+ [ $1 ] ] , $network $+ $active , 32 ) ==  $true ) { 
    set %njau. [ $+ [ $1 ] ] $remtok( %njau. [ $+ [ $1 ] ] , $network $+ $active , 1, 32)
  }
  else set %njau. [ $+ [ $1 ] ] $addtok( %njau. [ $+ [ $1 ] ] , $network $+ $active , 32)
}

alias njau.isenabled {
  if ( $istok( %njau. [ $+ [ $1 ] ] , $network $+ $active , 32 ) ==  $true ) return 1
  else return 0
}

menu channel {
  $style( $njau.isenabled( jis ) ) Enable JIS:/njau.enable jis
}

on *:input:*:{
  if ( $istok( %njau.jis , $network $+ $active , 32 ) ==  $true ) {
    if ( $left($1-,2) == $chr(129) $+ E ) {
      .describe $active $sjis2jis($right($1-,-2))
      echo 3 -atn ** $me $right($1-,-2)
      halt
    }
    elseif ( $left($1-,3) == /me ) {
      .describe $active $sjis2jis($right($1-,-3))
      echo 3 -atn ** $me $right($1-,-3)
      halt
    }
    elseif ( $left($1-,1) != / ) {
      .msg $active $sjis2jis($1-)
      echo 3 -atn << $+ $me $+ >> $1-
      halt
    }
  }
}

on ^1:text:*:#:   if ( $istok( %njau.jis , $network $+ $chan , 32 ) ==  $true ) { haltdef | echo -mtl # << $+ $nick $+ >> $jis2sjis($1-) }
on ^1:action:*:#: if ( $istok( %njau.jis , $network $+ $chan , 32 ) ==  $true ) { haltdef | echo -mtl # ** $nick $jis2sjis($1-) }

alias -l sjis2jis {
  var %x = 1, %y, %c, %word, %out, %insequence = 0
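  ; ISO-2022-JP escape sequences: ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII
  var %startsequence = $chr(27) $+ $chr(36) $+ B
  var %endsequence = $chr(27) $+ (B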

  while ( %x <= $numtok($1-, 32)) {
    %word = $gettok($1-, %x, 32)
    %y = 1
    set %o
    while (%y <= $len(%word)) {
      %c = $asc($mid(%word, %y, 1))
      if ( ((%c >= 129) && (%c <= 159 )) || ((%c >=  224) && (%c <= 239)) ) {
        if (%insequence == 0) { %o = %o $+ %startsequence | %insequence = 1 }
        ; convert one Shift-JIS byte pair (%a = lead byte, %b = trail byte) to its JIS pair
        var %a = $asc( $mid(%word, %y, 1) ), %b = $asc( $mid(%word, $calc(%y + 1),1) )
        if (%a <= 159) %a = $calc(%a - 113)
        else %a = $calc(%a - 177)
        %a = $calc( %a * 2 + 1)
        if (%b >= 127) %b = $calc(%b - 1)
        if (%b >= 158) {
          %b = $calc(%b - 125)
          %a = $calc(%a + 1)
        }
        else %b = $calc(%b - 31)
        %o = %o $+ $chr(%a) $+ $chr(%b)
        %y = $calc( %y + 2 )
      }
      else {
        if (%insequence == 1) { %o = %o $+ %endsequence | %insequence = 0 }
        %o = %o $+ $mid(%word, %y, 1)
        %y = $calc( %y + 1 )
      }
    }
    if (%insequence == 1) { %o = %o $+ %endsequence | %insequence = 0 }
    %out = %out %o
    %x = $calc( %x + 1 )
  }
  return %out
}

alias -l jis2sjis {
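  ; match ESC $ B ... ESC ( B (or ESC ( J) and hand the JIS byte pairs in between to _jis2sjis
  var %jis = /\x1B\$B(.+?)\x1B\([BJ]/g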
  return $regsubex($1-, %jis, $_jis2sjis(\t))
}

alias -l _jis2sjis {
  var %x = 1, %y, %z, %j1, %j2, %s1, %s2
  while ( %x < $len($1-) ) {
    %j1 = $asc( $mid( $1-, %x, 1 ) )
    %j2 = $asc( $mid( $1-, $calc( %x + 1 ), 1 ) )
    ; JIS rows 33-94 map to Shift-JIS lead bytes 129-159, rows 95-126 to lead bytes 224-239
    if ( ( 33 <= %j1 ) && ( %j1 <= 94 ) )  %s1 = $calc( $int( $calc( ( %j1 + 1 ) / 2)) + 112 )
    if ( ( 95 <= %j1 ) && ( %j1 <= 126 ) ) %s1 = $calc( $int( $calc( ( %j1 + 1 ) / 2 )) + 176 )
    if ( $calc( %j1 % 2 ) == 1 ) %s2 = $calc( %j2 + 31 + $int( $calc( %j2 / 96 ) ) )
    else set %s2 $calc( %j2 + 126 )
    set %x $calc( %x + 2 )
    set %z %z $+ $chr( %s1 ) $+ $chr( %s2 )
  }
  return %z
}
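
For anyone who wants to sanity-check the JIS to Shift-JIS arithmetic outside mIRC, here is a small Python sketch of the same calculation, compared against Python's built-in codecs. The function names are made up for this example; the split between the two lead-byte ranges falls between JIS rows 0x5E and 0x5F.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Usual JIS X 0208 -> Shift-JIS arithmetic for one byte pair."""
    s1 = (j1 + 1) // 2 + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2 == 1:
        # Odd rows use the low trail range and skip the unused byte 0x7F.
        s2 = j2 + 0x1F + (1 if j2 >= 0x60 else 0)
    else:
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def jis_body_to_sjis(body: bytes) -> bytes:
    """Convert the byte pairs found between ESC $ B and ESC ( B."""
    return b"".join(
        jis_to_sjis(body[i], body[i + 1]) for i in range(0, len(body) - 1, 2)
    )


text = "あいうえお"
jis = text.encode("iso2022_jp")   # ESC $ B + byte pairs + ESC ( B
body = jis[3:-3]                  # strip the two three-byte escape sequences
print(jis_body_to_sjis(body) == text.encode("shift_jis"))
```

Comparing against a known-good codec like this is a cheap way to catch off-by-one mistakes in the row boundaries.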

Joined: Oct 2003
Posts: 110
D
Vogon poet
Offline
Ninetails, this might help you, it's a little dll I did long ago:
mirc codepage conversion dll

Usage is fairly simple, just a:
Code:
$dll(mIconv.dll,_miconv@24, fromcharset $+ $chr(124) $+ tocharset $+ $chr(124) $+ texttoconvert )

will return the conversion of texttoconvert from fromcharset to tocharset. The codepage names used are those of iconv.

so for example //say $dll(D:\Local\DeathWolf\Application Data\mIRC\mIconv.dll,_miconv@24,UTF-8|ISO-2022-JP|あいうえお賢)

Joined: Dec 2002
Posts: 5,411
Hoopy frood
Offline
The SJIS/JIS encoding is applied to the whole line that is sent to the server. When a line is received from the server, again the decoding is applied to the whole line.

I could change it so that it works like the UTF-8 implementation...

For UTF-8, mIRC extracts the target channel/query and the :message portion from PRIVMSG, NOTICE, PART, TOPIC message that is about to be sent, and encodes the :message based on the UTF-8 setting for the target channel/query. Also, if the server's numeric 005 token CHARSET=UTF-8 is set, mIRC applies UTF-8 encoding and decoding to the whole line.
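
The selective approach described above can be sketched roughly as follows. This is an illustration in Python, not mIRC's code: `encode_outgoing`, the `target_encodings` table, the `default` codec and the `whole_line_charset` parameter are all hypothetical names chosen for the example.

```python
from typing import Dict, Optional

# Commands whose trailing ":message" part is encoded per target, as listed above.
MESSAGE_COMMANDS = {"PRIVMSG", "NOTICE", "PART", "TOPIC"}


def encode_outgoing(
    line: str,
    target_encodings: Dict[str, str],
    default: str = "utf-8",
    whole_line_charset: Optional[str] = None,
) -> bytes:
    """Encode one outgoing IRC line (hypothetical helper, not mIRC's code)."""
    # A server-wide charset (e.g. one announced in numeric 005) covers the whole line.
    if whole_line_charset is not None:
        return line.encode(whole_line_charset)

    head, sep, message = line.partition(" :")
    words = head.split()
    if not sep or len(words) < 2 or words[0].upper() not in MESSAGE_COMMANDS:
        return line.encode(default)

    # Only the ":message" part follows the setting of the target channel/query.
    encoding = target_encodings.get(words[1].lower(), default)
    return head.encode(default) + b" :" + message.encode(encoding)


settings = {"#japanese": "iso2022_jp", "#russian": "cp1251"}
print(encode_outgoing("PRIVMSG #russian :да", settings))
print(encode_outgoing("PRIVMSG #japanese :あ", settings))
```

With a lookup like this, text sent to the Cyrillic channel never passes through the Japanese converter at all, which is the behaviour the original report asks for.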

The question is, are there Japanese servers that require whole line sjis/jis conversion? If there are, the above solution will break mIRC on these servers, unless they support a numeric 005 CHARSET=JIS or some similar value that would tell mIRC to perform whole line conversion.

And should mIRC be SJIS/JIS converting non-message lines as well? ie. channel or query names in JOIN, MODE, etc. events? I seem to remember seeing Japanese channel names that looked like they were in JIS format long before I added SJIS/JIS support to mIRC. If that's the case, the selective UTF-8 encoding method above won't work for SJIS/JIS.

Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
the main problem is that the conversion is applied to every channel disregarding specified encodings.

Originally Posted By: Khaled
For UTF-8, mIRC extracts the target channel/query and the :message portion from PRIVMSG, NOTICE, PART, TOPIC message that is about to be sent, and encodes the :message based on the UTF-8 setting for the target channel/query.

mIRC could do the same for Japanese but, if it's more convenient, encode not just the message but the whole string. But channel settings should be taken into account! that's the main wish :)

another wish is to take network names into account, too. i've got two channels named '#Japanese', one of them is ISO-2022-JP and another is UTF-8, making one of them read-only for me.


(as for me, i can have a non-English nickname and type Japanese in UTF-8-enabled channels on some servers that do not specify 'CHARSET' at all in the numeric 005. but both my nickname and message are in Unicode)

(a string in ASCII and the same string in ISO-2022-JP (JIS) do not differ if it contains only letters, punctuation and other ASCII characters, which covers most, if not all, of the characters used in IRC commands (see the byte map; these symbols are the same in Shift-JIS as well). hence, i assume, the 'whole string conversion' gives the same result as the 'partial' conversion of messages, nicknames and channel names would.)
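
A quick way to check this claim, sketched in Python with its standard `iso2022_jp` codec standing in for mIRC's converter (the channel name and message are just sample values):

```python
command = "PRIVMSG #foo :"
message = "こんにちは"

# Convert the whole line at once...
whole_line = (command + message).encode("iso2022_jp")
# ...versus leaving the ASCII command part alone and converting only the message.
partial = command.encode("ascii") + message.encode("iso2022_jp")

print(whole_line == partial)
```

The two only diverge when the command part itself contains non-ASCII text, for example a channel name written in Japanese.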

(regarding the channel names and nicknames in ISO-2022-JP. well, haven't seen JIS nicknames ever, but regarding the channels, such do exist.)

Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
I think some of the issues you mentioned can be resolved if you have any information regarding the 2 questions Khaled asked in his post.

Specifically,

1. Are there Japanese servers that require whole line sjis/jis conversion? If so, do they support a numeric 005 CHARSET=JIS or some similar value that would tell mIRC to perform whole line conversion?

2. Should mIRC be SJIS/JIS converting non-message lines as well? ie. channel or query names in JOIN, MODE, etc. events?

I think you answered #2 indirectly with your #Japanese thing, but can you clarify on these points?


Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
I'm not quite sure about terminology and such, but anyhow, as i've already said:

Originally Posted By: argv0
1. Are there Japanese servers that require whole line sjis/jis conversion? If so, do they support a numeric 005 CHARSET=JIS or some similar value that would tell mIRC to perform whole line conversion?


if by whole line you mean "PRIVMSG #foo :bar", then the question doesn't make sense, as with 'partial' conversion (conversion of channel names/possibly nicknames/message) the line will look the same.

plus, servers don't have encodings, channels do. irc is all about raw--that's why we have 'script' in the font settings. all the japanese networks i've seen have two basic encodings: unicode and iso-2022-jp, and have them simultaneously. and irc protocols are documented and everything. so how A CHANNEL's encoding could possibly ruin client's communication to the SERVER? this is not possible even with (some weird) servers that do encoding conversions and support different encodings on different ports.

Originally Posted By: argv0
2. Should mIRC be SJIS/JIS converting non-message lines as well? ie. channel or query names in JOIN, MODE, etc. events?


as i've said, i've seen channel names in JIS. and i've not seen nicknames in JIS. but as far as i understand, it's possible to make a server which'd support JIS nicknames

BUT please pay some attention to the following, as no one seems to be noticing it.

whole line conversion works fine and i don't see any need to drop it.
do JUST ONE THING: make it work for the channels which use JIS O N L Y.

there is no need to apply SJIS/JIS conversion to russian channels, unicode channels and others, it just ruins everything. this is the only problem, everything works pretty fine with Japanese.

+ notice that there can be two channels, one in JIS, one in SJIS, so it'd be good if 'JIS Japanese' were added to the scripts list:



this is the way it should work, as it's nothing but another encoding.

P.S. don't forget that #japanese in FooNet and #japanese in BarNet should be treated as two distinct channels, too!
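
A rough sketch of what such per-channel bookkeeping could look like, in Python. The class and method names are hypothetical, and plain `lower()` is a simplification of IRC's own case-mapping rules:

```python
from typing import Dict, Tuple


class ChannelEncodings:
    """Encoding table keyed by (network, channel); hypothetical, for illustration."""

    def __init__(self, default: str = "utf-8") -> None:
        self.default = default
        self._table: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _key(network: str, channel: str) -> Tuple[str, str]:
        # Plain lower() is a simplification of IRC's case-mapping rules.
        return (network.lower(), channel.lower())

    def set(self, network: str, channel: str, encoding: str) -> None:
        self._table[self._key(network, channel)] = encoding

    def get(self, network: str, channel: str) -> str:
        return self._table.get(self._key(network, channel), self.default)

    def needs_jis(self, network: str, channel: str) -> bool:
        # Only channels explicitly marked as JIS get the SJIS/JIS conversion.
        return self.get(network, channel) == "iso2022_jp"


encodings = ChannelEncodings()
encodings.set("FooNet", "#japanese", "iso2022_jp")
encodings.set("BarNet", "#japanese", "utf-8")

print(encodings.needs_jis("FooNet", "#japanese"))
print(encodings.needs_jis("BarNet", "#japanese"))
print(encodings.needs_jis("FooNet", "#russian"))
```

Keying on the network as well as the channel name keeps the two #japanese channels apart, and any channel without an entry simply falls back to the default instead of being converted.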

Last edited by NineTails; 10/05/08 04:18 PM.
Joined: Feb 2006
Posts: 38
N
Ameglian cow
OP Offline
this problem still exists--after so many months and versions out. all the information and explanations have been given. users and bugs are being ignored--how good is that?
