
Joined: Jan 2009
Posts: 7
Timiz0r (OP)
Since updating to 7.1, my JSON identifier no longer works completely. Whenever I use Google's language API to translate text from a non-ASCII language (such as Japanese), the text gets replaced with question marks.

Through testing, I could not find a Unicode problem with COM in any other situation. I was able to narrow the problem down to the part of my script that uses an XMLHTTP object to retrieve data. Sample code is below:

Code:
alias comtest {
  ;-----
  ; This possible bug shows up in my JSON identifier as of 7.x. With Google's translation API, responses that contain Unicode are received just fine.
  ; However, when the request itself contains Unicode (translating from Japanese), those characters appear to be replaced by question marks.
  ;-----
  comopen test1 msscriptcontrol.scriptcontrol

  ;set lang to javascript and create a function to get data via http
  noop $com(test1,language,4,bstr,jscript) $com(test1,addcode,1,bstr,function httpjson(url) $({,0) y=new ActiveXObject("Microsoft.XMLHTTP");y.open("GET",url,false);y.send();return y.responseText; $(},0))

  ;these 2 lines show how it works fine receiving unicode
  noop $com(test1,eval,1,bstr,httpjson("http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hungry&langpair=en|ja"))
  echo -a [COM Test (Works)] $com(test1).result

  ;these 2 lines don't work: the Unicode in the query comes back as question marks
  noop $com(test1,eval,1,bstr,httpjson("http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q= $+ $+($chr(39138),$chr(12360),$chr(12383)) $+ &langpair=ja|en"))
  echo -a [COM Test (Fails)] $com(test1).result

  :error
  comclose test1
}


Apart from not being able to paste those characters into 6.35, this type of script works fine there.

Last edited by Timiz0r; 09/08/10 08:51 PM.
Joined: Oct 2004
Posts: 8,330
Hoopy frood
Same here. I can manually put that URL into IE and it translates fine, but not from mIRC 7.1. In mIRC 6.35, if I use the actual Unicode characters in the script instead of $chr(), it also fails with ??'s. However, if I run the provided script as-is in 6.35, I get H_ as the result instead of Hungry for some reason.

My guess based on that is that UTF-8 doesn't work very well with COM, and that may well be a limitation of COM rather than mIRC. The problem with 7.1 for this script is that it only outputs UTF-8, so you can't get around it by not encoding the text as UTF-8. Perhaps there's another COM method that will work with UTF-8?

Or maybe it is a bug in mIRC itself. I can't tell, as I'm not that knowledgeable about COM, though I know some.


Invision Support
#Invision on irc.irchighway.net
Timiz0r (OP)
Well, the thing is it works fine in 6.35 for me.

Quote:

[5:35:58pm] [~Timi] !translate ja When in doubt, whip it out.
[5:35:59pm] [~Infinity] That text translates to "疑問がある場合、それをサッと取り出し。".
[5:36:02pm] [~Timi] !translate en 疑問がある場合、それをサッと取り出し。
[5:36:02pm] [~Infinity] That text translates to "If in doubt, whip it out.".


However, a similar script on my local client outputs this:
Quote:

[Translation Party] To Japanese: 疑問がある場合、それをサッと取り出し。
[Translation Party] To English: ???????????????????


I also forgot to mention that the Unicode is likely being replaced by question marks before it is sent off to be translated. However, in testing, I can set a variable to such a character and retrieve it successfully.

Last edited by Timiz0r; 09/08/10 09:44 PM.
Joined: Oct 2003
Posts: 3,918
Hoopy frood
This is not an issue with UTF-8 and COM, or at least, it shouldn't be. The real issue is that you're violating the HTTP protocol.

Specifically, you should not be sending Unicode unencoded over HTTP. IIRC the RFC is pretty explicit about how to send non-ASCII data: http://www.w3schools.com/TAGS/ref_urlencode.asp as well as http://en.wikipedia.org/wiki/Percent-encoding

The correct way to send the data is to URL-encode it (as described in the links above), using either %uxxxx (not widely supported) or the percent-encoded bytes of the UTF-8 data. Given that you have JScript, you can use encodeURI() to do this for you (or encodeURIComponent() for individual parameter values, since encodeURI() leaves characters such as & and = alone); otherwise you will have to $utfencode() the text and percent-encode each byte individually. Of course, there might be a Unicode-aware urlencode snippet somewhere that you could use.
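
To illustrate the "encode each byte individually" route, here is a minimal sketch in plain JavaScript. The function name percentEncodeUtf8 is made up for this example, and it assumes the text contains only BMP characters (no surrogate pairs), which holds for the three Japanese characters used in this thread. It is not the script from the thread, just one way the byte-level encoding could look:

```javascript
// Hypothetical helper: percent-encode a string as UTF-8, byte by byte.
// Assumption: input has no surrogate pairs (BMP characters only).
function percentEncodeUtf8(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    // Unreserved URL characters can be sent as-is.
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch;
      continue;
    }
    var c = text.charCodeAt(i);
    var bytes;
    if (c < 0x80) {
      bytes = [c];                                  // 1-byte sequence
    } else if (c < 0x800) {
      bytes = [0xC0 | (c >> 6), 0x80 | (c & 0x3F)]; // 2-byte sequence
    } else {
      bytes = [0xE0 | (c >> 12),                    // 3-byte sequence
               0x80 | ((c >> 6) & 0x3F),
               0x80 | (c & 0x3F)];
    }
    for (var j = 0; j < bytes.length; j++) {
      var hex = bytes[j].toString(16).toUpperCase();
      out += "%" + (hex.length < 2 ? "0" + hex : hex);
    }
  }
  return out;
}

// Same code points as the $chr(39138), $chr(12360), $chr(12383) calls.
var hungry = String.fromCharCode(39138, 12360, 12383);
var encoded = percentEncodeUtf8(hungry); // "%E9%A3%A2%E3%81%88%E3%81%9F"
```

For this input the result matches what the built-in encodeURIComponent() produces, so in JScript the built-in is the simpler choice.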

Code:
noop $com(test1,eval,1,bstr,httpjson("http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=%E9%A3%A2%E3%81%88%E3%81%9F $+ &langpair=ja|en"))


or:

Code:
alias comtest {
  comopen test1 msscriptcontrol.scriptcontrol
  ;set lang to javascript and create a function to get data via http
  noop $com(test1,language,4,bstr,jscript) $com(test1,addcode,1,bstr, $&
    function httpjson(url,query){ $&
    var qs = []; if (query){for (var x in query) qs.push(x+"="+encodeURIComponent(query[x]));} $&
    var y=new ActiveXObject("Microsoft.XMLHTTP");y.open("GET",url+(qs.length>0?'?'+qs.join('&'):''),false);y.send();return y.responseText;})

  ;these 2 lines show how it works fine receiving unicode
  noop $com(test1,eval,1,bstr,httpjson("http://ajax.googleapis.com/ajax/services/language/translate",{v:"1.0",q:"hungry",langpair:"en|ja"}))
  echo -a [COM Test (Works)] $com(test1).result

  ;these 2 lines failed before; with the query values URL-encoded they should now work
  noop $com(test1,eval,1,bstr,httpjson("http://ajax.googleapis.com/ajax/services/language/translate",{v:"1.0",q:"飢えた",langpair:"ja|en"}))
  echo -a [COM Test (Fails)] $com(test1).result

  :error
  comclose test1
}


I modified the JS function to auto-urlencode your query string params.
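
For reference, a standalone sketch of the same query-string idea, written so it can be tried outside mIRC. The name buildUrl is hypothetical. It uses encodeURIComponent() on each key and value, on the assumption that values may contain characters such as & or =, which encodeURI() leaves unescaped:

```javascript
// Hypothetical helper: append URL-encoded query parameters to a base URL.
function buildUrl(base, query) {
  var parts = [];
  for (var key in query) {
    // Skip inherited properties so only the caller's own parameters are sent.
    if (Object.prototype.hasOwnProperty.call(query, key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(query[key]));
    }
  }
  return base + (parts.length > 0 ? "?" + parts.join("&") : "");
}

var url = buildUrl(
  "http://ajax.googleapis.com/ajax/services/language/translate",
  { v: "1.0", q: "飢えた", langpair: "ja|en" }
);

// encodeURI() keeps query delimiters intact; encodeURIComponent() escapes them.
var loose = encodeURI("a&b=c");           // "a&b=c"
var strict = encodeURIComponent("a&b=c"); // "a%26b%3Dc"
```

With the parameters above, the q value ends up as the same %E9%A3%A2%E3%81%88%E3%81%9F sequence used in the hand-encoded URL.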


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Timiz0r (OP)
Actually, the XMLHTTP object automatically encodes URIs, or at least it did in my past usage of the script. That's why I later took out the encodeURIComponent code. Not quite sure why it wasn't working in 7.1. It's also possible that Windows 7 changed that, since the 6.35 test was done on XP.

Either way, since encoding the URL worked fine, this topic can be locked. Thanks argv[0]

Well, I checked with a packet sniffer, since the question marks still seemed weird. What actually gets sent is literal question marks, not raw Unicode (and not even encoded, so argv[0] is still right). So I'm wondering whether this is still an mIRC issue.

Code:
0000:  2F 61 6A 61 78 2F 73 65 72 76 69 63 65 73 2F 6C  /ajax/services/l
0010:  61 6E 67 75 61 67 65 2F 74 72 61 6E 73 6C 61 74  anguage/translat
0020:  65 3F 76 3D 31 2E 30 26 71 3D 3F 3F 3F 26 6C 61  e?v=1.0&q=???&la
0030:  6E 67 70 61 69 72 3D 6A 61 7C 65 6E              ngpair=ja|en    
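
The capture can be read back with a short sketch like the one below (plain JavaScript; hexToAscii is a made-up helper name). It decodes the bytes from the dump and compares the q value that was sent with the percent-encoded form of the three characters. Note that three characters produced exactly three 0x3F bytes rather than nine, which would be consistent with each character being substituted before any UTF-8 encoding took place, though that is only an inference from the dump:

```javascript
// Bytes copied from the packet capture above.
var captured =
  "2F 61 6A 61 78 2F 73 65 72 76 69 63 65 73 2F 6C " +
  "61 6E 67 75 61 67 65 2F 74 72 61 6E 73 6C 61 74 " +
  "65 3F 76 3D 31 2E 30 26 71 3D 3F 3F 3F 26 6C 61 " +
  "6E 67 70 61 69 72 3D 6A 61 7C 65 6E";

// Hypothetical helper: turn space-separated hex bytes into an ASCII string.
function hexToAscii(hex) {
  var parts = hex.split(" ");
  var text = "";
  for (var i = 0; i < parts.length; i++) {
    text += String.fromCharCode(parseInt(parts[i], 16));
  }
  return text;
}

var requestPath = hexToAscii(captured);

// Value of the q parameter as it went over the wire.
var sentQuery = requestPath.match(/[?&]q=([^&]*)/)[1];

// What a correctly encoded request would have carried instead.
var expectedQuery = encodeURIComponent(String.fromCharCode(39138, 12360, 12383));
```

Here sentQuery is three literal question marks, while expectedQuery is the nine-byte percent-encoded sequence.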


Edit again:
Considering how mIRC works and that I can't reproduce this kind of problem outside of XMLHTTP, I'll assume it's some quirk of XMLHTTP that behaves differently here for reasons I can't think of. So I'll just reintroduce the encoding and go with that.

Last edited by Timiz0r; 09/08/10 11:07 PM.
