Joined: Aug 2006
Posts: 167
P
Vogon poet
OP Offline
Long ago, I wrote a script for doing "hex editor" style displays of &binvar variable contents to custom windows:



It was possible to do this with very little compromise: only null and mIRC control code characters had to be replaced before performing the screen writes. Which was easy:

Code:
breplace &smdebvarline 0 255 2 255 3 255 9 255 15 255 22 255 27 255 29 255 31 255 32 255

As UTF became more prominent in mIRC, however, I began noticing instances where unwanted UTF decoding would occur because of the random nature of the bytes I was writing. My temporary solution to that problem was to simply do this...

Code:
/window -CDk0nz -t1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,18 @Debugger Terminal 12

...coupled with inserting a $chr(9) between each byte in the &binvar I was going to write, before writing it. The result: the UTF decoding was foiled (hence the screen capture above, which is from mIRC 6.35 with breplace + tabbing to foil UTF).
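To illustrate why interleaving tabs could defeat accidental decoding, here is a minimal sketch in Python (not mIRC script). The `display` function is a hypothetical, simplified model of a renderer that decodes valid UTF-8 and otherwise shows one character per byte. It is an assumption for illustration, not mIRC's actual logic, and it does not capture the 7.x behaviour described next.

```python
# Hypothetical sample: bytes 0xC3 0xA9 happen to form the UTF-8 encoding of U+00E9.
raw = bytes([0x41, 0xC3, 0xA9, 0x42])

def display(data: bytes) -> str:
    """Simplified model (an assumption): decode as UTF-8 when the data is
    valid UTF-8, otherwise fall back to one character per byte (Latin-1)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")

def interleave_tabs(data: bytes) -> bytes:
    """Insert a tab (byte 9) between every pair of adjacent bytes."""
    return b"\t".join(bytes([b]) for b in data)

# Without tabs, the two bytes 0xC3 0xA9 merge into a single character.
collapsed = display(raw)

# With tabs, 0xC3 is no longer followed by a continuation byte, so the data
# is not valid UTF-8 and every byte is shown on its own.
separated = display(interleave_tabs(raw))

print(len(collapsed), len(separated))
```

In this model the tab works only because it breaks up every multi-byte sequence; a renderer that handles invalid input differently would not be fooled the same way.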

But then came mIRC 7.x. Now even tabs don't stop UTF decoding.

Tabs + invalid characters breplace'd with ASCII 255:


Tabs + invalid characters breplace'd with ASCII 46 (just for comparison):


mIRC appears to be "seeing right through" my tabs and attempting to decode UTF anyway.

Is there any way to prevent this, so I can have my "hex dumps" back? Or has UTF finally won and made pure 8 bit screen writes impossible?

Joined: Feb 2006
Posts: 546
J
Fjord artisan
Offline
if the binvar doesn't contain invalid UTF-8 then $utfencode($bvar(&var, 1-).text) is enough to produce its string of bytes.

otherwise, as in the general case, you need to loop through all bytes with $bvar(&var, N) and handle them individually. there may be another way involving /bwrite + another feature of mIRC, but as far as I can remember (and i don't have mIRC to play around with at the moment :P) there is nothing you can use to pull the string from the file without decoding UTF-8 or putting it back in a binvar.

edit: you can of course use $regsubex($bvar(&var, 1-), /(\d+) ?/g, $chr(\1)) to return that string of byte values, as an alternative to (scripted) looping. due to mIRC's length limits, you will only be able to handle the bvar that way, in the worst case, in 1,037 byte sized chunks.
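The byte-by-byte idea can be sketched outside mIRC as follows. This is a rough Python illustration, assuming $chr(N) behaves like Python's `chr(N)` (one character whose code point equals the byte value). The helper names are hypothetical, and the chunk size of 1037 is simply the worst-case figure quoted above.

```python
def bytes_to_chars(data: bytes) -> str:
    """Map every byte value N to the character with code point N,
    so no multi-byte UTF-8 sequence is ever interpreted."""
    return "".join(chr(b) for b in data)

def chunks(data: bytes, size: int = 1037):
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

# Hypothetical input: 2560 bytes covering every byte value ten times.
sample = bytes(range(256)) * 10
pieces = [bytes_to_chars(chunk) for chunk in chunks(sample)]

# Each character maps back to exactly one byte, so nothing was merged or lost.
restored = "".join(pieces).encode("latin-1")
print([len(p) for p in pieces], restored == sample)
```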


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Joined: Aug 2006
Posts: 167
P
Vogon poet
OP Offline
I hope replies to threads this old don't evoke moderator fury here...

I somehow missed your original response, jaytea, and only noticed it now, a year later. :(

Suffice it to say, I never found a solution to this problem on my own. Alas, trying your three suggestions now didn't improve anything either, not even the byte-by-byte method. The output looked as hopelessly mangled as before. I even confirmed that $utfencode($bvar(&var,1-).text) was properly generating UTF by /bwriting the before and after &binvars to disk. Sure enough, the before file contained no UTF while the after file did. So at least the $utfencode() part did its job.

A thought: under normal circumstances, my &binvar variables contain binary produced by functions like $compress, and that's why their contents are prone to triggering false UTF decoding. But since pre-UTF-encoding their contents doesn't seem to help, I'm now beginning to wonder if mIRC 7's screen writes recursively evaluate UTF. For example, even if you feed /echo a fully UTF-encoded string (made with byte-by-byte UTF encoding), it might re-evaluate the bytes which that UTF decodes to, if it thinks it sees further UTF.

Sigh.

Below, I am including a basic version of my hexdumper code (pre-populated with two different data arrays). In mIRC 6.17, the output of that code does this:



In mIRC 7, that same code outputs this:



If anyone can figure out how to make this code work in mIRC 7 as it did in mIRC 6.17, I would be most grateful. Again, the following code produced both results above -- the only difference was the mIRC version.

Code:
alias hexdump {
  if ($1 == -plaintext) bset -t &smdebvar 1 what you see here is some very nice and tasty $&
    conventional plaintext
  elseif ($1 == -randbinary) {
    bset &smdebvar 1 144 202 199 14 14 24 254 137 46 177 211 136 241 232 156 182 243 40 65 64 $&
    27 150 61 209 220 217 241 145 168 2 189 137 43 22 241 137 219 191 5 205 177 2 74 159 14 81 $&
    141 164 119 195 209 109 235 27 72 254 40 246 194 237 48 47 172 185 11 157 61 254 59 109 $&
    181 28 152 244 245 113 15 87 136 14 93 238 150 124 145 123 29 140 162 204 22 120 53 111 38 $&
    14 178 182 7 239 93 164 46 97 139 128 106 24 158 252 144 122 151 101 170 159 226 125 69 42 $&
    114 4 208 26 130 182 241 69 115 162 39 116 138 92 76 142 96 135 204 226 180 59 35 104 233 $&
    223 84 213 254 65 204 128 164 71 61 129 232 113 17 163 127 254 191 16 86 1 40 223 104 89 2 $&
    160 174 68 243 150 170 232 121 227 98 90 104 102 179 17 79
    bset &smdebvar 188 31 53 180 189 192 13 235 28 205 37 156 180 183 84 239 65 250 231 35 135 $&
    211 39 128 211 211 127 135 24 17 60 75 45 30 135 60 231 189 255 192 208 90 126 116 145 19 $&
    79 185 70 116 169 40 207 75 118 143 16 3 231 29 191 29 111 62 237 229 82 60 65 62 185 247 $&
    163 184 106 152 172 178 63 6 18 225 92 71 73 76 66 205 119 46 176 139 255 64 4 212 126 237 $&
    184 13 195 89 203 67 92 19 203 176 168 23 126 18 151 115 88 247 240 129 139 212 83 145 42 $&
    41 160 81 172 8 1 165 240 239 254 142 12 205 220 127 60 252 49 52 87 190 21 182 255 10 85 $&
    42 125 244 78 255 248 46 70 226 147 110 218 138 151 41 115 115 199 64 63 214 24 97 176 157 $&
    49 105 75 57 140 143 187 138 116 47 60 16 161 235 95 86 26 209 235 168 41 102 76 99 159 $&
    107 53 15 124 229 212
    bset &smdebvar 392 178 210 202 246 46 54 77 250 237 14 121 87 139 8 94 145 80 216 69 191 $&
    107 237 102 84 168 151 106 195 176 95 158 176 149 12 50 185 62 48 111 147 189 136 4 250 $&
    216 9 165 26 219 162 178 249 203 15 208 55 127 62 243 128 125 124 36 39 4 114 110 107 63 $&
    172 70 150 202 68 46 55 185 59 119 44 202 115 40 204 116 192 225 124 253 250 115 58 184 $&
    250 64 120 90 210 25 89 65 121 208 122 94 75 129 98 9 26 89 80 58 70 72 92 68 45 240 24 81 230
  }
  else return


  /window -CDk0z -t18 @hexdump -1 -1 1175 -1 Terminal 12
  echo @hexdump $chr(15)
  aline -h @hexdump $chr(3) $+ 14__________________________________________________________ $+ $&
    _______________________________________________________________________________________ $+ $&
    _______________________________________________________________________________________ $+ $&
    _____________________________________________
  echo @hexdump $chr(15)
  echo -i @hexdump $chr(3) $+ 13New message in $chr(36) $+ hexdump() on $&
    $asctime($ctime,dddd $+ $chr(44) mmmm d $+ $chr(44) yyyy @ hh:nn:ss TT zz)
  echo @hexdump $chr(15)
  echo -i @hexdump $chr(3) $+ 04Message length: $bvar(&smdebvar,0) bytes (hexdump alias $&
    executed as: $chr(31) $+ /hexdump $1 $+ $chr(31) $+ )
  echo @hexdump $chr(15)


  var %tempPos = 1
  while (%tempPos <= $bvar(&smdebvar,0)) {

    ; grab the next 16 bytes from &smdebvar

    bcopy -c &smdebvarline 1 &smdebvar %tempPos 16

    ; make a hex dump of our 16 bytes

    var %tempHexPos = 1
    while (%tempHexPos <= $bvar(&smdebvarline,0)) {
      var %tempHexLine = %tempHexLine $base($bvar(&smdebvarline,%tempHexPos,1),10,16,2)
      inc %tempHexPos
    }

    ; prepare for ascii dump by replacing mIRC control codes (and spaces, due to the
    ; consecutive spaces issue) with ascii 255 (invisible space in Terminal font/CP437)

    breplace &smdebvarline 0 255 2 255 3 255 9 255 15 255 22 255 27 255 29 255 31 255 32 255

    ; now utf-encode our 16 bytes to CP437 to prepare it for mIRC 7 (and if this is our last
    ; trip through the while loop, pad the end with utf-encoded 255 until it's 16 bytes long)

    var %n = 1
    while (%n <= 18) {
      ; 18 above so there'll be 2 spaces between ascii and hex columns in hexdump window
      bset -t &damnedutf $calc($bvar(&damnedutf,0) + 1) $&
        $iif(%n <= $bvar(&smdebvarline,0),$utfencode($bvar(&smdebvarline,%n,1).text,437),$utfencode($chr(255),437))        
      inc %n
    }

    ; disaster time

    echo @hexdump $bvar(&damnedutf,1,$bvar(&damnedutf,0)).text $+ %tempHexLine
    bunset &damnedutf

    inc %tempPos $bvar(&smdebvarline,0)
    unset %tempHexLine
  }
}

Joined: Aug 2006
Posts: 167
P
Vogon poet
OP Offline
Looks like there's more to this than I thought.

With the Terminal font, mIRC seems to be replacing ASCII >= 160 with an entirely different font. Meanwhile, ASCII 128 - 159 appear as invisible glyphs with only 5 exceptions.

Also, using a different CP 437 font (http://neveradudelikethisone.com/2010/01/cp437-fonts-for-windows/), the first problem disappeared, but the second (invisible glyphs) remained.

Finally, it now appears the UTF encoding is being written raw.

Any ideas, folks?

Terminal (without and with $utfencode())



CP 437 (without and with $utfencode()) (alternate font problem disappears, invisible glyphs remain)



Code:
alias ascii {
  /window -CDe3k0mz @test -1 -1 530 650 Terminal 12
  var %n = 0, %u = $false, %m = 0 2 3 9 15 22 29 31 32
  while (%n <= 255) {
    if (%n == 0) aline 15 @test $chr(15)
    var %c = $iif(%u,$utfencode($chr(%n),437),$chr(%n))
    var %s = %s $+ $iif($istok(%m,%n,32),$chr(3) $+ 13 $+ .,$chr(15) $+ %c)
    if ($calc((%n + 1) % 16) == 0) {
      aline @test $chr(3) $+ 03< $+ $base($calc(%n - 15),10,10,3) $+ - $+ $base(%n,10,10,3) $+ > $+ %s $+ $chr(3) $+ 09<EOL>
      unset %s
    }
    inc %n
    if (!%u) && (%n == 256) var %u = $true, %n = 0
  }
}

Joined: Jul 2006
Posts: 4,149
W
Hoopy frood
Offline
As you guessed, your problem is that in mIRC 7.x the parameters passed to the /echo command are handled internally as UTF-8. You can't get around that char by char, but you can prevent the display problem by replacing invalid characters with a dot, like you did; just add the tab in the /echo:

Code:
alias hexdump {
  if ($1 == -plaintext) bset -t &smdebvar 1 what you see here is some very nice and tasty $&
    conventional plaintext
  elseif ($1 == -randbinary) {
    bset &smdebvar 1 144 202 199 14 14 24 254 137 46 177 211 136 241 232 156 182 243 40 65 64 $&
      27 150 61 209 220 217 241 145 168 2 189 137 43 22 241 137 219 191 5 205 177 2 74 159 14 81 $&
      141 164 119 195 209 109 235 27 72 254 40 246 194 237 48 47 172 185 11 157 61 254 59 109 $&
      181 28 152 244 245 113 15 87 136 14 93 238 150 124 145 123 29 140 162 204 22 120 53 111 38 $&
      14 178 182 7 239 93 164 46 97 139 128 106 24 158 252 144 122 151 101 170 159 226 125 69 42 $&
      114 4 208 26 130 182 241 69 115 162 39 116 138 92 76 142 96 135 204 226 180 59 35 104 233 $&
      223 84 213 254 65 204 128 164 71 61 129 232 113 17 163 127 254 191 16 86 1 40 223 104 89 2 $&
      160 174 68 243 150 170 232 121 227 98 90 104 102 179 17 79
    bset &smdebvar 188 31 53 180 189 192 13 235 28 205 37 156 180 183 84 239 65 250 231 35 135 $&
      211 39 128 211 211 127 135 24 17 60 75 45 30 135 60 231 189 255 192 208 90 126 116 145 19 $&
      79 185 70 116 169 40 207 75 118 143 16 3 231 29 191 29 111 62 237 229 82 60 65 62 185 247 $&
      163 184 106 152 172 178 63 6 18 225 92 71 73 76 66 205 119 46 176 139 255 64 4 212 126 237 $&
      184 13 195 89 203 67 92 19 203 176 168 23 126 18 151 115 88 247 240 129 139 212 83 145 42 $&
      41 160 81 172 8 1 165 240 239 254 142 12 205 220 127 60 252 49 52 87 190 21 182 255 10 85 $&
      42 125 244 78 255 248 46 70 226 147 110 218 138 151 41 115 115 199 64 63 214 24 97 176 157 $&
      49 105 75 57 140 143 187 138 116 47 60 16 161 235 95 86 26 209 235 168 41 102 76 99 159 $&
      107 53 15 124 229 212
    bset &smdebvar 392 178 210 202 246 46 54 77 250 237 14 121 87 139 8 94 145 80 216 69 191 $&
      107 237 102 84 168 151 106 195 176 95 158 176 149 12 50 185 62 48 111 147 189 136 4 250 $&
      216 9 165 26 219 162 178 249 203 15 208 55 127 62 243 128 125 124 36 39 4 114 110 107 63 $&
      172 70 150 202 68 46 55 185 59 119 44 202 115 40 204 116 192 225 124 253 250 115 58 184 $&
      250 64 120 90 210 25 89 65 121 208 122 94 75 129 98 9 26 89 80 58 70 72 92 68 45 240 24 81 230
  }
  else return
  
  window -CDkz -t18 @hexdump -1 -1 1175 -1 Terminal 12
  var %maincount = 1
  while (%maincount <= $bvar(&smdebvar,0)) {
    bcopy -c &smdebvarline 1 &smdebvar %maincount 16
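    var %hexcount = 1,%result
    ; loop by length rather than by byte value, so a 0 byte doesn't end the line early
    while (%hexcount <= $bvar(&smdebvarline,0)) {
      var %byte = $bvar(&smdebvarline,%hexcount)
      ; anything outside 32-127 becomes a dot (46) in the text column
      if (%byte !isnum 32-127) bset &smdebvarline %hexcount 46
      %result = %result $base(%byte,10,16,2)
      inc %hexcount
    }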
    aline @hexdump $+($bvar(&smdebvarline,1-).text,$chr(9),%result)
    inc %maincount 16
  }
}
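For readers without mIRC at hand, the dot-substitution approach in the code above can be sketched in Python. This is only an illustration of the technique, not a port of the alias: `hexdump_lines` and the sample bytes are hypothetical, and the 32-127 range is copied from the code above (so 127, DEL, is passed through as-is).

```python
def hexdump_lines(data: bytes, width: int = 16) -> list[str]:
    """Return one line per `width` bytes: a text column in which bytes
    outside 32-127 are shown as '.', then a tab, then the hex column."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        text = "".join(chr(b) if 32 <= b <= 127 else "." for b in row)
        hexes = " ".join(f"{b:02X}" for b in row)
        lines.append(f"{text}\t{hexes}")
    return lines

# Hypothetical sample, including a 0 byte to show it does not end the row.
sample = bytes([144, 202, 199, 14, 65, 64, 46, 0])
lines = hexdump_lines(sample)

# Share of the 256 byte values that stay visible with this range.
visible_fraction = len(range(32, 128)) / 256

print(lines[0])
print(visible_fraction)
```

The trade-off is that 160 of the 256 possible byte values all collapse into the same dot.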

Last edited by Wims; 06/04/12 04:51 PM.

#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Aug 2006
Posts: 167
P
Vogon poet
OP Offline
Thanks, Wims. I had indeed considered what you've suggested, but I really wanted to make doubly sure there wasn't another way to accomplish my goal. Replacing !32-127 leaves only 37% of all possible characters visible. CP437 is somewhat the traditional character set used by hex editors, and since people (including myself) often recognize many bytes by their CP437 glyphs, I wanted my dumps to have as little substitution as possible.

Anyway, as I said above, it seems unwanted UTF-8 decoding was not the problem after all; rather, it is that:

* mIRC won't display characters 128, 130-140, 142, 145-156, 158-159 properly in any font (that I am aware of)

* mIRC substitutes another font for Terminal when trying to display characters 160-255 with it (even though mIRC displays characters 160-255 in other non-truetype, non-UTF-8 fonts, like this one, without font substitution).

To your or anyone else's knowledge, is there any explanation for these issues? Can you tell me whether my /ascii alias (see my previous post) produces both issues for you too (to rule out machine-specific problems at my end)? If no and yes, then I suppose I should report both as bugs.

Joined: Oct 2004
Posts: 8,330
Hoopy frood
Offline
The characters you're attempting to display that aren't displaying at all are not printable characters in Unicode (code points 128-159 are control codes), so they will not display. That's expected when dealing with Unicode. That being said, most (all?) of those characters should have alternatives in Unicode that you can use instead by replacing the character with the Unicode equivalent of it.
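The status of code points 128-159 can be checked against the Unicode character database. The short Python sketch below uses the standard `unicodedata` module; it says nothing about how mIRC or any particular font treats these code points, only how Unicode classifies them.

```python
import unicodedata

c1_code_points = range(128, 160)

# General category of every code point in the range; "Cc" means control character.
categories = {unicodedata.category(chr(cp)) for cp in c1_code_points}

# Code points in the range that Python considers printable (expected: none).
printable = [cp for cp in c1_code_points if chr(cp).isprintable()]

# For contrast, a code point just above the range has a name and a glyph.
name_161 = unicodedata.name(chr(161))

print(categories, printable, name_161)
```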

mIRC substitutes fonts if the font does not appear to display the character properly. It will choose another font on your system in an attempt to correctly display that character. I don't know the inner workings of that, but I have a feeling that it's based on how the font you're using is written. If mIRC can't tell if the font supports a character, it will try another font. It could be that Terminal isn't as compatible with mIRC's font-linking as other fonts.


Invision Support
#Invision on irc.irchighway.net
Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
All of the "de facto standard" glyphs in codepages should have Unicode equivalents; ultimately, this is the point of having a universal character encoding. For CP437, the list is here: http://en.wikipedia.org/wiki/Code_page_437

That said, $utfencode/$utfdecode should be able to perform conversion to and from codepage encoding to UTF-8, so you could use that, AFAIK.
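As a rough illustration of the codepage-to-Unicode conversion, here is a Python sketch using its built-in `cp437` codec. The function name and sample bytes are hypothetical. Replacing every byte up to 32 (and 127) with 255 is a simplifying assumption that goes further than the breplace list earlier in the thread, because Python's codec maps those bytes to control characters rather than to the CP437 graphical glyphs.

```python
CONTROL_SUBSTITUTE = 0xFF  # CP437 byte 255 decodes to U+00A0 (no-break space)

def cp437_text_column(row: bytes) -> str:
    """Decode a row of raw bytes to the Unicode equivalents of their CP437
    glyphs, after replacing control bytes and spaces with byte 255."""
    cleaned = bytes(
        CONTROL_SUBSTITUTE if (b <= 32 or b == 127) else b for b in row
    )
    return cleaned.decode("cp437")

# Hypothetical sample row mixing high bytes, a control byte and a space.
sample = bytes([144, 202, 199, 14, 32, 65, 254, 137])
text = cp437_text_column(sample)

# The UTF-8 form is what a Unicode-aware display would ultimately receive.
utf8 = text.encode("utf-8")

print(text, len(text), len(utf8))
```

Every input byte becomes exactly one character, so the text column keeps its fixed width even though the UTF-8 encoding is longer than the original row.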


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Joined: Aug 2006
Posts: 167
P
Vogon poet
OP Offline
In reply to both of your posts, I wrote a response containing numerous UTF8-encoded characters -- only to discover that this forum has turned off its UTF-8 support. It supported UTF-8 in the past, but its HTML now contains <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />.

Irony? >:(

Anyway. Please forgive the inconvenience of this, but to see my response, you'll need to go here (where I can do UTF-8):
http://pastehtml.com/raw/bu7vqs0u5.html

Again, apologies.

