As mentioned in section (2) below, regarding the issue of UTF-8 encoding hash strings: is this also being done by $encode's Blowfish encryption when the 'l' switch is not used? For $encode to be compatible with OpenSSL, I would expect $encode(33333333,ae,gryffczmj) to generate output matching:

openssl.exe enc -bf-ecb -in 33333333.txt -out test.out -nosalt -p -pass pass:gryffczmj

but I can't get them both to produce the same output, so unless I'm doing something wrong, I suspect this is caused by $encode UTF-8 encoding the hash output.

Quote:

UTF-8 encoding all of the formats was something that was common to the implementations I saw, so that is what mIRC does. UTF-8 encoding obviously breaks keys that include null bytes. So the question is, are null bytes actually allowed?


If by 'key' you mean the command-line parameter used as input to $hmac, $totp, or $hotp, or to the hash function that generates the hash used to derive the actual encryption key, then yes, I can see that being UTF-8 encoded, since it's needed to ensure that input typed by different users generates matching keys.

But once the binary hash is created, I can't see it having its 0x00's stripped and the remaining bytes individually UTF-8 encoded. UTF-8 encoding a hash digest isn't needed to ensure compatibility between people using different languages, and if the digest output were translated to UTF-8 text, it would greatly weaken the encryption in most cases.

For example, AES-128 uses a 128-bit key, so if AES-128 used the first 128 bits of a hash digest as the key, on average 50% of the digest bytes would be in the 128-255 range. If the digest were passed through $utfencode, each of those bytes would be replaced with 2 bytes, and a 0x00 would appear on average about once every 16 such 128-bit digests. The combined effect is that UTF-8 encoding a 16-byte (128-bit) digest with a 50/50 mix of 128-255 and 0-127 bytes expands it into a text string averaging around 24 bytes, and AES-128 would chop that UTF-8 string at 16 bytes. Those first 16 bytes would contain only about 2/3rds of the original 128 digest bits: roughly 85 bits. In the worst case, where the first 8 bytes of the digest are all in the 0x80-0xff range, UTF-8 encoding them leaves only 64 bits of the digest inside the 128-bit key.
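
As a rough sanity check of that math, here's a Python sketch (not mIRC's actual code), assuming the digest bytes are treated like ANSI text passed through $utfencode:

Code:
import hashlib

digest = hashlib.sha256(b"gryffczmj").digest()[:16]           # 16 bytes = 128 bits
stripped = bytes(b for b in digest if b != 0x00)              # hypothetical 0x00 stripping
utf8_text = "".join(chr(b) for b in stripped).encode("utf-8") # bytes 0x80-0xFF become 2 bytes each

print(len(digest), len(utf8_text))    # 16 vs roughly 24 on average

# count how many original digest bytes survive into the first 16 UTF-8 bytes
used = consumed = 0
for b in stripped:
    consumed += 2 if b >= 0x80 else 1
    if consumed > 16:
        break
    used += 1
print(used)                           # typically 10-11 of 16, i.e. roughly 85 bits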

Also, because of the way Blowfish handles the input key, a hash output consisting entirely of 0x00's handled this way would crash the program. Blowfish accepts a variable-length key of 1-56 bytes, and uses a key schedule which cycles the key to fill 72 bytes of subkey material. If the input had length 0, because the hash digest consisted entirely of 0x00's and they were all stripped, it's impossible to expand 0 bytes to 72 bytes.
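
Here's a simplified Python sketch of the first step of the reference Blowfish key schedule (with placeholder P-array constants instead of the real pi-derived values), just to show where a zero-length key breaks down:

Code:
P_INIT = [0] * 18        # placeholders; the real P-array holds pi-derived constants

def expand_key(key: bytes):
    if not 1 <= len(key) <= 56:
        raise ValueError("Blowfish keys are 1-56 bytes")
    p = list(P_INIT)
    j = 0
    for i in range(18):                  # 18 subkeys * 4 bytes = 72 bytes
        word = 0
        for _ in range(4):
            word = (word << 8) | key[j]
            j = (j + 1) % len(key)       # a 0-length key would divide by zero here
        p[i] ^= word
    return p

# expand_key(b"") is rejected; code that skips the length check would crash on the modulo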

--

Additional evidence I've found that hash digests are used only as raw binary strings, not UTF-8 encoded text:

(1) HMAC itself. I can't find test vectors involving literal keys containing non-text, but I imagine that applications accepting text input to HMAC would be UTF-8 encoding that text. If using the default sha1, that hash has a 512-bit block size, so if the literal input key string is 512 bits (64 bytes) or shorter, that literal string is used as the 'secret key', padded with enough 0x00 bytes to fill the 512-bit block. However, if the input is longer than 512 bits, the input string is hashed via sha1, and the secret key is instead the 20 binary bytes of the sha1 digest, padded with 44 0x00 bytes to fill the entire 512-bit block.

The sha1 hash is not UTF-8 encoded here, nor is it UTF-8 encoded or stripped of 0x00's when the derivative inner/outer keys are created by XOR'ing the padded binary key with 0x36 and 0x5c. When the inner hash digest is appended to the outer key for the outer hash, it's not UTF-8 encoded either. And when that HMAC result is used by $totp or $hotp, it's not UTF-8 encoded while calculating the 6-digit numeric password.
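
This can be checked against the textbook HMAC construction. Below is a short Python sketch (not mIRC code) of HMAC-SHA1, cross-checked against Python's own hmac module; the hashed-down key stays as raw binary bytes throughout:

Code:
import hashlib, hmac

def hmac_sha1(key: bytes, msg: bytes) -> bytes:
    block = 64                                    # SHA-1 block size: 512 bits
    if len(key) > block:
        key = hashlib.sha1(key).digest()          # 20 raw bytes, not UTF-8 encoded
    key = key.ljust(block, b"\x00")               # pad with 0x00's to 64 bytes
    inner = hashlib.sha1(bytes(b ^ 0x36 for b in key) + msg).digest()
    return hashlib.sha1(bytes(b ^ 0x5C for b in key) + inner).digest()

# a key longer than 64 bytes that contains 0x00 and 0x80-0xFF bytes
key = bytes(range(256)) * 2
assert hmac_sha1(key, b"message") == hmac.new(key, b"message", hashlib.sha1).digest()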

--

(2) I haven't yet matched mIRC's Blowfish against another application, except matching its ECB mode when using $encode's 'l' switch for literal keys. I haven't been able to match $encode vs OpenSSL for any test vectors containing 0x00's or 0x80-0xff bytes because mIRC doesn't allow a &binvar as a literal key. However, I have identified OpenSSL using a hash digest as a key, where it uses the binary bytes of the hash without discarding 0x00's or UTF-8 encoding the remainder as if it were text. That's the reason I made the request linked below about binary keys for $encode's Blowfish: to allow using actual binary hashes as the key, without having them UTF-8 encoded as if they were text.
https://forums.mirc.com/ubbthreads.php/topics/261893/New_$encode_switches

To re-create the match between OpenSSL and my assembler program, both using 0x7f-0xff key bytes without UTF-8 encoding them:

Code:
//write -n test.dat 33333333

Then at the command prompt:

Code:
openssl.exe enc -bf-ecb -in test.dat -out test.out -nosalt -p -pass pass:gryffczmj


This gives the key display: 10A6FD97002DF6CC087802129F8CD064

The 5th byte is 0x00, and 3 of the first 4 bytes are in the 128-255 range. This matches the first 128 bits of the SHA-256 of the password: //echo -a $upper($left($sha256(gryffczmj),32))

The output file's first 8 bytes are: 0xAD 0x1E 0xD9 0x2E 0xC7 0x43 0xEA 0x40

I match this output in my assembler program by using the 16 bytes "10 A6 FD 97 00 2D F6 CC 08 78 02 12 9F 8C D0 64" as the literal key, without removing the 0x00 or UTF-8 encoding the remainder - showing that's how OpenSSL handles the hash output.
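
If it helps reproduce this without an assembler program, here's a Python sketch using PyCryptodome's Blowfish. It assumes an OpenSSL 1.1.0+ build, where enc with -nosalt and a -pass password derives the key as the SHA-256 of the password truncated to Blowfish's default 128-bit key length (older builds derived it with MD5 and would display a different key):

Code:
import hashlib
from Crypto.Cipher import Blowfish     # pip install pycryptodome

key = hashlib.sha256(b"gryffczmj").digest()[:16]
print(key.hex().upper())               # expected: 10A6FD97002DF6CC087802129F8CD064

cipher = Blowfish.new(key, Blowfish.MODE_ECB)
print(cipher.encrypt(b"33333333").hex().upper())   # should match the first 8 output bytes above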

(2b)

When not using the 'l' switch, are $encode's 'e' and 'c' switches UTF-8 encoding the hash output and stripping 0x00's? I can't figure out how to get output from $encode that's compatible with the above. For example, this doesn't produce matching output:

Code:
//bset -t &v 1 $str(3,16) | noop $encode(&v,bae,gryffczmj) | noop $decode(&v,ba) | echo -a $regsubex($bvar(&v,1-16),/(\d+)/g,$base(\t,10,16,2))

--

(3) I finally found Google Authenticator associated with a hex (base16) encoding of its key, and the hex string is not UTF-8 encoded.

https://lists.open.com.au/pipermail/radiator/2011-June/017420.html

It associates the hex and Base32 strings as non-UTF-8 equivalents, even when the decoded bytes include 0x00's or values in the 128-255 range. It shows these as equivalents:

Code:
3132333435363738393031323334353637383930   GEZD GNBV GY3T QOJQ GEZD GNBV GY3T QOJQ
d8f828609e0f4056f852e4c9d75605099f483e20   3D4C QYE6 B5AF N6CS 4TE5 OVQF BGPU QPRA
b906daef6d002ec6cc89106df25f8268ce28f95e   XEDN V33N AAXM NTEJ CBW7 EX4C NDHC R6K6
0000000000000000000000000000000000000000   AAAA AAAA AAAA AAAA AAAA AAAA AAAA AAAA


These are also 4 examples of the 160-bit Google Auth key encoded as Base32, with spaces added to make the key easier for the user to type. The following translates one of the above Base32 strings into its hex equivalent, displayed with spaces for easier reading:

Code:
//bset -t &v 1 $remove(XEDN V33N AAXM NTEJ CBW7 EX4C NDHC R6K6,$chr(32)) | noop $decode(&v,ba) | echo -a $regsubex($bvar(&v,1-),/(\d+)/g,$base(\t,10,16,2))
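
The same equivalence can be cross-checked outside mIRC with Python's standard library: the Base32 and hex strings decode to the same 20 binary bytes, including the all-0x00 key:

Code:
import base64, binascii

b32 = "XEDN V33N AAXM NTEJ CBW7 EX4C NDHC R6K6".replace(" ", "")
hexstr = "b906daef6d002ec6cc89106df25f8268ce28f95e"

assert base64.b32decode(b32) == binascii.unhexlify(hexstr)
assert base64.b32decode("A" * 32) == b"\x00" * 20   # all-zero 160-bit key
print(base64.b32decode(b32).hex())                  # b906daef6d002ec6cc89106df25f8268ce28f95e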


These 4 Base32 and Base16 strings decode to the same binary bytes. When $hotp and $totp are given the 32-character Base32 string without spaces, they correctly decode it into the 160-bit binary key. When it's divided by spaces as above, it's incorrectly handled as if it were a 39-character literal text key, because the length including spaces isn't a multiple of 8. Adding 1 additional space between any of the groups of 4 would make the length 40, and it would then decode correctly because the length-with-spaces is a multiple of 8. When given the length-40 hex string equivalent, it is decoded to the same 20 binary bytes as the Base32 key, but it's then further processed: the 0x00's are removed, and the remaining bytes are individually UTF-8 encoded and used as if they were a text string.