
#263025 - 15/05/18 08:37 AM $hotp() base-32 decoding
maroon Offline
Hoopy frood

Registered: 12/01/04
Posts: 766
In a continuation of the other $hotp bug report. $hotp decodes key strings in a way that doesn't appear to be documented. The /help page for $hotp describes 4 methods of presenting the key:

Hex strings of length 40/64/128
Base32 strings of length 16/24/32
Google Authenticator Format
plain text

It specifically says that it decodes Base32 strings of the 3 lengths 16/24/32. However, I've found that as long as the key is a string whose length is *any* multiple of 8 greater than 8, and it contains only the 32 case-insensitive characters of the Base32 alphabet (excluding the '=' padding character), $hotp uses the Base32-decoded string instead of the case-sensitive literal string. The exception is strings of length 40/64/128 containing only hexadecimal characters.

This means that someone using a key that contains the properties of a Base32 encoded string loses password strength because their password is being handled in a case-insensitive manner.

Code:
//var %i 2 | while (%i isnum 1-25) { var %a 5 * %i | var %b $regsubex($str(x,%a),/x/g,$r(!,~)) | echo -a $hotp(%b,123) $hotp($lower($encode(%b,a)),123) $hotp($upper($encode(%b,a)),123) - %a %b | inc %i }
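For reference, the detection heuristic described above can be modeled in Python (a sketch of the observed behavior; the function name and the exact rule ordering are my assumptions, not mIRC source):

```python
import re

HEX_RE = re.compile(r"^[0-9A-Fa-f]+$")
B32_RE = re.compile(r"^[A-Za-z2-7]+$")   # Base32 alphabet, case-insensitive, no '='

def classify_key(key: str) -> str:
    # Hex strings of exactly 40/64/128 chars take priority
    if len(key) in (40, 64, 128) and HEX_RE.match(key):
        return "hex"
    # Otherwise any multiple-of-8 length >= 16 in the Base32 alphabet
    if len(key) >= 16 and len(key) % 8 == 0 and B32_RE.match(key):
        return "base32"
    return "literal"
```

Under this model, an 18-character Base32-alphabet key such as GEZDGNBVGY3TQOJQGE stays literal, while padding it out to a multiple of 8 with more Base32 characters flips it to base32 handling.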



Can you give an example of how Google Authenticator format differs from the other 3 described methods of presenting the key? Other than hex strings of lengths 40/64/128, or strings of length 16+ which could be mistaken for a Base32 encoded string that doesn't contain the Base32 padding character, all other key strings I've tested appear to handle the key as the literal text string.

#263029 - 15/05/18 07:06 PM Re: $hotp() base-32 decoding [Re: maroon]
Khaled Offline


Planetary brain

Registered: 04/12/02
Posts: 4107
Loc: London, UK
Thanks for your bug report. This sounds like the same issue as your previous post. Since the identifier does not provide a parameter that allows you to specify the actual format of the key, mIRC guesses at what the parameter is based on the number of characters and whether they are in a hex/base32 format. So if you provide a parameter that overlaps in some way and is ambiguous, what you describe will happen.

#263032 - 15/05/18 08:30 PM Re: $hotp() base-32 decoding [Re: Khaled]
maroon Offline
Hoopy frood

Registered: 12/01/04
Posts: 766
The /help says it decodes only 3 specific lengths of hex strings, and it does that. The /help says it decodes only 3 specific lengths of base32 strings, but instead it's looking at all other lengths greater than 8 which are multiples of 8 without containing the '=' padding character, to see if they match the pattern of base32 encoding.

Or is this decoding of all these additional base32 lengths what /help means by 'Google Authenticator' format? The /help mentions that as a 4th way of handling keys, but I couldn't find a 4th behavior beyond the hex decoding (with 0x00's removed and the result UTF-8 encoded), the base32 decoding, or literal text.

#263034 - 16/05/18 09:49 AM Re: $hotp() base-32 decoding [Re: maroon]
Khaled Offline


Planetary brain

Registered: 04/12/02
Posts: 4107
Loc: London, UK
Quote:
The /help says it decodes only 3 specific lengths of base32 strings, but instead it's looking at all other lengths greater than 8 which are multiples of 8 without containing the '=' padding character, to see if they match the pattern of base32 encoding.

That is correct. The comments in my code state that it should check for >=16 and multiples of 8 instead of just 16/24/32. I cannot remember why as I implemented this feature three years ago and researched it at that time. You will need to research this, and the Google Authenticator format, yourself I'm afraid.

#263037 - 17/05/18 04:14 AM Re: $hotp() base-32 decoding [Re: Khaled]
maroon Offline
Hoopy frood

Registered: 12/01/04
Posts: 766
To summarize the results of my 'research': the /help references to "base32 format of 16/24/32 chars" and "lower case with spaces" both appear to refer to Google Authenticator format, but $hotp implements them both incorrectly. The lengths should be 16/26/32 Base32 characters, either as a continuous string or separated by spaces into groups of 4 for easier readability. $hotp/$totp are also measuring those lengths before removing spaces, but should instead measure the length after removing spaces.

The specified lengths of hex strings appear to be designed to support applications that use binary hash digests of lengths 20/32/64 for sha1/sha256/sha512. Any such application would have no reason to strip 0x00's and UTF-8 encode the remaining bytes as if they're text.

All issues affect $hotp and $totp equally, as $totp is the same except that its count parameter is derived from $ctime instead of being a purely sequential value.
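For comparison, RFC 4226's HOTP over an already-binary key can be sketched in Python (this is the standard algorithm, not a model of mIRC's key-parsing layer):

```python
import hashlib
import hmac
import struct

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                  # dynamic truncation offset
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

print(hotp(b"12345678901234567890", 0))  # 755224, RFC 4226's first test vector
```

TOTP is this same function with the counter replaced by floor(unixtime / 30).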

- -

The closest I can find to documentation of how the $hotp identifier parses its key parameter is a quote on the Wikipedia page for "Google Authenticator":

Quote:

"The service provider generates an 80-bit secret key for each user (whereas RFC 4226 §4 requires 128 bits and recommends 160 bits).[39] This is provided as a 16, 26 or 32 character base32 string or as a QR code."


The /help says "base32 format of 16/24/32 chars"; the 24 appears to be an attempt to support Google Auth, but the wrong number is printed and it's also implemented incorrectly in the actual string handling. At first I assumed the 26 was a typo, but the 3 bit-lengths referenced in the quote are 80, 128, and 160 bits. Excluding the '=' padding, 16, 26, and 32 are the Base32 lengths which encode binary keys of 80/128/160 bits:

Code:
//var %i 1 , %a 80 128 160 | while (%i isnum 1-3) { echo -a $len($remove($encode($str(X,$calc($gettok(%a,%i,32) /8)),a),=)) | inc %i }



Instead, $hotp only decodes base32 strings whose length is an exact multiple of 8 greater than 8, without allowing any '=' padding. This means $hotp does not support 128-bit keys encoded as 26 characters. It appears Google Auth doesn't pad its 26-character strings with ='s.
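Python's base64 module illustrates the 26-character case: a 128-bit key encodes to 26 Base32 characters plus 6 '=' of padding, and 26 is not a multiple of 8, so it would fall through the check described above:

```python
import base64

key_128 = bytes(16)                        # any 128-bit key
b32 = base64.b32encode(key_128).decode()   # 26 chars plus "======" padding
unpadded = b32.rstrip("=")
print(len(b32), len(unpadded), len(unpadded) % 8)  # 32 26 2
```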

The closest I could find where Google Authenticator is associated with base32 encoding to every multiple of 8 is the source code at: https://github.com/google/google-authent...authenticator.c

... where the line:

Quote:

#define SECRET_BITS 80 // Must be divisible by eight


... has the key length hard-coded as 80 bits, but the comment implies it can be edited to be any value that's a multiple of 8. But this is a multiple of 8 bits for the binary key, not the byte length of the base32 encoding.

Each group of 8 base32 characters can encode 40 bits, so having the base32 strings be multiples of 8 without padding assumes the binary key is always going to be a multiple of 40, which so far is true only for 80 and 160 bit keys.
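The arithmetic can be checked directly: the unpadded Base32 length is ceil(bits/5), and it lands on a multiple of 8 only when the key size is a multiple of 40 bits.

```python
import math

# Unpadded base32 length for each key size mentioned in the Wikipedia quote
lengths = {bits: math.ceil(bits / 5) for bits in (80, 128, 160)}
print(lengths)                                              # {80: 16, 128: 26, 160: 32}
print([bits for bits, n in lengths.items() if n % 8 == 0])  # only [80, 160]
```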

The references to spaces and lower-case being part of Google Authenticator are at places like:

https://soeithelp.stanford.edu/hc/en-us/...d-or-iPod-Touch
https://garbagecollected.org/2014/09/14/how-google-authenticator-works/

The 1st link describes the key as a 26-character base32 string which can be typed in upper or lower case, with or without spaces. The 2nd link shows a 160-bit key presented as 8 groups of XXXX place-holders separated by spaces. I can't find a reference to how Google Auth handles the fact that 26 isn't a multiple of 4, but I'm guessing there's either a couple of groups of 5 or a final group of 2 characters. The keys are presented in small groups of digits to be user friendly.

I can't find reference to how rigid Google Auth is when someone enters their code using spaces, but I suspect that it allows as many spaces as the user wants, then simply removes the spaces to check if the remaining string is base32 of the appropriate length. But that can be too 'grabby' in this context where the identifier is trying to discern between literal plaintext and base32 encoded strings.

--

In addition to not supporting 128-bit keys encoded as 26 base32 digits, $hotp incorrectly supports the 'lower-case-and-spaces' method, because it measures the 16/24/32 (instead of 16/26/32) length while the spaces are still included, instead of verifying those lengths after the spaces are deleted. The base32 encoding of 12345678901 is the 18-character string GEZDGNBVGY3TQOJQGE. $hotp uses this string as a literal text key because its length isn't a multiple of 8. But when padded with 6 spaces to bring the length to 24, $hotp deletes the spaces, then base32-decodes the remaining 18-char string into the underlying binary contents, which in this example happen to also be bytes in the printable ASCII range. In this example, 3 different strings return the same password:

Code:
//var %a G E Z D G N BVGY3TQOJQGE         | echo -a $len(%a) $hotp(%a,1,sha1,9) $hotp($lower(%a),1,sha1,9) $hotp(12345678901,1,sha1,9)
//var %a G E Z D G N B VG Y3 T Q O JQ G E | echo -a $len(%a) $hotp(%a,1,sha1,9) $hotp($lower(%a),1,sha1,9) $hotp(12345678901,1,sha1,9)



As far as $hotp's checking is concerned, it doesn't matter where the spaces are inserted or how many non-space characters are in the string, as long as the spaces+alphanumeric string has a total length of 16/24/32. In the above example, the 2nd command returns identical passwords because the insertion of 8 additional spaces brought the space-padded length to 32, causing the space-padded string to again be handled as base32.
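A Python model of the space-handling bug as described above (my reading of the behavior, not mIRC source):

```python
import base64

def space_rule_key(key: str) -> bytes:
    # Bug being modeled: the 16/24/32 length check runs BEFORE spaces are stripped
    if len(key) in (16, 24, 32):
        stripped = key.replace(" ", "").upper()
        padded = stripped + "=" * (-len(stripped) % 8)
        try:
            return base64.b32decode(padded)   # decoded binary key
        except Exception:
            pass
    return key.encode("utf-8")                # otherwise a literal text key

# 18 base32 chars + 6 spaces = 24, so the spaced form gets decoded...
print(space_rule_key("G E Z D G N BVGY3TQOJQGE"))  # b'12345678901'
# ...while the same 18 chars without spaces stay a literal key
print(space_rule_key("GEZDGNBVGY3TQOJQGE"))
```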

Even if $hotp is fixed to evaluate the correct lengths after the spaces are removed, I'm not sure it's desirable to support space padding of alphanumeric strings in non 'official' groups of characters, or even supporting mixed case. This example shows the password is the same in upper/lower/mixed case due to the actual password being the base32 decoding of the 21 non-spaces into the underlying non-utf8-encoded binary string inside:

Code:
//var %a CuRiOsItY KiLlEd ThE CaT | echo -a $hotp($upper(%a),1) $hotp($lower(%a),1)  $hotp(%a,1)


Adding 8 additional non-consecutive spaces results in the key length increasing from 24 to 32, causing the same 21 non-space characters to be base32-decoded as the same key.

--

I haven't been able to track down any Google Auth references related to hex lengths of 40/64/128 chars needing to be decoded before being used as the key. Every reference of Google Auth keys being encoded has them being encoded as base32 not hex. But I can't imagine $hotp's current handling of hex strings matching any test vectors containing hex encoding of ASCII 00 or 128-255.

The 40/64/128 lengths seem obviously intended to be the hex-text display of 160/256/512-bit key lengths, where an application wants to decode the hex-text digests for sha1/sha256/sha512 into binary strings of length 20/32/64. This seems like the kind of thing OpenSSL would do, but I've been finding references to it encoding things as Mime or Base32 and not so much as Hex. It would not be desirable for an application to assume that the underlying contents need to be UTF8-encoded after the 0x00's are stripped. A pair of hash digests which are identical except for the location of their 0x00 bytes would generate matching keys if all 0x00's were stripped.

Even when the decoded hex contents appear to already be UTF-8 encoded, the string is encoded again, as shown by these matching passwords, where the hex-encoded key already contains the UTF-8 encoding of $chr(10004):

Code:
//echo -a $hotp($utfencode($chr(10004)),1,sha1,9) $hotp($str(00,17) $+ E29C94,1,sha1,9)


--

This shows what I was trying to say in an earlier post, that Base32 and Hex16 encoded strings are not being handled the same way. The underlying decoded contents of Base32 strings are being used as their un-modified binary contents. Even though the above hex key is already a UTF8-encoded string, it is re-encoded again, so the "E2 9C 94" hex bytes are each encoded so the binary key for both usages becomes "C3 A2 C2 9C C2 94".
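The double-encoding can be reproduced in Python by treating each decoded byte as its own Unicode code point:

```python
# E2 9C 94 is already the UTF-8 encoding of U+2714
decoded = bytes.fromhex("e29c94")
# Re-encoding each byte as a code point yields the doubled form
reencoded = "".join(chr(b) for b in decoded).encode("utf-8")
print(reencoded.hex())  # c3a2c29cc294
```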

When $hotp recognizes a key as being a Base32 string, the key used is the binary contents that's 5/8ths as long. It does not have 0x00's stripped from the binary key nor does it have the remaining bytes UTF8-encoded. On the other hand, the underlying decoded contents of Hex16 strings are being UTF8-encoded after having 0x00's stripped.

In this example, the base32-decoded binary string is not altered, causing it to match both the literal text key and the hex-encoded key, where the hex digits are handled as if they encode non-UTF8-encoded text rather than a binary hash digest. The 2nd and 3rd keys end up identical because UTF-8 encoding those 2 different strings produces the same bytes, giving the same password output for all 3 strings:

Code:
//bset &var 1 $str(195 169 $chr(32),10) | noop $encode(&var,ba) | var %a $bvar(&var,1-).text | echo -a %a $hotp(%a,1) / $hotp($str(é,10),1) / $hotp($str(00,10) $+ $str(e9,10),1)



--

To fix these issues, it seems like the hierarchy of rules for handling the key parameter needs to change. Even though that will break backwards compatibility, it would restore support for 128-bit keys encoded by Google Authenticator into 26-character base-32 strings. It would also restore compatibility with applications that expect hex digests of length 40/64/128 to be binary keys of 20/32/64 bytes. It should also avoid false-matches of language passphrases containing spaces and no punctuation simply because their space-padded lengths happened to be 16/24/32.

1st rule:
Old: If key is a length 40/64/128 case-insensitive hex string, it is decoded to become a text string that is then UTF-8 encoded, and any 0x00's stripped. If the string is entirely 0x00's, the key is $null.
New: These hex strings should instead be decoded to binary keys of length 20/32/64 the same way base32 strings are being decoded to their binary contents.

2nd rule:
Old: If key length is 16 or greater and a multiple of 8 (except for lengths 40/64/128 containing only 0-9a-f), and if it's a valid case-insensitive Base32 encoded string without spaces or '=' padding, the key is the binary decoded contents whose length is 5/8ths the length of the Base32 string, with no 0x00's stripped and no UTF-8 encoding as if the contents are text.
New: no change

3rd rule:
Old: If the key length is 16/24/32 and contains only spaces or case-insensitive Base-32 characters, the spaces are stripped and the remaining Base-32 characters of arbitrary length are decoded into a binary key.
New: The target key lengths should instead be 16/26/32, and should be compared against the string length only after the spaces are removed. The spaces should be used only to group the characters into the same pattern presented by Google Authenticator, such as groups of 4 non-spaces, and valid strings should probably not include mixed-case letters.

4th rule:
Old: Any remaining strings not matching the 1st 3 patterns are considered literal text keys, and the input is assumed to already be UTF-8 encoded where necessary.
New: no change

#263043 - 18/05/18 11:22 AM Re: $hotp() base-32 decoding [Re: maroon]
Khaled Offline


Planetary brain

Registered: 04/12/02
Posts: 4107
Loc: London, UK
Quote:
1st rule:
Old: If key is a length 40/64/128 case-insensitive hex string, it is decoded to become a text string that is then UTF-8 encoded, and any 0x00's stripped. If the string is entirely 0x00's, the key is $null.
New: These hex strings should instead be decoded to binary keys of length 20/32/64 the same way base32 strings are being decoded to their binary contents.

When I originally researched this topic, I based the design of $hotp()/$totp() on many of the real-world C/C++ examples, discussions, and implementations I found. So, for example, UTF-8 encoding all of the formats was something that was common to the implementations I saw, so that is what mIRC does. UTF-8 encoding obviously breaks keys that include null bytes. So the question is, are null bytes actually allowed?

Quote:
3rd rule:
Old: If the key length is 16/24/32 and contains only spaces or case-insensitive Base-32 characters, the spaces are stripped and the remaining Base-32 characters of arbitrary length are decoded into a binary key.
New: The target key lengths should instead be 16/26/32, and should be compared against the string length only after the spaces are removed. The spaces should be used only to group the characters into the same pattern presented by Google Authenticator, such as groups of 4 non-spaces, and valid strings should probably not include mixed-case letters.

Puzzling. mIRC's implementation uses 16/24/32 because that is what other implementations were using.

As it took a lot of time to research, implement, and validate these identifiers originally, I will need to go through this process again. I have added this to my to-do list.

#263051 - 20/05/18 04:29 AM Re: $hotp() base-32 decoding [Re: Khaled]
maroon Offline
Hoopy frood

Registered: 12/01/04
Posts: 766
As mentioned in section (2) below, a question related to the issue of UTF8-encoding hash strings: is this also being done by $encode's Blowfish encryption when not using the 'l' switch? For $encode to be compatible with OpenSSL, I would expect $encode(33333333,ae,gryffczmj) to generate output matching:

openssl.exe enc -bf-ecb -in 33333333.txt -out test.out -nosalt -p -pass pass:gryffczmj

but I can't get them both to produce the same output, so unless I'm doing something wrong, I suspect it might be caused by $encode UTF-8 encoding hash outputs?

Quote:

UTF-8 encoding all of the formats was something that was common to the implementations I saw, so that is what mIRC does. UTF-8 encoding obviously breaks keys that include null bytes. So the question is, are null bytes actually allowed?


If by 'key' you mean UTF-8 encoding the command-line parameter being used as input to $hmac or $totp or $hotp, or to a hash function to generate the hash being used to derive the actual encryption key, then yes I can see that being done, as it's needed to ensure input typed by different users would generate matching keys.

But once the binary hash is created, I can't see it having 0x00's stripped and the remaining bytes being individually UTF-8 encoded. UTF-8 encoding of a hash digest isn't needed to ensure compatibility between people using different languages. But also, if hash digest output were translated to utf-8 text, it would greatly weaken the encryption in most cases.

For example, AES-128 uses a 128-bit key, so if AES-128 used the first 128 bits of a hash digest as the key, on average 50% of the bytes would be in the 128-255 range. If the hash digest were $utfencoded, each of those bytes would be replaced by 2 bytes. 0x00's would appear on average once every 16 128-bit hash strings. The combined effect is that a UTF-8 text string translated from a 16-byte 128-bit digest with a 50/50 mix of 128-255/0-127 bytes would average around 24 bytes. AES-128 would chop this UTF-8 string at 16 bytes, so the first 128 bits of that 23-25 byte string would contain only about 2/3rds of 128 bits: roughly 85 bits. And if the first 8 bytes of the digest were all in the 0x80-0xff range, UTF-8 encoding would leave only 64 bits of the digest in the 128-bit key.
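The expansion is easy to demonstrate in Python with a hypothetical worst-case digest made only of high bytes:

```python
# Hypothetical 16-byte digest where every byte is >= 0x80
digest = bytes(range(0xF0, 0x100))
# Strip 0x00's (none here) and UTF-8 encode each remaining byte
as_text = "".join(chr(b) for b in digest if b != 0).encode("utf-8")
print(len(as_text))  # 32: every byte doubled, so truncating to 16 bytes
                     # keeps only 8 of the original digest bytes
```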

Also, because of the way Blowfish handles the input key, a hash output of all 0x00's handled this way would crash the program. Blowfish accepts a variable-length input of 1-56 bytes and uses a key schedule which expands the input to 72 bytes. If the input were length 0, because it was a hash digest consisting entirely of 0x00's, it's impossible to expand 0 bytes to 72 bytes.

--

Additional evidence I've found of using hash digests only as non-utf-8 encoded binary strings:

(1) HMAC itself. I can't find test vectors involving literal keys containing non-text, but I imagine that applications accepting text input to HMAC would be UTF-8 encoding that text. If using the default sha1, that hash has a 512-bit block size, so if the literal input key string is 512 bits (64 bytes) or shorter, that literal string is used as the 'secret key', padded with enough 0x00 bytes to fill 512 bits. However, if the input is longer than 512 bits, the input string is hashed via sha1, and the secret key is instead replaced with the 20 binary bytes of the sha1 digest, padded with 44 0x00 bytes to fill the entire 512-bit block. The sha1 digest is not UTF-8 encoded here, nor are 0x00's deleted, when the derivative inner/outer keys are created by XOR'ing the padded key with 0x36 and 0x5c. When the inner hash digest is appended to the outer key for the outer hash, it's not UTF-8 encoded either. And when that HMAC result is used by $totp or $hotp, it's not UTF-8 encoded while calculating the 6-digit numeric password.
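Python's hmac module follows RFC 2104 here: a key longer than the hash's block size is replaced by its raw binary digest, with no UTF-8 step in between:

```python
import hashlib
import hmac

long_key = b"K" * 65  # longer than SHA-1's 64-byte block size
direct = hmac.new(long_key, b"message", hashlib.sha1).hexdigest()
# Pre-hashing the key ourselves produces the same MAC, showing the
# 20 raw digest bytes are used as-is (then zero-padded to the block)
prehashed = hmac.new(hashlib.sha1(long_key).digest(), b"message",
                     hashlib.sha1).hexdigest()
print(direct == prehashed)  # True
```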

--

(2) I haven't yet matched mIRC's Blowfish against another application except matching its ECB mode when using $encode's 'l' switch for literal keys. I haven't been able to match $encode vs OpenSSL for any test vectors containing 0x00's or 0x80-0xff bytes because mIRC doesn't allow &binvar as a literal key. However I have identified OpenSSL using a hash digest as a key, where it uses the binary bytes of the hash without discarding 0x00's or utf-8 encoding the remainder as if it's text. That's the reason I had made a request about using binary keys for $encode's Blowfish. It was to allow using actual binary hashes as the key, and not having them utf-8 encoded as if text.
https://forums.mirc.com/ubbthreads.php/topics/261893/New_$encode_switches

To re-create the match between OpenSSL and my assembler program both using 0x7f-0xff bytes without UTF-8 encoding them:

Code:
//write -n test.dat 33333333

Then at the command prompt:

Code:
openssl.exe enc -bf-ecb -in test.dat -out test.out -nosalt -p -pass pass:gryffczmj


This gives the key display: 10A6FD97002DF6CC087802129F8CD064

The 5th byte is 0x00 and 3 of the first 4 bytes are in the 128-255 range. This matches the first 128 bits of: //echo -a $upper($left($sha256(gryffczmj),32))

The output file's first 8 bytes are: 0xAD 0x1E 0xD9 0x2E 0xC7 0x43 0xEA 0x40

I match this output in my assembler program by using the 16 bytes "10 A6 FD 97 00 2D F6 CC 08 78 02 12 9F 8C D0 64" as the literal key, without removing the 0x00 nor utf-8 encoding the remainder - showing that's how OpenSSL handles hash output.

(2b)

When not using the 'l' switch, are $encode's 'e' and 'c' switches UTF-8 encoding the hash output and stripping 0x00's? I can't figure out how to get output from $encode compatible with the above. For example, this doesn't match:

Code:
//bset -t &v 1 $str(3,16) | noop $encode(&v,bae,gryffczmj) | noop $decode(&v,ba) | echo -a $regsubex($bvar(&v,1-16),/(\d+)/g,$base(\t,10,16,2))

--

(3) I finally found Google Authenticator associated with a hex16 encoding of its key, and the hex string is not UTF-8 encoded.

https://lists.open.com.au/pipermail/radiator/2011-June/017420.html

It associates the hex and base32 strings as non-utf-8 equivalents, even when they contain ASCII 128-255 or 0x00's. It shows these as equivalents:

Code:
3132333435363738393031323334353637383930   GEZD GNBV GY3T QOJQ GEZD GNBV GY3T QOJQ
d8f828609e0f4056f852e4c9d75605099f483e20   3D4C QYE6 B5AF N6CS 4TE5 OVQF BGPU QPRA
b906daef6d002ec6cc89106df25f8268ce28f95e   XEDN V33N AAXM NTEJ CBW7 EX4C NDHC R6K6
0000000000000000000000000000000000000000   AAAA AAAA AAAA AAAA AAAA AAAA AAAA AAAA


These are also 4 examples of the 160-bit Google Auth key encoded as Base32 with spaces to make the key easier for the user to type. This translates the 3rd of the above Base32 strings into its hex equivalent, with spaces for easier reading:

Code:
//bset -t &v 1 $remove(XEDN V33N AAXM NTEJ CBW7 EX4C NDHC R6K6,$chr(32)) | noop $decode(&v,ba) | echo -a $regsubex($bvar(&v,1-),/(\d+)/g,$base(\t,10,16,2))


These 4 Base32 and Base16 strings decode to the same binary bytes. When $hotp and $totp are given one of these 32-character Base32 strings without spaces, they correctly decode it into the 160-bit binary key. When divided by spaces as above, it's incorrectly handled as a 39-char literal text key because the length including spaces isn't a multiple of 8. Adding 1 additional space between any of the groups of 4 would make the length 40, and it would decode correctly because the length-with-spaces is a multiple of 8. When given the length-40 hex string equivalent, it's also decoded to the same 20 binary bytes as the Base32 key, but it's then further processed: the 0x00's are removed, then the remaining bytes are individually UTF-8 encoded and used as if a text string.
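The first equivalence row (which happens to be plain ASCII) can be verified with Python's base64 module:

```python
import base64

hex_form = "3132333435363738393031323334353637383930"
b32_form = "GEZD GNBV GY3T QOJQ GEZD GNBV GY3T QOJQ"
raw = base64.b32decode(b32_form.replace(" ", ""))
print(raw == bytes.fromhex(hex_form), raw)  # True b'12345678901234567890'
```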
