Invalid key lengths in $encode(data,<e[l]|cl>,key)

I'm assuming the major goal for $encode is to be compatible with OpenSSL, so 'e' and 'cl' should handle code points 128+ or too-long key parameters the same way OpenSSL does (or should). $encode violates the Blowfish cipher's design requirements in hopefully rare cases:

* when using 'e' with $len(key) or $len($utfencode(key)) greater than 56. There's no issue with shorter keys containing codepoints 128+.

* Using 'el' or 'cl' where the key string contains characters in the 128+ range which are UTF-8 encoded into multiple bytes each. It rejects $len() strings shorter than 56 having the correct $utfencode() length of 56 bytes, but accepts $len() 56 strings incorrectly having more than 56 UTF-8 bytes.

If there's a decision to maintain backwards compatibility for people whose existing usage relies on too-long keys, perhaps add a status window warning advising them to change this weak key.

-

The Blowfish design of 16-round Blowfish uses an expanded password string of 72 bytes, which it fills by allowing the input string to have no more than 56 bytes, then replicates that string as many times as needed until it contains 72 bytes. This requires at least 16 bytes of the 72-byte expanded key be repeats, making it difficult for an attack where someone creates completely different 72-byte expanded passwords which have matching 'sub-keys' within the encryption cipher.
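
As a rough illustration of the expansion described above, here is a minimal Python sketch (Python rather than mIRC script so it can be run outside mIRC). It only models the 72-byte repeating pattern, not the full Blowfish key schedule, and the names `expand_key`, `P_ARRAY_BYTES` and `MAX_KEY_BYTES` are made up for this example.

```python
P_ARRAY_BYTES = 72   # size of the expanded pattern described above
MAX_KEY_BYTES = 56   # 448-bit limit on the input key

def expand_key(key: bytes) -> bytes:
    """Repeat the key cyclically until it fills the 72-byte pattern."""
    if not 1 <= len(key) <= MAX_KEY_BYTES:
        raise ValueError("Blowfish key must be 1-56 bytes")
    return bytes(key[i % len(key)] for i in range(P_ARRAY_BYTES))

key = bytes(range(1, 57))          # a 56-byte key, the longest allowed
expanded = expand_key(key)
print(len(expanded))               # 72
print(expanded[56:] == key[:16])   # True: the last 16 bytes repeat the start
```

With the longest legal key, the final 16 bytes of the pattern are still copies of the first 16, which is the repetition the design relies on.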

When 'e' 'el' or 'cl' sends literal keys to Blowfish, they're using $utfencode(key) as the replicated byte pattern, as they should. But they're checking $len(key) instead of $len($utfencode(key)) to determine whether the string has the correct number of bytes.

Assuming mIRC is otherwise compatible with OpenSSL, its current behavior is equivalent to "bset -t &key 1 key-parameter" without the 'a' switch. The only changes needed to fix the length issue are:

* using $bvar(&key,1-56) instead of $bvar(&key,1-72) as the byte pattern expanded to 72 bytes. This should apply only to 'e' or 'cl' since hashing of long non-l 'c' strings doesn't violate Blowfish's design because they use 'hash(key+other)' instead of 'key'.
* the 'l' switch needs to check $bvar(&key,0) instead of $len(key-parameter) when verifying the length is 56.
* decide for compatibility purposes how to handle currently permitted illegal passwords whose $len($utfencode(key)) lengths are greater than 56.
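
The difference between checking characters and checking bytes can be sketched in Python. This is only an illustration of the proposed checks, not mIRC's code; `literal_key_bytes` and `is_exact_56_byte_key` are hypothetical helper names, and truncation is just one of the possible policies for too-long keys.

```python
def literal_key_bytes(key: str, limit: int = 56) -> bytes:
    """UTF-8 encode the key and keep at most `limit` bytes (one possible policy)."""
    return key.encode("utf-8")[:limit]

def is_exact_56_byte_key(key: str) -> bool:
    """Length check for the 'l' switch, done on bytes instead of characters."""
    return len(key.encode("utf-8")) == 56

bad = "a" * 55 + "\u00e9"    # 56 characters, 57 UTF-8 bytes
good = "a" * 54 + "\u00e9"   # 55 characters, 56 UTF-8 bytes

print(len(bad), len(bad.encode("utf-8")))      # 56 57
print(len(good), len(good.encode("utf-8")))    # 55 56
print(is_exact_56_byte_key(bad))               # False
print(is_exact_56_byte_key(good))              # True
print(literal_key_bytes(bad)[-1] == 0xC3)      # True: the chop splits the 2-byte character
```

The last line shows a side effect of chopping by bytes: the cut can land in the middle of a multi-byte character.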

The part below is just demonstrating how I identified that too-long UTF-8 encoded byte strings were being used.

--

Virtually all the test vectors for Blowfish have keys containing bytes which get expanded by UTF-8 into byte pairs, so those test vectors can't be replicated in mIRC without using a binary variable as the key, but that's a separate feature request. Here's an early Blowfish article which shows a test vector using only 7-bit text for both plaintext and key:

Quote:

http://www.drdobbs.com/the-blowfish-encryption-algorithm-one-ye/184409634

************** TEST VECTORS ***********************************
This is a test vector.
Plaintext is "BLOWFISH".
The key is "abcdefghijklmnopqrstuvwxyz".

#define PL 0x424c4f57l
#define PR 0x46495348l
#define CL 0x324ed0fel
#define CR 0xf413a203l
static char keey[]="abcdefghijklmnopqrstuvwxyz";

1) After loading the following code, using '/BFtest A 26' recreates the ciphertext defined in the article, where the plaintext is the 8-byte string BLOWFISH and the key is the 26-byte alphabet. The alias converts the ciphertext display of $bvar(&var,1-) from the 1-3 digit decimals to 2-digit hex, making it easier to match the vector listed in hex.

Code:

alias bvar2hex { var %h $1- | var %i $numtok(%h,32) | while (%i) { var %h $puttok(%h,$base($gettok(%h,%i,32),10,16,2),%i,32) | dec %i } | return %h }
alias BFtest {
  echo -a ---- /BFtest <a|b|c> [Length] | var %switches bm $+ $iif($1 isin c,cri,e) $+ $iif($1 isin bc,l) | bset -t &var 1 BLOWFISH123 | ; the 123 avoids length-8-binary bug
  var %IVchar 7 | var %ChopLength $iif(($2 isnum 1-) && (l !isincs %switches),$int($2),56)
  if ($1 isin c) { var %i 8 | while (%i) { bset &var %i $xor($bvar(&var,%i),%IVchar) | dec %i } }
  if ($1 isin a)  var %key $str(abcdefghijklmnopqrstuvwxyz,3)
  if ($1 isin bc) var %key $str(abc $+ $chr(233) $+ $chr(233),19) | var %key $left(%key,%ChopLength)
  echo switches: %switches keylen: $len(%key) utflen: $len($utfencode(%key)) key: %key plaintext hex/text: $bvar2hex($bvar(&var,1-8)) $bvar(&var,1-8).text
  if ($1 isin c) noop $encode(&var,%switches,%key,$str($chr(%IVchar),8))
  else           noop $encode(&var,%switches,%key)
  noop                $decode(&var,bm) | var %range 1-8 | if ($1 isin c) var %range 17-24
  echo $chr(3) $+ 0,4ciphertext range %range $+ : $bvar2hex($bvar(&var,%range)) / $bvar(&var,1-).text
  if ($1 isin abc) { bset -t &binkey 1 %key | echo pattern replicated to fill 72-byte expanded pass: $bvar(&binkey,1-72) $iif($bvar(&binkey,0) isnum 57-,$chr(22) bytes at the end that should not be used: $bvar(&binkey,57-72)) }
}

2) Using '/BFtest A 57' shows the ciphertext output where the key is the alphabet repeated until it's an invalid key length of 57. It expands to the 72-byte expanded password in a way that can't be created from the correct method of expanding a length 1-56 string to 72 bytes. I was able to duplicate mIRC's encoded output only by altering a Blowfish utility to allow the max input key length to be greater than 56.

This issue should only affect switch 'e' and 'cl', because 'c' without 'l' hashes the key string to fewer than 56 bytes, regardless of the length of the key parameter.

3) As I understand OpenSSL's stated behavior, it converts keyboard input to UTF-8 bytes, and it looks like mIRC does that with the 'key' parameter when using the 'e' switch, and 'cl' does the same thing. However, if this string contains any characters which UTF-8 encodes into multiple bytes each:

* 'e' uses as many as 72 UTF-8 encoding bytes in the pattern that's replicating until it's 72 bytes. It should use a pattern no longer than 56 UTF-8 bytes regardless of $len(key).
* 'el' and 'cl' incorrectly accept a key parameter whose 56-character string has a UTF-8 encoded length longer than 56 bytes.
* 'el' and 'cl' incorrectly reject 56-byte UTF-8 encoded strings because the non-encoded length is less than 56.

'/BFtest B' uses the 'el' switches to require length of exactly 56. This key is a 5-character string that's repeated 11 times plus an ending 'a' for a key $len() of 5*11+1=56. However this new key contains characters which UTF-8 encodes to more than 1 byte, so %key has a $utfencode() length of 7*11+1=78. The only way to match $encode's output is when the 72-byte expanded password does not repeat any portion of the key, and instead becomes the first 72 out of those 78 bytes, a string which cannot be obtained by doing as it should, replicating no more than the first 56 UTF-8 encoded bytes.
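
The length arithmetic above can be checked with a short Python sketch (Python is used only so the numbers can be verified outside mIRC). It rebuilds the same key string and compares character and byte counts; the variable names are illustrative.

```python
unit = "abc" + "\u00e9\u00e9"        # 5 characters, 7 UTF-8 bytes
key = (unit * 19)[:56]               # same construction as /BFtest B
utf8 = key.encode("utf-8")

print(len(key))                      # 56  (5*11 + 1)
print(len(utf8))                     # 78  (7*11 + 1)

first72 = utf8[:72]                  # pattern if nothing is chopped
chopped = bytes(utf8[i % 56] for i in range(72))   # first 56 bytes, repeated
print(first72 == chopped)            # True
```

One thing this sketch suggests: because the 7-byte unit divides 56 evenly, the first 72 of the 78 bytes coincide with the first 56 bytes repeated, so this particular key may not distinguish the two expansions. A key whose repeating unit does not divide 56 would be needed for that.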

4) '/BFtest C' has 'cl' replicating the 'el' behavior by using the same password, but XOR's the plaintext by the same ASCII value used to fill the fixed IV. This allows the 'cl' CBC feedback to encrypt only the 1st 8-byte block the same way as done by 'el'. In CBC feedback, the 1st plaintext block is XOR'ed against the IV before it's encrypted, so this XOR causes the IV in 'cl' to XOR the 1st plaintext block back to the same 'BLOWFISH' string seen by 'el'.
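
The XOR cancellation can be shown in a few lines of Python. This sketch covers only the XOR step that CBC applies before encrypting the first block, not the cipher itself; `xor_bytes` is a helper written for this example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"BLOWFISH"
iv = bytes([7]) * 8                        # %IVchar 7, repeated 8 times

masked = xor_bytes(plaintext, iv)          # what /BFtest C hands to 'cl'
first_block = xor_bytes(masked, iv)        # what CBC feeds to the cipher

print(first_block)                         # b'BLOWFISH'
```

Since the cipher sees the same 8 bytes in both modes, the first ciphertext block of 'cl' should match the 'el' output when the key handling is the same.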

Changing %IVchar from 7 to a number in the 128-255 range doesn't alter the encrypted ciphertext, so the 4th parameter isn't UTF-8 encoded for the 'i' switch and probably not for 's' either. I'm pretty sure this is compatible behavior with OpenSSL, as UTF-8 encoding the 's' and 'i' values would greatly reduce the possible values which could fit within 8 bytes, and could cause some to be unexpectedly chopped.

If altering this snippet for testing a different key, remember that Blowfish keys whose UTF-8 encoded length is not longer than half the allowed 56 length are equivalent, because they expand to the same 72-byte expanded keys. i.e. 'abcd' and 'abcdabcd' are the same key, as are '/BFtest a 26' and '/BFtest a 52'. Because of the current incorrect 'e' expansion of the key, 'BFtest a 72' is possible and is identical to 26 and 52 instead of being chopped to 56 and making a different expanded key.
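
These equivalences can be checked with a small Python sketch that models only the 72-byte repeating pattern (not the rest of the key schedule). The `pattern` helper is an assumption made for illustration.

```python
from itertools import cycle, islice
import string

def pattern(key: bytes) -> bytes:
    """First 72 bytes of the key repeated cyclically."""
    return bytes(islice(cycle(key), 72))

alphabet = string.ascii_lowercase.encode("ascii")
key26 = alphabet
key52 = alphabet * 2
key72 = (alphabet * 3)[:72]

print(pattern(b"abcd") == pattern(b"abcdabcd"))   # True
print(pattern(key26) == pattern(key52))           # True
print(pattern(key72) == pattern(key26))           # True: 72 bytes used unchopped
print(pattern(key72[:56]) == pattern(key26))      # False: chopped at 56 first
```

The last two lines show why the unchopped 72-byte key collides with the 26- and 52-byte keys, while the same key chopped at 56 bytes expands to a different pattern.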

Thanks for your bug report. Can you summarize the issue in one, short paragraph with a minimal script that reproduces the issue? :-)

1. Summary of issue in one or two lines.
2. Call to only one or two commands/identifiers, if possible.
3. Current output: X.
4. Expected output: Y.

That's all I need. If I cannot understand a short bug report, or cannot reproduce the issue, I will ask for more details.

2 issues related to invalid key lengths:

#1) When key is longer than 56 bytes (not necessarily same as $len() = 56), Blowfish should either reject as invalid length or chop at 56 bytes so it can repeat a minimum 16 bytes while expanding key to 72 bytes. %badkey returns the same test vector as in the link above, but should have returned the same string as %goodkey because both share the same 1st 56 bytes, and therefore should have expanded to the same 72-byte pattern.

Code:

alias Test_e {
  var %badkey $left($str(abcdefghijklmnopqrstuvwxyz,3),72) | var %goodkey $left(%badkey,56)
  bset -t &data1 1 BLOWFISH | noop $encode(&data1,bme,%badkey ) | noop $decode(&data1,bm) | echo 4 -a Wrong: $bvar(&data1,1-8)
  bset -t &data2 1 BLOWFISH | noop $encode(&data2,bme,%goodkey) | noop $decode(&data2,bm) | echo 3 -a Correct: $bvar(&data2,1-8)
}

#2) el and cl correctly give UTF-8 $utfencode strings to Blowfish, but incorrectly validate the $utfdecode length, accepting 57-72 byte keys because $len() is 56, but rejecting 56-byte keys which have a shorter $len().

I got 'el' and 'cl' to both return the same vector by having cl's XOR of lower-case plaintext and IV of eight 0x20 spaces cancel each other out.

'cl' should have either rejected %badkey as having 57 bytes or used key containing the 56 bytes of $bvar(&bad57,1-56) and should have accepted $len() 55 %goodkey as a valid key containing 56 bytes.

Correct ciphertext for key being first 56 bytes of $bvar(&bad57,1-56) is:

hex: 43 37 A2 45 17 96 A3 01
decimal: 67 55 162 69 23 150 163 1

Code:

alias Test_cl {
  var %badkey  $str(a,55) $+ $chr(233)
  var %goodkey $str(a,54) $+ $chr(233)
  bset -t &data1 1 BLOWFISH | noop $encode(&data1,bmel ,%badkey                 ) | noop $decode(&data1,bm) | echo 4 -a 57-byte key: $bvar(&data1,1-8)
  bset -t &data2 1 blowfish | noop $encode(&data2,bmcli,%badkey,$str($chr(32),8)) | noop $decode(&data2,bm) | echo 4 -a 57-byte key: $bvar(&data2,1-8)

  bset -t &bad57  1 %badkey  | echo -a Above Accepts $bvar(&bad57 ,0) bytes: $bvar(&bad57 ,1-)
  bset -t &good56 1 %goodkey | echo -a Below Rejects $bvar(&good56,0) bytes: $bvar(&good56,1-)

  echo -a Next 2 lines should return same vector, key has 56 UTF-8 bytes but 'cli' rejects as invalid parameter:
  bset -t &data1 1 BLOWFISH | noop $encode(&data1,bme  ,%goodkey                 ) | noop $decode(&data1,bm) | echo 3 -a 56-byte key: $bvar(&data1,1-8)
  bset -t &data2 1 blowfish | noop $encode(&data2,bmcli,%goodkey,$str($chr(32),8)) | noop $decode(&data2,bm) | echo 3 -a 56-byte key: $bvar(&data2,1-8)
}

I don't know how 'c' without 'l' hashes the key, but I expect that it correctly inputs the UTF-8 bytes to the hash, and returns the correct hash output. Since the hash output is shorter than 56, there will not be an issue of invalid key length there.

Thanks. I was able to reproduce both issues. I have changed the behaviour so that the key and the salt/iv are chopped at 56 and 8 UTF-8 characters respectively. The encode/decode routines treat both of these as circular buffers either way. These changes will be in the next version.

Quote:

so that the key and the salt/iv are chopped at 56 and 8 UTF-8 characters respectively

Thanks, but a clarification: I did not mention the UTF-8 issue related to Salt/IV in my latest post, because I did not think $encode handles Salt/IV wrong. The only reason one of my examples used an IV was that an IV was the only way to keep 'cl' from using a random salt, allowing me to demonstrate the 'cl' key's UTF-8 behavior in an unchanging vector.

Unless you're finding $encode is not handling ASCII 128-255 within IV in a compatible way with OpenSSL, I don't think the Salt/IV need changing. Current behavior is to not UTF-8 encode the IV/Salt into longer byte strings when ASCII 128-255 are used, so I was not finding a length issue there.

The red/blue lines show that Salt and IV are not storing ASCII 128-255 into the ciphertext header as UTF-8 byte pairs.

The blue/maroon lines show $encode doesn't use a UTF-8 encoded IV internally either, or else blue/maroon would not have matching ciphertexts. Matching ciphertexts could not happen if maroon's IV is full of UTF-8 byte pairs while the binary plaintext contains 8 identical bytes. I assume the switch "s" Salt is handled the same way as IV internally, but I couldn't verify without knowing how $encode hashes the key and Salt together.

(Bump for my feature request to permit defining key/salt/iv as binary variables using capital switches KSI.)

Code:

alias test_ivsalt {
  var %data8 abc $+ $chr(233) $+ $chr(233) $+ def | echo -a iv/salt for red/green is %data8
  bset -t &data1 1 BLOWFISH1234567 | noop $encode(&data1,bmcri,key,%data8) | noop $decode(&data1,bm) | echo 3 -a As Text: $bvar(&data1,1-).text | echo 3 -a Len $bvar(&data1,0) Bytes: $bvar(&data1,1-)
  bset -t &data2 1 BLOWFISH1234567 | noop $encode(&data2,bmcs ,key,%data8) | noop $decode(&data2,bm) | echo 4 -a As Text: $bvar(&data2,1-).text | echo 4 -a Len $bvar(&data2,0) Bytes: $bvar(&data2,1-)

  var %asc 116
  var %xor1 000 | var %iv1 $str($chr($xor(%asc,%xor1)),8)
  var %xor2 157 | var %iv2 $str($chr($xor(%asc,%xor2)),8)
  bset -c &data1 1 $str($xor(%asc,%xor1) $chr(32),8) $str(a $chr(32),7) | echo 2 -a XOR by %xor1 Data: $bvar(&data1,1-) IV: %iv1 | noop $encode(&data1,bmcri,key,%iv1) | noop $decode(&data1,bm) | echo 2 -a As Text: $bvar(&data1,1-).text | echo 2 -a Len $bvar(&data1,0) Bytes: $bvar(&data1,1-)
  bset -c &data2 1 $str($xor(%asc,%xor2) $chr(32),8) $str(a $chr(32),7) | echo 5 -a XOR by %xor2 Data: $bvar(&data2,1-) IV: %iv2 | noop $encode(&data2,bmcri,key,%iv2) | noop $decode(&data2,bm) | echo 5 -a As Text: $bvar(&data2,1-).text | echo 5 -a Len $bvar(&data2,0) Bytes: $bvar(&data2,1-)
}

Thanks, yes, the only change is that both are chopped at the appropriate length, although looking at the code this is not necessary for the salt/iv as the routines that use it only use the first 8 bytes anyway.

Thanks. Looks like both Blowfish bugs are fixed. I can't get it to use 57-or-more bytes of the 'key' parameter - whether or not it contains UTF-8 byte-pairs, and short lengths of binary variables no longer return "line too long" error.

There is a change caused during this fix. The 'l' switch now causes a literal key regardless of length, no longer enforcing length 56. This means 'l' has no effect when used with the 'e' switch, and 'cl' accepts a literal key of any length of 1-56 with CBC feedback, and doesn't hash the key. This results in output compatible with 'e' for the first 8-byte block only, as long as 'cli' uses an IV which is the XOR of its own data and the data used by 'e'.

It's up to you whether you consider this a bug. I think it's fine to leave it as-is, since the new 'l' behavior for non-56 length is returning values where it had formerly returned an error. It would just need /help to remove the "must be 56 characters" portion of the 'l' description.

This is the earlier post's link's test vector, and new beta returns the same output as before:

Code:

//bset -t &data 1 BLOWFISH | noop $encode(&data,bme, abcdefghijklmnopqrstuvwxyz ) | noop $decode(&data,bm) | echo 3 -a $bvar(&data,1-8)

If you change the above switches from 'bme' to 'bmel', you now get the same output. Previously, adding the 'l' switch generated an error because the key length wasn't 56.

'cl' now allows literal keys shorter than 56, and I can tell the 'cl' switches use literal instead of hashed keys for non-56 lengths by the fact that below returns the same test vector output as above for just the 1st 8 bytes, because the IV uses the space character which is the XOR of UPPER/lower case data:

Code:

//bset -t &data 1 blowfish | noop $encode(&data,bmcli, abcdefghijklmnopqrstuvwxyz , $str($chr(32),8) ) | noop $decode(&data,bm) | echo 4 -a $bvar(&data,1-8)

Thanks, I have removed the "must be 56 characters" portion of the 'l' description for the next version.

I see the salt/iv parameter is no longer forced to be exactly length 8, truncating to 8 if longer, and if the parameter is shorter than 8, it encrypts using a salt/iv padded with $chr(0)'s to length 8.
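
The truncate-or-pad behaviour described above could be modelled like this in Python. It is a sketch of the observed behaviour, not mIRC's implementation, and `normalize_salt` is a hypothetical name.

```python
def normalize_salt(salt: bytes, size: int = 8) -> bytes:
    """Truncate to `size` bytes, or pad on the right with 0x00 bytes."""
    return salt[:size].ljust(size, b"\x00")

print(normalize_salt(b"0123456789"))   # b'01234567'
print(normalize_salt(b"5678"))         # b'5678\x00\x00\x00\x00'
```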

However, when the salt/iv parameter is used but the contents are $null, $encode now returns $null and the binary variable as the 1st parameter is unchanged. Not sure if this is intended. Perhaps it would be better if it either returns an error for invalid parameter or encrypts using salt/iv consisting of 8 $chr(0)'s.

Code:

//var %salt $null | bset -t &data 1 TEST | echo -a result: $encode(&data,bmcs,key,%salt) | echo -a contents: $bvar(&data,1-).text

This is actually typical of most identifiers. If a parameter is $null and the identifier does not accept $null for that parameter, it returns $null as the result and does not process the identifier. In this case, $null is not allowed for the salt parameter, so the identifier returns $null without doing anything.

In this case, I can change $encode()/$decode() to allow $null for all parameters. This will be in the next beta.

Update: Actually, making that change could be a problem as it could affect backwards compatibility. I will make the above change so that it only applies to the 'ec' encryption switches.

This is a follow-up post to correct part of my earlier post, and to document Blowfish issues for anyone searching the forum later.

There are 2-3 bug-ish behaviors in the current handling of the key or salt|IV parameters.

1. It turns out that key input longer than 56 bytes is not quite the 100% invalid as I had thought.

2. key parameter longer than 56 bytes is silently chopping extra bytes

3. salt|IV parameter is silently turning all codepoints 256+ into the '?' character.

--

1. The issue of whether the max input string is 56 or 72 is caused by conflicting information. A good discussion of this issue is at:

Quote:

https://bugs.launchpad.net/pycrypto/+bug/695417

I do agree with the conclusion in that thread, that input could be allowed up to 72 bytes if it's done as an optional switch.

The documents describing the Blowfish design all say it's a 448-bit cipher, and state that not all bits of the final 16-of-72 bytes affect all bits of the encrypted data. A 72-byte key consisting of 56 bytes worth of constants and 16 bytes of actual key material would have lesser strength if the 16 bytes of secret key material were located in bytes 57-72 than if it were anywhere earlier in the key.

The confusion comes from some of the example source codes linked at Schneier's Blowfish page which don't all check for input longer than 56, or even whether it's longer than 72, meaning that someone could use that code to make use of keys longer than 56 bytes. The test vectors linked there give examples only of key lengths of 1-24 bytes, with none consisting entirely of text in the ASCII 33-126 range.

2. Keys now being chopped after byte 56 means that people can use $encode key parameters containing data that doesn't affect the encryption, but the lack of effect is disguised by the default random salt changing the ciphertext each time. These examples always generate identical ciphertext because the changing portion of the key parameter exists beyond position 56:

Code:

//echo -a $encode(testtest,cms,$str(a,55) $+ $rand($chr(192),$chr(255)) ,saltsalt )
//echo -a $encode(testtest,cms,$str(a,56) $+ $rand(a,z) ,saltsalt )
//echo -a $encode(testtest,cms,$str(a,72) $+ $str(b,$rand(0,1)) ,saltsalt )

A solution could be to allow a new switch to be valid only with byte lengths 57-72, and reject a key parameter longer than 56 bytes without using the switch. This should restore support for pre7.52 keys which were longer than 56 bytes, while also alerting people trying to use a key which does not conform to the official Blowfish design.

3. The salt|IV parameter does not UTF-8 encode codepoints 128-255, which is good because that would greatly reduce the number of unique user-defined salt|IV strings that could fit into the 8-byte string. However if codepoints 256+ are used with 's' or 'i', it substitutes them with the 0x3f '?' character, potentially causing duplicate salt|IV strings. I'm not sure what a fix would be, other than rejecting salt|IV strings that would create an IV longer than 8 bytes, allowing the salt|IV parameter to be listed as a hex string or treating salt|IV parameter containing codepoint 256+ as an invalid parameter.

Code:

//var %iv ABCDEFGH | bset -t &v 1 $encode(&v,cmir,key,%iv) | noop $decode(&v,bm) | echo -a $bvar(&v,1-16)

output: 82 97 110 100 111 109 73 86 65 66 67 68 69 70 71 72

By decoding only the mime layer, this reveals that the encrypted string has a header "RandomIV" followed by the IV parameter as bytes 9-16. The next example places codepoints from both ranges 128-255 and 256+ into the IV:

Code:

//var %iv $chr(233) $+ $chr(234) $+ $chr(10004) $+ $chr(10005) $+ ABCD | bset -t &v 1 $encode(&v,cmir,key,%iv) | noop $decode(&v,bm) | echo -a $bvar(&v,1-16) | clipboard $bvar(&v,1-16)

output: 82 97 110 100 111 109 73 86 233 234 63 63 65 66 67 68

The 128-255 characters are correctly placed as a single byte, but all codepoints I've tested in the 256+ range are replaced with the 63 "?" character.
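
The observed byte values happen to match what a Latin-1 encoding with '?' substitution produces, which can be reproduced in Python. This is only a model of the output shown above; whether mIRC uses such a conversion internally is an assumption.

```python
iv_text = "\u00e9\u00ea\u2714\u2715ABCD"   # $chr(233) $chr(234) $chr(10004) $chr(10005) ABCD

iv_bytes = iv_text.encode("latin-1", errors="replace")
header = b"RandomIV" + iv_bytes

print(list(iv_bytes))   # [233, 234, 63, 63, 65, 66, 67, 68]
print(list(header))     # [82, 97, 110, 100, 111, 109, 73, 86, 233, 234, 63, 63, 65, 66, 67, 68]
```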

For the above 2 examples, the same thing happens with the user defined salt parameter. When replacing the 'ir' switches with 's', the "RandomIV" header is replaced with "Salted__", and the same bytes appearing in the 9-16 position are used as the salt instead of as the IV.

Bug: CBC with non 'l' switch chops the key parameter's input to the underlying $md5 hash function.

The limit of 56-bytes is being applied as an output filter to the key parameter, instead of being applied only as the input filter to the key schedule subroutine which accepts a string of 1-56 bytes and expands it to length 72.

This affects the CBC switch combos which do not use the 'l' switch: 'cr' and 'ci' which use hash(key parameter) as input to calculating (secret56), and 'c' and 'cs' use hash(key_parameter:salt) to generate 64 digits for (secret56:IV8). The purpose of the $md5 hash is to ensure the input can't be longer than the 56 limit, so there's no reason to limit that type of input. This alias shows 4 examples of 94-byte key parameters which are able to be decrypted with non-literal 56-byte keys, where this is caused by the input to md5(string) limiting the 'key' portion of the input at 56 bytes:

Code:

alias bf_hash_input_limit56 {
  var -s %key94 $regsubex(junk,$str(x,94),/x/g,$chr($calc(32+ \n))) , %key56 $left(%key94,56)
  var -s %a $encode('mc' hash key94 + random salt,mc,%key94)
  echo -a $decode(%a,mc,%key56)
  var -s %a $encode('mcs' hash key94 + fixed salt,mcs,%key94,SaltOrIV)
  echo -a $decode(%a,mc,%key56)
  var -s %a $encode('mcr' hash key94 only + random iv  ,mcr,%key94)
  echo -a $decode(%a,mcr,%key56)
  var -s %a $encode('mci' hash key94 only + fixediv  ,mci,%key94,SaltOrIV)
  echo -a $decode(%a,mci,%key56,SaltOrIV)
}
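
To illustrate why truncating the hash input is unnecessary, here is a short Python sketch using the standard `hashlib` module. It only shows that MD5 returns 16 bytes for any input length and that the 94-byte and 56-byte keys hash differently; it does not reproduce $encode's full key derivation.

```python
import hashlib

key94 = bytes(range(33, 127))      # same 94 characters as %key94 (codepoints 33-126)
key56 = key94[:56]                 # what the key is currently chopped to

digest94 = hashlib.md5(key94).digest()
digest56 = hashlib.md5(key56).digest()

print(len(key94), len(digest94))   # 94 16
print(len(key56), len(digest56))   # 56 16
print(digest94 == digest56)        # False
```

If the full 94 bytes were hashed, the two keys would derive different secrets, so the 56-byte key would not be expected to decrypt the message.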

http://www.herongyang.com/Blowfish/Perl-Crypt-CBC-Salted-Key-Test-Cases.html

The 2nd of 4 examples on this page is a test vector where the input key is a binary string of length 80 bytes. It does not display the 56-byte secret key generated by the hash, but it does display the IV as 3c05d2f32c8d1d14, which matches the output from the above algorithm modified to accept binary keys, and which does not limit the length of the string being hashed by $md5. In this alias, $md5 is hashing a binary string of length 16+56+8=80 bytes, and the test vector's IV is the last 8 of 64 bytes generated by the hashing subroutine.

Code:

alias Openssl_salted_keygen_binary {
  bunset &raw &key
  var -s %binkey 1122334455667788990011223344556677889900112233445566778899001122334455667788990011223344556677889900112233445566778899001122334455667788990011223344556677889900
  var -s %binsalt 0000000000000000
  bset -c &pass+salt 1 $regsubex(junk,%binkey $+ %binsalt,/(..)/g,$base(\t,16,10) $chr(32))
  while ($bvar(&key,0) < $calc(56+8)) { noop $salted_digest_to_binary }
  bcopy -c &pass 1 &key 1 56 | bcopy -c &iv  1 &key 57 8
  var %bin_key  $regsubex($bvar(&key,1-56) ,/(\d+)/g,$base(\t,10,16,2) $chr(32))
  var %bin_salt $regsubex($bvar(&key,57- ) ,/(\d+)/g,$base(\t,10,16,2) $chr(32))
  echo 4 -a literal key in hex: %bin_key
  echo 4 -a literal  iv in hex: %bin_salt
}
alias salted_digest_to_binary {
  if ($bvar(&hash,0)) bcopy -c &raw 1 &hash 1 -1 | bcopy &raw -1 &pass+salt 1 -1
  bset -c &hash 1 $regsubex($md5(&raw,1),/(..)/g,$base(\1,16,10) $chr(32)) | bcopy &key -1 &hash 1 -1
  if ($bvar(&key,0) > $calc(56+8)) bcopy -c &key 64 &key 64 1
  echo 3 -a $bvar(&key,0) of 64 generated: $regsubex($bvar(&key,1-) ,/(\d+)/g,$base(\t,10,16,2) $chr(32))
}

Result from running: /Openssl_salted_keygen_binary

literal key in hex: C3 63 D2 5C 49 8B 5B E0 D5 5C 23 38 06 D8 89 BC 73 4C 49 FE E8 71 BB E1 73 24 0C 38 EA CC B8 5A 73 9D BA 62 8D 2D 64 15 FE 34 61 EC 17 69 02 E8 71 15 CE 92 A7 81 EA 34
literal iv in hex: 3C 05 D2 F3 2C 8D 1D 14

... which matches the IV shown in the linked test vector.
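
For readers without mIRC, the derivation the aliases above appear to perform can be sketched in Python: each MD5 round hashes the previous digest followed by the key and salt, until 64 bytes exist. The function name `derive_key_iv` and the sample passphrase are made up for this sketch, and it has not been checked against the linked test vector.

```python
import hashlib

def derive_key_iv(passphrase: bytes, salt: bytes, key_len: int = 56, iv_len: int = 8):
    """Chain MD5(previous_digest + passphrase + salt) until key_len + iv_len bytes exist."""
    material = b""
    block = b""
    while len(material) < key_len + iv_len:
        block = hashlib.md5(block + passphrase + salt).digest()
        material += block
    return material[:key_len], material[key_len:key_len + iv_len]

# The passphrase may be longer than 56 bytes; only the output is cut to 56 + 8.
key, iv = derive_key_iv(b"x" * 80, bytes(8))
print(len(key), len(iv))   # 56 8
```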

--

The other routine used by 'cr' and 'ci' is described at:

http://www.drdobbs.com/web-development/encryption-using-cryptcbc/184416083

It limits the non-literal key's strength to 128 bits because it expands the $md5 hash digest from 16 to 56 bytes in a way where identical first-16 bytes must always have identical bytes 17-56. After describing the method of md5-hashing the non-literal key, it says:

"On Line 7, it is perfectly all right to use a key of arbitrary length because regenerate_key is set to 1."

I read 'arbitrary' as meaning the input to the hash is allowed to be any length, including those longer than the 56 limit of the secret key. I haven't located any test vectors to prove that 'arbitrary' means md5(string longer than 56).

In addition to limiting the key-parameter's portion of the input to the md5 hash function at 56 bytes, $encode is also not treating the Blowfish key parameter as required, permitting it to be $null, resulting in numerous switch configurations where there's little or no secret material used to generate the key.

Code:

//var %a $null | echo -a $encode(test,mcrl,%a)
//var %a $null | echo -a $encode(test,me,%a)
both keys are same as if the key 0x00 hex is used
//var %a $null | echo -a $encode(test,mcr,%a)
key derived from binary digest of $md5($null)
//var %a $null | echo -a $encode(test,mcs,%a,SaltSalt)
key derived from binary digest of $md5($null $+ SaltSalt)

Quote:

Isn't this the change that I mentioned in my previous post? That it now allows all parameters to be $null. It is up to the scripter to provide the correct parameters.

I have to say, I feel like I am going around in circles with this. When I first implemented these identifiers, they were intended to be OpenSSL-compatible by default. However, the scripters testing them made a number of requests regarding how parameters should be handled/truncated/converted to UTF-8/etc. and they ended up not being OpenSSL-compatible by default. The code contains numerous commented out checks for lengths/conversions/etc that were originally used but were changed on request.

Quote:

1. It turns out that key input longer than 56 bytes is not quite the 100% invalid as I had thought.

2. key parameter longer than 56 bytes is silently chopping extra bytes

A solution could be to allow a new switch to be valid only with byte lengths 57-72, and reject a key parameter longer than 56 bytes without using the switch. This should restore support for pre7.52 keys which were longer than 56 bytes, while also alerting people trying to use a key which does not conform to the official Blowfish design.

Okay, I will revert this change so that literal keys are limited to 56 bytes again but non-literal keys are not. This will halt/break scripts that use longer literal keys and break scripts that use longer non-literal keys.

Quote:

3. salt|IV parameter is silently turning all codepoints 256+ into the '?' character.

I'm not sure what a fix would be, other than rejecting salt|IV strings that would create an IV longer than 8 bytes, allowing the salt|IV parameter to be listed as a hex string or treating salt|IV parameter containing codepoint 256+ as an invalid parameter.

There is commented out code that rejects salt/IVs not 8 bytes long - someone requested that any length be allowed. I will change it so that it will halt with an error if a scripter tries to use codepoints 256+ in the salt/IV. This will break all scripts that use codepoints 256+ in the salt/IV.

Quote:

Actually, instead of halting the script with an error, it could just UTF-8 encode codepoints 256+. This would still break older scripts but it would allow newer scripts to use Unicode characters if they wished.

As long as the IV/Salts can only be input as a text string, the current behavior of not UTF8 encoding codepoints 128-255 is good, because the alternative is to greatly reduce the number of possible salts. Allowing salts shorter than 8 also increases the number of possible salts, since that's the only way to get 0x00 into the salt. If using $chr(10004) were another way of using $chr(226) $+ $chr(156) $+ $chr(148) that would be better than silently changing all 256+'s into codepoint 63's, as long as someone's use of $chr(10004) didn't cause it to silently ignore 2 other characters of the salt.
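
The concern about a 256+ codepoint pushing other characters out of the salt can be illustrated in Python. The mixed policy below (single byte for codepoints under 256, UTF-8 otherwise) is a hypothetical reading of the suggestion, and `encode_salt_mixed` is a made-up name.

```python
def encode_salt_mixed(text: str) -> bytes:
    """Codepoints below 256 become one byte each; higher ones are UTF-8 encoded."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 256:
            out.append(ord(ch))
        else:
            out += ch.encode("utf-8")
    return bytes(out)

salt = encode_salt_mixed("\u2714ABCDEFG")   # $chr(10004) followed by 7 letters
print(list(salt[:3]))                       # [226, 156, 148]
print(len(salt))                            # 10
print(salt[3:8])                            # b'ABCDE' (F and G no longer fit in 8 bytes)
```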

As for breaking older scripts, they could still decode their messages by changing those characters into question mark 63's.

Unless it needs to maintain compatibility with something else that does so, I don't think it's a good idea to silently ignore portions of the key/salt/IV parameters because they're too long. If it's longer than a valid encryption parameter, then it can't be a valid parameter. The switch combos not using the 'l' literal use the key parameter as input to an MD5 hashing, and it seems reasonable that a string of any length should be valid as input to a hash digest. So far, the only example I can find describing the key being hashed the same way $encode hashes non-literal keys is from the Crypt::CBC package, and I showed a link where it accepted MD5 input longer than 56 bytes, then used the output of the recursive hashing to make a 56-byte key out of it.

I guess a 'chop me' switch could be confusing if it makes the user think their 16 byte Salt or 80-byte literal key would not be chopped in the absence of that switch. I was thinking of it being more for the edge case of forcing too-long keys to be used in a valid manner, including the current handling of a UTF encoded character that's split across both sides of the 56 border.

The CBC 'salt' should be a 64-bit binary value regardless of contents, but instead is sometimes truncated to a shorter string when $encode removes any 0x00 byte and anything following it. I also found this in v7.55 no-beta and v7.36, so it's not related to the recent fixes.

This results in 1 of every 256 random 8-byte salt strings being truncated to $null because the first byte happens to be 0x00. There are additional identical truncated salts where the first occurrence of the 0x00 is at a different byte position, increasing the frequency of these dupes. The test vector I linked above produces the correct outcome only when that salt of all zeroes is handled as eight 0x00 bytes instead of appending $null to the password.

Bytes 9-16 within the mime string contain the 8-byte salt, so this portion of the string is almost always unique when containing a random salt string. However, the encrypted data is often identical because the salted key and IV are created using matching truncated salts. This next example almost always finds somewhere between 100-130 messages using a truncated salt that was previously used in this same group of messages.

Note: this alias takes approx 3 minutes, and the display total does not include salts which were truncated without being duplicated. If they were included, the total would have been closer to 800.

Code:

//var %i 25600 , %count 0 | hfree -w salt | while (%i) { var %a $mid($encode(abcdefghijklmnopqrstuvwxyz,mc,key),23) | if ($hfind(salt,%a)) { inc %count | echo -a i: %i match# %count : $v1 } | else hadd -m salt %a | dec %i }
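For anyone wanting to sanity-check those counts without mIRC, here is a rough Python sketch of the same experiment. It assumes only what is described above: each random salt is 8 uniformly random bytes, and the bytes actually used stop at the first 0x00. The names `truncate_at_nul` and `simulate` are made up for this illustration, and it compares the truncated salts directly instead of comparing encrypted output.

```python
import random

def truncate_at_nul(salt: bytes) -> bytes:
    """Return the part of the salt before the first 0x00 byte (assumed old behavior)."""
    return salt.split(b"\x00", 1)[0]

def simulate(messages: int = 25600, seed: int = 1):
    """Count truncated salts and repeats of an already-seen effective salt."""
    rng = random.Random(seed)
    seen = set()
    truncated = 0
    duplicates = 0
    for _ in range(messages):
        salt = bytes(rng.randrange(256) for _ in range(8))
        used = truncate_at_nul(salt)
        if len(used) < 8:
            truncated += 1
        if used in seen:
            duplicates += 1
        else:
            seen.add(used)
    return truncated, duplicates

if __name__ == "__main__":
    truncated, duplicates = simulate()
    print("truncated salts:", truncated, "duplicates:", duplicates)
```

The exact numbers depend on the seed, but the duplicate count should land in the same neighborhood as the mIRC alias, and the truncated count should be near 25600/32.44.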

Update:

Counting every random salt that contains a 0x00 byte anywhere, not just in the first position, the true ratio is 1 out of every 32.44 messages using an incompatible salt shorter than 8 bytes.

Code:

//echo -a $calc(1/ (1-(255/256)^8) )
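The same arithmetic as a Python sketch, assuming each of the 8 salt bytes is independent and uniformly random over 0-255. The variable names are just for illustration.

```python
# Probability that a single random byte is NOT 0x00
p_byte_nonzero = 255 / 256

# Probability that at least one of the 8 salt bytes is 0x00
p_salt_has_nul = 1 - p_byte_nonzero ** 8

# "1 out of every N" form of the same probability
one_in = 1 / p_salt_has_nul

print(round(p_salt_has_nul * 100, 2), "percent of salts contain 0x00")
print("1 out of every", round(one_in, 2))
```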

The workaround until this is fixed would be to limit the salt to one of the 97% of salt strings which don't contain the 0x00 byte:

Code:

alias randsalt returnex $regsubex($str(x,8),/x/g,$chr($rands(1,255)))

then use it with the 's' parameter when creating salted messages

Code:

$encode(message,mcs,key,$randsalt)

After the fix, I estimate that this should decrypt as normal text:

Code:

//echo -a $decode(U2FsdGVkX18AAAEAAOkAAIC4MK5GejRGGZ/W3g4BJoY=,mc,test key)

and

Code:

//echo -a $encode(message,mcs,key1234,5678)

result now: U2FsdGVkX181Njc4AAAAAOZf+ZmPFy3l same as $encode(message,mcs,key,12345678)
after fix:: U2FsdGVkX181Njc4AAAAAKR3f8+Rmn/r

The salt is a string, a %var not a &binvar. It cannot contain null characters. If you would like the salt to be interpreted as a &binvar, we would need to add a new switch.

I think for futurization with other ciphers and with other applications using these ciphers, it may be necessary to support either binvar content or hexadecimal strings.

My argument would be to automatically detect and support a &binvar if that &binvar presently exists, else treat it as a literal string that happens to begin with an ampersand.

The salt *parameter* is a text string, yes. Rather than having parameters be &binvars, my list of improvements had previously suggested a switch where the key and salt/iv parameters both be seen as hex instead of UTF8 text. I was not asking here for it to accept a &binvar parameter, I'm referring to how the 64-bit salt value seen in the string's header at byte positions 9-16 should be handled internally as that same 64-bit value.

Using the 's' switch is just overriding the effect of how the 64bit salt is randomly created, and using a salt parameter shorter than 8 is no different than a random salt that happened to generate 0x00 for the 8th byte. $decode can always decrypt without using the 's' switch because the 64bit salt created by the 's' salt-parameter is stored there, and $decode has no way of knowing whether that salt was generated randomly or by user input. If 1 or more of the final bytes of the 64bit salt is the 0x00 byte, then it should still be part of the 8-byte 64-bit value, the same as the 0x00 would be if it were randomly generated.

The random salt has always been created as a binary value, where each of the 8 bytes has a 1/256 chance of being a 0x00 byte. 1 out of every 32.4 64-bit random values has at least one 0x00 byte in it, and they're being hashed differently than other programs would expect. As I've seen described at the link I pasted, the method of combining the salt and password involves a hash function where the password can be of variable length, but the salt is a fixed length of 8. By not including the trailing 0x00's padded to the end of user-input salts, it creates incompatible hashes, because a salt whose length is not 8 is being combined with the passphrase as input to the hash function which generates the IV and the salted-key.

In addition to being incompatible, this behavior of chopping at the 1st 0x00 byte is happening for the default generating of random 64-bit salts, and there it's causing many messages with unique 64bit salts and the same key parameter to be encrypted identically, only differing by having the 'unique' salt stored in the encrypted string's header. In addition to being incompatible with the test vectors, this is contrary to the intent of what a random salt should be doing.

By having a 64-bit salt combined with the key, it's supposed to allow somewhere in the neighborhood of 2^64 different salted-keys being generated from the same passphrase key+salt. 1 out of every 256 random salts has the 1st byte of the 64bit salt randomly generated as the 0x00 byte. By truncating the random salt at the first occurrence of the 0x00 byte, this group of random salts has 2^56 members in it, and they're all hashed by combining the passphrase with the $null string instead of combining the passphrase with the 8-byte value shown in the header as bytes 9-16. The same thing is happening to other groups of random salts which have 2^48 members, 2^40 members, etc. Instead of the 'birthday paradox' causing a random 64bit salt to have a 50% chance of being duplicated in 4 billion messages, the example showed truncated duplicates were happening over 100 times in 25k messages.
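To make the collapse concrete, here is a Python sketch of an OpenSSL-style EVP_BytesToKey derivation using MD5 with a single iteration, where each digest is MD5(previous digest + passphrase + salt). I'm assuming this matches what $encode does for non-literal keys; the 56-byte key and 8-byte IV sizes, and the function names, are assumptions for illustration and not confirmed mIRC internals.

```python
import hashlib

def derive_key_iv(passphrase: bytes, salt: bytes, key_len: int = 56, iv_len: int = 8):
    """EVP_BytesToKey-style derivation: chain MD5 digests until enough bytes exist."""
    material = b""
    block = b""
    while len(material) < key_len + iv_len:
        block = hashlib.md5(block + passphrase + salt).digest()
        material += block
    return material[:key_len], material[key_len:key_len + iv_len]

def truncate_at_nul(salt: bytes) -> bytes:
    """Assumed old behavior: drop the first 0x00 byte and everything after it."""
    return salt.split(b"\x00", 1)[0]

# Two different 64-bit salts which both begin with 0x00
salt_a = bytes.fromhex("0011223344556677")
salt_b = bytes.fromhex("00a1b2c3d4e5f607")

# Used as full 8-byte values, they give different salted keys
full_a = derive_key_iv(b"key", salt_a)
full_b = derive_key_iv(b"key", salt_b)

# Truncated at the first 0x00, both become the empty string and collide
short_a = derive_key_iv(b"key", truncate_at_nul(salt_a))
short_b = derive_key_iv(b"key", truncate_at_nul(salt_b))

print("full salts match:", full_a == full_b)
print("truncated salts match:", short_a == short_b)
```

Because the passphrase and salt are simply concatenated before hashing, the same sketch also shows why key 'key1234' with the truncated salt '5678' gives the same result as key 'key' with salt '12345678'.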

At first glance, a lot of these strings appear different, but that's only because of containing a unique salt in their header, which sometimes is mostly ignored. In this example, the "+++++++++" is where most of the truncated 64bit salt is stored in the header. The +'s can be replaced with any other mime character and it has no effect on the decryption.

//echo -a $decode(U2FsdGVkX18A+++++++++0rIX3dSCYYa216ecXj5pkL9ki5Fa+iJR2jmd2mPUIjP,mc,key)

I'm anticipating the salt parameter should be handled the way the IV parameter and the randomly generated IV's are being handled internally.

$encode(message,mcir,key,iv) vs $encode(message,mcs ,key,salt)

The IV parameter shorter than 8 is being padded to length 8 with 0x00's and stored in the header, the same as done with the short salt parameter. By allowing shorter than length 8 IV parameters, this allows additional IV's to be created which otherwise could not be created from a text string. When the 1st byte of a random generated IV is 0x00 and is followed by other non-0x00 bytes, it's not being used as if the IV were entirely 0x00's.

Other programs seeing 0x00's in the 64bit salt, regardless how they were generated randomly or by user input, would be generating completely different salted-keys and IV out of hashing the passphrase and salt together. If there is a reason that $encode and $decode can't internally handle 64bit salts containing the 0x00 byte, then $encode can retain compatibility with the test vectors by not creating the 3.1% of random 64bit values that contain the 0x00 byte, using the method in my 'randsalt' alias where it generates 8 different random numbers from the range 1-255. If 0x00's would no longer be allowed in the 64bit salt, this also would mean the salt parameter would need to go back to requiring the length of the salt being exactly 8 bytes. However this would still retain incompatibility with how other programs would occasionally be generating random 64bit salts containing 0x00's.

Ok, thanks for the explanation. To summarize: the salt should always be zero-padded to eight bytes.

When I make this change in the key derivation function, it passes the tests in your scripts.

That said, I am guessing this will break backwards compatibility. But if I understand you correctly, this will make it compatible with the standard implementation.

This change will be in the next beta.

Originally Posted By: Khaled

To summarize: the salt should always be zero-padded to eight bytes.

This summary is true for user-defined salts created using the 's' switch, because user-defined text cannot create salt strings with embedded 0x00 bytes followed by other non-0x00 bytes. To include the randomly created salts, it would be more accurate to say that the salt should always be the entire 8-byte value at bytes 9-16 inside the mime string. For the 's' switch, that means salt parameters shorter than length 8 are padded to 8 with 0x00's, and salt/IV parameters longer than 8 bytes are silently chopped to 8.
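As a minimal sketch of that rule in Python (the helper name `normalize_salt` is hypothetical, and this only illustrates the padding and chopping described here, not mIRC's actual code):

```python
def normalize_salt(salt: bytes) -> bytes:
    """Chop to 8 bytes, then pad on the right with 0x00 so the result is always 8 bytes."""
    return salt[:8].ljust(8, b"\x00")

# A short user salt keeps its trailing 0x00 padding as part of the 64-bit value
print(normalize_salt(b"5678").hex())
# A too-long user salt is chopped to its first 8 bytes
print(normalize_salt(b"123456789"))
```

Whatever comes out of this step is what should be both stored at bytes 9-16 of the header and fed into the key derivation, without any further trimming at 0x00.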

Originally Posted By: Khaled

That said, I am guessing this will break backwards compatibility. But if I understand you correctly, this will make it compatible with the standard implementation.

True.

There are 3 incompatibilities introduced in the recent posts of this thread. For anyone encountering incompatibilities, these workarounds should solve most cases:

1. For syntax which does not use the 'l' switch for a literal key, the input to the hashing algorithm is no longer chopping the key parameter at 56 UTF-8 encoded bytes. This alias should allow most older long text passwords to be used after these fixes.

Code:

alias chop_key_to_56 {
  noop $regsubex(foo,$1,,,&maroon.tmp)
  returnex $bvar(&maroon.tmp,1-56).text
}

$decode(mime_string,mc,$chop_key_to_56(key_parameter) )
$decode(mime_string,mr,$chop_key_to_56(key_parameter) )
$decode(mime_string,mi,$chop_key_to_56(key_parameter),user_iv )

The exception would be any keys where any codepoint 256+ is partly within/beyond the hash algorithm's former chop limit.

Code:

//echo -a $chop_key_to_56($str(a,55) $+ $chr(233) )

In this case, 1 of the 2 encoding bytes was used and the other was ignored. Since this used an invalid UTF8 string as the key, there's no workaround other than allowing key/salt/iv to be hex/binvar.
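The same split can be shown outside mIRC with a short Python sketch. It assumes the key is UTF-8 encoded before being chopped at 56 bytes, as described above; `is_valid_utf8` is just a helper made up for the illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 55 ASCII characters followed by codepoint 233, which encodes as 2 bytes (C3 A9)
key = "a" * 55 + chr(233)
encoded = key.encode("utf-8")

# Chopping at 56 bytes keeps the lead byte C3 and drops the continuation byte A9
chopped = encoded[:56]

print(len(key), "characters,", len(encoded), "bytes")
print("last byte kept:", hex(chopped[-1]))
print("chopped key is valid UTF-8:", is_valid_utf8(chopped))
```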

2. The fix where codepoints 256+ in a salt/iv parameter were always used as the '?' character is easy to work around. If this were a user-defined salt created with the 's' switch, the '?' is already in the mime string's header, so the file can be decrypted as normal without the 's' switch. The same applies to a user-defined IV placed into the mime's header using 'mcir' or 'mcirl'. When using 'i' without using 'r', the IV is not placed into the header, so it would need to be decoded by substituting the '?' in place of the codepoint 256+'s in the 'i' switch's IV parameter.

3. When the Salt was zero-truncated shorter than 8 bytes before being used, the mime string can be modified to move bytes from the key parameter into the salt, which should allow the new $decode behavior to arrive at the same salted-key+IV used to encrypt the file. There are 2 exceptions which would require hex keys to fix:

a. Depending on where the 0x00 first appears in the salt string, up to 8 bytes might need to be moved from the key parameter to the salt. If the key was not long enough, then the following alias won't be able to cannibalize the key parameter to create the 8-byte salt that's required. My example in post 265381 used the 3-byte string 'key' as the key parameter when creating these example strings, so at most 3 bytes can be moved, and the only mime strings which could be fixed for the new decoding behavior would be those where the first 0x00 was not within the first 5 bytes of the salt.

b. The key parameter is UTF-8 encoded, but the salt is not, so - depending on the location of codepoints 256+ near the end of the key parameter - the key cannibalizing might not be able to create a valid 8-byte salt and a valid UTF-8 text string at the same time.

This alias should fix mime strings encountering issue#3 except for the 2 exceptions noted above:

Code:

alias upgrade_salt {
  echo -a syntax: //noop $ $+ upgrade_salt(old_mime_string,key_parameter)
  var %pattern /^[0-9a-zA-Z/+]+={0,2}/g
  if (($len($1) !isnum 32-) || (!$regex($1,%pattern))) { echo -a invalid mime string $1 | return }
  bset -t &maroon.tmp 1 $1 | noop $decode(&maroon.tmp,bm)
  echo -a $bvar2hex(&maroon.tmp,1-16) $bvar(&maroon.tmp,1-16).text
  if ($bvar(&maroon.tmp,1-8).text !=== Salted__) { echo -a this was not created with a random/user salt | return }
  if (!$istok($bvar(&maroon.tmp,9-16),0,32)) { echo -a salt does not contain 0x00 and doesn't need to be fixed | return }
  var -p %key $2- , %i 9 , %salt_used , %keylen $len(%key)
  while (%i isnum 9-16) {
    if ($bvar(&maroon.tmp,%i) > 0) var -s %salt_used %salt_used $v1 | else var %i 16 | inc %i
  }
  while ($numtok(%salt_used,32) < 8) {
    var -s %byte $asc($mid(%key,%keylen,1))
    noop $regsubex(foo,$mid(%key,%keylen,1),,,&maroon.moved.char)
    dec -s %keylen | var -s %salt_used $bvar(&maroon.moved.char,1-) %salt_used
    if ($numtok(%salt_used,32) > 8) { echo -a unable to fix due to UTF8 char $chr(%byte) in key | halt }
    if (($numtok(%salt_used,32) < 8) && (%keylen == 0)) { echo -a unable to fix due to too-short key | halt }
  }
  bset &maroon.tmp 9 %salt_used | noop $encode(&maroon.tmp,bm)
  noop $regsubex(foo,$left(%key,%keylen),,,&maroon.tmp.key)
  echo -a try to decode with: //echo -a $ $+ decode( $bvar(&maroon.tmp,1-).text ,mc, $left(%key,%keylen) )
  echo -a if contents of mime are binary you may need to load this mime string into a binvar and decode that
  echo -a characters in the key: $bvar(&maroon.tmp.key,1-)
}

If the shortened key contains a trailing space character, you might need to use $+ $chr(32) to re-create the key string shown in the last line of the alias's display.

In the earlier post's example where "//echo -a $encode(message,mcs,key1234,5678)" was creating a key using only the 4 byte salt '5678' instead of also including the 4 0x00's in the mime header, the 7.55-and-earlier output is:

U2FsdGVkX181Njc4AAAAAOZf+ZmPFy3l

To repair this so the new $decode behavior can decrypt it, the mime must be altered to move 4 bytes from the key into the salt:

//noop $upgrade_salt(U2FsdGVkX181Njc4AAAAAOZf+ZmPFy3l,key1234)

The alias modifies the salt string inside the mime and recommends trying to decode with a key shortened from 'key1234' to 'key' :

//echo -a $decode( U2FsdGVkX18xMjM0NTY3OOZf+ZmPFy3l ,mc, key )

Which should work after the fix.
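For anyone who needs to repair strings outside mIRC, here is a Python sketch of the same idea as the alias. It assumes, as described in this thread, that the old behavior hashed the UTF-8 key bytes followed by only the salt bytes before the first 0x00, so moving the trailing key bytes into the front of the salt gives the same hash input. The function name and error messages are my own, and it has the same two exceptions as the alias.

```python
import base64

def upgrade_salt(mime: str, key: str):
    """Move trailing key bytes into a zero-truncated salt; return (new_mime, new_key)."""
    raw = bytearray(base64.b64decode(mime))
    if bytes(raw[:8]) != b"Salted__":
        raise ValueError("this was not created with a random/user salt")
    # Salt bytes the old behavior actually used: everything before the first 0x00
    used = bytes(raw[8:16]).split(b"\x00", 1)[0]
    if len(used) == 8:
        return mime, key  # no 0x00 in the salt, nothing to fix
    key_bytes = key.encode("utf-8")
    need = 8 - len(used)
    if need > len(key_bytes):
        raise ValueError("unable to fix due to too-short key")
    moved, kept = key_bytes[-need:], key_bytes[:-need]
    try:
        new_key = kept.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("unable to fix due to UTF-8 character split in key")
    # Old hash input was key + used; new hash input is kept + (moved + used)
    raw[8:16] = moved + used
    return base64.b64encode(bytes(raw)).decode("ascii"), new_key

if __name__ == "__main__":
    print(upgrade_salt("U2FsdGVkX181Njc4AAAAAOZf+ZmPFy3l", "key1234"))
    # prints ('U2FsdGVkX18xMjM0NTY3OOZf+ZmPFy3l', 'key')
```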

The above 7.55 beta changes are looking good now for CBC.

For ECB mode I noticed that 'er' and 'ei' should probably be invalid switch combos like 'es' is, instead of being silently ignored.

Also, for documenting the ECB behavior for future searchers, key lengths 1-56 remain identical with/without the 'l' switch. While 'el' now invalidates keys longer than 56 bytes, dropping the 'l' switch now reverts to pre v7.52 behavior of silently chopping ECB keys at 72 bytes, as shown by the following example with/without the 'l' switch.

If wanting to preserve support for the non-l literal 57-72 byte ECB keys, ECB mode without the 'l' switch should at least be reporting an error if the key parm is longer than 72, since the non-literal ECB key isn't an unlimited string being filtered through MD5.

Code:

//var %i 55 | while (%i isnum 1-73) { echo -a %i $encode(message,em,$left($str(abcde,20),%i)) | inc %i }
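The 72-byte cutoff follows from Blowfish's key schedule, which cycles the key bytes across the 18 32-bit P-array entries (72 bytes total). This Python sketch only models that cycling, not the cipher itself, and assumes an implementation which doesn't enforce the 56-byte design limit. The function name is made up for the illustration.

```python
def cycled_key_material(key: bytes, size: int = 72) -> bytes:
    """Repeat the key bytes until 'size' bytes are filled, as the P-array XOR does."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(key[i % len(key)] for i in range(size))

base = b"abcde" * 20  # same pattern as $str(abcde,20)

# Bytes beyond position 72 never reach the key schedule
print(cycled_key_material(base[:72]) == cycled_key_material(base[:73]))
# Lengths 57-72 still change the material, so they act as distinct keys
print(cycled_key_material(base[:56]) == cycled_key_material(base[:57]))
```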

Thanks, these checks have been added to the next version.