maroon (OP), Hoopy frood
Joined: Jan 2004
Posts: 2,127
Updated $encode Blowfish improvements wishlist.

1. New padding method 0-7 zeroes
2. New $encode format Base85
3. Restore no-limit hash(key length) for non-literal keys
4. A new switch to recognize key and iv|salt parameters as both being hex strings
5. New $encode format Hex.
6. New switch to defend against CBC bit-flipping of IV

(1) New padding method 0-7 zeroes
(2) New $encode format Base85

Since a likely use for the encryption is channel messages, these would allow longer encoded messages within the limited length of a PRIVMSG. Base85 encodes at a 5:4 ratio versus the 4:3 ratio of mime/base64, so a 360-byte channel message would have a mime length of 360*4/3=480, while base85 would encode it to only 360*5/4=450. To support the N chunk parameter, a chunk could be the 60-character encoding of 48 bytes, though, as you pointed out in an older post, UTF8 means the bytes of one character can be split across 2 chunks, making the chunk parameter somewhat obsolete except for some variant of Uuencode, which can have a variable chunk length preceded by a length byte.
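Since mIRC script can't be run outside mIRC, here's a small Python sketch (nothing to do with how $encode would implement it) that checks the length arithmetic above with the standard base64 module: b64encode for mime and a85encode for Ascii85-style Base85. The payload contains no 0x00 bytes and no all-space groups, so Ascii85's 'z'/'y' shortcuts never apply.

```python
import base64

# 360 bytes with no 0x00 and no 4-space groups, so the 'z'/'y' shortcuts never apply
payload = bytes(range(1, 121)) * 3

mime = base64.b64encode(payload)
b85 = base64.a85encode(payload)
print(len(payload), len(mime), len(b85))   # 360 480 450

# An N-chunk of 48 bytes encodes to exactly 60 Base85 characters
print(len(base64.a85encode(payload[:48])))   # 60
```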

Also worth considering for shorter length: Base85 doesn't need the trailing padding that many versions of mime expect.

The main reason for the new padding method is to let 1/8th of all messages be 8 bytes shorter, which makes their mime encoding 10-11 characters shorter. Because it appends only 0-7 bytes, this padding method differs from 'z' padding, which appends 8 0x00's to a message that's already a multiple of 8 bytes. Since 0x00 cannot appear in a UTF8-encoded mIRC text message, there would be no risk of incorrectly stripping a 0x00 byte that belonged to the original message. This new padding method would be compatible with the existing 'z' padding when the message is a text string, which can't contain 0x00's, or a binary string whose format can never end with 0x00.
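A Python sketch of the two padding rules as described in this post. pad_zero_0_7 is the proposed method; pad_z_style is my reading of how 'z' padding behaves (always 1-8 zero bytes). Neither is mIRC's actual code, and the function names are hypothetical. The message is the same 32-byte string used in the test vector further down.

```python
def pad_zero_0_7(data: bytes) -> bytes:
    """Proposed padding: append 0-7 zero bytes, nothing if already a multiple of 8."""
    return data + b"\x00" * (-len(data) % 8)

def pad_z_style(data: bytes) -> bytes:
    """'z' padding as described above: 1-8 zero bytes, a full 8 if already a multiple of 8."""
    return data + b"\x00" * (8 - len(data) % 8)

def unpad_zeros(data: bytes) -> bytes:
    # Works for both, as long as the original message never ends in 0x00 (e.g. text)
    return data.rstrip(b"\x00")

msg = b"20190101 this can't be backdated"   # 32 bytes, already a multiple of 8
print(len(pad_zero_0_7(msg)), len(pad_z_style(msg)))   # 32 40
assert unpad_zeros(pad_zero_0_7(msg)) == unpad_zeros(pad_z_style(msg)) == msg
```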

The Wikipedia page for Base85 lists several kinds of 85-character alphabets. Unless there are considerations where it's helpful to exclude $ %, the RFC1924 variant is supposed to be compatible with JSON, and hopefully would be regex friendly.

(3) Restore no-limit hash(key length) for non-literal keys, as described here

(4) A new switch to recognize both strings in the 'key' and 'iv|salt' parameters as hex strings. There shouldn't be a need for separate switches that make only one of the pair hex.

As the examples here show, OpenSSL and Crypt::CBC, both of which $encode is compatible with, allow the passphrase|Salt|IV to all be given as hex.

Forcing the literal key to always be a UTF8 text string greatly limits the number of possible valid encryption keys, and using UTF8 text can easily cause portions of the key to be silently ignored because they are pushed past the 56th byte. As mentioned here, the Salt|IV parameters avoid UTF8-encoding codepoints 128-255 into 2 bytes each, but silently replace all codepoints 256+ with codepoint 63 "?".

The user may be unaware that, with or without the 'l' switch, the random characters in the following command have no effect on the encryption: the varying byte of the key is beyond position 56, and the random salt character is always replaced by "?", so this command always produces identical encryption.

//echo -a $encode(message,mcil,$str(a,55) $+ $chr($rand(192,255)),$chr($rand(256,10004)))
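A Python simulation of the two behaviors described above, under the stated assumptions: the literal key is UTF8-encoded and then cut to 56 bytes, and the Salt/IV maps codepoints 0-255 to single bytes and codepoints 256+ to "?". It models the description, not mIRC's implementation; literal_key_bytes and salt_iv_bytes are hypothetical names.

```python
KEY_LIMIT = 56  # Blowfish uses at most 56 key bytes

def literal_key_bytes(key: str) -> bytes:
    # Assumption: the literal key is UTF8-encoded and anything past byte 56 is dropped
    return key.encode("utf-8")[:KEY_LIMIT]

def salt_iv_bytes(text: str) -> bytes:
    # Assumption: codepoints 0-255 become single bytes, codepoints 256+ become "?"
    return text.encode("latin-1", errors="replace")

# Every key from the //echo example collapses to the same 56 bytes:
# chr(192..255) is UTF8 C3 xx, so byte 56 is always 0xC3 and the varying byte 57 is cut off
keys = {literal_key_bytes("a" * 55 + chr(c)) for c in range(192, 256)}
print(len(keys), next(iter(keys))[-1:])   # 1 b'\xc3'

# Every salt from the example collapses to b"?"
salts = {salt_iv_bytes(chr(c)) for c in range(256, 10005)}
print(salts)   # {b'?'}
```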

Hex strings would enable setting the secret key to all 2^448 values, and enable setting the Salt|IV to all 2^64 values.
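A hypothetical sketch, in Python, of what hex parsing for the key and Salt/IV could look like. key_from_hex and iv_from_hex are illustrative names, not an existing mIRC feature; the point is that every byte value, including 0x00 and bytes that never occur in UTF8 text, becomes reachable.

```python
def key_from_hex(hex_key: str) -> bytes:
    """Hypothetical helper: turn a hex string into a raw Blowfish key (1-56 bytes)."""
    key = bytes.fromhex(hex_key)
    if not 1 <= len(key) <= 56:
        raise ValueError("Blowfish keys are 1-56 bytes")
    return key

def iv_from_hex(hex_iv: str) -> bytes:
    """Hypothetical helper: turn 16 hex digits into an 8-byte Salt/IV."""
    iv = bytes.fromhex(hex_iv)
    if len(iv) != 8:
        raise ValueError("Salt/IV must be exactly 8 bytes")
    return iv

# A full 56-byte key containing 0x00 and 0xFF, which no UTF8 text string can produce
key = key_from_hex("00ff80c3" * 14)
iv = iv_from_hex("00112233445566ff")
print(len(key), len(iv))                            # 56 8
print(256 ** 56 == 2 ** 448, 256 ** 8 == 2 ** 64)   # True True
```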

(5) New $encode format 'Hex' to define the encoding format for the $1 parameter. This could also be used in ways unrelated to Blowfish. If it needs to support N chunks, a chunk could be 30 bytes encoded as 60 characters.
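A rough Python illustration of the chunking arithmetic; hex_chunks is a hypothetical helper. Each byte becomes 2 hex characters, so a 30-byte chunk is exactly 60 characters and the last chunk may be shorter.

```python
def hex_chunks(data: bytes, chunk_bytes: int = 30) -> list:
    """Hypothetical 'Hex' format with N chunks: each 30-byte chunk becomes 60 hex characters."""
    return [data[i:i + chunk_bytes].hex() for i in range(0, len(data), chunk_bytes)]

data = bytes(range(75))
chunks = hex_chunks(data)
print([len(c) for c in chunks])   # [60, 60, 30]
assert bytes.fromhex("".join(chunks)) == data
```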

(6) New switch to defend against CBC bit-flipping of IV

The switch defends the first block of the plaintext message from being bit-flipped by manipulating the IV, without altering or garbling the rest of the message. This alias shows how an attacker can manipulate the contents within an encrypted message's 1st block if it uses CBC feedback. For this to work, the attacker needs to know only:

A. The exact not-encrypted plaintext bytes among the 1st 8 bytes of the message, which they want to change into something else.
B. The value of the corresponding byte positions in the IV used as feedback against block#1
C. The 'something else' they want to change those bytes into

They do not need to know the password.
They do not need access to the encrypted portion of the message or know the ciphertext.

This has nothing to do with Blowfish itself; it would affect any cipher in CBC mode that has a public IV and no authentication string. AES would have the same issue, and its 128-bit block size would expose twice as many bytes at the beginning of the message.

The new switch would be invalid in combination with 'e' ECB mode. The ECB mode is not vulnerable to this issue, because there is no feedback.

The syntax using the 's' switch is not vulnerable to this, because the salt hash generates the secret56:IV8 using a method which shields the resulting IV from someone who doesn't know the key.

Sufficient behavior for the new switch would be to continue storing the user-defined or RandomIV the way it currently does, but prior to being used by the encryption or decryption, there would be an ECB mode encryption of the public IV before the CBC feedback begins. The original unaltered IV would continue being stored in the encrypted message as part of the RandomIV header. This would shield the IV used by block#1 from someone who did not know the key parameter. The only compatibility issue with this switch is that decrypting without also using the new switch garbles the 1st 8 bytes of the decrypted message, but the remainder decrypts fine.

When the public IV is known, and the plaintext byte is known, each byte of the new IV becomes (old-public-IV-byte XOR existing-plaintext-byte XOR new-faked-byte).
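To show the formula works regardless of the cipher, here's a Python sketch using a toy 4-round Feistel block cipher built on SHA-256 (NOT Blowfish, and not secure) in CBC mode, with the same key, IV and message as the test vector below. The ciphertext won't match mIRC's output, but the forged IV is the same "texu^ju8" that appears inside the bit-flipped $decode example.

```python
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Toy round function (NOT Blowfish): first 4 bytes of SHA-256(key | round | half)
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % 8 == 0, "this sketch skips padding"
    prev, out = iv, b""
    for i in range(0, len(plaintext), 8):
        prev = toy_encrypt_block(key, xor(plaintext[i:i + 8], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), 8):
        block = ciphertext[i:i + 8]
        out += xor(toy_decrypt_block(key, block), prev)   # block 1 is XORed with the IV
        prev = block
    return out

key, iv = b"foobar", b"text_iv8"
message = b"20190101 this can't be backdated"   # 32 bytes, no padding needed here
ciphertext = cbc_encrypt(key, iv, message)

# Attacker knows only the IV and the first 8 plaintext bytes; key and ciphertext are untouched
forged_iv = xor(xor(iv, message[:8]), b"20181231")
print(forged_iv)                                 # b'texu^ju8'
print(cbc_decrypt(key, forged_iv, ciphertext))   # b"20181231 this can't be backdated"
```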

As a test vector, this command:

//echo -a $encode(20190101 this can't be backdated,mcirl,foobar,text_iv8)
result: UmFuZG9tSVZ0ZXh0X2l2ODz3ktZrCBsKkBN1bgqwfXpp6x+aAa+5tRy95xUUyfdoSD/PXrNNWgo=

Note that the (Item#1) padding method would have shorter output due to not needing to append 8 bytes padding to a text string (or a string which would never end with 0x00) whose length was already a multiple of 8.

new pad result: UmFuZG9tSVZ0ZXh0X2l2ODz3ktZrCBsKkBN1bgqwfXpp6x+aAa+5tRy95xUUyfdo

When the IV is bitflipped, the stored IV changes, but the encrypted message is not altered:

//echo -a $decode(UmFuZG9tSVZ0ZXh1Xmp1ODz3ktZrCBsKkBN1bgqwfXpp6x+aAa+5tRy95xUUyfdoSD/PXrNNWgo=,mcirl,foobar,text_iv8)

output: 20181231 this can't be backdated

When using the new switch, where the public IV is ECB encrypted before being used to begin the CBC feedback, the encryption output changes to:

UmFuZG9tSVZ0ZXh0X2l2OEzew/Uu6XVkdQKh/u6c7UnY282m9e07WXcF1qXSgtPyzc+ecxlV8X4=

Either attempting to bit-flip the stored IV, or decrypting without using the new switch, would result in the date stamp in the 1st 8 bytes being pseudo-randomly garbled, while the remainder of the message still decrypts OK.
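A Python sketch of the proposed switch, using a toy SHA-256-based Feistel cipher as a stand-in for Blowfish (NOT Blowfish, not secure). The public IV is stored unchanged, but CBC feedback starts from an ECB encryption of it, so the IV bit-flip formula described above no longer controls block #1. shield is a hypothetical name for that step.

```python
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Toy round function (NOT Blowfish, not secure): 4 bytes of SHA-256(key | round | half)
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc(key: bytes, iv: bytes, data: bytes, decrypt: bool = False) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(data), 8):
        block = data[i:i + 8]
        if decrypt:
            out += xor(toy_decrypt_block(key, block), prev)
            prev = block
        else:
            prev = toy_encrypt_block(key, xor(block, prev))
            out += prev
    return out

def shield(key: bytes, public_iv: bytes) -> bytes:
    # Proposed switch: ECB-encrypt the stored public IV before CBC feedback starts
    return toy_encrypt_block(key, public_iv)

key, public_iv = b"foobar", b"text_iv8"
message = b"20190101 this can't be backdated"
ciphertext = cbc(key, shield(key, public_iv), message)   # public_iv is what gets stored

forged_iv = xor(xor(public_iv, message[:8]), b"20181231")   # the IV bit-flip formula
tampered = cbc(key, shield(key, forged_iv), ciphertext, decrypt=True)
no_switch = cbc(key, public_iv, ciphertext, decrypt=True)   # decrypting without the switch

print(tampered[:8], tampered[8:])     # garbled first 8 bytes, intact remainder
print(no_switch[:8], no_switch[8:])   # same pattern
```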

Code:
alias blowfish_CBC_bitflip_demo {
  ; next line should be one of the following choices: mcirl mcir mcr mcrl
  var %switches mcr

  :enter_key
  var -s %key $input(Input any encryption key (encode uses only the 1st 56 bytes),e)
  if (%key == $null) goto enter_key

  :enter_message
  var -s %original_message $input(Enter any Original Message 16+ characters (longer the better),e)
  if ($len(%original_message) < 16) goto enter_message

  :fake_message
  var %fake_message $input(Enter 8 characters to change the beginning of the message into,e)
  noop $regsubex(junk,%fake_message,,,&fake_msg) | if ($bvar(&fake_msg,0) < 8) goto fake_message
  echo -a encode( %original_message ,%switches, %key )

  if (i isincs %switches) {
    :enter_IV
    var -s %iv $input(Enter any text string used as IV,e) | if ($len(%iv) < 8) goto enter_IV
    var    -s %old_mime $encode(%original_message,%switches,%key, %iv)
  }
  else var -s %old_mime $encode(%original_message,%switches,%key)

  bset -t &encrypted 1 %old_mime | noop $decode(&encrypted,bm)
  bset -t &guessed_original_msg 1 %original_message | bcopy -c &guessed_original_msg 1 &guessed_original_msg 1 8
  bcopy -c &old_iv 1 &encrypted 9 8
  echo 3 -a at this point the attack begins, using ONLY the following info which does NOT include knowing the key:
  echo 4 -a assumes the first 8 bytes of the not-encrypted original message guessed as: $bvar(&guessed_original_msg,1-) $qt($bvar(&guessed_original_msg,1-).text)
  echo 4 -a knows the 8 bytes of the public IV are: $bvar(&old_iv,1-)
  echo 4 -a does not know the key, but wants to change the assumed 1st 8 bytes to: $bvar(&fake_msg,1-8) $qt($bvar(&fake_msg,1-8).text)

  ; now replace IV_byte with: old_iv_byte XOR old_plaintext_byte XOR faked_plaintext_byte
  var %i 1, %new_iv | while (%i isnum 1-8) {
    var %j $xor($bvar(&fake_msg,%i),$bvar(&guessed_original_msg,%i))
    var %j $xor(%j,$bvar(&old_iv,%i)) | var %new_iv %new_iv %j | inc %i
  }

  echo -a now the attacker replaces the old iv $qt($bvar(&old_iv,1-)) with $qt(%new_iv) then lets the intended receiver decrypt the message
  bset &encrypted 9 %new_iv | noop $encode(&encrypted,bm)
  var -s %faked_mime $bvar(&encrypted,1-).text
  echo 3 -a this is the 1st usage of the KEY since the message was encrypted,
  echo 3 -a yet the attacker can change the 1st 8 bytes of the message, without knowing the key,
  echo 3 -a just by knowing the original not-encrypted bytes, without even needing to know how those 8 bytes were encrypted
  var %a %old_mime vs %faked_mime
  if (i isincs %switches) {
    echo 4 -a decrypt original: $decode(%old_mime  ,%switches,%key,%iv)
    echo 4 -a decrypt hacked::: $decode(%faked_mime,%switches,%key,%iv)
  }
  else {
    echo 4 -a decrypt original: $decode(%old_mime  ,%switches,%key)
    echo 4 -a decrypt hacked::: $decode(%faked_mime,%switches,%key)
  }
}

maroon (OP)
(#7)

If (#4), a new switch allowing the key and IV/Salt to be hex, would be difficult to add, something that gains the vast majority of the benefit would be a switch letting the 'l' literal key parameter use the same string handler that the Salt/IV parameter has used since v7.56, where codepoints 128-255 are each the single byte values 128-255 but codepoints 256+ are invalid. Maybe an "L" switch, mutually exclusive with 'l'.

This would allow all literal keys to be set, except for the 20% of 56-byte strings that contain a 0x00 byte, the same way that handling the Salt/IV as ANSI allows the 97% of 8-byte Salt/IV strings which don't contain an embedded 0x00.
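A quick Python check of those two percentages, assuming every byte of the key or Salt/IV is uniformly random:

```python
p_key_has_zero = 1 - (255 / 256) ** 56   # a random 56-byte key contains at least one 0x00
p_iv_no_zero = (255 / 256) ** 8          # a random 8-byte Salt/IV contains no 0x00
print(f"{p_key_has_zero:.1%} {p_iv_no_zero:.1%}")   # 19.7% 96.9%
```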

As a way of showing the usefulness of this switch, this is the only way I know to check whether a binvar is a utf8 string, without making a long alias parsing it byte-by-byte:

Code
alias is_binvar_utf8 {
  noop $regsubex(foo,$bvar($1,1-).text,,,&maroon.tmp)
  if (($bvar($1,0) != $bvar(&maroon.tmp,0)) || ($sha1($1,1) != $sha1(&maroon.tmp,1))) { return $false } | return $true
}


Only 1% or so of random 8-byte binvars without 0x00 are valid UTF8, so the ANSI handling of the Salt/IV parm greatly increases the number of valid combos. By the time the &binvar length reaches 16, it becomes difficult to find any valid UTF8 strings, and it becomes exponentially harder with each +1 length. But with this new switch, 100% of these strings not containing 0x00 would be valid literal keys.

Code
//var %i 9999 , %tot 0 , %yes 0 | while (%i) { bset &v 1 $regsubex(foo,$str(x,8),/x/g,$r(1,255) $+ $chr(32) ) | if ($is_binvar_utf8(&v)) inc %yes | inc %tot | dec %i } | echo -a yes %yes of %tot
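The same experiment sketched in Python, for anyone without mIRC handy. Python's strict UTF-8 decoder (which also rejects overlong forms and surrogates) stands in for is_binvar_utf8, so the rate may differ slightly from what mIRC reports; it should land a little under 1%.

```python
import random

def is_valid_utf8(data: bytes) -> bool:
    """Stand-in for is_binvar_utf8: a strict decode succeeds only for valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def estimate_utf8_fraction(length: int = 8, trials: int = 50_000, seed: int = 2019) -> float:
    """Fraction of random strings of bytes 1-255 (no 0x00) that are valid UTF-8."""
    rng = random.Random(seed)
    population = range(1, 256)
    hits = sum(is_valid_utf8(bytes(rng.choices(population, k=length))) for _ in range(trials))
    return hits / trials

print(f"{estimate_utf8_fraction():.2%}")
```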


maroon (OP)
Re: (#1) Padding method 0-7 0x00's

I see now that this would be incompatible with current behavior where $decode ignores the padding switch used, but tries to match any of 4 different padding methods.

(#1a) Switch to enforce $decode not ignore switches
(#8) Switch to Skip 8 byte text header

* Suggestion for a new switch which causes $decode to stop trying to guess what to do, based on the content of the 1st block of ciphertext or the content of the last block of decrypted ciphertext, and instead decrypt based on the switches used. For example purposes I'll refer to this switch as 'D'.

* Related suggestion for a new switch which causes $encode to not insert the magic text 'RandomIV' or 'Salted__' in front of the IV or Salt string at the front of encrypted messages. Depending on the usage of the other switches, this switch would cause $decode to behave as if these magic strings actually were inserted in front of the input string, even if the string already begins with 'RandomIV' or 'Salted__' due to choices of the 4th parameter, or the possible result of an encrypted string. For example purposes I'll refer to this switch as 'Q'.

+ The new 'D' switch would cause $decode to check for only 1 type of padding based on which of the 'pnz' switches is used, or, if none of these 3 is used, only for PKCS#7 padding. If $decode used a different padding switch than $encode did, then no padding would be removed unless there was a match with that specific padding.

+ 'D' would avoid the false matches where $encode applied 'z' padding to these messages but $decode removes 'n' padding if the last character of the message was a codepoint greater than 64 that's a multiple of 64:

Code
//bset &v 8 0 | while ($bvar(&v,0) == 8) { bset -tc &v 1 $str(.,$rand(1,6)) $+ $chr($calc(64*$rand(2,700))) } | echo -a original: = $bvar(&v,1-) | noop $encode(&v,bmcz,key) $decode(&v,bmcz,key) | echo 4 -a decrypted = $bvar(&v,1-)
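A Python sketch of why the false match happens. It assumes 'n' is the "0x80 then 0x00's" padding (ISO/IEC 7816-4 style), which the multiple-of-64 observation suggests, since those codepoints always end in byte 0x80 when UTF8-encoded. guess_unpad models a decoder that accepts 'n' whenever it looks valid before falling back to 'z'; that ordering is a hypothetical stand-in for mIRC's guessing, not its actual code.

```python
def pad_z(data: bytes) -> bytes:
    # 'z' padding as described in this thread: 1-8 0x00 bytes, 8 when already a multiple of 8
    return data + b"\x00" * (8 - len(data) % 8)

def strip_n(data: bytes):
    # Assumed 'n' padding: one 0x80 byte followed by 0x00's; None if it doesn't match
    body = data.rstrip(b"\x00")
    return body[:-1] if body.endswith(b"\x80") else None

def strip_z(data: bytes) -> bytes:
    return data.rstrip(b"\x00")

def guess_unpad(data: bytes) -> bytes:
    # Hypothetical guessing order: accept 'n' if it looks valid, otherwise fall back to 'z'
    stripped = strip_n(data)
    return stripped if stripped is not None else strip_z(data)

message = ("..." + chr(64 * 3)).encode("utf-8")   # U+00C0 encodes as C3 80, ending in 0x80
padded = pad_z(message)
print(guess_unpad(padded))   # b'...\xc3'  (the final 0x80 byte is wrongly stripped)
print(strip_z(padded))       # b'...\xc3\x80'  (what a switch-driven 'z' decode returns)
```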



+ This would also enable the ability to eventually have new padding switches without false matches, such as the suggested 0-7 0x00's that avoids increasing the encrypted string's length by +8 for all messages which are already a length that's a multiple of 8, or would allow something similar to OpenSSL's -nopad switch where no padding is added to a message that either doesn't need it or the user has already padded it themselves.

+ This would also prevent $decode from searching for the magic strings "Salted__" or "RandomIV" unless $decode uses the switches related to them, i.e. $decode would not check for "RandomIV" and use bytes 9-16 as the IV unless 'D' were accompanied by the 'r' switch, and $decode would not check for "Salted__" and use bytes 9-16 as a salt unless 'D' were accompanied by 'c' but neither of the 'ir' switches.
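A hypothetical Python sketch of the header handling the 'D' switch implies, where only the switches decide how the first 16 bytes are read. parse_header_strict is an illustrative name; ECB and the 's' switch aren't modeled, and checking 'r' before 'i' is my reading of the description above.

```python
MAGIC_IV = b"RandomIV"
MAGIC_SALT = b"Salted__"

def parse_header_strict(blob: bytes, switches: str, parm4=None):
    """Hypothetical 'D' behaviour. Returns (kind, iv_or_salt, ciphertext)."""
    if "r" in switches:
        if blob[:8] != MAGIC_IV:
            raise ValueError("'r' used but no RandomIV header")
        return "iv", blob[8:16], blob[16:]
    if "i" in switches:
        if parm4 is None:
            raise ValueError("'i' requires parm4")
        # The whole blob is ciphertext, even if it happens to start with a magic string
        return "iv", parm4, blob
    if "c" in switches:
        if blob[:8] != MAGIC_SALT:
            raise ValueError("'c' used but no Salted__ header")
        return "salt", blob[8:16], blob[16:]
    raise ValueError("no CBC switches given")

# Ciphertext that happens to begin with "Salted__" is still decrypted whole under 'ci'
blob = MAGIC_SALT + bytes(16)
print(parse_header_strict(blob, "ci", b"test_iv8")[0])   # iv
```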

* When encrypting, using the new 'Q' switch along with 'cir' or 'cr', would continue storing the 8-byte IV at the front of the header, but 'Q' would prevent inserting the text "RandomIV" preceding that.

+ When encrypting, using 'c' or 'cs' without using 'i' or 'r', $encode would continue storing the 8-byte salt at the front of the encrypted string, but also using 'Q' would not insert the text "Salted__" preceding that.

+ When decrypting, using the 'Q' switch would prevent $decode from checking for the magic strings 'Salted__' or 'RandomIV', because 'Q' indicates the magic string needs to be added. Using 'crQ' would behave as if the magic string 'RandomIV' needed to be inserted in front of the input string, and using 'cQ' otherwise without 'i' or 'r' would behave as if the magic 'Salted__' needed to be inserted. 'ciQ' would decrypt the entire string using the 4th parameter as the IV, regardless of what the input string looks like.

The 1st 8 bytes of the $1 encrypted string would then be handled as the Salt or IV that is expected to follow the magic string, either by actually inserting the magic string or by setting a flag indicating how to handle the remainder of the string differently.

The extra 8 bytes of the magic string at the front are more useful to an eavesdropper than to the people sharing encrypted messages, and they make the mime string an extra dozen or so characters longer, counting against the server's message limit. Trimming and then restoring them takes a lot of steps in a script:

Code
//var %msg test string , %key test key | bset -tc &v 1 %msg | noop $encode(&v,bmc,%key) | echo 4 -a $bvar(&v,1-).text with header length $bvar(&v,0) | noop $decode(&v,bm) | bcopy -c &v 1 &v 9 -1 | noop $encode(&v,bm) | var %enc $bvar(&v,1-).text | echo 3 -a %enc without "Salted__" header length $bvar(&v,0) | bset -tc &v2 1 %enc | noop $decode(&v2,bm) | bcopy &v2 9 &v2 1 -1 | bset -t &v2 1 Salted__ | noop $encode(&v2,bm) $decode(&v2,bmc,%key) | echo -a decoded: $bvar(&v2,1-).text
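The same trim/restore round trip sketched in Python, using stand-in salt and ciphertext bytes rather than real $encode output. strip_magic and restore_magic are hypothetical names for what the 'Q' switch would do on each side.

```python
import base64

MAGIC = b"Salted__"

def strip_magic(blob: bytes) -> bytes:
    # Hypothetical 'Q' on encode: drop the 8-byte magic string, keep salt + ciphertext
    if not blob.startswith(MAGIC):
        raise ValueError("no Salted__ header")
    return blob[len(MAGIC):]

def restore_magic(blob: bytes) -> bytes:
    # Hypothetical 'Q' on decode: behave as if the magic string were present
    return MAGIC + blob

encrypted = MAGIC + b"8bytsalt" + bytes(range(24))   # stand-in salt and ciphertext
short = strip_magic(encrypted)
print(len(base64.b64encode(encrypted)), len(base64.b64encode(short)))   # 56 44
assert restore_magic(base64.b64decode(base64.b64encode(short))) == encrypted
```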



***

* By taking action only as indicated by the switches used by $decode, this also avoids the issue where $decode ignores the 'i' switch when the encrypted string begins with a block matching either of the magic strings 'RandomIV' or 'Salted__'.

In these examples, decoding with the 'wrong' switch still 'works' because it takes action based on the header, ignoring the 'i' 'r' and 's' switches other than generating an error when 's' or 'i' isn't accompanied by parm4.

//var -s %enc $encode(test message,mc ,key) | echo -a $decode(%enc,mcr,key)
//var -s %enc $encode(test message,mcr,key) | echo -a $decode(%enc,mcs,key,saltsalt)

***

However, in the rare cases where the encryption creates ciphertext beginning with bytes that form either of the magic strings "RandomIV" or "Salted__", $decode ignores the 'i' switch (other than still requiring the parm4 string), and the decryption instead acts on the first 8 bytes of the string happening to match one of the magic values.
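For a sense of scale, a Python sketch of how rare this is when the first ciphertext block behaves like 8 uniformly random bytes; the examples below show that such inputs nonetheless exist.

```python
from fractions import Fraction

# Chance that a uniformly random 8-byte first block equals one of the 2 magic strings
p = Fraction(2, 256 ** 8)
print(p == Fraction(1, 2 ** 63), float(p))   # True, about 1.08e-19
```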

In this 1st example, $decode sees the encrypted string beginning with the bytes forming 'Salted__', so it decodes by ignoring %iv, skipping the 1st 8 bytes of the ciphertext as if a magic header, then uses bytes 9-16 as if a Salt in combo with %key. The decrypted output is now the garbled decryption of the 3rd block and later:

//var -s %key spjbavdk , %iv test_iv , %enc $encode(ODMJuQPFabcdefghtestmsg ,mcli,%key,%iv) | echo -a $decode(%enc,mcli,%key,%iv)

decrypted as: õ°ë?²

In this 2nd example, $decode sees the encrypted string beginning with the bytes forming 'RandomIV', so it decodes by again ignoring %iv, again skipping the 1st 8 bytes of the ciphertext as if they were a magic header, then uses bytes 9-16 as the IV in place of the parm4 string given with the 'i' switch, even though the 'r' switch was not used. The decrypted output then skips decryption of the 1st 16 bytes of the original message:

//var -s %key wohccemg , %iv test_iv , %enc $encode(ZGQQrvQU/\/\/\/\3rdblock,mcli,%key,%iv) | echo -a $decode(%enc,mcli,%key,%iv)

Decrypted as: 3rdblock

However, if you edit any of the 4 %key or %iv strings in both examples, or edit the 1st 8 bytes of either original message, the decryption usually works correctly because the 1st block of ciphertext no longer matches one of the 2 magic strings, and $decode then falls back to using the 'li' switches actually used.

This should avoid messages which cannot be correctly decrypted if they were encrypted with padding switch 'z', or encrypted using 'i' without 'r'.

maroon (OP)
Re: (#3) I see v7.56 fixes the truncation of non-literal keys being hashed by MD5.

--

Re: (#2) Base85

The aliases at the end show how base85 would work. By default the alias should be compatible with Python's a85encode. Python also has a b85encode that's a little friendlier to mIRC script evaluation, but I've included a $3 == 1 variant using an alphabet that should be much better at avoiding mIRC scripting evaluation problems caused by strings containing commas, $, %, unmatched parentheses, etc.

Base85 has an advantage over mime because it has a much smaller encoding overhead, and the lengths of Blowfish encrypted strings will always be a multiple of base85's input chunk size of 4, so base85 never pads them, while two out of every three mime strings of those lengths would have '=' padding added.
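A Python check of that padding claim over Blowfish-sized outputs (multiples of 8 bytes), using the standard base64 module. The base85 sample data avoids all-zero and all-space groups so the 'z'/'y' shortcuts don't shorten the output.

```python
import base64

lengths = range(8, 8 * 300 + 1, 8)   # Blowfish output is always a multiple of 8 bytes
padded = [n for n in lengths if base64.b64encode(bytes(n)).endswith(b"=")]
print(len(padded), "of", len(lengths))   # 200 of 300

# Base85 never pads here: 4 divides every length, so output is exactly n * 5 / 4
sample = bytes(range(1, 9))   # no 0x00 or 0x20 groups, so no 'z'/'y' shortcuts
assert all(len(base64.a85encode(sample * (n // 8))) == n * 5 // 4 for n in lengths)
```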

Code
alias unsafe85 { returnex $!base85decode( $+ $base85encode($1,,1) $+ ,,1) }


Using this unsafe85 alias, there could be a .prop or 2nd parameter of $unsafe which makes it use shorter base85 A=1 strings instead of mime:

//timertest85 1 1 echo -a $unsafe85( $ $+ version ) | timertest85
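Since the aliases that follow can only be run inside mIRC, here's a rough Python counterpart for cross-checking their output. It's my sketch, not a port: the default mode is meant to match Python's base64.a85encode(data, foldspaces=True), because the alias always emits 'y' for a full group of four spaces while a85encode only does so with foldspaces=True. friendly=True assumes the A=1 alphabet from base85_mirc_friendly_alphabet, with ':' and "'" as the shortcut symbols.

```python
import base64
import struct

STD = "".join(chr(c) for c in range(33, 118))   # '!' .. 'u'
MIRC_FRIENDLY = "!#&*+-./0123456789;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"

def b85_encode(data: bytes, friendly: bool = False) -> str:
    alphabet = MIRC_FRIENDLY if friendly else STD
    zero_sym, space_sym = (":", "'") if friendly else ("z", "y")
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        missing = 4 - len(chunk)
        word = struct.unpack(">I", chunk + b"\x00" * missing)[0]   # big-endian uint32
        if not missing and word == 0:            # shortcuts only for full chunks
            out.append(zero_sym)
            continue
        if not missing and word == 0x20202020:
            out.append(space_sym)
            continue
        digits = []
        for _ in range(5):
            word, rem = divmod(word, 85)
            digits.append(alphabet[rem])
        out.append("".join(reversed(digits))[:5 - missing])   # drop chars for zero padding
    return "".join(out)

def b85_decode(text: str, friendly: bool = False) -> bytes:
    alphabet = MIRC_FRIENDLY if friendly else STD
    zero_sym, space_sym = (":", "'") if friendly else ("z", "y")
    lookup = {ch: i for i, ch in enumerate(alphabet)}
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == zero_sym:
            out += b"\x00" * 4
            i += 1
            continue
        if text[i] == space_sym:
            out += b" " * 4
            i += 1
            continue
        group = text[i:i + 5]
        missing = 5 - len(group)
        group += alphabet[-1] * missing    # pad with the highest digit, like 'u' or '~'
        word = 0
        for ch in group:
            word = word * 85 + lookup[ch]  # KeyError for an invalid char, incl. a misplaced shortcut
        out += struct.pack(">I", word)[:4 - missing]
        i += 5
    return bytes(out)

print(b85_encode(b"maroon"), b85_encode(b"maroon", friendly=True))
```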

=+=+=+=+=+=
Code
/*
{
  Base85 encoder and decoder by maroon 2019

  85 is used as the base because it's the smallest integer where N^5 >= 256^4, so the
  benefit of base85 over mime is that it can encode 4 bytes into 5 text, for a 125% output length,
  compared to the 133% length where mime encodes 3 bytes as 4 text. When encoding a binary string
  of length 280 bytes, the base85 encoding would be *5/4=350 bytes - while the mime encoding would be
  *4/3=374 bytes plus often being padded with an additional 2 '=' characters.

  There are several encoding alphabets in use, and this alias defaults to the one used by base85
  in the btoa utilities, Python's a85encode, and Adobe.

  This alias also offers a $3 == 1 alternate alphabet that's more friendly to mIRC scripting

  Base85 string should not need padding, as the decoded length is calculated from the encoded length. The python
  a85encode has a switch to pad the input string with 0x00's to make it be a length multiple of 4, which also
  eliminates the need for padding, and is friendly to decoded text strings which usually don't need the padding.

  Syntax: $base85encode(any string,N,A) $base85decode(base85text,N,A)
  N: If $2 is 1: $1 is name of &binvar, otherwise $1 is text
  A: If $3 is 1: input/output uses mIRC-friendly alphabet, otherwise uses Ascii85 as described
  at https://en.wikipedia.org/wiki/Ascii85

  examples:
  //echo -a $base85encode(maroon) vs $base85encode(maroon,,1)
  //var -s %a $base85encode(maroon) , %b $base85decode(%a)
  //var -s %a $base85encode(maroon,,1) , %b $base85decode(%a,,1)
  //bset -tc &v 1 maroon | noop $base85encode(&v,1) | echo -a $bvar(&v,1-).text

  The standard base85 encoding originated as one of the methods in btoa, and is as described
  for a85encode at https://docs.python.org/3/library/base64.html#base64.a85encode

  Python's b85encode appears to be the one used by Git diffs, and differs by not using the 'y' 'z' shortcuts
  and uses a different alphabet:

  0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~
  which excludes: "',.:/[\]

  Neither of these is an 85 character alphabet that's completely friendly to mIRC scripting due to using
  characters like " , $ % etc - so care must be taken with strings encoded in the default alphabet.
  Such as avoiding placing them into a timer's command line.

  The A=1 option's mIRC-friendly 85-item alphabet instead excludes these 9 printable chars: "$%'(),:\
  then uses : as the rarely used 'z' symbol which symbolizes an input 4-byte chunk of 4 0x00's
  then uses ' as the rarely used 'y' symbol which symbolizes an input 4-byte chunk of 4 0x20 spaces

  These 'y' and 'z' symbols may be useful in many binary or text strings,
  but are not likely to appear in an encrypted string.

  $3's A=1 mIRC-friendly alphabet attempts to exclude characters which tend to cause problems being evaluated
  by scripts and timers. Most importantly it avoids $ and %. Avoiding " allows strings inside $qt() and $noqt(),
  and skipping \ avoids problems in $reg*() strings. Skipping ) and ( and comma avoids problems trying to place
  a literal string having unmatched parenthesis or comma inside an identifier without first parking in a %var.
}
*/

; I'm not attached to this specific A=1 alphabet, but it avoids being evaluated in scripts, timers, etc
alias base85_mirc_friendly_alphabet {
  return !#&*+-./0123456789;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
}

/*
{
  If you want to change to a different A=1 alphabet, such as b85encode, by using different characters
  or rearranging them:

  After changing the above alias, the below lookup MUST be created to match the above 85-char string of
  base85_mirc_friendly_alphabet. It creates a bogus entry 99 for invalid characters, and intentionally avoids
  an entry for $chr(0) to avoid the need for a $calc(value-1) in an array where the 1st item is array[1] not array[0].

  If $chr(N) is a valid encoding character, the Nth token in the lookup below contains the 0-84 value
  matching items 1-85 in the above list. If the Nth token of the lookup table is anything outside the
  range 0-84, that char is invalid as a base85 encoding character. The numbers 0-84 MUST appear only
  1 time in the lookup below. Also, the : or ' replacements for the standard btoa's 'z' and 'y' symbols being found
  anywhere except the 1st character of an encoding 'chunk' should be invalid, which is why the lookup
  table marks : and ' as invalid encoding chars.

  After changing the above A=1 alphabet, run: //create_lookup85 $base85_mirc_friendly_alphabet

  then validate the new alphabet:

  //bset -t &v 1 $base85_mirc_friendly_alphabet $+ aaa | echo 3 -a $bvar(&v,1-).text should match | noop $base85decode(&v,1,1) | noop $base85encode(&v,1,1) | echo 4 -a $bvar(&v,1-).text

  Rather than finding 'z' or ':', and 'y' or "'", in a lookup, those are currently hardcoded into the aliases.
}
*/
alias mirc_friendly_base85decode_lookup {
  ; '0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
  var %base85_lookup $&
    99    99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99  0 99  1 99 99  2 99 99 99  3  4 99  5  6  7 $&
    8   9 10 11 12 13 14 15 16 17 99 18 19 20 21 22 $&
    23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 $&
    39 40 41 42 43 44 45 46 47 48 49 50 99 51 52 53 $&
    54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 $&
    70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 $&
    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
  return %base85_lookup
}

alias create_lookup85 {
  var %a $1
  if ($len(%a) != 85) { var %err input MUST be length 85: $1 | goto create85_err }
  if ($regex(foo,$1,/(.).*\1/)) { var %err input MUST not contain dupes: $regml(foo,1) $1 | goto create85_err }
  if ($regex(foo,$1,/([^!-~])/u)) { var %err input MUST not contain outside range !-~: $regml(foo,1) $1 | goto create85_err }
  bset -c &maroon.tmp 1 $str(99 $chr(32),255)
  var %i 85 | while (%i) { var %a $asc($mid($1,%i,1)) | dec %i | bset &maroon.tmp %a %i }
  echo 3 -a $bvar(&maroon.tmp,1-) existing checksum: $crc( $bvar(&maroon.tmp,1-) ,0)
  echo 4 -a $mirc_friendly_base85decode_lookup new checksum: $crc( $mirc_friendly_base85decode_lookup ,0)
  echo -a To use this 85char alphabet as the new A=1 alphabet, (1) paste the following alias above the existing same name alias:
  echo 3 -a alias mirc_friendly_base85decode_lookup $chr(123) return $bvar(&maroon.tmp,1-) $chr(125)
  echo -a (2) if using a different symbol than ':' in place of input chunk of 4 0x00's, search both base85encode and base85decode aliases for "A=1" and edit as instructed. Should NOT be part of 85-char list
  return
  :create85_err
  echo -sc info *create_lookup85: %err
  echo -sc info syntax: input = string of non-duplicated 85 characters in range ! through ~
}

alias base85encode {
  if ($2 == 1) {
    if (!$bvar($1,0)) { var %err invalid binvar $1 | goto base85_encode_error }
    bcopy -c &maroon.base85.in 1 $1 1 -1
  }
  else noop $regsubex(foo,$1,,,&maroon.base85.in)
  var %in.ptr 1 , %chop 0 , %len $bvar(&maroon.base85.in,0) , %base85.output
  if (!%len) { var %err zero length input string | goto base85_encode_error }

  while (%in.ptr <= %len) {

    var %remain $calc(%len - %in.ptr +1)

    ; if at least 4 bytes remaining in input string, check if the 4 bytes at the beginning of a
    ; chunk of 4 are all 0x00 bytes. If so, output a single 'z' (or its mIRC-friendly replacement ':')
    ; ditto for output a 'y' if the chunk of 4 is all $chr(32) spaces.
    ; base85 translates 4 bytes into 5 text, so it's efficient to handle as if it's translating
    ; the 4-byte input chunk as if it's a big-endian unsigned 32-bit integer
    var %uint32 $bvar(&maroon.base85.in,%in.ptr).nlong

    ; at the beginning of each full chunk, check if it's 4 0x00's or 4 0x20's
    ; if customizing $3 A=1 to use a different alphabet, make sure these 2 chars on the ($3 == 1) lines
    ; aren't in the new alphabet, but are the replacements for the 'z' and 'y' shortcuts:

    if ((%remain >= 4) && ($istok(0 538976288,%uint32,32))) {
      if (!%uint32) {
        if ($3 == 1) var -p %base85.output %base85.output $+ :
        else         var -p %base85.output %base85.output $+ z
      }
      else {
        if ($3 == 1) var -p %base85.output %base85.output $+ '
        else         var -p %base85.output %base85.output $+ y
      }

      ; if chunk of 4 is all 0x00's or all 0x20's, then write 'z' or 'y' (or ':' vs "'") only
      goto next_group_of_4
    }

    ; If final input chunk's length wasn't 4, append missing N of 4 bytes as 0x00's, but keep track
    ; so the same number of chars can be stripped from text output
    ; This appends 3 0's to ensure .nlong doesn't fail, but sets %chop to the actual number of
    ; 0's needed to make the final input chunk be length 4
    if (%remain isnum 1-3) {
      var %chop 4 - $v1 | bset &maroon.base85.in $calc(1+%len) 0 0 0
      ; re-defining %uint32 using the appended 0x00's
      var %uint32 $bvar(&maroon.base85.in,%in.ptr).nlong
    }

    var %j 0 , %divisor 85 ^ 4 | while (%j < 5) {
      inc %j
      ; by repeatedly floor-dividing by a shrinking divisor, has the effect of outputting the
      ; encoding digits in the expected big-endian order
      var %pos $calc(%uint32 // %divisor) , %uint32 $calc(%uint32 - %divisor * %pos) , %divisor %divisor / 85

      ; when using sequential base85 alphabet, don't need lookup table
      ; chosen divisor always returns the modulo result in big-endian order
      ; var -p handles case where encoding alphabet contains double-quote char which isn't a good idea
      if ($3 == 1) var -p %base85.output %base85.output $+ $mid($base85_mirc_friendly_alphabet,$calc(1+%pos),1)
      else         var -p %base85.output %base85.output $+ $chr($calc(33 + %pos))
    }
    :next_group_of_4
    inc %in.ptr 4
  }
  ; if added 1-3 0x00 bytes to complete a final chunk of 4, chop that many chars from text output
  ; because encoding length for 1-4 byte chunks is always 1+input_chunk_length
  if (%chop) var -p %base85.output $left(%base85.output,- $+ %chop)

  ; If N=1, replace original &binvar's contents and change return value to new &binvar length
  if ($2 == 1) { bset -tc $1 1 %base85.output | var %base85.output $bvar($1,0) }
  return %base85.output
  :base85_encode_error
  echo -sc info *base85encode: %err
  if ($2 == 1) bunset $1
  ; your choice whether to /halt at error, but would need to decide how to handle binvar
  halt
}

alias base85decode {
  var %last.char u , %in.ptr 1 , %out.ptr 1 , %chop 0 , %4zeroes.char z , %4spaces.char y

  if ($2 == 1) { if (!$bvar($1,0)) { var %err invalid binvar $1 | goto base85_decode_error }
    ; need to unset binvar to facilitate handling contents being 1 valid/invalid base85 char
    else { var %from.string $bvar($1,1-).text | bunset $1 }
  }
  else var -p %from.string $1
  var %len $len(%from.string) | if (!%len) { var %err zero length base85 string | goto base85_decode_error }

  if ($3 == 1) {
    ; if customizing $3 A=1 to use a different alphabet, make sure %4zeroes.char and %4spaces.char on next line
    ; aren't in new alphabet. They are the replacements for the 'z' and 'y' shortcuts
    var %last.char $right($base85_mirc_friendly_alphabet,1) , %4zeroes.char : , %4spaces.char '
    bset -c &maroon.base85.lookup 1 $mirc_friendly_base85decode_lookup
  }

  bunset &maroon.base85.out
  while (%in.ptr <= %len) {
    var -p %this_char $mid(%from.string,%in.ptr,1) , %uint32 0 , %j 0

    ; if the 1st char of a chunk is 'z' or 'y' (or ':' or "'" if $3 == 1), 'z' decodes as 4 0x00 bytes
    ; using %uint32 = 0, and 'y' decodes as 4 spaces using %uint32 = $base(20202020,16,10) = 538976288

    if (%this_char === %4zeroes.char) { inc %in.ptr                         | goto next_group_of_5 }
    if (%this_char === %4spaces.char) { inc %in.ptr | var %uint32 538976288 | goto next_group_of_5 }

    ; if length mod 5 is 1, that final char has no effect so would be safe to ignore
    ; other than validating if it's a valid base85 char
    ;   if (%len == %in.ptr) { dec %len | break }

    ; append %last.char 'u' padding if final group of 5 isn't complete (or append ~ if A=1 $3 == 1)
    ; but keep track of how many chars were added, so that same length can be removed from output
    if ($calc( %len - %in.ptr) < 4) { inc %chop $calc(4+%in.ptr -%len)
      var %from.string %from.string $+ $str(%last.char,%chop)
    }

    while (%j < 5) {
      ; for the default variant using the sequential alphabet, doesn't need the lookup table
      ; because it's a simple calculation by subtracting 33 from $asc(char)
      ; it's much faster to keep the lookup table in a binvar than a text string
      if ($3 == 1) var %s $bvar(&maroon.base85.lookup,$asc($mid(%from.string,%in.ptr,1)))
      else         var %s $asc($mid(%from.string,%in.ptr,1)) - 33

      ; switched to checking for invalid chars here, to simplify requiring that 'z'
      ; appear only at the beginning of a chunk
      if (%s !isnum 0-84) { var %err invalid base85 string | goto base85_decode_error }

      var %uint32 $calc(%uint32 * 85 + %s ) | inc %j | inc %in.ptr
    }

    :next_group_of_5
    ; cheating by using $longip instead of looping 4x to parse 32-bit value into 4 big-endian byte values
    bset &maroon.base85.out %out.ptr $replace($longip(%uint32),.,$chr(32)) | inc %out.ptr 4
  }

  ; if the final block was padded with 1-4 %last.char's, now chop that many extra decoded bytes
  ; because decoded output length == (1 less than input base85 chunk length)
  ; at the same time, subtract 1 so %out.ptr = length of output
  var %out.ptr $calc(%out.ptr - %chop -1)
  ; the length of the decoded string is calculated from the length of the encoded string. Also, there aren't many unused chars remaining...

  ; debug output in 3 format types, numeric bytes, hex bytes, text
  ;echo 3 -a bindec: $bvar(&maroon.base85.out,1,%out.ptr)
  ;echo 5 -a binhex: $regsubex(foo,$bvar(&maroon.base85.out,1,%out.ptr),/(\d+)/g,$base(\1,10,16,2))
  ;echo 4 -a text: $bvar(&maroon.base85.out,1,%out.ptr).text

  if ($2 == 1) {
    if (%out.ptr) bcopy -c $1 1 &maroon.base85.out 1 %out.ptr
    else noop $regsubex(foo,$null,,,&maroon.base85.out)
    return %out.ptr
  }
  returnex $bvar(&maroon.base85.out,1,%out.ptr).text
  ; input string was exactly 1 base85 char, so return $null
  return $null
  :base85_decode_error
  echo -sc info *base85decode: %err
  ; input binvar already unset, so no need to delete it even if /halt is removed
  halt
}

; debugging strings
; test vector from Wikipedia's Ascii85 page (Adobe variant); its input has no all-zero or all-space groups, so it exercises neither the 'z' nor the 'y' shortcut:
;//var %a 9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,O<DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKYi(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIal(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G>uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c | echo -a $base85decode(%a)
;//var %i 111 | while (%i) { bset -c &v 1 $regsubex($str(x,$rand(40,43)),/x/g,$r(0,255) $chr(32)) | var %a $bvar(&v,1-) , %b $base85encode(&v,1,) , %c $base85decode(&v,1,) , %d $bvar(&v,1-) | if (%a !== %d) echo -a %i diff %a vs %d | ;else echo -a %i same %a  as %b $bvar(&v,1-) | dec %i } | echo -a .
;//var %i 111 | while (%i) { var %a $regsubex($str(x,$r(40,43)),/x/g,$r(a,z)) , %b $base85encode(%a,,1) , %c $base85decode(%b,,1) | if (%a !== %c) echo -a %i diff %a vs %c | ;else echo -a %i same %a  as %c  | dec %i } | echo -a .
;//echo -a ----- | bset -c &v 1 97 97 98 99 32 32 32 32 99 | echo -a $bvar(&v,1-).text | noop $base85encode(&v,1) | echo 3 -a $bvar(&v,1-).text | echo 4 -a $base85decode( $bvar(&v,1-).text ) | noop $base85decode(&v,1) | echo -a $bvar(&v,1-)
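For cross-checking the two aliases outside mIRC, here is a minimal Python sketch of the same group arithmetic, assuming the default sequential '!'..'u' alphabet (not the $3 == 1 variant). Like the aliases, the encoder pads a final partial group with 0x00 bytes and chops that many chars from the output, and the decoder pads a partial final group with 'u' and chops that many bytes. The function names are hypothetical, and this sketch's encoder emits no 'z'/'y' shortcuts while its decoder accepts both.

```python
import base64


def b85_encode(data: bytes) -> str:
    """Encode with the sequential '!'..'u' alphabet; emits no 'z'/'y' shortcuts."""
    chop = (-len(data)) % 4                    # 0x00 bytes added to complete the last group of 4
    padded = data + b"\x00" * chop
    out = []
    for i in range(0, len(padded), 4):
        value = int.from_bytes(padded[i:i + 4], "big")
        divisor = 85 ** 4
        for _ in range(5):                     # shrinking divisor yields big-endian digits
            digit, value = divmod(value, divisor)
            out.append(chr(33 + digit))
            divisor //= 85
    text = "".join(out)
    return text[:len(text) - chop]             # a final group of 1-3 bytes keeps n+1 chars


def b85_decode(text: str) -> bytes:
    """Decode the same alphabet; 'z' and 'y' are accepted only at the start of a group."""
    out = bytearray()
    pos = chop = 0
    while pos < len(text):
        if text[pos] == "z":                   # shortcut for 4 0x00 bytes
            out += b"\x00" * 4
            pos += 1
            continue
        if text[pos] == "y":                   # shortcut for 4 spaces (0x20202020)
            out += b" " * 4
            pos += 1
            continue
        group = text[pos:pos + 5]
        if len(group) < 5:                     # pad a partial final group with 'u' (digit 84)
            chop = 5 - len(group)
            group += "u" * chop
        value = 0
        for ch in group:
            digit = ord(ch) - 33
            if not 0 <= digit <= 84:
                raise ValueError(f"invalid base85 char {ch!r}")
            value = value * 85 + digit
        if value > 0xFFFFFFFF:
            raise ValueError("base85 group exceeds 32 bits")
        out += value.to_bytes(4, "big")
        pos += 5
    return bytes(out[:len(out) - chop])        # drop the bytes produced by the 'u' padding


print(b85_encode(b"Man "))                     # 9jqo^
print(b85_decode("9jqo^BlbD-"))                # b'Man is d'
```

The two printed values match the start of the Wikipedia test vector above. Python's standard `base64.a85decode` can also decode this encoder's output, which makes it a convenient independent reference when testing the mIRC aliases.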


Joined: Jan 2004
Posts: 2,127
maroon Offline OP
Hoopy frood
The previous suggestions for $encode Blowfish boil down to one main request that can't be scripted around: a 'k' switch that makes the Blowfish 3rd and 4th parameters (key and IV) be seen as the hex encoding of binary byte strings, rather than as text.

--

My main interest is using the hex inputs as a literal key and a literal IV. The switch wouldn't need to support hex as the input to the MD5 hashing, unless it's easiest to have every switch combination handle string lengths the same way regardless of whether 'k' is used.

With this switch, the 612 hash limit wouldn't need to be fixed beyond documenting it in /help, since there'd be no point in using a literal binary key only to feed it into the MD5 hash.

This change would make it possible to script a better method of hashing strings into a literal hex key and a literal hex IV, instead of the historical method using MD5. So if this switch were valid only with the 'clir', 'e' and 'el' syntaxes, that'd be fine, unless it's easier to have all formats behave identically except for the inputs being seen as hex instead of text.
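As one illustration of such a scripted replacement, here is a Python sketch that stretches a passphrase with PBKDF2-HMAC-SHA256 (Python's standard `hashlib.pbkdf2_hmac`) and splits the 64-byte output into a 56-byte literal key and an 8-byte literal IV, both returned as hex strings of the kind a hex-input switch like the proposed 'k' would accept. PBKDF2, the iteration count, and the function name `derive_hex_key_iv` are assumptions for the example, not anything mIRC does today. The salt is a parameter because reusing the same IV for every message under one key would be a weakness; a fresh salt per message (sent along with the ciphertext) avoids that.

```python
import hashlib
import os


def derive_hex_key_iv(passphrase: str, salt: bytes, iterations: int = 100_000) -> tuple[str, str]:
    """Hypothetical helper: derive a 56-byte Blowfish key and an 8-byte IV as hex strings."""
    material = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations, dklen=64)
    key, iv = material[:56], material[56:]     # 56 + 8 = 64 bytes of derived material
    return key.hex(), iv.hex()


salt = os.urandom(8)                            # fresh random salt for each message
key_hex, iv_hex = derive_hex_key_iv("abc", salt)
print(len(key_hex), len(iv_hex))                # 112 16
```

Unlike the MD5-based key described further down, every byte of this 56-byte key comes from the derivation output rather than being fixed by the first 16 bytes.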

With the 'k' switch, the Parm4 IV would be 16 hex digits forming the user-defined 8-byte string, and the key would be read as binary encoded as hex, so it could be a binary string subject to the same 56- or 72-byte limit that ECB mode handles.
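The parameter handling described above can be sketched in Python as follows. This is only an assumption of how a 'k' switch might validate its inputs; the function name `parse_k_switch_params` is hypothetical. The IV must be exactly 16 hex digits, and the key is any whole number of bytes up to a configurable maximum (72 by default, with 56 being Blowfish's nominal maximum).

```python
import string

HEX = set(string.hexdigits)


def parse_k_switch_params(key_hex: str, iv_hex: str, max_key_bytes: int = 72) -> tuple[bytes, bytes]:
    """Hypothetical: turn hex key/IV parameters into raw bytes, as a 'k' switch might."""
    if len(iv_hex) != 16 or not set(iv_hex) <= HEX:
        raise ValueError("IV must be exactly 16 hex digits (8 bytes)")
    if len(key_hex) % 2 or not set(key_hex) <= HEX:
        raise ValueError("key must be an even number of hex digits")
    key = bytes.fromhex(key_hex)
    # 56 bytes is Blowfish's nominal maximum key length; some implementations accept
    # up to 72 bytes (the full 18-entry P-array of 32-bit words)
    if not 1 <= len(key) <= max_key_bytes:
        raise ValueError(f"key must be 1 to {max_key_bytes} bytes")
    return key, bytes.fromhex(iv_hex)


key, iv = parse_k_switch_params("0123456789ABCDEF", "00FF000000000001")
print(key.hex(), iv.hex())    # 0123456789abcdef 00ff000000000001
```

The example IV contains a 0x00 byte followed by non-zero bytes, which is exactly the kind of IV that can't be passed as a text string today.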

Apart from the 'k' switch handling its input strings as pure binary, nothing should need to change in how inputs are handled when 'k' isn't used.

Once such a 'k' switch is implemented, I can post some brief demo commands showing several simple ways that encoding with a hex key/IV interacts with $decode as $decode currently behaves, other than $decode also needing to handle the hex key the same way.

The 'k' switch for hex encoding of both the key and IV parameters solves these issues:

  • #1 Allowing all IV strings to be defined with the 4th Parameter.
    .
    Approximately 1/37 of the 8-byte IVs randomly created in the absence of the 4th parm contain a 0x00 byte followed by a non-zero byte, and these IVs cannot be defined by passing the 4th parameter as a text string.
    .
  • #2 Literal keys currently can't be defined unless they're UTF8 text. Bruce Schneier's Blowfish page has 59 test vectors, and only 4 of the 59 test keys are valid UTF8 text; none of those is normal text in the printable range.
    .
  • #3 Allows a script to create a replacement for the default key hashing design, without needing to add another switch to define an improved hash.

    The key hashing methods are calculated correctly according to the intended design, but they're an archaic method, and a superior replacement can be scripted (more so if HMAC itself starts accepting binary keys). For example, $encode(string,mcir,abc,12345678) does hash the 'abc' key into a 56-byte non-UTF8 binary key, but the first 16 bytes of that key are the $md5(abc) digest, and the remaining 40 of the 56 bytes are always the one specific string determined by those first 16 bytes. No two hashed keys share the same first 16 bytes while differing in the other 40, so the 56-byte key can take at most 2^128 distinct values.
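The "approximately 1/37" figure in #1 can be checked exactly. This Python sketch assumes, as that figure implies, that the only obstacle counted is a 0x00 byte followed by a non-zero byte (zero bytes at the end of the IV are treated as definable), so an IV is text-definable only if it is some non-zero bytes followed by some 0x00 bytes. The UTF8 issue for bytes above 0x7F is not counted here, and the function name is hypothetical.

```python
from fractions import Fraction


def prob_iv_not_text_definable(n: int = 8) -> Fraction:
    """Exact fraction of random n-byte strings with a 0x00 byte followed by a non-zero byte."""
    # definable strings look like <n-k non-zero bytes><k 0x00 bytes> for k = 0..n
    definable = sum(255 ** (n - k) for k in range(n + 1))
    return 1 - Fraction(definable, 256 ** n)


p = prob_iv_not_text_definable()
print(round(1 / p))    # 37
```

The exact value is about 0.0270, which is 1 in 37.0.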


Once there's such a 'k' switch, I can post example scripts showing how to use 'k' to script a superior replacement for the current methods of hashing non-literal keys, as well as scripted substitutes for some of the earlier feature requests that were intended to shrink the length of the ciphertext. A shorter ciphertext lets longer plaintext messages squeeze within the length of an IRC message, in spite of the needed mime encoding that makes the message a little more than 4/3 as long as the plaintext.
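To put numbers on the length savings mentioned above, here is a Python sketch of the text length of the padded ciphertext under the padding and encoding options discussed in this thread. It assumes the current 'z' padding always appends 1-8 bytes while the proposed padding appends 0-7, and it ignores any salt or IV that may be sent along with the ciphertext. The function names are hypothetical.

```python
def pad_1_to_8(n: int) -> int:
    """Padded length when 1-8 zero bytes are always appended (current 'z' padding)."""
    return n + 8 - n % 8


def pad_0_to_7(n: int) -> int:
    """Padded length with the proposed 0-7 zero bytes."""
    return n + (-n) % 8


def mime_len(n: int, padded: bool = True) -> int:
    """Base64 (mime) length: with '=' padding, or the shortest form without it."""
    return 4 * ((n + 2) // 3) if padded else (4 * n + 2) // 3


def base85_len(n: int) -> int:
    """Base85 length: 5 chars per 4 bytes, plus n+1 chars for a final 1-3 bytes."""
    return 5 * (n // 4) + (n % 4 + 1 if n % 4 else 0)


for n in (360, 361):
    print(n, mime_len(pad_1_to_8(n)), mime_len(pad_0_to_7(n)), base85_len(pad_0_to_7(n)))
# 360 492 480 450
# 361 492 492 460
```

A 360-byte message is already a multiple of 8, so only the 0-7 padding avoids the extra block, and base85 shortens the result further.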

