Joined: Jan 2004
Posts: 2,127
Hoopy frood
OP
The /help says the N parameter references the Nth 60-character chunk of the encoded output, which can also include a 61st character (the length byte) if the encoding method is 'u'. For 'm' and 'u', that is equivalent to saying a chunk is the encoding of the Nth 45-byte block of the input string, because those methods encode 3 bytes as 4 characters. However, Base32 'a' encodes 5 bytes as 8 characters, and that ratio isn't compatible with assuming the output chunk is 60 characters. For Mime/UUencode, the input length for 60 encoded characters is 60*(3/4)=45 bytes, but for Base32 it would be 60*(5/8)=37.5 bytes, and $decode can't decode correctly if a byte's bits are split across separate chunk strings. In this example, the $chr(255) in the 38th position has its bits split half/half between chunks 1 and 2, and the 1-bits are not output when decoding either chunk 1 or chunk 2:
//var %n 1 | while (%n isnum 1-2) { bset -c &v 38 255 | noop $encode(&v,ba,%n) | var %text $bvar(&v,1-).text | noop $decode(&v,ba) | echo -a length $bvar(&v,0) $+(n=,%n,:) $bvar(&v,1-) %text | inc %n }
If you alter the /bset string to have a 0 in the 39th position following the 255, $decode does fill 4 of the 8 1-bits into chunk 2. If you move the 255 by editing the 38 to 33, the "P" and "6" that were split across 2 chunks are the same encoded characters that appear together in the new output. So the encoded output isn't being garbled; it's just not possible for $decode to properly decode bits that aren't entirely present in the same input string.

Possible solutions:
1. The user needs to stitch 2 Base32 chunks together before trying to decode either.
2. Allow N to reference 45 input bytes for Base32 the way it references 45 input bytes for Mime and UUencode. This means the Base32 chunk size needs to be 72, not 60.
3. Allow the chunk size for Base32 to change from 60 to another multiple of 8 if 72 is too long.
4. If Base32's chunk size needs to remain 60, allow the N parameter to also be an N1-N2 'range' value. This would allow $encode(&v,ba,1-2) to return a string that $decode can handle.

Even if #4 isn't chosen to fix the Base32 problem, this feature would also benefit all 3 encoding methods. It would allow someone to store a longer-than-4000 &binvar across several %variables or disk file lines, without needing to encode dozens of times and stitch the results together to get several 3600-char mime %variables or lines of a text file, i.e. $encode(&longbinvar,bm,1-60) $encode(&longbinvar,bm,61-120) etc.
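The alignment argument above can be checked outside mIRC. The following is a minimal Python sketch, not mIRC script, and it assumes Python's standard `base64.b32encode` uses the same RFC 4648 alphabet as the 'a' switch. The helper names `split_chunks` and `decodes_independently` are made up for this illustration. It rebuilds the 38-byte test string, shows the two characters that carry the 38th byte, and compares 60-character chunks against 72-character chunks.

```python
import base64
import binascii


def split_chunks(text, size):
    """Cut an encoded string into consecutive pieces of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def decodes_independently(data, chunk_chars):
    """True if every chunk decodes on its own and the pieces rebuild `data`."""
    encoded = base64.b32encode(data).decode("ascii")
    try:
        parts = [base64.b32decode(c) for c in split_chunks(encoded, chunk_chars)]
    except binascii.Error:
        # Python's decoder is strict: a chunk that ends mid-byte is rejected.
        return False
    return b"".join(parts) == data


# Same input as the /bset example: 37 zero bytes, then 255 in position 38.
data = bytes(37) + b"\xff"
encoded = base64.b32encode(data).decode("ascii")

# Characters 60 and 61 each hold four of the eight 1-bits of byte 38.
print(encoded[59], encoded[60])

sample = bytes(range(256))
print(decodes_independently(sample, 60))  # 60 chars = 300 bits = 37.5 bytes
print(decodes_independently(sample, 72))  # 72 chars = 360 bits = 45 bytes
```

Python raises an error on the misaligned 60-character chunk instead of silently dropping bits the way $decode does, but the cause is the same: 60 Base32 characters do not end on a byte boundary, while 72 do.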
Joined: Dec 2002
Posts: 5,505
Hoopy frood
Thanks for your bug report. Can you provide examples that I can use to verify that changes I make to this feature are working? For example: $encode(input) -> current output -> expected output
Joined: Jan 2004
Posts: 2,127
Hoopy frood
OP
This alias contrasts new vs. old behavior for the N parameter. It doesn't support the encryption switches or parameters, just the b, u, m and a switches.
$new_encode(new, target text|&binvar, buma, N) = new behavior
$new_encode(old, target text|&binvar, buma, N) = old behavior
//echo -a $new_encode(new,Thé qúíck brówñ fóx júmps óvér thé lázý dóg.,a,2)
//echo -a $new_encode(old,Thé qúíck brówñ fóx júmps óvér thé lázý dóg.,a,2)
//echo -a $new_encode(new,Thé qúíck brówñ fóx júmps óvér thé lázý dóg.,a,1-2)
For "N", the new behavior is the same as the old, except that it considers a chunk to be 45 input bytes that have already been UTF-8 encoded, not 60 output chars. It then outputs as if those 1-45 input bytes were the entire string:
m=45*(4/3)=60
u=45*(4/3)+1=61
a=45*(8/5)=72
If you think 72 is too long for the output of a Base32 chunk, you can adjust Base32 so its input length is another multiple of 5, like 35 or 40.
If Base85 were implemented, the handling of the N parameter would need to change slightly so that the max size is an integer. 45*(5/4)=56.25, so perhaps Base85 chunks could keep their output at 60 chars and take the input as 48 bytes, where 48*(5/4)=60 output chars.
Also, the new behavior allows n1-n2 or n1- so you can retrieve multiple chunks with 1 call. N=5-10 returns chunks 5-10, and N=5- returns chunks 5-last.
Under the old method, it's possible for a Base32 chunk to be only a portion of the padding:
/encode_test_binary old a 36
Another way of showing that the contents of even-numbered chunks were garbled:
/encode_test_binary old a
/encode_test_binary new a
and with text:
/encode_test_text old a 100
/encode_test_text new a 100
and showing the effect of N being an integer vs. n1-n2:
/encode_test_n1-n2 a 234 1
/encode_test_n1-n2 a 234 2
/encode_test_n1-n2 m 234 1
/encode_test_n1-n2 m 234 2
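For anyone who wants to cross-check the expected output of the proposed N handling without mIRC, here is a rough Python sketch of the same rules: chunks are counted in 45-byte blocks of input, N=0 returns the chunk count, and N may be a single number, n1-n2, or n1-. The function name `encode_chunks` and the `CHUNK_BYTES` constant are invented for this illustration, only the 'm' and 'a' methods are covered, and it assumes Python's `base64` output matches what $encode produces for those switches.

```python
import base64
import math

CHUNK_BYTES = 45  # proposed input-side chunk size for every method

# Only Mime ('m') and Base32 ('a') are sketched here; 'u' is left out.
ENCODERS = {"m": base64.b64encode, "a": base64.b32encode}


def encode_chunks(data, method, n):
    """Encode the requested 45-byte chunk(s) of `data`.

    `n` is a string: "0" (return the chunk count), "N", "n1-n2" or "n1-".
    """
    total = math.ceil(len(data) / CHUNK_BYTES)
    if n == "0":
        return total
    first, dash, last = n.partition("-")
    first = int(first)
    if not dash:
        last = first          # plain N: a single chunk
    elif last == "":
        last = total          # "n1-": from n1 through the last chunk
    else:
        last = int(last)      # "n1-n2": an explicit range
    if first < 1 or last < first:
        raise ValueError("N must be N, n1-n2 or n1- with n2 >= n1 >= 1")
    piece = data[(first - 1) * CHUNK_BYTES:last * CHUNK_BYTES]
    if not piece:
        return ""             # chunk number is past the end of the data
    return ENCODERS[method](piece).decode("ascii")


text = "Thé qúíck brówñ fóx júmps óvér thé lázý dóg.".encode("utf-8")
print(encode_chunks(text, "a", "0"))
print(encode_chunks(text, "a", "1"))
print(encode_chunks(text, "a", "2"))
print(encode_chunks(text, "a", "1-2"))
```

Because 45 is a multiple of both 3 and 5, a full chunk never ends mid-group for either method, so the result for 1-2 is simply chunk 1 followed by chunk 2, and every chunk can be decoded on its own.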
alias new_encode {
var %mode $remove($3,b) | if (!$istok(a m u,%mode,32)) var %mode u
var %n1 $gettok($4,1,45) , %n2 $gettok($4,2,45) , %range_len 45 , %bin $iif(b isin $3,b)
if ((%n1 !isnum 0-) && (%n1 != $null)) goto syntax
if ((%n2 !isnum 0-) && (%n2 != $null)) goto syntax
if (?*-* iswm $4) {
if (%n2 >= %n1) var %range_len $calc((1 + %n2 - %n1) * %range_len)
elseif (%n2 == $null) var %range_len -1 | else goto syntax
}
if ($1 != new) { if ($0 >= 4) return $encode($2,$3,$4) | if ($0 = 3) return $encode($2,$3) | return $encode($2) }
if (b isin $3) bcopy -c &maroon.tmp 1 $2 1 -1
else bset -tc &maroon.tmp 1 $2
if (%n1 == $null) { if (%bin) { noop $encode($2,$3) | return $bvar($2,0) }
else return $encode($2,%mode)
}
if (%n1 == 0) { return $ceil( $calc( $bvar(&maroon.tmp,0) / 45 ) ) }
var %begin $calc(1 + (45 * (%n1 -1)) ) | if (%begin > $bvar(&maroon.tmp,0)) { return $null }
bcopy -c &maroon.tmp 1 &maroon.tmp %begin %range_len | noop $encode(&maroon.tmp,b $+ %mode)
if (%bin) { bcopy -c $2 1 &maroon.tmp 1 -1 | return $bvar($2,0) }
return $bvar(&maroon.tmp,1-).text
:syntax | echo -sc info Error:$new_encode(new|old, target , [b]uma [, N|n1-n2|n1-] )
}
; default encoding 'a' unless $1 = m or u
; default length 999 unless $2 = number in 1-999999
; default chunks shown together 5 unless $3 = number in 1-50
alias encode_test_n1-n2 {
bread $qt($mircexe) 9999 $iif($2 isnum 1-999999,$2,999) &v | var %range $iif($3 isnum 1-50,$int($3),5)
var %switch b $+ $iif($istok(u m a,$1,32),$1,a) , %n 1 , %rows $new_encode(new,&v,%switch,0)
echo -a === behavior: only new switch: %switch rows: %rows total bytes: $bvar(&v,0) chunks/line: %range
while (%n <= %rows) {
var %nparm %n $+ - $+ $calc(%n + %range -1)
bcopy -c &tmp 1 &v 1 -1 | noop $new_encode(new,&tmp,%switch,%nparm)
echo -a n= %nparm $bvar(&tmp,1-).text | inc %n %range
}
}
; default new behavior unless $1 = old
; default encoding 'a' unless $2 = m or u
; default length 256 unless $3 = number in 1-256
alias encode_test_binary {
bset &v 1 $regsubex($str(x,$iif($3 isnum 1-256,$3,256)),/x/g, $calc(\n -1) $chr(32))
var %oldnew $iif($1 == old,old,new) , %switch b $+ $iif($istok(u m a,$2,32),$2,a) , %n 1 , %rows $new_encode(%oldnew,&v,%switch,0)
echo -a === behavior: %oldnew switch: %switch rows: %rows total bytes $bvar(&v,0)
while (%n <= %rows) {
bcopy -c &tmp 1 &v 1 -1 | noop $new_encode(%oldnew,&tmp,%switch,%n) | echo -a n= %n $bvar(&tmp,1-).text
noop $decode(&tmp,%switch) | echo 4 -a decoded: $bvar(&tmp,1-) | inc %n
}
}
; default new behavior unless $1 = old
; default encoding 'a' unless $2 = m or u (old/new doesn't change for m and u)
; default length 100 unless $3 = number in 1-999 (base32 line-length error above 518)
alias encode_test_text {
var %oldnew $iif($1 == old,old,new) , %switch $iif($istok(u m a,$2,32),$2,a) , %n 1
var %s $regsubex($str(x,$iif($3 isnum 1-999,$3,100)),/x/g,$base(\n,10,10,4) $+ _) , %rows $new_encode(%oldnew,%s,%switch,0)
echo -a === behavior: %oldnew switch: %switch rows: %rows total bytes: $len(%s)
while (%n <= %rows) {
var %chunk $new_encode(%oldnew,%s,%switch,%n) | echo -a n= %n %chunk $decode(%chunk,%switch) | inc %n
}
}