mIRC Home    About    Download    Register    News    Help

Print Thread
Joined: Jul 2006
Posts: 4,149
W
Wims Offline OP
Hoopy frood
OP Offline
Hoopy frood
W
Joined: Jul 2006
Posts: 4,149
This post is meant to offer a solution to the problem mentioned here

As mIRC does not support 64 bits numbers, it's possible to do math on string.

Code
alias base2 {
  var %a 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ, %i -1, %t 0, %r 
  while ($mid($1,%i,1) >= 0) var %v $v1,%pos $pos(%a,%v) - 1,%pow $pow_($2,$calc((%i + 1) * -1)),%m $multiply_(%pow,%pos),%t $add_(%t,%m),%i %i - 1
  if ($3 == 10) return %t
  while %t >= 1 {
    %i = $mod_(%t,$3) + 1
    %r = $mid(%a,%i,1) $+ %r 
    %t = $divide_(%t,$3)
  }
  return %r
}

alias divide_ {
  var %res, %idx 1, %temp = $mid($1,1,1)
  while (%temp < $2) {
    inc %idx
    %temp = $calc(%temp * 10 + $mid($1,%idx,1));
  }
  while ($mid($1,%idx,1) != $null) {
    %res = %res $+ $int($calc(%temp / $2))
    inc %idx
    %temp = $calc((%temp % $2) * 10 + $mid($1,%idx,1))
  }
  if ($len(%res) == 0) return 0
  return %res
}

alias pow_ {
  if ($2 == 0) return 1
  var %temp $1,%n $2 - 1,%r $1
  while (%n) {
    %r = $multiply_(%r,%temp)
    dec %n
  }
  return %r
}

alias add_ {
  if ($len($2) > $len($1)) tokenize 32 $2 $1
  set -l %result
  set -l %carryover
  set -l %temp
  while ($right($1,15) isnum) {
    %temp = $base($calc($v1 + $right($2,15) + %carryover),10,10,$len($v1))
    %carryover = $mid(%temp,1-,-15)
    %result = $right(%temp,15) $+ %result
    tokenize 32 $mid($1,1-,-15) $mid($2,1-,-15)
  }
  return %result
}

alias mod_ {
  var %res 0,%a 1
  while ($mid($1,%a,1) != $null) {
    %res = $calc(%res * 10 + $v1) % $2
    inc %a
  }
  return %res
}

alias multiply_ {
  if ($len($2) > $len($1)) tokenize 32 $2 $1
  set -l %temp
  set -l %result
  set -l %multiplicand $1
  set -l %multiplier
  set -l %remaining
  set -l %interval
  set -l %padding
  set -l %carryover
  tokenize 1 $2
  while ($1 isnum) {
    %multiplier = $right($1,7)
    %remaining = $mid($1,1-,-7)
    %carryover = 0
    if (%multiplier > 0) {
      tokenize 1 %multiplicand
      while ($right($1,7) isnum) {
        %temp = $base($calc($v1 * %multiplier + %carryover),10,10,$len($v1))
        %carryover = $mid(%temp,1-,-7)
        %interval = $right(%temp,7) $+ %interval
        tokenize 1 $mid($1,1-,-7)
      }
      if (%carryover) %interval = %carryover $+ %interval
      %result = $add_(%result,%interval $+ %padding)
    }
    %padding = %padding $+ $str(0,$len(%multiplier))
    tokenize 1 %remaining
  }
  return %result
}
Credit to Dazuz for the two aliases $add_ and $multiply_, which can add and multiply two integers together, these integers can be of any length.
original $base2 alias by visionz
I created $mod_ $pow_ and $divide_, but these do not work with both integer being of any length, only the first integer can be of any length, which is enough for the purpose of $base.


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Wims #270948 31/10/22 08:19 AM
Joined: Jan 2004
Posts: 2,127
Hoopy frood
Offline
Hoopy frood
Joined: Jan 2004
Posts: 2,127
$Qbase $base3

Below are 2 aliases related to $base2 which may help $base itself be faster in general, or can show $base how to optimize speed for certain inbase/outbase combos. Qbase is a demo showing how $base can perform accurate translations between bases without ever touching a number larger than 2^53, and demonstrates how it's easy to extend support to more bases than 2-36.

* *

The first alias is $base3, which is much faster than $base2, but accomplishes that by replacing most of the doubles-mode-only aliases with $calc itself doing math in bigfloat mode. That means $base3 can return correct results only if using beta 1603+later. $base3 still needs to use an alias to avoid using the ^ operator because that can still produce wrong results for large outputs, specifically those longer than 2^256-or-so.

* *

The 2nd alias is Qbase, which like $base2 is compatible with doubles-only past versions, so $base2 and $qbase can both return accurate results from v7.71-no-beta before $bigfloat existed.

However, Qbase uses a different design than the design of $base2 and $base3, it can be faster in doubles mode than $base2.

Even though $qbase does not use any bigfloat features, it is competitive with the speed of $base3-pow3, even though $base3 is able to substitute a single $bigfloat $calc() in place of many doubles-range $calc() being done by $qbase. So, perhaps there are things in the $qbase design which could help bigfloat mode $base become faster.

The original idea for $base3 was to make $base2 more speed competitive, and I assumed I could modifying the $bf_modpow script to calculate the numbers the same way except without using a modulus. I hoped that would let me use it as an alias that I could hotswap it in place of the $pow_ alias used by $base2, which seemed to be the source of the $base2 slowness and the rounding problems in $base. To make it doubles-safe, pow_3 needed to use the other 'multiply_' alias used by $base2. However some of the other aliases used by multiply_ were designed to only handle when 1 of the terms was small, so when they were fed the square-and-multiply output from $pow_3, they couldn't handle it.

So instead, I did a further make-over by keeping pow_3 as a replacement for the problematic ^ operator, but used bigfloat $calc math in place of the other doubles-only aliases. So now $base3 uses only pow_3 as an alias, and only does that because the ^ operator in $calc isn't always correct. As soon as beta is published where ^ is fixed, then $qbase can easily jump to use that instead.

Quote
Possible fix for $calc ^

When the exponent is an integer, it's possible that the $calc ^ operator using something like pow_3 as the back-side would be fast and accurate because there's nothing non-integer about this, it's just multiplying and then checking to see if all the bits of the exponent are used up. The native code likely could replace the $calc(%base.bf * %base.bf) with using m_apm_square as I mentioned for $powmod. Note that this is just quick validation that doesn't account for negatives. For that, base^1/integer could use 1/(base^integer).

Quote
alias calc_^_operator return pow_3($1,$2)

alias pow_3 {
var %base.bf $1 , %exponent.bf $2 , %answer.bf 1
if (%exponent.bf < 2) { if ($2 == 0) return 1 | if ($2 == 1) return $1 }
if (%base.bf < 2) return $v1
while (%exponent.bf) {
if ($calc(%exponent.bf % 2)) var %answer.bf $calc(%answer.bf * %base.bf)
var %exponent.bf $calc(%exponent.bf //2)
if (%exponent.bf) var %base.bf $calc(%base.bf * %base.bf)
}
return %answer.bf
}

As demo programs, $base3 and $Qbase also show how $base can easily be modified to support additional 'number bases'.

There have occasionally been scripters having the need for something that can compress a lot of bits into a shorter text string. For one example, they had a huge database that was too large to keep in memory, so they wanted to keep hash(long strings) in ram and put the full data on disk. Then when they need to search if they have something, they can search their index in ram to see if there's a matching hash, and if so, they can do the slow disk read to see the data associated with their match, rather than searching a huge disk file for every no-match.

They wanted an index that was as condensed as possible, so they made a hex hash that they used $base to translate to base36, but that was still 'too big', so they were wanting something even more condensed.

So, $base3 and $Qbase show how to translate to/from all bases from 2 through 64 without much difference from the code that doesn't support it. The most complicated part was handling the fact that '0' isn't zero in all of those bases, and that wasn't difficult at all. And I could easily have supported bases larger than 64 by simply defining a bigger alphabet for those larger bases, and I did to an example alphabet for base 85. For base 2-36 I continued letting it have the quirky behavior of letting all 36 bases use all 36 letters in the input string, but since base37+ need to use the lower-case alphabets, those bases can't be case-insensitive inbases like how $base handles 2-36 now.

For base 63-64, I let them use the same alphabet that $encode base64 does, so you can translate between hex and mime by using base 16 and 64, but only when the length of input base=16 is a multiple of 6 or input base=64 string is a multiple of 4.

I interrupted the bases 37-62 so that 52 would use a case-sensitive alphabet containing only the upper+lower alpha characters, which is another case where zero is not '0'.

As with $base2, these aliases support ONLY integers correctly. For Qbase I added tighter restriction for input values, so they reject anything that's not in whatever alphabet they're using, because that lets me add cheater shortcuts. So floats are invalid because '.' isn't in the alphabet.

Quote
Possible alternate way for $base to handle fractions!

My assumption is that the vast majority of usage of $base against large numbers is when they are integer, and any use of $base where a fraction is wanted in another base is most likely going to be for base=10 and hex, so it seems reasonable that $base can have an optimized handler for integers, and the input float strings could get treated like red-headed step-children.

However, if it's not already doing this, the built-in $base can be easily modified to handle big floats efficiently by handling the input as 2 separate string manipulations. It could use the Qbase style to handle $gettok(input,1,46). If there's no decimal following it, then that's the only string to return, barring any zero padding.

If $base isn't doing so already, a fast way to translate digits of a float between bases is to treat the float number Integer.Frac1 as 2 separate integer base conversions, which enables $base to pretend it doesn't even know what a fraction is, while still outputting accurate fractions.

The 'Integer' portion could be converted between bases as if it were the entire input, then the input fraction to the right of the decimal could be adapted based on the inbase/outbase, and then translated from the intermediate number into a 'fake integer' encoded as the outbase.

This method could easily be handled by an alias that would call $Qbase to get the conversion for the Integer portion, and then if there's a fraction, the alias could do big-integer math to modify the fraction string as demonstrated below, and then call $Qbase again to translate that also into the new base. The alias would then return %result1 if no fraction, or return $+(%result1,.,%result2) if there is.

The method is to simply treat Frac1 as an inbase integer whose length is Frac1D. If Inbase is not 10, then Frac1 would need to be translated to base 10. The calculation would be slightly different than mentioned if the input is a negative.

The executable code would do this slightly differently if they use an internal format that's not base 10, but the principle is the same. After calculation of...

Frac2 = $round($calc(Frac1 * (inbase^Frac1D) / outbase^Frac1D),0)

you can then do integer base conversion against Frac2 as if an integer string that's appended after the decimal. Example:

Quote
$pi .bf = 3.14159265358979323846264338327950288419716939937510
%Integer = 3
%Frac1.bf = 14159265358979323846264338327950288419716939937510
Frac1D = 50
Inbase = 10
Outbase = 16

%Frac2.bf = $calc(14159265358979323846264338327950288419716939937510 * 16^50 // 10^50)

%result_is.bf: 227530621841023045347523864218926005325435324977874670807984
%should_be.bf: 227530621841023045347523864218949028648034440140275111685147

The above %result_is.bf calculation will almost certainly return the correct answer along with the fix of $calc's ^ operator. So treat the %should_be.bf result as an integer that needs to be translated from base10 to base16

Quote
%fraction.bf = . $+ $base(%should_be.bf,10,16)

//var -s %fraction.bf = . $+ $base(227530621841023045347523864218949028648034440140275111685147,10,16)

%fraction.bf is: .243F6A8885A308D313198A2E03707344A40938222771A7EC1B
true hexdigit::: .243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98

This 'integer' is small enough to be within the accuracy range of $base(), so comparing the base conversion of %fraction.bf with the actual hex digits of PI found in Blowfish source code, shows that the first 41 digits of the fake base16 fraction pseudo-integer are accurate compared to a 50-digit base10 fraction. Which of course is reasonable since it's now in a higher base, just like translating from base10 to base2 would have many more accurate digits. The expected accuracy when converting a 50-digit base10 fraction to base 16 would be

50 * $log(10) / $log(16) = 41.52 digits, so the above has the expected level of precision, indicating this cheat is legit.

Unfortunately, both $base2 and $qbase generate the output by generating the leftmost digit last of all, so you can't just stop the .fraction result when you have enough digits, except perhaps from using (Frac2 // (outbase^something) ) to shorten the result you feed to $base, but I've not tested how that affects the output, such as whether that should be division rounded to 'nearest integer' or just chopped.

* *

$qbase could potentially be a different way for $base to translate large integers between bases accurately without doing any math involving a number outside the doubles range. In fact, I have debugging code inside the alias to detect if it ever ends up using a number larger than 2^53, which would mean the answer is almost surely wrong, and I need to fix something.

Any native code that translates between bases would likely use either the $base2 method or the $qbase method, except the executible's intermediate state between inbase and outbase would be a native data-type such as uint32, because it just degrades performance if it needs to translates an integer to double or 'text' just to do math on it, before then translating back to uint32.

The steps to translating between bases are to translate the input number out of 'inbase' into whatever format it needs to have for its internal intermediate state, before then translating that internal state into whatever is the outbase, 1 digit at a time.

For $base2 and $base3, the internal state is a base10 string identical to what outbase=10 would use, so the internal state gets a free 'i am done' when outbase=10.

For $qbase, the internal state is a series of chunk pockets chained together where each holds a base10 string whose value can be as large as inbase^N, and are chunks of the actual output like $base2 does - only if both inbase=outbase=10. But that's of little help since below shows there's a much faster way to optimize that combo when it doesn't need to support $base(ZZ,10,10)

The longest part of the base conversion for both methods is converting the input number from the 'inbase' to the internal storage format.

Qbase does things somewhat differently than $base2 does, more like a pen-and-paper long division, except each 'digit' in the 1st number is now itself a number that clusters together several inbase digits without having ( (number_A + carry) / number_B) being a result above 2^53. This has 2 advantages over using m_apm_integer_pow or base2's pow_ calculations against huge numbers.

One advantage is that it doesn't need to use exponents in order to correctly handle very large numbers. It uses the add/subtract/multiply/divide/modulo calculations done by $base2 and $base3, but it always uses them against small-enough numbers whose results that are not too large to overflow whatever is the datatype used internally, which for this script is 2^53, and for the executable would probably be uint32.

Also, by grouping several input digits together, this means there can be fewer than 1/10th as many links in the calculation chain in order to arrive at each digit. You can look at this another way, from the example of conversion to base 16, where you need to keep dividing a big number by 16 in order to get each digit. You could instead divide by 256's and end up with something you can turn into two hex digits, which means can get the 2 digits from the same price as needed to get just the 1 digit, and you'll finish quicker.

In fact, that's an optimization I hadn't yet tried to do. However, my method allows each pocket to hold numbers very close in size to 2^53, while the method of grabbing several output digits at the same time would limit the size of the number in each pocket chunk. It would also need to sanitize the final number to avoid having leading 0's that shouldn't be there.

When printing a number to outbase=36 that's stored internally as a 100-digit integer, $base3 - and probably $base itself - perform a single $calc(big-integer % 36) calculation to obtain that first digit of the base36 result. $base2 would use 100 divisions in order to get that same result. $Qbase instead stores that 100 digit number as somewhere around 8-9 different chunks depending on the inbase, and would only need those 8 or 9 $calc(double % 36) to get the same result. The $qbase timing is competitive with the $base3 time, in spite of being penalized by the overhead of using many more extra scripting commands, and there's not much difference in the timing between $calc(small % 10) or $calc(big % 10).

I believe that native code doing big-math operations like $base3 does could be faster if it switched to a series of uint32 calculations of the style Qbase is doing, instead.

Since $base appears to be affected by the same thing that caused problems for the $calc ^ operator, it looks like $base could benefit from doing something like $qbase does, which internally does several small calculations instead of 1 really big one.

The speed advantages for the $base3 vs $qbase method can be different for different lengths. The $base3 calculations begins to overcome the speed of $qbase in the benchmark table below when the string size becomes really large, but again, that might simply be a penalty from the scripting engine. As an exampe of the relative speeds:

number = $sha512(abc) base16 -> 10

$base2 = 11700 ticks
$base3 = 15 ticks using $calc ^ with errors above 2^260-ish
$base3 = 79 ticks using scripted pow_3 in place of ^ exponent
$qbase = 115 ticks

But at 128-bit length, $qbase had the speed advantage over $base3, and the $Qbase disadvantage at 512 could disappear if both alternatives were as C code.

I'm not ragging on $base2, especially since $base3 clones its design exactly except for the subroutine aliases getting called. Just noting the comparison that if $base is using elements of a method similar to $base2 or $base3 that can be switched to the Qbase method, that might speed things up for $base while avoiding any any exponentiation slowdowns.

* *

Quote
Qbase speedups

Qbase also has some additional features not seen yet in $base, and some of them could be grafted into $base:

* Support for additional bases up to 64 (And I could have added a base94 alphabet to use all the characters from '!' through '~' but it can be tricky for scripts to deal with numbers that could translate to a base94 string like an actual pair of double quotes, a single | character etc. Included in Qbase is a sample 85-item 'base' that avoids using most of the symbols having special evaluation by mIRC, and is the right size of an alphabet to translate between 5 text and 4 binary. This means if you convert from hex to base85, each 8 hex digits is zero padded to 10 base85 digits, and I could have included an optimization to handle conversion between base=10 and base=85 in chunks of these sizes.

* Optional 'enhanced mode', which disallows invalid characters that can't be generated when that inbase is used as the outbase, such as $base(zzz,2,10)

* Within 'enhanced mode', certain inbase/outbase combos can be optimized to be incredibly faster instead of using the default method that needs to support $base(zzz,13,34)

The alias would have been much shorter if the only thing it did was to support bases 2-36 in the mode where it allowed all 36 alphabets to use the same length 36 alphabet. But, Qbase shows there are 2 kinds of speed-up advantages, which can allow $base to increase speed for the 99.9% of $base users who don't need to support invalid numbers when trying to use Z in a base smaller than 36, and who don't need to use the same slow method of translating between bases 10/16 as it would for translating between 17/31.

One kind of optimization depends on specific in/out pairs, but there's also a minor speed benefit by being able to squeeze more digits of the input into each middle-step 'pocket' if you don't need to support smallest inbase numbers potentially having 'Z' be valid.

* *

Normal mode $Qbase(number,10,16) -> Enhanced Mode $Qbase(number,10,16X)

To enter the 'enhanced mode' which enables various speed optimizations, I just chose to make Qbase remove 'X' from $3 when using that as outbase, but then look to see if $3 contains the 'X' when checking to see if it can enter enhanced mode. But, I could've easily just had that be an extra parameter. So $qbase(ZZ,2,10) is still valid, but $qbase(ZZ,2,10X) quits with an error because of the illegal Z character not found in the 2 items of the base=2 alphabet.

I continued using base 2-36 as having the existing 0-9A-Z alphabet. For my additional alphabets, I made 64 use the same alphabet as mime does, but it generates the identical string that the mime in $encode does if the output is a multiple of 4.

$base limits the zero-pad to 100, but since Qbase uses string manipulation, the sky's the limit.

I made base=62 be an alphabet the same as base=36, except where the extra 26 digits is the entire LOWER CASE alpabet, so just like with base64, all bases 37-64 would need to be handled as case-sensitive strings. Bases 37-62 would use anywhere from the single lowercase 'a' in base 37 through all 26 lowercase letters in base 62. For base 63, should anyone actually want to use that, it would be the first 63 digits of the base64 alphabet, since there wasn't a better choice offhand. Though the script can easily be designed to have any alphabet you want for base63.

So, a random alphanumeric string in base36 would be

//var -s %a.bf $rand(0,$calc(36^8 -1)) , %password $qbase(%a.bf,10,36,8)

... and in base62 it would be:

//var -s %a.bf $rand(0,$calc(62^8 -1)) , %password $qbase(%a.bf,10,62,8)

Since there was no backwards compatibility to be maintained for the new Qbases, I made it so that using any of the 37-64 bases has the same behavior as appending 'X' to the outbase, which is to enforce that the input number contains only the N digits of an alphabet matching the N used as inbase. This means that $base(Z,inbase,52) would not be valid with inbase 2-35 because using base 37-64 as either the inbase or outbase flips Qbase into the mode having the strict rules for the input alphabet.

As an override, I gave a different behavior for base=52, which does not contain any numbers, but instead contains the uppercase A-Z followed by the lowercase a-z. So 52 is the only number in the 37-62 bases that doesn't share the same alphabet as its neighbors.

There's nothing in Qbase that prevents it from working if you change the max size for a valid alphabet above 64 and then define the digits of another alphabet to match the new size. You could also do similar as I did with base=52 to define base=32 to match the base32 alphabet used in $encode, or even to make base=64 match the uuencode alphabet in $encode, though you'd need to edit the optimization that uses $encode(,m).

A sample alphabet above base=64 is in Qbase, which has an 85-item alphabet which starts with the 94 characters from "!" through "~" then removes some of them that are likely to cause problems in string evaluations, and after removing "#$%(),\| the remaining characters are an alphabet that could store a number in extremely dense format without being binary or UTF8.

The alias is longer than it would normally be, partly to support the $3 $+ X mode for alphabets needed for base 37-64, as well as additional code as a demo for the kind of optimizations that can be used in $base itself, but only if they're in 'enhanced mode' which invalidates 'illegal' input strings.

* The best example of optimizing $base is:

For example, it's common practice by scripts to do $base(number,10,10,length) where they simply want to zero-pad '0' digits to make smaller integers be at least as long as 'length'. Just like it's commonly used for hex numbers, to pad them to length 8 to make them look like a 32-bit number, etc.

However, $base current behavior always supports the possibility the input could have illegal alphabet letters that need to be given values 10-35, so $base is always translating 10-to-10 in a mode that translates the input number from base10 to internal binary, then decodes that to the output format, just in case someone wanted $base(Z,10,10,5) to be output as 00035. And only at that point does $base look to see if it needs to insert some extra zero padding. You can also see the same thing going on when handling long hex strings, where $base($sha512(abc),16,16) was always returning the original hex string with a lot of zeroes.

When in the enhanced 'X' mode, if $Qbase sees that inbase and outbase are the same numbers, it simply jumps immediately to the 'display it' line, then checks to see if it should use string manipulation to insert some text '0' in front of it. And the padding is not really the '0', it's whatever is the 1st character of the output alphabet, so where outbase 37-64 have capital 'A' as the 1st digit of their alphabet, those outbase would insert 'A' padding instead. And the demo outbase=85 would insert '!' as padding.

The only things my demo doesn't do is strip leading zeroes from $Qbase(00009,10,10X,3) or make sure base 11-36 strings are uppercase, but that could also be solved by an extra step of string manipulation.

This is an optimize that $base could easily make if it had an enhanced mode and recognized matching bases or certain other in/out pairs that can take advantage of shortcuts that don't require lots of calculations. $base doesn't appear to be doing so, since $base(number,10,10) takes the same time as when changing one of the number digits to a letter.

- -

Some of the other optimizations in the script are things that would be easily adaptable to executable code, since all they do is string manipulation. For example, when converting between base 16 and 2 - other than ensuring 2 as the inbase needs to zero-pad the input number to have a length that's a multiple of 4, going either direction between 16 and 2 can be a simple translation between a hex digit and a lookup-table of 16 items containing a group of 4 text 0/1 chars, and the speed for this is extremely fast both directions compared to translating large numbers between bases 2-to-15 or 16-to-3.

An optimize that Qbase couldn't take advantage of, but that would be available for executable code, would be to translate digits of an inbase=base16 string into the 4-bit 'nibbles' of uint32 binary storage, without needing to do all the multiplication across 'pockets' that would be required for other bases. All it would need to do is pretend there are the needed leading '0' hex digits that would make the input length be compatible with the unit size of the internal binary storage variables.

Inside Qbase i'm using $base itself for quick translation between bases because things like that would be available to executable code, and I'm also using it as a more accurate equivalent than the 6-digit output of $log for determining the max number of digits that can fit in each 'pocket'. For bases 37-64 I can't use that shortcut, so for those I just count how many times I can multiply a number by 'inbase' without overflowing the doubles limit, and for executable code this is probably faster than calculating $log, unless they're able to access a math chip.

For inbase=64 I can 'cheat' and use the mime code from $encode. All I need to do is insert enough 'A' padding to make the input string be a length that's a multiple of 4, and $encode will 'mime' it into being a &binvar that's exactly the same way that a huge number would be stored in 'network' order aka 'big-endian'. Unfortunately, while it's a wildly faster shortcut for the inbase=64 outbase=16 pair, it didn't turn out to be a help when base64 is paired with the other outbases. If $base added more valid inbase/outbase to include a base64 that matches the mime alphabet, it wouldn't need to call the mime subroutine, but the way it would handle base64 input can be 99% identical to translating inbase=64 to the internal byte storage.

While $base2 has the instant optimization available due to the internal storage being identical to outbase=10, none of my optimizations are able to have a quick way of translating from the array chain of 'pockets' of the internal storage, except for inbase=outbase=10 when not using the string manipulation shortcut due to not being in $3=X mode. So the only optimizations I have are the kind which can speed up putting the numbers into the internal storage, or which bypass the internal storage entirely through string manipulation or doing fake mime etc.

While Qbase obviously cannot be nearly as fast as the built-in identifier, especially considering it uses $base to 'do its thing', I'm hoping that some of the ideas can be applied to executable code used by $base to make it faster, if $base is doing some things like $base2 does which can be sped up by using a method similar to how $qbase does them.

maroon #270949 31/10/22 08:21 AM
Joined: Jan 2004
Posts: 2,127
Hoopy frood
Offline
Hoopy frood
Joined: Jan 2004
Posts: 2,127
Warnings about using $Qbase.

This is faster than $base2 because I'm using a different internal storage that doesn't need to use $calc(input-digit ^ not-trivially-small-number) to store digits into the intermediate state as the authentic outbase=10 string. Part of my speed comes from trying to be aggressive in squeezing as many input digits as possible into each pocket of the 'working' array. While testing this, I found some examples where I had optimized an inbase/outbase combo intended to give it a better speed, only to find it was benchmarking slower than the non-optimized variant. It turned out the cause of the slowdown was that the optimize wasn't putting as many digits into each 'pocket' as it should, which caused the "internal -> outnumber" translation to be slower than it needed to be. In many cases, $base(number,inbase,outbaseX) is faster with the 'X', than without it, simply by being able to skip supporting the 'Z' in the 2-35 bases and squeeze more digits into the pocket. For example, if when inbase=2 allows only the 0-1 digits being valid, you can fit more digits into a 'pocket' number that's within the 2^53 doubles range than if you need to defend against inbase=2 having 'larger' digits like '2-9 A-Z' being valid.

This warning is that there may have been some edge cases I need to fix, where for some in/out syntax I allow it to put too large of a number into the internal 'pocket', and the 'carry' value between pockets in the chain ends up being large enough to cause calculation results above 2^53, and the script halts with an error. I believe I've stamped them out, but if any forum users find any, please let me know the number/inbase/outbase combo that triggers the error.

To the best of my knowledge, Qbase always returns the correct result for non-negative integers, same as $base2 can do, unless it quits with an overflow error due to allowing an internal pocket to be too large, or if I somehow made a mistake in one of the inbase/outbase pair optimizations. It was harder to test base 37-64 to verify they return the correct answer, since I couldn't test the result gainst $base.

I could've tested Qbase against $base2 by modifying the 'alphabet' string it uses, but that would've been very slow against large numbers, and some problems I fixed didn't appear until the numbers were large-enough. But I did test Qbase 37-64 to see if using swapped inbase/outbase against the 1st Qbase output would match the original input number.

This is not a rag on $base2, which was written for brevity and simplicity, and it doesn't try any optimizations except for the instant outbase=10 which Qbase can't do. And $base2 was very helpful while testing numbers that $base could not yet report accurately. The intent for showing the differences in speed is the hope that these same speed gains can help $base itself become better stronger faster.

This next performance table has a few examples at different lengths, to show how doubling the bit length of the input number changes the speed. And, assuming that $base has some elements similar to the design of $base2, applying some of these changes can speed things up for the built-in $base.

Each section of the table uses a number which originated from a hex number of the stated length, which had been previously translated from hex to whatever is the value in that inbase. So, while the original 512-bit number would be an input string of 128 digits in base16, it would be a longer input string than in base10, and would be a shorter string in base36.

The columns indicate the approximate time in $ticksqpc needed by each script to translate between bases. The 1st 2 columns are for $base2 then $base3, and the last 2 columns are from $qbase being either with/without the 'X' being part of $3 where many of the optimizations can be enabled. Since Qbase couldn't take advantage of $base for some internal calculations, base 37-64 would be slightly slower than when using base 36.

While some of my optimizations look like they have very little savings as far as total ticks, I look at it as a detectable percentage of improvement that can help $base be faster in the in/out combos most commonly used. Just like my $bf_modpow() alias was suffering an 800% speed difference when doubling the bit length of the input, I'm seeing a lot of that here in both Qbase and $base2 in how they perform at different lengths. And though my optimizations don't have quite the boost at really long lengths, I believe much of that is because of the need to use really long strings and many 'pockets' and many commands when handling these longer numbers.

Qbase does not attempt to support extremely long numbers in some optimizations to/from hex, and the /bset method requires extra code to support super long hex strings that Qbase can't. For example, in order to 'cheat' and use $regsubex and /bset to translate to/from hex format quickly, that can require feeding a string to /bset which can be double the length of the resulting hex string. Each pair of 2 hex digits is created using an input to /bset having an extra space plus a 1-3 digit integer, so it can require a length 200 string to create 100 hex digits. In other words, if translating a string from mime/inbase=64 to hex where the length is 6000 hex digits, that could fail because of the internal step needing to create a /bset command using a string that's longer than $maxlenl.

Below lists the (inbase,outbase,optional pad parameter) for that row's times
The times for $base2 fluctuate greatly because the long interval allows extra interruptions.
Code
Columns:

A	$base2(N,in,out)
B	$base3(N,in,out)*
C	$base3(N,in,out)
D	$qbase(N,in,out,pad)
E	$qbase(N,in,outX,pad)

* The difference in B and C is that B is faster because it uses $calc(base^exp) while C uses $pow_3() script. $pow_3 returns accurate results but is slower because it must loop 9 times for a 512-bit number, for every digit of the input string. $calc() as a built-in identifier is faster, but can have inaccurate results when the result is greater than around 2^256.

N = $md5(foo)<tab-complete>
		A	B	C	D	E
(16,10)		360	4	15	15	12
(10,16)		600	7	21	14	10
(10,10,200)	530	5	18	3	1
(10,36)		590	7	20	11	8
(36,10)		220	3	11	15	12
(16,2)		650	13	23	43	2
(2,16)		5700	16	77	21	1
(17,31)		420	6	17	11	8
(10,64)		n/a	7	20	11	7
(64,10)		n/a	2	9	15	15
(16,64)		n/a	5	16	10	1
(64,16)		n/a	5	11	13	2
(10,52)		n/a	7	21	11	8
(52,10)		n/a	3	10	15	15

N = $sha512(test)<tab-complete>
		A	B	C	D	E
(16,10)		11000	15	79	135	115
(10,16)		17000	27	106	116	99
(10,10,600)	16500	18	92	13	2
(10,36)		17200	26	104	92	78
(36,10)		7000	12	59	126	118
(16,2)		15900	55	113	416	2
(2,16)		182000	68	389	143	2
(17,31)		12000	23	85	94	78
(10,64)		n/a	24	103	82	68
(64,10)		n/a	10	49	135	128
(16,64)		n/a	21	85	81	3
(64,16)		n/a	25	59	115	3
(10,52)		n/a	18	101	86	70
(52,10)		n/a	11	51	128	128

Just to see how the performance works at huge sizes, I tested to see how fast Qbase can translate a 2048-bit number to base10, and it's definitely too slow to use any of these aliases as a hotswap for $base. But even though Qbase starts to be slower than $base3 at these lengths, I'm not so sure the difference wouldn't be the other way as executible code.

N = 2048-bit = $str($sha512(test),4)<tab-complete>
		A	B	C	D	E
(16,10)		not 	64	440	1830			1640
(10,16)		not 	132	530	1540			1400
How to benchmark the different aliases:

//var -s %inbase 16, %outbase 36, %pad 0, %num $str($sha512(abc),1) , %num $qbase(%num,16,%inbase) , %b2 $base2(%num,%inbase ,%outbase,%pad,1), %b3 $base3(%num,%inbase,%outbase,%pad,1) , %qn $qbase(%num,%inbase,%outbase,%pad,1) , %q2 $qbase(%num,%inbase,%outbase $+ X,%pad,1) | if (%b3 !=== %q2) echo 4 -a fail base3: $v1 vs qbase: $v2 | else echo -a ok

The above editbox command creates an arbitrary hex string, or you can make your own string. It changes the hex string into whatever you use as %inbase, and then it calls $base2 $base3 and the 2 formats of $qbase where it either does/doesn't have 'X' as part of $3, and displays to check their time.

I've pasted below a modified $base2 that enables it to do like the other aliases, which is display debug info when $5 is non-zero. But you would need to continue using the other aliases it calls found in the earlier post, because I didn't change those.

The $base3 has a comment line showing where you can edit the script between using the slower call to the pow_3 alias or to try to use $calc(number^number). This is currently accurate only through around 2^256, but should eventually be fixed.
Code
;$base2(number,inbase,outbase,ignored,1=debug)
; using this alias in place of the base2 alias pasted above can report time taken
alias base2 {
  var %a 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ, %i -1, %t 0, %r, %time $ticksqpc
  while ($mid($1,%i,1) >= 0) var %v $v1,%pos $pos(%a,%v) - 1,%pow $pow_($2,$calc((%i + 1) * -1)),%m $multiply_(%pow,%pos),%t $add_(%t,%m),%i %i - 1
  var %debug debug: $2 -> 10 : ticks $calc($ticksqpc - %time)
  if ($3 == 10) { if ($5) { echo -ag %debug } | return %t }
  while %t >= 1 {
    var %i = $mod_(%t,$3) + 1
    var %r = $mid(%a,%i,1) $+ %r
    var %t = $divide_(%t,$3)
  }
  if ($5) { echo -ag %debug tot: $2 -> 10 -> $3 : ticks  $calc($ticksqpc - %time)  }
  return %r
}

; only other aliases called by base3 is the pow_3 beneath it
;$base3(number,inbase,outbase,zeropad,1=debug)
alias base3 {
  var %a.2-62 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  var %a.64   ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
  var %a.in  $iif($2 isnum 2-62,%a.2-62,%a.64)
  var %a.out $iif($3 isnum 2-62,%a.2-62,%a.64) ,  %i -1, %t.bf 0, %r, %time $ticksqpc
  var %pad $4, %flag $5 | if ($2 isnum 2-36) tokenize 32 $upper($1) $2 $3 %pad %flag
  if ($2 == 52) var %a.in  $left(%a.64,52)
  if ($3 == 52) var %a.out $left(%a.64,52)

  ; pick ONE. 1st line is the less-fast that calls pow_3 alias. 2nd line is faster calling $calc but has errors above 2^256
  while ($mid($1,%i,1) != $null) var %v $v1,%pos $poscs(%a.in,%v) - 1,%pow $pow_3($2,$calc((1+%i) * -1)),%m.bf %pow    * %pos ,%t.bf %t.bf + %m.bf,%i %i - 1
  ;while  ($mid($1,%i,1) != $null) var %v $v1,%pos $poscs(%a.in,%v) - 1,%pow.bf $calc($2^ ((1+%i) *-1) )    ,%m.bf %pow.bf * %pos ,%t.bf %t.bf + %m.bf,%i %i - 1

  var %debug debug: $2 -> 10 : ticks  $calc($ticksqpc - %time) 
  if ($3 == 10) { var %r %t.bf | goto returnval }
  while %t.bf >= 1 {
    var  %i    = $calc(%t.bf % $3) + 1
    var  %r    = $mid(%a.out,%i,1) $+ %r
    var  %t.bf = $calc(%t.bf // $3)
  }
  :returnval
  if (%flag) { echo -ag %debug tot: $2 -> 10 -> $3 :  ticks 8,4 $calc($ticksqpc - %time)  }
  if ($len(%r) < %pad) var %r $str($left(%a.out,1),$calc($v2 -$v1)) $+ %r
  return %r
}
alias pow_3 {
  var %base.bf $1 , %exponent.bf $2 , %answer.bf 1
  if (%exponent.bf < 2) { if ($2 == 0) return 1 | if ($2 == 1) return $1 }
  if (%base.bf < 2) return $v1
  while (%exponent.bf) {
    if ($calc(%exponent.bf % 2)) var  %answer.bf $calc(%answer.bf * %base.bf)
    var %exponent.bf $calc(%exponent.bf //2)
    if (%exponent.bf) var %base.bf $calc(%base.bf * %base.bf)
  }
  return %answer.bf
}

; only alias called by $Qbase is the validation at top
; $Qbase(number,inbase,outbase,zeropad,1=debug)
; bases can be 2-64 or 85, with zero being '0' only in base 2-36
; if $3 outbase contains 'X' then enables example speed optimizations for certain in/out combos
; strict enforcement of N-digit input number if $3 contains X or if either inbase/outbase > 36
alias qbase_validate_alphabet { var %pattern $+(^[,$2,]+$) | return $regex(foo,$1,%pattern) }

alias qBase {
  var %num $1 , %inbase $remove($2,X) , %outbase $remove($3,X) , %pad 0 $+ $4 , %v1 $v1 , %v2 $v2
  if (($replace(%inbase,85,2) !isnum 2-64) || ($replace(%outbase,85,2) !isnum 2-64) || (%pad !isnum 0- $maxlenl) || ($1 == $null)) goto syntax
  var %alphabet.36 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
  var %alphabet.52 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  var %alphabet.62 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  var %alphabet.64 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
  var %alphabet.85 !&'*+-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{}~

  if (!$regex(foo,%num %inbase %outbase %pad,^[A-Za-z0-9/+]+ [0-9]+ [0-9]+ [0-9]+)) goto syntax
  if     (%inbase isnum  2-36) var %alphabet.in %alphabet.36 , %fake.base 36 , %num $upper($1)
  elseif (%inbase ==       52) var %alphabet.in %alphabet.52 , %fake.base 52
  elseif (%inbase isnum 37-62) var %alphabet.in %alphabet.62 , %fake.base 62
  elseif (%inbase isnum 37-62) var %alphabet.in %alphabet.85 , %fake.base 85
  else                         var %alphabet.in %alphabet.64 , %fake.base 64
  if (X isin $3) var %alphabet.in $left(%alphabet.in,%inbase) , %fake.base %inbase
  if ((%inbase == 16) && (0x* iswm %num)) var %num $mid(%num,3)
  if (!$qbase_validate_alphabet(%num,%alphabet.in)) goto syntax

  if (%outbase > %fake.base) var %fake.base $v1
  if     (%outbase isnum  2-36) var %alphabet.out %alphabet.36
  elseif (%outbase ==       52) var %alphabet.out %alphabet.52
  elseif (%outbase isnum 37-62) var %alphabet.out %alphabet.62
  elseif (%outbase isnum    85) var %alphabet.out %alphabet.85
  elseif (%outbase isnum 63-64) var %alphabet.out %alphabet.64
  ; for alphabets beginning with not-zero, use 1st digit of alpabet
  var %out.zero $left(%alphabet.out,1)

  bset -tc &qbase.alphabet.in  1 %alphabet.in
  bset -tc &qbase.alphabet.out 1 %alphabet.out

  ; support size of max(fakebase,inbase,outbase)
  var %carry , %working , %out.num , %pocketmax $calc( 9007199254740991 // %fake.base )
  ; all lines mentioning maxcarry are just debug to make sure didn't overflow 2^53
  var %QBase.maxcarry 0 , %ticks $ticksqpc , %debug

  if (x isin $3) {
    if ( %inbase  == %outbase) {
      ; faster string manipulation for commonly used zero padding like $base(num,16,16,8)
      var %out.num %num
      if (%out.zero $+ ?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^ $+ %out.zero $+ +(?!$),)
      goto display_it
    }
    if (%inbase == 10) {
      ; Optimizes Base 10->anything to avoid mult/div/mod
      var %n $len(%pocketmax) - 01 , %-n 0 - %n , %carry_mult 10 ^ %n
      while (%num != $null) { var %working $right(%num,%n) %working, %num $left(%num,%-n) }
      goto ten_to_outbase
    }
    :continue_from_inbase_64
    if (%inbase == 16) {
      if (%inbase %outbase == 16 64) {
        ; Optimizes Base 16->64 to avoid mult/div/mod 6@4 -> mime 4@6 bytes
        if ($calc((((6- $len(%num)) % 6)+6) % 6)) var %num $str(0,$v1) $+ %num
        bset -c &qbase.tmp 1 $regsubex(foo,%num,/(..)/g,$base(\t,16,10) $+ $chr(32) )
        noop $encode(&qbase.tmp,bm) | var %out.num $bvar(&qbase.tmp,1-).text
        if (A?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^A+(?!$),)
        goto display_it
      }
      if (%inbase %outbase == 16 2) {
        ; Optimizes Base 16->2 to avoid mult/div/mod
        var %out.num $replacexcs(%num,0,0000,1,0001,2,0010,3,0011,4,0100,5,0101,6,0110,7,0111,8,1000,9,1001,A,1010,B,1011,C,1100,D,1101,E,1110,F,1111)
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }

      if (%inbase == 16) {
        ; Optimizes Base 16->10 to avoid mult/div/mod
        if     (%fake.base isnum  2-16) var %n 12
        elseif (%fake.base isnum 17-36) var %n $len($base(%pocketmax,10,16)) - 1
        else {
          var %carry_mult %fake.base , %n 1
          while ($calc(%carry_mult * %fake.base) < %pocketmax) { var %carry_mult $v1 | inc %n }
        }
        var %-n 0 - %n , %carry_mult 16 ^ %n
        while (%num != $null) { var %working $base($right(%num,%n),16,10) %working, %num $left(%num,%-n) }
        goto ten_to_outbase
      }
    }
    if (%inbase == 64) {
      if ( %inbase %outbase == 64 16) {
        ; Optimizes Base 64->16 to avoid mult/div/mod
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(A,$v1) $+ %num
        bset -tc &qbase.tmp 1 %num | noop $decode(&qbase.tmp,bm)
        var %out.num $regsubex(foo,$bvar(&qbase.tmp,1-),/([0-9]+) ?/g,$base(\t,10,16,2))
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }
      if ( %inbase == 64notfaster ) {
        ; Optimizes Base 64->* to avoid mult/div/mod
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(A,$v1) $+ %num
        bset -tc &qbase.tmp 1 %num | noop $decode(&qbase.tmp,bm)
        var %num $regsubex(foo,$bvar(&qbase.tmp,1-),/([0-9]+) ?/g,$base(\t,10,16,2))
        var %inbase 16 | goto continue_from_inbase_64 | goto from64
      }
    }
    if (%inbase == 2) {
      if (%inbase %outbase == 2 16) {
        ; Optimizes Base 2->16 to avoid mult/div/mod replace binary with hex digit after making mult of 4
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(0,$v1) $+ %num
        var %out.num $replacexcs(%num,0000,0,0001,1,0010,2,0011,3,0100,4,0101,5,0110,6,0111,7,1000,8,1001,9,1010,A,1011,B,1100,C,1101,D,1110,E,1111,F)
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }
    }
    :from64
    if ( %fake.base isnum 2-36) {
      var %n $len($base(%pocketmax,10,%inbase)) - 1 , %-n 0 - %n , %carry_mult %inbase ^ %n
      while (%num != $null) { var %working $base($right(%num,%n),%inbase,10) %working, %num $left(%num,%-n) }
      goto ten_to_outbase
    }
  }
  ; handle the un-optimized inbase=2-64 outbase=2-64
  var %tmp %fake.base , %n 1 - 0
  while ($calc(%tmp * %inbase) < %pocketmax) { var %tmp $v1 | inc %n }
  var %-n 0 - %n , %carry_mult %inbase ^ %n
  while (%num != $null) { var %snip $right(%num,%n) , %buildworking 0 , %i 1 , %num $left(%num,%-n)
    while  ($mid(%snip,%i,1) != $null) { var %digitval $bfind( &qbase.alphabet.in ,   1, $v1 ).textcs - 1 , %buildworking $calc( ( %buildworking * %inbase ) + %digitval ) | inc %i }
    ;while ($mid(%snip,%i,1) != $null) { var %digitval $poscs( %alphabet.in       , $v1,   1 )        - 1 , %buildworking $calc( ( %buildworking * %inbase ) + %digitval ) | inc %i }
    if (%buildworking > %QBase.maxcarry) { var %QBase.maxcarry %buildworking }
    var %working %buildworking %working
  }
  :ten_to_outbase
  var %debug %n %debug time %inbase -> 10:  $calc($ticksqpc - %ticks)  $qt(%working)

  if (%inbase %outbase == 10 10) {
    var %out.num $regsubex(foo,%working,/([^ ]+) ?/g,$right($+($str(0,%n),\t),%n))
    goto display_it
  }
  :next_outbase_digit
  ; strip leading all-zeroes pockets aka $gettok(%working,2-,32)
  while (0 * iswm %working) { var %working $mid(%working,3) }

  var %carry 0 , %i 0 , %tokens $numtok(%working,32) | while (%i < %tokens) {
    inc %i | var %temp $calc( $gettok(%working,%i,32) + (%carry * %carry_mult) )
    if (%temp > %QBase.maxcarry) { var %QBase.maxcarry $v1 }
    var %carry $calc(%temp % %outbase) , %temp2 $calc(%temp // %outbase) , %working $puttok(%working,%temp2,%i,32)
  }
  :handled_digit
  if ((%working %carry != 0 0) || (%out.num == $null)) var %out.num $bvar(&qbase.alphabet.out,$calc(1+%carry)).text $+ %out.num
  if (%working != 0) goto next_outbase_digit
  :display_it
  if (%QBase.maxcarry > $calc(2^53)) { echo -ag 8,4 Warning: probable math error using value above 2^53 %QBase.maxcarry Please let maroon know which number/inbase/outbase produces this $1 $2 $3 n: %n fb: %fake.base | halt }
  if ($len(%out.num) < %pad) var %out.num $str(%out.zero,$calc($v2 -$v1)) $+ %out.num
  if ($5) echo 3 -ag %debug tot $2 -> $3 time:  $calc($ticksqpc - %ticks)  n: %n
  if (%v1 == %v2) noop | return %out.num
  :syntax
  echo -sc info2 *$QBase(number,inbase,outbase [,pad length N][,anything=debug info]) $v1 vs $v2 : fb %fake.base n %n : $1-3
  halt
}

maroon #270964 01/11/22 06:45 PM
Joined: Jan 2004
Posts: 2,127
Hoopy frood
Offline
Hoopy frood
Joined: Jan 2004
Posts: 2,127
Qbase speed doubles!

I tried out something that I'd mentioned in the prior post as an optimization to make Qbase faster, and it really works better than I'd hoped.

The normal method used by $base2 and Qbase to create the display-string is to take the big number in the internal state and whittle it down by continuously dividing it by 10 or 16 or whatever is the outbase. So if the number is N digits long, that forces you to go through that step N times. The main difference between $base2/$base3 and $Qbase is that Qbase groups several digits into chunks so that each of those N digit-shavings has fewer loops to accomplish the task. While it does get faster for the final digits as you've whittled it to a much smaller number, it can still take a lot of time.

The new change is to simply whittle it down by outbase^squared, so you can gather 2 digits at a time instead of just 1. For example, when translating to base16, instead of having a number from 0-15 that's your 1 shaved digit, you now have instead a number 0-255 that you turn into 2 digits. The time needed to shave a digit out of the internal number is pretty much the same as the time to shave 2 digits, regardless if you're dividing by 16 or 256, so this would be close to half the time.

It's closer to 60% because there's a minor bit of extra work to chop the 0-256 into 2 numbers before finding them both in the out-alphabet. Plus, the time is higher than 50% because doing this requires the pockets of the internal state be 1/outbase the size they formerly could be, so for big numbers this can result in there being additional pocket chunks in the chain, and the additional chain link comes with the time needed for an extra loop.

Also this requires a final step at the finish because, now that you're processing outbase digits in pairs, this always results in an even-length string with a leading 0 if the string would've otherwise had an odd-length. So there's a minor extra step to trim off a leading zero.

This means that re-running the benchmark timings for the table in the previous post will have the times in the 'E' column be somewhere from 50-60% of the time listed in the table.

Note that the times in 'E' are greatly affected by the overhead of the scripting engine itself, due to the need to having many commands using $calc in order to simulate something that can be done with a single $calc by the 'B' column except for the ^ operator not working for numbers above 2^256 or so, and the way to get accurate integers at that scale from MAPM is to ramp up the precision for fractions to an extreme level, due to the internal MAPM method of doing division by creating 1/divisor as a long fraction so you can multiply by it, and the extra precision for the longer fraction makes it take longer for both that division and for the multiply-by-it. That can be a shortcut that's not noticeable for floats where any rounding is below the 30 digits of display, but it does affect calculations which need zero precision digits to the right of a decimal but need the left side to be accurate.

Also as an example of the overhead imposed on Qbase by doing extra commands, are the commands where I have 2 places that I'm checking to make sure that %QBase.maxcarry does not exceed another number which would cause an inaccurate doubles math result above 2^53. All this did was "if (var1 > var2) set var2 = var1", and none of this is happening in /bigfloat mode. When i simply commented out those 2 lines in the script, the benchmark time dropped by around 5%

So, I believe that if $base is internally doing things similar to how is done in column "B" (which is $base3 using $calc instead of the pow_3 alias scripted replacement), and could switch to using things more similar to column "E", that this would also speed up handling of large numbers. For the bottom row of the benchmark table that's using a 2048-bit number, these 2 changes changed the benchmark time from 1400 to 720.

There would be diminishing returns from trying to shave off 3 digits at a time, since the gain from 1/1 -> 1/2 is greater than from 1/2 -> 1/3, and processing 3 digits at a time instead of 2, is more than 3/2 the complication.

---

Below is the updated Qbase, where the changes mentioned here are speeding things up - only if you have $3 be like 10x or 16x instead of just the integer. I've also commented-out the double-check to see if an internal variable is exceeding 2^53, so you won't see that warning unless you find the comments having the '+++' and change these back to being not-comments.

--
Code
; only alias called by $Qbase is the validation at top
; $Qbase(number,inbase,outbase,zeropad,1=debug)
; bases can be 2-64 or 85, with zero being '0' only in base 2-36
; if $3 outbase contains 'X' then enables example speed optimizations for certain in/out combos
; strict enforcement of N-digit input number if $3 contains X or if either inbase/outbase > 36
alias qbase_validate_alphabet { var %pattern $+(^[,$2,]+$) | return $regex(foo,$1,%pattern) }
alias qBase {
  var %num $1 , %inbase $remove($2,X) , %outbase $remove($3,X) , %pad 0 $+ $4 , %v1 $v1 , %v2 $v2
  if (%inbase == 16) var -s %num $remove(%num,:,$cr,$lf,$chr(32))
  ;  clipboard %num
  if (($replace(%inbase,85,2) !isnum 2-64) || ($replace(%outbase,85,2) !isnum 2-64) || (%pad !isnum 0- $maxlenl) || (%num == $null)) goto syntax
  var %alphabet.36 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
  var %alphabet.52 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  var %alphabet.62 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  var %alphabet.64 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
  var %alphabet.85 !&'*+-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{}~

  if (!$regex(foo,%num %inbase %outbase %pad,^[A-Za-z0-9/+]+ [0-9]+ [0-9]+ [0-9]+)) goto syntax
  if     (%inbase isnum  2-36) var %alphabet.in %alphabet.36 , %fake.base 36 , %num $upper(%num)
  elseif (%inbase ==       52) var %alphabet.in %alphabet.52 , %fake.base 52
  elseif (%inbase isnum 37-62) var %alphabet.in %alphabet.62 , %fake.base 62
  elseif (%inbase isnum 37-62) var %alphabet.in %alphabet.85 , %fake.base 85
  else                         var %alphabet.in %alphabet.64 , %fake.base 64
  if (X isin $3) var %alphabet.in $left(%alphabet.in,%inbase) , %fake.base %inbase
  if ((%inbase == 16) && (0x* iswm %num)) var %num $mid(%num,3)
  if (!$qbase_validate_alphabet(%num,%alphabet.in)) goto syntax

  if (%outbase > %fake.base) var %fake.base $v1
  if     (%outbase isnum  2-36) var %alphabet.out %alphabet.36
  elseif (%outbase ==       52) var %alphabet.out %alphabet.52
  elseif (%outbase isnum 37-62) var %alphabet.out %alphabet.62
  elseif (%outbase isnum    85) var %alphabet.out %alphabet.85
  elseif (%outbase isnum 63-64) var %alphabet.out %alphabet.64
  ; for alphabets beginning with not-zero, use 1st digit of alpabet
  var %out.zero $left(%alphabet.out,1)

  bset -tc &qbase.alphabet.in  1 %alphabet.in
  bset -tc &qbase.alphabet.out 1 %alphabet.out

  ; support size of max(fakebase,inbase,outbase)
  var %squared.outbase 1
  if (X isin $3) var %squared.outbase %outbase
  var %carry , %working , %out.num , %pocketmax $calc( 9007199254740991 / %squared.outbase // %fake.base )
  ; all lines mentioning maxcarry are just debug to make sure didn't overflow 2^53
  var %QBase.maxcarry 0 , %ticks $ticksqpc , %debug

  if (x isin $3) {
    if ( %inbase  == %outbase) {
      ; faster string manipulation for commonly used zero padding like $base(num,16,16,8)
      var %out.num %num
      if (%out.zero $+ ?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^ $+ %out.zero $+ +(?!$),)
      goto display_it
    }
    if (%inbase == 10) {
      ; Optimizes Base 10->anything to avoid mult/div/mod
      var %n $len(%pocketmax) - 01 , %-n 0 - %n , %carry_mult 10 ^ %n
      while (%num != $null) { var %working $right(%num,%n) %working, %num $left(%num,%-n) }
      goto ten_to_outbase
    }
    :continue_from_inbase_64
    if (%inbase == 16) {
      if (%inbase %outbase == 16 64) {
        ; Optimizes Base 16->64 to avoid mult/div/mod 6@4 -> mime 4@6 bytes
        if ($calc((((6- $len(%num)) % 6)+6) % 6)) var %num $str(0,$v1) $+ %num
        bset -c &qbase.tmp 1 $regsubex(foo,%num,/(..)/g,$base(\t,16,10) $+ $chr(32) )
        noop $encode(&qbase.tmp,bm) | var %out.num $bvar(&qbase.tmp,1-).text
        if (A?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^A+(?!$),)
        goto display_it
      }
      if (%inbase %outbase == 16 2) {
        ; Optimizes Base 16->2 to avoid mult/div/mod
        var %out.num $replacexcs(%num,0,0000,1,0001,2,0010,3,0011,4,0100,5,0101,6,0110,7,0111,8,1000,9,1001,A,1010,B,1011,C,1100,D,1101,E,1110,F,1111)
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }

      if (%inbase == 16) {
        ; Optimizes Base 16->10 to avoid mult/div/mod
        ; if     (%fake.base isnum  2-16) var %n 12
        ; elseif (%fake.base isnum 17-36) var %n $len($base(%pocketmax,10,16)) - 1
        if     (%fake.base isnum  2-36) var %n $len($base(%pocketmax,10,16)) - 1
        else {
          var %carry_mult %fake.base , %n 1
          while ($calc(%carry_mult * %fake.base) < %pocketmax) { var %carry_mult $v1 | inc %n }
        }
        var %-n 0 - %n , %carry_mult 16 ^ %n
        while (%num != $null) { var %working $base($right(%num,%n),16,10) %working, %num $left(%num,%-n) }
        goto ten_to_outbase
      }
    }
    if (%inbase == 64) {
      if ( %inbase %outbase == 64 16) {
        ; Optimizes Base 64->16 to avoid mult/div/mod
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(A,$v1) $+ %num
        bset -tc &qbase.tmp 1 %num | noop $decode(&qbase.tmp,bm)
        var %out.num $regsubex(foo,$bvar(&qbase.tmp,1-),/([0-9]+) ?/g,$base(\t,10,16,2))
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }
      if ( %inbase == 64notfaster ) {
        ; Optimizes Base 64->* to avoid mult/div/mod
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(A,$v1) $+ %num
        bset -tc &qbase.tmp 1 %num | noop $decode(&qbase.tmp,bm)
        var %num $regsubex(foo,$bvar(&qbase.tmp,1-),/([0-9]+) ?/g,$base(\t,10,16,2))
        var %inbase 16 | goto continue_from_inbase_64 | goto from64
      }
    }
    if (%inbase == 2) {
      if (%inbase %outbase == 2 16) {
        ; Optimizes Base 2->16 to avoid mult/div/mod replace binary with hex digit after making mult of 4
        if ($calc((((4- $len(%num)) % 4)+4) % 4)) var %num $str(0,$v1) $+ %num
        var %out.num $replacexcs(%num,0000,0,0001,1,0010,2,0011,3,0100,4,0101,5,0110,6,0111,7,1000,8,1001,9,1010,A,1011,B,1100,C,1101,D,1110,E,1111,F)
        if (0?* iswmcs %out.num) var %out.num $regsubex(foo,%out.num,^0+(?!$),)
        goto display_it
      }
    }
    :from64
    if ( %fake.base isnum 2-36) {
      var %n $len($base(%pocketmax,10,%inbase)) - 1 , %-n 0 - %n , %carry_mult %inbase ^ %n
      while (%num != $null) { var %working $base($right(%num,%n),%inbase,10) %working, %num $left(%num,%-n) }
      goto ten_to_outbase
    }
  }
  ; handle the un-optimized inbase=2-64 outbase=2-64
  var %tmp %fake.base , %n 1 - 0
  while ($calc(%tmp * %inbase) < %pocketmax) { var %tmp $v1 | inc %n }
  var %-n 0 - %n , %carry_mult %inbase ^ %n
  while (%num != $null) { var %snip $right(%num,%n) , %buildworking 0 , %i 1 , %num $left(%num,%-n)
    while  ($mid(%snip,%i,1) != $null) { var %digitval $bfind( &qbase.alphabet.in ,   1, $v1 ).textcs - 1 , %buildworking $calc( ( %buildworking * %inbase ) + %digitval ) | inc %i }
    ;while ($mid(%snip,%i,1) != $null) { var %digitval $poscs( %alphabet.in       , $v1,   1 )        - 1 , %buildworking $calc( ( %buildworking * %inbase ) + %digitval ) | inc %i }
    ;debug warning +++ if (%buildworking > %QBase.maxcarry) { var %QBase.maxcarry %buildworking }
    var %working %buildworking %working
  }
  :ten_to_outbase
  var %debug %n %debug time %inbase -> 10:  $calc($ticksqpc - %ticks)  $qt(%working)

  if (%inbase %outbase == 10 10) {
    var %out.num $regsubex(foo,%working,/([^ ]+) ?/g,$right($+($str(0,%n),\t),%n))
    goto display_it
  }

  if ((X isin $3) && (%outbase isnum 2-36)) {
    var %outbase^2 %outbase ^ 2
    :next_outbase_digitsq
    ; strip leading all-zeroes pockets aka $gettok(%working,2-,32)
    while (0 * iswm %working) { var %working $mid(%working,3) }

    var %carry 0 , %i 0 , %tokens $numtok(%working,32) | while (%i < %tokens) {
      inc %i | var %temp $calc( $gettok(%working,%i,32) + (%carry * %carry_mult) )
      ;debug warning +++ if (%temp > %QBase.maxcarry) { var %QBase.maxcarry $v1 }
      var %carry $calc(%temp % %outbase^2) , %temp2 $calc(%temp // %outbase^2) , %working $puttok(%working,%temp2,%i,32)
    }
    :handled_digitsq
    if ((%working %carry != 0 0) || (%out.num == $null)) {
      var %dig1 $calc(%carry % %outbase)
      var %dig2 $calc(%carry // %outbase)
      var %out.num $bvar(&qbase.alphabet.out,$calc(1+%dig1)).text $+ %out.num
      var %out.num $bvar(&qbase.alphabet.out,$calc(1+%dig2)).text $+ %out.num
      ; var %out.num $bvar(&qbase.alphabet.out,$calc(1+%carry)).text $+ %out.num
    }
    if (%working != 0) goto next_outbase_digitsq
    if (%out.zero $+ *? iswm %out.num) var %out.num $mid(%out.num,2)
    goto display_it
  }
  else {
    :next_outbase_digit
    ; strip leading all-zeroes pockets aka $gettok(%working,2-,32)
    while (0 * iswm %working) { var %working $mid(%working,3) }
    var %carry 0 , %i 0 , %tokens $numtok(%working,32) | while (%i < %tokens) {
      inc %i | var %temp $calc( $gettok(%working,%i,32) + (%carry * %carry_mult) )
      ;debug warning +++ if (%temp > %QBase.maxcarry) { var %QBase.maxcarry $v1 }
      var %carry $calc(%temp % %outbase) , %temp2 $calc(%temp // %outbase) , %working $puttok(%working,%temp2,%i,32)
    }
    :handled_digit
    if ((%working %carry != 0 0) || (%out.num == $null)) var %out.num $bvar(&qbase.alphabet.out,$calc(1+%carry)).text $+ %out.num
    if (%working != 0) goto next_outbase_digit
  }
  :display_it
  ;debug warning +++  if (%QBase.maxcarry > $calc(2^53)) { echo -a 8,4 Warning: probable math error using value above 2^53 %QBase.maxcarry Please let maroon know which number/inbase/outbase produces this $1 $2 $3 n: %n fb: %fake.base  %working | halt }
  if ($len(%out.num) < %pad) var %out.num $str(%out.zero,$calc($v2 -$v1)) $+ %out.num
  if ($5) echo 3 -ag %debug tot $2 -> $3 time:  $calc($ticksqpc - %ticks)  n: %n
  if (%v1 == %v2) noop | return %out.num
  :syntax
  echo -sc info2 *$QBase(number,inbase,outbase [,pad length N][,anything=debug info]) $v1 vs $v2 : fb %fake.base n %n : $1-3
  halt
}

maroon #271181 08/01/23 02:07 PM
Joined: Jan 2004
Posts: 2,127
Hoopy frood
Offline
Hoopy frood
Joined: Jan 2004
Posts: 2,127
Thought I'd posted this, but can't find it when trying to refer someone to it...

Thanks to Talon for pointing out that Qbase wasn't actually working for inbase=85, just a few lines to change in the last version of the Qbase script:

Code
old:
alias qbase_validate_alphabet { var %pattern $+(^[,$2,]+$) | return $regex(foo,$1,%pattern) }
new:
alias qbase_validate_alphabet { var %pattern $+(^[\Q,$replacexcs($2,\E,\E\\E\Q),\E]+$) | return $regex(foo,$1,%pattern) }

old:
elseif (%inbase isnum 37-62) var %alphabet.in %alphabet.85 , %fake.base 85
new:
elseif (%inbase isnum    85) var %alphabet.in %alphabet.85 , %fake.base 85

old:
if (!$regex(foo,%num %inbase %outbase %pad,^[A-Za-z0-9/+]+ [0-9]+ [0-9]+ [0-9]+)) goto syntax
new:
if (!$regex(foo,%num %inbase %outbase %pad,^\S+ [0-9]+ [0-9]+ [0-9]+)) goto syntax



Link Copied to Clipboard