$Qbase $base3

Below are 2 aliases related to $base2 which may help $base itself be faster in general, or can show $base how to optimize speed for certain inbase/outbase combos. Qbase is a demo showing how $base can perform accurate translations between bases without ever touching a number larger than 2^53, and demonstrates how it's easy to extend support to more bases than 2-36.

* *

The first alias is $base3, which is much faster than $base2, but accomplishes that by replacing most of the doubles-mode-only aliases with $calc itself doing math in bigfloat mode. That means $base3 can return correct results only if using beta 1603+later. $base3 still needs to use an alias to avoid using the ^ operator because that can still produce wrong results for large outputs, specifically those longer than 2^256-or-so.

* *

The 2nd alias is Qbase, which like $base2 is compatible with doubles-only past versions, so $base2 and $qbase can both return accurate results from v7.71-no-beta before $bigfloat existed.

However, Qbase uses a different design than the design of $base2 and $base3, it can be faster in doubles mode than $base2.

Even though $qbase does not use any bigfloat features, it is competitive with the speed of $base3-pow3, even though $base3 is able to substitute a single $bigfloat $calc() in place of many doubles-range $calc() being done by $qbase. So, perhaps there are things in the $qbase design which could help bigfloat mode $base become faster.

The original idea for $base3 was to make $base2 more speed competitive, and I assumed I could modifying the $bf_modpow script to calculate the numbers the same way except without using a modulus. I hoped that would let me use it as an alias that I could hotswap it in place of the $pow_ alias used by $base2, which seemed to be the source of the $base2 slowness and the rounding problems in $base. To make it doubles-safe, pow_3 needed to use the other 'multiply_' alias used by $base2. However some of the other aliases used by multiply_ were designed to only handle when 1 of the terms was small, so when they were fed the square-and-multiply output from $pow_3, they couldn't handle it.

So instead, I did a further make-over by keeping pow_3 as a replacement for the problematic ^ operator, but used bigfloat $calc math in place of the other doubles-only aliases. So now $base3 uses only pow_3 as an alias, and only does that because the ^ operator in $calc isn't always correct. As soon as beta is published where ^ is fixed, then $qbase can easily jump to use that instead.

Quote
Possible fix for $calc ^

When the exponent is an integer, it's possible that the $calc ^ operator using something like pow_3 as the back-side would be fast and accurate because there's nothing non-integer about this, it's just multiplying and then checking to see if all the bits of the exponent are used up. The native code likely could replace the $calc(%base.bf * %base.bf) with using m_apm_square as I mentioned for $powmod. Note that this is just quick validation that doesn't account for negatives. For that, base^1/integer could use 1/(base^integer).

Quote
alias calc_^_operator return pow_3($1,$2)

alias pow_3 {
var %base.bf $1 , %exponent.bf $2 , %answer.bf 1
if (%exponent.bf < 2) { if ($2 == 0) return 1 | if ($2 == 1) return $1 }
if (%base.bf < 2) return $v1
while (%exponent.bf) {
if ($calc(%exponent.bf % 2)) var %answer.bf $calc(%answer.bf * %base.bf)
var %exponent.bf $calc(%exponent.bf //2)
if (%exponent.bf) var %base.bf $calc(%base.bf * %base.bf)
}
return %answer.bf
}

As demo programs, $base3 and $Qbase also show how $base can easily be modified to support additional 'number bases'.

There have occasionally been scripters having the need for something that can compress a lot of bits into a shorter text string. For one example, they had a huge database that was too large to keep in memory, so they wanted to keep hash(long strings) in ram and put the full data on disk. Then when they need to search if they have something, they can search their index in ram to see if there's a matching hash, and if so, they can do the slow disk read to see the data associated with their match, rather than searching a huge disk file for every no-match.

They wanted an index that was as condensed as possible, so they made a hex hash that they used $base to translate to base36, but that was still 'too big', so they were wanting something even more condensed.

So, $base3 and $Qbase show how to translate to/from all bases from 2 through 64 without much difference from the code that doesn't support it. The most complicated part was handling the fact that '0' isn't zero in all of those bases, and that wasn't difficult at all. And I could easily have supported bases larger than 64 by simply defining a bigger alphabet for those larger bases, and I did to an example alphabet for base 85. For base 2-36 I continued letting it have the quirky behavior of letting all 36 bases use all 36 letters in the input string, but since base37+ need to use the lower-case alphabets, those bases can't be case-insensitive inbases like how $base handles 2-36 now.

For base 63-64, I let them use the same alphabet that $encode base64 does, so you can translate between hex and mime by using base 16 and 64, but only when the length of input base=16 is a multiple of 6 or input base=64 string is a multiple of 4.

I interrupted the bases 37-62 so that 52 would use a case-sensitive alphabet containing only the upper+lower alpha characters, which is another case where zero is not '0'.

As with $base2, these aliases support ONLY integers correctly. For Qbase I added tighter restriction for input values, so they reject anything that's not in whatever alphabet they're using, because that lets me add cheater shortcuts. So floats are invalid because '.' isn't in the alphabet.

Quote
Possible alternate way for $base to handle fractions!

My assumption is that the vast majority of usage of $base against large numbers is when they are integer, and any use of $base where a fraction is wanted in another base is most likely going to be for base=10 and hex, so it seems reasonable that $base can have an optimized handler for integers, and the input float strings could get treated like red-headed step-children.

However, if it's not already doing this, the built-in $base can be easily modified to handle big floats efficiently by handling the input as 2 separate string manipulations. It could use the Qbase style to handle $gettok(input,1,46). If there's no decimal following it, then that's the only string to return, barring any zero padding.

If $base isn't doing so already, a fast way to translate digits of a float between bases is to treat the float number Integer.Frac1 as 2 separate integer base conversions, which enables $base to pretend it doesn't even know what a fraction is, while still outputting accurate fractions.

The 'Integer' portion could be converted between bases as if it were the entire input, then the input fraction to the right of the decimal could be adapted based on the inbase/outbase, and then translated from the intermediate number into a 'fake integer' encoded as the outbase.

This method could easily be handled by an alias that would call $Qbase to get the conversion for the Integer portion, and then if there's a fraction, the alias could do big-integer math to modify the fraction string as demonstrated below, and then call $Qbase again to translate that also into the new base. The alias would then return %result1 if no fraction, or return $+(%result1,.,%result2) if there is.

The method is to simply treat Frac1 as an inbase integer whose length is Frac1D. If Inbase is not 10, then Frac1 would need to be translated to base 10. The calculation would be slightly different than mentioned if the input is a negative.

The executable code would do this slightly differently if they use an internal format that's not base 10, but the principle is the same. After calculation of...

Frac2 = $round($calc(Frac1 * (inbase^Frac1D) / outbase^Frac1D),0)

you can then do integer base conversion against Frac2 as if an integer string that's appended after the decimal. Example:

Quote
$pi .bf = 3.14159265358979323846264338327950288419716939937510
%Integer = 3
%Frac1.bf = 14159265358979323846264338327950288419716939937510
Frac1D = 50
Inbase = 10
Outbase = 16

%Frac2.bf = $calc(14159265358979323846264338327950288419716939937510 * 16^50 // 10^50)

%result_is.bf: 227530621841023045347523864218926005325435324977874670807984
%should_be.bf: 227530621841023045347523864218949028648034440140275111685147

The above %result_is.bf calculation will almost certainly return the correct answer along with the fix of $calc's ^ operator. So treat the %should_be.bf result as an integer that needs to be translated from base10 to base16

Quote
%fraction.bf = . $+ $base(%should_be.bf,10,16)

//var -s %fraction.bf = . $+ $base(227530621841023045347523864218949028648034440140275111685147,10,16)

%fraction.bf is: .243F6A8885A308D313198A2E03707344A40938222771A7EC1B
true hexdigit::: .243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98

This 'integer' is small enough to be within the accuracy range of $base(), so comparing the base conversion of %fraction.bf with the actual hex digits of PI found in Blowfish source code, shows that the first 41 digits of the fake base16 fraction pseudo-integer are accurate compared to a 50-digit base10 fraction. Which of course is reasonable since it's now in a higher base, just like translating from base10 to base2 would have many more accurate digits. The expected accuracy when converting a 50-digit base10 fraction to base 16 would be

50 * $log(10) / $log(16) = 41.52 digits, so the above has the expected level of precision, indicating this cheat is legit.

Unfortunately, both $base2 and $qbase generate the output by generating the leftmost digit last of all, so you can't just stop the .fraction result when you have enough digits, except perhaps from using (Frac2 // (outbase^something) ) to shorten the result you feed to $base, but I've not tested how that affects the output, such as whether that should be division rounded to 'nearest integer' or just chopped.

* *

$qbase could potentially be a different way for $base to translate large integers between bases accurately without doing any math involving a number outside the doubles range. In fact, I have debugging code inside the alias to detect if it ever ends up using a number larger than 2^53, which would mean the answer is almost surely wrong, and I need to fix something.

Any native code that translates between bases would likely use either the $base2 method or the $qbase method, except the executible's intermediate state between inbase and outbase would be a native data-type such as uint32, because it just degrades performance if it needs to translates an integer to double or 'text' just to do math on it, before then translating back to uint32.

The steps to translating between bases are to translate the input number out of 'inbase' into whatever format it needs to have for its internal intermediate state, before then translating that internal state into whatever is the outbase, 1 digit at a time.

For $base2 and $base3, the internal state is a base10 string identical to what outbase=10 would use, so the internal state gets a free 'i am done' when outbase=10.

For $qbase, the internal state is a series of chunk pockets chained together where each holds a base10 string whose value can be as large as inbase^N, and are chunks of the actual output like $base2 does - only if both inbase=outbase=10. But that's of little help since below shows there's a much faster way to optimize that combo when it doesn't need to support $base(ZZ,10,10)

The longest part of the base conversion for both methods is converting the input number from the 'inbase' to the internal storage format.

Qbase does things somewhat differently than $base2 does, more like a pen-and-paper long division, except each 'digit' in the 1st number is now itself a number that clusters together several inbase digits without having ( (number_A + carry) / number_B) being a result above 2^53. This has 2 advantages over using m_apm_integer_pow or base2's pow_ calculations against huge numbers.

One advantage is that it doesn't need to use exponents in order to correctly handle very large numbers. It uses the add/subtract/multiply/divide/modulo calculations done by $base2 and $base3, but it always uses them against small-enough numbers whose results that are not too large to overflow whatever is the datatype used internally, which for this script is 2^53, and for the executable would probably be uint32.

Also, by grouping several input digits together, this means there can be fewer than 1/10th as many links in the calculation chain in order to arrive at each digit. You can look at this another way, from the example of conversion to base 16, where you need to keep dividing a big number by 16 in order to get each digit. You could instead divide by 256's and end up with something you can turn into two hex digits, which means can get the 2 digits from the same price as needed to get just the 1 digit, and you'll finish quicker.

In fact, that's an optimization I hadn't yet tried to do. However, my method allows each pocket to hold numbers very close in size to 2^53, while the method of grabbing several output digits at the same time would limit the size of the number in each pocket chunk. It would also need to sanitize the final number to avoid having leading 0's that shouldn't be there.

When printing a number to outbase=36 that's stored internally as a 100-digit integer, $base3 - and probably $base itself - perform a single $calc(big-integer % 36) calculation to obtain that first digit of the base36 result. $base2 would use 100 divisions in order to get that same result. $Qbase instead stores that 100 digit number as somewhere around 8-9 different chunks depending on the inbase, and would only need those 8 or 9 $calc(double % 36) to get the same result. The $qbase timing is competitive with the $base3 time, in spite of being penalized by the overhead of using many more extra scripting commands, and there's not much difference in the timing between $calc(small % 10) or $calc(big % 10).

I believe that native code doing big-math operations like $base3 does could be faster if it switched to a series of uint32 calculations of the style Qbase is doing, instead.

Since $base appears to be affected by the same thing that caused problems for the $calc ^ operator, it looks like $base could benefit from doing something like $qbase does, which internally does several small calculations instead of 1 really big one.

The speed advantages for the $base3 vs $qbase method can be different for different lengths. The $base3 calculations begins to overcome the speed of $qbase in the benchmark table below when the string size becomes really large, but again, that might simply be a penalty from the scripting engine. As an exampe of the relative speeds:

number = $sha512(abc) base16 -> 10

$base2 = 11700 ticks
$base3 = 15 ticks using $calc ^ with errors above 2^260-ish
$base3 = 79 ticks using scripted pow_3 in place of ^ exponent
$qbase = 115 ticks

But at 128-bit length, $qbase had the speed advantage over $base3, and the $Qbase disadvantage at 512 could disappear if both alternatives were as C code.

I'm not ragging on $base2, especially since $base3 clones its design exactly except for the subroutine aliases getting called. Just noting the comparison that if $base is using elements of a method similar to $base2 or $base3 that can be switched to the Qbase method, that might speed things up for $base while avoiding any any exponentiation slowdowns.

* *

Quote
Qbase speedups

Qbase also has some additional features not seen yet in $base, and some of them could be grafted into $base:

* Support for additional bases up to 64 (And I could have added a base94 alphabet to use all the characters from '!' through '~' but it can be tricky for scripts to deal with numbers that could translate to a base94 string like an actual pair of double quotes, a single | character etc. Included in Qbase is a sample 85-item 'base' that avoids using most of the symbols having special evaluation by mIRC, and is the right size of an alphabet to translate between 5 text and 4 binary. This means if you convert from hex to base85, each 8 hex digits is zero padded to 10 base85 digits, and I could have included an optimization to handle conversion between base=10 and base=85 in chunks of these sizes.

* Optional 'enhanced mode', which disallows invalid characters that can't be generated when that inbase is used as the outbase, such as $base(zzz,2,10)

* Within 'enhanced mode', certain inbase/outbase combos can be optimized to be incredibly faster instead of using the default method that needs to support $base(zzz,13,34)

The alias would have been much shorter if the only thing it did was to support bases 2-36 in the mode where it allowed all 36 alphabets to use the same length 36 alphabet. But, Qbase shows there are 2 kinds of speed-up advantages, which can allow $base to increase speed for the 99.9% of $base users who don't need to support invalid numbers when trying to use Z in a base smaller than 36, and who don't need to use the same slow method of translating between bases 10/16 as it would for translating between 17/31.

One kind of optimization depends on specific in/out pairs, but there's also a minor speed benefit by being able to squeeze more digits of the input into each middle-step 'pocket' if you don't need to support smallest inbase numbers potentially having 'Z' be valid.

* *

Normal mode $Qbase(number,10,16) -> Enhanced Mode $Qbase(number,10,16X)

To enter the 'enhanced mode' which enables various speed optimizations, I just chose to make Qbase remove 'X' from $3 when using that as outbase, but then look to see if $3 contains the 'X' when checking to see if it can enter enhanced mode. But, I could've easily just had that be an extra parameter. So $qbase(ZZ,2,10) is still valid, but $qbase(ZZ,2,10X) quits with an error because of the illegal Z character not found in the 2 items of the base=2 alphabet.

I continued using base 2-36 as having the existing 0-9A-Z alphabet. For my additional alphabets, I made 64 use the same alphabet as mime does, but it generates the identical string that the mime in $encode does if the output is a multiple of 4.

$base limits the zero-pad to 100, but since Qbase uses string manipulation, the sky's the limit.

I made base=62 be an alphabet the same as base=36, except where the extra 26 digits is the entire LOWER CASE alpabet, so just like with base64, all bases 37-64 would need to be handled as case-sensitive strings. Bases 37-62 would use anywhere from the single lowercase 'a' in base 37 through all 26 lowercase letters in base 62. For base 63, should anyone actually want to use that, it would be the first 63 digits of the base64 alphabet, since there wasn't a better choice offhand. Though the script can easily be designed to have any alphabet you want for base63.

So, a random alphanumeric string in base36 would be

//var -s %a.bf $rand(0,$calc(36^8 -1)) , %password $qbase(%a.bf,10,36,8)

... and in base62 it would be:

//var -s %a.bf $rand(0,$calc(62^8 -1)) , %password $qbase(%a.bf,10,62,8)

Since there was no backwards compatibility to be maintained for the new Qbases, I made it so that using any of the 37-64 bases has the same behavior as appending 'X' to the outbase, which is to enforce that the input number contains only the N digits of an alphabet matching the N used as inbase. This means that $base(Z,inbase,52) would not be valid with inbase 2-35 because using base 37-64 as either the inbase or outbase flips Qbase into the mode having the strict rules for the input alphabet.

As an override, I gave a different behavior for base=52, which does not contain any numbers, but instead contains the uppercase A-Z followed by the lowercase a-z. So 52 is the only number in the 37-62 bases that doesn't share the same alphabet as its neighbors.

There's nothing in Qbase that prevents it from working if you change the max size for a valid alphabet above 64 and then define the digits of another alphabet to match the new size. You could also do similar as I did with base=52 to define base=32 to match the base32 alphabet used in $encode, or even to make base=64 match the uuencode alphabet in $encode, though you'd need to edit the optimization that uses $encode(,m).

A sample alphabet above base=64 is in Qbase, which has an 85-item alphabet which starts with the 94 characters from "!" through "~" then removes some of them that are likely to cause problems in string evaluations, and after removing "#$%(),\| the remaining characters are an alphabet that could store a number in extremely dense format without being binary or UTF8.

The alias is longer than it would normally be, partly to support the $3 $+ X mode for alphabets needed for base 37-64, as well as additional code as a demo for the kind of optimizations that can be used in $base itself, but only if they're in 'enhanced mode' which invalidates 'illegal' input strings.

* The best example of optimizing $base is:

For example, it's common practice by scripts to do $base(number,10,10,length) where they simply want to zero-pad '0' digits to make smaller integers be at least as long as 'length'. Just like it's commonly used for hex numbers, to pad them to length 8 to make them look like a 32-bit number, etc.

However, $base current behavior always supports the possibility the input could have illegal alphabet letters that need to be given values 10-35, so $base is always translating 10-to-10 in a mode that translates the input number from base10 to internal binary, then decodes that to the output format, just in case someone wanted $base(Z,10,10,5) to be output as 00035. And only at that point does $base look to see if it needs to insert some extra zero padding. You can also see the same thing going on when handling long hex strings, where $base($sha512(abc),16,16) was always returning the original hex string with a lot of zeroes.

When in the enhanced 'X' mode, if $Qbase sees that inbase and outbase are the same numbers, it simply jumps immediately to the 'display it' line, then checks to see if it should use string manipulation to insert some text '0' in front of it. And the padding is not really the '0', it's whatever is the 1st character of the output alphabet, so where outbase 37-64 have capital 'A' as the 1st digit of their alphabet, those outbase would insert 'A' padding instead. And the demo outbase=85 would insert '!' as padding.

The only things my demo doesn't do is strip leading zeroes from $Qbase(00009,10,10X,3) or make sure base 11-36 strings are uppercase, but that could also be solved by an extra step of string manipulation.

This is an optimize that $base could easily make if it had an enhanced mode and recognized matching bases or certain other in/out pairs that can take advantage of shortcuts that don't require lots of calculations. $base doesn't appear to be doing so, since $base(number,10,10) takes the same time as when changing one of the number digits to a letter.

- -

Some of the other optimizations in the script are things that would be easily adaptable to executable code, since all they do is string manipulation. For example, when converting between base 16 and 2 - other than ensuring 2 as the inbase needs to zero-pad the input number to have a length that's a multiple of 4, going either direction between 16 and 2 can be a simple translation between a hex digit and a lookup-table of 16 items containing a group of 4 text 0/1 chars, and the speed for this is extremely fast both directions compared to translating large numbers between bases 2-to-15 or 16-to-3.

An optimize that Qbase couldn't take advantage of, but that would be available for executable code, would be to translate digits of an inbase=base16 string into the 4-bit 'nibbles' of uint32 binary storage, without needing to do all the multiplication across 'pockets' that would be required for other bases. All it would need to do is pretend there are the needed leading '0' hex digits that would make the input length be compatible with the unit size of the internal binary storage variables.

Inside Qbase i'm using $base itself for quick translation between bases because things like that would be available to executable code, and I'm also using it as a more accurate equivalent than the 6-digit output of $log for determining the max number of digits that can fit in each 'pocket'. For bases 37-64 I can't use that shortcut, so for those I just count how many times I can multiply a number by 'inbase' without overflowing the doubles limit, and for executable code this is probably faster than calculating $log, unless they're able to access a math chip.

For inbase=64 I can 'cheat' and use the mime code from $encode. All I need to do is insert enough 'A' padding to make the input string be a length that's a multiple of 4, and $encode will 'mime' it into being a &binvar that's exactly the same way that a huge number would be stored in 'network' order aka 'big-endian'. Unfortunately, while it's a wildly faster shortcut for the inbase=64 outbase=16 pair, it didn't turn out to be a help when base64 is paired with the other outbases. If $base added more valid inbase/outbase to include a base64 that matches the mime alphabet, it wouldn't need to call the mime subroutine, but the way it would handle base64 input can be 99% identical to translating inbase=64 to the internal byte storage.

While $base2 has the instant optimization available due to the internal storage being identical to outbase=10, none of my optimizations are able to have a quick way of translating from the array chain of 'pockets' of the internal storage, except for inbase=outbase=10 when not using the string manipulation shortcut due to not being in $3=X mode. So the only optimizations I have are the kind which can speed up putting the numbers into the internal storage, or which bypass the internal storage entirely through string manipulation or doing fake mime etc.

While Qbase obviously cannot be nearly as fast as the built-in identifier, especially considering it uses $base to 'do its thing', I'm hoping that some of the ideas can be applied to executable code used by $base to make it faster, if $base is doing some things like $base2 does which can be sped up by using a method similar to how $qbase does them.