mIRC Home    About    Download    Register    News    Help

Print Thread
Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Due to the significant time involved in calculating them using an iterative script, I'm hoping that at least 1 additional identifier common to big integer libraries, perhaps more, could be made available.

This 1st one is almost exactly like the m_apm_integer_pow_nr functions already in the MAPM library, and is designed to work only with integer inputs.

These would be functions for integer-only, that do not want or need any fractions, as calling any function designed to support fractions would just slow these down.

If one of the forum members is able to code an additional integer function into this MAPM library's language, and can PM me, I can explain the other identifier I have in mind, that I hope would be simple for someone who knows C coding better than I do. I also have a simple alias that can perform the other function's calculation too. The other alias is not much longer than the example alias I posted at the bottom here, but it also needs to loop an iteration many times, and as a script it's a lot slower than it should be. It only needs to replicate the $calc functions for add/subtract/multiply/divide/modulo

--

The primary function I'm interested in, I'm hoping would be easy for Khaled to modify an existing MAPM function into the new $identifier. It looks it would be almost exactly like the m_apm_integer_pow_nr in mapm_pwr2.c, except for a minor difference, which is that there would be a 3rd positive integer functioning as the modulus that would be used in 2 places inside the loop, but I don't understand how to insert it. It looks like the modulo operation would use m_apm_integer_div_rem but I'm not sure how.

Inside the source code on github, the function.ref file describes M_pow_mod, which is exactly what I'm looking for, but when I search the rest of the source code I can't find it mentioned anywhere else. This does look unusual, in that it uses the parameters as (exponent,base,modulus) instead of the normal (base,exponent,modulus)

Hopefully the cause of $calc(2^103) returning the wrong result is using m_apm_integer_pow instead of m_apm_integer_pow_nr, and switching to the other function would fix the $calc ^ operator when used with integers. (nr = not rounding)

Below is a copypaste of the m_apm_integer_pow_nr function in mapm_pwr2.c, where I've indicated the 2 places where the 'r' result be would replaced by the remainder from dividing itself by the 3rd parm modulus immeidately after the multiply and squares.

If the cause of the $calc(10^10360) gpf is caused by this library function, modifying it to use a modulus should be able to keep this variant from having the problem that $calc has. This modulo ensures the output result is always smaller than 3rd parameter, and would never create an internal subtotal greater than parm3^squared, so there shouldn't need to be any slowdown to defend against an overflow, other than possibly verifying that the bit length of the 3rd parm modulus does not exceed half the 'safe' answer length.

The library function has an early exit if the exponent is 0 or 1, and when using a modulus parm it should do the same for that too.

I tried to add the modified early exit checks for the 3 variables into the quoted code below. The 1st check is if the modulus is < 2, and should return zero for the same reasons that $calc(9 % 0) and $calc(9 % 1) do. The next check for the exponent is slightly different than in the unmodified. It would continue returning 1 if the exponent is zero, just like $calc(anything^0)==1 but for exponent=1 it should return (%Parm1 % Parm3) in case the 'base' is greater than the modulo.

This variant and the original should probably both have an early-exit check for the base being 0 or 1. Since the $calc(0^0) case has already been handled, this check just needs to handle $calc(0^positive)==0 and $calc(1^positive)==1, so if the base is less than 2, the 'base' is the value to be returned.

At the bottom of this post is the alias I scripted that works perfectly in BF mode except for being super slow due to needing to loop N times for an N bit number. Yes this can be slow for very big modulus numbers, but shouldn't be nearly this slow. When I use the scripted alias equivalent of (m_apm_integer_pow_nr with modulo) against the example 4096-bit modulus, the script takes 32 seconds to finish, and I'm hoping that a compiled equivalent would be much quicker.

For this to work, it looks like 't1' needs to be pulling a '1' off the stack

Quote
void m_apm_integer_pow_nr(M_APM r, M_APM x, unsigned n)
{
if (parm3_modulus < 2) { m_apm_copy(r,MM_Zero); return; }
; if (n < 2) {m_apm_copy(r, n ? x : MM_One); return;}
if (n < 2) {m_apm_copy(r, n ? (x mod parm3_modulus) : MM_One); return;}
if (x < 2) {m_apm_copy(r, x); return;}
M_APM t1 = M_get_stack_var();
m_apm_integer_pow_nr(t1, x, n>>1);
if (n & 1) {
M_APM t2 = M_get_stack_var();
m_apm_multiply(t2, x, t1);
m_apm_multiply(r, t1, t2);
r = r % parm3_modulus
M_restore_stack(2);
return;
}
m_apm_square(r, t1);
r = r % parm3_modulus
M_restore_stack(1);
}
--

One of the uses for these common BigInteger functions is in this 'challenge' scheme...

https://forums.mirc.com/ubbthreads.php/topics/270513/challenge-or-additional-crypto-identifiers

When I had taken a look at this before the MAPM beta, I was fairly certain I could script something to perform the calculation, except for the lack of an identifier that could perform a modular exponentiation calculation, and it also needed something that can quickly and accurately translate between base10 and hex. And, also must be able to do them without taking a lot of time.

And now the BigFloat mode looks like it can reach most of the way there, except for a couple of issues. One of them is the loss of precision when $base tries to translate between base 10/16, which I assume is a problem that goes away when the ^ operator in $calc is fixed, possibly even using the original m_apm_integer_pow_nr without the modulus.

However, it should be possible to perform $base() math without needing to use a pow() where the exponent can grow with the size of the input number, regardless which of the 36 outbases is involved. It should be possible to calculate results by chopping the number into pieces, and perform math based on which chunk something is in, without using a large exponent created by repeated multiplications.

Translation between base 10 and 16 should be able to have a quick shortcut, at least in the 10->16 direction. It looks like the library stores integer input into some kind of format where the value is divided into 32-bit values, and binary storage should be able to turn into a hex string very quickly, and $base(bignumber,10,16) should be able to be much quicker than when outbase is 15 or 17.

And same story when translating from base16 to base10, where it could convert a hex input string to binary storage, then use its normal integer-print routine immediately, and outbase=10 would be much faster than =9 or =11.

Of course these assume the shortcut of bases 10 and 16 not recognizing the base36 alphabet.

---

The other issue preventing scripting a 'challenge' alias is that the calculation time from scripting the Modular Exponentiation is very long.

A while ago, I had already made an alias that worked in doubles mode to perform the modular exponentiation calculation, but because it involved squaring before dividing by the modulus, it couldn't be guaranteed to be accurate within the doubles limit when the modulus was larger than sqrt(2^53).

It turned out I had reinvented the wheel, because my alias was basically the same thing as m_apm_integer_pow_nr except for having a 3rd parameter modulus applied against the result of the multiplication and squarings.

I adapted the alias to work in BigFloat mode, and once I replaced (Base^2) with (Base*Base), I found - yay! - that in BigFloat=on I've not been able to find it returning inaccurate calculation even when I had the 3 inputs being 8192-bit integers.

However, the script can't do the calculation in a reasonable time. It can be quite slow because it needs to loop the BigFloat calculations 'B' times for an exponent whose bit length is 'B'. The command here below creates sample sumbers of the designated bit length, and then calls the alias to performs the modular exponentiation calculation on them, and it's accurate for all the numbers I've tested, regardless how huge. However, when the bit size of the inputs is 2048 bits, it takes a little more than 4 seconds, which is too slow to be a usable script, and that's just 1 part of the 'challenge' calculation that the script would need to handle. And when the bit length doubles to 4096 by editing the 2048, the time increases 8-fold to 32 seconds, yikes.

//var -s %len $calc(2048*$log(2)/$log(10)//1) - 2, %base.bf 7 $+ $str(2,%len) $+ 1, %exp.bf 8 $+ $str(4,%len) $+ 1, %mod.bf 9 $+ $str(6,%len) $+ 1 , %t $ticksqpc | var %result $bf_modpow(%base.bf,%exp.bf,%mod.bf) | echo -a time $calc($ticksqpc - %t) ticks

---

Below is the alias that's identical to m_apm_integer_pow_nr except it uses a 3rd parameter as a modulus which keeps the result from growing in spite of very huge exponent. The original alias was a little longer in order to handle negative parameters according to the modulo math rules, but I took them out to make this as close to the MAPM function as possible.

There's no general concensus over what to name this function. The Wikipedia page names several names, including ModPow PowMod PowerMod ModPower ModExp etc. In some cases they staple this as an optional 3rd parameter onto the normal exponent function that takes only 2 parameters. The mysteriously missing library function was named M_pow_mod.

Code
alias bf_modpow {
  var %base.bf $1 , %exponent.bf $2 , %modulus.bf $3
  var %answer.bf 1
  if (!$regex($1-3,^\d+ \d+ \d+$)) goto syntax
  if (%modulus.bf < 2) return 0
  if (%exponent.bf < 2) {
    if (%exponent.bf == 0) return 1
    if (%exponent.bf == 1) return $calc( %base.bf % %modulus.bf)
  }
  if (%base.bf < 2) return %base.bf
  ; echo -s base: %base.bf exp: %exponent.bf modulus: %modulus.bf
  while (%exponent.bf != 0) {
    if ($calc(%exponent.bf % 2)) { var %answer.bf $calc( (%answer.bf * %base.bf) % %modulus.bf ) }
    ; this next calc multiplies base*base because base^2 returns the wrong result in .bf mode
    var %base.bf $calc( (%base.bf * %base.bf) % %modulus.bf ) , %exponent.bf $calc(%exponent.bf //2)
  }
  return %answer.bf
  :syntax
  echo -sc info *$bf_modpow(base,exponent,modulus)
  halt
}

Joined: Dec 2002
Posts: 5,230
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,230
Quote
Inside the source code on github, the function.ref file describes M_pow_mod, which is exactly what I'm looking for, but when I search the rest of the source code I can't find it mentioned anywhere else. This does look unusual, in that it uses the parameters as (exponent,base,modulus) instead of the normal (base,exponent,modulus)
It can be found in mapm_pwr3.c as m_apm_powmod(). It was not in the C++ wrapper either.

I have added support for it and added a $powmod() identifier that calls this function. Your script completes in half the time when using it.

I also implemented your $bf_modpow() identifier in C.

My development computer is about ten years old, so it is pretty slow. When I run your script with the following, the timings are:

$bf_modpow(): 2048 -> 7.4s, 4096 -> 57s

m_apm_powmod(): 2048 -> 3.7s, 4096 -> 29s

$bf_modpow() in C: 2048 -> 2.9s, 4096 -> 23s

All methods return the same results.

So the $bf_modpow() in C is faster than the m_apm_powmod() function. However, there seems to be more going on in m_apm_powmod() but I cannot tell which implementation is better in terms of edge cases, etc.

Let me know which one you want $powmod() to use.

Update: I am going to go with the faster implementation for the next beta and we can see how it works out.

Last edited by Khaled; 29/10/22 04:13 PM.
Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Thanks for the $powmod. And also the $gcd

I was hoping that the cause of my slowness was due to the overhead from the scripting engine, and a compiled version would do it in much less time. The shorter time for $powmod increases the places it can be used for, but is still high enough to limit where it can be used.

For example, in that challenge script for moonmoon, if the RSA length were 4096, there would be 2 of the 2048 powmods that would take a combined 15 seconds during ON CONNECT, plus the other calculations that it would do. The challenge would also need something that can do $base(4096-bit,16,10), and so far base isn't always accurate above 256-bit or so, and I'm afraid that increasing float precision to support larger integers would just slow a lot of other things down, including $powmod itself. I'm worried that getting useful big number calculations of from a floating point library that tries to use float calculations to perform them - would just make the calculations for both areas be slower than they'd otherwise need to be.

- -

The $gcd was actually one of the the other big number functions I was hoping to get, but I wasn't sure how much I could trust the GCD I saw in MAPM, considering the issues with ^ and % operators for larger numbers, as well as the speed of imitiation-of-base scripts as they were fed larger numbers. i.e. if there are cases where 'close math' is returning the GCD between 'A' and 'almost-B'.

The $gcd is actually sister to another math function that I was interested in, which is the modular multiplicative inverse.

- -

This is the $ModInverse function I referenced in the updated validating of $powmod should $powmod handle negative exponents eventually. And to be clear, I wasn't wanting ModInverse for usage just as a parameter in powmod, that's just a place where it can plug in.

The mod_inverse is closely related to GCD, because when you have 2 positive numbers A,B you'll either be able to get a valid inverse from mod_inverse(A,B) or else you'll have GCD(A,B) being greater than 1. But you won't get both and you won't get neither.

I have a slow script that can calculate the $ModInverse and $Gcd the same way, but my algorithm is different than the way MAPM calculated their GCD, so I need to figure out how to script their GCD to see how the speed of the 2 methods compares, and if it turns out that the MAPM design of GCD was faster as a script, then I'd try to see if there's a way to modify the faster GCD design to find the Inverse instead.

It's hard to benchmark how long this takes, because it depends on the combo of (A,B). It looks like their method repeats a loop B times for a B-bit number, but the math inside the loop is different. But my other method can sometimes be as few as a dozen loops for some (A,B) pairs while other pairs could require somewhere close to B * 60%. Also making the time harder to judge is how the modulus gets whittled down shorter each loop, unlike $powmod where it remains the same

Or, if I'm lucky, the Inverse could be borrowed from OpenSSL as I mentioned in the .bf thread. Here's the reference where OpenSSL has the inverse function.

Quote
https://www.openssl.org/docs/man1.1.1/man3/BN_mod_inverse.html

BIGNUM *BN_mod_inverse(BIGNUM *r, BIGNUM *a, const BIGNUM *n,
BN_CTX *ctx);
This calculates r=mod_inverse(a,n) where 'r' is an integer number from [0,n-1] where ((r*a) % n = 1.

It's kind of like the m_apm_reciprocal used by MAPM, except this is for integers only, and can only returns an integer instead of a long fraction.

inverse(5,7) = 3, because 3*5 % 7 = 1
inverse(21,35) = error, because there's no possible inverse
inverse(-3,11) means A is 3 less than B, so is inverse(8,11)=7 because 8*7 % 11 = 1

--

Since there's 2 terms and no exponent, the validation rules for negative/0/1 are shorter and simpler, and I'm not sure how OpenSSL would validate them. If OpenSSL handles all of these, there's no need to validate them other than making them be integers. After the subroutine returns the result, if there is no valid inverse, an identifier return value of 0 seems appropriate, since that's not a valid inverse for anything.

modulo always ignores the sign in the modulus
if (B < 0), B = 0 - B
now: A=any B>=0

divide by zero
if (B = 0) return $calc(0/0)
now: A=any B>0

negative -> [0,B-1] by adding B until >= 0
if (A < 0), A = B - (B % abs(A)) = range [0,B-1]
now: A>=0 B>0

if needing to prevent A=0, must first ensure A < B
otherwise the 1st loop makes A<B then no need for pre-check
if (A >= B) A = A % B
now: A=[0,B-1] B>0

0 * anything cannot be 1 greater than a multiple of anything
if (A = 0) return $calc(0/0)
now: A=[1,B-1] B>0

- -

The reason this is so closely related to GCD is that, when GCD is greater than 1, that means (A,B) share a factor other than 1, and it would be impossible to find an inverse to multiply by A that would be 1 greater than a multiple of B. But on the other hand, when there are no shared factors between A and B, you're guaranteed to eventually be able to multiply A by something that's 1 greater than a multiple of B.

Just like with GCD, if there are MAPM approximations during the calculation, that could end up finding either of

value * almost-A % B = 1
value * A % almost-B = 1

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
I'm observing some unusual fluctuations in benchmark times for both $powmod and the original scripted bf_modpow alias, with their timing changes not happening together.

When you showed your benchmark time for bf_modpow as 7.4sec compared to the 4.3sec I was getting in 1275, I assumed that since my bf_modpow time was close to half your benchmark time, that my $powmod time would be close to half your 2.9sec time for the compiled C.

But then the 1743 beta containing $powmod came out, and my $powmod time was close to your compiled C time, and only then did I notice that 1603 beta had increased to give me times for bf_modpow closer to yours.

The table below shows my ticks count for the above //command in the 1st column, and the other columns either use 2*2048 or replace $bf_modpow with $powmod, or both.

In 2385 my bf_modpow times went back to the time it had in 1275, while 3337 has $powmod's time increased to nearly match the scripted alias's time. For the 4096 level that Libera recommends for certs, that 'challenge script' would do 2 of these 2048's, so this portion of the script time would now be 8sec.

Code
	2048				4096
	bf_modpow	powmod		bf_modpow	powmod
1275	4350		n/a		32400		n/a
1603	7000		n/a		53000		n/a
1743	7000		2760		52900		21570
2385	4370		2770		32500		21600
3337	4300		4090		32400		31800

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Additional info related to prior post

From my examination of $isbit, I'm wondering if - either the same thing causing $isbit to be slow could be the same thing causing $powmod to slow down, or else this might be a way to optimize $powmod, depending on how much time is spent checking the Nth bit of the exponent compared to doing the multiplies and squares. Below is a solution that makes $isbit faster than it currently is, though it might not be the optimum long-term solution.

--

I know you had mentioned that a past GPF in powmod had been caused by ever-increasing length of a fraction, so maybe the 40% chance in $powmod speed for this beta is related to getting the Nth bit of the exponent a different way. And, this might also be related to the reason why $isbit is slow.

The alias below creates a random number having bit length of N, and then uses 6 methods to rebuild it bit-by-bit.

#1 the new .bf mode capability for $isbit to get the value of each bit position

#2 a kludge that uses $base() to translate the number into a long base2 text string, then uses $mid to see what's the Nth bit from the right.

#3 Same kludge as #2 except it cheats by doing something $isbit can't do, which is assuming the input number is constant, so it can pre-calculate the $base result 1 time to avoid N-1 calls to $base, using $mid against the already existing string.

#4 Same cheat as #3, except using a &binvar instead of a text string.

#5 Another cheat taking advantage of the input number being constant, making a copy of the input number then calculates the Nth bit similar to how $powmod parses the exponent

#6 While this is slower than #4, it is not a cheat relying on the input number being constant, and is significantly faster than #1 doing the same thing using $isbit

Benchmark times compared to #1's usage of $isbit seem to be...

#2 kludge is fairly consistent around 200% the time as #1, though intuitively I would've expected the multiplier distance to be much greater.

#3 vs #2 shows that the bottleneck for #2 is almost entirely the repeated calls to $base, and without the repeated calls to $base the time is less than 2% that of $isbit.

#4 vs #3 shows that #4 has a significant percent speed above #3 by using a &binvar instead of a %text.string.

#5 Because this imitates $bf_modpow, and is slower than #4 imitating $bf_modpow2, that might indicate a significant speedup for the portion of $powmod's time spent manipulating the exponent when it's not busy squaring and multiplying

#6 This one doesn't use the #3-5 cheat of a constant input number, so is relevant to the speed of $isbit.

While the #3 and #4 cheats are not the kind of thing that can help $isbit speed up, they're precisely the kind of thing that might help $powmod, if $powmod is currently doing math similar to #5.

$isbit:

I'm not saying that #6 is the most efficient way to calculate the Nth bit of a number, but it benchmarks at like 2% of the time that $isbit takes in #1, and Talon can probably suggest an even faster way to calculate it.

--

So below is a scripted bf_modpow2 update to my original alias, where in the original I was repeatedly dividing the exponent by 2 and calculating the modulo-2 result, even though I was interested only in the value of the Nth bit, and I knew there would be a performance penalty from trying to access a text string, but hadn't considered using a &binvar yet. Since this style of alias doesn't care about the full value of the exponent at each intermediate stage, the libraries would do the floor-divide-by-2 as shifting all the bits 1 position to the right. And the modulo-2 is then done instead by just checking the lowest bit. Or, combining both the shift and peek at the lowest bit as a getbit(). But I gather that's not compatible with how MAPM stores data.

So if a profiler determines that $powmod is spending a lot of time diving the N-bit exponent N times or otherwise checking the Nth bit, it might be faster to use the style#4 method to pre-calculate values of all bit positions, hopefully something more efficient than a call to $base(). And then during each loop could check array[pointer] rather than doing big-math. The cost/benefit timing from changing a script is almost surely different for compiled code.

This modified bf_modpow2 is the same as the original bf_modpow except for 2 things. It has updated handling to discard fractions and handle negatives. But the main difference is how it checks for the bit values of the exponent. Where B is the number of bits in the exponent:

old:
a.
b.
c.
d. calculate (exponent % 2) to see if result == 1, repeated B times
e. calculate (exponent // 2) then set exponent to new value, repeated B times
f. check if potentially huge bigfloat exponent is now zero, repeated B times

new:
a. use $base to translate exponent to base2 text string, done 1 time
b. create &temp binvar containing that text string, done 1 time
c. create %bit.pos pointing to the least-value bit position, done 1 time
d. check if the Nth lowest byte of &temp is '1', repeated B times
e. decrease %bit.pos pointer by 1, repeated B times
f. check if short integer is now zero, repeated B times

For bf_modpow2 I am seeing much wider fluctuations in completion time than I saw for bf_modpow, but the best times at 2048 and 4096 in the various betas have a very minor gain in speed compared to bf_modpow. Intuitively I would expect there's more of a performance gain by checking the value of a &binvar's byte instead of perform a modulo-2 calculation against a huge bigfloat number, and then repeatedly floor dividing a bigfloat number by 2, which remains a bigfloat until the last 53 loops - than the performance hit for the time needed to translate a very large integer from base10 to base2 and then stuff it into a &binvar.

So if the slowness for $isbit is the same thing that caused $powmod to be 40% slower in 3337 than 2385, perhaps a single pre-calculation of the exponent might enable $powmod to return to its former speed, or to possibly even have the same speed relationship in 1743 where $powmod took 40% of the bf_modpow time instead of the current 95-98%.

And, the style#6 can help the speed of $isbit be 50x faster, unless there's a better method than I created off-the-cuff.

* *

Updated bf_modpow2 has interchangeable syntax with $powmod, so both custom aliases and $powmod can be used interchangably as the identifier in the earlier posts's //command.

The mapm_isbit alias defaults to bit length 512, but can be changed by $1 being as high as $maxlenl. It creates a random number of the specified bit length, then creates from it a similar exponent then shows the time required to perform the slow $powmod calculation against a number is faster than the time needed by $isbit to access all the bits. It then shows the time needed to re-create the random number using each of the 6 methods described above, of which only style#2 is slower. It verifies the result matches the original random number and resets /fupdate back to original, having changed it to avoid fupdate=0 affecting the timing.

Code
alias bf_modpow2 {
  var %base.bf $gettok($1,1,46) , %exponent.bf $gettok($2,1,46) , %modulus.bf $gettok($3,1,46)
  var %answer.bf 1

  if (!$regex(%base.bf %exponent.bf %modulus.bf,^-?\d+ -?\d+ -?\d+$)) return $calc(0/0)
  ; (base=any integer, exp=any integer, mod=any integer) result always [0,modulus-1]

  ; modulus = abs(modulus)
  if (-* iswm %modulus.bf) var %modulus.bf $regsubex(foo,$v2,^(-*)(.*)$,\2)
  ; (base=anything, exp=anything, mod >= 0)

  ; (any % 1 = 0); (any % 0 = divide by zero)
  ; if (%modulus.bf == 0) return $calc(0/0)
  if (%modulus.bf < 2) return 0
  ; (base=anything, exp=anything, mod >= 2)

  ; if (base < 0) base = modulus - (abs(base) % modulus)
  if (-* iswm %base.bf) var -s %base.bf $calc( %modulus.bf - $regsubex(foo,$v2,^(-*)(.*)$,\2) )
  ; aka -N % mod -> [0,mod-1] -> (-N+mod) % mod
  ; (base >= 0, exp=anything, mod >= 2)

  if (%exponent.bf < 2) {
    ; any^0 = 1
    if (%exponent.bf == 0) return 1
    ; base is positive, so for base^1 return [0,mod-1] as (base % modulus)
    if (%exponent.bf == 1) return $calc( %base.bf % %modulus.bf)

    ; remaining case here is (exp < 0)
    if (%exponent.bf  < 0) {
      ; powmod(+base,-exp, +mod) -> powmod( mod_inverse(+base,modulus) , +exp, +mod)
      ; exp = abs(exp)
      var %exponent.bf $regsubex(foo,%exponent.bf,^(-*)(.*)$,\2)
      ; base = inverse(+base,modulus)
      var %base.bf $ModInv( %base.bf , %modulus.bf )
      ; 6^-2 mod 15 -> 6^(-1*2) mod 15  -> 6^(-1)^2 mod 15 -> (1/6)^2 mod 15 -> modinv(6,15)^2 mod 15 -> 3^2 mod 15 = 9
      ; if (%base.bf == -1) return -1
      ; -1 when there is no inverse ie (5,-anything,15)
    }
  }
  ; (base >= 0, exp >= 2, mod >= 2)

  if (%base.bf < 2) {
    ; 0^any = 0; 1^any = 1; i.e. return val matches base
    return %base.bf
  }
  ; (base >= 2, exp >= 2, mod >= 2)

  ;  echo -s $scriptline base: %base.bf exp: %exponent.bf modulus: %modulus.bf

  ; intentional skip of leading zero bits, requires code above to prevent seeing exponent <= 0
  bset -tc &temp 1 $base(%exponent.bf,10,2)
  ;breplace &temp 48 0
  var %bit.pos $bvar(&temp,0)
  ;echo -a $bvar(&temp,0) / $bvar(&temp,1-)
  ;var %binary_exponent.bf $base(%exponent.bf,10,2) , %bit.pos $len(%binary_exponent.bf)
  ;    if ($mid(%binary_exponent.bf,%bit.pos,1)) { var %answer.bf $calc( (%answer.bf * %base.bf) % %modulus.bf ) }
  while (%bit.pos > 0) {
    if ($bvar(&temp,%bit.pos) == 49) { var %answer.bf $calc( (%answer.bf * %base.bf) % %modulus.bf ) }
    ; this next calc multiplies base*base because base^2 returns the wrong result in .bf mode
    var %base.bf $calc( (%base.bf * %base.bf) % %modulus.bf ) , %bit.pos %bit.pos - 1
  }
  return %answer.bf
  :syntax
  echo -sc info *$bf_modpow(base,exponent,modulus)
  halt
}
[code]



[code]
alias mapm_bitwise_style1 { if ($isbit($1,$2)) return 1 | return 0   }
alias mapm_bitwise_style2 { return $mid($base($1,10,2),- $+ $2,1)    }
alias mapm_bitwise_style3 { return $mid($hget(maroon,tmp),- $+ $2,1) }
alias mapm_bitwise_style4 { return $bvar($1,$2)                      }
alias mapm_bitwise_style5 { return $calc(($1 //(2^($2 -1))) % 2)     }

alias mapm_isbit {
  var -s %fupdate $fupdate
  fupdate 95
  var %bits $int($1)
  if ($1 !isnum 1- $maxlenl) { echo -ag $ $+ 1 not 1-10240 - using 512 bit number | var %bits 512 }
  echo -ag processing %bits bit number, warning this can be slow - press ctrl+break if needed
  var %num.bf $rand($calc(2^(%bits -1)),$calc(2^%bits))
  var %exp.bf $calc(%num.bf - 3) , %ticks $ticksqpc , %powmod $powmod(2,%exp.bf,%num.bf)
  echo -a powmod time: $calc($ticksqpc - %ticks)

  var %i 1, %build , %ticks $ticksqpc
  while (%i isnum 1- %bits) {
    var %b $mapm_bitwise_style1(%num.bf,%i) | var %build %b $+ %build
    inc %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#1 isbit time: $calc($ticksqpc - %ticks)

  var %i 1, %build , %ticks $ticksqpc
  while (%i isnum 1- %bits) {
    var %b $mapm_bitwise_style2(%num.bf,%i) , %build %b $+ %build
    inc %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#2 base(num,10,2) at each bit time: $calc(%stop - %ticks)

  var %i 1, %build , %ticks $ticksqpc | hadd -mu9999 maroon tmp $base(%num.bf,10,2)
  while (%i isnum 1- %bits) {
    var %b $mapm_bitwise_style3(%num.bf,%i) , %build %b $+ %build
    inc %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#3 base(num,10,2) 1 time then fetch from hashtable time: $calc(%stop - %ticks)

  var %build , %ticks $ticksqpc
  bset -tc &temp 1 $base(%num.bf,10,2) | breplace &temp 48 0 | breplace &temp 49 1
  var %i $bvar(&temp,0)
  while (%i) {
    var %b $mapm_bitwise_style4(&temp,%i) , %build %b $+ %build
    dec %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#4 base(num,10,2) 1 time then fetch from &binvar time: $calc(%stop - %ticks)

  var %i 1, %build , %ticks $ticksqpc
  var %temp.bf %num.bf
  while (%i isnum 1- %bits) {
    var %b 0 | if ($calc(%temp.bf % 2)) var %b 1 | var %build %b $+ %build
    var %temp.bf $calc(%temp.bf // 2)
    inc %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#5 at each bit bit=num%2 then num=num//2 time: $calc(%stop - %ticks)

  var %i 1, %build , %ticks $ticksqpc
  while (%i isnum 1- %bits) {
    var %b $mapm_bitwise_style5(%num.bf,%i) , %build %b $+ %build
    inc %i
    if (*00 iswm %i) echo -a debug: i %i
  }
  var %stop $ticksqpc, %n2 $base(%build,2,10)
  echo -a diff: $calc(%num.bf - %n2) style#6 at each bit calc(num//(2^(bitpos-1))) % 2)   time: $calc(%stop - %ticks)

  fupdate %fupdate
}

Joined: Dec 2002
Posts: 5,230
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,230
Quote
I'm observing some unusual fluctuations in benchmark times for both $powmod and the original scripted bf_modpow alias
That looks fine to me. The only changes to $powmod() were the additions to it from your previous post.

There were no other changes to the library other than fixes to specific functions in MAPM that were needed to prevent freezes - these changes simply reverted previous changes I had made to the library which I thought would speed it up but clearly caused other issues.

Another possible cause for a slowdown is that I increased MAPMs maximum precision setting so that it was at least 10 digits greater than the maximum 30 decimal places that mIRC returns for bigfloat calculations.

At this point, we don't really have a baseline for how fast bigfloat should/will be, since we are still experimenting, tweaking, and fixing issues. Once It looks like it is stable and is returning reasonably correct results, we can think about how to improve performance, although at the end of the day, this will really be limited by MAPM.

Joined: Dec 2002
Posts: 5,230
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,230
Quote
From my examination of $isbit, I'm wondering if - either the same thing causing $isbit to be slow could be the same thing causing $powmod to slow down
These are unrelated, other than the global MAPM precision setting.

The slow performance of the bigfloat btiwise operations is due to converting MAPM bigfloat into a bitfield, applying changes, and then converting back to MAPM.

As for $isbit()... that is hilarious. I was so focused on optimizing the MAPM from/to conversion to apply bitwise operations that I missed the simple one line bit test. Sheesh! The $isbit() change will be in the next beta.

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Rats, I was hoping there was something here that would optimize $powmod. And the same kind of shortcut can benefit $biton here, as well as $bitoff. But if there's a shortcut for the other bitwisers where the numbers can have many bits set, that's above my paygrade.

The only shortcut I came up with for $xor or $and or $or was to translate both numbers to hex, then padd them so they could be lined up as equal length byte arrays, then once they're both in a &binvar the byte values can be individually $XOR'ed or $AND'ed or $OR'ed against each other.

But scripting that had too many calls to $base and $regsubex and $xor itself that the scripted method wasn't competitive. Though, if the 2 numbers can be efficiently exported from MAPM over to a uint32 array, that's 99% of the battle.

Below is the scripted $biton that borrows from the $isbit shortcut to determine whether it needs to add 2^something or not. And same with $bitoff knowing whether to subtract 2^something. Though there's got to be a better way to simulate biton/bitoff than add/subtract. This alias benchmarks to like 10% of the $biton time, with $crc just a quick check that both results match.

Code
alias mapm_isbit_style5 { return $calc(($1 //(2^($2 -1))) % 2)     }
alias mapm_biton {
  var -s %fupdate $fupdate
  fupdate 95
  var %bits $int($1)
  if (%bits !isnum 100-10000) var %bits 512
  var -s %t 9
  var %low.bf $calc(2^(%bits - 12)) , %hi.bf $calc(2^%bits)
  var %rand.bf $rands(%low.bf,%hi.bf)
  var %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits
    while (%j) {
      var %a.bf %rand.bf
      var %a.bf $biton(%rand.bf,%j) , %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#1: biton time: $calc($ticksqpc - %ticks) %check

  var -s %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits
    while (%j) {
      var %a.bf %rand.bf
      if (!$mapm_isbit_style5(%a.bf,$calc(%j))) var %a.bf %a.bf + $calc(2^(%j -1)) 
      var %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#2  time: $calc($ticksqpc - %ticks) %check
  fupdate %fupdate
}

Joined: Dec 2002
Posts: 5,230
Hoopy frood
Offline
Hoopy frood
Joined: Dec 2002
Posts: 5,230
Quote
Rats, I was hoping there was something here that would optimize $powmod.
Hmm. Okay. I took another look at it. I replaced one modulus 2 call with the MAPM function is_odd(), I used the MAPM sqr() function for the square, and replaced the other modulus calls and one division call with the integer versions. I am shocked, I tell you, shocked that it is now twice as fast :-] This change will be in the next beta.

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Great, I hope the twice is relative to $powmod's prior speed not the new, lol. The big integer function should be able to do this well below 500 ticks, but hopefully this will be fast enough to make it usable in more situations.

Your reply reminded me that I was still using (base * base) inside both variants of bf_modpow, because in the earlier beta the ^ operator wasn't working right. I'd forgotten to change that alias to use (base ^2) now that it's fixed, so now when I make just that 1 change in the alias, that original 2048-bit //command consistently reduces from 4360 down to 4140 ticks, which reduces the time to 95%.

Based on your mention of $isodd, I tested to see the effect of changing the if() in my loop. bf_modpow2 can't use it because it changed to see if the byte in &temp was 49, so there's 3 alternative ways for the if() to check if the lowest bit is odd:

old:: if ($calc(%exponent.bf % 2))
new1: if (2 \\ %exponent.bf)
new2: if ($right(%exponent.bf,1) isin 13579)
new3: if (%exponent.bf & 1)

When I changed from 'old' to 'new1', the time was smaller by maybe 5-10 ticks, but is within the margin of error, and is probably doing the same thing under the hood as calc % 2 does.

Doing the string compare in 'new2' was faster by 40-50 ticks, for a good 1% gain above 'old'.

I can't test 'new3' because it turned out that the '&' operator refuses to leave the 2^32 range of doubles mode. The test below returns 100% $true for (term1 & 1) as 2^32 and above, and 100% false for (term1 !& 1), and editing the 33 will verify the fix works for really big numbers too.

//var %i 1000 , %bits 33 , %min.bf 2 ^ $calc(%bits -1), %max.bf 2 ^ %bits , %count1 0 , %count2 0 | while (%i) { var %rand.bf $r(%min.bf,%max.bf) | if (%rand.bf & 1) inc %count1 | if ($right(%rand.bf,1) isin 13579) inc %count2 | dec %I } | echo -a bits %bits : count rand & 1: %count1 : count right(,1) isin 13579 : %count2

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
The time for $isbit now beats the scripted mimic. Here's an updated scripted alias for $biton/$bitoff that's 28x faster at 512 bits, and 60x faster at 1024 bits. It just uses $isbit(N) to determine whether to add/subtract 2^(N-1). I didn't include logic to handle negative numbers, but should be similar for those.

Code
alias mapm_biton_bitoff {
  var %bits $int($1) , %t 1
  if (%bits !isnum 100-10000) var %bits 512 
  var %low.bf $calc(2^(%bits - 1)) , %hi.bf $calc(2^%bits) , %rand.bf $rands(%low.bf,%hi.bf)
  var %base2 $base(%rand.bf,10,2)
  echo -ag $count(%base2,1) of $len(%base2) bits are 1's. set $ $+ 2 = 1 to include crc checksum
  var %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits
    while (%j) {
      var %a.bf %rand.bf
      var %a.bf $biton(%a.bf,%j)
      if ($2) var %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#1: biton time: $calc($ticksqpc - %ticks) %check
  var %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits , %a.bf %rand.bf
    while (%j) {
      var %a.bf %rand.bf
      if (!$isbit(%a.bf,%j)) var %a.bf %rand.bf + $calc(2^(%j -1))
      if ($2) var %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#2 scripted-biton time: $calc($ticksqpc - %ticks) %check
  var %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits
    while (%j) {
      var %a.bf %rand.bf
      var %a.bf $bitoff(%a.bf,%j)
      if ($2) var %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#1: bitoff time: $calc($ticksqpc - %ticks) %check
  var %i %t , %ticks $ticksqpc , %check
  while (%i) {
    var %j %bits , %a.bf %rand.bf
    while (%j) {
      var %a.bf %rand.bf
      if ($isbit(%a.bf,%j)) var %a.bf %a.bf - $calc(2^(%j -1))
      if ($2) var %check $crc(%check %a.bf,0)
      dec %j
    }
    dec -s %i
  }
  echo -a style#2 scripted-bitoff time: $calc($ticksqpc - %ticks) %check
}

Joined: Jan 2004
Posts: 1,949
maroon Offline OP
Hoopy frood
OP Offline
Hoopy frood
Joined: Jan 2004
Posts: 1,949
Updated $powmod timing table, it looks like my half-time for $powmod is relative to the increased time in 3337, and is now around 90% of the beta 2385 time

Code
	2048				4096
	bf_modpow	powmod		bf_modpow	powmod
1275	4350		n/a		32400		n/a
1603	7000		n/a		53000		n/a
1743	7000		2760		52900		21570
2385	4370		2770		32500		21600
3337	4300		4090		32400		31800
3586	4370		2516		32480		19663

From the //command below, it looks like all of the time difference between 2385 and 3586 comes from things other than the time for the (answer*base) multiply.

My original example was trying to create a 'fair' exponent that had close to a 50/50 mix between 1-bits and 0-bits, because the 1-bit count has a big effect on the time, due to making that extra multiply. This next example has an exponent where only 2 of 2048 bits are 1's, so that means it performs the (answer*base) multiply only on the 1st and last bit.

//var -s %m.bf $calc(2^2048-3) , %exp.bf $calc(%m.bf +4), %t $ticksqpc | echo -a $qt($beta) $powmod(7,%exp.bf,%m.bf) time: $calc($ticksqpc - %t) * bits(m) $len($base(%m.bf,10,2)) * 1-bits(exp) $count($base(%exp.bf,10,2),1)

My time for this command in beta 3586 is 1614 vs the 1861 of beta 2385, which is the same 250 time difference I see from the original benchmark //command that did the extra multiply on half the bit positions. And when I change the +4 to -4 so nearly all of the exponent's bits are 1's, the time is of course much slower than the benchmark, but the relative times between beta's 3586 and 2385 is the same 250 ticks difference, which confirms no time change for the multiply itself. With an eventual openssl subroutine behind $powmod, the total execute time for the original benchmark would be easily below 100 ticks.

* *

As for the loop itself, this alias below benchmarks various methods exludes the (base^2) and (base*answer) and looks just for the 'if ($isodd)' method, in case there's some speed change for the part of the loop excluding the multiply and square. The 'count' is just to verify that the various methods aren't seeing the 1-bits wrong.

Assuming that $powmod is using something similar to style 1/4/5, it was surprising to see that the string manipulation of style 3 is faster, and style 10 is also faster than 1/4/5, in spite of the time needed to get the base2 text from $base then use $replace() on it in order to feed it to /bset, which I assume are things that the executable code can do much more efficiently when they don't need to create text strings.

Because Style 6&7 take 150x as long as 1/4/5 and Style 8 is even 4x slower than that, styles 6/7/8 don't run by default, and are just for future reference by forum readers about avoiding "if (num & X)" or "$and(num,X)" when num is very big and X is a power of 2.

Code
alias mapm_powmod_loop {
  var %bits 2048 | if ($1 isnum 32-10240) var %bits $int($1)
  if ($1 !isnum 32-10240) { echo -ag $ $+ 1 not 32-10240 - using 2048 bit number }
  echo -ag processing %bits bit number, warning can be slow if (678 isin $ $+ 2) - press ctrl+break if needed
  var %num.bf $rand($calc(2^(%bits -1)),$calc(2^%bits))
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($isbit(%exp.bf,1)) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#1 *$isbit(exp,1) count: %count $chr(22) time: $calc($ticksqpc - %ticks )
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%i <= %bits) {
    if ($isbit(%exp.bf,%i)) inc %count
    inc %i
  }
  echo -a style#2 *$isbit(exp.bf,bit.position) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($right(%exp.bf,1) isin 13579) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#3 (right(exp,1) isin 13579) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($calc(%exp.bf % 2)) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#4 (exp % 2 == 1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if (2 \\ %exp.bf) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#5 (2 \\ exp) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  if (6 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%exp.bf) {
      if (%exp.bf & 1) inc %count
      var %exp.bf $calc(%exp.bf // 2)
    }
    echo -a style#6 if (exp.bf & 1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  if (7 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%exp.bf) {
      if ($and(%exp.bf,1)) inc %count
      var %exp.bf $calc(%exp.bf // 2)
    }
    echo -a style#7 *$and(exp.bf,1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  if (8 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%i <= %bits) {
      if ($and(%exp.bf,$calc(2^ (%i -1)))) inc %count
      inc %i
    }
    echo -a style#8 *$and(exp.bf,2^(bit.pos)) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%i isnum 1- %bits) {
    if ($mapm_bitwise_style5(%num.bf,%i)) inc %count
    inc %i
  }
  echo -a style#9 at each bit calc(exp//(2^(bitpos-1))) % 2) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i $bvar(&base2,0) , %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  bset -c &base2 1 $replace($base(%num.bf,10,2),1,1 $+ $chr(32),0,0 $+ $chr(32))
  ; bset -c &base2 1 $regsubex($base(%num.bf,10,2),/(.)/g,\t $+ $chr(32))
  var %ticksbase $ticksqpc , %i $bvar(&base2,0)
  while (%i isnum 1- %bits) {
    ; if (*9 iswm $bvar(&base2,%i)) inc %count
    if ($bvar(&base2,%i)) inc %count
    dec %i
  }
  echo -a style#10 at each bit check pre-built array for 0/1 count: %count $chr(22) time: $calc($ticksqpc - %ticks) including $calc($ticksqpc - %ticksbase) for calls to base()/replace()/bset
}



Link Copied to Clipboard