Updated $powmod timing table. It looks like the "half-time" I reported for $powmod was relative to the increased time in beta 3337; the $powmod time in 3586 is now around 90% of the beta 2385 time.

Code
beta    2048-bit                4096-bit
        bf_modpow   powmod      bf_modpow   powmod
1275    4350        n/a         32400       n/a
1603    7000        n/a         53000       n/a
1743    7000        2760        52900       21570
2385    4370        2770        32500       21600
3337    4300        4090        32400       31800
3586    4370        2516        32480       19663

From the //command below, it looks like all of the time difference between 2385 and 3586 comes from things other than the time for the (answer*base) multiply.

My original example tried to create a 'fair' exponent with close to a 50/50 mix of 1-bits and 0-bits, because the 1-bit count has a big effect on the time: each 1-bit costs an extra multiply. This next example has an exponent of 2^2048+1, where only 2 of the bits are 1's, so it performs the (answer*base) multiply only on the 1st and last bit.

//var -s %m.bf $calc(2^2048-3) , %exp.bf $calc(%m.bf +4), %t $ticksqpc | echo -a $qt($beta) $powmod(7,%exp.bf,%m.bf) time: $calc($ticksqpc - %t) * bits(m) $len($base(%m.bf,10,2)) * 1-bits(exp) $count($base(%exp.bf,10,2),1)

My time for this command in beta 3586 is 1614 vs 1861 in beta 2385, which is the same difference of roughly 250 that I see from the original benchmark //command, which did the extra multiply on half of the bit positions. When I change the +4 to -4 so that nearly all of the exponent's bits are 1's, the time is of course much slower than the benchmark, but the difference between betas 3586 and 2385 is still the same 250 ticks, which confirms there is no time change for the multiply itself. With an eventual openssl subroutine behind $powmod, the total execution time for the original benchmark would easily be below 100 ticks.
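To make the multiply counts concrete, here is a minimal sketch in Python (not mIRC script, so it can be run outside the client) of a right-to-left square-and-multiply loop. It is only an assumption that $powmod works along these lines; the function name powmod_count is made up for this illustration.

```python
def powmod_count(base, exp, mod):
    """Right-to-left square-and-multiply (illustrative, not $powmod's actual code).

    Returns (result, multiplies, squarings) so the cost of the 1-bits is visible.
    """
    answer = 1 % mod
    base %= mod
    multiplies = 0
    squarings = 0
    while exp:
        if exp & 1:                      # 1-bit: the extra (answer*base) multiply
            answer = answer * base % mod
            multiplies += 1
        exp >>= 1
        if exp:                          # more bits remain: square the base
            base = base * base % mod
            squarings += 1
    return answer, multiplies, squarings


m = 2**2048 - 3
for label, e in (("m+4", m + 4), ("m-4", m - 4)):
    result, mults, squares = powmod_count(7, e, m)
    assert result == pow(7, e, m)        # cross-check against Python's built-in
    print(label, "1-bits:", bin(e).count("1"),
          "multiplies:", mults, "squarings:", squares)
```

With m = 2^2048-3, the +4 exponent needs 2 multiplies and the -4 exponent needs 2046, while the number of squarings is nearly identical (2048 vs 2047). That is why a time difference that stays constant across both exponents points away from the multiply.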

* *

As for the loop itself, the alias below benchmarks various methods. It excludes the (base^2) and (base*answer) steps and looks only at the 'if ($isodd)' test, in case there's some speed change in the part of the loop outside the multiply and square. The 'count' is just to verify that the various methods aren't seeing the 1-bits wrong.

Assuming that $powmod is using something similar to style 1/4/5, it was surprising to see that the string manipulation of style 3 is faster. Style 10 is also faster than 1/4/5, in spite of the time needed to get the base2 text from $base and then use $replace() on it in order to feed it to /bset. I assume those are things that the executable code can do much more efficiently, since it doesn't need to create text strings.

Because Style 6&7 take 150x as long as 1/4/5 and Style 8 is even 4x slower than that, styles 6/7/8 don't run by default, and are just for future reference by forum readers about avoiding "if (num & X)" or "$and(num,X)" when num is very big and X is a power of 2.

Code
alias mapm_powmod_loop {
  var %bits 2048 | if ($1 isnum 32-10240) var %bits $int($1)
  if ($1 !isnum 32-10240) { echo -ag $ $+ 1 not 32-10240 - using 2048 bit number }
  echo -ag processing %bits bit number, warning can be slow if (678 isin $ $+ 2) - press ctrl+break if needed
  var %num.bf $rand($calc(2^(%bits -1)),$calc(2^%bits))
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($isbit(%exp.bf,1)) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#1 *$isbit(exp,1) count: %count $chr(22) time: $calc($ticksqpc - %ticks )
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%i <= %bits) {
    if ($isbit(%exp.bf,%i)) inc %count
    inc %i
  }
  echo -a style#2 *$isbit(exp.bf,bit.position) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($right(%exp.bf,1) isin 13579) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#3 (right(exp,1) isin 13579) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if ($calc(%exp.bf % 2)) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#4 (exp % 2 == 1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%exp.bf) {
    if (2 \\ %exp.bf) inc %count
    var %exp.bf $calc(%exp.bf // 2)
  }
  echo -a style#5 (2 \\ exp) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  if (6 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%exp.bf) {
      if (%exp.bf & 1) inc %count
      var %exp.bf $calc(%exp.bf // 2)
    }
    echo -a style#6 if (exp.bf & 1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  if (7 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%exp.bf) {
      if ($and(%exp.bf,1)) inc %count
      var %exp.bf $calc(%exp.bf // 2)
    }
    echo -a style#7 *$and(exp.bf,1) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  if (8 isin $2) {
    var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
    while (%i <= %bits) {
      if ($and(%exp.bf,$calc(2^ (%i -1)))) inc %count
      inc %i
    }
    echo -a style#8 *$and(exp.bf,2^(bit.pos)) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  }
  var %i 1, %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  while (%i isnum 1- %bits) {
    if ($mapm_bitwise_style5(%num.bf,%i)) inc %count
    inc %i
  }
  echo -a style#9 at each bit calc((exp//(2^(bitpos-1))) % 2) count: %count $chr(22) time: $calc($ticksqpc - %ticks)
  var %i $bvar(&base2,0) , %count 0 , %exp.bf %num.bf , %ticks $ticksqpc
  bset -c &base2 1 $replace($base(%num.bf,10,2),1,1 $+ $chr(32),0,0 $+ $chr(32))
  ; bset -c &base2 1 $regsubex($base(%num.bf,10,2),/(.)/g,\t $+ $chr(32))
  var %ticksbase $ticksqpc , %i $bvar(&base2,0)
  while (%i isnum 1- %bits) {
    ; if (*9 iswm $bvar(&base2,%i)) inc %count
    if ($bvar(&base2,%i)) inc %count
    dec %i
  }
  ; setup time is the span from %ticks (before /bset) to %ticksbase (after /bset)
  echo -a style#10 at each bit check pre-built array for 0/1 count: %count $chr(22) time: $calc($ticksqpc - %ticks) including $calc(%ticksbase - %ticks) for calls to base()/replace()/bset
}
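
For readers who want to check the counting logic outside mIRC, here is a rough Python sketch of three of the approaches above: the divide-by-2 loop with a low-bit test (like styles 1/4/5), the last-decimal-digit test (like style 3), and the pre-built array of bits (like style 10). The function names are made up for this illustration, and Python timings would say nothing about mIRC's, so it only verifies that the methods agree on the 1-bit count.

```python
import random


def count_low_bit(n):
    """Halve n each pass and test the low bit (analogous to styles 1/4/5)."""
    count = 0
    while n:
        if n % 2:
            count += 1
        n //= 2
    return count


def count_last_digit(n):
    """Test the last decimal digit as text (analogous to style 3)."""
    count = 0
    while n:
        if str(n)[-1] in "13579":
            count += 1
        n //= 2
    return count


def count_prebuilt(n):
    """Build the base-2 digits once, then walk the array (analogous to style 10)."""
    bits = [int(ch) for ch in bin(n)[2:]]
    count = 0
    for bit in bits:
        if bit:
            count += 1
    return count


# Random 2048-bit number with the top bit forced on.
num = random.getrandbits(2048) | (1 << 2047)
expected = bin(num).count("1")
for fn in (count_low_bit, count_last_digit, count_prebuilt):
    assert fn(num) == expected
    print(fn.__name__, "count:", fn(num))
```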