|
Joined: Dec 2002
Posts: 343
Pan-dimensional mouse
|
OP
Pan-dimensional mouse
Joined: Dec 2002
Posts: 343 |
I assume Wolfram Alpha is correct, but that's an assumption I'm making. F is 15, also known as 2^4 -1. FF is 255, also known as 2^8 -1. These can be checked using calc in Windows and going from dec to hex. So, sixteen F, or FFFFFFFFFFFFFFFF, should be 2^64 -1. https://www.wolframalpha.com/input?i=convert+2%5E64+-1+to+base+36 They give 3w5e11264sgsf mIRC 7.69 //echo $base(FFFFFFFFFFFFFFFF,16,36) 3W5E11264SG0G mIRC 6.03 //echo $base(FFFFFFFFFFFFFFFF,16,36) 3W5E11264SGSF So, did newer versions of mIRC start miscalculating using $base or is Wolfram Alpha incorrect? Or possibly both incorrect? Did a search and came across https://books.google.com/books?id=qDszEAAAQBAJ&pg=PT237&lpg=PT237&dq="3W5E11264SGSF" Maybe they made a mistake there, but, maybe someone else here can verify what base 16 FFFFFFFFFFFFFFFF should be in base 36. I suspect newer versions of mIRC have a math bug in $base. Edited... More things I found out playing around. //inc %n 1 | tokenize 32 $calc(20000000000000 + %n) | echo -s $base($1-,16,36) In 7.69 Done seven times 2GOSA7PA2GW 2GOSA7PA2GY 2GOSA7PA2H0 2GOSA7PA2H0 2GOSA7PA2H0 2GOSA7PA2H2 2GOSA7PA2H4 In 6.03 2GOSA7PA2GX 2GOSA7PA2GY 2GOSA7PA2GZ 2GOSA7PA2H0 2GOSA7PA2H1 2GOSA7PA2H2 2GOSA7PA2H3 In 7.69 //echo -s $base(2GOSA7PA2H0,36,16) //echo -s $base(2GOSA7PA2H1,36,16) Same result of 20000000000004 In 6.03 //echo -s $base(2GOSA7PA2H0,36,16) 20000000000004 //echo -s $base(2GOSA7PA2H1,36,16) 20000000000005 Wolfram Alpha https://www.wolframalpha.com/input?i=convert+2GOSA7PA2H1_36+to+base+1620000000000005_16 So, Wolfram Alpha takes 2GOSA7PA2H1 in base 36 and converts it to 20000000000005 in base 16.
|
|
|
|
Joined: Mar 2008
Posts: 96
Babel fish
|
Babel fish
Joined: Mar 2008
Posts: 96 |
3W5E11264SGSF should be the correct value. C# code on SharpLab.IO
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
Thanks for your bug report. This is related to rounding/precision/etc. The result you are seeing in the latest version of mIRC is the expected, albeit wrong, result. mIRC is a 32bit application. For it's $calc()/$base()/scripting calculations it can handle a maximum of a double value. IEEE 64-bit double precision provides exactly 53 bits of mantissa. The remaining bits are for the sign and exponent. So a 64bit value cannot fit in a double value. However, the fact that a twenty year old version of mIRC, v6.03, returned the correct result, is intruiging. If I use the exact same code from v6.03 in v7.69, I still see the same incorrect result in v7.69. Which is what mIRC should be returning: the correct incorrect result ;-) mIRC v6.1 returns the same result as v7.69. So I assume that it was the switch from an older Visual C++ compiler, used for v6.03, to the newer Visual C++ .Net used for v6.1 that resulted in this change. My guess is that Microsoft fixed an issue in their floating point library to make it comply with IEEE double precision. Update: And there you go: https://stackoverflow.com/questions/7120710/why-did-microsoft-abandon-long-double-data-type
|
|
|
|
Joined: Dec 2002
Posts: 343
Pan-dimensional mouse
|
OP
Pan-dimensional mouse
Joined: Dec 2002
Posts: 343 |
It'd be slower, but what about creating a new $base called $basepc for base precision in which it'd do it in a long-hand method while keeping the current $base that is used in 6.1 and higher (for speed purposes)?
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
It'd be slower, but what about creating a new $base called $basepc for base precision in which it'd do it in a long-hand method while keeping the current $base that is used in 6.1 and higher (for speed purposes)? This topic has indeed been discussed many times before, ie. adding support for arbitrary precision arithmetic, such as BigNum. There is no practical way to add support for this since the result would not be usable anywhere else in mIRC, as no other feature would be able to parse that value, use it in a calculation, etc. Every scripting command/identifier would eventually need to have its own $XXXpc() extended version to handle BigNum values, and because the scripting language is so closely tied to whole of mIRC, including the GUI, every scripting feature would need to perform conversions from BigNum to non-BigNum values in order for the values to be consistently usable everywhere else, with the resulting decrease in performance due to BigNum handling.
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
I have been reviewing/testing several BigFloat libraries over the last few weeks to see what it would take to add support for this to the scripting language. A BigFloat library needs to meet several criteria: compile under Visual C++ 2008, use a BSD/MIT/similar license, support logarithmic/trigonometric/etc. functions, have a C++ wrapper, be stable/tested, and so on. This really narrows down the options. After looking through a large number of libraries (including Boost, APFloat, LibBF, and many more), compiling and testing them (many needed code changes before they would compile), I whittled down the options to TTMath and MAPM. TTMath is not arbitrary precision. The required precision is set at compile time. It is thread-safe. I set it to use Big<2,4> floats to allow a 128bit number for the Mantissa. The larger the <E,M>, the slower it gets, so this was a practical choice and provides support for reasonably large numbers and precision. In my script unit tests, which repeatedly call a variety of commands and identifiers such as /set/inc/dec/$calc()/$base()/math/trig/log/etc. functions, it is at least four times slower than normal calculations - slower depending on the type of call. For Big<4,8>, it is eight times slower. MAPM is arbitrary precision. It is old and well-tested. It has both thread/non-thread safe calls. mIRC currently uses the non-thread-safe calls as it is not used in any multi-thread contexts. By chance, I found an updated version at https://github.com/achan001/MAPM-5. It compiles easily and has a C++ wrapper and supports all of the necessary functions. The only major issue is that on an out-of-memory error, MAPM immediately exits() the application with no freeing/unwinding/error reporting. I made several attempts to update it to unwind and report errors correctly in a thread-safe way but this snowballed into hundreds of code changes. In the end, I tried a different approach and made as few changes to the MAPM code as possible to allow it to unwind safely, free memory where necessary, and to set a non-thread-safe global error flag. Not ideal but a step in the right direction. In my script unit tests, it is two times slower than normal calculations and gets gradually slower as the numbers you use increase in size. The latest beta uses the MAPM library. To support bigfloats, inline branching code and overloaded functions were added that duplicate, line for line, the existing, unchanged non-bigfloat code. The following were updated: /var /set /inc /dec /hinc /hdec /if /while $base() $calc() $isnum() $rand() $abs() $ceil() $floor() $int() $min() max() $rand() $round() $acos() $asin() $atan() $atan2() $cos() $cosh() $hypot() $log() $log10() $sin() $sinh() $tan() $tanh() $sqrt() Bigfloat can be enabled in two ways: First, by using the /bigfloat [on|off] command to enable it for a script locally. It will remain enabled until the script exits. Second, by using a %var that ends in .bf, ie. %var.bf. This naming convention allows vars to be recognized as bigfloats in the scripting language both visually and internally. The parser enables the bigfloat state the moment it comes across a %var.bf and only for the command in which it is found. The bigfloat state is propagated forward through called commands/identifiers. A script can use $bigfloat to check if bigfloat is enabled or not. This feature is still at the experimental stage, so will need testing. Please let me know if you spot any issues.
|
|
|
|
Joined: Jul 2006
Posts: 4,193
Hoopy frood
|
Hoopy frood
Joined: Jul 2006
Posts: 4,193 |
Cool stuff! I'm a bit worried with the %variables.bf usage though. Second, by using a %var that ends in .bf, ie. %var.bf. This naming convention allows vars to be recognized as bigfloats in the scripting language both visually and internally. The parser enables the bigfloat state the moment it comes across a %var.bf and only for the command in which it is found. Does the recognition of the .bf string happens before or after evaluation (literal or not?) because dynamic variables associated with nickname such as %foo. [ $+ $nick ] would easily conflict with this new addition, ".bf" is probably not wide/long/specific enough. If the purpose of .bf is to be a marker for the given identifier/command, then instead of breaking %variable support, a construct like $bigfloat() could be used ($bf might break script but why not), where $bigfloat(%var) would work the same for recognition, returning the value and enabling bigfloat. Is it too late to change this?
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
Thanks for the feedback.
The %var.bf is the whole variable name. Any scripts that already use variables ending in .bf will work in the same way, ie. it is backward compatible. The naming method also means that it can be persistently stored and recognized as a bigfloat in vars.ini and when used by the scripting language.
I experimented with a variety of methods to enable bigfloat: "/bigfloat command", "$bigfloat()", adding flags/switches to existing commands/identifiers, as well as creating around forty commands/identifiers prefixed with /bnXXX and $bnXXX() that match existing commands/identifiers (in this case, new identifiers were needed for if/while comparisons as I did not want to create /bnif and /bnwhile commands), and so on. I also originally considered calling it /bignum and using %var.bn.
In the end, my preference was for simplicity - using %var.bf makes it easy/seamless to use, persistent/recognizable in different contexts, and backward compatible.
If your script happens to use a %var.bf already, for whatever purpose, it will be treated in the same way as usual - except that if you try to use for it number calculations, it will be able to calculate arbitrarily large numbers.
|
|
|
|
Joined: Dec 2002
Posts: 253
Fjord artisan
|
Fjord artisan
Joined: Dec 2002
Posts: 253 |
What about binary operations like $biton $bitoff $isbit $and $or $xor $not etc? With the current limitations in place mIRC (before big-math) only allows these to function up to 32bit, so in order to get up to possible 53bits for like an "array of booleans" you gotta do some math on it to split the number into the HI/lo, depending on operation, perform the old methods on BOTH(and or xor not) and re-assemble back to a single N. Same applies to biton/bitoff except depending on V2 (if <= 31 lo) you do the operation on the hi or the lo then re-assemble. For reference: here's my $bitwise53(A,N).[and/or/xor/not/biton/bitoff/isbit]
bitwise53 {
var %hi = 2147483648 , %a = $calc($1 // %hi) , %b = $calc($1 % %hi)
if ($prop == isbit) {
if ($2 <= 31) { return $isbit(%b,$2) }
else { return $isbit(%a,$calc($2 - 31)) }
}
elseif ($istok(biton bitoff,$prop,32)) {
if ($2 <= 31) {
if ($prop == biton) { var %b = $biton(%b,$2) }
else { var %b = $bitoff(%b,$2) }
}
else {
if ($prop == biton) { var %a = $biton(%a,$calc($2 - 31)) }
else { var %a = $bitoff(%a,$calc($2 - 31)) }
}
return $calc(%a * %hi + %b)
}
elseif ($istok(and or xor not,$prop,32)) {
var %c = $calc($2 // %hi) , %d = $calc($2 % %hi)
if ($prop == and) { return $calc($and(%a,%c) * %hi + $and(%b,%d)) }
if ($prop == or) { return $calc($or(%a,%c) * %hi + $or(%b,%d)) }
if ($prop == xor) { return $calc($xor(%a,%c) * %hi + $xor(%b,%d)) }
if ($prop == not) { return $calc($not(%a,%c) * %hi + $not(%b,%d)) }
}
}
It would be very handy to avoid the $calc() and the identifiers themselves support this natively up to the current 53bit precision, and possibly exceed the 53bit restriction if bigfloat has been enabled. I use this all the time, especially with dialogs with lots of checkboxes. I can store up to 53 checkbox states to restore on a dialog init in one integer, instead of some other thing like individual hash/ini/txt entries per checkbox with 0-1, or $true - $false. There's other very handy reasons but this is just probably where I apply it the most.
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
What about binary operations like $biton $bitoff $isbit $and $or $xor $not etc? Only scripting features that use double values for calculations have been updated to use big floats with MAPM. Bitwise operators do not belong in this category. Apart from that, in the case of bitwise operator identifiers, changing the number of bits they can handle would change their results, eg. $not(1) would give different results for 32bit vs 64bit, so extending them would very likely need new identfiers and/or parameters/properties. For abitrary precision, ie. unlimited bits, my guess is that we would need to add a bit size parameter to each command/identifier. In addition, MAPM does not currently support bitwise operations on artbitrary precision numbers. If someone here would like to look at the source code for MAPM and add support for bitwise operations, I'd be happy to include it.
|
|
|
|
Joined: Jan 2004
Posts: 2,127
Hoopy frood
|
Hoopy frood
Joined: Jan 2004
Posts: 2,127 |
Bigfloat feedback #1. Just mentioning now before it's too late to change to consider a change. //var -s %foo. [ $+ [ $me ] ] 1 As Wims did, I also thought that this could cause some scripts to behave differently based on someone's nick, but that can already happen for things in the [windows] section, such as getting a full screen window by being querying by the nick 'main', or having logging or font settings affected by being queried by nick 'wstatus'. My suggestion for a marker was something like %bf.var or %.bf.var since compound variables usually place the nick at the end, and having the marker at the beginning calls attention to the special nature of the variable when debugging. At first I thought maybe also having the marker be case-sensitive, but that would probably cause too much problems in a language where case-sensitivity is not the norm. That's been one of my biggest problems when I'd tried to code in 'C', where i'd have unknown variables because of not having the case be perfect. #2 -switch for /hdec /hinc + $hget(table,item).bf Unless it's considered redundant, another possible way for a flag is to give a new switch to the commands like /set /var /inc /dec /hinc /hdec While bigfloat can look at /set /var /inc /dec to see what kind of variable they use, it would be helpful for at least /hinc and /hdec to have a switch to put them into BF mode, because the workaround is to feed it a %var.bf name that needs to be either $null or 1 //hadd -m maroon foo $calc(2^53) | echo -a old: $hget(maroon,foo) | hinc maroon foo | echo -a new: $hget(maroon,foo) old: 9007199254740992 new: 9007199254740992 //hadd -m maroon foo $calc(2^53) | var %foo.bf 1 | echo -a old: $hget(maroon,foo) | hinc maroon foo %foo.bf | echo -a new: $hget(maroon,foo) old: 9007199254740992 new: 9007199254740993 //hadd -m maroon foo $calc(2^53) | echo -a old: $hget(maroon,foo) | hinc maroon foo %null_variable.bf | echo -a new: $hget(maroon,foo) old: 9007199254740992 new: 9007199254740993 The switches for these commands are all case-sensitive, so a capital letter like -M or -B wouldn't interfer with -m or -b, but to add distinction to it, perhaps it should avoid those letters and instead use -f or -F var -F %var 2 ^ 72 hinc -F maroon foo A -f switch would allow enable using a value from another hashtable item withou needing to either use the /bigfloat command or create a dummy %var.bf just to use as a stepping-stone. //hadd -mz maroon bar $calc(2^53+12345) | mhadd -z maroon foo $hget(maroon,bar) | timertest 44 1 echo -st result: $!hget(maroon,foo) Unless restricted to hashtable itemnames like whatever.bf (which /hinc /hdec don't seem to do) it would also be useful to have $hget(table,item).bf to indicate if the 'foo' item is incrementing or decrementing in BF mode, or whether it would get stuck due to loss of doubles precision. //hadd -m maroon foo $calc(2^53-9) | hinc -c maroon foo | timertest 44 1 echo -st result: $!hget(maroon,foo) Instead of returning $true, maybe it should return the switch letter, which would allow using the return string as part of the switches. #3 Yay, $round in bigfloat mode is much better at rounding '5' the correct direction, and my simplistic example on wikichip doesn't fail in bigfloat mode. //bigfloat on | var %i 1 , %list | while (%i isnum 1-100) { var %list %list $round(%i $+ .05 ,1) | inc %i } | echo -a round: %list #4 To answer your reply on Bitwise. Some of the bitwise operator could be supported as integers. $isbit $biton $bitoff $and() $xor() $or() $not() And only in some cases would there need to be a 'bits' parm to determine how to handle the number. In the following example cases where they cannot return an accurate result, the design decision can choose between returning zero or $null in the absence of parm3. The 'bits' parm3 can also be useful in doubles mode, where you want $and(-number,number) to be cast as 16 bits instead of the default 32 bits. Since BF mode would no longer have 2^32-1 as the range limit, there would also be a design decision of what to do for something like $not($calc(-2^71)). I'm guessing that the error condition for the invalid parameters mentioned below should either return '0' or $null or halt with an error. It might be desirable for the error condition to be something that's impossible to be confused for a valid return value. #4a: $isbit $biton $bitoff. They will be much faster than translating large integers to base 2 in order to perform string surgery on them. //bigfloat on | var %i 1 | while (%i isnum 1-64) { echo -a %i : $isbit(999999999999999999999999999999999,%i) | inc %i } $biton and $bitoff should not return a result when num1 is negative and the bits parm is't used, because of the abiguity in how to cast it as positive. As for $isbit(bignegative,bit-position), I don't see them requiring the 'bits' parm, except to limit the bit positions seen for a negative number. Without that bits parm, I *think* that since the 1-bits would extend infinitely to the left, the result would be '1' for all bit positions greater than 'something'. #4b: $not() Yes, this should require a 'bits' parm, because this 'flip all the bits', and need the 'bits' parm to know what 'all' means. #4c1: num1 & num2 both positive $and(num1,num2) $or(num1,num2) $xor(num1,num2) These 3 operators are perfectly safe to use when both num1 and num2 are positives. Any usage of the bits parm against positive inputs would be ambiguous. [/quote] #4c2: exactly 1 of num1/num2 is negative $and(num1,num2) $or(num1,num2) $xor(num1,num2) [/quote] I believe that $and and $or can both be used when only 1 of the 2 numbers is negative, by using the bit length of the positive number to determine which 2^n number should be added to the negative when casting it as positive. And from there, it could continue pretending that both parameters had been positive. However, $xor(-1,positive) would be ambiguous without the 'bits' parm. #4c3 both of num1/num2 are negative $and(neg1,neg2) $or(neg1,neg2) $xor(neg1,neg2) $and(negative,negative) and $or(negative,negative) cannot return a result without the 'bits' parm, because there's no positive number available to determine how to cast the negatives. However, $xor(negative,negative) can return a credible result without the bits parm, which would only serve as a bitmask to shrink the result. The infinite 1-bits to the left cancel each other out, and the biggest_negative can be used for casting both numbers to positive. - - #5 $calc ^ operator fails My tests so far mainly involve big integers. I was hoping for something that was exclusively big integer at arbitrarily large size, where fractions are ignored or invalid, but I know the floats will make a lot of people happy too. And double-yay, because so far I'm finding the + - * / % operators do seem to perform accurate results against those arbitrarily huge integers, but as you mentioned about slowness, the calculation times become exponentially slower in relation to the increase in the size of the numbers. And I suspect that much of this is due to supporting fractions in case these integers happen to generate one. However, while + - * / % seemed to retain accuracy when used against numbers several thousand bits in size, the ^ operator loses accuracy when the result is in the neighborhood of 2^103. //var -s %a $calc(2^52-1) , %b1.bf $calc(%a * %a) , %b2.bf $calc(%a ^2) * Set %a to 4503599627370495 * Set %b1.bf to 20282409603651661416747996545025 * Set %b2.bf to 20282409603651661416747996545020 The result ending with 025 is the correct answer. //var -s %a.bf $calc(2^103) result: 10141204801825835211973625643010 the correct last 3 digits should be 008 instead of 010, and as the 2^n gets larger, the number of trailing zeroes increases: //var -s %a.bf 2 ^ 120 result: 1329227995784915872903807060280000000 should be: 1329227995784915872903807060280344576 - - #6 $base() rounding/dropped digits This test of $base found 2 kinds of fails from hex numbers having at least 26 digits: dropped digits and lost precision. The //command below is set at 32 hex digits for 128 bits, and the test is to see if (hex -> integer -> hex) arrives back at the original hex number, where it can fail for either base conversion. When %hexdigits is 30 or greater I have yet to find it not fail from this test, but these are just random numbers within a huge range. I limit the output to avoid spamming myself. //var %i 10000 , %hexdigits 32 , %count 0 | while (%i) { var %string $regsubex($str(x,%hexdigits),/x/g,$base($r(0,15),10,16)) , %b $base(%string,16,10) , %c $base(%b,10,16,%hexdigits) | if (%string !== %c) { if (%count !> 9) echo 4 -a %string !== %c : b: %b | inc %count } | dec %i } | echo -a fail: %count However when %hexdigits is reduced to be around 27-29, the fail rate drops slightly, and ranges from 93% through 99%. From a glance they appeared to be caused by loss of precision among the least significant digits. From setting %hexdigits to 25 or lower, I've yet to find a random hex number giving an error, but when %hexdigits = 26 there are rare failures somewhere around 8 per 10,000 where they completely drop a digit. One example: //var -s %a CB28B03A8106893332FB39ABEC , %b $base(%a,16,10) , %c $base(%b,10,16) * Set %a to CB28B03A8106893332FB39ABEC * Set %b to 16095909438010123464851209169900 * Set %c to CB28B03A8106893332FB39ABF The %b result is correct, but %c drops a digit in the process of rounding. The commonality seems to be when the hex number is translated from a base10 integer whose last 2 digits are '00'. But it's not all such numbers, or else the error would have shown up 1% of the time. Now that I knew something to look for, I then went back to look at longer hex strings, and found both the rounding error across multiple digits, and the digit being dropped. This next example also compares against the $base2() demo previously posted on the forum: //var -s %a 7BA9BB2A5B63F8520160B5C82458 , %b1 $base(%a,16,10) , %b2 $base2(%a,16,10) , %c $base(%b1,10,16) * Set %a to 7BA9BB2A5B63F8520160B5C82458 * Set %b1 to 2508183865617366621511317042243000 * Set %b2 to 2508183865617366621511317042242648 * Set %c to 7BA9BB2A5B63F8520160B5C825E While the prior example had the correct answer when translating from hex to base10, this example has a larger hex number which reaches into the range where the %b1 translation to base10 has a rounding error while %b2 using $base2() is correct, though is slower. The output from the next command shows that the error in the lower bits is something other than adding extra '0' digits. //var %hexlen 25 | while (%hexlen isnum 25-40) { var %a.bf $left($sha1(abc),%hexlen) | var -s %hexlen %hexlen, %b1.bf $base(%a.bf,16,10) , %b2.bf $base2(%a.bf,16,10) | inc %hexlen } If $base() losing precision near 26+ hex digits is due to using an algorithm that uses the same thing used by the ^ operator, that can explain the loss of precision in these results, but may not explain the digit being dropped. I tried briefly to find a result where 2 digits are dropped, but couldn't. I also didn't try to see if a digit can get dropped when translating hex->base10. I think I can come up with something that is as fast as $base2() without using pow(). I'm not sure how my algorithm would translate to executable, because it would chop the number into separate pocket chunks instead of keeping it as a single string. But it wouldn't need to calculate 2^400 while translating a 512-bit number. I also have something that would repair the ^ operator, but it only works for non-negative integer exponents. -- Some combinations of in/out bases can have alternate calculations that would be much faster, and even faster without the need to support the entire base36 alphabet when translating from hex-to-base10, etc. Such as, translating from base16 by left-padding an odd-length with a '0' then storing the hex pairs as bytes inside the internal storage array. Then, each of those bytes are a 'pocket' where the 'carry' value isn't larger than 255. While the $base2() demo translated everything to base10 before translating to outbase, for executible code I imagine the efficiency would be translating inbase to the hex format of the internal storage, and from there translate to outbase. $bytes is another candidate for the larger BF range: //var -s %foo.bf 2 ^ 72 | echo -a $bytes(%foo.bf,b) I'm curious how the $rand result is obtained for a range size larger than 2^64? As I understand how the $rands SystemFunction036 works, gives you the number of random bytes requested, and the large range could be obtained by just requesting more bytes. But JSF64 returns just a single uint_64, so I wasn't sure if the BF results are obtained by appending several JSF outputs together until the underlying pseudo-random results is >= the requested range size. //var -s %256bitrand.bf $rand(0,115792089237316195423570985008687907853269984665640564039457584007913129639935) , %randhex $base(%256bitrand.bf,10,16,64) Or, is BF mode trying to use the Knuth LCG that's baked into the bigfloat package? If I understand how this LCG works, it uses a 49-bit prime 'm', and returns a number in the range [0,m-1] which matches the new internal state. It adjusts the returned number to the range by dividing by 'm' to get a fraction between 0-1, which can then be multiplied based on the requested range size. JSF64 has a 64*4=256-bit internal state, and regardless of the value of the prior 64-bit output, it's possible for every 64-bit value to be the next output, including repeating the prior value, and it's not trivial to determine what the next output is. On the other hand, the LCG returned output is 100% of the value of the internal RNG state, and each of the 49-bit RNG outputs can't be returned again until an interval of 2^49 outputs has passed where each of the other 49-bit numbers have their turn at being returned by the LCG. If 'm' is known, someone can request a range size matching 'm', which will return the value matching the entire internal state, which allows them to predict all the following 'random' number outputs except when something else interrupts to ask for its own random number. -- Whatever is happening with $rand in BF mode, it's causing the time for $rand(1,9) to increase 7x, which is something I'm not seeing for similar small range like $calc(2+2) being in/out of BF mode. //var %z 99999 , %i %z , %t $ticksqpc | while (%i) { !var %a.bx $rand(1,9) | !dec %i } | var %t1 $calc($ticksqpc - %t) , %i %z, %t $ticksqpc | while (%i) { !var %a.bf $rand(1,9) | !dec %i } | var %t2 $calc($ticksqpc - %t) | echo -a bf mode %t2 / doubles mode %t1 = $calc(%t2 / %t1) #9 $pi Returns the value of the mathematical constant pi to 20 decimal places. Nobody laugh, but in doubles mode the increased precision from $pi can affect the result, as in $calc(100000000*$pi). Since $pi in doubles mode had enough digits to make it less likely to be the source of a rounding error, it seems reasonable that - in BF mode where fractions are returned with 30 digits, that the 20 digit fraction of $pi should be lengthened to have more digits than that. In doubles mode, the fraction for $pi was 20/6 = 333% the length returned by $calc, so since $calc can return 30 digit fractions, then $pi keeping up with that would have 100 digits. And no I don't think the fraction needs to be that long, but should probably be at least slightly more than the 30 $calc can return.
|
|
|
|
Joined: Dec 2002
Posts: 253
Fjord artisan
|
Fjord artisan
Joined: Dec 2002
Posts: 253 |
ah I overlooked NOT.. my code is wrong... Here's V2 with a proposed solution Since these are strings taken to literal integer with some internal function I'd assume: atoi(String), inside these bitwise operations, mIRC itself doesn't know what "bit-depth" it is, and I'm guessing just assumes something like uint32? If you take the number and if it's > 0, then log2(N) + 1 is the index of the most significant bit (the leftmost bit that's on) log2(1) + 1 = 1 log2(2) + 1 = 2 log2(4) + 1 = 3 log2(8) + 1 = 4 etc... If the most significant bit is <= 32, do the normal operation else: perform the operation on the number at bit-depth: MostSignificantBit Note: mIRC does not have $log2() so to simulate this: it's $log(N) // $log(2) In the example identifier below, it does this. I had to fix and/or/xor/not based on the most significant bit of $1 or $2 (and/or/xor use $2, not doesn't) and/or/xor steps up to 64bit(53bit actual cause of $calc() limits) if $1 or $2 has a msb > 32 where $not() steps up to an arbitrary bit-depth Examples: 1 in Binary (32 bits): 00000000000000000000000000000001 //var %n = 1 , %msb = $iif(!%n,0,$calc($log(%n) // $log(2) + 1)) , %not = $bitwise53(%n).not | echo -a Not: %not Binary: $base(%not,10,2,%msb) Output: Not: 4294967294 Binary: 11111111111111111111111111111110 4294967296 in Binary (33 bits): 100000000000000000000000000000000 //var %n = 4294967296 , %msb = $iif(!%n,0,$calc($log(%n) // $log(2) + 1)) , %not = $bitwise53(%n).not | echo -a Not: %not Binary: $base(%not,10,2,%msb) Output: Not: 4294967295 Binary: 011111111111111111111111111111111
bitwise53 {
var %div = 2147483648 , %V1High = $calc($1 // %div) , %V1Low = $calc($1 % %div)
if ($prop == isbit) {
if ($2 <= 31) { return $isbit(%V1Low,$2) }
else { return $isbit(%V1High,$calc($2 - 31)) }
}
elseif ($istok(biton bitoff,$prop,32)) {
if ($2 <= 31) {
if ($prop == biton) { var %V1Low = $biton(%V1Low,$2) }
else { var %V1Low = $bitoff(%V1Low,$2) }
}
else {
if ($prop == biton) { var %V1High = $biton(%V1High,$calc($2 - 31)) }
else { var %V1High = $bitoff(%V1High,$calc($2 - 31)) }
}
return $calc(%V1High * %div + %V1Low)
}
elseif ($istok(and or xor not,$prop,32)) {
var %V2High = $calc($2 // %div) , %V2Low = $calc($2 % %div)
var %V1msb = $iif(!$1,0,$calc($log($1) // $log(2) + 1)) , %V2msb = $iif(!$2,0,$calc($log($2) // $log(2) + 1))
if ($prop == and) {
if (%v1msb > 32 || %v2msb > 32) { return $calc($and(%V1High,%V2High) * %div + $and(%V1Low,%V2Low)) }
return $and($1,$2)
}
if ($prop == or) {
if (%v1msb > 32 || %v2msb > 32) { return $calc($or(%V1High,%V2High) * %div + $or(%V1Low,%V2Low)) }
return $or($1,$2)
}
if ($prop == xor) {
if (%v1msb > 32 || %v2msb > 32) { return $calc($xor(%V1High,%V2High) * %div + $xor(%V1Low,%V2Low)) }
return $xor($1,$2)
}
if ($prop == not) {
if (%v1msb > 32) { return $base($regsubex($base($1,10,2,%v1msb),/([0-1])/g,$iif(\1,0,1)),2,10) }
return $not($1)
}
}
}
|
|
|
|
Joined: Jan 2004
Posts: 2,127
Hoopy frood
|
Hoopy frood
Joined: Jan 2004
Posts: 2,127 |
BF mode Feedback#2 In BF mode the $calc() operator ^ causes a GPF when the return value is too long. I'm hoping the fix is just to defend against long strings returned from the libary, and not that the library itself would need to be slowed down to defend against big internal values. I haven't found the problem in multiplying, where it changes the value to zero when it's too big. //var %a.bf 1 $+ $str(0,5168) | echo -a $len( $calc(%a.bf * %a.bf) ) -> 10337 ok //var %a.bf 1 $+ $str(0,5169) | echo -a $calc(%a.bf * %a.bf) -> 0 too-big=zero ok //var -s %a.bf $len($calc(10^10359)) ok //var -s %a.bf $len($calc(10^10360)) gpf -- Another example of dropping a digits, this time a whole lotta. The string should be longer in a smaller outbase //var -s %digits 60 | echo -a $bigfloat $len( $base(1 $+ $str(0,%digits),10,9) ) result: $false 48 ... and even for outbase 36 the number is way to short. -- Fix for the m_apm_integer_pow_nr function:
This looks like the function that the $calc operator ^ should be using under-the-hood when the exponent is an integer, and I hope the ^ and $base() problems are caused by using the other function designed to do roundings, and switching to this will fix it easily.
But this is separate from that issue. If the '^' operator wrapper is not doing it already, the integer_pow should have an additional early exit - for the base being 0 or 1. Since 0^positive=0 and 1^positive=1, the check can just return 'x' if 'x' is less than 2, and I assume it would look like so.
if (n < 2) {m_apm_copy(r, n ? x : MM_One); return;} if (x < 2) {m_apm_copy(r, x); return;} --- Looks like I spoke a little too soon about $round being golden. While $calc outputs a fraction up to length 30, it looks like $round can only output length 20. //var -s %a.bf $calc(0.123456789012345678941234567894) , %b.bf $round(%a.bf,30) * Set %a.bf to 0.123456789012345678941234567894 * Set %b.bf to 0.12345678901234567894 However, it does look like $round always rounds a 21st digit '5' up, even if there is nothing beyond that 21st digit. -- As far as the rounding done by $calc itself, the rounding is slightly different than before. The 1st example creates an internal 7-digit fraction ending with '5', and under the older doubles method, calc sometimes rounds the 7th digit 5 up or down, due to the known float rounding issues. //bigfloat off | var %i 1 , %list | while (%i isnum 1-100) { var %list %list $calc(%i + 25/10^7) | inc %i } | echo -a round: %list But under BF mode, it looks like it's always rounding a 31st digit 5 down. If changing the 25.0 to 26, it always rounds the 30th digit up. But 25.0 rounds down. However, that's the only case I've found $calc chopping off the 5, because if editing the 25.0 to 25.000000000000001 or even having a ridiculous number of zeroes in there, it sees the extra precision somehow and rounds it up. //bigfloat on | var %i 1 , %list | while (%i isnum 1-100) { var %list %list $calc(%i + 25.0/10^31) | inc %i } | echo -a round: %list So in conclusion, so far this is the only 'wrong' round I've found in BFcalc so far, and only seems to happen when the fraction is exactly 31 digits and ends with a 5. -- I'm wondering if the .deg prop for the trig functions is using a double for either the 360 or the pi? //echo -a %null.bf $bigfloat $cos(270).deg result: -0.000000000000000000003965074919 As for inputs 360 degrees or greater, when they're measured in radians it's probably best to leave them alone, but perhaps in .deg mode the large inputs should be mod-360? I can see scripts assuming 360 degrees gives the same answer as zero. //echo -a %null.bf $sin(360).deg $sin(3600).deg result: -0.000000000000000000005286766559 -- If there won't be something to override this next behavior, it's something scriptors will need to be aware of, as they may have unwanted results if they use a BF-mode identifier in the middle of an echo. //var -s %a.bf 0 | echo -a $calc(1/7) $bigfloat $calc(%a.bf) $bigfloat $calc(1/7) 0.142857 $false 0 $true 0.142857142857142857142857142857 And even replacing the $calc(%a.bf) with custom alias $justreturn$1(%a.bf) is enough to trigger flipping to BF mode. The only way I found to shield %a.bf from flipping into BF mode is $eval(%a.bf,0) and that's no help. Maybe a $bigfloatoff in the middle of a command line that flips the mode, or something somewhat like $shield( $return$1(%a.bf) ) that allows an identifier to use a variable without flipping the whole echo to BF mode. Even $unsafe( %null.bf ) flips to BF mode. //echo -a $bigfloat $unsafe(%null.bf) $bigfloat result: $false $true Wouldn't need a $bigfloaton identifier because $left(%null.bf,0) does that.
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
Thanks for the feedback. #1. Just mentioning now before it's too late to change to consider a change. I experimented with both %var.bf and %bf.var and prefered the current naming method with .bf at the end, which also works out nicely with %.bf. #2 -switch for /hdec /hinc + $hget(table,item).bf I experimented with different ways of supporting bigfloats. In the end, I chose to avoid making changes to commands, identifiers, and the scripting language in general. This means not adding switches, options, properties, and so on. Bigfloat should just work and, currently, you can make it work just by using /bigfloat [on|off] or by using a %var.bf. #3 Yay, $round in bigfloat mode is much better at rounding '5' the correct direction, and my simplistic example on wikichip doesn't fail in bigfloat mode. I honestly have no idea whether this is the correct behaviour. What you are seeing is due to the way MAPM handles precision and rounding. Specifically, in this case, it is due to modf(), used in the rounding function to get the fraction. The Visual C++ modf(), TTMath modf(), MAPM modf(), and likely modf() in other libraries, all return slightly different fractions. The number 0.05 is being returned as: Visual C++: 0.049999999999999822000000000000 TTMath: 0.049999999999999996000000000000 MAPM: 0.050000000000000003000000000000 IEEE 754 Floating Point ( https://baseconvert.com/ieee-754-floating-point): 32bit: 0.0500000007450580596923828125 64bbit: 0.05000000000000000277555756156289135105907917022705078125 If I decide to switch to a different bigfloat library at some point, the results will likely be different. I do not plan to add support for this unless someone adds it to the MAPM library. #5 $calc ^ operator fails It looks like a bug in the MAPM pow() function. It was not calculating the required amount of precision correctly. I have changed it and it looks like it is working as expected now. #6 $base() rounding/dropped digits This again looks like a precision issue in the MAPM library. It was using one less digit of precision than necessary in some contexts. Fixing this seems to resolve this issue and hopefully other issues. I do not plan to update commands/identifiers that are not directly involved in calculations. #8 $rand $rands. But JSF64 returns just a single uint_64, so I wasn't sure if the BF results are obtained by appending several JSF outputs together until the underlying pseudo-random results is >= the requested range size. Yes, this is how I implemented it. Whatever is happening with $rand in BF mode, it's causing the time for $rand(1,9) to increase 7x Ha! I was trying to be smarter than I am and was using a convoluted method for determining the required bits based on the range size. Fixed for the next beta. #9 $pi Returns the value of the mathematical constant pi to 20 decimal places. I have changed $pi so that, in bigfloat mode, it includes 50 decimal places. In BF mode the $calc() operator ^ causes a GPF when the return value is too long. Fixed next beta. Looks like I spoke a little too soon about $round being golden. While $calc outputs a fraction up to length 30, it looks like $round can only output length 20. That's a maximum length I left in the $round() identifier by mistake. Fixed next beta. Please hold off making new feature suggestions for now. I would really like to make sure this feature is working correctly before making any more changes. Please also hold off any more bug reports until the next beta, as it includes quite a few fixes.
Last edited by Khaled; 25/10/22 04:21 PM.
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
The latest beta includes the changes mentioned in my above post. I had to make small changes to the MAPM 5.1 library code where it was not using enough precision when calculating results. I tested the original MAPM 4.95a library and it had some of same issues. Hopefully my changes to MAPM 5.1 resolve these.
|
|
|
|
Joined: Jan 2004
Posts: 2,127
Hoopy frood
|
Hoopy frood
Joined: Jan 2004
Posts: 2,127 |
BF Feedback on 2nd beta
Looks like the issue of dropping a digit when translating certain numbers to %hexlen >= 26 is fixed, but the hex result are still 93% inaccurate and higher in that example for %hexlen 27 and greater.
And, the examples are still wrong for 2^103 2^120 and longer.
I think the issue is like I mentioned earlier, where they have a m_apm_integer_pow and m_apm_integer_pow_nr, and the m_apm_integer_pow_nr (not rounding) might be slower, but it doesn't return inaccurate results which appear somewhere around 2^103. I don't understand what m_apm_integer_pow is doing, but I see the subroutine using m_apm_iround which indicates that it plans to be inaccurate at larger, and this is probably affecting $base() since the %hexlen 27 errors are creeping in at close to the range where ^ fails.
If there's a big speed in difference, then perhaps an early check of $log(base^exponent) to see which algorithm should be used if exponent is an integer and it's it's not obvious from the base/exponent combo.
--
The M_pow_mod function for modular exponentiation that I was asking if could be made an identifier, but was mentioned only in function.ref - somehow I missed them having a slightly different spelling in filename mapm_pwr3.c as m_apm_powmod.
However their m_apm_powmod appears to be not what I'm looking for, because it looks to be of the rounding variety, and would probably not return correct results eventually, as I see it also using m_apm_iround same as m_apm_integer_pow does, and the comment inside it makes it look like it's intentionally supporting fractions.
--
However, when using the demo bf_modpow() alias I posted which mimics my proposed recreation of the assumed missing mod_pow function by modifying m_apm_integer_pow_nr, and use it with an absurdly huge modulus that's intended to not affect the 2^120 result, it does produce the correct result though it wastes a lot of time doing modular divisions that aren't needed:
//var -s %good.bf $bf_modpow(2,120,$str(9,120)) , %nope.bf $calc(2^120)
* Set %good.bf to 1329227995784915872903807060280344576 * Set %nope.bf to 1329227995784915872903807060280000000
If their rounding-modpow is close enough in behavior to their rounding-integer-pow, it probably still returns inaccurate for close to roundedmodpow(2,120,number larger than 2^120), but is instead designed to not bomb out at roundedmodpow(10,10360,modulus)
--
$rand(1,9) is down to 190% time factor in BF mode, so that looks like the kind of overhead seen from going into BF mode, as I see it from other small numbers like $base(100,10,36), so scripts may need to be judicious rather than always using /bigfloat on all the time.
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
Looks like the issue of dropping a digit when translating certain numbers to %hexlen >= 26 is fixed, but the hex result are still 93% inaccurate and higher in that example for %hexlen 27 and greater. Can you provide an example that does not use $base2() or any other custom identifiers? I need to test this with built-in identifiers. $rand(1,9) is down to 190% time factor in BF mode, so that looks like the kind of overhead seen from going into BF mode, as I see it from other small numbers like $base(100,10,36), so scripts may need to be judicious rather than always using /bigfloat on all the time. Yes, using any form of big num or float is always going to be slower. MAPM seems to be quite fast compared to the other libraries I tested, so if I decide to switch to a different bigfloat library at some point, bigfloat may end up being even slower.
|
|
|
|
Joined: Jan 2004
Posts: 2,127
Hoopy frood
|
Hoopy frood
Joined: Jan 2004
Posts: 2,127 |
The ^ is the only broken operator i've found, so can be built by multiplying smaller numbers against themselves: //var -s %a.bf 2 ^ 52 , %correct.bf %a.bf * %a.bf , %wronggg.bf %a.bf ^ 2
* Set %a.bf to 4503599627370496
* Set %correct.bf to 20282409603651670423947251286016
* Set %wronggg.bf to 20282409603651670423947251286020
|
|
|
|
Joined: Dec 2002
Posts: 5,502
Hoopy frood
|
Hoopy frood
Joined: Dec 2002
Posts: 5,502 |
Okay, if I change m_apm_integer_pow() to use m_apm_iround(r, -1), which disables rounding, this shows matching results for $base() and $base2().
The m_apm_iround(r, places + X) function is used throughout MAPM with extra X, presumably to extend the rounding precision after a calculation to cater for a larger value result. The X value varies throughout the code.
There is a m_apm_cpp_precision() that allows you to set the minimum precision for the entire library. If you set it too low, it is overriden by the maximum number of digits in the calculation parameters to ensure sufficient precision.
For now, I am going to call m_apm_cpp_precision() with a very large value to effectively set unlimited precision. This affects all calls to the MAPM library. This will be in the next beta.
Update: Right. I see what is happening. In m_apm_integer_pow(), it calls M_integer_pos_pow() and sets r with the result and passes it to m_apm_iround(r, places) to round. However, the "places" value is based on the value of r prior to the power being calculated, so it is not of sufficient precision for the result. m_apm_iround() is used in this way in many functions, where the result is rounded to the largest decimal place in the input parameter, leading to precision loss (although in many places, the decimal place length is extended by varying X amounts but this is not consistent). In any case, at this point, the best option seems to be to bypass all of this and just call m_apm_cpp_precision() with a large number.
Update: Aaand... that was a bad idea :-) m_apm_cpp_precision() with larger values slows down MAPM significantly since it isn't just used as a rounding value but as actual calculated precision length. So, instead, I am going to fix the ^ issue by just using m_apm_iround(r, -1) in the m_apm_integer_pow() function. If any other functions show similar precision issues to ^, I will change them individually as necessary.
Last edited by Khaled; 27/10/22 10:30 AM.
|
|
|
|
Joined: Jan 2004
Posts: 2,127
Hoopy frood
|
Hoopy frood
Joined: Jan 2004
Posts: 2,127 |
Bigfloat feedback on 3rd beta. Your list of identifiers affected by BF mode included $min $max, and since they behave like the various modes of $sorttok I assume that is also counted. I do see an example where .bf mode does not make $max $min $sorttok be correc at the border of doubles space due to evaluating $calc(2^53+1) same as 2^53: //var -s %smallr 9007199254740992 , %larger 9007199254740993, %a.bf sort lo-to-hi: $sorttok(%larger %smallr ,n,32) vs max: $max( %smallr %larger) vs min: $min( %larger %smallr ) * Set %a.bf to sort lo-to-hi: 9007199254740993 9007199254740992 vs max: 9007199254740992 vs min: 9007199254740993 --- Observation of the precision of $log2 for 2^n integers. I saw $log2(using 2^n) mentioned in Talon's bitwise post, and it looks the current level of precision means he will specifically need to ensure using it in doubles mode, unless that creates rounding issues for other integers he'd be using. //var %i.bf 2 , %j %i.bf | while (%i.bf isnum 0-100) { echo -a %j $log2(%j) : $log2(%i.bf) | var %i.bf %i.bf * 2 , %j %i.bf } 2 1 : 0.99999999999999999489848895823 4 2 : 1.999999999999999989796977916459 8 3 : 2.999999999999999984695466874691 16 4 : 3.99999999999999997959395583292 32 5 : 4.99999999999999997449244479115 64 6 : 5.99999999999999996939093374938 This is a case where my suggested digits parameter for $calc and $log could help obtain good results without needing to wrap $round() around the output -- 2 types of BF mode $powmod() gpf crashes The 1st type doesn't crash when in doubles mode, but in .bf mode it becomes non-responsive when there's a fraction in either the base or modulus: //var -s %gpf_crash.bf $powmod(94906267.1,94906268,94906270.1) In both .bf mode and doubles mode I also get a 2nd type of crash if allowing the exponent to be negative: //var -s %gpf_crash.bx $powmod(4,-5,21) Negatives for the base and modulus are not returning the correct values because they are being handled by other (value < 2) code intended for values 0 or 1, but that's my fault for assuming that negatives would not be fed here, and I describe the solution later below. - - //bigfloat off | echo -a $powmod(3,54,77) $calc(3^54 % 77) : $powmod(3.1,54,77) $calc(3.1^54 % 77) : $powmod(3.1,54,77.1) $calc(3.1^54 % 77.1) : $powmod(3.1,54,77.1) $calc(3.1^54 % 77.1) 15 66 : 40.79617 8 : 62.729731 61.683662 : 62.729731 61.683662 From these results in doubles mode, it appears that fractions for the base or modulus can be valid to obtain accurate results by evading the doubles limit, because all the base is used for is to initialize one of the loop variables being multiplied by, and the modulus is just the divisor for the % operation that $calc uses safely every day. So a fraction for either the base or modulus shouldn't be crashing it in .bf mode unless there's something in one of the MAPM functions being used that's designed to only work with integers, and throws a fit when given a float. Perhaps I just didn't wait enough time for the non-responsive mIRC to finish whatever it's doing. However, the story is completely different for why $powmod must validate that the exponent does not have a fraction, and 5.1 would have given $powmod the same result: //echo -a $calc(2 ^ 4.1 % 99999) vs $powmod(2,4.1,99999) 17.148375 vs 32 But in .bf mode the issue is worse, because of how the design accesses the next bit of the exponent by assuming that (exponent //2) just divides by 2 while discarding the previously-the-lowest bit. The algorithm is not designed to handle a float as the exponent the same way the $calc(2^3.5) is the same as 8 * sqrt(5). The only reason I could think for the fractional exponent to trigger a crash in .bf is if the (exponent //2) or (exponent %2) was using a function designed for integer-only, or else if the result was keeping the fractional result by doing /2 instead of //2, and the extreme precision means that "while (exponent)" can never be false by never reaching zero no matter how many times you divide by 2. Since the algorithm isn't designed to handle exponent fractions, that should either return=0 or syntax error. While it's possible for this algorithm to work correctly where base or modulus is a fraction, that would just slow it down for the true integer purpose if it needs to use different functions that can play nice with floats. - - The other issue to deal with in doubles mode: the $powmod algorithm works by squaring a number that for integer calculation can be as large as (modulus-1)^squared, so you can lose precision in doubles mode for results involving a modulus that's only slightly greater than $sqrt(2^53): //var -s %mod $calc( (2^53^0.5 + 5) // 1) , %base %mod - 3 , %exp 3 | echo -a doubles mode wrong $powmod(%base,%exp,%mod) vs %null.bf bf mode good $powmod(%base,%exp,%mod) * Set %mod to 94906270 * Set %base to 94906267 * Set %exp to 3 doubles mode wrong 94906246 vs bf mode good 94906243 So, if this is going to be used in doubles mode, the user needs to be warned that using this in doubles mode can easily return bad results when the modulus exceeds sqrt(2^53), such as (odd,anything,even) returning an impossible even result $powmod(3,46,$calc(2^32)). * * It's my fault that $powmod() is not returning the correct results for negative parameters, because I saw: // Assumed x, n, m all integers, n >= 0 ... and I read as if it meant that all negative inputs had been filtered out before calling the function. However they're not, so I need to update the validation portion of the bf_modpow script to allow it to handle negatives correctly. The fix for $powmod includes rejecting inputs that are not integers, which I did have as a simple regex trying to mimc that, which also blocked fractions which was my main intent. These next rules assume the inputs are integers regardless if positive or zero or negative, so they do things like reject a technically valid modulus of 1.9 because it's less than 2. For reasons I explained earlier, I prefer to not have the .bf mode permit fractions for any parameters, but in both cases never permit the exponent to have a fraction. So this is an updated validation and changing of parameters, because the handling of negatives is interleaved with handling =1 =0. I used $regsubex as a 'safe' method of $abs() that's guaranteed to not round any integers. I include an explanation prior to each step, and after each validation is a comment showing a summary of the suriviving range for each of the 3 input integers:
var %base.bf $1 , %exponent.bf $2 , %modulus.bf $3
var %answer.bf 1
if (!$regex($1-3,^-?\d+ -?\d+ -?\d+$)) return $calc(0/0)
;now: (base=any integer, exp=any integer, mod=any integer) result always [0,modulus-1]
; modulus = abs(modulus)
if (-* iswm %modulus.bf) var %modulus.bf $regsubex(foo,$v2,^(-*)(.*)$,\2)
;now: (base=anything, exp=anything, mod >= 0)
; (any % 1 = 0); (any % 0 = divide by zero)
; if (%modulus.bf == 0) return $calc(0/0)
if (%modulus.bf < 2) return 0
;now: (base=anything, exp=anything, mod >= 2)
; if (base < 0) base = modulus - (abs(base) % modulus)
if (-* iswm %base.bf) var %base.bf $calc(%modulus.bf - $regsubex(foo,$v2,^(-*)(.*)$,\2) )
; aka -N % mod -> [0,mod-1] -> (-N+mod) % mod
;now: (base >= 0, exp=anything, mod >= 2)
if (%exponent.bf < 2) {
; any^0 = 1
if (%exponent.bf == 0) return 1
; base is positive, so for base^1 return [0,mod-1] as (base % modulus)
if (%exponent.bf == 1) return $calc( %base.bf % %modulus.bf)
; remaining case here is (exp < 0)
if (%exponent.bf < 0) {
; powmod(+base,-exp, +mod) -> powmod( mod_inverse(+base,modulus) , +exp, +mod)
; exp = abs(exp)
var %exponent.bf $regsubex(foo,%exponent.bf,^(-*)(.*)$,\2)
; base = inverse(+base,modulus)
var %base.bf $bf_ModInverse( %base.bf , %modulus.bf )
; 6^-2 mod 15 -> 6^(-1*2) mod 15 -> 6^(-1)^2 mod 15 -> (1/6)^2 mod 15 -> modinv(6,15)^2 mod 15 -> 3^2 mod 15 = 9
; if (%base.bf == 0) return $calc(0/0)
; assumes $ModInverse() return value = 0 when there is no inverse ie (3,-anything,21)
}
}
;now: (base >= 0, exp >= 2, mod >= 2)
if (%base.bf < 2) {
; 0^any = 0; 1^any = 1; i.e. return val matches base
return %base.bf
}
;now: (base >= 2, exp >= 2, mod >= 2)
the rest of the alias goes here
Notes: The alias coded as $powmod() includes (%base.bf * %base.bf) that was a replacement for (%base.bf * 2) because the $calc(base^2) resi;t still needs repair. If m_apm_square is pure integer and doesn't involve any float calculations like the other function that uses exponents other than 2, it should be safe to use. That command is called 'n' times for an 'n' bit modulus, so there might be a detectable speed change if _square is a lot faster than _multiply. The above references a modular multiplicative inverse that I haven't mentioned yet, so (exp < 0) can return the $calc(0/0) condition if that feature will not be integrated into this function. In the above code lines are references to $calc(0/0). These are error conditions where the design decision would be made whether errors return an identifier syntax message, or returns 0, or returns some other invalid value outside the range of [0,modulus-1], such as -1. If any of these lines are commented, they only need to be uncommented if the design is changed to make modulus=0 or similar cases return the error condition instead of returning 0. Just like $calc(0/0) or $calc(0 % 0) are identical to legitimate return values, this identifier also has legit cases to return 0, such as for powmod(same,positive,same) or powmod(zero,positive,positive) The $calc ^ operator is doing better, and now the 2^103 and 2^120 examples work. But it looks like that was fixed by putting more precision into a float calculation, and the effect still appears at lengths slightly more than double that. Both powmod results here are correct, but $calc ^ now finally begins to drift at 2^259: //.bigfloat on | var -s %exp 258 | while (%exp isnum 258-259) { echo -a exp %exp powmod: $powmod(2,%exp,1 $+ $str(0,%exp)) | echo -a exp %exp calc::: $calc(2^ %exp) | inc %exp }
exp 258 powmod: 463168356949264781694283940034751631413079938662562256157830336031652518559744
exp 258 calc::: 463168356949264781694283940034751631413079938662562256157830336031652518559744
exp 259 powmod: 926336713898529563388567880069503262826159877325124512315660672063305037119488
exp 259 calc::: 926336713898529563388567880069503262823437618389757004607953675203850891427840
And it does appear that the issue with ^ is what affects the drift in the $base output, because each time the accuracy for $calc ^ changes, so does the size of the input number where $base begins to go astray. //var -s %hex $left($sha512(test),66) , %good $qbase(%hex,16,36) , %nope $base(%hex,16,36) * Set %hex to ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27a * Set %good to 167J1NURV0796I7EZBYO6RICQ7CELO8SQUCLOZCHHW1WI82RAFZU * Set %nope to 167J1NURV0796I7EZBYO6RICQ7ZZBT07UT1HLTVZET7PY7I7XGU2 * * Possible partial fix for $calc ^ Since it appears that $powmod works fine for positive integers even when using a modulus of 8192 bits which allows an internal result to be double that bit length, it seems that a scripted variant of powmod which does not use a modulus would solve the rounding errors above 2^258 at least for the subset of $calc(integer^(integer >= 0)), as long as the timing is not too awful, which I'm hoping should not be the case. This wouldn't be a slow thing like $powmod working against a 4096-bit modulus, it would be working against an exponent of literally 4096, which is a *12* bit number. Even at $maxlenl that would only be a 14-bit number for a max of 14 loops. That means there would rarely ever be even a dozen loops, and each loop would be faster because there would not be any modular division against anything. The only housekeeping duties that this would face that $powmod does not is caused by the fact that the modulus in $powmod does throttle the %answer.bf from ever being more than double the bit length of of %modulus.bf, but %calc_^_operator would need something else to make sure that its own storage can't get crashed. It wouldn't need to count bits every time until the number got large enough into the worry zone. Because %answer and %base are affected only by multiplying against %base, each 'round' can only increase the variable length by the round-entry bit length of 'base', so it should be able to estimate when/if it runs out of trouble. For example, $calc(3^31). An approximation of the bit length is: exponent * log(base) / log(2) aka exponent * log2(base) result: 49.133853 The $ceil(result) indicates that storage of %answer would require 50 bits of storage. However, since %base doubles each 'round', the bitlength for %base would keep double. Also, your mileage may vary with a log fraction that's a little too close for comfort to the underside of an integer. I've modified the $powmod loop for a small optimization which probably benefits $powmod too. The optimization allows skipping the final squaring of %base as being un-necessary due to being the last command before falling out of the loop. It speeds up the last of 'n' loops, so it has a greater benefit for $calc ^ by avoiding 1 of 9 loops than it gives for powmod if it's optimizing 1 of 4096 loops, but every little bit helps. Plus, more importantly for the ^ operator, it chops in half the needed storage space for the %base.bf variable that's being doubled each loop. old: var %base.bf $calc(%base.bf * %base.bf) var %exponent.bf $calc(%exponent.bf //2) new: var %exponent.bf $calc(%exponent.bf //2) if (%exponent) var %base.bf $calc(%base.bf * %base.bf) The speed optimization depends on whether it's faster to check (bitlength of exponent) times whether a value is zero, or whether the library has an effective way to just go ahead and square the variable 'n' times instead of 'n-1' times and swallow the extra storage. For $powmod I'm pretty sure it's worth the trouble to save even 1 squaring in 1-of-many loops. And also, the powmod alias had avoided squaring like base^2 because it's not the size of the exponent that was causing the error, but the result. So if the underlying $powmod code is using m_apm_multiply to multiply %base.bf by itself, it's probably faster to use m_apm_square to square it, and it's likely to be just as reliable-ish for $calc ^ as m_apm_multiply But the way to estimate the storage needed for %base is to determine that the final value of base after n-1 loops is base^(2*(n)). 17 requires 5 bits to represent it, so that means it would require 5 loops. So the new 'fake' exponent is 2*5=10 So the storage bits for base 3 would be: 10 * log2(3) = 15.84963 If this result is an exact multiple of the storage space unit, it would need to increase + 1 unit for the same reason that 2^32 doesn't fit in a 32-bit variable. I think but am not certain that there shouldn't be any combos for base/exponent where the answer is shorter than the scratchpad value for base, but if %answer is always the larger number, then that's the only one whose length needs checking. To summarize, this is only a partial potential solution for cases where the ^ operator sees both parameters as integers where the exponent is not negative. The case for either of them being 0 or 1 is easy to make do an early exit, and the alias has code for that. The code also handles negative bases by checking if the exponent is odd/even. For the other cases where either the exponent or the base has a fraction, or the exponent is negative, I think it's good enough to let the existing routine handle things, since thouse are going to be a float fraction that's either being handled correctly now, or the 'correct' answer would be some kind of number/big that's beyond the MAPM precision. i.e. $calc(2^-anything) will be a fraction that is already within precision i.e. $calc(16^(0.25)) is already handled correctly, and otherwise fractions will produce a float. It's a little difficult to test fully by comparing against $powmod() results, because of the need to put in a really long fake modulus to make sure parm3 doesn't alter the result, plus the strings get hard to compare. But this example compares a large result from the alias against a large result from $powmod, and either they match or I'm famous for finding a hash collision. //var -s %a.bf $calc_^_operator(3,8192) , %b.bf $powmod(3,8192,$str(3,8192)) | echo -a $sha256(%a.bf) vs $sha256(%b.bf)
; $calc_^_operator(base,exponent)
alias calc_^_operator {
var %base.bf $1 , %exponent.bf $2 , %answer.bf 1 , %minus $null
if (%exponent.bf < 2) {
if (%exponent.bf == 0) return 1 | if (%exponent.bf == 1) return %base.bf
return should not call this with negative exponent
}
if (%base.bf < 2) {
if (%base.bf isnum 0-1) return $v1
; only condition now is base < 0
if (%base.bf < 0) var %minus - , %base.bf $mid(%base.bf,2)
if (!$isbit(%exponent.bf,1)) var %minus $null
}
while (%exponent.bf) {
if ($calc(%exponent.bf % 2)) var %answer.bf $calc(%answer.bf * %base.bf)
var %exponent.bf $calc(%exponent.bf //2)
; the non-exponential _square function 'should' be safe
if (%exponent.bf) var %base.bf $calc(%base.bf * %base.bf)
}
return %minus $+ %answer.bf
}
* * In addition to $calc(2^n) returning a string ending with all zeroes at long lengths, I also find that including a -1 subtraction can change a huge result into zero. The following test returns zero, but removing the -1 returns a string ending with a large number of zeroes. I find changing the 19937 to 17174 is the largest 2^integer where this example does not change the large number into zero. //var -s %mersenne_twister.bf $calc(2^19937 -1) Same thing happens for + operator where this returns 0 instead of a large number ending with 1. This is just the obvious case, and I don't know whether there are lower numbers that have the behavior too. //var -s %a.bf $+(1,$str(0,5170)) + 1 * * In looking at m_apm_powmod I see some of what they're doing differently than my bf_modpow alias does, and I think I start to finally understand why they're messing around with fractions when all terms are required to be integer. And, I suspect that if $powmod had been a wrapper around m_apm_powmod instead of separate C code which mimics my bf_modpow script, that it would probably eventually show the same kind of rounding going on in these other areas. Instead of using something like m_apm_integer_div_rem to get the modulo remainder after division by the large integer 'm', m_apm_powmod instead pre-calculates 1/m as a really long fraction, and then multiplies each loop product by that really long 1/m fraction instead of just dividing by 'm' to get the actual remainder. And the iround is part of several functions whose job apparently is to massage the resulting float back into an integer that the code admits 'approximates' being modulo m. * * So, it looks as if any function using math that involves this kind of reciprocal shenanigans is going to eventually start showing warts like $base and ^ do, and the only solutions seem to be: * make these kind of reciprocal fractions so absurdly long that they slow everything down, which nobody wants * have a separate tier for big integer calculations that doesn't touch fractions at all. And I'm seeing a little bit of that Option#1 slowdown in the 2nd and 3rd betas. For my example editbox command which called my bf_modpow alias using a 2048-bit number, my time was 4.3 seconds in the 1275 beta, and when you gave your time as 7.4 seconds, I thought that simply meant you had a slower computer, and that my time for the $powmod() identifier would continue to be faster than yours. However for the 2nd and 3rd beta, my time for that 2048 command goes up to 7.1 seconds, so that means my computer is a similar speed to your slow one. And sure enough, when I tested the same thing with the built-in $powmod identifier, instead of the 4.3*(2.9/7.4) = 1.7 seconds that I was expecting for 2048, I'm getting 2.8 secs which is just slightly shorter than your 2.9 I'm guessing that the precision setting that you increased in order to obtain extra accuracy for $calc(^) is affecting the speed of the integer calculations used by $powmod and probably used by a lot of other things. I later found a glitch in the 1275 beta that I couldn't find in the next 2 betas where the % operator in .bf mode showed behavior as if it wasn't actually doing integer modulo either, but looks like it was instead making a long fraction out of (1/divisor) and then multiplying using that, which means increasing the precision for something like that would take longer to create the 1/reciprocal and also take longer to multiply with it. Without benchmarking it, it seems that the shortcut for creating a long fraction reciprocal then multiplying by it, in order to create a modulo remainder, is a shortcut only for cases where it needs to iterate repeatedly using that same modulus, such as what MAPM was doing in their version of powmod. However for the one-time-shot usage of the % operator it's a 'long cut' not a 'short cut', because it avoids dividing by the divisor - by doing divisor division to create the fractional reciprocal and then multiplying by that long fraction, followed by all the iround and related adjustments. There might be cases where the function could cheat by avoiding the creation of the reciprocal by peeking to see if it's inverting the same number that was most recently inverted. * * From observation, I gathered that there's 2 types of interest for numbers that can't easily come from the doubles range. On the one side, there's people wanting longer fractions from relatively small numbers coming from $sqrt $log $cos $calc(number1/number2) etc. Then on the other side there's interest in accurate calculations involving big integers, where any support for fractions is of no interest, and is viewed as something which instead introduces slower actions and the potential for rounding errors in the results. For example, while there have been lots of other threads from people wanting longer accurate fractions, the top of this thread is someone who was wanting accuracy when $base translates from hex to base36, and they were using a large integer, not a large number having a 30-digit fraction attached to it. But rarely is there much demand for having both at the same time, where they would want an accurate representation of a number having many digits on both sides of the decimal. A solution that can give everyone the kind of accuracy they need, while speeding things up for everyone, is to have * doubles mode, range 2^53 with 6 digit fractions * bigfloat mode with 30 digit fractions and no promises about how super huge the mantissa + biginteger mode with no fraction at all, but with accurate results for big integers, and there's no such thing as a mantissa when there's no fraction. As an analogy for trying to obtain big integer results and big float results from the same library, it would be like a vehicle whose main goal is to carry heavy weights up a steep hill, so it was designed to have only a 'low gear'. And there's another vehicle whose main goal is as a racing car on a level grade. While the racing car can't be used for carrying the heavy weights, the 1st vehicle could be made to work as a race car, but only by accelerating the engine too fast. Using a big integer library for the integer results would allow identifiers to perform the racing car functions without changing the moving vehicle in ways that causes problems. By separating big integer from big float, I believe this would allow a significant reduction in the precision for that reciprocal and other fractions heavily used by the trigs, logs, and other floating identifiers, while still retaining accuracy for the 30 digit fraction. Hopefully this could be a drop-in precision that would restore faster speed and accuracy to integer exponents and other integer calculation, without having affecting the speed and return values on the $cos $sqrt etc fractions where games want to do that stuff at many frames per second. And on the other side, if the integer-only calculations were using completely different functions coming from a big _integer_ library instead of being an appendage to a big _float_ library, the calculations would be much faster coming from the big integer library, and it could be a win/win all around. I'm not sure if this would need there be a separate /biginteger ON mode alongside /bigfloat ON, because I think it might just be sufficient for some identifiers to be always in big-integer mode, and for others to only key off the name of the %variable itself. If variable names would be a flag for biginteger mode like .bf is for bigfloat, it could be something like %var.bi etc. From looking at your list of identifiers that enabled support for bigfloat mode, most of those should be basically ignored in .bi mode, or else they may not even belong in .bf mode once .bi exists. A big integer library would have a lot of things that could be efficiently done without trying to make them work inside a floating point library. For example, the bitwise operators that Talon wanted aren't in MAPM because it doesn't really make sense to look at bit positions of a fraction that's stored in a mystery format. But bitwise functions are one of the key building blocks for a big integer library. * * Unless there's a better package out there, the easiest solution could be to use OpenSSL as a BigInteger library, considering how it is already used by many functions inside mIRC, and some of the relevant functions are probably imported already. For things that use only integers, OpenSSL can be dozens of times faster, because its functions are almost exclusively integer only, fractions need not apply. As far as speed, OpenSSL seems pretty fast compared to other things that can do .bi style of math. As an example of how much speed can be obtained by having a big integer library do big integer things while a float library do float things, I reference the demo alias bf_modpow where the editbox command uses 2048-bit integers, where substituting the new $powmod() reduces time from 7.4 seconds down to 2.9 seconds. Each time mIRC connnects to a server using an RSA-4096 SASL certificate, it does a shortcut that involves doing a pair of $powmod using numbers of this 2048 length, which substitutes a pair of 2.9 sec calculations instead of a single 23 secs 4096 calculation. OpenSSL is doing a pair of those similar 2048 calculation using a subroutine that uses only integers, and it's able to do it much quicker than a floating precision library could possibly do. I've even used an RSA-8192 as an SASL cert, which does a pair of 23-second 4096's instead of a 3-minutes 8192 calculations each time it does a certificate handshake, I have tried looking for it, but never detected any noticeable pause while it did the handshake. The timing can only be measured from the side using the private key, because the server side just uses the small public exponent that's much quicker. * * In my separate post in the $base2 thread, I give examples of how $base can benefit from several speed optimizations if it can be in a mode which rejects input in an 'inbase' which contains digits that can't exist when 'inbase' is used as an outbase. If $base still intends to support long fractions, that other post also shows a way that fractions can have better speed and precision by processing as if a 2nd 'fake' integer, which enables $base to pretend it doesn't know what a fraction is, but hey there's these 2 integers that I attach to this decimal here. Without being required to support $base(ZZ,10,16) or fractions in .bi mode, there are quite a few inbase/outbase pairs that can make $base be much quicker, including being able to do string manipulation when people simply want to zero-pad their number to the same length by doing $base(number,same,same,10). Also, since internal variables are probably uint32, there should be quick efficiencies when 16 is either the inbase or outbase. Something for down the road when $base gets better support for long strings is the ability to pad to longer string lengths. $base is still at the doubles default of treating the $4 padding as having a max of 100, which for a hex string is 400 bits. At the current level of 2^258 accuracy, that number would be a base10 string of only 80 digits or so, which hasn't quite reached that 100 barrier. But if $base is able to handle longer strings by fixing $calc ^, that can be easily exceeded. The 100 limit is also fixed regardless if outbase is 2 or 36, which means the max padding bit length at base=2 is 100 which is below the current accuracy, and at base=36 is 517. Though to be fair, using $base(number,same,same,length) for zero padding is much slower than simply doing string manipulation. //tokenize 32 $base($sha512(S),16,16,128) 36 36 128 | if ($len($1) < $4) var -s %output $str(0,$calc($v2 - $v1)) $+ $1 , %outlen $len($1) -> $len(%output) * * * This last is just for reference in case this helps with other issues, since this appears to have gone away. In addition to the rounding error affecting both $calc^ and $base, and the mentioned above 2^n+1 rounding to zero, I came across another $calc glitch that affects the % operator in the 1275 beta, but I cannot detect in the next 2 betas, probably due to ramping up the precision of fractions. To demonstrate in 1275, paste the sample numbers into the next command, where the result of $calc(positive % small-positive) can be less than zero or greater than the divisor. I found these errors with a $base3 alias that imitated $base2 by using the subset of .bf mode calculations I thought was safe, excluding "^". When the number was 1024-bits long, there would be only a handful of digits that were wrong, and they looked like single items that appeared at far-separated ranges. In all cases that I found, the errors are an exact multiple of the modulus in one direction or the other, so this looks like this is caused by kind of modulo that m_apm_powmod was trying to do, by making long_fraction = (1/divisor) then multiplying by long_fraction and then adjusting it to be approximately correct, instead of dividing by the divisor to get the remainder. So these differences look like they would be caused by inaccuracies when generating certain long_fraction numbers. When the errors did rarely appear for various divisors against these 2 numbers, the result was further away from the correct answer when the number was larger. //var -s %i 2 , %a.bf REPLACE_ME | while (%i isnum 2-99) { echo -ag a mod %i = $calc(%a.bf % %i) | inc %i } 2375367556404088850618648174520826561280160432840589702972085294732162134239579969864370734273995321418305033664988384807865935555938322586901096972029296805385174745353604865654752517095868844281580877080228836752924976062380370295865131511490650052293878941356520340648053365611547195104400734887585100 4880764126099366743837 7919469740387813605031256289423841237995399571684111272206998186170769885721416238571318279483520798098990001232855690606254165366075219118430826622030552451715827119443274220533030920865103504446738458680112065589481385303840571100725796559750403704097781428796416803222269605442046883432586397577234279697820643783939861375686447028897903660789517177623671447371011012146391141909217824049825006796875609169056493133980096497118239292592211941820601612285824760020244043454619516083873000 * Edit: Updated parameter checking on $gcd and $lcm I discovered that these allow more than 2 parameters, which is great. However they allow 1 parameter which should be 'invalid parameters', because the definition includes the requirement that there be 2 numbers, and that they be integers. //echo -a $gcd(5) result: 5 should be 'invalid parameters' //echo -a $gcd(1.23,1.23) result: 1.23 should always be 0 if any term is a float Same for LCM, where $lcm(1.23) should be insufficient parameters, and $lcm(1.23,1.23) should change from 1.23 to zero since this is not 2+ integers.
Last edited by maroon; 31/10/22 11:48 AM.
|
|
|
|
|