How does Erlang store large numbers? - functional-programming

I have been tinkering with a factorial module as follows:
-module(factorial).
-export([factorial/1]).
factorial(0) ->
1;
factorial(Val)->
Val * factorial(Val-1).
If I run:
1> c(factorial).
{ok,factorial}
2> factorial:factorial(100).
I get:
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
How does Erlang so effortlessly hold such large numbers? On erlang.org, when it talks about number types, it simply states that they hold either integers or floats. It must be some kind of dynamic integer that adjusts its byte size as necessary?
I find this very cool; I just don't know how it's done.

It is a common feature of many functional programming languages, called Arbitrary precision arithmetic.
Notice that in Erlang, arbitrary precision is available only for integers, not floats.
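To make the integer/float split concrete, here is a minimal sketch in Python, whose built-in int is also arbitrary precision. Python is used only for illustration and is an assumption of this example; it says nothing about how Erlang lays out its big integers internally.

```python
import math

# Python's int, like Erlang's integer, grows as needed.
big = math.factorial(100)
print(len(str(big)))      # number of decimal digits in 100!
print(big.bit_length())   # bits needed, far beyond one 64-bit word

# Floats stay fixed-size (IEEE 754 double), so large values overflow.
try:
    float(math.factorial(200))
    float_overflowed = False
except OverflowError:
    float_overflowed = True
print(float_overflowed)
```

The integer result matches the 158-digit value shown in the question, while converting 200! to a float fails because it no longer fits in a double.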

Related

Is there a way to do a right bit-shift on a BigInt in Rust?

I get this error when attempting to do >> or >>= on a BigInt:
no implementation for `BigInt >> BigInt`
using the num_bigint::BigInt library
Edit: More Context:
I am rewriting this program https://www.geeksforgeeks.org/how-to-generate-large-prime-numbers-for-rsa-algorithm/ from Python/C++ into Rust; however, I will focus on the Python implementation, as it is written to handle 1024-bit prime numbers, which are extremely big.
In the code we run the Miller-Rabin primality test, which includes shifting EC (prime candidate - 1) to the right by 1 if we find that EC % 2 == 0. As I mentioned, in the Python implementation EC can be an incredibly large integer.
It would be convenient to be able to use the same operator in Rust; if that is not possible, can someone suggest an alternative?

According to the documentation for the num-bigint crate, the BigInt struct does implement the Shr trait for the right-shift operator, just not when the shift amount is itself a BigInt. If you convert the shift amount to a standard integer type (e.g. i64) then it should work.
It is unlikely you would ever want to shift by an amount greater than i64::MAX, but if you do need this, then the correct result is going to be zero (because no computer has 2^60 bytes of memory), so you can write a simple implementation which checks for that case.
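For reference, the step being ported can be sketched in Python as below. The function name split_even_part is hypothetical. The point it illustrates is that only the value being shifted is huge; the shift count is always a small plain integer (here 1), which is the case the answer above says works once the shift amount is a standard integer type.

```python
def split_even_part(n):
    """Write n - 1 as 2**s * d with d odd, as Miller-Rabin needs."""
    if n < 3 or n % 2 == 0:
        raise ValueError("expects an odd n >= 3")
    d = n - 1
    s = 0
    while d % 2 == 0:
        d >>= 1   # the shift amount is a small plain integer, never a big one
        s += 1
    return s, d

# A 1025-bit odd candidate: the value is big, the shift count is not.
candidate = 2**1024 + 643
s, d = split_even_part(candidate)
print(s, d.bit_length())
```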

How to perform mathematical operations on large numbers

I have a question about working on very big numbers. I'm trying to run the RSA algorithm, and let's pretend I have a 512-bit number d and a 1024-bit number n. decrypted_word = crypted_word^d mod n, isn't it? But those d and n are very large numbers! None of the standard variable types can handle my 512-bit numbers. Everywhere it is written that RSA needs a 512-bit prime number at least, but how can I actually perform any mathematical operations on such a number?
And one more thing. I can't use extra libraries. I generate my prime numbers with Java, using BigInteger, but on my system I have only basic variable types, and STRING256 is the biggest.

Suppose your maximal integer size is 64 bit. Strings are not that useful for doing math in most languages, so disregard string types. Now choose an integer of half that size, i.e. 32 bit. An array of these can be interpreted as digits of a number in base 2^32. With these, you can do long addition and multiplication, just like you are used to with base 10 and pen and paper. In each elementary step, you combine two 32-bit quantities, to produce both a 32-bit result and possibly some carry. If you do the elementary operation in 64-bit arithmetic, you'll have both of these as part of a single 64-bit variable, which you'll then have to split into the 32-bit result digit (via bit mask or simple truncating cast) and the remaining carry (via bit shift).
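The long addition and multiplication described above can be sketched as follows. This is an illustration in Python, not code for the asker's platform; the names to_limbs, from_limbs, add_limbs and mul_limbs are hypothetical, and the digits are stored least significant first. Every intermediate value t stays below 2^64, so the same steps carry over to a language with only 64-bit integers.

```python
BASE_BITS = 32
MASK = (1 << BASE_BITS) - 1

def to_limbs(n):
    """Split a non-negative int into little-endian 32-bit digits."""
    limbs = []
    while n:
        limbs.append(n & MASK)
        n >>= BASE_BITS
    return limbs or [0]

def from_limbs(limbs):
    """Inverse of to_limbs, used only to check results."""
    return sum(d << (BASE_BITS * i) for i, d in enumerate(limbs))

def add_limbs(a, b):
    """Long addition: each step stays within 64 bits."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        t = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(t & MASK)      # low 32 bits: the result digit
        carry = t >> BASE_BITS    # remaining high bits: the carry
    if carry:
        out.append(carry)
    return out

def mul_limbs(a, b):
    """Schoolbook multiplication, one 32x32 -> 64-bit product at a time."""
    out = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = out[i + j] + x * y + carry   # at most 2**64 - 1
            out[i + j] = t & MASK
            carry = t >> BASE_BITS
        out[i + len(b)] = carry
    return out
```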
Division is harder. But if the divisor is known, then you may get away with doing a division by constant using multiplication instead. Consider an example: division by 7. The inverse of 7 is 1/7=0.142857…. So you can multiply by that to obtain the same result. Obviously we don't want to do any floating point math here. But you can also simply multiply by 14286 then omit the last five digits of the result. This will be exactly the right result if your dividend is small enough. How small? Well, you compute x/7 as x*14286/100000, so the error will be x*(14286/100000 - 1/7)=x/350000. That error stays below one as long as x<350000, but for the truncated quotient to match exactly it has to stay below 1/7 (the smallest distance from x/7 up to the next integer), so you are on the safe side as long as x<50000. As long as the modulus in your RSA setup is known, i.e. as long as the key pair remains the same, you can use this approach to do integer division, and can also use that to compute the remainder. Remember to use base 2^32 instead of base 10, though, and check how many digits you need for the inverse constant.
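A brute-force check of the decimal example is a cheap way to see how far the multiply-and-truncate shortcut can be trusted. The sketch below uses hypothetical function names and plain Python integers; it searches for the first dividend where x*14286/100000 and x/7 disagree after truncation.

```python
def div7_by_multiply(x):
    """Approximate x // 7 as x * 14286 // 100000 (the decimal example)."""
    return x * 14286 // 100000

def first_mismatch(limit):
    """Smallest x below limit where the shortcut differs from x // 7, else None."""
    for x in range(limit):
        if div7_by_multiply(x) != x // 7:
            return x
    return None

print(first_mismatch(350000))   # 50000

# Below that point the remainder follows from the quotient without dividing.
x = 49999
q = div7_by_multiply(x)
r = x - 7 * q
print(q, r)
```

The first failure is at x = 50000, where the accumulated error x/350000 reaches 1/7 and pushes the product across the next integer. The same kind of check, or the equivalent bound, is worth doing for whatever inverse constant you pick in base 2^32.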
There is an alternative you might want to consider, to do modulo reduction more easily, perhaps even if n is variable. Instead of expressing your remainders as numbers 0 through n-1, you could also use 2^1024-n through 2^1024-1. So if your initial number is smaller than 2^1024-n, you add n to convert to this new encoding. The benefit of this is that you can do the reduction step without performing any division at all. 2^1024 is equivalent to 2^1024-n in this setup, so an elementary modulo reduction would start by splitting some number into its lower 1024 bits and its higher rest. The higher rest will be right-shifted by 1024 bits (which is just a change in your array indexing), then multiplied by 2^1024-n and finally added to the lower part. You'll have to do this until you can be sure that the result has no more than 1024 bits. How often that is depends on n, so for fixed n you can precompute that (and for large n I'd expect it to be two reduction steps after addition but three steps after multiplication, but please double-check that) whereas for variable n you'll have to check at runtime. At the very end, you can go back to the usual representation: if the result is not smaller than n, subtract n. All of this should work as described if n>2^512. If not, i.e. if the top bit of your modulus is zero, then you might have to make further adjustments. Haven't thought this through, since I only used this approach for fixed moduli close to a power of two so far.
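The folding step can be sketched as below. This is a simplified variant, not the exact encoding described above: it keeps ordinary remainders in 0..n-1 and only demonstrates the "split, multiply the high part by 2^k - n, add" loop. It assumes the top bit of the modulus is set (2^(k-1) < n < 2^k), the function name fold_reduce is hypothetical, and Python integers stand in for the limb arrays.

```python
def fold_reduce(x, n, k):
    """Reduce x modulo n using only shifts, masks, multiplies and adds.

    Assumes 2**(k-1) < n < 2**k, so c = 2**k - n is smaller than n.
    """
    if not (1 << (k - 1)) < n < (1 << k):
        raise ValueError("expects the top bit of a k-bit modulus to be set")
    mask = (1 << k) - 1
    c = (1 << k) - n              # 2**k is congruent to c modulo n
    while x >> k:                 # more than k bits left
        x = (x & mask) + (x >> k) * c
    if x >= n:                    # here x < 2**k < 2*n, one subtraction suffices
        x -= n
    return x

print(fold_reduce(200, 13, 4))    # 5, same as 200 % 13
```

Each pass replaces the high part by a smaller multiple, so the loop always terminates; how many passes it takes depends on how close n is to 2^k.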
Now for that exponentiation. I very much suggest you do the binary approach for that. When computing x^d, you start with x, x^2=x*x, x^4=x^2*x^2, x^8=…, i.e. you compute all power-of-two exponents. You also maintain some intermediate result, which you initialize to one. In every step, if the corresponding bit is set in the exponent d, then you multiply the corresponding power into that intermediate result. So let's say you have d=11. Then you'd compute 1*x^1*x^2*x^8 because d=11=1+2+8, which is 1011 in binary. That way, you'll need only about 1024 multiplications max if your exponent has 512 bits. Half of them for the powers-of-two exponentiation, the other to combine the right powers of two. Every single multiplication in all of this should be immediately followed by a modulo reduction, to keep memory requirements low.
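A minimal sketch of that binary approach, with a reduction after every multiplication. The function name mod_pow is hypothetical, and Python's % operator stands in for whichever reduction routine you end up writing.

```python
def mod_pow(x, d, n):
    """Right-to-left binary exponentiation: x**d mod n."""
    result = 1 % n
    power = x % n            # holds x**(2**i) mod n in step i
    while d:
        if d & 1:            # this bit of the exponent is set
            result = (result * power) % n
        power = (power * power) % n
        d >>= 1
    return result

print(mod_pow(7, 11, 1000) == pow(7, 11, 1000))   # True
```

Because every product is reduced immediately, no intermediate value ever exceeds twice the bit length of n.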
Note that the speed of the above exponentiation process will, in this simple form, depend on how many bits in d are actually set. So this might open up a side channel attack which might give an attacker access to information about d. But if you are worried about side channel attacks, then you really should have an expert develop your implementation, because I guess there might be more of those that I didn't think about.

You can write macros, runnable under Microsoft, for operations such as +, -, x, /, modulo and x to the power y that work for any integer of fewer than ten or a hundred thousand digits (the practical, not theoretical, limit being the internal memory of your computer). Note that the logic is exactly the one you learned at elementary school.
E.g.: p= 1819181918953471 divider of (2^8091) - 1, q = ((2^8091) - 1)/p, mod(2^8043 ; q ) = 233225049958594489297642487352160527465088733631637179020483553367609406976159908715897287655088134346657328040319280454485827759404751268378805196413090186685926225334347451870049183927154428744934254443850937186054612404823712615148867040751866198781942354903962026677334226414362517398771254734371914537723525272500632139167682048449368982786333508866621411419635621571844016474674514040364550433338016668909256596081980092846379236917235898011306231439819482384406356911821215433421870926772596749117444009734540322095023599354574371679373102508760023261017381079306370251839506508217700876602000752668620753831306695191309990299205276562349113924219914717570681877473628541487207289232055343412361464994499108965303597290773003668048464392254830869014842093332365958032633132197254697156995460411629235227841703501045897165445297514394380219147277726203912625341055996886039509233210088831794334748980343182858891291155565414796707610403880753529341373268832872458218889994744210011557215665478139704968095559963138546311374907742975648819018776876281761067719182069454343508735096796381098878319322794706310976040189398557889905426270726260492817841528070976594852388385609583168882381372375485905284508903287800802868440387963251014889779885496395239880028250552864697402278423885387518709716916175431416581423130599343269248678461517497775752793103942965621915306028170145494646142538868438326459468664663629504846295542588557144017854729877278410408058162244136570364999591177012490284351913277572766442729447434792962687498289275655599514419451432696568663552103104822355202205802135334250162989939036157537143434560145774792254359150312258635519116051170293930856329473738726353301817188206698368301473129489660286829605182252139602188672078254178300162810361219593847073917183338928496652485128029266016762511997116989787253990489543258874103170604006204127972401297871588391649693824985377425792
33544463501470239575760940937130926062252501116458281610468726777710383038372260777522143500312913040987942762244940009811450966646527814576364565964518092955053720983465333258335601691477534154940549197873199633313223848155047098569827560014018412679602636286195283270106917742919383395056306107175539370483171915774381614222806960872813575048014729965930007408532959309197608469115633821869206793759322044599554551057140046156235152048507130125695763956991351137040435703946195318000567664233417843805257728.
The last step took about 0.1 sec.
wpjo (willibrord oomen on academia.edu)

Addition and multiplication in a Galois Field

I am attempting to generate QR codes on an extremely limited embedded platform. Everything in the specification seems fairly straightforward except for generating the error correction codewords. I have looked at a bunch of existing implementations, and they all try to implement a bunch of polynomial math that goes straight over my head, particularly with regards to the Galois fields. The most straightforward way I can see, both in mathematical complexity and in memory requirements is a circuit concept that is laid out in the spec itself:
With their description, I am fairly confident I could implement this with the exception of the parts labeled GF(256) addition and GF(256) Multiplication.
They offer this help:
The polynomial arithmetic for QR Code shall be calculated using bit-wise modulo 2 arithmetic and byte-wise
modulo 100011101 arithmetic. This is a Galois field of 2^8
with 100011101 representing the field's prime modulus
polynomial x^8+x^4+x^3+x^2+1.
which is all pretty much Greek to me.
So my question is this: What is the easiest way to perform addition and multiplication in this kind of Galois field arithmetic? Assume both input numbers are 8 bits wide, and my output needs to be 8 bits wide also. Several implementations precalculate, or hardcode in two lookup tables to help with this, but I am not sure how those are calculated, or how I would use them in this situation. I would rather not take the 512 byte memory hit for the two tables, but it really depends on what the alternative is. I really just need help understanding how to do a single multiplication and addition operation in this circuit.

In practice only one table is needed. That would be for the GF(256) multiply. Note that all arithmetic is carry-less, meaning that there is no carry-propagation.
Addition and subtraction without carry is equivalent to an xor.
So in GF(256), a + b and a - b are both equivalent to a xor b.
GF(256) multiplication is also carry-less, and can be done using carry-less multiplication in a similar way with carry-less addition/subtraction. This can be done efficiently with hardware support via say Intel's CLMUL instruction set.
However, the hard part is reducing modulo 100011101. In normal integer division, you do it using a series of compare/subtract steps. In GF(256), you do it in a nearly identical manner using a series of compare/xor steps.
In fact, it's slow enough that it's still faster to just precompute all 256 x 256 multiplies and put them into a 65536-entry look-up table.
Page 3 of the following PDF has a pretty good reference on GF(256) arithmetic:
http://www.eecs.harvard.edu/~michaelm/CS222/eccnotes.pdf
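For a platform where even one table is too much, the compare/xor steps can be done bit by bit. The sketch below is an illustration under that assumption; the names gf_add and gf_mul are hypothetical, and the reduction is interleaved with the multiplication rather than done afterwards, so no value ever needs more than 9 bits.

```python
POLY = 0b100011101   # x^8 + x^4 + x^3 + x^2 + 1, the modulus quoted from the spec

def gf_add(a, b):
    """Addition and subtraction in GF(256) are both xor."""
    return a ^ b

def gf_mul(a, b):
    """Carry-less shift-and-xor multiply with the reduction folded in."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # carry-less "add" of the current multiple of a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # an x^8 term appeared: compare/xor step
            a ^= POLY
    return result

print(gf_mul(2, 128))        # 29, i.e. x * x^7 = x^8 = x^4 + x^3 + x^2 + 1
```

This costs at most eight loop iterations per multiplication and no table memory, which may be an acceptable trade on a very small device.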

(I'm following up on the pointer to zxing in the first answer, since I'm the author.)
The answer about addition is exactly right; that's why working in this field is convenient on a computer.
See http://code.google.com/p/zxing/source/browse/trunk/core/src/com/google/zxing/common/reedsolomon/GenericGF.java
Yes multiplication works, and is for GF256. a * b is really the same as exp(log(a) + log(b)). And because GF256 has only 256 elements, there are only 255 unique powers of "x", and same for log. So these are easy to put in a lookup table. The tables would "wrap around" at 256, so that is why you see the "% size". "/ size" is slightly harder to explain in a sentence -- it's because really 1-255 "wrap around", not 0-255. So it's not quite just a simple modulus that's needed.
The final piece perhaps is how you reduce modulo an irreducible polynomial. The irreducible polynomial is x^8 plus some lower-power terms, right -- call it I(x) = x^8 + R(x). And the polynomial is congruent to 0 in the field, by definition; I(x) == 0. So x^8 == -R(x). And, conveniently, addition and subtraction are the same, so x^8 == -R(x) == R(x).
The only time we need to reduce higher-power polynomials is when constructing the exponents table. You just keep multiplying by x (which is a shift left) until it gets too big -- gets an x^8 term. But x^8 is the same as R(x). So you take out the x^8 and add in R(x). R(x) merely has powers up to x^7 so it's all in a byte still, all in GF(256). And you know how to add in this field.
Helps?
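The table construction described above can be sketched as follows. This is not the zxing code; EXP, LOG and gf_mul_table are hypothetical names, and the sketch assumes the QR Code modulus x^8+x^4+x^3+x^2+1, for which x generates all 255 nonzero elements.

```python
R = 0b00011101   # R(x) = x^4 + x^3 + x^2 + 1, the low part of the modulus

EXP = [0] * 255  # EXP[i] = x**i in the field
LOG = [0] * 256  # LOG[v] = i such that x**i == v (LOG[0] is unused)
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1              # multiply by x
    if value & 0x100:        # got an x^8 term: take it out and add R(x)
        value = (value & 0xFF) ^ R

def gf_mul_table(a, b):
    """a * b = exp(log(a) + log(b)); zero has no logarithm."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

print(gf_mul_table(2, 128))  # 29
```

The exponents wrap at 255 rather than 256 because only the nonzero elements are powers of x. Both tables together take about 511 bytes, in line with the figure mentioned in the question.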

How do programming languages handle huge number arithmetic

For a computer working with a 64-bit processor, the largest number that it can handle would be 2^64 = 18,446,744,073,709,551,616. How do programming languages, say Java, C or C++, handle arithmetic on numbers higher than this value? No register can hold such a number as a single piece. How was this issue tackled?

There are lots of specialized techniques for doing calculations on numbers larger than the register size. Some of them are outlined in this Wikipedia article on arbitrary precision arithmetic.
Low level languages, like C and C++, leave large number calculations to the library of your choice. One notable one is the GNU Multi-Precision library. High level languages like Python, and others, integrate this into the core of the language, so normal numbers and very large numbers are identical to the programmer.

You assume the wrong thing. The biggest number it can handle in a single register is a 64-bit number. However, with some smart programming techniques, you could just combine a few dozen of those 64-bit numbers in a row to generate a huge 6400 bit number and use that to do more calculations. It's just not as fast as having the number fit in one register.
Even the old 8 and 16 bits processors used this trick, where they would just let the number overflow to other registers. It makes the math more complex but it doesn't put an end to the possibilities.
However, such high-precision math is extremely unusual. Even if you want to calculate the whole national debt of the USA and store the outcome in Zimbabwean Dollars, a 64-bits integer would still be big enough, I think. It's definitely big enough to contain the amount of my savings account, though.

Programming languages that handle truly massive numbers use custom number primitives that go beyond normal operations optimized for 32, 64, or 128 bit CPUs. These numbers are especially useful in computer security and mathematical research.
The GNU Multiple Precision Library is probably the most complete example of these approaches.
You can handle larger numbers by using arrays. Try this out in your web browser. Type the following code in the JavaScript console of your web browser:
The point at which JavaScript fails
console.log(9999999999999998 + 1)
// expected 9999999999999999
// actual 10000000000000000 oops!
JavaScript's Number type cannot represent every integer above 2^53 - 1 (9007199254740991) exactly, so 9999999999999999 gets rounded. But writing your own number primitive to make this calculation work is simple enough. Here is an example using a custom number adder class in JavaScript.
Passing the test using a custom number class
// Require a custom number primitive class
const {Num} = require('./bases')
// Create a massive number that JavaScript will not add to (correctly)
const num = new Num(9999999999999998, 10)
// Add to the massive number
num.add(1)
// The result is correct (where plain JavaScript Math would fail)
console.log(num.val) // 9999999999999999
How it Works
You can look in the code at class Num { ... } to see details of what is happening; but here is a basic outline of the logic in use:
Classes:
The Num class contains an array of single Digit classes.
The Digit class contains the value of a single digit, and the logic to handle the Carry flag
Steps:
The chosen number is turned into a string
Each digit is turned into a Digit class and stored in the Num class as an array of digits
When the Num is incremented, it gets carried to the first Digit in the array (the right-most number)
If the Digit value plus the Carry flag are equal to the Base, then the next Digit to the left is called to be incremented, and the current number is reset to 0
... Repeat all the way to the left-most digit of the array
Logistically it is very similar to what is happening at the machine level, but here it is unbounded. You can read more about how digits are carried here; this can be applied to numbers of any base.
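The steps listed above can be sketched in a few lines of Python. This is not the Num/Digit class from the answer (that code is not shown here); the function name increment is hypothetical, and the sketch handles bases up to 10 because it reads the digits from a string.

```python
def increment(digit_text, base=10):
    """Add 1 to a number given as a string of digits in the given base (<= 10)."""
    digits = [int(ch) for ch in digit_text]     # most significant digit first
    i = len(digits) - 1
    carry = 1                                   # the increment enters as a carry
    while carry and i >= 0:
        digits[i] += carry
        if digits[i] == base:                   # digit overflowed: reset, carry left
            digits[i] = 0
            carry = 1
        else:
            carry = 0
        i -= 1
    if carry:                                   # ran off the left end
        digits.insert(0, 1)
    return "".join(str(d) for d in digits)

print(increment("9999999999999998"))   # 9999999999999999
print(increment("999"))                # 1000
```

Since the digits live in a list, the number can grow by one position whenever the carry runs past the left-most digit.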

Ada actually supports this natively, but only for its typeless constants ("named numbers"). For actual variables, you need to go find an arbitrary-length package. See Arbitrary length integer in Ada

More-or-less the same way that you do. In school, you memorized single-digit addition, multiplication, subtraction, and division. Then, you learned how to do multiple-digit problems as a sequence of single-digit problems.
If you wanted to, you could multiply two twenty-digit numbers together using nothing more than knowledge of a simple algorithm, and the single-digit times tables.

In general, the language itself doesn't handle high-precision, high-accuracy large number arithmetic. It's far more likely that a library is written that uses alternate numerical methods to perform the desired operations.
For example (I'm just making this up right now), such a library might emulate the actual techniques that you might use to perform that large number arithmetic by hand. Such libraries are generally much slower than using the built-in arithmetic, but occasionally the additional precision and accuracy is called for.

As a thought experiment, imagine the numbers stored as a string. With functions to add, multiply, etc these arbitrarily long numbers.
In reality these numbers are probably stored in a more space efficient manner.

Think of one machine-size number as a digit and apply the algorithm for multi-digit multiplication from primary school. Then you don't need to keep the whole numbers in registers, just the digits as they are worked on.

Most languages store them as an array of integers. If you add/subtract two of these big numbers, the library adds/subtracts all integer elements in the array separately and handles the carries/borrows.
It's like manual addition/subtraction in school because this is how it works internally.
Some languages use real text strings instead of integer arrays which is less efficient but simpler to transform into text representation.

A little diversion into floating point (im)precision, part 1

Most mathematicians agree that:
e^(πi) + 1 = 0
However, most floating point implementations disagree. How well can we settle this dispute?
I'm keen to hear about different languages and implementations, and various methods to make the result as close to zero as possible. Be creative!

It's not that most floating point implementations disagree, it's just that they cannot get the accuracy necessary to get a 100% answer. And the correct answer is that they can't.
PI is an infinite series of digits that nobody has been able to denote by anything other than a symbolic representation, and e^X is the same, and thus the only way to get to 100% accuracy is to go symbolic.

Here's a short list of implementations and languages I've tried. It's sorted by closeness to zero:
Scheme: (+ 1 (make-polar 1 (atan 0 -1)))
⇒ 0.0+1.2246063538223773e-16i (Chez Scheme, MIT Scheme)
⇒ 0.0+1.22460635382238e-16i (Guile)
⇒ 0.0+1.22464679914735e-16i (Chicken with numbers egg)
⇒ 0.0+1.2246467991473532e-16i (MzScheme, SISC, Gauche, Gambit)
⇒ 0.0+1.2246467991473533e-16i (SCM)
Common Lisp: (1+ (exp (complex 0 pi)))
⇒ #C(0.0L0 -5.0165576136843360246L-20) (CLISP)
⇒ #C(0.0d0 1.2246063538223773d-16) (CMUCL)
⇒ #C(0.0d0 1.2246467991473532d-16) (SBCL)
Perl: use Math::Complex; Math::Complex->emake(1, pi) + 1
⇒ 1.22464679914735e-16i
Python: from cmath import exp, pi; exp(complex(0, pi)) + 1
⇒ 1.2246467991473532e-16j (CPython)
Ruby: require 'complex'; Complex::polar(1, Math::PI) + 1
⇒ Complex(0.0, 1.22464679914735e-16) (MRI)
⇒ Complex(0.0, 1.2246467991473532e-16) (JRuby)
R: complex(argument = pi) + 1
⇒ 0+1.224606353822377e-16i
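The CPython entry can be reproduced with the sketch below, which also hints at where the leftover comes from. The explanation in the comment is an interpretation added here rather than something stated in the list: the constant passed in is the double nearest to π, not π itself, so the sine of it is a tiny nonzero value.

```python
import cmath
import math

z = cmath.exp(complex(0, math.pi)) + 1
print(z)

# math.pi is the double nearest to pi, not pi itself, so sin(math.pi)
# returns roughly the gap between the two rather than zero.
residual = math.sin(math.pi)
print(residual)
```

On a typical IEEE 754 double implementation the imaginary part of z and the value of sin(math.pi) agree, which suggests the error is inherited from the rounded input rather than introduced by the exponential.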

Is it possible to settle this dispute?
My first thought is to look to a symbolic language, like Maple. I don't think that counts as floating point though.
In fact, how does one represent i (or j for the engineers) in a conventional programming language?
Perhaps a better example is sin(π) = 0? (Or have I missed the point again?)

I agree with Ryan, you would need to move to another number representation system. The solution is outside the realm of floating point math because you need pi to be represented as an infinitely long decimal, so any limited precision scheme just isn't going to work (at least not without employing some kind of fudge-factor to make up the lost precision).

Your question seems a little odd to me, as you seem to be suggesting that the Floating Point math is implemented by the language. That's generally not true, as the FP math is done using a floating point processor in hardware. But software or hardware, floating point will always be inaccurate. That's just how floats work.
If you need better precision you need to use a different number representation. Just like if you're doing integer math on numbers that don't fit in an int or long. Some languages have libraries for that built in (I know java has BigInteger and BigDecimal), but you'd have to explicitly use those libraries instead of native types, and the performance would be (sometimes significantly) worse than if you used floats.
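As a rough illustration of switching representation, here is a sketch using Python's standard library, where fractions.Fraction and decimal.Decimal play a role loosely comparable to the Java classes mentioned above (that comparison is an assumption of this example, not a claim about identical behaviour).

```python
from fractions import Fraction
from decimal import Decimal, getcontext

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)                                      # False

# Exact rationals: slower, but no rounding at all.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True

# Decimal: still rounded, but to a precision you choose.
getcontext().prec = 50
third = Decimal(1) / Decimal(3)
print(third)
```

Neither type helps with irrational values such as pi or e; they only move the rounding to where you decide it should happen.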

@Ryan Fox: In fact, how does one represent i (or j for the engineers) in a conventional programming language?
Native complex data types are far from unknown. Fortran had it by the mid-sixties, and the OP exhibits a variety of other languages that support them in his follow-up.
And complex numbers can be added to other languages as libraries (with operator overloading they even look just like native types in the code).
But unless you provide a special case for this problem, the "non-agreement" is just an expression of imprecise machine arithmetic, no? It's like complaining that
float r = 2.0f/3;  /* a plain 2/3 would be integer division and give 0 */
float s = 3*r;
float t = s - 2;
ends with (t != 0) (At least if you use a dumb enough compiler)...

I had looooong coffee chats with my best pal talking about irrational numbers and the difference from other numbers. Well, both of us agree on this different point of view:
Irrational numbers are relations, like functions, in a way. In what way? Well, think about "if you want a perfect circle, give me a perfect pi"; circles are different from the other figures (4 sides, 5, 6... 100, 200), but the more sides a figure has, the more it looks like a circle. If you have followed me so far, connecting all these ideas, here is the pi formula:
So, pi is a function, but one that never ends, because of the ∞ parameter. I like to think that you can have an "instance" of pi: if you swap the ∞ parameter for a very big Int, you get a very big pi instance.
Same with e: give me a huge parameter, and I will give you a huge e.
Putting all the ideas together:
As we have memory limitations, the language and libraries provide us with huge instances of irrational numbers, in this case pi and e. As a final result, you get a long approach towards 0, like the examples provided by @Chris Jester-Young.

In fact, how does one represent i (or j for the engineers) in a conventional programming language?
In a language that doesn't have a native representation, it is usually added using OOP to create a Complex class to represent i and j, with operator overloading to properly deal with operations involving other Complex numbers and or other number primitives native to the language.
E.g.: Complex.java, C++ <complex>

Numerical Analysis teaches us that you can't rely on the precise value of small differences between large numbers.
This doesn't just affect the equation in question here, but can bring instability to everything from solving a near-singular set of simultaneous equations, through finding the zeros of polynomials, to evaluating log(~1) or exp(~0) (I have even seen special functions for evaluating log(x+1) and (exp(x)-1) to get round this).
I would encourage you not to think in terms of zeroing the difference -- you can't -- but rather in doing the associated calculations in such a way as to ensure the minimum error.
I'm sorry, it's 43 years since I had this drummed into me at uni, and even if I could remember the references, I'm sure there's better stuff around now. I suggest this as a starting point.
If that sounds a bit patronising, I apologise. My "Numerical Analysis 101" was part of my Chemistry course, as there wasn't much CS in those days. I don't really have a feel for the place/importance numerical analysis has in a modern CS course.
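The special functions mentioned above for log(x+1) and exp(x)-1 exist in Python's standard library as math.log1p and math.expm1. The following sketch compares them with the naive formulas for a tiny x; the choice of x = 1e-16 is arbitrary and only meant to make the loss visible.

```python
import math

x = 1e-16

naive_log = math.log(1 + x)     # 1 + x rounds to exactly 1.0 before the log
stable_log = math.log1p(x)      # computes log(1 + x) without forming 1 + x

naive_exp = math.exp(x) - 1     # the small difference is lost in the subtraction
stable_exp = math.expm1(x)      # computes exp(x) - 1 directly

print(naive_log, stable_log)
print(naive_exp, stable_exp)
```

The naive versions lose essentially all significant digits, while the dedicated functions return values close to x, which is the behaviour the answer recommends designing for.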

It's a limitation of our current floating point computational architectures. Floating point arithmetic is only an approximation of numeric poles like e or pi (or anything beyond the precision your bits allow). I really enjoy these numbers because they defy classification, and appear to have greater entropy(?) than even primes, which are a canonical series. A ratio defies numerical representation; sometimes simple things like that can blow a person's mind (I love it).
Luckily entire languages and libraries can be dedicated to precision trigonometric functions by using notational concepts (similar to those described by Lasse V. Karlsen ).
Consider a library/language that describes concepts like e and pi in a form that a machine can understand. Does a machine have any notion of what a perfect circle is? Probably not, but we can create an object - circle that satisfies all the known features we attribute to it (constant radius, relationship of radius to circumference is 2*pi*r = C). An object like pi is only described by the aforementioned ratio. r & C can be numeric objects described by whatever precision you want to give them. e can be defined as "the unique real number such that the value of the derivative (slope of the tangent line) of the function f(x) = e^x at the point x = 0 is exactly 1" (from Wikipedia).
Fun question.
