Using MPFR And Adding - How many Digits are Correct? - numerical

I have a pretty easy question (I think). As much as I've tried, I can not find an answer to this question.
I am creating a function for which I want the user to enter two numbers. The first is the number of terms of a certain infinite series to add together. The second is the number of digits the user would like the truncated sum to be accurate to.
Say the terms of the sequence are a_i. How much precision n would be required in mpfr to guarantee that the sum of the a_i, from i=0 up to the user's entered value, is accurate to the number of digits the user needs?
By the way, I'm adding the a_i in a naive way.
Any help will be much appreciated.
Thanks,
Rick

You can convert between decimal digits of precision, d, and binary digits of precision, b, with logarithms
b = d × log(10) / log(2)
A little rearranging shows why
b × log(2) = d × log(10)
log(2^b) = log(10^d)
2^b = 10^d
Each term of the series (and each addition) will introduce a rounding error at the least significant digit so, assuming each of the t terms involves n (two argument) arithmetic operations, you will want to add an extra
log(t * (n+2))/log(2)
bits.
You'll need to round the number of bits of precision up to be sure that you have enough room for your decimal digits of precision
b = ceil((d*log(10.0) + log(t*(n+2)))/log(2.0));
Finally, you should be aware that the terms may introduce cancellation errors, in which case this simple calculation will dramatically underestimate the required number of bits, even assuming I've got it right in the first place ;-)
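The estimate above can be sketched in Python as follows. This is only an illustration of the formula as given in this answer; the function name `required_bits` and its parameter names are made up for the example, and the cancellation caveat still applies.

```python
import math

def required_bits(digits: int, terms: int, ops_per_term: int) -> int:
    """Bits of precision from b = ceil((d*log(10) + log(t*(n+2))) / log(2)).

    digits       -- d, decimal digits the user wants
    terms        -- t, number of series terms that are added
    ops_per_term -- n, two-argument arithmetic operations per term
    """
    guard = math.log(terms * (ops_per_term + 2))  # room for accumulated rounding errors
    return math.ceil((digits * math.log(10.0) + guard) / math.log(2.0))

print(required_bits(10, 1000, 3))  # 46
```

The result would then be passed as the working precision when initialising the mpfr variables.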

Related

After rounding a float variable, there is still a number like `0.80000001`

I am using MT4, but this might be a general question about floating-point numbers.
I am using the NormalizeDouble function, which rounds a number to a given number of digits, like this.
double x = 1.33242;
double y = NormalizeDouble(x,2); // y is 1.33
However, in some cases, even after rounding with NormalizeDouble, I get a number such as 0.800000001.
I have no idea why it happens and how to fix it.
It might be a basic mathematical thing.
You are rounding to powers of 10, but the fractional part of a float/double can exactly express only powers of 2 like
0.5,0.25,0.125,...
and numbers that are finite sums of them. Hence your case:
0.8 = 1/2+1/4 +1/32 +1/64 +1/512 +1/1024 +1/8192 +1/16384...
= 0.5+0.25+0.03125+0.015625+0.001953125+0.0009765625+0.0001220703125+0.00006103515625...
= 0.11001100110011... [bin]
0.8 is a periodic number in binary, so it can never be stored exactly and there will always be some noise in the lower bits of the mantissa. The FPU picks the closest representable number to your desired value, hence the 0.800000001.
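To see this outside MT4, here is a small Python sketch (not MQL4 code; the helper name `binary_fraction_bits` is made up for the example). It prints the repeating binary expansion of 0.8, the exact value a double stores for 0.8, and what remains after squeezing 0.8 into a 32-bit float.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def binary_fraction_bits(value: Fraction, count: int) -> str:
    """Return the first `count` binary digits after the point, for 0 <= value < 1."""
    bits = []
    for _ in range(count):
        value *= 2
        bit = int(value >= 1)   # next binary digit
        bits.append(str(bit))
        value -= bit
    return "".join(bits)

# 0.8 = 4/5 has the repeating pattern 1100 in binary
print(binary_fraction_bits(Fraction(4, 5), 14))  # 11001100110011

# Exact decimal value of the double closest to 0.8 (not exactly 0.8)
print(Decimal(0.8))

# Round-trip through a 32-bit float: the noise is already visible near the 8th digit
as_float32 = struct.unpack("f", struct.pack("f", 0.8))[0]
print(as_float32)
```

For display purposes the usual fix is to format the value with the wanted number of digits when printing, instead of expecting the stored value to be exact.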

Using big-O notation to count the number of bits in x^2?

If x is an n-bit integer, what is the size (in bits) of x^2?
I think the answer is O(n); is that correct? The way I thought about it is adding a number to itself that number amount of times means that there will be n operations, therefore O(n). Is my understanding correct?
Let's suppose x has n bits. This means x = Θ(2^n). Therefore, x^2 = Θ(2^n · 2^n) = Θ(2^(2n)), so the number now has about twice as many bits as before. This means that if there were n bits to begin with, there are now about 2n = Θ(n) bits.
While the answer you gave of O(n) is correct, your reasoning is invalid. Note that the question isn't asking for how long it takes to compute x^2, but rather the number of bits it contains. The time to compute x^2 is a different question.
Hope this helps!
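A quick empirical check of this in Python, as a sketch (the helper name `square_bit_length` is invented here): an integer with exactly n bits lies in [2^(n-1), 2^n), so its square lies in [2^(2n-2), 2^(2n)) and has either 2n-1 or 2n bits.

```python
import random

def square_bit_length(n_bits: int) -> int:
    """Bit length of x*x for a random x that has exactly n_bits bits."""
    x = random.getrandbits(n_bits) | (1 << (n_bits - 1))  # force the top bit on
    return (x * x).bit_length()

for n in (1, 8, 64, 1000):
    print(n, square_bit_length(n))  # second column is always 2n-1 or 2n
```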

Calculations precision level in R

I am working in R with very small numbers which reflect probabilities in a Maximum Likelihood Estimation algorithm. Some of these numbers are as small as 1e-155 (or smaller). However, when there is something as simple as summation taking place, the precision level gets truncated to the least precise one and thus ruins the precision of my calculations and produces meaningless results.
Example:
> sum(c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58))
[1] 4.445838e-05
As the example shows, the result is reported on the scale of 1e-5, which crudely rounds off a sensitive calculation.
Is there a way around this? Why is R choosing such a strange automatic behavior? Perhaps it is not really doing this, I just see the result in the truncated form? In this case, is the actual number with correct precision stored in the variable?
There is no precision loss in your sum. But if you're worried about it, you should use a multiple-precision library:
library("Rmpfr")
x <- c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58)
sum(mpfr(x, 1024))
# 1 'mpfr' number of precision 1024 bits
# [1] 4.445837981118120898327314579322617633703674840117902103769961398533293289165193843930280422747754618577451267010103975610356319174778512980120125435961577770470993217990999166176083700886405875414277348471907198346293122011042229843450802884152750493740313686430454254150390625000000000000000000000000000000000e-5
Your results are only truncated in the display.
Try:
x <- sum(c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58))
print(x, digits=22)
[1] 4.445837981118121081878e-05
You can read more about the behaviour of print at ?print.default
You can also set an option; this will affect all calls to print
options(digits=22)
Have you heard of floating-point numbers?
Multiplication and division keep essentially all significant figures as long as the magnitude of the result stays between
4.9·10^−324 and 1.7976931348623157·10^308 (see the link for details)
so if you do 1.0e-30 * 1.0e-10 the result will be 1.0e-40
but if you do 1.0e-30 + 1.0e-10 the result will be 1.0e-10
Why?
A computer can only represent a finite set of numbers: 64 bits give at most 2^64 distinct values.
One option is a direct encoding, as for integers (a signed 64-bit integer covers every INTEGER from −2^63 to 2^63 − 1, roughly −9.2·10^18 to +9.2·10^18).
The other is a cleverer scheme, floating point, which covers magnitudes from 4.9·10^−324 to 1.7976931348623157·10^308 and can approximate rational numbers.
So in floating point, precision in sums is sacrificed to achieve a wider range. There is loss of precision in sums and subtractions because the fraction part (52 bits of a 64-bit floating-point number) holds only about log10(2^52) ~ 16 significant decimal figures, so a term more than about 16 orders of magnitude smaller than the other one is lost.
For a basic everyday example, look at summary(lm): when the p-value of a parameter is near zero, summary() prints <2.2e-16 (what a coincidence).
Why limited to 64 bits? CPUs have execution units built specifically for 64-bit floating-point arithmetic (the 64-bit IEEE 754 standard). If you use higher precision, such as 128-bit floating point, performance drops substantially, often by 10 times or more, because the CPU must split the data and operations into multiple 64-bit pieces.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
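The absorption effect described above is not specific to R. Here is a minimal Python sketch using ordinary 64-bit doubles; `math.fsum` is shown only as one possible mitigation and is not something the answers above propose.

```python
import math
import sys

small, large = 1.0e-30, 1.0e-10

print(small * large)            # close to 1e-40: the product keeps its significant digits
print(small + large == large)   # True: the small term is absorbed completely
print(sys.float_info.epsilon)   # 2.220446049250313e-16, the 2.2e-16 that summary() prints

# Naive summation loses the small term for good; fsum tracks the lost low-order bits.
values = [1e-30, 1e-10, -1e-10]
print(sum(values))        # 0.0
print(math.fsum(values))  # 1e-30
```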

How do computers evaluate huge numbers?

If I enter a value, for example
1234567 ^ 98787878
into Wolfram Alpha it can provide me with a number of details. This includes decimal approximation, total length, last digits etc. How do you evaluate such large numbers? As I understand it a programming language would have to have a special data type in order to store the number, let alone add it to something else. While I can see how one might approach the addition of two very large numbers, I can't see how huge numbers are evaluated.
10^2 could be calculated through repeated addition. However a number such as the example above would require a gigantic loop. Could someone explain how such large numbers are evaluated? Also, how could someone create a custom large datatype to support large numbers in C# for example?
Well, it's quite easy and you could have done it yourself
Number of digits can be obtained via logarithm:
since `A^B = 10 ^ (B * log(A, 10))`
we can compute (A = 1234567; B = 98787878) in our case that
`B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...`
integer part + 1 (601767807 + 1 = 601767808) is the number of digits
First, say, five, digits can be gotten via logarithm as well;
now we should analyze fractional part of the
B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...
f = 0.4709646...
first digits are 10^f (decimal point removed) = 29577...
Last, say, five, digits can be obtained as a corresponding remainder:
last five digits = A^B rem 10^5
A rem 10^5 = 1234567 rem 10^5 = 34567
A^B rem 10^5 = ((A rem 10^5)^B) rem 10^5 = (34567^98787878) rem 10^5 = 45009
last five digits are 45009
You may find BigInteger.ModPow (C#) very useful here
Finally
1234567^98787878 = 29577...45009 (601767808 digits)
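The three steps can be sketched in Python (used here instead of C#, where `BigInteger.ModPow` plays the role of the three-argument `pow`). The variable names are illustrative only, and the leading digits rely on ordinary double precision, which is enough for five digits here but not for many more.

```python
import math

A, B = 1234567, 98787878

exponent10 = B * math.log10(A)                  # log10(A^B)
digit_count = math.floor(exponent10) + 1        # integer part + 1
fraction = exponent10 - math.floor(exponent10)  # f, the fractional part
leading = str(10 ** fraction).replace(".", "")[:5]
trailing = pow(A, B, 10 ** 5)                   # A^B rem 10^5 without ever building A^B

print(digit_count)        # 601767808
print(leading)            # 29577
print(f"{trailing:05d}")  # last five digits, zero-padded
```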
There are usually libraries providing a bignum datatype for arbitrarily large integers (e.g. mapping digits k*n...(k+1)*n-1, k=0..<some m depending on n and number magnitude> to a machine word of size n and redefining the arithmetic operations). For C#, you might be interested in BigInteger.
Exponentiation can be recursively broken down:
pow(a,2*b) = pow(a,b) * pow(a,b);
pow(a,2*b+1) = pow(a,b) * pow(a,b) * a;
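As a sketch, that recursion translates into Python like this (the function name `power` is chosen for the example; the base case b = 0 is added so that the recursion terminates). Each level halves the exponent, so only about log2(b) levels are needed.

```python
def power(a: int, b: int) -> int:
    """Compute a**b for b >= 0 by splitting the exponent in half."""
    if b == 0:
        return 1                 # base case, not spelled out in the rules above
    half = power(a, b // 2)      # pow(a, b) with the halved exponent
    if b % 2 == 0:
        return half * half       # pow(a, 2*b)   = pow(a, b) * pow(a, b)
    return half * half * a       # pow(a, 2*b+1) = pow(a, b) * pow(a, b) * a

print(power(3, 13) == 3 ** 13)  # True
```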
There are also number-theoretic results that have engendered special algorithms to determine properties of large numbers without actually computing them (to be precise: their full decimal expansion).
To compute how many digits there are, one uses the following expression:
decimal_digits(n) = 1 + floor(log_10(n))
This gives:
decimal_digits(1234567^98787878) = 1 + floor(log_10(1234567^98787878))
= 1 + floor(98787878 * log_10(1234567))
= 1 + floor(98787878 * 6.0915146640862625)
= 1 + floor(601767807.4709647)
= 601767808
The trailing k digits are computed by doing exponentiation mod 10^k, which keeps the intermediate results from ever getting too large.
The approximation will be computed using a (software) floating-point implementation that effectively evaluates a^(98787878 log_a(1234567)) to some fixed precision for some number a that makes the arithmetic work out nicely (typically 2 or e or 10). This also avoids the need to actually work with millions of digits at any point.
There are many libraries for this and the capability is built-in in the case of python. You seem primarily concerned with the size of such numbers and the time it may take to do computations like the exponent in your example. So I'll explain a bit.
Representation
You might use an array to hold all the digits of large numbers. A more efficient way would be to use an array of 32 bit unsigned integers and store "32 bit chunks" of the large number. You can think of these chunks as individual digits in a number system with 2^32 distinct digits or characters. I used an array of bytes to do this on an 8-bit Atari800 back in the day.
Doing math
You can obviously add two such numbers by looping over all the digits, adding the elements of one array to the other and keeping track of carries. Once you know how to add, you can write code to do "manual" multiplication by multiplying digits, putting the results in the right place, and doing a lot of addition; software will do all this fairly quickly. There are also faster multiplication algorithms than the one you would use manually on paper. Paper multiplication is O(n^2), whereas the fastest known methods approach O(n*log(n)). As for the exponent, you could of course multiply by the same number millions of times, with each of those multiplications using the previously mentioned multiplication function. There are faster ways to do exponentiation that require far fewer multiplies. For example, you can compute x^16 by computing (((x^2)^2)^2)^2, which involves only 4 actual (large integer) multiplications.
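A minimal sketch of the representation and of addition with carries, written in Python purely for illustration (Python already has arbitrary-precision integers built in, which the example uses only to check itself). The names `BASE`, `to_limbs`, `from_limbs` and `add_limbs` are invented here; the chunks are stored least significant first.

```python
BASE = 2 ** 32  # each array element is one "digit" in base 2^32

def to_limbs(value: int) -> list:
    """Split a non-negative integer into 32-bit chunks, least significant first."""
    limbs = []
    while value:
        limbs.append(value % BASE)
        value //= BASE
    return limbs or [0]

def from_limbs(limbs: list) -> int:
    """Reassemble the integer (used only to check the result)."""
    return sum(limb * BASE ** i for i, limb in enumerate(limbs))

def add_limbs(a: list, b: list) -> list:
    """Add two chunk arrays digit by digit, propagating the carry."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(total % BASE)  # digit that stays in this position
        carry = total // BASE        # 0 or 1, moves to the next position
    if carry:
        result.append(carry)
    return result

x, y = 2 ** 100 + 12345, 2 ** 64 - 1
print(from_limbs(add_limbs(to_limbs(x), to_limbs(y))) == x + y)  # True
```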
In practice
It's fun and educational to try writing these functions yourself, but in practice you will want to use an existing library that has been optimized and verified.
I think a part of the answer is in the question itself :) To store these expressions, you can store the base (or mantissa), and exponent separately, like scientific notation goes. Extending to that, you cannot possibly evaluate the expression completely and store such large numbers, although, you can theoretically predict certain properties of the consequent expression. I will take you through each of the properties you talked about:
Decimal approximation: Can be calculated by evaluating simple log values.
Total number of digits for expression a^b, can be calculated by the formula
Digits = floor function (1 + Log10(a^b)), where floor function is the closest integer smaller than the number. For e.g. the number of digits in 10^5 is 6.
Last digits: These can be calculated using the fact that the last digits of successive powers repeat in a cycle. E.g. in the units place, 7, 9, 3, 1 repeats for the powers 7^x. So you can calculate that if x%4 is 0, the last digit is 1.
Can someone create a custom datatype for large numbers, I can't say, but I am sure, the number won't be evaluated and stored.

Efficient Multiplication of Varying-Length #s [Conceptual]

EDIT
So it seems I "underestimated" what varying length numbers meant. I didn't even think about situations where the operands are 100 digits long. In that case, my proposed algorithm is definitely not efficient. I'd probably need an implementation whose complexity depends on the # of digits in each operand as opposed to its numerical value, right?
As suggested below, I will look into the Karatsuba algorithm...
Write the pseudocode of an algorithm that takes in two arbitrary length numbers (provided as strings), and computes the product of these numbers. Use an efficient procedure for multiplication of large numbers of arbitrary length. Analyze the efficiency of your algorithm.
I decided to take the (semi) easy way out and use the Russian Peasant Algorithm. It works like this:
a * b = a/2 * 2b if a is even
a * b = (a-1)/2 * 2b + b if a is odd
My pseudocode is:
rpa(x, y){
    if x is 1
        return y
    if x is even
        return rpa(x/2, 2y)
    if x is odd
        return rpa((x-1)/2, 2y) + y
}
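For reference, here is a runnable Python sketch of the same pseudocode. It works on native integers, not on digit strings, and the x = 0 case is an addition that the pseudocode does not handle.

```python
def rpa(x: int, y: int) -> int:
    """Russian Peasant multiplication of x * y for x >= 0."""
    if x == 0:
        return 0                          # extra base case, not in the pseudocode
    if x == 1:
        return y
    if x % 2 == 0:
        return rpa(x // 2, 2 * y)         # a * b = a/2 * 2b
    return rpa((x - 1) // 2, 2 * y) + y   # a * b = (a-1)/2 * 2b + b

print(rpa(37, 41))  # 1517
```

The recursion depth equals the number of bits of x, so the number of steps grows with the length of x and not with its value.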
I have 3 questions:
Is this efficient for arbitrary length numbers? I implemented it in C and tried varying length numbers. The run-time was near-instant in all cases so it's hard to tell empirically...
Can I apply the Master Theorem to understand the complexity...?
a = # subproblems in recursion = 1 (max 1 recursive call across all states)
n / b = size of each subproblem = n / 1 -> b = 1 (problem doesn't change size...?)
f(n^d) = work done outside recursive calls = 1 -> d = 0 (the addition when a is odd)
a = 1, b^d = 1, a = b^d -> complexity is in n^d*log(n) = log(n)
this makes sense logically since we are halving the problem at each step, right?
What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
Many thanks in advance
What might my professor mean by providing arbitrary length numbers "as strings". Why do that?
This actually changes everything about the problem (and makes your algorithm incorrect).
It means that 1234 is provided as 1,2,3,4 and you cannot operate directly on the whole number. You need to analyze your algorithm in terms of the number of digit-level additions, multiplications, and divisions.
You should expect a division to be a bit more expensive than a multiplication, and a multiplication to be a lot more expensive than an addition. So a good algorithm tries to reduce the number of divisions and multiplications.
Check out the Karatsuba algorithm (P.S. don't copy it; that's not what your teacher wants). It is one of the fastest for this specification.
Regarding 3): Native integers are limited in how large (or small) numbers they can represent (32- or 64-bit integers for example). To represent arbitrary length numbers you can choose strings, because then you are not really limited by this. The problem is then, of course, that your arithmetic units are not really made to add strings ;-)
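To make "numbers as strings" concrete, here is a Python sketch of plain schoolbook multiplication on decimal digit strings. This is not Karatsuba: it performs one single-digit multiplication per pair of digits, so it costs O(n*m) digit operations for inputs of n and m digits. The function name is invented for the example, and it assumes non-negative inputs with no sign or whitespace.

```python
def multiply_decimal_strings(a: str, b: str) -> str:
    """Multiply two non-negative decimal numbers given as digit strings."""
    result = [0] * (len(a) + len(b))          # digits, least significant first
    for i, digit_a in enumerate(reversed(a)):
        carry = 0
        for j, digit_b in enumerate(reversed(b)):
            total = result[i + j] + int(digit_a) * int(digit_b) + carry
            result[i + j] = total % 10        # digit that stays in this column
            carry = total // 10               # carried into the next column
        result[i + len(b)] += carry
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0"

print(multiply_decimal_strings("1234", "5678"))  # 7006652
```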
