Inaccurate results calculating Fibonacci numbers in R

I'm calculating Fibonacci numbers in R with this code:
fib <- function(n) {
  a = 0
  b = 1
  for (i in 1:n) {
    tmp = b
    b = a
    a = a + tmp
  }
  return(a)
}
sprintf("%.0f", fib(79))
From fib(79) onwards I'm getting inaccurate results. For instance, fib(79) returns "14472334024676220", when the right result, according to this website, should be fib(79) = 14472334024676221.
Using the function fibonacci from the package numbers I get the same inaccurate results. I assume this is because of the number precision in R.
How can I bypass this limitation and get accurate Fibonacci numbers in R?

Thank you for voting on the question. My reputation is above 10, so I can post this right now :). I've got a pretty simple solution using the package gmp (a library also mentioned in the link provided by Ben Bolker) to sum large integers.
require(gmp)
fib <- function(n) {
  a = 0
  b = 1
  for (i in 1:n) {
    tmp = b
    b = a
    a = add.bigz(a, tmp) # gmp function
  }
  return(a)
}
fib(79)
The result is the right Fibonacci number, fib(79): 14472334024676221.
I tested it for even larger inputs, e.g. fib(5000) (1045 digits), and the result seems accurate to the last digit.
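As an aside, if I remember the gmp package correctly, it also provides fibnum(), which returns the n-th Fibonacci number directly as a big integer (treat the function name and indexing as an assumption to check against the package docs):
require(gmp)
fibnum(79)   # should also give 14472334024676221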

You have reached the limits of double-precision floating-point arithmetic, which has about 16 decimal digits of accuracy; you require 17 here. AFAIK base R does not have a numeric type with greater precision.
To bypass this you might want to split your operands and sum them separately.
One way around this is to convert your operands to character strings, work through them from the end, add digit by digit, and carry as in schoolbook addition.
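For illustration, here is a minimal sketch of that digit-by-digit idea in plain R; the helper name add_strings is made up here, and in practice the gmp route above is simpler:
# Schoolbook addition on digit strings, least significant digit first
add_strings <- function(x, y) {
  xd <- rev(as.integer(strsplit(x, "")[[1]]))
  yd <- rev(as.integer(strsplit(y, "")[[1]]))
  n  <- max(length(xd), length(yd))
  xd <- c(xd, rep(0, n - length(xd)))   # pad the shorter number with zeros
  yd <- c(yd, rep(0, n - length(yd)))
  carry <- 0
  out <- integer(n)
  for (i in 1:n) {
    s       <- xd[i] + yd[i] + carry
    out[i]  <- s %% 10
    carry   <- s %/% 10
  }
  if (carry > 0) out <- c(out, carry)
  paste(rev(out), collapse = "")
}
add_strings("9999999999999999", "1")   # "10000000000000000"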

Related

R Precision for Double - Why does code return negative when a positive outcome is expected?

I am testing 2 ways of calculating prod(b - a), where a and b are vectors of length n: prod(b - a) = (b_1 - a_1)(b_2 - a_2)(b_3 - a_3) * ... * (b_n - a_n), where b_i > a_i > 0 for all i = 1, 2, ..., n. For some special cases, another way (Method 2) of calculating this prod(b - a) is more efficient. It expands the product and sums the resulting terms: prod(b - a) = sum over all subsets S of {1, ..., n} of (-1)^|S| * prod_{i in S} a_i * prod_{i not in S} b_i.
Here is my question: when a_i is very close to b_i, the true outcome can be very, very close to 0, something like 10^(-16). Method 1 (subtract and multiply) always returns a positive output. Method 2, using the expanded formula, sometimes returns a negative output (about 7~8% of the time in my experiment). Mathematically, these 2 methods should return exactly the same result, but in floating-point arithmetic they apparently produce different outputs.
Here is my code to run the test. When I run the testing code 10000 times, about 7~8% of my runs for Method 2 return a negative output. According to the official documentation, an R double can represent values as small as 2.225074e-308, as indicated by the parameter .Machine$double.xmin. Why is it producing negative values when the differences are between 10^(-16) and 10^(-18)? Any help that sheds light on this will be appreciated. I would also love some suggestions on how to practically increase the precision to a higher level.
########## Testing code 1.
ftest1case <- function(a, b) {
  n <- length(a)
  if (length(b) != n) stop("--------- length a and b are not right.")
  if (any(b < a)) stop("---------- b has to be greater than a all the time.")
  out1 <- prod(b - a)
  out2 <- 0
  N <- 2^n
  for (i in 1:N) {
    tidx <- rev(as.integer(intToBits(x = i - 1))[1:n])
    tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
    out2 <- out2 + tsign * prod(b[tidx == 0]) * prod(a[tidx == 1])
  }
  c(out1, out2)
}
########## Testing code 2.
ftestManyCases <- function(N, printFreq = 1000, smallNum = 10^(-20))
{
  tt <- matrix(0, nrow = N, ncol = 2)
  n <- 12
  for (i in 1:N) {
    a <- runif(n, 0, 1)
    b <- a + runif(n, 0, 1) * 0.1
    tt[i, ] <- ftest1case(a = a, b = b)
    if ((i %% printFreq) == 0) cat("----- i = ", i, "\n")
    if (tt[i, 2] < smallNum) cat("------ i = ", i, " ---- Negative summation found.\n")
  }
  tout <- apply(tt, 2, FUN = function(x) { round(sum(x < smallNum) / N, 6) })
  names(tout) <- c("PerLess0_Method1", "PerLess0_Method2")
  list(summary = tout, data = tt)
}
######## Step 1. Test for 1 case.
n <- 12
a <- runif(n, 0, 1)
b <- a + runif(n, 0, 1) * 0.1
ftest1case(a = a, b = b)
######## Step 2. Test Code 2 for multiple cases.
N <- 300
tt <- ftestManyCases(N = N, printFreq = 100)
tt[[1]]
It's hard for me to imagine when an algorithm that consists of generating 2^n permutations and adding them up is going to be more efficient than a straightforward product of differences, but I'll take your word for it that there are some special cases where it is.
As suggested in comments, the root of your problem is the accumulation of floating-point errors when adding values of different magnitudes; see here for an R-specific question about floating point and here for the generic explanation.
First, a simplified example:
n <- 12
set.seed(1001)
a <- runif(n, 0, 1)
b <- a + 0.01
prod(a-b) ## 1e-24
out2 <- 0
N <- 2^n
out2v <- numeric(N)
for (i in 1:N) {
  tidx <- rev(as.integer(intToBits(x = i - 1))[1:n])
  tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
  j <- as.logical(tidx)
  out2v[i] <- tsign * prod(b[!j]) * prod(a[j])
}
sum(out2v) ## -2.011703e-21
Using extended precision (with 1000 bits of precision) to check that the simple/brute force calculation is more reliable:
library(Rmpfr)
a_m <- mpfr(a, 1000)
b_m <- mpfr(b, 1000)
prod(a_m-b_m)
## 1.00000000000000857647286522936696473705868726043995807429578968484409120647055193862325070279593735821154440625984047036486664599510856317884962563644275433171621778761377125514191564456600405460403870124263023336542598111475858881830547350667868450934867675523340703947491662460873009229537576817962228e-24
This proves the point in this case, but in general doing extended-precision arithmetic will probably kill any performance gains you would get.
Redoing the permutation-based calculation with mpfr values (using out2 <- mpfr(0, 1000), and going back to the out2 <- out2 + ... running summation rather than accumulating the values in a vector and calling sum()) gives an accurate answer (at least to the first 20 or so digits, I didn't check farther), but takes 6.5 seconds on my machine (instead of 0.03 seconds when using regular floating-point).
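For concreteness, that rerun looks roughly like the sketch below, reusing a, b, n and N from the earlier chunks; it is meant to show the shape of the computation, not a tuned implementation:
library(Rmpfr)
a_m <- mpfr(a, 1000)          # 1000-bit copies of the inputs
b_m <- mpfr(b, 1000)
out2 <- mpfr(0, 1000)         # running sum kept in extended precision
for (i in 1:N) {
  tidx  <- rev(as.integer(intToBits(x = i - 1))[1:n])
  tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
  j     <- as.logical(tidx)
  out2  <- out2 + tsign * prod(b_m[!j]) * prod(a_m[j])
}
out2                          # agrees with prod(a_m - b_m) to many digits, but is far slower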
Why is this calculation problematic? First, note the difference between .Machine$double.xmin (approx 2e-308), which is the smallest floating-point value that the system can store, and .Machine$double.eps (approx 2e-16), which is the smallest x such that 1+x > 1, i.e. the smallest relative value that can be added without catastrophic cancellation (values a little bit bigger than this magnitude will experience severe, but not catastrophic, cancellation).
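A quick way to look at the two quantities side by side in an R session (the printed values are the usual IEEE-754 ones):
.Machine$double.xmin                 # ~2.225074e-308: smallest positive normalised double
.Machine$double.eps                  # ~2.220446e-16: smallest x such that 1 + x != 1
1 + .Machine$double.eps / 2 == 1     # TRUE: an increment below eps is lost entirely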
Now look at the distribution of the values in out2v, the individual terms that get summed:
hist(out2v)
There are clusters of negative and positive numbers of similar magnitude. If our summation happens to add a bunch of values that almost cancel (so that the result is very close to 0), then add that to another value that is not nearly zero, we'll get bad cancellation.
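One way to see this with the out2v vector from above is to compare its positive and negative partial sums, each of which is enormously larger than the true result of about 1e-24:
sum(out2v[out2v > 0])   # large positive partial sum
sum(out2v[out2v < 0])   # large negative partial sum of almost the same magnitude
sum(out2v)              # what is left after the two nearly cancel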
It's entirely possible that there's a way to rearrange this calculation so that bad cancellation doesn't happen, but I couldn't think of one easily.

Ceiling function in R is inaccurate (sometimes)

I am trying to round up a number x to be divisible by a number m, using the following function from a previous post on SO:
roundUP = function(x, m)
{
  return(m * ceiling(x / m))
}
But, when I input x = 0.28 and m = 0.005, the function outputs 0.285 when the result should be 0.28.
When I tried ceiling(0.28/0.005), it output 57 when the result should be 56, since 0.28/0.005 = 56 is already a whole number. Can anyone explain why this is happening, and is it an error in the ceiling function?
The issue has to do with floating point arithmetic. You can find some details about this here.
Walk through the code below and it should shed some light on what's going on.
0.28/0.005 # Console displays 56
0.28/0.005 == 56 # returns FALSE. Why?
print(0.28/0.005, digits = 18) # 56.0000000000000071
# Solution?
roundUP = function(x, m)
{
  return(m * ceiling(round(x / m, 9)))
}
Also, note the Warning section within ?ceiling
The realities of computer arithmetic can cause unexpected results,
especially with floor and ceiling. For example, we ‘know’ that
floor(log(x, base = 8)) for x = 8 is 1, but 0 has been seen on an R
platform. It is normally necessary to use a tolerance.
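Following that warning, one possible tolerance-based variant of the function (the name roundUP_tol and the 1e-9 tolerance are arbitrary illustrative choices):
roundUP_tol <- function(x, m, tol = 1e-9) {
  # subtract a small tolerance before taking the ceiling, so values that are
  # a whole number up to floating-point noise are not bumped to the next step
  m * ceiling(x / m - tol)
}
roundUP_tol(0.28, 0.005)   # 0.28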
In R,
floor(x) rounds down to the largest integer that is not greater than x, and
ceiling(x) rounds up to the smallest integer that is not less than x.
So whatever the value after the decimal point, R moves to the adjacent integer (below or above).
If you want 56 as the output here, you must use floor(0.28/0.005).

What does the double percentage sign (%%) mean?

What is the double percent (%%) used for in R?
From using it, it looks as if it divides the number in front of it by the number after it as many times as it can and returns the leftover value. Is that correct?
Out of curiosity, when would this be useful?
The "Arithmetic operators" help page (which you can get to via ?"%%") says
‘%%’ indicates ‘x mod y’
which is only helpful if you've done enough programming to know that this is referring to the modulo operation, i.e. integer-divide x by y and return the remainder. This is useful in many, many, many applications. For example (from @GavinSimpson in comments), %% is useful if you are running a loop and want to print some kind of progress indicator to the screen every nth iteration (e.g. use if (i %% 10 == 0) { # do something } to do something every 10th iteration).
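For instance, the progress-indicator idiom mentioned above can be written as:
for (i in 1:50) {
  if (i %% 10 == 0) cat("done", i, "iterations\n")   # prints at i = 10, 20, 30, 40, 50
}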
Since %% also works for floating-point numbers in R, I've just dug up an example where if (any(wts %% 1 != 0)) is used to test whether any of the wts values are non-integer.
The result of the %% operator is the REMAINDER of a division,
e.g. 75 %% 4 = 3.
I noticed that if the dividend is lower than the divisor, then R returns the same dividend value,
e.g. 4 %% 75 = 4.
Cheers
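The remainder also pairs up with R's integer-division operator %/% via the identity x == y * (x %/% y) + x %% y, for example:
x <- 75; y <- 4
x %/% y                   # 18, how many times y fits into x
x %% y                    # 3, what is left over
y * (x %/% y) + x %% y    # 75, recovers x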
%% in R returns the remainder,
for example:
s = c(1, 8, 10, 4, 6)
d = c(3, 5, 8, 9, 2)
x = s %% d
x
[1] 1 3 2 4 0

How do computers evaluate huge numbers?

If I enter a value, for example
1234567 ^ 98787878
into Wolfram Alpha it can provide me with a number of details. This includes decimal approximation, total length, last digits etc. How do you evaluate such large numbers? As I understand it a programming language would have to have a special data type in order to store the number, let alone add it to something else. While I can see how one might approach the addition of two very large numbers, I can't see how huge numbers are evaluated.
10^2 could be calculated through repeated addition. However a number such as the example above would require a gigantic loop. Could someone explain how such large numbers are evaluated? Also, how could someone create a custom large datatype to support large numbers in C# for example?
Well, it's quite easy and you could have done it yourself.
The number of digits can be obtained via the logarithm:
since `A^B = 10 ^ (B * log(A, 10))`
we can compute, in our case (A = 1234567; B = 98787878), that
`B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...`
The integer part + 1 (601767807 + 1 = 601767808) is the number of digits.
The first, say, five digits can be obtained via the logarithm as well;
now we should analyze the fractional part of
B * log(A, 10) = 98787878 * log(1234567, 10) = 601767807.4709646...
f = 0.4709646...
The first digits are 10^f (with the decimal point removed) = 29577...
The last, say, five digits can be obtained as the corresponding remainder:
last five digits = A^B rem 10^5
A rem 10^5 = 1234567 rem 10^5 = 34567
A^B rem 10^5 = ((A rem 10^5)^B) rem 10^5 = (34567^98787878) rem 10^5 = 45009
last five digits are 45009
You may find BigInteger.ModPow (C#) very useful here
Finally
1234567^98787878 = 29577...45009 (601767808 digits)
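The same steps can be reproduced in R with ordinary doubles, since every intermediate value stays well below 2^53; the modpow helper below is written out just for illustration:
A <- 1234567
B <- 98787878
# Number of digits and leading digits via log10
lg <- B * log10(A)                     # 601767807.47...
floor(lg) + 1                          # 601767808 digits
floor(10^(lg - floor(lg) + 4))         # 29577, the first five digits
# Last five digits via modular exponentiation by squaring
modpow <- function(base, exp, mod) {
  result <- 1
  base <- base %% mod
  while (exp > 0) {
    if (exp %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod       # stays below 10^10, still exact in a double
    exp <- exp %/% 2
  }
  result
}
modpow(A, B, 1e5)                      # 45009, matching the answer above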
There are usually libraries providing a bignum datatype for arbitrarily large integers (e.g. mapping digits k*n...(k+1)*n-1, for k = 0..m with m depending on n and the number's magnitude, to a machine word of size n, and redefining the arithmetic operations). For C#, you might be interested in BigInteger.
Exponentiation can be recursively broken down:
pow(a,2*b) = pow(a,b) * pow(a,b);
pow(a,2*b+1) = pow(a,b) * pow(a,b) * a;
There are also number-theoretic results that have engendered special algorithms to determine properties of large numbers without actually computing them (to be precise: without computing their full decimal expansion).
To compute how many digits there are, one uses the following expression:
decimal_digits(n) = 1 + floor(log_10(n))
This gives:
decimal_digits(1234567^98787878) = 1 + floor(log_10(1234567^98787878))
= 1 + floor(98787878 * log_10(1234567))
= 1 + floor(98787878 * 6.0915146640862625)
= 1 + floor(601767807.4709647)
= 601767808
The trailing k digits are computed by doing exponentiation mod 10^k, which keeps the intermediate results from ever getting too large.
The approximation will be computed using a (software) floating-point implementation that effectively evaluates a^(98787878 log_a(1234567)) to some fixed precision for some number a that makes the arithmetic work out nicely (typically 2 or e or 10). This also avoids the need to actually work with millions of digits at any point.
There are many libraries for this, and in Python's case the capability is built in. You seem primarily concerned with the size of such numbers and the time it may take to do computations like the exponentiation in your example, so I'll explain a bit.
Representation
You might use an array to hold all the digits of large numbers. A more efficient way would be to use an array of 32 bit unsigned integers and store "32 bit chunks" of the large number. You can think of these chunks as individual digits in a number system with 2^32 distinct digits or characters. I used an array of bytes to do this on an 8-bit Atari800 back in the day.
Doing math
You can obviously add two such numbers by looping over all the digits and adding elements of one array to the other and keeping track of carries. Once you know how to add, you can write code to do "manual" multiplication by multiplying digits and putting the results in the right place and a lot of addition - but software will do all this fairly quickly. There are faster multiplication algorithms than the one you would use manually on paper as well. Paper multiplication is O(n^2) where other methods are O(n*log(n)). As for the exponent, you can of course multiply by the same number millions of times but each of those multiplications would be using the previously mentioned function for doing multiplication. There are faster ways to do exponentiation that require far fewer multiplies. For example you can compute x^16 by computing (((x^2)^2)^2)^2 which involves only 4 actual (large integer) multiplications.
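As a small illustration of the repeated-squaring idea with arbitrary-precision integers, here is a sketch using the gmp package that already appears earlier on this page (mul.bigz and pow.bigz are assumed to behave as documented):
library(gmp)
x   <- as.bigz(1234567)
x2  <- mul.bigz(x,  x)    # x^2
x4  <- mul.bigz(x2, x2)   # x^4
x8  <- mul.bigz(x4, x4)   # x^8
x16 <- mul.bigz(x8, x8)   # x^16 after only 4 big-integer multiplications
x16 == pow.bigz(x, 16)    # TRUE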
In practice
It's fun and educational to try writing these functions yourself, but in practice you will want to use an existing library that has been optimized and verified.
I think a part of the answer is in the question itself :) To store these expressions, you can store the base (or mantissa) and the exponent separately, as scientific notation does. Beyond that, you cannot practically evaluate the expression completely and store such a large number, although you can predict certain properties of the resulting value. I will take you through each of the properties you talked about:
Decimal approximation: Can be calculated by evaluating simple log values.
Total number of digits: for the expression a^b, this can be calculated by the formula
Digits = floor(1 + log10(a^b)) = floor(1 + b * log10(a)), where floor is the largest integer not greater than the number. For example, the number of digits in 10^5 is 6.
Last digits: these can be calculated by virtue of the fact that the last digits of successive powers repeat in a fixed cycle. For example, at the units place, 7, 9, 3, 1 repeats for the powers 7^x, so you can tell that if x % 4 is 0, the last digit is 1 (a short check of this appears at the end of this answer).
Whether someone can create a custom datatype for large numbers, I can't say for sure, but I am fairly certain that the full number won't be evaluated and stored.
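A short R check of that units-digit cycle (all of these powers are small enough to stay exact in a double):
7^(1:8) %% 10
# [1] 7 9 3 1 7 9 3 1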
