How can I overcome large integer limitations in R?

In Python 3 I can easily represent and use fairly large integers such as 2**128. However, in R I run into problems at much smaller integer values, with 2^53 being the upper limit for exact representation (and why that limit?). For example, the following problem can occur.
x11 <- 2^54 - 11
x12 <- 2^54 - 12
print(x11, digits = 22)
# [1] 18014398509481972
print(x12, digits = 22)
# [1] 18014398509481972
x11 == x12
# [1] TRUE
I know that I could scale values or use floating point and then deal with machine error. But I'm wondering if there is a library or some other workaround for using integers directly. Note that the L suffix (R's 32-bit integer literals) does not solve this problem.
In this case I know versions and hardware matter, so for reference this is R 4.0.5 on macOS 11.5.1.
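To see the 2^53 boundary mentioned above directly (doubles carry 53 significant bits, so consecutive integers above 2^53 collide):
2^53 - 1 == 2^53
# [1] FALSE  (integers up to 2^53 are still exact)
2^53 == 2^53 + 1
# [1] TRUE   (one step beyond, consecutive integers collapse to the same double)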

You can use the package gmp (see https://www.r-bloggers.com/2019/08/really-large-numbers-in-r/). Then:
library(gmp)
num <- as.bigz(2)
x11 <- num^54 - 11
x12 <- num^54 - 12
print(x11, digits = 22)
Big Integer ('bigz') :
[1] 18014398509481973
print(x12, digits = 22)
Big Integer ('bigz') :
[1] 18014398509481972
x11 == x12
[1] FALSE
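For comparison with the Python example in the question, gmp also represents 2^128 exactly (a quick sketch):
as.bigz(2)^128
Big Integer ('bigz') :
[1] 340282366920938463463374607431768211456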

Related

R: Number precision, how to prevent rounding?

In R, I have the following vector of numbers:
numbers <- c(0.0193738397702257, 0.0206218006695066, 0.021931558829559,
0.023301378178208, 0.024728095594751, 0.0262069239112787, 0.0277310799996657,
0.0292913948762414, 0.0308758879014822, 0.0324693108459748, 0.0340526658271053,
0.03560271425176, 0.0370915716288017, 0.0384863653635563, 0.0397490272396821,
0.0408363289939899, 0.0417002577578561, 0.0422890917131629, 0.0425479537267193,
0.0424213884467212, 0.0418571402964338, 0.0408094991140723, 0.039243951482081,
0.0371450856007627, 0.0345208537496488, 0.0314091884865658, 0.0278854381969885,
0.0240607638577763, 0.0200808932436969, 0.0161193801903312, 0.0123615428382314,
0.00920410652651576, 0.00628125319205829, 0.0038816517651031,
0.00214210795679701, 0.00103919307280354, 0.000435532895812429,
0.000154730641092234, 4.56593150728962e-05, 1.09540661898799e-05,
2.08952167815574e-06, 3.10045314287095e-07, 3.51923218134997e-08,
3.02121734299694e-09, 1.95269500257237e-10, 9.54697530552714e-12,
3.5914029230041e-13, 1.07379981978647e-14, 2.68543048763588e-16,
6.03891613157815e-18, 1.33875697089866e-19, 3.73885699170518e-21,
1.30142752487978e-22, 5.58607581840324e-24, 2.92551478380617e-25,
1.85002124085815e-26, 1.39826890505611e-27, 1.25058972437096e-28,
1.31082961467944e-29, 1.59522437605631e-30, 2.23371981458205e-31,
3.5678974253211e-32, 6.44735482309705e-33, 1.30771083084868e-33,
2.95492180915218e-34, 7.3857554006177e-35, 2.02831084124162e-35,
6.08139499028838e-36, 1.97878175996974e-36, 6.94814886769478e-37,
2.61888070029751e-37, 1.05433608968287e-37, 4.51270543356897e-38,
2.04454840598946e-38, 9.76544451781597e-39, 4.90105271869773e-39,
2.5743371658684e-39, 1.41165292292001e-39, 8.06250933233367e-40,
4.78746160076622e-40, 2.94835809615626e-40, 1.87667170875529e-40,
1.22833908072915e-40, 8.21091993733535e-41, 5.53869254991177e-41,
3.74485710867631e-41, 2.52485401054841e-41, 1.69027430542613e-41,
1.12176290106797e-41, 7.38294520887852e-42, 4.8381070000246e-42,
3.20123319815522e-42, 2.16493953538386e-42, 1.50891804884267e-42,
1.09057070511506e-42, 8.1903023226717e-43, 6.3480235351625e-43,
5.13533594742621e-43, 4.25591269645348e-43, 3.57422485839717e-43,
3.0293235331048e-43, 2.58514651313175e-43, 2.21952686649801e-43,
1.91634521841049e-43, 1.66319240529025e-43, 1.45043336371471e-43,
1.27052593975384e-43, 1.11752052211757e-43, 9.86689196888877e-44,
8.74248543892126e-44)
I use cumsum to get the cumulative sum. Due to R's numerical precision, many of the numbers towards the end of the vector are now equivalent to 1 (even though technically they're not exactly = 1, just very close to it).
So then when I try to recover my original numbers using diff(cumulative), I get a lot of 0s instead of the very small numbers. How can I prevent R from "rounding"?
cumulative <- cumsum(numbers)
diff(cumulative)
I think the Rmpfr package does what you want:
library(Rmpfr)
x <- mpfr(numbers,200) # set arbitrary precision that's greater than R default
cumulative <- cumsum(x)
diff(cumulative)
Here's the top and bottom of the output:
> diff(cumulative)
109 'mpfr' numbers of precision 200 bits
[1] 0.02062180066950659862445860426305443979799747467041015625
[2] 0.021931558829559001655429284483034280128777027130126953125
[3] 0.02330137817820800150148130569505156017839908599853515625
[4] 0.0247280955947510004688805196337852976284921169281005859375
...
[107] 1.117520522117570086014450710640040701536080790307716261438975e-43
[108] 9.866891968888769759087690539062888824928577731689952701181586e-44
[109] 8.742485438921260418707338389502002282130643811990663213422948e-44
You can adjust the precision as you like by changing the second argument to mpfr.
You might want to try out the package Rmpfr.
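If you need plain doubles back after working in high precision (e.g. for plotting), asNumeric() converts mpfr results down again; a minimal sketch along the lines of the answer above (assuming a reasonably recent Rmpfr, which loads gmp and its asNumeric()):
library(Rmpfr)
x <- mpfr(numbers, 200)                  # 200-bit working precision
recovered <- asNumeric(diff(cumsum(x)))  # convert back to ordinary doubles at the end
all.equal(recovered, numbers[-1])        # should be TRUE, up to double rounding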

Approximation to constant "pi" does not get any better after 50 iterations

In R I have written this function
ifun <- function(m) {
  o <- c()
  for (k in 1:m) {
    o[k] <- prod(1:k) / prod(2 * (1:k) + 1)
  }
  o_sum <- 2 * (1 + sum(o))  # Final result
  print(o_sum)
}
This function approximates the constant pi; however, for m > 50 the approximation gets stuck, i.e. the printed value stays the same and doesn't get any better. How can I fix this? Thanks.
Let's go inside:
o <- numeric(100)
for (k in 1:length(o)) {
  o[k] <- prod(1:k) / prod(2 * (1:k) + 1)
}
o
# [1] 3.333333e-01 1.333333e-01 5.714286e-02 2.539683e-02 1.154401e-02
# [6] 5.328005e-03 2.486402e-03 1.170072e-03 5.542445e-04 2.639260e-04
# [11] 1.262255e-04 6.058822e-05 2.917211e-05 1.408309e-05 6.814396e-06
# [16] 3.303950e-06 1.604776e-06 7.807016e-07 3.803418e-07 1.855326e-07
# [21] 9.060894e-08 4.429771e-08 2.167760e-08 1.061760e-08 5.204706e-09
# [26] 2.553252e-09 1.253415e-09 6.157124e-10 3.026383e-10 1.488385e-10
# [31] 7.323800e-11 3.605563e-11 1.775874e-11 8.750685e-12 4.313718e-12
# [36] 2.127313e-12 1.049474e-12 5.179224e-13 2.556832e-13 1.262633e-13
# [41] 6.237104e-14 3.081863e-14 1.523220e-14 7.530524e-15 3.723886e-15
# [46] 1.841922e-15 9.112667e-16 4.509361e-16 2.231906e-16 1.104904e-16
# [51] 5.470883e-17 2.709390e-17 1.342034e-17 6.648610e-18 3.294356e-18
# [56] 1.632601e-18 8.092024e-19 4.011431e-19 1.988861e-19 9.862119e-20
# [61] 4.890969e-20 2.425921e-20 1.203410e-20 5.970404e-21 2.962414e-21
# [66] 1.470070e-21 7.295904e-22 3.621325e-22 1.797636e-22 8.924434e-23
# [71] 4.431013e-23 2.200227e-23 1.092630e-23 5.426483e-24 2.695273e-24
# [76] 1.338828e-24 6.650954e-25 3.304296e-25 1.641757e-25 8.157799e-26
# [81] 4.053875e-26 2.014653e-26 1.001295e-26 4.976849e-27 2.473873e-27
# [86] 1.229786e-27 6.113795e-28 3.039627e-28 1.511323e-28 7.514865e-29
# [91] 3.736900e-29 1.858350e-29 9.242063e-30 4.596582e-30 2.286258e-30
# [96] 1.137206e-30 5.656871e-31 2.814078e-31 1.399968e-31 6.965017e-32
print(sum(o[1:49]), digits = 22)
#[1] 0.5707963267948963359544
print(sum(o[1:50]), digits = 22)
#[1] 0.5707963267948964469767
print(sum(o[1:51]), digits = 22)
#[1] 0.570796326794896557999
print(sum(o[1:52]), digits = 22)
#[1] 0.570796326794896557999
There is no further improvement after 51, because:
o[51] / o[1]
#[1] 1.641265e-16
o[52] / o[1]
#[1] 8.128169e-17
Further terms are too small compared with the 1st term, falling below what machine precision can resolve.
.Machine$double.eps
#[1] 2.220446e-16
So eventually you are just adding zeros.
In this case, the summation over o has numerically converged, and so has your approximation to pi.
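You can check this effect directly with the o computed above: o[52] is smaller than half a ULP of the running sum, so adding it changes nothing.
s <- sum(o[1:51])
s + o[52] == s
# [1] TRUE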
More thoughts
The IEEE 754 standard for the double-precision floating-point format states that on a 64-bit machine, 1 bit is used for the sign, 11 bits for the exponent, and 52 bits for the fraction, giving 53 significant bits once the implicit leading bit is counted. This gives the machine precision: 1 / (2^52) = 2.2204e-16. In other words, a double-precision floating-point number has at most 16 valid significant decimal digits. The R function print can display up to 22 digits, while sprintf can display more, but remember: any digits beyond the 16th are invalid, garbage values.
Have a look at the constant pi in R:
sprintf("%.53f", pi)
#[1] "3.14159265358979311599796346854418516159057617187500000"
If you compare it with How to print 1000 decimals places of pi value?, you will see that only the first 16 digits are truly correct:
3.141592653589793
Could something be done differently, so that I can calculate more digits using my approach?
No. There are many clever algorithms around that can compute a shockingly large number of digits of pi, but you cannot modify your approach to get more valid significant digits.
At first I thought about computing sum(o[1:51]) and sum(o[52:100]) separately, as each of them gives 16 valid significant digits. But we can't just concatenate them to get 32 digits, because for sum(o[1:51]) the true digits beyond the 16th are not zeros, so the 16 digits of sum(o[52:100]) are not the 17th through 32nd digits of sum(o[1:100]).

Large numbers multiplication in R

I want to know how I can multiply very large values in R.
R returns Inf!
For example:
6.350218e+277*2.218789e+215
[1] Inf
Let me clarify the problem more:
consider the following code and the results of the outFunc function:
library(hypergeo)
poch <- function(a, b) gamma(a + b) / gamma(a)
n<-c(37 , 41 , 4 , 9 , 12 , 13 , 2 , 5 , 23 , 73 , 129 , 22 , 121 )
v<-c(90.2, 199.3, 61, 38, 176.3, 293.6, 318.6, 328.7, 328.1, 313.3, 142.4, 92.9, 95.5)
DF<-data.frame(n,v)
outFunc <- function(k, w, r, lam, a, b) {
  ((((w * lam)^k) * poch(r, k) * poch(a, b)) * hypergeo(r + k, a + k, a + b + k, -(w * lam))) / (poch(a + k, b) * factorial(k))
}
and the function returns:
outFunc(DF$n,DF$v,0.2, 1, 3, 1)
[1] 0.002911330+ 0i 0.003047594+ 0i 0.029886646+ 0i 0.013560599+ 0i 0.010160073+ 0i
[6] 0.008928524+ 0i 0.040165795+ 0i 0.019402318+ 0i 0.005336008+ 0i 0.001689114+ 0i
[11] Inf+NaNi 0.005577985+ 0i Inf+NaNi
As can be seen above, outFunc returns Inf+NaNi for n values of 129 and 121.
I checked the code section by section and found that the result of (w*lam)^k * poch(r,k) for these n values is Inf. I also checked my code against equivalent code in Mathematica, where everything is OK:
in: out[indata[[All, 1]], indata[[All, 2]], 0.2, 1, 3, 1]
out: {0.00291133, 0.00304759, 0.0298866, 0.0135606, 0.0101601, 0.00892852, \
0.0401658, 0.0194023, 0.00533601, 0.00168911, 0.000506457, \
0.00557798, 0.000365445}
Now please let me know how we can solve this issue as simply as it is solved in Mathematica. Regards.
One option you have available in base R, which does not require a special library, is to convert the two numbers to a common base, and then add the exponents together to get the final result:
> x <- log(6.350218e+277, 10)
> x
[1] 277.8028
> y <- log(2.218789e+215, 10)
> y
[1] 215.3461
> x + y
[1] 493.1489
Since 10^x * 10^y = 10^(x+y), your final answer is 10^493.1489
Note that this solution does not let you actually store numbers that R would normally treat as Inf. Hence, in this example, you still cannot compute 10^493.1489 directly, but you can tease out what the product would be.
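If you also want the mantissa, the fractional part of the summed logarithm gives it; a small sketch continuing the example (the exact digits depend on how the logs were rounded):
z <- x + y           # about 493.1489
10^(z - floor(z))    # mantissa, approximately 1.409
floor(z)             # exponent: 493
# i.e. the product is roughly 1.409e+493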
First, I'd recommend two useful reads: logarithms and how floating-point values are handled by a computer. These are pertinent because with some "tricks" you can handle much bigger values than you might think. For instance, your definition of the poch function is terrible: the fraction can be simplified a lot, but a computer evaluates the numerator first, and if that overflows the result is useless. That's why R provides, besides gamma, the lgamma function: it calculates the logarithm of gamma and can handle much bigger values. So we calculate the log of each factor in your function and then use exp to restore the intended values. Try this:
# redefine poch properly
poch <- function(a, b) lgamma(a + b) - lgamma(a)
# redefine outFunc
outFunc <- function(k, w, r, lam, a, b) {
  exp((k * (log(w) + log(lam)) + poch(r, k) + poch(a, b)) +
      log(hypergeo(r + k, a + k, a + b + k, -(w * lam))) - poch(a + k, b) - lgamma(k + 1))
}
#Now we go
outFunc(DF$n,DF$v,0.2, 1, 3, 1)
#[1] 0.0029113299+0i 0.0030475939+0i 0.0298866458+0i 0.0135605995+0i
#[5] 0.0101600732+0i 0.0089285243+0i 0.0401657947+0i 0.0194023182+0i
#[9] 0.0053360084+0i 0.0016891144+0i 0.0005064566+0i 0.0055779850+0i
#[13] 0.0003654449+0i
> library(gmp)
> x<- pow.bigz(6.350218,277)
> y<- pow.bigz(2.218789,215)
> x*y
Big Integer ('bigz') :
[1] 18592826814872791919942226542714580401488894909642693257011204682802122918146288728149155739011270579948954646130492024596687919148494136290260248656581476275790189359808616520170359345612068099238508437236172770752199936303947098513476300142414338199993261924467166943683593371648

Calculate the exponentials of big negative value

I would like to know how I can get the exponential of a big negative number in R. For example, when I try:
> exp(-6400)
[1] 0
> exp(-1200)
[1] 0
> exp(-2000)
[1] 0
but I need the values of the above expressions, even though they are so small. How can I get them in R?
Those numbers are too small. To see the minimum value your computer can handle, try:
> .Machine$double.xmin
[1] 2.225074e-308
This will give you (from ?.Machine):
the smallest non-zero normalized floating-point number, a power of the radix, i.e., double.base ^ double.min.exp. Normally 2.225074e-308.
In my case
> .Machine$double.base
[1] 2
> .Machine$double.min.exp
[1] -1022
In practice I can calculate values even smaller than that (subnormal numbers extend below double.xmin), down to
> exp(-745)
[1] 4.940656e-324
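Those cutoffs follow directly from the double-precision limits (a quick check):
log(.Machine$double.xmin)   # log of the smallest normalized double
# [1] -708.3964
log(2^-1074)                # log of the smallest subnormal double
# [1] -744.4401
So exp() drops into the subnormal range below about -708 and underflows to exactly 0 below about -745.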
To get around this issue you need arbitrary-precision arithmetic.
In R you can achieve that using the package Rmpfr (PDF vignette):
library(Rmpfr)
# Calculate exp(-100)
> a <- mpfr(exp(-100), precBits=64)
# exp(-1000)
> a^10
1 'mpfr' number of precision 64 bits
[1] 5.07595889754945890823e-435
# exp(-6400)
> a^64
1 'mpfr' number of precision 64 bits
[1] 3.27578823787094497049e-2780
# use an array of powers
> ex <- c(10, 20, 50, 100, 500, 1000, 1e5)
> a ^ ex
7 'mpfr' numbers of precision 64 bits
[1] 5.07595889754945890823e-435 2.57653587296115182772e-869
[3] 3.36969414830892462745e-2172 1.13548386531474089222e-4343
[5] 1.88757769782054893243e-21715 3.56294956530952353784e-43430
[7] 1.51693678090513840149e-4342945
Note that Rmpfr is based on GNU MPFR and requires GNU GMP. Under Linux you will need gmp, gmp-devel, mpfr, and mpfr-devel to be installed on your system in order to install these packages; I'm not sure how that works under Windows.
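Alternatively (a sketch, assuming a reasonably recent Rmpfr), you can exponentiate an mpfr number directly, which also avoids the double-precision rounding of exp(-100) in the construction above:
library(Rmpfr)
exp(mpfr(-6400, precBits = 64))   # evaluates exp(-6400) itself at 64-bit precision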

conversion bigq to mpfr with Rmpfr package

The help documentation of the Rmpfr R package claims that the .bigq2mpfr() function uses the minimal precision necessary for correct representation when the precB argument is NULL :
Description:
Coerce from and to big integers (‘bigz’) and ‘mpfr’ numbers.
Further, coerce from big rationals (‘bigq’) to ‘mpfr’ numbers.
Usage:
.bigz2mpfr(x, precB = NULL)
.bigq2mpfr(x, precB = NULL)
.mpfr2bigz(x, mod = NA)
Arguments:
x: an R object of class ‘bigz’, ‘bigq’ or ‘mpfr’ respectively.
precB: precision in bits for the result. The default, ‘NULL’, means
to use the _minimal_ precision necessary for correct
representation.
However when converting 31/3 one gets a bad approximation:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 8 bits
[1] 10.31
By looking inside the .bigq2mpfr() function we see the detailed procedure:
N <- numerator(x)
D <- denominator(x)
if (is.null(precB)) {
  eN <- frexpZ(N)$exp
  eD <- frexpZ(D)$exp
  precB <- eN + eD + 1L
}
.bigz2mpfr(N, precB) / .bigz2mpfr(D, precB)
Firstly, I do not understand why precB is chosen this way. The exp output of frexpZ() is the exponent in the binary decomposition:
> frexpZ(N)
$d
[1] 0.96875
$exp
[1] 5
> 0.96875*2^5
[1] 31
Here we get precB=8 and the result is then identical to:
> mpfr(31, precBits=8)/mpfr(3, precBits=8)
1 'mpfr' number of precision 8 bits
[1] 10.31
I am under the impression that one should rather replace precB with 2^precB, but I'd like to get some advice about that:
> mpfr(31, precBits=8)/mpfr(3, precBits=2^8)
1 'mpfr' number of precision 256 bits
[1] 10.33333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^9)
1 'mpfr' number of precision 512 bits
[1] 10.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^7)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
I get (note the difference in my initial creation):
Rgames> fooq<-as.bigq(31/3)
Rgames> fooq
Big Rational ('bigq') :
[1] 5817149518686891/562949953421312
Rgames> .bigq2mpfr(fooq)
1 'mpfr' number of precision 104 bits
[1] 10.3333333333333339254522798000835
All this strongly suggests to me that the precision in your bigq number is in fact zero decimal places, i.e. each of "31" and "3" has that precision. As such, your mpfr conversion is quite correct in giving you a result with one decimal place of precision.
This has been corrected in a newer version of the package:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
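If you are stuck on an older version of the package, you can get the same behaviour by passing the documented precB argument explicitly (a sketch):
x <- as.bigq(31, 3)
.bigq2mpfr(x, precB = 128)   # same 128-bit result as shown above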
