In Python 3 I can easily represent and use fairly large integers such as 2**128. However, in R I run into problems at much smaller integer values, with 2^53 being the upper limit (and why that limit?). For example, the following problem can occur.
x11 <- 2^54 - 11
x12 <- 2^54 - 12
print(x11, digits = 22)
# [1] 18014398509481972
print(x12, digits = 22)
# [1] 18014398509481972
x11 == x12
# [1] TRUE
I know that I could scale values or use floating point and then deal with machine error. But I'm wondering if there is a library or some other workaround for using integers directly. Note that the L designation does not solve this problem.
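As far as I can tell, R's native integer type is only 32-bit, which is why the L suffix cannot help here:
.Machine$integer.max # 2147483647: the largest value a 32-bit R integer can hold
as.integer(2^54)     # NA, with a coercion warning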
I know that versions and hardware matter in cases like this, so for reference: R 4.0.5 on macOS 11.5.1.
You can use the gmp package (see https://www.r-bloggers.com/2019/08/really-large-numbers-in-r/). Then:
library(gmp)
num <- as.bigz(2)
x11 <- num^54 - 11
x12 <- num^54 - 12
print(x11, digits = 22)
Big Integer ('bigz') :
[1] 18014398509481973
print(x12, digits = 22)
Big Integer ('bigz') :
[1] 18014398509481972
x11 == x12
[1] FALSE
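As to why 2^53 is the cutoff: base R stores these numbers as 64-bit doubles, whose significand has 53 bits, so every integer up to 2^53 is represented exactly, but beyond it consecutive integers start to collide, which is exactly what happens with 2^54 - 11 and 2^54 - 12 above. A quick check:
.Machine$double.digits # 53: bits in a double's significand
2^53 == 2^53 + 1       # TRUE: 2^53 + 1 is not representable and rounds back to 2^53
2^53 - 1 == 2^53 - 2   # FALSE: below 2^53 consecutive integers are still distinct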
In R, I have the following vector of numbers:
numbers <- c(0.0193738397702257, 0.0206218006695066, 0.021931558829559,
0.023301378178208, 0.024728095594751, 0.0262069239112787, 0.0277310799996657,
0.0292913948762414, 0.0308758879014822, 0.0324693108459748, 0.0340526658271053,
0.03560271425176, 0.0370915716288017, 0.0384863653635563, 0.0397490272396821,
0.0408363289939899, 0.0417002577578561, 0.0422890917131629, 0.0425479537267193,
0.0424213884467212, 0.0418571402964338, 0.0408094991140723, 0.039243951482081,
0.0371450856007627, 0.0345208537496488, 0.0314091884865658, 0.0278854381969885,
0.0240607638577763, 0.0200808932436969, 0.0161193801903312, 0.0123615428382314,
0.00920410652651576, 0.00628125319205829, 0.0038816517651031,
0.00214210795679701, 0.00103919307280354, 0.000435532895812429,
0.000154730641092234, 4.56593150728962e-05, 1.09540661898799e-05,
2.08952167815574e-06, 3.10045314287095e-07, 3.51923218134997e-08,
3.02121734299694e-09, 1.95269500257237e-10, 9.54697530552714e-12,
3.5914029230041e-13, 1.07379981978647e-14, 2.68543048763588e-16,
6.03891613157815e-18, 1.33875697089866e-19, 3.73885699170518e-21,
1.30142752487978e-22, 5.58607581840324e-24, 2.92551478380617e-25,
1.85002124085815e-26, 1.39826890505611e-27, 1.25058972437096e-28,
1.31082961467944e-29, 1.59522437605631e-30, 2.23371981458205e-31,
3.5678974253211e-32, 6.44735482309705e-33, 1.30771083084868e-33,
2.95492180915218e-34, 7.3857554006177e-35, 2.02831084124162e-35,
6.08139499028838e-36, 1.97878175996974e-36, 6.94814886769478e-37,
2.61888070029751e-37, 1.05433608968287e-37, 4.51270543356897e-38,
2.04454840598946e-38, 9.76544451781597e-39, 4.90105271869773e-39,
2.5743371658684e-39, 1.41165292292001e-39, 8.06250933233367e-40,
4.78746160076622e-40, 2.94835809615626e-40, 1.87667170875529e-40,
1.22833908072915e-40, 8.21091993733535e-41, 5.53869254991177e-41,
3.74485710867631e-41, 2.52485401054841e-41, 1.69027430542613e-41,
1.12176290106797e-41, 7.38294520887852e-42, 4.8381070000246e-42,
3.20123319815522e-42, 2.16493953538386e-42, 1.50891804884267e-42,
1.09057070511506e-42, 8.1903023226717e-43, 6.3480235351625e-43,
5.13533594742621e-43, 4.25591269645348e-43, 3.57422485839717e-43,
3.0293235331048e-43, 2.58514651313175e-43, 2.21952686649801e-43,
1.91634521841049e-43, 1.66319240529025e-43, 1.45043336371471e-43,
1.27052593975384e-43, 1.11752052211757e-43, 9.86689196888877e-44,
8.74248543892126e-44)
I use cumsum to get the cumulative sum. Due to R's limited numerical precision, many of the cumulative values towards the end of the vector are now equal to 1 (even though technically they're not exactly 1, just very close to it).
So then when I try to recover my original numbers by using diff(cumulative), I get a lot of 0s instead of a very small number. How can I prevent R from "rounding"?
cumulative <- cumsum(numbers)
diff(cumulative)
I think the Rmpfr package does what you want:
library(Rmpfr)
x <- mpfr(numbers, 200) # set an arbitrary precision greater than R's default
cumulative <- cumsum(x)
diff(cumulative)
Here's the top and bottom of the output:
> diff(cumulative)
109 'mpfr' numbers of precision 200 bits
[1] 0.02062180066950659862445860426305443979799747467041015625
[2] 0.021931558829559001655429284483034280128777027130126953125
[3] 0.02330137817820800150148130569505156017839908599853515625
[4] 0.0247280955947510004688805196337852976284921169281005859375
...
[107] 1.117520522117570086014450710640040701536080790307716261438975e-43
[108] 9.866891968888769759087690539062888824928577731689952701181586e-44
[109] 8.742485438921260418707338389502002282130643811990663213422948e-44
You can adjust the precision as you like by changing the second argument to mpfr.
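If you need ordinary doubles again afterwards, you can convert the high-precision result back; a small sketch using Rmpfr's asNumeric():
d <- diff(cumulative) # still 200-bit mpfr numbers
tail(asNumeric(d))    # back to ordinary doubles; the tiny tail values are preserved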
You might want to try out the package Rmpfr.
In R I have written this function
ifun <- function(m) {
  o <- c()
  for (k in 1:m) {
    o[k] <- prod(1:k) / prod(2 * (1:k) + 1)
  }
  o_sum <- 2 * (1 + sum(o)) # Final result
  print(o_sum)
}
This function approximates the constant pi; however, for m > 50 the approximation gets stuck, i.e. it keeps returning the same value and doesn't get any better. How can I fix this? Thanks.
Let's go inside:
o <- numeric(100)
for (k in 1:length(o)) {
  o[k] <- prod(1:k) / prod(2 * (1:k) + 1)
}
o
# [1] 3.333333e-01 1.333333e-01 5.714286e-02 2.539683e-02 1.154401e-02
# [6] 5.328005e-03 2.486402e-03 1.170072e-03 5.542445e-04 2.639260e-04
# [11] 1.262255e-04 6.058822e-05 2.917211e-05 1.408309e-05 6.814396e-06
# [16] 3.303950e-06 1.604776e-06 7.807016e-07 3.803418e-07 1.855326e-07
# [21] 9.060894e-08 4.429771e-08 2.167760e-08 1.061760e-08 5.204706e-09
# [26] 2.553252e-09 1.253415e-09 6.157124e-10 3.026383e-10 1.488385e-10
# [31] 7.323800e-11 3.605563e-11 1.775874e-11 8.750685e-12 4.313718e-12
# [36] 2.127313e-12 1.049474e-12 5.179224e-13 2.556832e-13 1.262633e-13
# [41] 6.237104e-14 3.081863e-14 1.523220e-14 7.530524e-15 3.723886e-15
# [46] 1.841922e-15 9.112667e-16 4.509361e-16 2.231906e-16 1.104904e-16
# [51] 5.470883e-17 2.709390e-17 1.342034e-17 6.648610e-18 3.294356e-18
# [56] 1.632601e-18 8.092024e-19 4.011431e-19 1.988861e-19 9.862119e-20
# [61] 4.890969e-20 2.425921e-20 1.203410e-20 5.970404e-21 2.962414e-21
# [66] 1.470070e-21 7.295904e-22 3.621325e-22 1.797636e-22 8.924434e-23
# [71] 4.431013e-23 2.200227e-23 1.092630e-23 5.426483e-24 2.695273e-24
# [76] 1.338828e-24 6.650954e-25 3.304296e-25 1.641757e-25 8.157799e-26
# [81] 4.053875e-26 2.014653e-26 1.001295e-26 4.976849e-27 2.473873e-27
# [86] 1.229786e-27 6.113795e-28 3.039627e-28 1.511323e-28 7.514865e-29
# [91] 3.736900e-29 1.858350e-29 9.242063e-30 4.596582e-30 2.286258e-30
# [96] 1.137206e-30 5.656871e-31 2.814078e-31 1.399968e-31 6.965017e-32
print(sum(o[1:49]), digits = 22)
#[1] 0.5707963267948963359544
print(sum(o[1:50]), digits = 22)
#[1] 0.5707963267948964469767
print(sum(o[1:51]), digits = 22)
#[1] 0.570796326794896557999
print(sum(o[1:52]), digits = 22)
#[1] 0.570796326794896557999
There is no further improvement after 51, because:
o[51] / o[1]
#[1] 1.641265e-16
o[52] / o[1]
#[1] 8.128169e-17
Further terms are too small compared with the 1st term, well beyond what machine precision can resolve.
.Machine$double.eps
#[1] 2.220446e-16
So eventually you are just adding zeros.
In this case, the summation over o has numerically converged, and so has your approximation to pi.
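A direct check, continuing from the o vector computed above:
s <- sum(o[1:51])
o[52] < s * .Machine$double.eps
# [1] TRUE (the next term is far below the relative precision of the sum)
s + o[52] == s
# [1] TRUE (adding it cannot change the double-precision result)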
More thoughts
The IEEE 754 standard for the double-precision floating-point format uses 64 bits: 1 sign bit, 11 bits for the exponent, and 52 bits for the fraction, which together with the implicit leading bit gives a 53-bit significand. This gives the machine precision: 1 / (2 ^ 52) = 2.2204e-16. In other words, a double-precision floating-point number has at most about 16 valid significant decimal digits. The R function print can display up to 22 digits, while sprintf can display more, but remember: any digits beyond the 16th are not meaningful.
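You can confirm these figures from within R:
.Machine$double.digits
# [1] 53
2^-52
# [1] 2.220446e-16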
Have a look at the constant pi in R:
sprintf("%.53f", pi)
#[1] "3.14159265358979311599796346854418516159057617187500000"
If you compare it with How to print 1000 decimals places of pi value?, you will see that only the first 16 digits are truly correct:
3.141592653589793
Could something alternative be done, so that I can calculate more digits using my approach?
No. There are many clever algorithms out there for computing a shockingly large number of digits of pi, but you cannot modify your approach to get more valid significant digits.
At first I was thinking about computing sum(o[1:51]) and sum(o[52:100]) separately, as both of them give 16 valid significant digits. But we can't just concatenate them to get 32 digits, because for sum(o[1:51]) the true digits beyond the 16th are not zeros, so the 16 digits of sum(o[52:100]) are not the 17th through 32nd digits of sum(o[1:100]).
I am trying to generate some random numbers from the hypergeometric distribution using R. However, rhyper() behaves very strangely when I have a very small number of white balls and a large number of black balls. Here is what I get on my computer:
> sum(rhyper(100,1000,1e9-1000,1e6))
[1] 91
> sum(rhyper(100,2000,1e9-2000,1e6))
[1] 204
> sum(rhyper(100,10000,1e9-10000,1e6))
[1] 1016
> sum(rhyper(100,20000,1e9-20000,1e6))
[1] 1909
> sum(rhyper(100,50000,1e9-50000,1e6))
[1] 4968
> sum(rhyper(100,5000,1e9-5000,1e6))
[1] 60
> sum(rhyper(100,6000,1e9-6000,1e6))
[1] 164
> sum(rhyper(100,8000,1e9-8000,1e6))
[1] 0
> sum(rhyper(100,9000,1e9-9000,1e6))
[1] 45
The first 5 work fine, but for the 6th I expected to get a number around 500, not something like 60; the same goes for the 7th, 8th, and 9th.
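For reference, the expected totals follow from the hypergeometric mean k * m / (m + n) per draw, so summed over 100 draws of k = 1e6 they should be roughly:
m <- c(1000, 2000, 10000, 20000, 50000, 5000, 6000, 8000, 9000)
100 * 1e6 * m / 1e9
# [1]  100  200 1000 2000 5000  500  600  800  900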
Is something wrong with the rhyper() function, or with my computer?
The help documentation of the Rmpfr R package claims that the .bigq2mpfr() function uses the minimal precision necessary for correct representation when the precB argument is NULL:
Description:
Coerce from and to big integers (‘bigz’) and ‘mpfr’ numbers.
Further, coerce from big rationals (‘bigq’) to ‘mpfr’ numbers.
Usage:
.bigz2mpfr(x, precB = NULL)
.bigq2mpfr(x, precB = NULL)
.mpfr2bigz(x, mod = NA)
Arguments:
x: an R object of class ‘bigz’, ‘bigq’ or ‘mpfr’ respectively.
precB: precision in bits for the result. The default, ‘NULL’, means
to use the _minimal_ precision necessary for correct
representation.
However, when converting 31/3 one gets a bad approximation:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 8 bits
[1] 10.31
By looking inside the .bigq2mpfr() function we see the detailed procedure:
N <- numerator(x)
D <- denominator(x)
if (is.null(precB)) {
  eN <- frexpZ(N)$exp
  eD <- frexpZ(D)$exp
  precB <- eN + eD + 1L
}
.bigz2mpfr(N, precB)/.bigz2mpfr(D, precB)
Firstly, I do not understand why precB is chosen this way. The exp output of frexpZ() is the exponent in the binary decomposition:
> frexpZ(N)
$d
[1] 0.96875
$exp
[1] 5
> 0.96875*2^5
[1] 31
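Likewise for the denominator (shown just to make the arithmetic explicit):
> frexpZ(D)$exp
[1] 2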
Here we get precB=8 and the result is then identical to:
> mpfr(31, precBits=8)/mpfr(3, precBits=8)
1 'mpfr' number of precision 8 bits
[1] 10.31
I am under the impression that one should rather replace precB with 2^precB, but I'd like to get some advice about that:
> mpfr(31, precBits=8)/mpfr(3, precBits=2^8)
1 'mpfr' number of precision 256 bits
[1] 10.33333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^9)
1 'mpfr' number of precision 512 bits
[1] 10.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^7)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
I get the following (note the difference in my initial creation of the number):
Rgames> fooq<-as.bigq(31/3)
Rgames> fooq
Big Rational ('bigq') :
[1] 5817149518686891/562949953421312
Rgames> .bigq2mpfr(fooq)
1 'mpfr' number of precision 104 bits
[1] 10.3333333333333339254522798000835
All this strongly suggests to me that the precision of your bigq number is in fact zero decimal places, i.e. each of "31" and "3" has that precision. As such, your mpfr conversion is quite correct in giving you a result with one decimal place of precision.
This has been corrected in a newer version of the package:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
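If you are stuck on an older version of the package, a workaround is to pass the documented precB argument explicitly, e.g.:
library(Rmpfr) # also attaches gmp, which provides as.bigq()
x <- as.bigq(31, 3)
.bigq2mpfr(x, precB = 128) # should match the corrected 128-bit result shown above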