Approximation to constant "pi" does not get any better after 50 iterations - r

In R I have written this function
ifun <- function(m) {
o = c()
for (k in 1:m) {
o[k] = prod(1:k) / prod(2 * (1:k) + 1)
}
o_sum = 2 * (1 + sum(o)) # Final result
print(o_sum)
}
This function approximates the constant pi. However, for m > 50 the approximation gets stuck, i.e. it returns the same value and doesn't get any better. How can I fix this? Thanks.

Let's go inside:
o <- numeric(100)
for (k in 1:length(o)) {
o[k] = prod(1:k) / prod(2 * (1:k) + 1)
}
o
# [1] 3.333333e-01 1.333333e-01 5.714286e-02 2.539683e-02 1.154401e-02
# [6] 5.328005e-03 2.486402e-03 1.170072e-03 5.542445e-04 2.639260e-04
# [11] 1.262255e-04 6.058822e-05 2.917211e-05 1.408309e-05 6.814396e-06
# [16] 3.303950e-06 1.604776e-06 7.807016e-07 3.803418e-07 1.855326e-07
# [21] 9.060894e-08 4.429771e-08 2.167760e-08 1.061760e-08 5.204706e-09
# [26] 2.553252e-09 1.253415e-09 6.157124e-10 3.026383e-10 1.488385e-10
# [31] 7.323800e-11 3.605563e-11 1.775874e-11 8.750685e-12 4.313718e-12
# [36] 2.127313e-12 1.049474e-12 5.179224e-13 2.556832e-13 1.262633e-13
# [41] 6.237104e-14 3.081863e-14 1.523220e-14 7.530524e-15 3.723886e-15
# [46] 1.841922e-15 9.112667e-16 4.509361e-16 2.231906e-16 1.104904e-16
# [51] 5.470883e-17 2.709390e-17 1.342034e-17 6.648610e-18 3.294356e-18
# [56] 1.632601e-18 8.092024e-19 4.011431e-19 1.988861e-19 9.862119e-20
# [61] 4.890969e-20 2.425921e-20 1.203410e-20 5.970404e-21 2.962414e-21
# [66] 1.470070e-21 7.295904e-22 3.621325e-22 1.797636e-22 8.924434e-23
# [71] 4.431013e-23 2.200227e-23 1.092630e-23 5.426483e-24 2.695273e-24
# [76] 1.338828e-24 6.650954e-25 3.304296e-25 1.641757e-25 8.157799e-26
# [81] 4.053875e-26 2.014653e-26 1.001295e-26 4.976849e-27 2.473873e-27
# [86] 1.229786e-27 6.113795e-28 3.039627e-28 1.511323e-28 7.514865e-29
# [91] 3.736900e-29 1.858350e-29 9.242063e-30 4.596582e-30 2.286258e-30
# [96] 1.137206e-30 5.656871e-31 2.814078e-31 1.399968e-31 6.965017e-32
print(sum(o[1:49]), digits = 22)
#[1] 0.5707963267948963359544
print(sum(o[1:50]), digits = 22)
#[1] 0.5707963267948964469767
print(sum(o[1:51]), digits = 22)
#[1] 0.570796326794896557999
print(sum(o[1:52]), digits = 22)
#[1] 0.570796326794896557999
There is no further improvement after 51, because:
o[51] / o[1]
#[1] 1.641265e-16
o[52] / o[1]
#[1] 8.128169e-17
Further terms are too small compared with the first term, beyond what machine precision can resolve.
.Machine$double.eps
#[1] 2.220446e-16
So eventually you are just adding zeros.
In this case, the summation over o has numerically converged, and so has your approximation to pi.
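The same absorption effect can be seen in isolation. The following is a minimal sketch, independent of the pi series, that only uses `.Machine$double.eps`; the variable names are chosen for illustration.

```r
eps <- .Machine$double.eps

# A term a quarter of machine epsilon (relative to 1) is absorbed entirely.
absorbed <- (1 + eps / 4) == 1

# A term of size epsilon is still registered.
registered <- (1 + eps) > 1

print(c(absorbed = absorbed, registered = registered))
```

Once the terms of the series fall below this threshold relative to the running sum, adding them no longer changes the result.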
More thoughts
The IEEE 754 standard defines the double-precision floating-point format as 64 bits: 1 sign bit, 11 exponent bits, and 52 stored significand bits (53 bits of precision, counting the implicit leading bit). This gives the machine precision: 1 / (2 ^ 52) = 2.2204e-16. In other words, a double-precision floating-point number has about 16 valid significant decimal digits. The R function print can display up to 22 digits, and sprintf can display more, but remember that any digits beyond the 16th are garbage values.
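These figures can be checked from within R through the `.Machine` list. A small sketch (the variable names are just for illustration):

```r
# Precision of the significand in bits, counting the implicit leading bit.
sig_bits <- .Machine$double.digits

# Machine epsilon is the gap between 1 and the next larger double.
eps_matches <- .Machine$double.eps == 2^-(sig_bits - 1)

# Rough number of decimal digits carried by that many bits.
dec_digits <- sig_bits * log10(2)

print(sig_bits)
print(eps_matches)
print(dec_digits)
```

The last value lands just under 16, which is where the "about 16 digits" rule of thumb comes from.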
Have a look at the constant pi in R:
sprintf("%.53f", pi)
#[1] "3.14159265358979311599796346854418516159057617187500000"
If you compare it with How to print 1000 decimals places of pi value?, you will see that only the first 16 digits are truly correct:
3.141592653589793
What alternative could be used, so that I can calculate more digits using my approach?
No. There are many clever algorithms for computing a huge number of digits of pi, but you cannot modify your approach to get more valid significant digits in double-precision arithmetic.
At first I was thinking about computing sum(o[1:51]) and sum(o[52:100]) separately, as both of them give 16 valid significant digits. But we can't just concatenate them to get 32 digits, because for sum(o[1:51]) the true digits beyond the 16th are not zeros, so the 16 digits of sum(o[52:100]) are not the 17th to 32nd digits of sum(o[1:100]).

Related

Partial Variances at each row of a Matrix

I generated a series of 10,000 random numbers through:
rand_x = rf(10000, 3, 5)
Now I want to produce another series that contains the variances at each point i.e. the column look like this:
[variance(first two numbers)]
[variance(first three numbers)]
[variance(first four numbers)]
[variance(first five numbers)]
.
.
.
.
[variance of 10,000 numbers]
I have written the code as:
c ( var(rand_x[1:1]) : var(rand_x[1:10000])
but I am only getting 157 elements in the column rather than 10,000. Can someone explain what I am doing wrong here?
An option is to loop over the index from 2 to 10000 in sapply, extract the elements of 'rand_x' from position 1 to the looped index, apply the var and return a vector of variance output
out <- sapply(2:10000, function(i) var(rand_x[1:i]))
Your code creates a sequence incrementing by one with the variance of the first two elements as start value and the variance of the whole vector as limit.
var(rand_x[1:2]):var(rand_x[1:n])
# [1] 0.9026262 1.9026262 2.9026262
## compare:
.9026262:3.33433
# [1] 0.9026262 1.9026262 2.9026262
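The behaviour of the colon operator with non-integer endpoints can be checked on a tiny example. This is only an illustrative sketch with made-up numbers:

```r
# With a non-integer start, `:` steps by 1 and stops before passing the end value.
steps <- 0.5:3
print(steps)

# The length therefore depends on the two values, not on the length of any vector.
n_steps <- length(steps)
print(n_steps)
```

So the number of elements returned is governed by how far apart the two variances are, which is why the result has nothing to do with the 10,000 values in rand_x.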
What you want is to loop over the vector indices, using seq_along to get the variances of sequences growing by one. To see what needs to be done, I show you first a (rather slow) for loop.
vars <- numeric() ## initialize numeric vector
for (i in seq_along(rand_x)) {
vars[i] <- var(rand_x[1:i])
}
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
The first element has to be NA because the variance of one element is not defined (division by zero).
However, the for loop above is slow, mainly because vars grows on every iteration. A function from the *apply family, e.g. vapply, avoids this and is more compact. In vapply we pass numeric(1) (or just 0) as FUN.VALUE, the template for each result, because the result of each iteration is of length one.
vars <- vapply(seq_along(rand_x), function(i) var(rand_x[1:i]), numeric(1))
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
Data:
n <- 50
set.seed(42)
rand_x <- rf(n, 3, 5)
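Both versions above recompute var on an ever longer prefix, so the work grows quadratically with the length of the vector. A possible one-pass alternative, sketched here under the assumption that the textbook sum-of-squares formula is numerically acceptable for the data (it can lose accuracy when the mean is large relative to the spread), uses cumsum:

```r
set.seed(42)
rand_x <- rf(50, 3, 5)          # same toy data as above

i  <- seq_along(rand_x)
s1 <- cumsum(rand_x)            # running sum
s2 <- cumsum(rand_x^2)          # running sum of squares

# Sample variance of the first i values: (sum(x^2) - sum(x)^2 / i) / (i - 1)
run_var <- (s2 - s1^2 / i) / (i - 1)
run_var[1] <- NA                # variance of a single value is undefined

# Cross-check against the direct computation
direct <- vapply(i, function(k) var(rand_x[1:k]), numeric(1))
print(all.equal(run_var, direct))
```

The names run_var, s1 and s2 are illustrative only. For 10,000 values the difference in run time should be noticeable, but the direct version remains the safer reference.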

R: Number precision, how to prevent rounding?

In R, I have the following vector of numbers:
numbers <- c(0.0193738397702257, 0.0206218006695066, 0.021931558829559,
0.023301378178208, 0.024728095594751, 0.0262069239112787, 0.0277310799996657,
0.0292913948762414, 0.0308758879014822, 0.0324693108459748, 0.0340526658271053,
0.03560271425176, 0.0370915716288017, 0.0384863653635563, 0.0397490272396821,
0.0408363289939899, 0.0417002577578561, 0.0422890917131629, 0.0425479537267193,
0.0424213884467212, 0.0418571402964338, 0.0408094991140723, 0.039243951482081,
0.0371450856007627, 0.0345208537496488, 0.0314091884865658, 0.0278854381969885,
0.0240607638577763, 0.0200808932436969, 0.0161193801903312, 0.0123615428382314,
0.00920410652651576, 0.00628125319205829, 0.0038816517651031,
0.00214210795679701, 0.00103919307280354, 0.000435532895812429,
0.000154730641092234, 4.56593150728962e-05, 1.09540661898799e-05,
2.08952167815574e-06, 3.10045314287095e-07, 3.51923218134997e-08,
3.02121734299694e-09, 1.95269500257237e-10, 9.54697530552714e-12,
3.5914029230041e-13, 1.07379981978647e-14, 2.68543048763588e-16,
6.03891613157815e-18, 1.33875697089866e-19, 3.73885699170518e-21,
1.30142752487978e-22, 5.58607581840324e-24, 2.92551478380617e-25,
1.85002124085815e-26, 1.39826890505611e-27, 1.25058972437096e-28,
1.31082961467944e-29, 1.59522437605631e-30, 2.23371981458205e-31,
3.5678974253211e-32, 6.44735482309705e-33, 1.30771083084868e-33,
2.95492180915218e-34, 7.3857554006177e-35, 2.02831084124162e-35,
6.08139499028838e-36, 1.97878175996974e-36, 6.94814886769478e-37,
2.61888070029751e-37, 1.05433608968287e-37, 4.51270543356897e-38,
2.04454840598946e-38, 9.76544451781597e-39, 4.90105271869773e-39,
2.5743371658684e-39, 1.41165292292001e-39, 8.06250933233367e-40,
4.78746160076622e-40, 2.94835809615626e-40, 1.87667170875529e-40,
1.22833908072915e-40, 8.21091993733535e-41, 5.53869254991177e-41,
3.74485710867631e-41, 2.52485401054841e-41, 1.69027430542613e-41,
1.12176290106797e-41, 7.38294520887852e-42, 4.8381070000246e-42,
3.20123319815522e-42, 2.16493953538386e-42, 1.50891804884267e-42,
1.09057070511506e-42, 8.1903023226717e-43, 6.3480235351625e-43,
5.13533594742621e-43, 4.25591269645348e-43, 3.57422485839717e-43,
3.0293235331048e-43, 2.58514651313175e-43, 2.21952686649801e-43,
1.91634521841049e-43, 1.66319240529025e-43, 1.45043336371471e-43,
1.27052593975384e-43, 1.11752052211757e-43, 9.86689196888877e-44,
8.74248543892126e-44)
I use cumsum to get the cumulative sum. Due to R's numerical precision, many of the numbers towards the end of the vector are now equivalent to 1 (even though technically they're not exactly = 1, just very close to it).
So then when I try to recover my original numbers by using diff(cumulative), I get a lot of 0s instead of a very small number. How can I prevent R from "rounding"?
cumulative <- cumsum(numbers)
diff(cumulative)
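The effect described in the question can be reproduced on a much smaller vector. This is a minimal sketch with made-up values, chosen so that the running total reaches 1 before the tiny entries are added:

```r
small <- c(0.5, 0.5, 1e-20, 1e-25)

cs <- cumsum(small)       # the running total stays at 1 once it gets there
print(cs)

recovered <- diff(cs)     # the tiny entries come back as exact zeros
print(recovered)
```

The tiny values are lost at the moment they are added to 1, so no later step on the cumulative vector can bring them back.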
I think the Rmpfr package does what you want:
library(Rmpfr)
x <- mpfr(numbers,200) # set arbitrary precision that's greater than R default
cumulative <- cumsum(x)
diff(cumulative)
Here's the top and bottom of the output:
> diff(cumulative)
109 'mpfr' numbers of precision 200 bits
[1] 0.02062180066950659862445860426305443979799747467041015625
[2] 0.021931558829559001655429284483034280128777027130126953125
[3] 0.02330137817820800150148130569505156017839908599853515625
[4] 0.0247280955947510004688805196337852976284921169281005859375
...
[107] 1.117520522117570086014450710640040701536080790307716261438975e-43
[108] 9.866891968888769759087690539062888824928577731689952701181586e-44
[109] 8.742485438921260418707338389502002282130643811990663213422948e-44
You can adjust the precision as you like by changing the second argument to mpfr.
You might want to try out the package Rmpfr.

How to format numbers in R, specifying the number of significant digits but keep significant zeroes and integer part?

I've been struggling with formatting numbers in R using what I feel are very sensible rules. What I would want is to specify a number of significant digits (say 3), keep significant zeroes, and also keep all digits before the decimal point, some examples (with 3 significant digits):
1.23456 -> "1.23"
12.3456 -> "12.3"
123.456 -> "123"
1234.56 -> "1235"
12345.6 -> "12346"
1.50000 -> "1.50"
1.49999 -> "1.50"
Is there a function in R that does this kind of formatting? If not, how could it be done?
I feel these are quite sensible formatting rules, yet I have not managed to find a function that formats in this way in R. As far as I googled this is not a duplicate of many similar questions such as this
Edit:
Inspired by the two good answers I put together a function myself that I believe works for all cases:
sign_digits <- function(x,d){
s <- format(x,digits=d)
if(grepl("\\.", s) && ! grepl("e", s)) {
n_sign_digits <- nchar(s) -
max( grepl("\\.", s), attr(regexpr("(^[-0.]*)", s), "match.length") )
n_zeros <- max(0, d - n_sign_digits)
s <- paste(s, paste(rep("0", n_zeros), collapse=""), sep="")
}
s
}
format(num, digits = 3) comes very close.
format(1.23456,digits=3)
# [1] "1.23"
format(12.3456,digits=3)
# [1] "12.3"
format(123.456,digits=3)
# [1] "123"
format(1234.56,digits=3)
# [1] "1235"
format(12345.6,digits=3)
# [1] "12346"
format(1.5000,digits=3)
# [1] "1.5"
format(1.4999,digits=3)
# [1] "1.5"
Your rules are not actually internally consistent. You want 1234.56 to round down to 1234, yet you want 1.4999 to round up to 1.5.
EDIT This appears to deal with the very valid point made by @Henrik.
sigDigits <- function(x,d){
z <- format(x,digits=d)
if (!grepl("[.]",z)) return(z)
require(stringr)
return(str_pad(z,d+1,"right","0"))
}
z <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5000, 1.4999)
sapply(z,sigDigits,d=3)
# [1] "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
As @jlhoward points out, your rounding rule is not consistent. Hence you should use a conditional statement:
x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.50000, 1.49999)
ifelse(x >= 100, sprintf("%.0f", x), ifelse(x < 100 & x >= 10, sprintf("%.1f", x), sprintf("%.2f", x)))
# "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
It's hard to say the intended usage, but it might be better to use consistent rounding. Exponential notation could be an option:
sprintf("%.2e", x)
[1] "1.23e+00" "1.23e+01" "1.23e+02" "1.23e+03" "1.23e+04" "1.50e+00" "1.50e+00"
sig0=\(x,y){
dig=abs(pmin(0,floor(log10(abs(x)))-y+1))
dig[is.infinite(dig)]=y-1
sprintf(paste0("%.",dig,"f"),x)
}
> v=c(1111,111.11,11.1,1.1,1.99,.01,.001,0,-.11,-.9,-.000011)
> paste(sig0(v,2),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0.0 -0.11 -0.90 -0.000011"
Or the following is almost the same with the exception that 0 is converted to 0 and not 0.0 (fg is a special version of f where the digits specify significant digits and not digits after the decimal point, and the # flag causes fg to not drop trailing zeroes):
> paste(sub("\\.$","",formatC(v,2,,"fg","#")),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0 -0.11 -0.90 -0.000011"

Calculate the exponentials of big negative value

I would like to know how I can get the exponential of a large negative number in R. For example, when I try:
> exp(-6400)
[1] 0
> exp(-1200)
[1] 0
> exp(-2000)
[1] 0
but I need the value of the above expressions, even if it is very small. How can I get it in R?
Those numbers are too small. To find the minimum value your computer can handle, try:
> .Machine$double.xmin
[1] 2.225074e-308
This gives you (from ?.Machine)
the smallest non-zero normalized floating-point number, a power of the radix, i.e., double.base ^ double.min.exp. Normally 2.225074e-308.
In my case
> .Machine$double.base
[1] 2
> .Machine$double.min.exp
[1] -1022
Actually, thanks to subnormal (denormalized) numbers, I can calculate powers down to
> exp(-745)
[1] 4.940656e-324
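The value 4.940656e-324 is the smallest positive subnormal double, 2^-1074, which sits below `.Machine$double.xmin`. A small sketch to check these limits, assuming a platform with standard IEEE 754 doubles (the variable names are illustrative):

```r
xmin_normal <- .Machine$double.xmin   # smallest normalized double, 2^-1022
xmin_sub    <- 2^-1074                # smallest positive subnormal double

print(xmin_normal == 2^-1022)
print(xmin_sub > 0)
print(xmin_sub / 2 == 0)              # anything smaller underflows to zero
```

Subnormal numbers carry fewer significant bits than normal doubles, so results in this range are only rough approximations.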
To get around this issue you need arbitrary-precision arithmetic.
In R you can achieve that using package Rmpfr (PDF vignette)
library(Rmpfr)
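# Calculate exp(-100) in double precision, then convert it to mpfr
# (so the results below are accurate to about 16 digits only)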
> a <- mpfr(exp(-100), precBits=64)
# exp(-1000)
> a^10
1 'mpfr' number of precision 64 bits
[1] 5.07595889754945890823e-435
# exp(-6400)
> a^64
1 'mpfr' number of precision 64 bits
[1] 3.27578823787094497049e-2780
# use an array of powers
> ex <- c(10, 20, 50, 100, 500, 1000, 1e5)
> a ^ ex
7 'mpfr' numbers of precision 64 bits
[1] 5.07595889754945890823e-435 2.57653587296115182772e-869
[3] 3.36969414830892462745e-2172 1.13548386531474089222e-4343
[5] 1.88757769782054893243e-21715 3.56294956530952353784e-43430
[7] 1.51693678090513840149e-4342945
Note that Rmpfr is based on GNU MPFR and requires GNU GMP. Under Linux you will need gmp, gmp-devel, mpfr, and mpfr-devel installed on your system in order to install these packages; I am not sure how that works under Windows.
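If only the order of magnitude and a few leading digits are needed, a possible alternative that avoids extra packages is to work on the log scale, since log10(exp(x)) = x / log(10). The following is a sketch, not a replacement for Rmpfr; exp_sci is a hypothetical helper name introduced here.

```r
# Decimal mantissa and exponent of exp(x), obtained from x / log(10)
# without ever forming exp(x) itself. `exp_sci` is a hypothetical helper.
exp_sci <- function(x) {
  e10  <- x / log(10)             # base-10 logarithm of exp(x)
  expo <- floor(e10)              # decimal exponent
  mant <- 10^(e10 - expo)         # mantissa in [1, 10)
  list(mantissa = mant, exponent = expo)
}

r <- exp_sci(-6400)
cat(sprintf("%.6fe%d\n", r$mantissa, r$exponent))
```

The mantissa loses a few digits for large |x| because the fractional part of e10 is all that remains of a number in the thousands, but the leading digits should agree with the Rmpfr result for exp(-6400) shown above.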

conversion bigq to mpfr with Rmpfr package

The help documentation of the Rmpfr R package claims that the .bigq2mpfr() function uses the minimal precision necessary for correct representation when the precB argument is NULL :
Description:
Coerce from and to big integers (‘bigz’) and ‘mpfr’ numbers.
Further, coerce from big rationals (‘bigq’) to ‘mpfr’ numbers.
Usage:
.bigz2mpfr(x, precB = NULL)
.bigq2mpfr(x, precB = NULL)
.mpfr2bigz(x, mod = NA)
Arguments:
x: an R object of class ‘bigz’, ‘bigq’ or ‘mpfr’ respectively.
precB: precision in bits for the result. The default, ‘NULL’, means
to use the _minimal_ precision necessary for correct
representation.
However when converting 31/3 one gets a bad approximation:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 8 bits
[1] 10.31
By looking inside the .bigq2mpfr() function we see the detailed procedure:
N <- numerator(x)
D <- denominator(x)
if (is.null(precB)) {
eN <- frexpZ(N)$exp
eD <- frexpZ(D)$exp
precB <- eN + eD + 1L
}
.bigz2mpfr(N, precB)/.bigz2mpfr(D, precB)
Firstly I do not understand why precB is taken as follows. The exp output of the frexpZ() is the exponent in binary decomposition:
> frexpZ(N)
$d
[1] 0.96875
$exp
[1] 5
> 0.96875*2^5
[1] 31
Here we get precB=8 and the result is then identical to:
> mpfr(31, precBits=8)/mpfr(3, precBits=8)
1 'mpfr' number of precision 8 bits
[1] 10.31
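Where the 8 bits come from, and why they give 10.31, can be reproduced in base R without gmp or Rmpfr. This is a sketch under the assumption that the exp value of frexpZ equals the number of bits needed to write the integer; bits_needed and round_to_bits are hypothetical helpers, not part of either package.

```r
# Bits needed to write a positive integer exactly (hypothetical helper).
bits_needed <- function(n) floor(log2(n)) + 1

# Round x to p significant bits (hypothetical helper).
round_to_bits <- function(x, p) {
  e <- floor(log2(abs(x))) + 1    # position of the leading bit
  s <- 2^(p - e)                  # scale so that p bits sit left of the point
  round(x * s) / s
}

precB <- bits_needed(31) + bits_needed(3) + 1   # 5 + 2 + 1
print(precB)
print(round_to_bits(31 / 3, precB))             # 31/3 kept to precB bits
```

With 8 bits the quotient becomes 165/16 = 10.3125, which an 8-bit mpfr number displays as 10.31. The numerator and denominator are each exact at that precision, but the quotient is not.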
I am under the impression one should rather replace precB with 2^precB, but I'd like to get some advice about that:
> mpfr(31, precBits=8)/mpfr(3, precBits=2^8)
1 'mpfr' number of precision 256 bits
[1] 10.33333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^9)
1 'mpfr' number of precision 512 bits
[1] 10.3333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333329
> mpfr(31, precBits=8)/mpfr(3, precBits=2^7)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
I get (note the difference in my initial creation):
Rgames> fooq<-as.bigq(31/3)
Rgames> fooq
Big Rational ('bigq') :
[1] 5817149518686891/562949953421312
Rgames> .bigq2mpfr(fooq)
1 'mpfr' number of precision 104 bits
[1] 10.3333333333333339254522798000835
All this strongly suggests to me that the precision in your bigq number is in fact zero decimal places, i.e. each of "31" and "3" has that precision. As such, your mpfr conversion is quite correct in giving you a result with one decimal place precision.
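The fraction printed by as.bigq(31/3) above is the double nearest to 31/3 written as an exact ratio with denominator 2^49. A short base-R sketch to confirm this (no gmp needed; the variable names are illustrative):

```r
x <- 31 / 3

# 8 <= x < 16, so doubles near x are spaced 2^-49 apart and x * 2^49 is a whole number.
num <- x * 2^49

print(sprintf("%.0f", num))
print(num == 5817149518686891)
print(2^49 == 562949953421312)
```

This is why the two ways of creating the bigq differ: as.bigq(31, 3) stores the exact rational 31/3, while as.bigq(31/3) stores the already rounded double.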
This has been corrected in a newer version of the package:
> x <- as.bigq(31,3)
> .bigq2mpfr(x)
1 'mpfr' number of precision 128 bits
[1] 10.33333333333333333333333333333333333332
