Finding the fraction closest to pi in R

I am trying to write a function that outputs the common fraction n/d, with min <= d <= max, that minimizes |(n/d) - pi|.
That is:
n is the numerator
d is the denominator
and min and max are the boundaries, i.e. the search runs over all denominators d between min and max.
If d = 1: 3/1 <= pi <= 4/1, gives the closest fraction of 3/1 with a distance of |3/1 - pi| = 0.142
...
If d = 4: 12/4 <= pi <= 13/4, gives the closest fraction of 13/4 with a distance of |13/4 - pi| = 0.108
...
If d = 6: 18/6 <= pi <= 19/6, gives the closest fraction of 19/6 with a distance of |19/6 - pi| = 0.025
If d = 7: 21/7 <= pi <= 22/7, gives the closest fraction of 22/7 with a distance of |22/7 - pi| = 0.001
...
If d = 10: 31/10 <= pi <= 32/10 gives the closest fraction of 31/10 with a distance of |31/10 - pi| = 0.042
Therefore, here, the best approximation is 22/7, found at d = 7, with a distance to pi of 0.001.
min = 1
max = 10
library(Rmpfr)
Const("pi", 3333) # pi correct to 1000 decimal places
1 'mpfr' number of precision 3333 bits
[1] 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989381

The following function finds the best approximation over an entire range of denominators (and replaces an extremely inefficient function I posted earlier).
findBestApprox <- function(minD, maxD) {
  # closest fraction from below for each denominator minD..maxD
  lowers <- floor(pi*(minD:maxD))/(minD:maxD)
  i <- which.min(abs(pi - lowers))
  best.lower <- lowers[i]
  # closest fraction from above for each denominator
  uppers <- ceiling(pi*(minD:maxD))/(minD:maxD)
  j <- which.min(abs(pi - uppers))
  best.upper <- uppers[j]
  # keep whichever side gets closer to pi
  if (abs(pi - best.lower) < abs(pi - best.upper)) {
    d <- minD + i - 1
    n <- floor(pi*d)
  } else {
    d <- minD + j - 1
    n <- ceiling(pi*d)
  }
  c(n, d)
}
For example
> findBestApprox(2,1000)
[1] 355 113
> 355/113
[1] 3.141593
The vectorized nature of the code makes it very fast, taking only a second or so to search out to 10 million:
> findBestApprox(2,10000000)
[1] 5419351 1725033
> format(5419351/1725033,digits = 16)
[1] "3.141592653589815"
> 5419351/1725033 - pi
[1] 2.220446e-14
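Since round(pi*d) always picks whichever of floor(pi*d) and ceiling(pi*d) is closer, the same search can be condensed further; here is a minimal sketch (findBestApprox2 is a name introduced for illustration, not from the answer above):
findBestApprox2 <- function(minD, maxD) {
  d <- minD:maxD
  n <- round(pi*d)               # closest numerator for each denominator
  i <- which.min(abs(pi - n/d))  # denominator with the smallest error
  c(n[i], d[i])
}
findBestApprox2(2, 1000)
# [1] 355 113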

Pi Estimator in R

The code below estimates pi in R. Now I am trying to find the minimum number of terms, N_Min, you would have to include in your estimate of pi to make it accurate to three decimal places.
pi_Est <- function(NTerms){
  NTerms = 5 # start with an estimate of just five terms
  pi_Est = 0 # initialise the value of pi to zero
  Sum_i = NA # initialise the summation variable to null
  for(ii in 1:NTerms)
  {
    Sum_i[ii] = (-1)^(ii+1)/(2*ii - 1) # this is the series equation for calculating pi
  }
  Sum_i = 4*Sum_i # multiply by four as required in the formula (see lecture notes)
  pi_Est = sum(Sum_i)
  cat('\nThe estimate of pi with terms = ', NTerms ,' is ',pi_Est)
}
First of all, I would change some things about your function. Instead of getting it to print out a message, get it to return a value. Otherwise it becomes very difficult to do anything with its output, including testing it for convergence to pi.
Also, no matter what value of NTerms you feed this function, you immediately overwrite it inside the function.
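You can see the effect directly: whatever you pass in, the first line resets NTerms to 5, so the result never changes:
pi_Est(50000)
# The estimate of pi with terms =  5  is  3.339683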
You could rewrite the function like this:
pi_Est <- function(NTerms) {
  pi_Est <- 0
  Sum_i <- numeric()
  for (ii in seq(NTerms)) {
    Sum_i[ii] <- (-1)^(ii+1)/(2*ii - 1)
  }
  return(sum(4 * Sum_i))
}
And to show it converges to pi, let's test it with 50,000 terms:
pi_Est(50000)
#> [1] 3.141573
Now, if we want to find the first value of NTerms that is correct to 3 decimal places, we are going to need to be able to call this function on a vector of NTerms - at the moment it is only working on a single number. So let's define the function f that vectorizes pi_Est:
f <- Vectorize(pi_Est)
Now, let's create the estimate for all values of NTerms between 1 and 2,000 and store them in a vector:
estimates <- f(1:2000)
We can see that the values of estimates seem to oscillate around, and converge to, pi if we plot the first 100 values:
plot(estimates[1:100], type = 'l')
abline(h = pi)
Our answer is just the first value which, when rounded to three decimal places, is the same as pi rounded to three decimal places:
result <- which(round(estimates, 3) == round(pi, 3))[1]
result
#> [1] 1103
And we can check this is correct by feeding 1103 into our original function:
pi_Est(result)
#> [1] 3.142499
You will see that this gives us 3.142, which is the same as pi rounded to 3 decimal places.
Created on 2022-01-31 by the reprex package (v2.0.1)
1000 terms are required to make the estimate accurate to within 0.001:
pi_Est1 <- function(n) {
  if (n == 0) return(0)
  neg <- 1/seq(3, 2*n + 1, 4)       # negative terms: 1/3, 1/7, 1/11, ...
  if (n%%2) neg[length(neg)] <- 0   # for odd n, zero out the surplus negative term
  4*sum(1/seq(1, 2*n, 4) - neg)     # positive terms: 1, 1/5, 1/9, ...
}

pi_Est2 <- function(tol) {
  # count down from a safe upper bound until the estimate falls outside the tolerance
  for (i in ceiling(1/tol + 0.5):0) {
    est <- pi_Est1(i)
    if (abs(est - pi) > tol) break
    est1 <- est
  }
  list(NTerms = i + 1, Estimate = est1)
}
tol <- 1e-3
pi_Est2(tol)
#> $NTerms
#> [1] 1000
#>
#> $Estimate
#> [1] 3.140593
tol - abs(pi - pi_Est2(tol)$Estimate)
#> [1] 2.500001e-10
tol - abs(pi - pi_Est1(pi_Est2(tol)$NTerms - 1))
#> [1] -1.00075e-06
Created on 2022-01-31 by the reprex package (v2.0.1)
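A heuristic for why roughly 1000 terms are needed: the remainder of an alternating series is about half the first omitted term, so after n terms the Leibniz sum is off by roughly (1/2)*4/(2n+1), which is about 1/n. A quick check using pi_Est1 from above:
n <- 1000
abs(pi - pi_Est1(n)) # just under 0.001 = 1/n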
Perhaps we can try the code below
pi_Est <- function(digits = 3) {
  s <- 0
  ii <- 1
  repeat {
    s <- s + 4 * (-1)^(ii + 1) / (2 * ii - 1)
    if (round(s, digits) == round(pi, digits)) break
    ii <- ii + 1
  }
  list(est = s, iter = ii)
}
and you will see
> pi_Est()
$est
[1] 3.142499
$iter
[1] 1103
> pi_Est(5)
$est
[1] 3.141585
$iter
[1] 130658
Why not use a single line of code for the calculation?
Pi <- tail(cumsum(4*(1/seq(1, 4*50000000, 2))*rep(c(1, -1), 50000000)), 1)
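Note that tail(cumsum(x), 1) is just sum(x), so the cumulative vector is unnecessary; the same value (up to floating-point rounding) comes from:
Pi <- sum(4*(1/seq(1, 4*50000000, 2))*rep(c(1, -1), 50000000))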

Determining the sample size n needed to reject the null at alpha = 0.01 in R

Create a simulated dataset of 100 observations, where x is a random normal variable with mean 0 and standard deviation 1, and y = 0.1 + 2*x + e, where e is also a random normal error with mean 0 and sd 1.
set.seed(1)
# simulate a data set of 100 observations
x <- rnorm(100)
y.1 <- 0.1 + 2*x + rnorm(100)
Now extract the first 5 observations.
y1.FirstFive <- (y.1[1:5]) # extract first 5 observations from y
x.FirstFive <- (x[1:5]) # extract first 5 observations from x
y1.FirstFive # extracted 5 observations from y1
[1] -1.7732743 0.5094025 -2.4821789 3.4485904 0.1044309
x.FirstFive # extracted 5 observations from x
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
Assuming the mean and sd of the sample that you calculated from the first five observations would not change, what is the minimum total number of additional observations you would need to be able to conclude that the true mean of the population is different from 0 at the p = 0.01 significance level?
alpha <- 0.01
mu <- 0
for (i in 5:2000) {
  # Recalculate the standard error and CI
  stand_err <- Sd_y1 / sqrt(i)
  ci <- sample_mean_y1 + c(qt(alpha/2, i-1), qt(1-alpha/2, i-1))*stand_err
  if (ci[2] < mu)
    break # condition met, exit loop
}
i
[1] 2000
Here, I wrote a loop that iteratively increases n from the initial n = 5 to n = 2000, computes the confidence interval for each n (given a fixed y-bar and sd), and stops when the null can be rejected at p < 0.01. However, I keep getting the wrong output: it is always the maximum of the range I give (here, 2000) instead of the specific minimum sample size n needed to reject the null that mu_y = 0 at the p = 0.01 level. Any suggestions as to how to fix the code?
Additional info: the sd of y1.FirstFive = 2.3 and the mean of y1.FirstFive = -0.04.
Assuming:
Sd_y1 = sd(y1.FirstFive)
sample_mean_y1 = mean(y1.FirstFive)
sample_mean_y1
[1] -0.03860587
As pointed out by @jblood94, you need to go to a larger sample size.
You don't need a for loop for this; most of the functions involved are vectorized, so something like this works:
n = 5:30000
stand_err = Sd_y1 / sqrt(n)
ub = sample_mean_y1 + qt(1-alpha/2, n-1)*stand_err
n[min(which(ub<0))]
[1] 23889
It's because n > 2000.
set.seed(1)
x <- rnorm(100)
y.1 <- 0.1 + 2*x + rnorm(100)
Sd_y1 <- sd(y.1[1:5])
sample_mean_y1 <- mean(y.1[1:5])
alpha <- 0.01
sgn <- 2*(sample_mean_y1 > 0) - 1
# f(n) is positive exactly when the CI excludes 0; its root is the cutoff sample size
f <- function(n) qt(alpha/2, n - 1)*Sd_y1 + sgn*sample_mean_y1*sqrt(n)
# double the upper bracket until it contains the sign change, then root-find
upper <- 2
while (f(upper) < 0) upper <- upper*2
(n <- ceiling(uniroot(f, lower = upper/2, upper = upper, tol = 0.5)$root))
#> [1] 23889
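As a check on the boundary (f's sign flips exactly where the upper confidence bound crosses 0, so these two calls bracket the answer):
f(23888) # < 0: the upper CI bound is still above 0 at n = 23888
f(23889) # > 0: the upper CI bound drops below 0, so n = 23889 is the minimum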

Why the dot product of a scaled vector with itself is always the data size minus 1

I don't understand why the dot product of a scaled (standardized) vector with itself is always the data size minus 1.
a <- scale(rnorm(100))
crossprod(a)
# equals 100 - 1 = 99
b <- scale(runif(50))
crossprod(b)
# equals 50 - 1 = 49
c <- scale(rchisq(30, 5))
crossprod(c)
# equals 30 - 1 = 29
I would like to understand the mathematics behind this.
Not in LaTeX, but a short proof may help you understand.
Your values are scaled, so z_i = (x_i - mean(x)) / sd(x).
crossprod(a) computes the sum of squares of the scaled values: Sum_i z_i^2 = (1/sd(x)^2) * Sum_i (x_i - mean(x))^2.
The variance (squared sd) is var(x) = sd(x)^2 = (1/(n-1)) * Sum_i (x_i - mean(x))^2, so Sum_i (x_i - mean(x))^2 = (n-1) * sd(x)^2.
Therefore crossprod(a) = (n-1) * sd(x)^2 / sd(x)^2 = n - 1.
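A quick numerical check of this identity (the seed and distribution are arbitrary):
set.seed(1)
x <- rnorm(10)
z <- (x - mean(x))/sd(x) # this is what scale(x) computes
sum(z^2)                 # 9, i.e. length(x) - 1
crossprod(scale(x))      # same value, as a 1 x 1 matrix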

quadprog fails to find a solution

I am trying to optimize the layout of a set of boxes with respect to their hanger locations, such that the boxes are best aligned with their hangers and do not crowd each other out. I am using quadprog.
Givens:
1. box hanger x-locations (P) = 710 850 990 1130
2. box sizes (W) = 690 550 690 130
3. usable x-spread tuple (S) = -150 2090
4. number of boxes (K) = 4
5. minimum inter-box spread (G) = 50
6. box x-locations (X) = the objective
We can see that the total required x-spread is sum(W) + 3G = 2060 + 150 = 2210, whereas the available x-spread is S[2] - S[1] = 2240. So a solution should exist.
Minimize:
sum over i of (P[i] - X[i])^2
subject to:
(1) X[i+1] - X[i] >= G + (1/2)(W[i+1] + W[i]) for i = 1..(K-1), i.e. the boxes do not crowd each other out; in the >= form quadprog expects:
-X[i] + X[i+1] >= G + (1/2)(W[i+1] + W[i])
(2) X[1] >= S[left] + (1/2) W[1], and (3) X[K] <= S[right] - (1/2) W[K], i.e. the boxes stay within the given x-spread:
X[1] >= S[left] + (1/2) W[1]
-X[K] >= -(S[right] - (1/2) W[K])
for a total of 5 constraints: 3 for the inter-box spread and 2 for the extremities.
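For reference, solve.QP minimizes (1/2) b'Db - d'b subject to t(Amat) b >= bvec, so with Dmat = I and dvec = P the objective equals (1/2) sum((X[i] - P[i])^2) up to a constant, matching the objective above.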
In R:
> Dmat = matrix(0,4,4)
> diag(Dmat) = 1
> dvec   # the hanger locations P
[1]  710  850  990 1130
> bvec
[1] -670 -670 -460 -195 2025
> t(Amat)
     [,1] [,2] [,3] [,4]
[1,]   -1    1    0    0
[2,]    0   -1    1    0
[3,]    0    0   -1    1
[4,]    1    0    0    0
[5,]    0    0    0   -1
> solve.QP(Dmat, dvec, Amat, bvec)
Error in solve.QP(Dmat, dvec, Amat, bvec) :
  constraints are inconsistent, no solution!
Quite obviously I have missed something or mis-specified the problem (see the 'quadprog' package documentation). I am using quadprog because I found a JavaScript port of it.
Thanks a lot.
I'm not sure that this solves your physical problem, but the code below seems to solve the optimization problem as you stated it. I've generalized it to a variable number of boxes and included a plot to check the solution.
library(quadprog)
p <- c(710, 850, 990, 1130)  # hanger positions
w <- c(690, 550, 690, 130)   # box widths
g <- 50                      # min box separation
s <- c(-150, 2390)           # min and max positions of box edges
k <- length(w)               # number of boxes
Dmat <- 2*diag(nrow=k)
dvec <- p
# separation constraints
Amat <- -diag(nrow=k, ncol=(k-1))
Amat[lower.tri(Amat)] <- unlist(lapply((k-1):1, function(n) c(1, numeric(n-1))))
bvec <- sapply(1:(k-1), function(n) g + (w[n+1] + w[n])/2)
# x-spread constraints
Amat <- cbind(Amat, c(1, numeric(k-1)), c(numeric(k-1), -1))
bvec <- c(bvec, s[1] + w[1]/2, -(s[2] - w[k]/2))
sol <- solve.QP(Dmat, dvec, Amat, bvec)
# plot the hangers, the solved box positions, and the hanger-to-box offsets
plot(x=s, y=c(0,0), type="l", ylim=c(-2.5,0))
points(x=p, y=numeric(k), pch=19)
segments(x0=sol$solution, y0=-1, x1=p, y1=0)
rect(xleft=sol$solution-w/2, xright=sol$solution+w/2, ytop=-1.0, ybottom=-2, density=8)
The problem lies in the setup of Amat, bvec, or both. solve.QP tries to find a solution b of the quadratic programming problem subject to the constraint that
t(Amat) %*% b >= bvec
Expanding out this constraint in your example, we want to find a vector b := c(b[1], b[2], b[3], b[4]) that satisfies the conditions:
-b[1] + b[2] >= -670,
-b[2] + b[3] >= -670,
-b[3] + b[4] >= -460,
b[1] >= -195
and -b[4] >= 2025 (i.e., b[4] <= -2025).
However, by adding the first four inequalities together, we have b[4] >= -670 - 670 - 460 - 195 = -1995. In other words, b[4] must be at least -1995 and at most -2025. This is a contradiction, and therefore solve.QP fails to find a solution.
Trying this example with the constraint -b[4] >= -2025, by setting bvec = c(-670, -670, -460, -195, -2025) yields a solution. Without going too much into your formulation above, perhaps this was intended (or another one of these values should have been positive)?
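To confirm numerically, here is a sketch that reconstructs the question's matrices with the last element of bvec sign-corrected; since the unconstrained optimum (the hanger locations themselves) satisfies every constraint, solve.QP simply returns it:
library(quadprog)
Dmat <- diag(4)
dvec <- c(710, 850, 990, 1130)
Amat <- t(rbind(c(-1,  1,  0,  0),   # rows of t(Amat) as printed in the question
                c( 0, -1,  1,  0),
                c( 0,  0, -1,  1),
                c( 1,  0,  0,  0),
                c( 0,  0,  0, -1)))
bvec <- c(-670, -670, -460, -195, -2025)  # last entry flipped from 2025
solve.QP(Dmat, dvec, Amat, bvec)$solution
# [1]  710  850  990 1130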

Central limit theorem in [R]

I am working in R to generate a sample of M = 32000 averages, each calculated by averaging 36 independent values of a continuous uniform(0, 1) random variable. The sample is generated as follows:
sampleA <- 1:32000
for (i in 1:32000) {
  MuestraAUnif <- runif(36)
  sampleA[i] <- mean(MuestraAUnif)
}
For the generated sample, I am asked to calculate the relative frequency of observed averages greater than L = 0.32 + 4*1/100 and to compare it with the probability (approximated by the central limit theorem) that the average of N values is greater than L, as follows:
L <- 0.32 + 4*1/100
sigma <- sqrt(1/12)  # sd of uniform(a, b): (b-a)/sqrt(12)
miu <- 0.5           # mean of uniform(a, b): (a+b)/2
greaterA <- sum(sampleA > L)   # 23693 values of the sample are greater than L
xBar <- greaterA/length(sampleA)
X <- sum(sampleA)
n <- 32000
Zn <- (X - n*miu)/(sigma*sqrt(n))
cat("P(xBar >", L, ") = P(Z >", Zn, ") = 1 - P(Z <", Zn, ") =", 1 - pnorm(Zn), "\n")  # print the theoretical probability that xBar is greater than L
cat("sum(sampleA >", L, ") / M =", n, "for N =", 36, ":", xBar, "\n")  # print the observed relative frequency
The output is:
P(xBar > 0.36 ) = P(Z > -3.961838 ) = 1 - P(Z < -3.961838 ) = 0.9999628
sum(sampleA > 0.36 ) / M = 32000 for N = 36 : 0.7377187
My question is: why are the values so far apart? Presumably they should be much closer (0.9999628 is far from 0.7377187). Am I doing something wrong in my implementation? Excuse my English.
Melkhiah66, you did everything right. Only change
MuestraAUnif <- runif(2) to MuestraAUnif <- runif(32)
and it should work.
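For what it's worth, the CLT approximation for the mean of N uniform(0, 1) values is P(xBar > L) = 1 - pnorm((L - miu)/(sigma/sqrt(N))). Evaluating it for both sample sizes (a quick check, not part of the original answer) shows the observed 0.7377 is what you would expect for N = 2, not N = 36:
miu <- 0.5; sigma <- sqrt(1/12); L <- 0.36
1 - pnorm((L - miu)/(sigma/sqrt(36))) # 0.998: expected relative frequency for N = 36
1 - pnorm((L - miu)/(sigma/sqrt(2)))  # 0.754: close to the observed 0.7377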
