I'm completely baffled by the following simple R code. In the first part, x equals v (which is what I want).
But in the second part I change the input values while following exactly the same logic, and this time x and v no longer match. Where is the problem?
First Part:
m1 = 5
m2 = 1.3*m1
A = m1 + m2
x = 5
a <- function(m3){
abs((m1 - (A + m3)/3)^2 + (1.3*m1 - (A + m3)/3)^2 + (m3 - (A + m3)/3)^2 - 3*x) }
m3 = optimize(a, interval = c(0, 100), tol = 1e-20)[[1]]
v = var(c(m1, m2, m3))*(2/3) # gives "5" same as "x"
Second Part:
eta.sq = .25
beta = qnorm(c(1e-12, .999999999999))
q = c(0, 25)
mu.sig = solve(cbind(1L, beta), q)
m1 = mu.sig[[1]]
H = (mu.sig[[2]])^2
m2 = 1.3 * m1
A = m1 + m2
x = (H * eta.sq) / (1 - eta.sq) # "x" is: 1.052529
a = function(m3){
abs((m1 - (A + m3)/3)^2 + (1.3*m1 - (A + m3)/3)^2 + (m3 - (A + m3)/3)^2 - 3*x) }
m3 = optimize(a, interval = c(0, 100), tol = 1e-20)[[1]]
v = var(c(m1, m2, m3))*(2/3) # "v" is: 2.343749
The difference is that in your first part, the function a has two roots, and optimize finds one of them (m3 = 10.31207). At this value of m3, a(m3) is essentially zero, which means the sum of squared deviations (SS) of m1, m2, and m3 about their mean equals 3*x:
> a(m3)
[1] 3.348097e-07
> ss <- function(x) { sum((x-mean(x))^2) }
> ss(c(m1, m2, m3))
[1] 15
> 3*x
[1] 15
By the definition of the sample variance, the variable v is equal to one-third the SS, so you get v==x.
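As a quick numeric check of that identity (my own, not part of the original answer): for any vector of length 3, var() divides the SS by n - 1 = 2, so multiplying by 2/3 gives exactly SS/3.
set.seed(1)
w <- rnorm(3)                        # any three numbers
all.equal(var(w) * (2/3), ss(w)/3)   # uses the ss() helper from above
# [1] TRUE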
In contrast, in the second part your function a has no roots. It attains its minimum at m3 = 14.375, but there a(m3) == 3.87366 is not zero, so the SS is not equal to 3*x, and there is no reason to expect v (one-third of the SS) to equal x.
> a(m3)
[1] 3.87366
> ss(c(m1, m2, m3))
[1] 7.031247 -- the SS actually attained
> 3*x
[1] 3.157587 -- the target 3*x that the SS could not reach
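Incidentally, you can see why a() has no root in the second part by looking at the smallest SS that varying m3 alone can produce (a check of my own, not from the original answer): the minimum is reached at m3 = (m1 + m2)/2 and equals (m1 - m2)^2 / 2, which is already larger than the target 3*x.
(m1 - m2)^2 / 2   # ~7.031, essentially the SS value found above
3 * x             # 3.157587, so SS(m3) - 3*x can never reach zero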
Is it possible to solve the following problem in R?
In particular, I want to find the values of a1 and a2 minimizing the loss below:
> n <- 1000
> x <- rnorm(n, 1, 1)
> e <- rnorm(n, 0, 1)
> d <- a1+a2*x+e < 0
> loss <- (mean(d) - 0.5) + (mean((a1 + a2*x + e)[d=0]) - 2)
That is, I want to find the values of a1 and a2 that make mean(d) and mean((a1+a2*x+e)[d=0]) as close as possible to 0.5 and 2, respectively.
(the chosen values 0.5 and 2 are just placeholders)
Use optim with a function f that computes the loss you defined. p is the parameter vector, i.e. p[1] is your a1 and p[2] is your a2. Use reasonable starting values when calling optim with your function.
f <- \(p) {
  d <- p[1] + p[2]*x + e < 0
  (mean(d) - 0.5) + (mean((p[1] + p[2]*x + e)[d]) - 2)
}
res <- optim(c(0, 0), f)
res$par
# [1] 4.393432e+53 1.010012e+55 ## a1 and a2
Note that d is already logical (TRUE/FALSE), so it can be used directly for subsetting.
If you get different results with different starting values, the loss surface may be multi-modal.
Data:
n <- 1e3; set.seed(42); x <- rnorm(n, 1, 1); e <- rnorm(n, 0, 1)
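If the aim is literally to get the two quantities as close as possible to 0.5 and 2, one variant worth trying (a sketch of mine, not part of the answer above) is to square the two deviations so the loss is bounded below by zero and cannot be driven towards -Inf; whether d or !d picks out the intended subset depends on what the question's [d=0] was meant to select.
f_sq <- \(p) {
  v <- p[1] + p[2]*x + e
  d <- v < 0
  if (!any(d)) return(Inf)               # guard against an empty subset
  (mean(d) - 0.5)^2 + (mean(v[d]) - 2)^2 # squared deviations from the targets
}
res_sq <- optim(c(0, 0), f_sq)
res_sq$par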
I want to define a very simple function:
prob(x, n) = exp(-t) * sum_{k=0}^{n-1} t^k / k!
where t = q / x^2 and q is the 0.95 quantile of the Gamma(n, 1) distribution.
My work so far:
prob <- function(x, n) {
  quan <- qgamma(0.95, n, 1)
  temp <- quan / x^2
  first_term <- exp(-temp)
  second_term <- temp^(0:(n - 1)) / factorial(0:(n - 1))
  second_term <- sum(second_term)
  first_term * second_term
}
The problem here is that, for big n, the terms of the sum (the second term) involve very large numbers, which R overflows to infinity.
So for example:
prob(0.5, n = 1000)
[1] NaN
Because the quantile for n = 1000 is 1052.577, the numerator requires computing 1052.577^999 and the denominator the factorial of 999, and R represents both as infinity:
> factorial(999)
[1] Inf
> 1052.577^999
[1] Inf
So when R tries to divide them, NaN is produced. However, the output of this function always lies in the interval (0, 1), since it is a probability. Is there any way to calculate the value of this function at this point?
Your prob function is just the cumulative Poisson with lambda = temp and k = n - 1. Use ppois:
prob <- function(x, n) {
  return(ppois(n - 1, qgamma(0.95, n, 1)/x^2))
}
prob(0.5, n = 1000)
# [1] 0
prob(0.5, n = 1000) is 0 because n - 1 = 999 is so far below the mean (lambda = qgamma(0.95, 1000, 1)/0.5^2 = 4210.308) that the probability underflows to zero.
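If the actual magnitude of that underflowing probability is of interest, ppois can report it on the log scale instead (a small additional sketch, same inputs as above):
ppois(999, qgamma(0.95, 1000, 1)/0.5^2, log.p = TRUE)
# a very large negative number: the probability is far smaller than the
# smallest positive double, which is why the plain call prints 0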
I want to simplify the expression below and, at the same time, vary the value of b
b*(b*((1-b)*x) + (1-b)*y) + (1-b)*z
So, if I give b = 0.9,
b <- 0.9
# the answer will be:
0.081x + 0.09y + 0.1z
The reason is I want to see how different values of b, will impact the weights/coefficients of x, y, and z.
I have no idea how to do this, or whether it is even possible in R.
Any help is appreciated.
You can try Reduce, as below:
Reduce(function(u, v) b * u + v, (1 - b) * c(x, y, z))
and you will see
> b <- 0.9
> x <- 1e3
> y <- 1e2
> z <- 1e1
> Reduce(function(u, v) b * u + v, (1 - b) * c(x, y, z))
[1] 91
If you want to see the coefficients of x, y and z, you can use
> f <- function(b) (1 - b) * b^((3:1) - 1)
> f(0.9)
[1] 0.081 0.090 0.100
and the sum of weighted x, y, and z can be written as
s <- sum(f(0.9) * c(x, y, z))
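Since the point of the question was to see how different values of b change the weights, you can also tabulate f over several values of b (my own illustration using the f defined above):
sapply(c(0.5, 0.7, 0.9), f)
#       [,1]  [,2]  [,3]
# [1,] 0.125 0.147 0.081   # weight of x
# [2,] 0.250 0.210 0.090   # weight of y
# [3,] 0.500 0.300 0.100   # weight of z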
I am trying to implement a variation of the Brent-Salamin algorithm in R. It works well for the first 25 iterations, but then, it behaves unexpectedly, returning negative results.
The algorithm I want to implement is:
initial values:
x_0 = 1;  y_0 = 1/sqrt(2);  z_0 = 1/2
x_n = (x_{n-1} + y_{n-1}) / 2
y_n = sqrt(x_{n-1} * y_{n-1})
z_n = z_{n-1} - 2^n * (x_n^2 - y_n^2)
p_n = (2 * x_n^2) / z_n
where n is the current iteration.
A more beautifully formatted formula is here.
The code I figured out is:
mypi <- function(n){
  x = 1
  y = 1/sqrt(2)
  z = 1/2
  iteration = 0
  while(iteration < n){
    iteration = iteration + 1
    newx = (x + y) / 2
    y = sqrt(x * y)
    x = newx
    z = z - (2^iteration * (x^2 - y^2))
    p = (2 * x^2) / z
  }
  return(p)
}
Output:
> mypi(10)
[1] 3.141593
> mypi(20)
[1] 3.141593
> mypi(50)
[1] -33.34323
So as I am new to R, is there a bug in my code or is it the algorithm?
Your code fails because it does not agree with the algorithm as written on the wiki page. A correct version looks like this:
mypi <- function(n){
  x = 1
  y = 1/sqrt(2)
  z = 1/4
  p <- 1                       # running multiplier, doubled each iteration
  iteration = 0
  while(iteration < n){
    iteration = iteration + 1
    newx = (x + y) / 2
    y = sqrt(x * y)
    # x = newx
    # z = z-(2^iteration * (x^2 - y^2))
    z = z - p * (x - newx)^2   # subtract p * (difference)^2, as in the wiki formulation
    p = 2 * p
    x = newx
  }
  (newx + y)^2 / (4 * z)
}
Gives
> mypi(10)
[1] 3.141593
> mypi(20)
[1] 3.141593
> mypi(50)
[1] 3.141593
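As a side note of my own (not from the answer): the reason the original update misbehaves is numerical. Once x and y agree to machine precision, newx^2 - y^2 is dominated by rounding error of roughly 1e-16, and multiplying it by 2^n amplifies that error until z eventually turns negative. The wiki form squares the already tiny difference x - newx, so the same multiplier stays harmless. A quick illustration:
x <- 1; y <- 1/sqrt(2)
for (i in 1:6) {
  newx <- (x + y)/2
  ynew <- sqrt(x*y)
  cat(sprintf("iter %d:  newx^2 - ynew^2 = % .2e   (x - newx)^2 = % .2e\n",
              i, newx^2 - ynew^2, (x - newx)^2))
  x <- newx; y <- ynew
}
# in exact arithmetic the two columns are identical; in floating point the
# first one stops shrinking at roundoff level (~1e-16) after a few
# iterations, while the second keeps shrinking towards zero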
I have to generate random numbers for two groups that together form a vector of size N.
The proportion for one group is p, and for the other it is q = 1 - p.
(E.g., for a population of 1000 with p = 0.5, I have to generate 500 random numbers from one distribution and 500 from another.)
Since this is a simulation in which I have to vary p, I wrote my code like this:
group1 = rnorm(n = N*p)
group2 = rnorm(n = N*q) # 1st method
group2 = rnorm(n = (N - N*p)) # 2nd method
With both of the above methods, R generates one fewer random number than it should in several rows of group2 (about 35% of rows with the first method, and about 12% with the second).
I run into the same bug with rexp, rpois and runif as well.
Below is the snapshot of both the methods for your reference.
#### EXAMPLE SCRIPT #####
N = 1000
p1 = seq(0.01, 0.99, 0.001)
q1 = 1 - p1
### FIRST METHOD ###
X = data.frame()
for (i in 1:length(p1)) {
  X[i, 1] = p1[i]
  X[i, 2] = q1[i]
  X[i, 3] = length(runif(N * X[i, 1]))
  X[i, 4] = length(runif(N * X[i, 2]))
  X[i, 5] = X[i, 4] + X[i, 3]
}
table(X[, 5] == 1000) # column three + column four should sum to 1000
### SECOND METHOD ###
Y = data.frame()
for (i in 1:length(p1)) {
  Y[i, 1] = p1[i]
  Y[i, 2] = q1[i]
  Y[i, 3] = length(runif(N * Y[i, 1]))
  Y[i, 4] = length(runif(N - N * Y[i, 1]))
  Y[i, 5] = Y[i, 3] + Y[i, 4]
}
table(Y[, 5] == 1000) # column three + column four should sum to 1000
R FAQ 7.31 (floating-point rounding error): your particular problem boils down to this:
> p=0.32
> p*1000 + (1-p)*1000
[1] 1000
well that looks correct. But is it really?
> (p*1000 + (1-p)*1000) == 1000
[1] FALSE
No. Why not? How wrong is it?
> (p*1000 + (1-p)*1000) - 1000
[1] -1.136868e-13
It is off by about 10^-13, which means:
> length(runif(1000*p))
[1] 320
> length(runif(1000*(1-p)))
[1] 679
because:
> as.integer(1000*p)
[1] 320
> as.integer(1000*(1-p))
[1] 679
which adds up to 999. See R FAQ 7.31 for details on floating-point approximations.
The solution is to work in integers as much as possible when dealing with counts.
> Np = as.integer(1000*p)
> length(runif(Np))
[1] 320
> length(runif(1000-Np))
[1] 680
rather than computing q as 1 - p and multiplying that by N in the hope of getting exactly 1000 - N*p.
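A quick confirmation on the question's own grid of p values (my sketch, same integer-count idea):
N  <- 1000
p1 <- seq(0.01, 0.99, 0.001)
Np <- as.integer(N * p1)                    # integer size of group 1
len1 <- sapply(Np,     function(k) length(runif(k)))
len2 <- sapply(N - Np, function(k) length(runif(k)))
table(len1 + len2 == N)                     # all TRUE: the two groups always sum to N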