With q, p and s known, I'm trying to solve this problem in R: find N such that qnorm(p, N, s) = q.
Example: find N such that 30 == qnorm(0.05, N, 3).
My solution:
x<-seq(30, 50, 0.1)
y<-qnorm(0.05, x, 3)
plot(x,y)
Looking at the plot, the solution is around 35.
I can refine the answer by repeating this trial-and-error approach with a finer grid.
My question is: Is there a direct function to solve this problem?
The key here is realising that qnorm(0.05, N, 3) is the same as qnorm(0.05, 0, 3) + N, since all the mean parameter does is shift the whole distribution left or right. So if we take 30 = qnorm(0.05, N, 3) and rearrange it, we get:
N <- 30 - qnorm(0.05, 0, 3)
N
#> [1] 34.93456
Or to generalise:
inv.qnorm <- function(goal, sd, p) goal - qnorm(p, 0, sd)
This gives an answer with greater precision and speed, and lower memory usage, than the sequence-lookup approach.
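Applied to the example above (my quick check, using the function just defined), this reproduces the analytic result:
inv.qnorm(30, 3, 0.05)
#> [1] 34.93456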
Basically, I create a vector of candidate means centred on the goal, running from goal - sd*qnorm(1-p/2) to goal + sd*qnorm(1-p/2) in steps of precision, and then return the candidate whose quantile qnorm(p, x, sd) is closest to the goal:
inv.qnorm <- function(goal, sd, p, precision = .0001){
  x <- seq(goal - sd * qnorm(1 - p/2), goal + sd * qnorm(1 - p/2), precision)
  x[which.min(abs(qnorm(p, x, sd) - goal))]
}
inv.qnorm(30, 3, .05)
#> [1] 34.93461
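A more generic alternative (my sketch, not part of either answer above) is to treat this as a root-finding problem with uniroot, which also works for quantile functions where no simple shift trick applies; the search interval used here is an assumption:
inv.qnorm.root <- function(goal, sd, p, interval = goal + c(-10, 10) * sd){
  # find the mean m for which qnorm(p, m, sd) equals the goal
  uniroot(function(m) qnorm(p, m, sd) - goal, interval = interval)$root
}
inv.qnorm.root(30, 3, 0.05)
#> [1] 34.93456 (approximately, up to uniroot's default tolerance)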
Related
If I have an equation like 10 + x^2 + x^3 + x^4 = y and an x value like 2, is there a way to plug this into R so it solves for y? It sounds trivial, but eventually I would like to do this for polynomials of higher degree, like 30. Does anyone know of a way to do this in R without plugging in the x value manually?
Please note: I'm trying to solve for y given a specific x value.
You can easily write your own function:
p <- function(x, coefs) c(cbind(1, poly(x, degree = length(coefs) - 1,
                                        raw = TRUE, simple = TRUE)) %*% coefs)
p(2, c(10, 0, 1, 1, 1))
#[1] 38
Use rep if you need many coefficients of 1.
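For instance, for the degree-30 case mentioned in the question (my illustration, assuming every non-intercept coefficient is 1):
p(2, c(10, rep(1, 30)))   # 10 + x + x^2 + ... + x^30 at x = 2
#[1] 2147483656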
How can I calculate the value a for which P(X > a) = 0.1, using R commands, for a given Beta(3, 2) distribution?
I know there are pbeta and qbeta, but neither seems to fit the problem as far as I can tell...
Note that a <- qbeta(p, 3, 2) solves P(X < a) = p. Then note that P(X >= a) = 1 - P(X < a). So you need to calculate a <- qbeta(1 - p, 3, 2).
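For the concrete numbers in the question, either of the following (my illustration) gives the required a:
qbeta(1 - 0.1, 3, 2)
qbeta(0.1, 3, 2, lower.tail = FALSE)   # equivalent, using the upper tail directly
Both return roughly 0.857.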
Let g(x) = 1/sqrt(2*pi) * exp(-x^2/2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculations on paper, integrals appeared that have the form of an n-fold convolution of g with itself, except that every inner integral runs over [-c, c] instead of the whole real line, where c > 0.
Since I could not evaluate this by hand, I had the idea to approximate it numerically and plot it. I tried this in R, because R provides the dnorm function and the integrate function for numerical integration.
You can see that I need to integrate numerically n times, where n shall be chosen in the call to a plot function. My code has a for-loop to create those "incomplete" convolutions iteratively.
For example, even with n=3 and c=1 this gives me an error; n=2 (i.e. a single integration) works.
N = 3
ngauss <- function(x) dnorm(x, mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
  h <- function(y) {
    g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
    return(integrate(g, lower = -1, upper = 1)$value)
  }
  h <- Vectorize(h)
  convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should be possible.
Maybe someone can help me fix my code, or recommend a better way to implement this. Another language that is more appropriate for this task would also be okay.
The issue is how the closures look up i: each h refers to the loop variable at the time it is called, not at the time it is defined, so once i has moved on, convoluts[[i - 1]] inside h ends up referring back to itself, hence the infinite recursion. Instead, using
h <- evalq(function(y) {
  g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
  integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
As your integration is simply the pdf of a sum of N independent standard normals (which then follows N(0, N)), we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.
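Another way to pin down i (my variant, not from the answer above, reusing ngauss and N from the question) is a small factory function that forces the previous convolution before the loop moves on; with the default -1/1 limits it reproduces the truncated convolutions, and with -Inf/Inf it reproduces the dnorm check:
make_conv <- function(prev, lower = -1, upper = 1) {
  force(prev)   # evaluate convoluts[[i - 1]] now, not at call time
  Vectorize(function(y) {
    integrate(function(z) ngauss(y - z) * prev(z),
              lower = lower, upper = upper)$value
  })
}
convoluts <- list(ngauss)
for (i in 2:N) convoluts[[i]] <- make_conv(convoluts[[i - 1]])
convoluts[[N]](0)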
I'm writing a function to calculate the quantile of the GEV distribution. The relevant aspect for this question is that a different form of the function is required when one of the parameters (the shape parameter, or kappa) is zero.
Programmatically, this is commonly addressed as follows (this is a snippet from evd::qgev and is similar in lmomco::quagev):
(Edit: version 2.2.2 of lmomco has addressed the issue identified in this question.)
if (shape == 0)
    return(loc - scale * log(-log(p)))
else return(loc + scale * ((-log(p))^(-shape) - 1)/shape)
This works fine if shape/kappa is exactly equal to zero but there is odd behaviour near zero.
Let's look at an example:
Qgev_zero <- function(shape){
  # p is an exceedance probability
  p = 0.01
  location = 0
  scale = 1
  if(shape == 0) return(location - scale * (log(-log(1 - p))))
  location + (scale/shape) * ((-log(1 - p))^-shape - 1)
}
Qgev_zero(0)
#[1] 4.600149
Qgev_zero(1e-8)
#[1] 4.600149
This looks fine because the same answer is returned near zero and at zero. But look at what happens closer to zero.
k.seq <- seq(from = -4e-16, to = 4e-16, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev_zero), type = 'l')
The value returned by the function oscillates and is often incorrect.
These problems go away if I replace the direct comparison with zero by all.equal, e.g.
if(isTRUE(all.equal(shape, 0))) return( location - scale*(log(-log(1-p) )))
Looking at the help for all.equal suggests that, with the default tolerance, anything smaller than 1.5e-8 will be treated as zero.
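A quick check of that cut-off (my illustration, with the arguments in the same order as the comparison above):
isTRUE(all.equal(1e-8, 0))   # TRUE: within the default tolerance of 1.5e-8
isTRUE(all.equal(1e-7, 0))   # FALSE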
Of course this odd behaviour near zero is probably not an issue in general, but in my case I'm using optimisation/root finding to determine parameters from known quantiles, so I need my code to be robust.
To the question: is using all.equal(target, 0) an appropriate way to deal with this problem? Why isn't this approach used routinely?
Some functions, when implemented the obvious way with floating point representations, are ill-behaved at certain points. That's especially likely to be the case when the function has to be manually defined at a single point: When things go absolutely undefined at a point, it's likely that they're hanging on for dear life when they get close.
In this case, that's from the kappa denominator fighting the kappa negative exponent. Which one wins the battle is determined on a bit-by-bit basis, each one sometimes winning the "rounding to a stronger magnitude" contest.
There's a variety of approaches to fixing these sorts of problems, all of them designed on a case-by-case basis. One often-flawed but easy-to-implement approach is to switch to a better-behaved representation (say, the Taylor expansion with respect to kappa) near the problematic point. That'll introduce discontinuities at the boundaries; if necessary, you can try interpolating between the two.
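Before the interpolation approach below, here is what that series idea could look like in this case (a sketch of my own, not part of the accepted approach; it uses the quagev-style parameterisation shown further down and an arbitrary switching threshold eps): for small |K|, replace (1 - exp(-K*Y))/K with the first terms of its expansion Y - K*Y^2/2 + K^2*Y^3/6.
Qgev_taylor <- function(K, f, XI, A, eps = 1e-7){
  Y <- -log(-log(f))
  if (abs(K) < eps) {
    # series expansion of (1 - exp(-K*Y))/K around K = 0
    Y <- Y - K * Y^2 / 2 + K^2 * Y^3 / 6
  } else {
    Y <- (1 - exp(-K * Y)) / K
  }
  XI + A * Y
}
Qgev_taylor(0, 0.01, 0, 1)       # passes smoothly through K = 0
Qgev_taylor(1e-16, 0.01, 0, 1)   # no oscillation near zero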
Following Sneftel's suggestion, I calculate the quantile at k = -1e-7 and k = 1e-7 and interpolate if the k argument falls between these limits. This seems to work.
In this code I'm using the parameterisation of the GEV quantile function from lmomco::quagev.
(Edit: version 2.2.2 of lmomco has addressed the issues identified in this question.)
The function Qgev is the problematic version (black line on the plot), while Qgev_interp interpolates near zero (green line on the plot).
Qgev <- function(K, f, XI, A){
  # K  = shape
  # f  = probability
  # XI = location
  # A  = scale
  Y <- -log(-log(f))
  Y <- (1 - exp(-K*Y))/K
  x <- XI + A*Y
  return(x)
}
Qgev_interp <- function(K, f, XI, A){
  .F <- function(K, f, XI, A){
    Y <- -log(-log(f))
    Y <- (1 - exp(-K*Y))/K
    x <- XI + A*Y
    return(x)
  }

  k1 <- -1e-7
  k2 <- 1e-7
  y1 <- .F(k1, f, XI, A)
  y2 <- .F(k2, f, XI, A)

  F_nearZero <- approxfun(c(k1, k2), c(y1, y2))

  if(K > k1 & K < k2) {
    return(F_nearZero(K))
  } else {
    return(.F(K, f, XI, A))
  }
}
k.seq <- seq(from = -1.1e-7, to = 1.1e-7, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev, f = 0.01, XI = 0, A = 1), col=1, lwd = 1, type = 'l')
lines(k.seq, sapply(k.seq, Qgev_interp, f = 0.01, XI = 0, A = 1), col=3, lwd = 2)
I'm having trouble computing and then plotting multiple integrals. It would be great if you could help me.
So I have this function
> f = function(x, mu = 30, s = 12){dnorm(x, mu, s)}
which I want to integrate multiple times, from each z in 1:100 up to +Inf, so that I can plot the result with x = z and y = auc:
> auc = integrate(f, z, Inf)
R returns:
Warning message:
In if (is.finite(lower)) { :
the condition has length > 1 and only the first element will be used
I have tried a loop:
while(z < 100){
  z = 1
  auc = integrate(f, z, Inf)
  z = z + 1
}
That doesn't work either... I don't know what to do.
(I'm new to R, so I'm sorry if this is really easy.)
Thanks for your help :)!
There is no need to do the integration by hand. pnorm gives the integral of the normal density from negative infinity up to its input. You can get the upper tail instead by setting the lower.tail parameter to FALSE:
z <- 1:100
y <- pnorm(z, mean = 30, sd = 12, lower.tail = FALSE)
plot(z, y)
If you're looking to integrate more complex functions then using integrate will be necessary, but if you just need probabilities for standard distributions there will most likely be a built-in function that does the integration for you directly.
Your problem is actually somewhat subtle, and in a certain sense gets to the core of how R works, so here is a slightly longer explanation.
R is a "vectorized" language, which means that just about everything works on vectors. If I have 2 vectors A and B, then A+B is the element-by-element sum of A and B. Nearly all R functions work this way also. If X is a vector, then Y <- exp(X) is also a vector, where each element of Y is the exponential of the corresponding element of X.
The function integrate(...) is one of the few functions in R that is not vectorized. So when you write:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- integrate(f, z, Inf)
the integrate(...) function does not know what to do with z when it is a vector. So it takes the first element and complains. Hence the warning message.
There is a special function in R, Vectorize(...) that turns scalar functions into vectorized functions. You would use it this way:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- Vectorize(function(z) integrate(f,z,Inf)$value)
z <- 1:100
plot(z,auc(z), type="l") # plot lines
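An equivalent way to get the same vector without Vectorize (my variant, not from the answer above) is an explicit sapply over z:
auc2 <- sapply(z, function(zi) integrate(f, zi, Inf)$value)
plot(z, auc2, type = "l")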