How can I calculate the value a such that P(x > a) = 0.1 for a given Beta(3, 2) distribution, using R commands?
I know there are pbeta and qbeta, but as far as I can tell neither of them fits the problem directly...
Note that a <- qbeta(p, 3, 2) solves P(x < a) = p, and that P(x >= a) = 1 - P(x < a). So you need to calculate a <- qbeta(1 - p, 3, 2).
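For the concrete numbers in the question (P(x > a) = 0.1 under a Beta(3, 2)), that works out as follows; the printed value is approximate:
a <- qbeta(1 - 0.1, 3, 2)   # equivalently qbeta(0.1, 3, 2, lower.tail = FALSE)
a
# roughly 0.8574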
If I have an equation like 10 + x^2 + x^3 + x^4 = y and an x value like 2, is there a way to plug this into R so it would solve for y? It sounds trivial, but eventually I would like to do this with polynomials of higher degree, like 30. Does anyone know of a way to do this in R without plugging in the x value manually?
Please note: I'm trying to solve for y given a specific x value.
You can easily write your own function:
p <- function(x, coefs)
  c(cbind(1, poly(x, degree = length(coefs) - 1, raw = TRUE, simple = TRUE)) %*% coefs)
p(2, c(10, 0, 1, 1, 1))
#[1] 38
Use rep if you need many coefficients of 1.
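If you would rather avoid poly() altogether, a minimal base-R sketch (the helper name p2 is mine) does the same power-series sum for a single x value:
p2 <- function(x, coefs) sum(coefs * x^(seq_along(coefs) - 1))
p2(2, c(10, 0, 1, 1, 1))
#[1] 38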
With q, p and s known, I'm trying to solve this problem in R: find N such that qnorm(p, N, s) = q.
Example: find N such that 30 == qnorm(0.05, N, 3).
My solution:
x<-seq(30, 50, 0.1)
y<-qnorm(0.05, x, 3)
plot(x,y)
Looking at the plot, the solution is around 35. I can refine the answer by repeating this trial-and-error approach on a finer grid.
My question is: Is there a direct function to solve this problem?
The key here is realising that qnorm(0.05, N, 3) is the same as qnorm(0.05, 0, 3) + N, since all the mean parameter does is shift the whole distribution left or right. So if we take 30 = qnorm(0.05, N, 3) and rearrange it, we get:
N <- 30 - qnorm(0.05, 0, 3)
N
#> [1] 34.93456
Or to generalise:
inv.qnorm <- function(goal, sd, p) goal - qnorm(p, 0, sd)
This gives an answer with greater precision and speed, and lower memory usage, than the sequence-lookup approach.
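Applied to the example above, this reproduces the same answer:
inv.qnorm(30, 3, 0.05)
#> [1] 34.93456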
Basically I create a vector of candidate means centred on the goal, spanning a range of width 2*sd*qnorm(1 - p/2), and then return the element of this vector for which qnorm(p, x, sd) is closest to the goal:
inv.qnorm <- function(goal, sd, p, precision = .0001){
  x <- seq(goal - sd * qnorm(1 - p/2), goal + sd * qnorm(1 - p/2), precision)
  x[which.min(abs(qnorm(p, x, sd) - goal))]
}
inv.qnorm(30, 3, .05)
#> [1] 34.93461
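A third option, if you want a direct numerical solution that does not rely on the mean simply shifting the quantile, is root finding with base R's uniroot(). This is my own sketch (the function name and the search interval are arbitrary choices), not code from the answers above:
# Find N such that qnorm(p, N, s) == q, by root finding
inv.qnorm.root <- function(q, s, p) {
  uniroot(function(N) qnorm(p, N, s) - q,
          interval = c(q - 10 * s, q + 10 * s))$root
}
inv.qnorm.root(30, 3, 0.05)
# approximately 34.93, exact digits depend on uniroot's default tolerance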
I'm writing a function to calculate the quantile of the GEV distribution. The relevant aspect for this question is that a different form of the function is required when one of the parameters (the shape parameter, or kappa) is zero.
Programmatically, this is commonly addressed as follows (this is a snippet from evd:qgev and is similar in lmomco::quagev):
(Edit: Version 2.2.2 of lmomco has addressed the issue identified in this question)
if (shape == 0)
    return(loc - scale * log(-log(p)))
else return(loc + scale * ((-log(p))^(-shape) - 1)/shape)
This works fine if shape/kappa is exactly equal to zero, but there is odd behaviour near zero.
Let's look at an example:
Qgev_zero <- function(shape){
  # p is an exceedance probability
  p <- 0.01
  location <- 0
  scale <- 1
  if(shape == 0) return(location - scale*(log(-log(1-p))))
  location + (scale/shape)*((-log(1-p))^-shape - 1)
}
Qgev_zero(0)
#[1] 4.600149
Qgev_zero(1e-8)
#[1] 4.600149
This looks fine because the same answer is returned near zero and at zero. But look at what happens closer to zero.
k.seq <- seq(from = -4e-16, to = 4e-16, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev_zero), type = 'l')
The value returned by the function oscillates and is often incorrect.
These problems go away if I replace the direct comparison with zero by a call to all.equal, e.g.
if(isTRUE(all.equal(shape, 0))) return( location - scale*(log(-log(1-p) )))
The help for all.equal suggests that, with the default tolerance, anything smaller than about 1.5e-8 will be treated as zero.
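A quick way to see that cutoff in action (assuming the default tolerance, sqrt(.Machine$double.eps), which is about 1.5e-8):
isTRUE(all.equal(1e-9, 0))  # TRUE: treated as equal to zero
isTRUE(all.equal(1e-7, 0))  # FALSE: larger than the default tolerance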
Of course this odd behaviour near zero is probably not an issue in general, but in my case I'm using optimisation/root finding to determine parameters from known quantiles, so I'm concerned that my code needs to be robust.
To the question: is using all.equal(target, 0) an appropriate way to deal with this problem? Why is it that this approach isn't used routinely?
Some functions, when implemented the obvious way with floating point representations, are ill-behaved at certain points. That's especially likely to be the case when the function has to be manually defined at a single point: When things go absolutely undefined at a point, it's likely that they're hanging on for dear life when they get close.
In this case, that's from the kappa denominator fighting the kappa negative exponent. Which one wins the battle is determined on a bit-by-bit basis, each one sometimes winning the "rounding to a stronger magnitude" contest.
There's a variety of approaches to fixing these sorts of problems, all of them designed on a case-by-case basis. One often-flawed but easy-to-implement approach is to switch to a better-behaved representation (say, the Taylor expansion with respect to kappa) near the problematic point. That'll introduce discontinuities at the boundaries; if necessary, you can try interpolating between the two.
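As an illustration of the Taylor-expansion route, here is a minimal sketch (my own code, not from evd or lmomco, with an arbitrary cutoff of 1e-7), using the same K/f/XI/A parameterisation as the quantile functions further below: for small K, (1 - exp(-K*Y))/K is replaced by its expansion Y - K*Y^2/2 + K^2*Y^3/6.
Qgev_taylor <- function(K, f, XI, A, eps = 1e-7) {
  Y <- -log(-log(f))
  if (abs(K) < eps) {
    # second-order Taylor expansion of (1 - exp(-K*Y))/K about K = 0
    Y <- Y - K * Y^2 / 2 + K^2 * Y^3 / 6
  } else {
    Y <- (1 - exp(-K * Y)) / K
  }
  XI + A * Y
}
Qgev_taylor(0, f = 0.01, XI = 0, A = 1)
# approximately -1.5272, the K -> 0 limit for f = 0.01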
Following Sneftel's suggestion, I calculate the quantile at k = -1e-7 and k = 1e-7 and interpolate if the k argument falls between these limits. This seems to work.
In this code I'm using the parameterisation of the GEV quantile function from lmomco::quagev.
The function Qgev is the problematic version (black line on plot), while Qgev_interp, interpolates near zero (green line on plot).
Qgev <- function(K, f, XI, A){
  # K  = shape
  # f  = probability
  # XI = location
  # A  = scale
  Y <- -log(-log(f))
  Y <- (1 - exp(-K*Y))/K
  x <- XI + A*Y
  return(x)
}
Qgev_interp <- function(K, f, XI, A){
  .F <- function(K, f, XI, A){
    Y <- -log(-log(f))
    Y <- (1 - exp(-K*Y))/K
    x <- XI + A*Y
    return(x)
  }
  k1 <- -1e-7
  k2 <- 1e-7
  y1 <- .F(k1, f, XI, A)
  y2 <- .F(k2, f, XI, A)
  F_nearZero <- approxfun(c(k1, k2), c(y1, y2))
  if(K > k1 & K < k2) {
    return(F_nearZero(K))
  } else {
    return(.F(K, f, XI, A))
  }
}
k.seq <- seq(from = -1.1e-7, to = 1.1e-7, length.out = 1000)
plot(k.seq, sapply(k.seq, Qgev, f = 0.01, XI = 0, A = 1), col=1, lwd = 1, type = 'l')
lines(k.seq, sapply(k.seq, Qgev_interp, f = 0.01, XI = 0, A = 1), col=3, lwd = 2)
I hope this is the right place for such a basic question. I found this and this solution quite involved, so they did not help me grasp the fundamentals of the procedure.
Consider a random dataset:
x <- c(1.38, -0.24, 1.72, 2.25)
w <- c(3, 2, 4, 2)
How can I find the value of μ that minimizes the weighted least squares expression Sum{ i | w[i]*(x[i] - μ)^2 }?
The manipulate package lets me vary μ manually with a slider and watch the model change, but I am looking for a more precise procedure than trying values by hand until the fit looks best.
Note: If the question is not correctly posted, I would welcome constructive critics.
You could proceed as follows:
optim(mean(x), function(mu) sum(w * (x - mu)^2), method = "BFGS")$par
# [1] 1.367273
Here mean(x) is an initial guess for mu.
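For a one-dimensional problem like this you could also use optimize(), which searches an interval rather than needing a starting value; a quick sketch, using the range of the data as a (somewhat arbitrary) search interval:
optimize(function(mu) sum(w * (x - mu)^2), interval = range(x))$minimum
# [1] 1.367273 (up to optimize's default tolerance)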
I'm not sure if this is what you want, but here's a little algebra:
We want to find mu to minimise
S = Sum{ i | w[i]*(x[i] - mu)*(x[i] - mu) }
Expand the square and rearrange into three summations, bringing things that don't depend on i outside the sums:
S = Sum{i|w[i]*x[i]*x[i]} - 2*mu*Sum{i|w[i]*x[i]} + mu*mu*Sum{i|w[i]}
Define
W = Sum{i|w[i]}
m = Sum{i|w[i]*x[i]} / W
Q = Sum{i|w[i]*x[i]*x[i]}/W
Then
S = W*(Q -2*mu*m + mu*mu)
= W*( (mu-m)*(mu-m) + Q - m*m)
(The second step is 'completing the square', a simple but very useful technique).
In the final equation, since a square is always non-negative, the value of mu to minimise S is m.
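In R, that closed-form answer m is just the weighted mean of x, so it can be checked directly against the optim result above:
sum(w * x) / sum(w)   # m = Sum{i|w[i]*x[i]} / Sum{i|w[i]}
# [1] 1.367273
weighted.mean(x, w)   # the same thing via base R
# [1] 1.367273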
I'm trying to calculate the weighted average of a statistical sample (vector) in R, using the form Sum{i|w[i]*x[i]} / Sum{i|w[i]}.
The function takes a vector, and the weight is adjusted according to a second parameter (1, 2 or 3). For the 2nd scheme the weight depends on whether (x - xBar)/s is less than 1, between 1 and 2, or larger, where s is the standard deviation.
I've adjusted the weight accordingly when the parameter is 1 or 3 using else-ifs, but I'm having trouble with the 2nd one, given that there are several criteria to meet...
I've been calculating X - xBar as a vector: m = x-mean(x)
I've been calculating s with an R function: s = sd(x)
My query is about how "the meeting of the conditions" should be programmed for the 2nd criterion. So far I have an if for each condition, but...
When calculating the weighted average (taking the first condition as an example), does every element of the vector (x - xBar)/s need to be less than 1, or do I need to test each element and assign it a weight from the three conditions individually?
E.g. if the first element's value was less than 1, assign it a weight of 1, but if the second element's value was between 1 and 2, assign it a weight of 0.5?
I hope this makes sense. In R it throws a warning message saying the logic is only comparing the first element of the vector... so that's what raised the question.
Thanks in advance.
To avoid the warning message while staying reasonably efficient, you probably want to use ifelse rather than if and else, perhaps in something like
m <- mean(x)
s <- sd(x)
absstandardx <- abs( (x - m) / s )
w2 <- ifelse( absstandardx < 1, 1, ifelse( absstandardx < 2, 0.5, 0 ) )
weightedmean2 <- sum(w2 * x) / sum(w2)
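As an aside, the same three-way weighting can be written without nesting ifelse calls, by binning the standardised values with findInterval; a sketch reusing the absstandardx and x vectors from above:
# weights 1 / 0.5 / 0 for |z| in [0, 1), [1, 2), [2, Inf)
w2 <- c(1, 0.5, 0)[findInterval(absstandardx, c(0, 1, 2))]
weightedmean2 <- sum(w2 * x) / sum(w2)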