Mathematically, the following should be impossible for a non-negative density:
library(truncdist)
q <- function(x, L, R) dtrunc(x, "exp", rate = 0.1, a = L, b = R)
integrate(q, L = 2, R = 3, lower = 0, upper = 27)
integrate(q, L = 2, R = 3, lower = 0, upper = 29)
integrate(q, L = 2, R = 3, lower = 27, upper = 29)
integrate(q, L = 2, R = 3, lower = 0, upper = 30)
The first integral gives a positive number, but the second evaluates to zero, even though it only extends the first by the interval [27, 29], whose own integral is zero. Is this an issue in integrate or in truncdist?
We can use the following to find more such cases:
z <- numeric()
for (i in 1:50) {
  z[i] <- integrate(q, L = 2, R = 3, lower = 0, upper = i)$value
}
What do I need to do to get the correct integrals (which should all be 1 when integrating from 0 to any i >= 3)?
From help("integrate"):
Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong.
You found an example of this:
curve(q(x, 2, 3), from = -1, to = 30)
You shouldn't integrate distribution density functions numerically. Use the cumulative distribution function:
diff(ptrunc(c(0, 29), "exp", rate = 0.1, a = 2, b = 3))
#[1] 1
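If you do want to keep using integrate() here, a workaround (just a sketch, reusing the q() from the question) is to clip the limits to the support [2, 3], where the density is actually non-zero, so the quadrature points cannot all miss it:
# the density vanishes outside [2, 3]; integrating over the support directly
# avoids the spurious zeros above
integrate(q, L = 2, R = 3, lower = 2, upper = 3)
# ~ 1 with a small estimated absolute error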
I have found an alternative answer in this post: Integration in R with integrate function. Using hcubature the problem can be solved numerically, which is closer to what my original question asked for.
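For reference, here is roughly what that route looks like (a sketch, assuming the cubature package; note that any adaptive routine can still miss the narrow support, so splitting at the truncation points is the safe version):
library(cubature)
library(truncdist)
f_trunc <- function(x) dtrunc(x, "exp", rate = 0.1, a = 2, b = 3)
hcubature(f_trunc, lowerLimit = 0, upperLimit = 29)$integral
# splitting the range at a = 2 and b = 3 guarantees the mass in [2, 3] is seen:
hcubature(f_trunc, lowerLimit = 2, upperLimit = 3)$integral   # ~1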
Related
I want to calculate the integral of the normal distribution at exactly some point. I know that one way to approximate this is to evaluate the distribution at that point and at a point slightly after it, and then subtract the two values to get an approximate answer.
I tried doing this in R:
a = pnorm(1.96, mean = 0, sd = 1, log = FALSE)
b = pnorm(1.961, mean = 0, sd = 1, log = FALSE)
final_answer = b - a
#5.83837e-05
Is it possible to do this in one step instead of manually subtracting "a" and "b"?
Thank you!
We need to be clear about what you are asking here. If you are looking for the integral of a normal distribution at a specific point, then you can use pnorm, which is the anti-derivative of dnorm.
We can see this by reversing the process and looking at the derivative of pnorm to ensure it matches dnorm:
# Numerical approximation to derivative of pnorm:
delta <- 10^-6
(pnorm(0.75 + delta) - pnorm(0.75)) / delta
#> [1] 0.3011373
Note that this is a very close approximation to dnorm evaluated at the same point:
dnorm(0.75)
#> [1] 0.3011374
So the anti-derivative of a normal distribution density at point x is given by:
pnorm(x)
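A quick cross-check in the other direction (nothing beyond base R): numerically integrating dnorm up to a point should reproduce pnorm at that point.
# integrate the density up to 1.96 and compare with the closed-form CDF
integrate(dnorm, lower = -Inf, upper = 1.96)$value
pnorm(1.96)
# both are approximately 0.975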
You can try this:
> diff(pnorm(c(1.96, 1.961), mean = 0, sd = 1, log = FALSE))
[1] 5.83837e-05
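If you want it as a single named call, a tiny wrapper does the same thing (the helper name is my own, not part of base R):
norm_prob <- function(a, b, ...) diff(pnorm(c(a, b), ...))
norm_prob(1.96, 1.961, mean = 0, sd = 1)
#[1] 5.83837e-05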
I am having issues with an optimization problem involving numerical estimation of an integral which contains an unknown variable.
Numerically estimating an integral is simple enough: just use the integrate function in R. However, I am trying to estimate a rather unpleasant integral that requires optimization, since it contains an unknown variable and a constraint. I am using the nlminb function, but the result is highly incorrect. The idea is to find the value of the unknown variable such that the integral satisfies a constraint of being smaller than or equal to 1 - l, where l is between 0 and 1.
The code is the following:
integrand <- function(x, p) {
  dnorm(x, 0, 1) * (1 - dnorm((qnorm(p) - sqrt(0.12) * x) / sqrt(1 - 0.12), 0, 1))^800
}
It is the variable p that is unknown.
The objective function to be minimised is the following:
objective <- function(p){
PoD <- integrate(integrand, lower = -Inf, upper = Inf, p = p)$value
PoD - 0.5
}
test <- nlminb(0.015, objective = objective, lower = 0, upper = 1)$par*100
Edited to reflect mistakes in the objective function and the integral.
The same issue still remains. I think my mistake is not specifying which variable to minimise: the optimisation just returns the starting value given to nlminb, multiplied by 100.
The authors of the paper used dummy variables and showed that l = 0.5 should give p = 0.15%.
Thank you for your time.
Of course it just returns the starting value, since your objective function does not depend on p. Do:
integrand <- function(x, p) {
  dnorm(x, 0, 1) * (1 - dnorm((qnorm(p) - sqrt(0.12) * x) / sqrt(1 - 0.12), 0, 1))^800
}
objective <- function(p){
PoD <- integrate(integrand, lower = -Inf, upper = Inf, p = p)$value
PoD - 0.5
}
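A quick check that the corrected objective really does respond to p now (just an illustration, evaluating the integral at a few arbitrary values of p):
sapply(c(1e-5, 1e-4, 1e-3, 1e-2), function(p)
  integrate(integrand, lower = -Inf, upper = Inf, p = p)$value)
# the integral now varies with p, so the optimiser has something to work with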
I am using Gauss-Laguerre quadrature to approximate the following integral:
integral from 0 to Inf of x^(-0.2) / (x + 100) * exp(-100/x) dx
I wrote the integrand as a function in R:
int.gl<-function(x)
{
x^(-0.2)/(x+100)*exp(-100/x)
}
First, I used the integrate() function to get the "true" value, which I double-checked with Wolfram Alpha:
integrate(int.gl, lower = 0, upper = Inf)$value
[1] 1.627777
Then, I used the glaguerre.quadrature() function:
rule<-glaguerre.quadrature.rules(64, alpha = 0, normalized = F)[[64]]
glaguerre.quadrature(int.gl, lower = 0, upper = Inf, rule = rule, weighted = F)
[1] 0.03610346
Clearly, the result is far from the true value. At that point I thought I had to transform the function so that it contains an e^(-x) term, so I let X = 1/Y and obtained a different integrand for the same integral:
integral from 0 to Inf of y^(-0.8) / (1 + 100*y) * exp(-100*y) dy
In the same manner, I used the following R code:
int.gl2<-function(x)
{
x^(-0.8)/(1+100*x)*exp(-100*x)
}
integrate(int.gl2, lower = 0, upper = Inf)$value
[1] 1.627777
glaguerre.quadrature(int.gl2, lower = 0, upper = Inf, rule = rule, weighted = F)
[1] 0.03937068
Well, two different values. Does Gauss-Laguerre quadrature have trouble with this integral? Is there any other Gauss-type quadrature that can help find it?
Note: I have to use a Gauss-type quadrature because I am trying to find MLEs of some parameters for a customized distribution; for simplicity I have just fixed those parameters (the constants in the integral). However, the integrate() function seems to be less "robust" than the glaguerre.quadrature() function (integrate() returns a "divergent" error when optimizing the log-likelihood).
EDIT 1
According to Hans W.'s comment, I checked the use of glaguerre.quadrature with the following example. Suppose we want to find the integral
integral from 0 to Inf of ((x + 100) / (x + 200))^0.2 * exp(-5*x) dx
int.gl3<-function(x)
{
((x+100)/(x+200))^0.2*exp(-5*x)
}
glaguerre.quadrature(int.gl3, lower = 0, upper = Inf, rule = rule, weighted = F)
[1] 0.1741448
integrate(int.gl3, lower = 0, upper = Inf)$value
[1] 0.1741448
It seems like the use of glaguerre.quadrature is correct.
Let's check the transformation now. I transform the integral by letting 100Y = X, so that it contains an exp(-x) factor:
int.gl4<-function(x)
{
0.01^0.2*x^(-0.8)/(1+x)*exp(-x)
}
integrate(int.gl4, lower = 0, upper = Inf)$value
[1] 1.627777
glaguerre.quadrature(int.gl4, lower = 0, upper = Inf, rule = rule, weighted = F)
[1] 0.9621667
The result is closer to the true value, but still not equal.
EDIT 2
Here is my complete example: Integration and false convergence of optimization in R
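For what it's worth, the leftover error in EDIT 1 is what you would expect from the x^(-0.8) singularity at 0, which a plain Laguerre rule (alpha = 0) resolves slowly. One thing to try (a sketch only; it assumes glaguerre.quadrature.rules accepts a negative alpha, which must be greater than -1) is to fold that factor into a generalized Laguerre weight and hand the rule just the smooth remainder of int.gl4:
# treat x^(-0.8) * exp(-x) as the Gauss-Laguerre weight (alpha = -0.8)
rule.m08 <- glaguerre.quadrature.rules(64, alpha = -0.8, normalized = FALSE)[[64]]
smooth.part <- function(x) 0.01^0.2 / (1 + x)   # int.gl4 without the x^(-0.8) * exp(-x) factor
glaguerre.quadrature(smooth.part, lower = 0, upper = Inf, rule = rule.m08, weighted = TRUE)
# with the singular factor absorbed into the weight, this should sit much closer to 1.627777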
I'm using optim to try to find the critical region for a binomial test; however, after a certain sample size it fails to converge on the correct value.
The function seems well behaved, so I'm not sure why it stops working at this point.
N <- 116
optim(1, function(x) abs(1 - pbinom(x, N, 0.1) - 0.05), method = "Brent", lower = 1, upper = N)
The optim function as above works for N < 116.
You should probably use the built-in qbinom function, which computes specified values of the quantile (inverse CDF) function for the binomial distribution: it works fine for any reasonable value of N.
N <- 116
qbinom(0.95, size = N, prob = 0.1)
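If you want to double-check that this really is the critical value, compare the tail probabilities on either side of it (just a sanity check):
# qbinom(0.95, ...) returns the smallest k with P(X <= k) >= 0.95,
# i.e. the smallest k whose upper tail P(X > k) is at most 0.05
k <- qbinom(0.95, size = N, prob = 0.1)
1 - pbinom(k, N, 0.1)      # should be <= 0.05
1 - pbinom(k - 1, N, 0.1)  # should be > 0.05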
The function is not well-behaved from an optimization point of view: as explained here, it is piecewise constant.
The gradient at your starting point is almost 0 and the algorithm cannot move to the next best solution.
One way is to use another starting point:
optim(0.1*N, function(x) abs(1 - pbinom(x, N, 0.1) - 0.05), method = "Brent", lower = 1, upper = N)
or to use optimize, since the problem is one-dimensional:
optimize(function(x) abs(1 - pbinom(x, N, 0.1) - 0.05), c(1,N))
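To see the step structure mentioned in that answer (just a visual check), plot the objective:
curve(abs(1 - pbinom(x, N, 0.1) - 0.05), from = 1, to = N, n = 1001,
      xlab = "x", ylab = "objective")
# the objective is piecewise constant, so a start on a flat step gives the optimiser nothing to follow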
Not sure if this numerical-methods problem should really be here or on Cross Validated, but since I have a nice reproducible example I thought I would start here.
I am going to be estimating and fitting a bunch of distributions both to some large data sets and to data sets generated randomly from similar distributions. As part of this process I will be generating estimates for the conditional mean of various value ranges, including truncated and non-truncated values of the right tail.
The function cr_moment below, given a pdf function dfun and parameters for that function in params, calculates the unconditional mean of that distribution. Given an upper bound, a lower bound, or both, it calculates the conditional mean for the range specified by those bounds, using the singly or doubly truncated distribution. The function beneath it, cr_gb2, specializes cr_moment to the generalized beta distribution of the second kind. Finally, the parameter values supplied beneath that approximate the unadjusted current-dollar household income distribution from the US Census/BLS Current Population Survey for the year 2000 (McDonald & Ransom 2008). (Also, kudos to Mikko Marttila on this list for help with coding this function.)
This function gives me a failure-to-converge error, copied below, for various lower bounds and an upper bound equal to 4.55e8 or higher, but not at 4.54e8. The kth moment of the GB2 exists for k < shape1 * shape3, here about 2.51. This is a nice smooth unimodal function being integrated over a finite interval, and I don't know why it is failing to converge or what to do about it. For other parameter values, but not these, I have also seen convergence problems at the low end, for lower bounds ranging from 6 to a couple of hundred.
Error in integrate(f = prob_interval, lower = lb, upper = ub, subdivisions = 100L):
the integral is probably divergent
455 million will be above the highest observable income level by one or two orders of magnitude, but given a wider range of parameter values, and using hill-climbing algorithms to fit real and simulated data, I think I will hit this wall many times. I know very little about numerical methods in a case like this and don't really know where to start. Help and suggestions greatly appreciated.
cr_moment <- function(lb = -Inf, ub = Inf, dfun, params, v=1, ...){
x_pdf <- function(X){
X^v * do.call(what=dfun, args=c(list(x=X), params))
}
prob_interval <- function(X){
do.call(what=dfun, args=c(list(x=X), params))
}
integral_val <- integrate(f = x_pdf, lower = lb, upper = ub)
integral_prob <- integrate(f = prob_interval, lower = lb, upper = ub)
crm <- integral_val[[1]] / integral_prob[[1]]
out <- list(value = integral_val[[1]], probability = integral_prob[[1]],
cond_moment = crm)
out
}
library(GB2)
cr_gb2 <- function(lb = -Inf, ub = Inf, v = 1, params){
cr_moment(lb, ub, dfun = dgb2, params = params, v = v)
}
GB2_params <- list(shape1 = 2.2474, scale = 58441.5, shape2 = 0.6186, shape3 = 1.118)
cr_gb2(lb = 1, ub = 4.55e8, params = GB2_params)
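Not a full fix, but the probability part of cr_moment does not need integrate at all: the GB2 package also ships the CDF pgb2, so the interval probability can be taken from the CDF directly (the same advice as for the truncated exponential above). A sketch, keeping the parameter list from the question:
prob_cdf <- function(lb, ub, params) {
  # interval probability from the CDF; the quantile is passed positionally so the
  # remaining arguments match the dgb2/pgb2 parameter names (shape1, scale, shape2, shape3)
  do.call(pgb2, c(list(ub), params)) - do.call(pgb2, c(list(lb), params))
}
prob_cdf(1, 4.55e8, GB2_params)
# this avoids the "probably divergent" error for the denominator; the moment integral
# in the numerator may still need care (e.g. a finite upper bound or rel.tol adjustments)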