integrate a very peaked function - r

I am using integrate function in R to integrate a very peaked function.
Say that function is a log-normal density:
xs <- seq(0,3,0.00001)
fun <- function(xs) dlnorm(xs, meanlog=-1.057822,sdlog=0.001861871)
plot(xs,fun(xs),type="l")
From the plot, I know that the peak is at around 0.3-0.4.
If I integrate this density function over its support (with increased abs.tol and increased subdivisions) the integrate() gives me zero, which should not be true.
integrate(fun,lower=0,upper=Inf,subdivisions=10000000,abs.tol=1e-100)
0 with absolute error < 0
However, if I restrict the interval to 0.3 - 0.4, it gives me the correct answer.
integrate(fun,lower=0.3,upper=0.4,subdivisions=10000000,abs.tol=1e-100)
1 with absolute error < 1.7e-05
Is there a way to integrate this density without manually choosing the interval?

Not sure whether this is helpful -- might be too specific to dlnorm, but you can partition [0, Inf[, especially if you have a good idea of where the peak will end up:
integrate.dlnorm <- function(mu=0, sd=1, width=2) {
integral.l <- integrate(f=dlnorm, lower=0, upper=exp(mu - width * sd), meanlog=mu, sdlog=sd)$value
integral.m <- integrate(f=dlnorm, lower=exp(mu - width * sd), upper=exp(mu + width * sd), meanlog=mu, sdlog=sd)$value
integral.u <- integrate(f=dlnorm, lower=exp(mu + width * sd), upper=Inf, meanlog=mu, sdlog=sd)$value
return(integral.l + integral.m + integral.u)
}
integrate.dlnorm() # 1
integrate.dlnorm(-1.05, 10^-3) # .97
integrate.dlnorm(-1.05, 10^-3, 3) # .998

integrate:
Like all numerical integration routines, these evaluate the function
on a finite set of points. If the function is approximately constant
(in particular, zero) over nearly all its range it is possible that
the result and error estimate may be seriously wrong.
So, the answer is no.
You really need to know something about the function to compute the integral correctly - for any automated algorithm which detects support there is a function for which it fails.
PS (7 years later). For any deterministic algorithm, and any error, there is a function, such that this algorithm will make this error on it.

Related

Optimizing for global minimum

I am attempting to use optimize() to find the minimum value of n for the following function (Clopper-Pearson lower bound):
f <- function (n, p=0.5)
(1 + (n - p*n + 1) /
(p*n*qf(p= .025, df1= 2*p, df2= 2*(n - p + 1))))^-1
And here is how I attempted to optimize it:
n_clop <- optimize(f.1, c(300,400), maximum = FALSE, p=0.5)
n_clop
I did this over the interval [300,400] because I suspect the value to be between within it but ultimately I would like to do the optimization between 0 and infinity. It seems that this command is producing a local minimum because no matter the interval it produces the lower bound of that interval as the minimum - which is not what I suspect from clopper-pearson. So, my two questions are how to properly find a global minimum in R and how to so over any interval?
I've very briefly looked over the Wikipedia page you linked and don't see any obvious typos in your formula (although I feel like it should be 0.975=1-alpha/2 rather than 0.025=alpha/2?). However, evaluating the function you've coded over a very broad scale suggests that there are no local minima that are messing you up. My strong guess would be that either your logic is wrong (i.e., n->0 is really the right answer) or that you haven't coded what you think you're coding, due to a typo (possibly in the Wikipedia article, although that seems unlikely) or a thinko.
f <- function (n, p=0.5)
(1 + (n - p*n + 1) /
(p*n*qf(p= .025, df1= 2*p, df2= 2*(n - p + 1))))^-1
Confirm that you're getting the right answer for the interval you chose:
curve(f(x),c(300,400))
Evaluating over a broad range (n=0.00001 to 1000000):
curve(f(10^x),c(-5,7))
As #MrFlick suggests, global optimization is hard. You could start with optim(...method="SANN") but the best answer is definitely case-specific.

Numerical precision problems in R?

I have a problem with the following function in R:
test <- function(alpha, beta, n){
result <- exp(lgamma(alpha) + lgamma(n + beta) - lgamma(alpha + beta + n) - (lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)))
return(result)
}
Now if you insert the following values:
betabinom(-0.03292708, -0.3336882, 10)
It should fail and result in a NaN. That is because if we implement the exact function in Excel, we would get a result that is not a number. The implementation in Excel is simple, for J32 is a cell for alpha, K32 beta and L32 for N. The implementation of the resulting cell is given below:
=EXP(GAMMALN(J32)+GAMMALN(L32+K32)-GAMMALN(J32+K32+L32)-(GAMMALN(J32)+GAMMALN(K32)-GAMMALN(J32+K32)))
So this seems to give the correct answer, because the function is only defined for alpha and beta greater than zero and n greater or equal to zero. Therefore I am wondering what is happening here? I have also tried the package Rmpf to increase the numerical accuracy, but that does not seem to do anything.
Thanks
tl;dr log(gamma(x)) is defined more generally than you think, or than Excel thinks. If you want your function not to accept negative values of alpha and beta, or to return NaN, just test manually and return the appropriate values (if (alpha<0 || beta<0) return(NaN)).
It's not a numerical accuracy problem, it's a definition issue. The Gamma function is defined for negative real values: ?lgamma says:
The gamma function is defined by (Abramowitz and Stegun section 6.1.1, page 255)
Gamma(x) = integral_0^Inf t^(x-1) exp(-t) dt
for all real ‘x’ except zero and negative integers (when ‘NaN’ is returned).
Furthermore, referring to lgamma ...
... and the natural logarithm of the absolute value of the gamma function ...
(emphasis in original)
curve(lgamma(x),-1,1)
gamma(-0.1) ## -10.68629
log(gamma(-0.1)+0i) ## 2.368961+3.141593i
log(abs(gamma(-0.1)) ## 2.368961
lgamma(-0.1) ## 2.368961
Wolfram Alpha agrees with second calculation.

Optimizing a simple linear curve (constant and coefficient estimated from a regression)

I am trying to calculate the turning point of a a few functions where I have estimated the coefficient and constant from a regression. I'm using the optimize function for this as my curves are all linear.
My function looks like:
F<- function(x){
beta* x + alpha
}
mind: beta and alpha are both vectors here. When running the optimisation with optimize, I'm getting the following error:
Error in optimize(F, interval = c(10, 20), lower = (10), :
invalid function value in 'optimize'
Is this because optimize is running the optimisation mathematically, so the beta and alphas need to be single parameters? If anyone knows a better way of doing this please do contribute!
Thank you in advance :)
If the functions are linear, then they will be at a minimum at the lower end of the range where beta>=0, and at the upper end of the range if beta<=0 - no need to use optimize().
It's not entirely clear what you're expecting the code to do - if you want it to return an x for each set of parameters, look at optim() instead and have F return the sum, or run optimize on each set of parameters in turn using an apply() function or loop.
One other thing is that your syntax is a bit wonky - I imagine that you mean:
> F<- function(x){
+ beta* x + alpha
+ }
> alpha <- 1
> beta <- 2
> optimize(F,c(10,20))
$minimum
[1] 10.00006
$objective
[1] 21.00011

How extreme values of a functional can be found using R?

I have a functional like this :
(LaTex formula: $v[y]=\int_0^2 (y'^2+23yy'+12y^2+3ye^{2t})dt$)
with given start and end conditions y(0)=-1, y(2)=18.
How can I find extreme values of this functional in R? I realize how it can be done for example in Excel but didn't find appropriate solution in R.
Before trying to solve such a task in a numerical setting, it might be better to lean back and think about it for a moment.
This is a problem typically treated in the mathematical discipline of "variational calculus". A necessary condition for a function y(t) to be an extremum of the functional (ie. the integral) is the so-called Euler-Lagrange equation, see
Calculus of Variations at Wolfram Mathworld.
Applying it to f(t, y, y') as the integrand in your request, I get (please check, I can easily have made a mistake)
y'' - 12*y + 3/2*exp(2*t) = 0
You can go now and find a symbolic solution for this differential equation (with the help of a textbook, or some CAS), or solve it numerically with the help of an R package such as 'deSolve'.
PS: Solving this as an optimization problem based on discretization is possible, but may lead you on a long and stony road. I remember solving the "brachistochrone problem" to a satisfactory accuracy only by applying several hundred variables (not in R).
Here is a numerical solution in R. First the functional:
f<-function(y,t=head(seq(0,2,len=length(y)),-1)){
len<-length(y)-1
dy<-diff(y)*len/2
y0<-(head(y,-1)+y[-1])/2
2*sum(dy^2+23*y0*dy+12*y0^2+3*y0*exp(2*t))/len
}
Now the function that does the actual optimization. The best results I got were using the BFGS optimization method, and parametrizing using dy rather than y:
findMinY<-function(points=100, ## number of points of evaluation
boundary=c(-1,18), ## boundary values
y0=NULL, ## optional initial value
method="Nelder-Mead", ## optimization method
dff=T) ## if TRUE, optimizes based on dy rather than y
{
t<-head(seq(0,2,len=points),-1)
if(is.null(y0) || length(y0)!=points)
y0<-seq(boundary[1],boundary[2],len=points)
if(dff)
y0<-diff(y0)
else
y0<-y0[-1]
y0<-head(y0,-1)
ff<-function(z){
if(dff)
y<-c(cumsum(c(boundary[1],z)),boundary[2])
else
y<-c(boundary[1],z,boundary[2])
f(y,t)
}
res<-optim(y0,ff,control=list(maxit=1e9),method=method)
cat("Iterations:",res$counts,"\n")
ymin<-res$par
if(dff)
c(cumsum(c(boundary[1],ymin)),boundary[2])
else
c(boundary[1],ymin,boundary[2])
}
With 500 points of evaluation, it only takes a few seconds with BFGS:
> system.time(yy<-findMinY(500,method="BFGS"))
Iterations: 90 18
user system elapsed
2.696 0.000 2.703
The resulting function looks like this:
plot(seq(0,2,len=length(yy)),yy,type='l')
And now a solution that numerically integrates the Euler equation.
As #HansWerner pointed out, this problem boils down to applying the Euler-Lagrange equation to the integrand in OP's question, and then solving that differential equation, either analytically or numerically. In this case the relevant ODE is
y'' - 12*y = 3/2*exp(2*t)
subject to:
y(0) = -1
y(2) = 18
So this is a boundary value problem, best approached using bvpcol(...) in package bvpSolve.
library(bvpSolve)
F <- function(t, y.in, pars){
dy <- y.in[2]
d2y <- 12*y.in[1] + 1.5*exp(2*t)
return(list(c(dy,d2y)))
}
init <- c(-1,NA)
end <- c(18,NA)
t <- seq(0, 2, by = 0.01)
sol <- bvpcol(yini = init, yend = end, x = t, func = F)
y = function(t){ # analytic solution...
b <- sqrt(12)
a <- 1.5/(4-b*b)
u <- exp(2*b)
C1 <- ((18*u + 1) - a*(exp(4)*u-1))/(u*u - 1)
C2 <- -1 - a - C1
return(a*exp(2*t) + C1*exp(b*t) + C2*exp(-b*t))
}
par(mfrow=c(1,2))
plot(t,y(t), type="l", xlim=c(0,2),ylim=c(-1,18), col="red", main="Analytical Solution")
plot(sol[,1],sol[,2], type="l", xlim=c(0,2),ylim=c(-1,18), xlab="t", ylab="y(t)", main="Numerical Solution")
It turns out that in this very simple example, there is an analytical solution:
y(t) = a * exp(2*t) + C1 * exp(sqrt(12)*t) + C2 * exp(-sqrt(12)*t)
where a = -3/16 and C1 and C2 are determined to satisfy the boundary conditions. As the plots show, the numerical and analytic solution agree completely, and also agree with the solution provided by #mrip

Errors when attempting constrained optimisation using optim()

I have been using the Excel solver to handle the following problem
solve for a b and c in the equation:
y = a*b*c*x/((1 - c*x)(1 - c*x + b*c*x))
subject to the constraints
0 < a < 100
0 < b < 100
0 < c < 100
f(x[1]) < 10
f(x[2]) > 20
f(x[3]) < 40
where I have about 10 (x,y) value pairs. I minimize the sum of abs(y - f(x)). And I can constrain both the coefficients and the range of values for the result of my function at each x.
I tried nls (without trying to impose the constraints) and while Excel provided estimates for almost any starting values I cared to provide, nls almost never returned an answer.
I switched to using optim, but I'm having trouble applying the constraints.
This is where I have gotten so far-
best = function(p,x,y){sum(abs(y - p[1]*p[2]*p[3]*x/((1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x))))}
p = c(1,1,1)
x = c(.1,.5,.9)
y = c(5,26,35)
optim(p,best,x=x,y=y)
I did this to add the first set of constraints-
optim(p,best,x=x,y=y,method="L-BFGS-B",lower=c(0,0,0),upper=c(100,100,100))
I get the error ""ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
and end up with a higher value of the error ($value). So it seems like I am doing something wrong. I couldn't figure out how to apply my other set of constraints at all.
Could someone provide me a basic idea how to solve this problem that a non-statistician can understand? I looked at a lot of posts and looked in a few R books. The R books stopped at the simplest use of optim.
The absolute value introduces a singularity:
you may want to use a square instead,
especially for gradient-based methods (such as L-BFGS).
The denominator of your function can be zero.
The fact that the parameters appear in products
and that you allow them to be (arbitrarily close to) zero
can also cause problems.
You can try with other optimizers
(complete list on the optimization task view),
until you find one for which the optimization converges.
x0 <- c(.1,.5,.9)
y0 <- c(5,26,35)
p <- c(1,1,1)
lower <- 0*p
upper <- 100 + lower
f <- function(p,x=x0,y=y0) sum(
(
y - p[1]*p[2]*p[3]*x / ( (1 - p[3]*x)*(1 - p[3]*x + p[2]*p[3]*x) )
)^2
)
library(dfoptim)
nmkb(p, f, lower=lower, upper=upper) # Converges
library(Rvmmin)
Rvmmin(p, f, lower=lower, upper=upper) # Does not converge
library(DEoptim)
DEoptim(f, lower, upper) # Does not converge
library(NMOF)
PSopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
DEopt(f, list(min=lower, max=upper))[c("xbest", "OFvalue")] # Does not really converge
library(minqa)
bobyqa(p, f, lower, upper) # Does not really converge
As a last resort, you can always use a grid search.
library(NMOF)
r <- gridSearch( f,
lapply(seq_along(p), function(i) seq(lower[i],upper[i],length=200))
)

Resources