Optim: non-finite finite-difference value in L-BFGS-B - r

I'm trying to maximize a likelihood with R's 'optim.' I get the error "non-finite finite-difference value."
I'm using L-BFGS-B because I have to constrain the 11th parameter (Bernoulli "p") to be 0<=p<=1. Since I need this constraint, I can't use a nongradient method like "Nelder-Mead." Any thoughts on how I can fix this? It worked fine with simulated data!
Note that I'm using a floor function in here because discrete values are needed for the "Trials" parameters (params 1 through 10).
library(rmutil)
Nhat<-c(14335,15891,2700,1218,2213,10985,4985,8738,13878)
sdNhat<-
sqrt(c(26915344,6574096,175561,51529,71824,12166144,145924,2808976,3319684))
C<-c(313,410,38,30,69,175,132,193,240)
LL1<-vector()
LL2<-vector()
NLL <- function(data, par){
  for (i in 1:length(Nhat)){
    LL1[i] <- dnorm(Nhat[i], par[i], sdNhat[i], log=TRUE)
    LL2[i] <- dbetabinom(C[i], floor(par[i]), par[length(Nhat)+1], par[length(Nhat)+2], log=TRUE)
  }
  -1*(sum(LL1)+sum(LL2))
}
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(min(Nhat),length(Nhat)),0.0001,1),
             upper=c(rep(min(Nhat),length(Nhat)),0.9999,2))

You are getting the error because the lower and upper bounds you set for parameters 1 to 9 are identical, so the box for each of those parameters collapses to a single point. You have to adjust upper=c(rep(min(Nhat),length(Nhat)),0.9999,2) (or lower) so that each parameter gets a proper interval.
You said that only the 10th parameter (you actually wrote 11th, but I guess that's a typo) has to be bounded between 0 and 1, so this would work:
set.seed(1)
out <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
             fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
             method='L-BFGS-B',
             lower=c(rep(-Inf,length(Nhat)),0,-Inf),
             upper=c(rep(Inf,length(Nhat)),1,Inf))
out
# $par
# [1] 13660.61522882 15482.96819195 2730.66273051 1310.04511624 2077.45269032 11857.94955470
# [7] 5417.09464008 9016.57472573 14234.22972586 0.02165253 826.21691430
#
# $value
# [1] 116.2657
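If you also want finite box constraints on the first nine "Trials" parameters, the only requirement is that each lower bound be strictly below its upper bound. A minimal sketch, assuming (purely for illustration, not part of the original model) that a window of roughly +/-50% around each observed Nhat is acceptable:
# Sketch only: the +/-50% window around Nhat is an assumption, not the OP's specification
out2 <- optim(par=c(floor(Nhat*runif(length(Nhat),0.9,1.1)),0.02,3),
              fn=NLL, data=list(Nhat=Nhat,sdNhat=sdNhat,C=C),
              method='L-BFGS-B',
              lower=c(floor(Nhat*0.5), 0.0001, 1),      # lower < upper elementwise
              upper=c(ceiling(Nhat*1.5), 0.9999, Inf))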

Related

How to interpret mle() trace = 6 output and why does the mle() procedure stop after 101 iterations in R?

I am trying to estimate seven constrained parameters with the function mle() using the L-BFGS-B method in R. To investigate why I get a non-finite finite-difference value [2] error, I include control = list(trace = 6) in the mle() call, hoping to learn more about the origin of the error.
Unfortunately I do not understand the trace output very well, which makes the result surprising to me: the procedure seems to simply stop after 101 iterations without giving a proper reason.
Does anyone know why?
I suppose the seven X values reported by trace=6 are the parameter values the mle procedure has reached by this iteration. Plugging these values into my log-likelihood function gives me the same value as reported under "final value": -152.449285. When I plug in the seven X values from iteration 97 I get the same log-likelihood of -152.449285.
There are two things that seem to stand out. First, the second value of X, 0.999, is exactly the upper limit of the second parameter I estimate. Second, the second value of G seems relatively large at -412.172 compared to the other G values. What exactly does G indicate? The second values of X and G have been like this for many iterations. Does any of this give me a clue how I can potentially make the estimation work? Thanks in advance!
Since my question is about interpretation/intuition of the results, I refrained from providing a reproducible example. It's a lot of code and I do not know how to reproduce this situation with just a tiny bit of code. Please let me know if you need my code.
The final 101st iteration:
---------------- CAUCHY entered-------------------
There are 4 breakpoints
Piece 1 f1, f2 at start point -9.1069e-03 3.1139e+01
Distance to the next break point = 1.9816e+00
Distance to the stationary point = 2.9246e-04
GCP found in this segment
Piece 1 f1, f2 at start point -9.1069e-03 3.1139e+01
Distance to the stationary point = 2.9246e-04
Cauchy X = -0.749937 0.999 0.841376 1.14695 0.134673 0.121755 0.365289
---------------- exit CAUCHY----------------------
0 variables leave; 0 variables enter
6 variables are free at GCP on iteration 101
LINE SEARCH 0 times; norm of step = 0.000232633
X = -0.749896 **0.999** 0.841349 1.14697 0.134672 0.121757 0.36551
G = -0.0154393 **-412.172** -0.0621798 0.0130552 -0.00801055 0.00692317 -0.0134718
final value -152.449285
**stopped after 101 iterations**
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [2]
---- UPDATE 1 ----
I followed Roland's suggestion but first tried setting maxit to 200: control = list(maxit=200, trace=6).
The procedure now converges at the 106th iteration, yet I still get the same error as before:
iterations 106
function evaluations 127
segments explored during Cauchy searches 110
BFGS updates skipped 2
active bounds at final generalized Cauchy point 1
norm of the final projected gradient 0.0217961
final function value -152.449
X = -0.749748 0.999 0.841415 1.14687 0.134666 0.121766 0.366383
F = -152.449
final value -152.449295
converged
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [2]
---- UPDATE 2 ----
I followed the suggestion by Biswajit Banerjee in "optim in r: non finite finite difference error" and set ndeps, which ?optim describes as "A vector of step sizes for the finite-difference approximation to the gradient", to 0.0001 for the second parameter (the default is 0.001 for all parameters). Everything works fine now! I wonder whether this is related to the second parameter's value sitting at its upper limit, or to its G value being relatively large?
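For concreteness, a sketch of what that control list might look like when passed through mle(); the negative log-likelihood, start values, and bounds below are placeholder names standing in for the original seven-parameter setup, not the actual code:
library(stats4)
# one ndeps entry per parameter; only the 2nd step size is shrunk to 1e-4
ctrl <- list(maxit = 200, trace = 6, ndeps = c(1e-3, 1e-4, rep(1e-3, 5)))
fit <- mle(minuslogl = negLogLik,                    # placeholder: your negative log-likelihood
           start = startValues,                      # placeholder: named list of 7 starting values
           method = "L-BFGS-B",
           lower = lowerBounds, upper = upperBounds, # placeholders: your box constraints
           control = ctrl)                           # mle() forwards control to optim()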

Questions about boundary constraints with L-BFGS-B method in optim() in R

I am trying to use L-BFGS-B method in optim() to find out the minimum value of the following function:
ip<-function(x) log(mean(exp(return*x))), where "return" is a vector of constants.
First, I gave no boundary constraints: rst1<-optim(-1,ip,method="L-BFGS-B"), and it provided a reasonable answer (x=-118.44, ip.min=-0.00017), which could be justified by both theory and an Excel calculation. The message given in the result was
CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL.
As x must be less than zero in theory, I then added boundary constraints to the optimizer: rst2<-optim(-1,ip,method="L-BFGS-B",lower=-Inf,upper=0). However, this time it only returned the value at the initial parameter (-1), which is obviously not the minimum. The message given in the result was
CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH.
I then tried other boundary constraints, and no matter what they were, as long as any boundary constraints were added, it always returned the value at the initial parameter and failed to find the minimum.
Does anyone know why this happens? Many thanks.
example
rtntxt<-"
return
9.15051E-05
9.67217E-07
1.34187E-05
-0.000105801
0.000111004
0.000228786
3.84068E-06
0.000388639
-0.000122291
-7.73028E-05
4.97595E-05
-3.97503E-05
1.86449E-05
-0.000137739
-0.000180709
-1.07254E-05
3.89723E-05
"
rtn<-read.table(text=rtntxt,header=TRUE)
ip<-function(x) log(mean(exp(rtn$return*x)))
rst1<-optim(-1,ip,method="L-BFGS-B") #no boundaries
rst2<-optim(-1,ip,method="L-BFGS-B",lower=-Inf,upper=0) #with boundaries
plot
x<- -10000:10000
n<-length(x)
s<-numeric(n)
for(i in 1:n) s[i]<-ip(x[i])
plot(x,s)
x[which(s==min(s))] #rst1(no boundaries) is correct
min(s)
I am not sure how you got that result: if I correct your code for the misspelling, I still get similar answers, not the result you reported:
ip<-function(x) log(mean(exp(return(x))))
rst1<-optim(-1,ip,method="L-BFGS-B")
# > rst1
# $`par`
# [1] -1.820444e+13
#
# $value
# [1] -1.820444e+13
#
# $counts
# function gradient
# 20 20
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
#
rst2<-optim (-1,ip,method="L-BFGS-B",lower=-Inf,upper=0)
# $`par`
# [1] -1.80144e+13
#
# $value
# [1] -1.80144e+13
#
# $counts
# function gradient
# 3 3
#
# $convergence
# [1] 0
#
# $message
# [1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
Moreover, to check whether there could be a mistake in my code, I tried to plot the values of your function for -1:-100000, but it does not look like there exists an optimum where you say there is. Check your code/post, and if you know approximately where the optimum value is, try to plot it graphically (that would be my advice). Cheers!
plot(x = -1:-100000, y = ip(-1:-100000))

Calculating the maximum of a PDF

I have the following PDF in the form
f(x) = 3(1-x)^2 for x in [0,1]
I would like to calculate the maximum of f(x).
I've done the following:
x <- seq(0, 1, by = 0.1)   # grid of x values over [0, 1]
integral <- function(x) {
  3*x*(1-x)^2
}
max(integral(x))
I can't figure out where I've gone wrong...
The discrepancy is due to the coarse grid used in the numerical approximation. If you reduce the step size you will find your result:
max(integral(seq(0,1,by=.1))) # your first answer
max(integral(seq(0,1,by=.0001))) # what you are looking for
However, I'd rather use an optimisation procedure to converge towards the maximum:
optimise(integral, lower=0, upper=1, maximum = TRUE)
$maximum
[1] 0.2499993 # this is the value of x causing f to be at its max
$objective
[1] 2.109375 # this is the max value

Gamma function returns unstable value?

The gamma function should not take a negative integer as an argument. Look at the code below, where something strange happens. Is this some problem with R?
I was using the function optim to optimize a function containing:
gamma(sum(alpha))
with respect to alpha. R returns negative alpha.
> gamma(sum(alpha))
[1] 3.753e+14
> sum(alpha)
[1] -3
> gamma(-3)
[1] NaN
Warning message:
In gamma(-3) : NaNs produced
Can somebody explain? Or any suggestion for the optimization?
Thanks!
Gamma function is "not defined" at negative integer argument values so R returns Not a Number (NaN). The reason of the "strange" behaviour is decimal representation of numbers in R. In case the number differs from the nearest integer not very much, R rounds it during printing (in fact when you type alpha, R is calling for print(alpha). Please see the examples of such a behaviour below.
gamma(-3)
# [1] NaN
# Warning message:
# In gamma(-3) : NaNs produced
x <- -c(1, 2, 3) / 2 - 1e-15
x
# [1] -0.5 -1.0 -1.5
sum(x)
# [1] -3
gamma(sum(x))
# [1] 5.361428e+13
curve(gamma, xlim = c(-3.5, -2.5))
The curve() call above plots the gamma function near -3 and shows how it blows up on either side of that pole.
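A minimal console check (nothing beyond base R and the x defined above) that makes the rounding-on-print point explicit:
print(sum(x), digits = 17)   # something like -3.0000000000000031, not exactly -3
sum(x) == -3                 # FALSE: the tiny offset is invisible with default printing
gamma(-3 - 1e-15)            # huge but finite, because the argument is not exactly -3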

R minimize absolute error

Here's my setup
obs1<-c(1,1,1)
obs2<-c(0,1,2)
obs3<-c(0,0,3)
absoluteError<-function(obs,x){
return(sum(abs(obs-x)))
}
Example:
> absoluteError(obs2,1)
[1] 2
For a random vector of observations, I'd like to find a minimizer, x, which minimizes the absolute error between the observation values and a vector of all x. For instance, clearly the argument that minimizes absoluteError(obs1,x) is x=1 because this results in an error of 0. How do I find a minimizer for a random vector of observations? I'd imagine this is a linear programming problem, but I've never implemented one in R before.
The median of obs is a minimizer for the absolute error. The following is a sketch of how one might try proving this:
Let the median of a set of n observations, obs, be m. Call the absolute error between obs and m f(obs,m).
Case n is odd:
Consider f(obs,m+delta) where delta is some non-zero number. Suppose delta is positive; then there are (n-1)/2 + 1 observations (those at or below m) whose individual errors each increase by delta, while each of the remaining (n-1)/2 observations' errors decreases by at most delta. So f(obs,m+delta)-f(obs,m) >= delta > 0. (The same argument can be made if delta is negative.) Thus f(obs,m+delta) > f(obs,m) for any non-zero delta, so m is the unique minimizer in this case.
Case n is even:
Basically the same logic as above, except in this case any number between the two innermost numbers in the set (inclusive) will be a minimizer.
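A quick numerical check of the claim, reusing the absoluteError() function and the obs vectors defined above:
absoluteError(obs2, median(obs2))   # 2   at the median (x = 1)
absoluteError(obs2, 1.5)            # 2.5, strictly worse
absoluteError(obs3, median(obs3))   # 3   at the median (x = 0)
absoluteError(obs3, 0.5)            # 3.5, strictly worse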
I am not sure this answer is correct, and even if it is I am not sure this is what you want. Nevertheless, I am taking a stab at it.
I think you are talking about 'Least absolute deviations', a form of regression that differs from 'Least Squares'.
If so, I found this R code for solving Least absolute deviations regression:
# beta0 = c(intercept, slope); x and y are the predictor and response vectors
fabs = function(beta0, x, y){
  b0 = beta0[1]
  b1 = beta0[2]
  n = length(x)
  llh = 0
  for(i in 1:n){
    r2 = (y[i] - b0 - b1*x[i])   # residual for observation i
    llh = llh + abs(r2)          # sum of absolute residuals
  }
  llh
}
g = optim(c(1,1), fabs, x = x, y = y)
I found the code here:
http://www.stat.colostate.edu/~meyer/hw12ans.pdf
Assuming you are talking about Least absolute deviations, you might not be interested in the above code if you want a solution in R from scratch rather than a solution that uses optim.
The above code is for a regression line with an intercept and one slope. I modified the code as follows to handle a regression with just an intercept:
y <- c(1,1,1)
x <- 1:length(y)
fabs = function(beta0, x, y){
  b0 = beta0[1]
  b1 = 0          # slope fixed at 0: intercept-only model
  n = length(x)
  llh = 0
  for(i in 1:n){
    r2 = (y[i] - b0 - b1*x[i])
    llh = llh + abs(r2)
  }
  llh
}
# The commands to get the estimator
g = optim(c(1), fabs, x = x, y = y, method = 'Brent', lower = (min(y)-5), upper = (max(y)+5))
g
I was not familiar with (i.e., had not heard of) Least absolute deviations until tonight. So, hopefully my modifications are fairly reasonable.
With y <- c(1,1,1) the parameter estimate is 1 (which I think you said is the correct answer):
$par
[1] 1
$value
[1] 1.332268e-15
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,1,2) the parameter estimate is 1:
$par
[1] 1
$value
[1] 2
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
With y <- c(0,0,3) the parameter estimate is 0 (which you said is the correct answer):
$par
[1] 8.613159e-10
$value
[1] 3
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL
If you want R code from scratch, there is additional R code in the file at the link above which might be helpful.
Alternatively, perhaps it might be possible to extract the relevant code from the source file.
Alternatively, perhaps someone else can provide the desired code (and correct any errors on my part) in the next 24 hours.
If you come up with code from scratch please post it as an answer as I would love to see it myself.
lad = function(x, y){
  # sum of absolute deviations for the line beta[1] + beta[2]*x
  SAD = function(beta, x, y) {
    return(sum(abs(y - (beta[1] + beta[2] * x))))
  }
  # least-squares fit supplies the starting values
  d = lm(y ~ x)
  ans1 = optim(par = c(d$coefficients[1], d$coefficients[2]), method = "Nelder-Mead", fn = SAD, x = x, y = y)
  coe = setNames(ans1$par, c("(Intercept)", substitute(x)))
  fitted = setNames(ans1$par[1] + ans1$par[2]*x, c(1:length(x)))
  res = setNames(y - fitted, c(1:length(x)))
  results = list(coefficients = coe, fitted.values = fitted, residuals = res)
  class(results) = "lad"
  return(results)
}
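A possible usage sketch for this lad() function; the data below are made up purely for illustration:
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)    # toy data, not from the original question
fit <- lad(x, y)
fit$coefficients                # LAD intercept and slope
head(fit$residuals)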
