Issue with constrOptim

Issue with constrOptim - r

When doing constrained optimization using the constrOptim function, I sometimes get the following error message:
Error in optim(theta.old, fun, gradient, control = control, method = method, :
initial value in 'vmmin' is not finite
Example
x <- c(-0.2496881061155757641767394261478330008685588836669921875,
0.0824038146359631351600683046854101121425628662109375,
0.25000000111421105675191256523248739540576934814453125)
nw <- length(x)
ui <- diag(1, nrow = nw)
ui <- rbind(ui, rep(0, nw))
ui[cbind(2:(nw + 1), 1:nw)] <- -1
ci <- rep(-0.8 / (nw + 1), nw + 1)
constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
grad = function(theta) 2 * (theta - x), ui = ui, ci = ci,
method = "BFGS")
What I know
The problem occurs during the iteration inside constrOptim, when the result comes so close to the boundary that almost all point evaluated by the BFGS optimizer are NaNs (excluding the initial point). In this case, BFGS will sometimes return an optimal value of NaN and a corresponding minimizing parameter outside the constraint set.
In constrOptim, the objective function fed to BFGS is given by
R <- function(theta, theta.old, ...) {
ui.theta <- ui %*% theta
gi <- ui.theta - ci
if (any(gi < 0)) {
return(NaN)
}
gi.old <- ui %*% theta.old - ci
bar <- sum(gi.old * log(gi) - ui.theta)
if (!is.finite(bar))
bar <- -Inf
f(theta, ...) - mu * bar
}
My question
It seems to me that the obvious solution to the problem is to simply return sign(mu) * Inf instead of NaN if there are any gi < 0, but could this fix lead to other problems?

After normalizing the gradient properly
constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
grad = function(theta) 2 / nw * (theta - x), ui = ui, ci = ci,
method = "BFGS")
I can no longer replicate the problem. It seems that the issue was caused by the wrong weighting of the gradient of the objective function and the gradient of the logarithmic barrier term in the internal gradient.
However, I still think that returning Inf outside the boundary would be more robust than returning NaN.

Related

MLE of the parameters of a PDF written as an infinite sum of terms

My question relates to the use of R for the derivation of maximum likelihood estimates of parameters when a probability distributions is expressed in the form of an infinite sum, such as the one below due to Rao, Girija et al.
I wanted to see if I could reproduce the maximum likelihood estimates obtained by these authors (who used Matlab, rather than R) when the model is applied to a given set of data. My attempt is given below, although this throws up several warnings that "longer object length is not a multiple of shorter object length". I know why I am getting this warning, but I do not know how to remedy it. How can I edit my code to overcome this?
Also, is there a better way to handle infinite sums? Here I'm just using an arbitrary large number for n (1000).
library(bbmle)
svec <- list(c=1,lambda=1)
x <- scan(textConnection("0.1396263 0.1570796 0.2268928 0.2268928 0.2443461 0.3141593 0.3839724 0.4712389 0.5235988 0.5934119 0.6632251 0.6632251 0.6981317 0.7679449 0.7853982 0.8203047 0.8377580 0.8377580 0.8377580 0.8377580 0.8726646 0.9250245 0.9773844 0.9948377 1.0122910 1.0122910 1.0646508 1.0995574 1.1170107 1.1170107 1.1170107 1.1344640 1.1344640 1.1868239 1.2217305 1.2740904 1.3613568 1.3613568 1.3613568 1.4486233 1.4486233 1.5358897 1.5358897 1.5358897 1.5707963 1.6057029 1.6057029 1.6231562 1.6580628 1.6755161 1.7104227 1.7453293 1.7976891 1.8500490 1.9722221 2.0594885 2.4085544 2.6703538 2.6703538 2.7052603 3.5604717 3.7524579 3.8920842 3.9444441 4.1364303 4.1538836 4.2411501 4.2586034 4.3633231 4.3807764 4.4854962 4.6774824 4.9741884 5.5676003 5.9864793 6.1086524"))
dL <- function(x, c,lambda,n = 1000,log=TRUE) {
k <- 0:n
r <- log(sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))))
if (log) return(r) else return(exp(r))
}
dat <- data.frame(x)
m1 <- mle2( x ~ dL(c,lambda),
data=dat,
start=svec,
control=list(parscale=unlist(svec)),
method="L-BFGS-B",
lower=c(0,0)
)

I suggest starting out with that algorithm and making a density function that can be tested for proper behavior by integrating over its range of definition, (c(0, 2*pi). You are calling it a "probability function" but that is a term that I associate with CDF's rather than density distributions (PDF's):
dL <- function(x, c=1,lambda=1,n = 1000, log=FALSE) {
k <- 0:n
r <- sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda)))
if (log) {log(r) }
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(I think you were trying to pack too much into your log-likelihood function so I decide to break apart the steps.)
When I ran that version I got a warning message from the final mle2 step that I didn't like and I thought it might be the case that this density function was occasionally returning negative values, so this was my final version:
dL <- function(x, c=1,lambda=1,n = 1000) {
k <- 0:n
r <- max( sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))), 0.00000001)
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
#------------------------
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9009665 1.1372237
Log-likelihood: -116.96
(The warning and the warning-free LL numbers were the same.)
So I guess I think you were attempting to pack too much into your definition of a log-likelihood function and got tripped up somewhere. There should have been two summations, one for the density approximation and a second one for the summation of the log-likelihood. The numbers in those summations would have been different, hence the error you were seeing. Unpacking the steps allowed success at least to the extent of not throwing errors. I'm not sure what that density represents and cannot verify correctness.
As far as the question of whether there is a better way to approximate an infinite series, the answer hinges on what is known about the rate of convergence of the partial sums, and whether you can set up a tolerance value to compare successive values and stop calculations after a smaller number of terms.
When I look at the density, it makes me wonder if it applies to some scattering process:
curve(vdL(x, c=.9, lambda=1.137), 0.00001, 2*pi)
You can examine the speed of convergence by looking at the ratios of successive terms. Here's a function that does that for the first 10 terms at an arbitrary x:
> ratios <- function(x, c=1, lambda=1) {lambda*c*(x+2*(1:11)*pi)^(-c-1)*(exp(-(x+2*(1:10)*pi)^(-c))^(lambda))/lambda*c*(x+2*(0:10)*pi)^(-c-1)*(exp(-(x+2*(0:10)*pi)^(-c))^(lambda)) }
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.05)
[1] 1.755301e-08 1.235632e-04 1.541082e-05 4.024074e-06 1.482741e-06 6.686497e-07 3.445688e-07 1.952358e-07
[9] 1.187626e-07 7.634088e-08 4.443193e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
That looks like pretty rapid convergence to me, so I'm guessing that you could use only the first 20 terms and get similar results. With 20 terms the results look like:
> integrate(vdL, 0,2*pi)
0.9924498 with absolute error < 9.3e-06
> (m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9542066 1.1098169
Log-likelihood: -117.83
Since you never attempt to interpret a LL in isolation but rather look at differences, I'm guessing that the minor difference will not affect your inferences adversely.

Using nlminb to maximize functions

I have the following function and i need it to be maximized instead of minimized.
adbudgReturn = function(Spend,a,b,c,d){
adbudgReturn = sum(b+(a-b)*((Spend^c)/(d+(Spend^c))))
return(adbudgReturn)
}
FP_param <- c(95000,0,1.15,700000)
FB_param <- c(23111.55,0,1.15,20000)
GA_param <- c(115004,1409,1.457,2000000)
y = c(0.333333,0.333333,0.333333)
TotalSpend <- function(Budget,y){
FP_clicks = adbudgReturn(Budget * y[1], FP_param[1], FP_param[2], FP_param[3], FP_param[4])
FB_clicks = adbudgReturn(Budget * y[2], FB_param[1], FB_param[2], FB_param[3], FB_param[4])
GA_clicks = adbudgReturn(Budget * y[3], GA_param[1], GA_param[2], GA_param[3], GA_param[4])
return(total = FP_clicks + FB_clicks + GA_clicks)
}
startValVec = c(0.33333,0.333333,0.3333333)
minValVec = c(0,0.2,0)
maxValVec = c(0.8,1,08)
MaxClicks_optim.parms <- nlminb(objective = TotalSpend,start = startValVec,
lower = minValVec,
upper = maxValVec,
control = list(iter.max=100000,eval.max=20000),
Budget = 10000)
I have tried adding the minus sign in front of the nlminb function i.e:
-nlminb(..)
but without any success. Any help will be appreciated.
Also i would like to add constraints so the sum of the maxValVec = 1

Other optimization functions in R such as optim() have a built-in fnscale control parameter you can use to switch from minimization to maximization (i.e. optim(..., control=list(fnscale=-1)), but nlminb doesn't appear to. So you either need to flip the sign in your original objective function, or (possibly more transparently) make a wrapper function that inverts the sign, e.g.
max_obj <- function(...) -1*TotalSpend(...)
MaxClicks_optim.parms <- nlminb(objective = max_obj,
[ .... everything else as before ... ] )
Note that the ... in the max_obj() definition are literal. The only part of the solution above that needs to be filled in is the [.... everything else as a before ...] part. To be absolutely explicit:
max_obj <- function(...) -1*TotalSpend(...)
MaxClicks_optim.parms <- nlminb(objective = max_obj,
start = startValVec,
lower = minValVec,
upper = maxValVec,
control = list(iter.max=100000,eval.max=20000),
Budget = 1e4)
If you were using a user-specified gradient argument you'd have to wrap that too.
This CV question points out that you can maximize by minimizing the negative of a function, but doesn't go into the nuts and bolts.
An optim()-based solution would look something like:
optim(fn = TotalSpend,
par = startValVec,
lower = minValVec,
upper = maxValVec,
method = "L-BFGS-B",
control = list(maxit=100000, fnscale=-1),
Budget = 1e4)
L-BFGS-B is the only method built into to optim() that does box-constrained optimization
optim() doesn't have separate controls for max iterations and max function evaluations

Here is an example with a simple parabolic function, It works the same with nlminband optim:
## ==== Some preliminaries ========================
par(mfrow=c(1,2))
a <- b <- seq(-10, 10, 0.1)
## ==== Search for a minimum ======================
# function has minimum
f1 <- function(a, b) {
(a - 1)^2 + (b - 2)^2
}
## show function, blue color is low
image(a, b, outer(a, b, f1), col=topo.colors(16))
## wrapper: combine parameters
g1 <- function(p) f1(p["a"], p["b"])
## minimization
(ret <- nlminb(c(a=0, b=0), g1))
## show minimum
points(t(ret$par), pch="+", cex=2)
## ==== Search for a maximum =======================
## function has a maximum
f2 <- function(a, b) {
- (a - 1)^2 - (b + 2)^2
}
## brown color is high
image(a, b, outer(a, b, f2), col=topo.colors(16))
## wrapper: combine parameters, invert sign
g2 <- function(p) -f2(p["a"], p["b"])
## minimization of negative objective = maximization
(ret <- nlminb(c(a=0, b=0), g2))
## show maximum
points(t(ret$par), pch="+", cex=2)

Fit Exponentially modified Gaussian to 2D data

I am trying to fit a exponentially modified gaussian (like in https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution equation (1)) to my 2D (x, y) data in R.
My data are:
x <- c(1.13669371604919, 1.14107275009155, 1.14545404911041, 1.14983117580414,
1.15421032905579, 1.15859162807465, 1.16296875476837, 1.16734790802002,
1.17172694206238, 1.17610621452332, 1.18048334121704, 1.18486452102661,
1.18924164772034, 1.19362080097198, 1.19800209999084, 1.20237922668457,
1.20675826072693, 1.21113955974579, 1.21551668643951, 1.21989583969116,
1.22427713871002, 1.22865414619446, 1.2330334186554, 1.23741245269775,
1.24178957939148, 1.24616885185242, 1.25055003166199, 1.25492715835571,
1.25930631160736, 1.26368761062622, 1.26806473731995, 1.2724437713623
)
y <- c(42384.03125, 65262.62890625, 235535.828125, 758616, 1691651.75,
3956937.25, 8939261, 20311304, 41061724, 65143896, 72517440,
96397368, 93956264, 87773568, 82922064, 67289832, 52540768, 50410896,
35995212, 27459486, 14173627, 12645145, 10069048, 4290783.5,
2999174.5, 2759047.5, 1610762.625, 1514802, 958150.6875, 593638.6875,
368925.8125, 172826.921875)
The function I am trying to fit and the value I am trying to minimize for optimization:
EMGCurve <- function(x, par)
{
ta <- 1/par[1]
mu <- par[2]
si <- par[3]
h <- par[4]
Fct.V <- (h * si / ta) * (pi/2)^0.5 * exp(0.5 * (si / ta)^2 - (x - mu)/ta)
Fct.V
}
RMSE <- function(par)
{
Fct.V <- EMGCurve(x,par)
sqrt(sum((signal - Fct.V)^2)/length(signal))
}
result <- optim(c(1, x[which.max(y)], unname(quantile(x)[4]-quantile(x)[2]), max(y)),
lower = c(1, min(x), 0.0001, 0.1*max(y)),
upper = c(Inf, max(x), 0.5*(max(x) - min(x)), max(y)),
RMSE, method="L-BFGS-B", control=list(factr=1e7))
However, when I try to vizualize the result in the end it seems like nothing usful is happening,..
plot(x,y,xlab="RT/min",ylab="I")
lines(seq(min(x),max(x),length=1000),GaussCurve(seq(min(x),max(x),length=1000),result$par),col=2)
However, for some reason it doesn't work at all, although a managed to do it for a normal distribution with similar code. Would be great if someone has an idea?

If it might be of some use, I got an OK fit to your data using an X-shifted log-normal type peak equation, "y = a * exp(-0.5 * pow((log(x-d)-b) / c, 2.0))" with parameters a = 9.4159743234392539E+07, b = -2.7516932481669185E+00, c = -2.4343893243720971E-01, and d = 1.1251623071481867E+00 yielding R-squared = 0.994 and RMSE = 2.49E06. I personally was unable to fit using the equation in your post. There may be value in scaling the dependent data as the values seem large, but this equation seems to fit the data as is.

constrained nonlinear minimization with many variables

Here is a minimization problem I've meant to solve, but no matter what form or package I try it with, it never resolves itself.
The Problem is a transportation problem with a quadratic objective function. It is formulated as follows:
Minimize f(x), with f(x) being x' * C * x, subject to the equality constraints UI * x - ci = 0.
where C is a diagonal matrix of constants, UI is matrix with the values 0, 1, -1 in order to set up the constraints.
I'll provide an example that I have tried with two functions so far, nloptr from its likewise called package and constrOptim.
Here's an example for nloptr:
require(nloptr)
objective <- function(x) {return( list( "objective" = t(x) %*% C %*% x,
"gradient" = 2* C %*% x )) }
constraints <- function(x) {return( list( "constraints" = ui %*% x - ci,
"jacobian" = ui))}
C <- diag(c(10,15,14,5,6,10,8))
ci <- c(20, -30, -10, -20, 40))
ui <- rbind( c(1,1,1,0,0,0,0),
c(-1,0,0,1,0,0,0),
c(0,-1,0,-1,1,1,0),
c(0,0,-1,0,-1,0,1),
c(0,0,0,0,0,-1,-1))
opts <- list("alorithm" = "NLOPT_GN_ISRES")
res <- nloptr( x0=x0, eval_f=objective, eval_g_eq = constraints, opts=opts)
When trying to solve this Problem with constrOptim, I face the problem that I have to provide starting values that are within the feasible region. However, I will ultimately have a lot of equations and don't really know how to set these starting points.
Here's the same example with constrOptim:
C <- diag(c(10,15,14,5,6,10,8))
ci <- c(20, -30, -10, -20, 40)
ui <- rbind( c(1,1,1,0,0,0,0),
c(-1,0,0,1,0,0,0),
c(0,-1,0,-1,1,1,0),
c(0,0,-1,0,-1,0,1),
c(0,0,0,0,0,-1,-1))
start <- c(10,10,10,0,0,0,0)
objective <- function(x) { t(x) %*% C %*% x }
gradient <- function(x) { 2 * C %*% x }
constrOptim(start, objective, gradient, ui = ui, ci = ci)

Try this:
co <- coef(lm.fit(ui, ci))
co[is.na(co)] <- 0
res <- nloptr( x0=co, eval_f=objective, eval_g_eq = constraints,
opts=list(algorithm = "NLOPT_LD_SLSQP"))
giving:
> res
Call:
nloptr(x0 = co, eval_f = objective, eval_g_eq = constraints,
opts = list(algorithm = "NLOPT_LD_SLSQP"))
Minimization using NLopt version 2.4.0
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because
xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 22
Termination conditions: relative x-tolerance = 1e-04 (DEFAULT)
Number of inequality constraints: 0
Number of equality constraints: 5
Optimal value of objective function: 37378.6963822218
Optimal value of controls: 28.62408 -29.80155 21.17747 -1.375917 -17.54977 -23.6277 -16.3723

Understanding different results of optim() and lm()

Given:
set.seed(1001)
outcome<-rnorm(1000,sd = 1)
covariate<-rnorm(1000,sd = 1)
log-likelihood of normal pdf:
loglike <- function(par, outcome, covariate){
cov <- as.matrix(cbind(1, covariate))
xb <- cov * par
(- 1/2* sum((outcome - xb)^2))
}
optimize:
opt.normal <- optim(par = 0.1,fn = loglike,outcome=outcome,cov=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
However I get different results when running an simple OLS. However maximizing log-likelihhod and minimizing OLS should bring me to a similar estimate. I suppose there is something wrong with my optimization.
summary(lm(outcome~covariate))

Umm several things... Here's a proper working likelihood function (with names x and y):
loglike =
function(par,x,y){cov = cbind(1,x); xb = cov %*% par;(-1/2)*sum((y-xb)^2)}
Note use of matrix multiplication operator.
You were also only running it with one par parameter, so it was not only broken because your loglike was doing element-element multiplication, it was only returning one value too.
Now compare optimiser parameters with lm coefficients:
opt.normal <- optim(par = c(0.1,0.1),fn = loglike,y=outcome,x=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
opt.normal$par
[1] 0.02148234 -0.09124299
summary(lm(outcome~covariate))$coeff
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02148235 0.03049535 0.7044466 0.481319029
covariate -0.09124299 0.03049819 -2.9917515 0.002842011
shazam.
Helpful hints: create data that you know the right answer for - eg x=1:10; y=rnorm(10)+(1:10) so you know the slope is 1 and the intercept 0. Then you can easily see which of your things are in the right ballpark. Also, run your loglike function on its own to see if it behaves as you expect.

Maybe you will find it usefull to see the difference between these two methods from my code. I programmed it the following way.
data.matrix <- as.matrix(hprice1[,c("assess","bdrms","lotsize","sqrft","colonial")])
loglik <- function(p,z){
beta <- p[1:5]
sigma <- p[6]
y <- log(data.matrix[,1])
eps <- (y - beta[1] - z[,2:5] %*% beta[2:5])
-nrow(z)*log(sigma)-0.5*sum((eps/sigma)^2)
}
p0 <- c(5,0,0,0,0,2)
m <- optim(p0,loglik,method="BFGS",control=list(fnscale=-1,trace=10),hessian=TRUE,z=data.matrix)
rbind(m$par,sqrt(diag(solve(-m$hessian))))
And for the lm() method I find this
m.ols <- lm(log(assess)~bdrms+lotsize+sqrft+colonial,data=hprice1)
summary(m.ols)
Also if you would like to estimate the elasticity of assessed value with respect to the lotsize or calculate a 95% confidence interval
for this parameter, you could use the following
elasticity.at.mean <- mean(hprice1$lotsize) * m$par[3]
var.coefficient <- solve(-m$hessian)[3,3]
var.elasticity <- mean(hprice1$lotsize)^2 * var.coefficient
# upper bound
elasticity.at.mean + qnorm(0.975)* sqrt(var.elasticity)
# lower bound
elasticity.at.mean + qnorm(0.025)* sqrt(var.elasticity)
A more simple example of the optim method is given below for a binomial distribution.
loglik1 <- function(p,n,n.f){
n.f*log(p) + (n-n.f)*log(1-p)
}
m <- optim(c(pi=0.5),loglik1,control=list(fnscale=-1),
n=73,n.f=18)
m
m <- optim(c(pi=0.5),loglik1,method="BFGS",hessian=TRUE,
control=list(fnscale=-1),n=73,n.f=18)
m
pi.hat <- m$par
numerical calculation of s.d
rbind(pi.hat=pi.hat,sd.pi.hat=sqrt(diag(solve(-m$hessian))))
analytical
rbind(pi.hat=18/73,sd.pi.hat=sqrt((pi.hat*(1-pi.hat))/73))
Or this code for the normal distribution.
loglik1 <- function(p,z){
mu <- p[1]
sigma <- p[2]
-(length(z)/2)*log(sigma^2) - sum(z^2)/(2*sigma^2) +
(mu*sum(z)/sigma^2) - (length(z)*mu^2)/(2*sigma^2)
}
m <- optim(c(mu=0,sigma2=0.1),loglik1,
control=list(fnscale=-1),z=aex)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Issue with constrOptim - r

Related

MLE of the parameters of a PDF written as an infinite sum of terms

Using nlminb to maximize functions

Fit Exponentially modified Gaussian to 2D data

constrained nonlinear minimization with many variables

Understanding different results of optim() and lm()

Categories

Resources