Constrained nonlinear minimization with many variables in R

Here is a minimization problem I have been trying to solve, but no matter what form or package I try, it never converges.
The problem is a transportation problem with a quadratic objective function. It is formulated as follows:
Minimize f(x) = x' * C * x, subject to the equality constraints ui * x - ci = 0,
where C is a diagonal matrix of constants and ui is a matrix with entries 0, 1 and -1 that encodes the constraints.
So far I have tried two functions: nloptr, from the package of the same name, and constrOptim.
Here's the example with nloptr:
require(nloptr)

objective <- function(x) {
  return(list("objective" = t(x) %*% C %*% x,
              "gradient"  = 2 * C %*% x))
}

constraints <- function(x) {
  return(list("constraints" = ui %*% x - ci,
              "jacobian"    = ui))
}

C  <- diag(c(10, 15, 14, 5, 6, 10, 8))
ci <- c(20, -30, -10, -20, 40)
ui <- rbind(c( 1,  1,  1,  0,  0,  0,  0),
            c(-1,  0,  0,  1,  0,  0,  0),
            c( 0, -1,  0, -1,  1,  1,  0),
            c( 0,  0, -1,  0, -1,  0,  1),
            c( 0,  0,  0,  0,  0, -1, -1))

x0   <- rep(0, 7)  # arbitrary starting point (not specified in the original)
opts <- list("algorithm" = "NLOPT_GN_ISRES")
res  <- nloptr(x0 = x0, eval_f = objective, eval_g_eq = constraints, opts = opts)
When trying to solve this problem with constrOptim, I face the issue that I have to provide starting values that lie within the feasible region (see the check after the code below). However, I will ultimately have a lot of equations and don't really know how to choose such starting points.
Here's the same example with constrOptim:
C  <- diag(c(10, 15, 14, 5, 6, 10, 8))
ci <- c(20, -30, -10, -20, 40)
ui <- rbind(c( 1,  1,  1,  0,  0,  0,  0),
            c(-1,  0,  0,  1,  0,  0,  0),
            c( 0, -1,  0, -1,  1,  1,  0),
            c( 0,  0, -1,  0, -1,  0,  1),
            c( 0,  0,  0,  0,  0, -1, -1))
start <- c(10, 10, 10, 0, 0, 0, 0)

objective <- function(x) { t(x) %*% C %*% x }
gradient  <- function(x) { 2 * C %*% x }

constrOptim(start, objective, gradient, ui = ui, ci = ci)
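To see why a given start fails, check the feasibility condition that constrOptim enforces (a small sketch using the objects above):

# constrOptim requires ui %*% theta - ci > 0 at the starting point;
# any non-positive entry means `start` is infeasible
ui %*% start - ci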

Try this: a least-squares solution of ui %*% x = ci (via lm.fit, with aliased coefficients set to 0) provides a starting point consistent with the equality constraints, and the gradient-based SLSQP algorithm handles those constraints directly:

co <- coef(lm.fit(ui, ci))
co[is.na(co)] <- 0
res <- nloptr(x0 = co, eval_f = objective, eval_g_eq = constraints,
              opts = list(algorithm = "NLOPT_LD_SLSQP"))
giving:
> res
Call:
nloptr(x0 = co, eval_f = objective, eval_g_eq = constraints,
opts = list(algorithm = "NLOPT_LD_SLSQP"))
Minimization using NLopt version 2.4.0
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because
xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 22
Termination conditions: relative x-tolerance = 1e-04 (DEFAULT)
Number of inequality constraints: 0
Number of equality constraints: 5
Optimal value of objective function: 37378.6963822218
Optimal value of controls: 28.62408 -29.80155 21.17747 -1.375917 -17.54977 -23.6277 -16.3723
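As a sanity check (a small sketch using the objects above), the returned solution should satisfy the equality constraints up to the solver tolerance:

ui %*% res$solution - ci  # all entries should be approximately 0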

Related

Get wrong answer when doing MLE on gamma parameters

I first want to sample 100 gamma-distributed numbers with shape = 2 and scale = 1/2. I wrote down the log-likelihood function and negated it, since I'm using a minimization tool to maximize. I also tried optim, but to no avail; both optim and nlm gave me different answers. This is my code so far:
N     <- 100
shape <- 2
scale <- 1/2
Data  <- rgamma(N, shape, scale)

LogL <- function(x) {
  k     <- x[1]
  theta <- x[2]
  (-1) * (N * k * log(theta) + (k - 1) * sum(log(Data)) - theta * sum(Data))
}
nlm(LogL, c(1.5, 1))
The log-likelihood above omits the -N*lgamma(k) term of the gamma density, which is why the estimates come out wrong. The simplest fix is to let dgamma compute the log-density:

logL <- function(x) -sum(dgamma(Data, x[1], x[2], log = TRUE))

N     <- 100
shape <- 2
scale <- 1/2
Data  <- rgamma(N, shape, scale)  # note: the third positional argument of rgamma is rate, not scale

optim(c(1.5, 1), logL)$par
nlm(logL, c(1.5, 1))$estimate
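For a cross-check, MASS::fitdistr fits the same model by maximum likelihood (a sketch assuming the MASS package is installed; it reports shape and rate, so scale = 1/rate):

library(MASS)
fitdistr(Data, "gamma")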

nlm function fails with analytic Hessian

Some background: the nlm function in R is a general-purpose optimization routine that uses Newton's method. To optimize a function, Newton's method requires the function itself as well as its first and second derivatives (the gradient vector and the Hessian matrix, respectively). In R, the nlm function allows you to supply R functions that compute the gradient and Hessian, or you can leave them unspecified and numerical approximations are used instead. More accurate solutions can usually be found by supplying analytic gradient and Hessian functions, so it's a useful feature.
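For reference, a minimal sketch of that mechanism on a toy quadratic (not part of the problem below): nlm picks up analytic derivatives from attributes named "gradient" and "hessian" on the returned objective value.

fq <- function(x) {
  val <- sum((x - 2)^2)                         # simple quadratic, minimum at (2, 2)
  attr(val, "gradient") <- 2 * (x - 2)          # analytic gradient
  attr(val, "hessian")  <- 2 * diag(length(x))  # analytic Hessian
  val
}
nlm(fq, p = c(0, 0))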
My problem: the nlm function is slower and often fails to converge in a reasonable amount of time when the analytic Hessian is supplied. I'm guessing this is some sort of bug in the underlying code, but I'd be happy to be wrong. Is there a way to make nlm work better with an analytic Hessian matrix?
Example: my R code below demonstrates this problem using a logistic regression example, where
log(Pr(Y=1)/Pr(Y=0)) = b0 + Xb
where X is a multivariate normal of dimension N by p and b is a vector of coefficients of length p.
library(mvtnorm)

# example demonstrating a problem with NLM
expit <- function(mu) { 1 / (1 + exp(-mu)) }

mk.logit.data <- function(N, p){
  set.seed(1232)
  U = matrix(runif(p * p), nrow = p, ncol = p)
  S = 0.5 * (U + t(U)) + p * diag(rep(1, p))
  X = rmvnorm(N, mean = runif(p, -1, 1), sigma = S)
  Design = cbind(rep(1, N), X)
  beta = sort(sample(c(rep(0, p), runif(1))))
  y = rbinom(N, 1, expit(Design %*% beta))
  list(X = X, y = as.numeric(y), N = N, p = p)
}
# function to calculate gradient vector at given coefficient values
logistic_gr <- function(beta, y, x, min = TRUE){
  mu = beta[1] + x %*% beta[-1]
  p = length(beta)
  n = length(y)
  D = cbind(rep(1, n), x)
  gri = matrix(nrow = n, ncol = p)
  for(j in 1:p){
    gri[, j] = D[, j] * (exp(-mu) * y - 1 + y) / (1 + exp(-mu))
  }
  gr = apply(gri, 2, sum)
  if(min) gr = -gr
  gr
}
# function to calculate Hessian matrix at given coefficient values
logistic_hess <- function(beta, y, x, min = TRUE){
  # allow to fail with NA, NaN, Inf values
  mu = beta[1] + x %*% beta[-1]
  p = length(beta)
  n = length(y)
  D = cbind(rep(1, n), x)
  h = matrix(nrow = p, ncol = p)
  for(j in 1:p){
    for(k in 1:p){
      h[j, k] = -sum(D[, j] * D[, k] * exp(-mu) / (1 + exp(-mu))^2)
    }
  }
  if(min) h = -h
  h
}
# function to calculate likelihood (up to a constant) at given coefficient values
logistic_ll <- function(beta, y, x, gr = FALSE, he = FALSE, min = TRUE){
  mu = beta[1] + x %*% beta[-1]
  lli = log(expit(mu)) * y + log(1 - expit(mu)) * (1 - y)
  ll = sum(lli)
  if(is.na(ll) | is.infinite(ll)) ll = -1e16
  if(min) ll = -ll
  # the below specification is required for using analytic gradient/Hessian in nlm function
  if(gr) attr(ll, "gradient") <- logistic_gr(beta, y = y, x = x, min = min)
  if(he) attr(ll, "hessian") <- logistic_hess(beta, y = y, x = x, min = min)
  ll
}
First example, with p=3:
dat = mk.logit.data(N=100, p=3)
The glm function estimates are for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3
> 0.00981465 0.01068939 0.04417671 0.01625381
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] 0.009814547 0.010689396 0.044176627 0.016253966
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
> [1] 0.009827701 0.010687278 0.044178416 0.016255630
But the problem becomes apparent when p is larger; here it is 10:
dat = mk.logit.data(N=100, p=10)
Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3 dat$X4 dat$X5 dat$X6 dat$X7
> -0.07071882 -0.08670003 0.16436630 0.01130549 0.17302058 0.03821008 0.08836471 -0.16578959
> dat$X8 dat$X9 dat$X10
> -0.07515477 -0.08555075 0.29119963
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] -0.07071879 -0.08670005 0.16436632 0.01130550 0.17302057 0.03821009 0.08836472
> [8] -0.16578958 -0.07515478 -0.08555076 0.29119967
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate
> [1] 0.31602065 -0.06185190 0.10775381 -0.16748897 0.05032156 0.34176104 0.02118631
> [8] -0.01833671 -0.20364929 0.63713991 0.18390489
Edit: I should also add that I have confirmed that I have the correct Hessian matrix through multiple different approaches.
I tried the code, but at first it seemed to be using a different rmvnorm than the one I could find on CRAN. I found one rmvnorm in the dae package, then one in the mvtnorm package. The latter is the one to use.
nlm() was patched about the time of the above posting. I'm currently trying to verify the patches and it now seems to work OK. Note that I am author of a number of R's optimization codes, including 3/5 in optim().
nashjc at uottawa.ca
Code is below.
Revised code:
# The definitions of expit, mk.logit.data, logistic_gr, logistic_hess and
# logistic_ll are repeated verbatim from the question above and are omitted here.
##!!!! NOTE: Must have this library loaded
library(mvtnorm)
dat = mk.logit.data(N=100, p=3)
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
dat = mk.logit.data(N=100, p=10)
# Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate

R: Nonlinear optimization with equality constraints

I have a TxN matrix M and an Nx1 weight vector w, where sum(w) = 1.
I need to find the w which maximises the number of positive elements in Mw.
If there is no single such w, the maximum possible value is desired.
More formally, denoting by M_t the t-th row of M, I need

max_w Sum_t I(M_t w)  subject to  1'w = 1,

where 1 is the vector of ones and the indicator I(x) returns 1 if x is positive and 0 otherwise.
Note: dividing the objective function by T, the summands can be thought of as frequencies of a weighted sum of random variables.
Data can be simulated as follows:
N <- 4
set.seed(1)
M <- matrix(rnorm(200), ncol=N)
w <- as.matrix(rep(1/N, N))
one <- as.matrix(rep(1, N))
The objective function and the constraint are:
I <- function(r) as.numeric(r>0)
f <- function(x) -sum(sapply(M %*% x, I))
h <- function(x) t(one) %*% x - 1
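A quick sanity check of these definitions at the equal-weight start (a small sketch using the objects above):

f(as.vector(w))  # minus the number of positive elements of M %*% w
h(as.vector(w))  # 0 when the weights sum to one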
In R, one possibility is nloptr:

library(nloptr)
local_opts <- list("algorithm" = "NLOPT_LN_AUGLAG_EQ",
                   "xtol_rel"  = 1.0e-7)
opts <- list("algorithm"   = "NLOPT_LN_AUGLAG_EQ",
             "xtol_rel"    = 1.0e-7,
             "maxeval"     = 100000,
             "local_opts"  = local_opts,
             "print_level" = 2)
nloptr(x0 = as.vector(w), eval_f = f, eval_g_eq = h, opts = opts)
I get:
NLopt solver status: -4 ( NLOPT_ROUNDOFF_LIMITED: Roundoff errors led to a
breakdown of the optimization algorithm. In this case, the returned minimum may
still be useful. )
Number of Iterations....: 17530
Termination conditions: xtol_rel: 1e-07 maxeval: 1e+05
[...]
Current value of objective function: -33
Current value of controls: 0.5175225 0.1124845 0.1598906 0.2101024
Note that 17530 iterations are far fewer than maxeval (100000).
I don't know how to correctly use NLOPT_GN_ISRES, which could potentially give speed improvements. Replacing NLOPT_LN_AUGLAG_EQ with NLOPT_GN_ISRES gives:
NLopt solver status: -2 ( NLOPT_INVALID_ARGS: Invalid arguments (e.g. lower
bounds are bigger than upper bounds, an unknown algorithm was specified,
etcetera). )
I am new to nloptr, so I would like to know the following:
1. As I understand it, my result for f (-33) is reliable to a tolerance of 1.0e-7. Is this correct?
2. How can I tell whether the value for w is unique?
3. What is the syntax for NLOPT_GN_ISRES?
4. The objective f contains an implicit if condition; can I still use a (numerical) gradient?
5. Are other packages, such as alabama, better at this type of problem?
Update: Changed the starting point to w.
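Regarding the NLOPT_GN_ISRES syntax: NLopt's global algorithms, ISRES included, require finite lower and upper bounds on every variable, which is what triggers NLOPT_INVALID_ARGS above. A minimal sketch (the bounds are illustrative assumptions, not part of the original problem):

# ISRES needs finite box bounds on all variables
res <- nloptr(x0 = as.vector(w), eval_f = f, eval_g_eq = h,
              lb = rep(-1, N), ub = rep(1, N),
              opts = list("algorithm" = "NLOPT_GN_ISRES",
                          "xtol_rel"  = 1.0e-7,
                          "maxeval"   = 100000))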

Issue with constrOptim

When doing constrained optimization using the constrOptim function, I sometimes get the following error message:
Error in optim(theta.old, fun, gradient, control = control, method = method, :
initial value in 'vmmin' is not finite
Example
x <- c(-0.2496881061155757641767394261478330008685588836669921875,
        0.0824038146359631351600683046854101121425628662109375,
        0.25000000111421105675191256523248739540576934814453125)
nw <- length(x)
ui <- diag(1, nrow = nw)
ui <- rbind(ui, rep(0, nw))
ui[cbind(2:(nw + 1), 1:nw)] <- -1
ci <- rep(-0.8 / (nw + 1), nw + 1)
constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
            grad = function(theta) 2 * (theta - x), ui = ui, ci = ci,
            method = "BFGS")
What I know
The problem occurs during the iteration inside constrOptim, when the result comes so close to the boundary that almost all points evaluated by the BFGS optimizer are NaN (excluding the initial point). In this case, BFGS will sometimes return an optimal value of NaN and a corresponding minimizing parameter outside the constraint set.
In constrOptim, the objective function fed to BFGS is given by
R <- function(theta, theta.old, ...) {
  ui.theta <- ui %*% theta
  gi <- ui.theta - ci
  if (any(gi < 0)) {
    return(NaN)
  }
  gi.old <- ui %*% theta.old - ci
  bar <- sum(gi.old * log(gi) - ui.theta)
  if (!is.finite(bar))
    bar <- -Inf
  f(theta, ...) - mu * bar
}
My question
It seems to me that the obvious solution to the problem is to simply return sign(mu) * Inf instead of NaN if there are any gi < 0, but could this fix lead to other problems?
After normalizing the gradient properly,

constrOptim(theta = rep(0, nw), f = function(theta) mean((theta - x)^2),
            grad = function(theta) 2 / nw * (theta - x), ui = ui, ci = ci,
            method = "BFGS")
I can no longer replicate the problem. It seems that the issue was caused by the wrong weighting of the gradient of the objective function and the gradient of the logarithmic barrier term in the internal gradient.
However, I still think that returning Inf outside the boundary would be more robust than returning NaN.
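A cheap way to catch such scaling mistakes is to compare the analytic gradient against a numerical one (a sketch, assuming the numDeriv package and the objects defined in the example above):

library(numDeriv)
f_obj <- function(theta) mean((theta - x)^2)
g_obj <- function(theta) 2 / nw * (theta - x)
theta0 <- rep(0, nw)
max(abs(g_obj(theta0) - grad(f_obj, theta0)))  # should be numerically zero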

Optimizing for Vector Using Optimize R

I want to construct my own optimization using R's optimization functions.
The objective function is the diversification ratio, which I want to maximize (I hope it's correct):
div.ratio <- function(weight, vol, cov.mat){
  dr <- (t(weight) %*% vol) / sqrt(t(weight) %*% cov.mat %*% weight)
  return(-dr)
}
An example:
rm(list = ls())
require(RCurl)
sit = getURLContent('https://github.com/systematicinvestor/SIT/raw/master/sit.gz',
                    binary = TRUE, followlocation = TRUE, ssl.verifypeer = FALSE)
con = gzcon(rawConnection(sit, 'rb'))
source(con)
close(con)

load.packages('quantmod')
data <- new.env()
tickers <- spl("VTI,VGK,VWO,GLD,VNQ,TIP,TLT,AGG,LQD")
getSymbols(tickers, src = 'yahoo', from = '1980-01-01', env = data, auto.assign = T)
for(i in ls(data)) data[[i]] = adjustOHLC(data[[i]], use.Adjusted = T)
bt.prep(data, align = 'remove.na', dates = '1990::2013')

prices <- data$prices[, -10]  # don't include cash
ret <- na.omit(prices / mlag(prices) - 1)
vol <- apply(ret, 2, sd)
cov.mat <- cov(ret)

optimize(div.ratio,
         weight,
         vol = vol,
         cov.mat = cov.mat,
         lower = 0,  # min constraint
         upper = 1,  # max constraint
         tol = 0.00001)$minimum
I get the following error message, which seems to suggest that optimize doesn't do vector optimization. What did I do wrong?
Error in t(weight) %*% cov.mat : non-conformable arguments
First of all, weight has no reason to be in the optimize call if that's what you are trying to solve for.
Then, optimize is for one-dimensional optimization, while you are trying to solve for a vector of weights. You could use the optim function instead.
Regarding your second question in the comments (how do you constrain the weights to sum to 1?), you can use the trick proposed here: How to set parameters' sum to 1 in constrained optimization. That is, rewrite your objective function as follows:
div.ratio <- function(weight, vol, cov.mat){
  weight <- weight / sum(weight)
  dr <- (t(weight) %*% vol) / sqrt(t(weight) %*% cov.mat %*% weight)
  return(-dr)
}
This gives:
out <- optim(par     = rep(1 / length(vol), length(vol)),  # initial guess: equal weights
             fn      = div.ratio,
             vol     = vol,
             cov.mat = cov.mat,
             method  = "L-BFGS-B",
             lower   = 0,
             upper   = 1)
Your optimal weights:
opt.weights <- out$par / sum(out$par)
# [1] 0.154271776 0.131322307 0.073752360 0.030885856 0.370706931 0.049627627
# [7] 0.055785740 0.126062746 0.007584657
pie(opt.weights, names(vol))
