R: Nonlinear optimization with equality constraints

I have a TxN matrix M and a Nx1 weight vector w, where sum(w)=1.
I need to find the w that maximises the number of positive elements of Mw.
If no w makes every element of Mw positive, then the w achieving the largest possible number of positive elements is desired.
More formally, denoting by M_t the t-th row of M, I need
max_w Sum_t I(M_t w)
subject to 1'w = 1,
where 1 is the vector of ones and the function I(x) returns 1 if x is positive and 0 otherwise.
Note: dividing the objective function by T, it can be read as the empirical frequency with which a weighted sum of random variables is positive.
Data can be simulated as follows:
N <- 4
set.seed(1)
M <- matrix(rnorm(200), ncol=N)   # T = 50 rows
w <- as.matrix(rep(1/N, N))       # equal-weight starting point
one <- as.matrix(rep(1, N))
The objective function and the constraint are:
I <- function(r) as.numeric(r>0)            # indicator of a positive element
f <- function(x) -sum(sapply(M %*% x, I))   # minus the count of positive elements (nloptr minimises)
h <- function(x) t(one) %*% x - 1           # equality constraint 1'x - 1 = 0
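A quick sanity check (my addition) of these definitions at the equal-weight starting point:
f(w)   # minus the number of rows of M with a positive weighted sum at w
h(w)   # 0, since the equal weights already sum to 1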
Now in R one possibility could be nloptr:
library(nloptr)
local_opts <- list("algorithm" = "NLOPT_LN_AUGLAG_EQ",
                   "xtol_rel"  = 1.0e-7)
opts <- list("algorithm"   = "NLOPT_LN_AUGLAG_EQ",
             "xtol_rel"    = 1.0e-7,
             "maxeval"     = 100000,
             "local_opts"  = local_opts,
             "print_level" = 2)
nloptr(x0 = as.vector(w), eval_f = f, eval_g_eq = h, opts = opts)
I get:
NLopt solver status: -4 ( NLOPT_ROUNDOFF_LIMITED: Roundoff errors led to a
breakdown of the optimization algorithm. In this case, the returned minimum may
still be useful.
Number of Iterations....: 17530
Termination conditions: xtol_rel: 1e-07 maxeval: 1e+05
[...]
Current value of objective function: -33
Current value of controls: 0.5175225 0.1124845 0.1598906 0.2101024
Note that 17530 iterations are far less than maxeval (100000).
I don't know how to correctly use NLOPT_GN_ISRES, which could potentially give speed improvements. Replacing NLOPT_LN_AUGLAG_EQ with NLOPT_GN_ISRES gives:
NLopt solver status: -2 ( NLOPT_INVALID_ARGS: Invalid arguments (e.g. lower
bounds are bigger than upper bounds, an unknown algorithm was specified,
etcetera). )
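My guess is that, being a global algorithm, ISRES needs finite lower and upper bounds on x, but I am not sure the call below is correct (the [-1, 1] box is just an assumption on my part):
opts_isres <- list("algorithm" = "NLOPT_GN_ISRES",
                   "xtol_rel"  = 1.0e-7,
                   "maxeval"   = 100000)
nloptr(x0 = as.vector(w), eval_f = f,
       lb = rep(-1, N), ub = rep(1, N),   # assumed bounds; global algorithms seem to require them
       eval_g_eq = h, opts = opts_isres)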
I am new to nloptr, so I would like to know the following.
As I understand it, my result for f (-33) is reliable to within a tolerance of 1.0e-7. Is this correct?
How can I tell whether the value for w is unique?
What is the syntax for NLOPT_GN_ISRES?
The objective f contains an implicit if condition; can I still use a (numerical) gradient?
Are other packages, such as alabama, better at this type of problem?
Update: Changed the starting point to w.

Related

MLE of the parameters of a PDF written as an infinite sum of terms

My question relates to the use of R for deriving maximum likelihood estimates of parameters when a probability distribution is expressed as an infinite sum, such as the one below due to Rao, Girija et al.
I wanted to see if I could reproduce the maximum likelihood estimates obtained by these authors (who used Matlab rather than R) when the model is applied to a given set of data. My attempt is given below, although it throws up several warnings that "longer object length is not a multiple of shorter object length". I know why I am getting this warning, but I do not know how to remedy it. How can I edit my code to overcome this?
Also, is there a better way to handle infinite sums? Here I'm just using an arbitrary large number for n (1000).
library(bbmle)
svec <- list(c=1,lambda=1)
x <- scan(textConnection("0.1396263 0.1570796 0.2268928 0.2268928 0.2443461 0.3141593 0.3839724 0.4712389 0.5235988 0.5934119 0.6632251 0.6632251 0.6981317 0.7679449 0.7853982 0.8203047 0.8377580 0.8377580 0.8377580 0.8377580 0.8726646 0.9250245 0.9773844 0.9948377 1.0122910 1.0122910 1.0646508 1.0995574 1.1170107 1.1170107 1.1170107 1.1344640 1.1344640 1.1868239 1.2217305 1.2740904 1.3613568 1.3613568 1.3613568 1.4486233 1.4486233 1.5358897 1.5358897 1.5358897 1.5707963 1.6057029 1.6057029 1.6231562 1.6580628 1.6755161 1.7104227 1.7453293 1.7976891 1.8500490 1.9722221 2.0594885 2.4085544 2.6703538 2.6703538 2.7052603 3.5604717 3.7524579 3.8920842 3.9444441 4.1364303 4.1538836 4.2411501 4.2586034 4.3633231 4.3807764 4.4854962 4.6774824 4.9741884 5.5676003 5.9864793 6.1086524"))
dL <- function(x, c, lambda, n = 1000, log = TRUE) {
  k <- 0:n
  r <- log(sum(lambda * c * (x + 2*k*pi)^(-c - 1) * (exp(-(x + 2*k*pi)^(-c)))^lambda))
  if (log) return(r) else return(exp(r))
}
dat <- data.frame(x)
m1 <- mle2(x ~ dL(c, lambda),
           data = dat,
           start = svec,
           control = list(parscale = unlist(svec)),
           method = "L-BFGS-B",
           lower = c(0, 0))
I suggest starting out with that algorithm and making a density function that can be tested for proper behaviour by integrating over its range of definition, (0, 2*pi). You are calling it a "probability function", but that is a term I associate with CDFs rather than with densities (PDFs):
dL <- function(x, c = 1, lambda = 1, n = 1000, log = FALSE) {
  k <- 0:n
  r <- sum(lambda * c * (x + 2*k*pi)^(-c - 1) * (exp(-(x + 2*k*pi)^(-c)))^lambda)
  if (log) log(r) else r
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(I think you were trying to pack too much into your log-likelihood function, so I decided to break the steps apart.)
When I ran that version, the final mle2 step produced a warning message that I didn't like, and I thought the density function might occasionally be returning negative values, so this was my final version:
dL <- function(x, c = 1, lambda = 1, n = 1000) {
  k <- 0:n
  max(sum(lambda * c * (x + 2*k*pi)^(-c - 1) * (exp(-(x + 2*k*pi)^(-c)))^lambda),
      0.00000001)
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
#------------------------
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9009665 1.1372237
Log-likelihood: -116.96
(The warning and the warning-free LL numbers were the same.)
So I think you were attempting to pack too much into your definition of a log-likelihood function and got tripped up somewhere. There should have been two summations: one for the density approximation and a second for the summation of the log-likelihood. The vectors in those summations have different lengths, hence the warning you were seeing. Unpacking the steps allowed success, at least to the extent of not throwing warnings. I'm not sure what that density represents and cannot verify correctness.
As far as the question of whether there is a better way to approximate an infinite series, the answer hinges on what is known about the rate of convergence of the partial sums, and whether you can set up a tolerance value to compare successive values and stop calculations after a smaller number of terms.
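For illustration, a minimal sketch of that idea (the function name, the 1e-12 cutoff and the kmax cap are my own choices, not from the original post): accumulate terms until the next increment is negligible relative to the running sum.
dL_tol <- function(x, c = 1, lambda = 1, tol = 1e-12, kmax = 10000) {
  s <- 0
  for (k in 0:kmax) {
    term <- lambda * c * (x + 2*k*pi)^(-c - 1) * (exp(-(x + 2*k*pi)^(-c)))^lambda
    s <- s + term
    if (term < tol * s) break   # terms decrease rapidly (see the ratios below)
  }
  s
}
vdL_tol <- Vectorize(dL_tol)
# integrate(vdL_tol, 0, 2*pi)   # should be close to the n = 1000 result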
When I look at the density, it makes me wonder if it applies to some scattering process:
curve(vdL(x, c=.9, lambda=1.137), 0.00001, 2*pi)
You can examine the speed of convergence by looking at the ratios of successive terms. Here's a function that does that for the first 10 terms at an arbitrary x:
> ratios <- function(x, c=1, lambda=1) {lambda*c*(x+2*(1:11)*pi)^(-c-1)*(exp(-(x+2*(1:10)*pi)^(-c))^(lambda))/lambda*c*(x+2*(0:10)*pi)^(-c-1)*(exp(-(x+2*(0:10)*pi)^(-c))^(lambda)) }
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.05)
[1] 1.755301e-08 1.235632e-04 1.541082e-05 4.024074e-06 1.482741e-06 6.686497e-07 3.445688e-07 1.952358e-07
[9] 1.187626e-07 7.634088e-08 4.443193e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
That looks like pretty rapid convergence to me, so I'm guessing that you could use only the first 20 terms and get similar results. With 20 terms the results look like:
> integrate(vdL, 0,2*pi)
0.9924498 with absolute error < 9.3e-06
> (m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9542066 1.1098169
Log-likelihood: -117.83
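(For reference, my guess at how the 20-term run above was set up; this step is not shown in the original:)
## my assumption: simply cap the summation at n = 20 and rebuild the pieces
vdL <- Vectorize(function(x, c = 1, lambda = 1) dL(x, c, lambda, n = 20))
LL  <- function(x, c, lambda) -sum(log(vdL(x, c, lambda)))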
Since you never attempt to interpret a LL in isolation but rather look at differences, I'm guessing that the minor difference will not affect your inferences adversely.

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
Need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L)   # .... not your situation
    #....
else {
    r <- as.double(drop(xacf))
    z <- .Fortran(C_eureka, as.integer(order.max), r, r,
                  coefs = double(order.max^2), vars = double(order.max),
                  double(order.max))
    coefs <- matrix(z$coefs, order.max, order.max)
    partialacf <- array(diag(coefs), dim = c(order.max, 1L, 1L))
    var.pred <- c(r[1L], z$vars)
    #.......
    order <- if (aic)
        (0L:order.max)[xaic == 0L]
    else order.max
    ar <- if (order)
        coefs[order, seq_len(order)]
    else numeric()
    var.pred <- var.pred[order + 1L]
    var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I'm finding it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think is returning the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
      subroutine eureka (lr,r,g,f,var,a)
c
c      solves Toeplitz matrix equation toep(r)f=g(1+.)
c      by Levinson's algorithm
c      a is a workspace of size lr, the number
c      of equations
c
snipped
c      estimate the innovations variance
         var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
         if (l .eq. lr) return
         d = 0.0d0
         q = 0.0d0
         do 50 i = 1, l
            k = l-i+2
            d = d + a(i)*r(k)
            q = q + f(l,i)*r(k)
   50    continue
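If it helps, here is a small R reconstruction of var.pred for the AR(1), demean = FALSE fit in the question (this is my own sketch of the recursion and rescaling above, so please verify it against fit$var.pred):
n  <- length(temp)
r0 <- sum(temp^2) / n               # lag-0 autocovariance (demean = FALSE)
r1 <- sum(temp[-1] * temp[-n]) / n  # lag-1 autocovariance
phi <- r1 / r0                      # Yule-Walker AR(1) coefficient (= fit$ar)
v   <- r0 * (1 - phi^2)             # var(1) from the Levinson recursion above
v * n / (n - (1 + 1))               # the n.used/(n.used - (order + 1)) rescaling
## this should match fit$var.pred; note it is just the questioner's first
## "YW estimate" multiplied by n/(n - 2)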

Portfolio optimization

I am trying to build a portfolio which is optimized with respect to another in R.
I am trying to minimize the objective function
$$\min_w \; \operatorname{Var}(return_p - return'\,weight_{bm})$$
with the constraints
$$ 1_n'w = 1$$
$$w > .005$$
$$w < .8$$
with w being the weight vector of the portfolio being optimized. There are 10 securities, so I set the benchmark weights at .1 each.
I know that
$$\operatorname{Var}(r'w - r'w_{bm}) = \operatorname{Var}(r'w) + \operatorname{Var}(r'w_{bm}) - 2\operatorname{Cov}(r'w,\, r'w_{bm}),$$
and since $\operatorname{Var}(r'w_{bm})$ does not depend on $w$ it can be dropped, leaving
$$\operatorname{Var}(r'w) - 2\operatorname{Cov}(r'w,\, r'w_{bm}) = w'\operatorname{Var}(r)\,w - 2\operatorname{Cov}(r'w,\, r'w_{bm}) = w'\operatorname{Var}(r)\,w - 2\operatorname{Cov}(r',\, r'w_{bm})\,w.$$
The last term is in the form solve.QP needs, so I tried to solve this with solve.QP in R; the constraints are giving me a problem, though.
Here is my code:
library(quadprog)
assets <- 10   # 10 securities, as described above
obs <- 100     # number of return observations (not specified above; chosen arbitrarily here)
trackport <- array(rnorm(obs * assets, mean = .2, sd = .15),
                   dim = c(obs, assets))  # the portfolio the assets are tracked against
wbm <- matrix(rep(1/assets, assets))      # benchmark weights (.1 each)
Aeq <- t(matrix(rep(1, assets), nrow = assets, ncol = 1))  # row of 1's for the weight sum
Beq <- 1                                   # weights should sum to 1
H <- 2 * cov(trackport)                    # times 2 because of the solve.QP syntax
# multiply the returns by the benchmark weights to create the benchmark return series
rbm <- trackport %*% wbm
# covariance between the tracking portfolio and benchmark returns
eff <- cov(trackport, rbm)
# constraints
Amatrix <- t(matrix(c(Aeq, diag(assets), -diag(assets)), ncol = assets, byrow = TRUE))
Bvector <- matrix(c(1, rep(.005, assets), rep(.8, assets)))
# solve
zeros <- rep(0, assets)   # zero linear term: reduces to the min-variance portfolio
                          # for troubleshooting purposes
solQP3 <- solve.QP(Dmat = H,
                   dvec = zeros,
                   Amat = Amatrix,
                   bvec = Bvector,
                   meq = 1)
The error I am getting is "constraints are inconsistent, no solution!", but I can't find what's wrong with my A matrix.
My (transposed) A matrix looks like this:
[1,1,...,1]
[1,0,...,0]
[0,1,...,0]
...
[0,0,...,1]
[-1,0,...,0]
[0,-1,...,0]
...
[0,0,...,-1]
and my $b_0$ looks like this
[1]
[.005]
[.005]
...
[.005]
[.8]
[.8]
...
[.8]
So I'm not sure why it isn't finding a solution. Could anyone take a look?
I'm not familiar with the package, but just took a quick look at https://cran.r-project.org/web/packages/quadprog/quadprog.pdf, which apparently is what you are using.
Your RHS values of .8 should be -0.8 because this function uses ≥ inequalities. So you have been constraining the variables to be ≥ .005 and ≤ -0.8, which of course is not what you want, and is infeasible.
So leave transposed A as is and make
b0:
[1]
[.005]
[.005]
...
[.005]
[-.8]
[-.8]
...
[-.8]
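In code, the only change is the sign of the upper-bound entries in the right-hand-side vector (Amatrix stays exactly as in the question):
# -I %*% w >= -.8 encodes w <= .8; the .005 lower bounds are unchanged
Bvector <- matrix(c(1, rep(.005, assets), rep(-.8, assets)))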

constrained nonlinear minimization with many variables

Here is a minimization problem I've been meaning to solve, but no matter what form or package I try it with, it never gets solved.
The Problem is a transportation problem with a quadratic objective function. It is formulated as follows:
Minimize f(x), with f(x) being x' * C * x, subject to the equality constraints UI * x - ci = 0.
where C is a diagonal matrix of constants and UI is a matrix with entries 0, 1, and -1 that sets up the constraints.
I'll provide an example that I have tried with two functions so far: nloptr from the package of the same name, and constrOptim.
Here's an example for nloptr:
require(nloptr)

objective <- function(x) {
  list("objective" = t(x) %*% C %*% x,
       "gradient"  = 2 * C %*% x)
}
constraints <- function(x) {
  list("constraints" = ui %*% x - ci,
       "jacobian"    = ui)
}

C <- diag(c(10, 15, 14, 5, 6, 10, 8))
ci <- c(20, -30, -10, -20, 40)
ui <- rbind(c( 1,  1,  1,  0,  0,  0,  0),
            c(-1,  0,  0,  1,  0,  0,  0),
            c( 0, -1,  0, -1,  1,  1,  0),
            c( 0,  0, -1,  0, -1,  0,  1),
            c( 0,  0,  0,  0,  0, -1, -1))

opts <- list("algorithm" = "NLOPT_GN_ISRES")
# x0: starting values -- choosing them is part of my problem, see below
res <- nloptr(x0 = x0, eval_f = objective, eval_g_eq = constraints, opts = opts)
When trying to solve this Problem with constrOptim, I face the problem that I have to provide starting values that are within the feasible region. However, I will ultimately have a lot of equations and don't really know how to set these starting points.
Here's the same example with constrOptim:
C <- diag(c(10, 15, 14, 5, 6, 10, 8))
ci <- c(20, -30, -10, -20, 40)
ui <- rbind(c( 1,  1,  1,  0,  0,  0,  0),
            c(-1,  0,  0,  1,  0,  0,  0),
            c( 0, -1,  0, -1,  1,  1,  0),
            c( 0,  0, -1,  0, -1,  0,  1),
            c( 0,  0,  0,  0,  0, -1, -1))
start <- c(10, 10, 10, 0, 0, 0, 0)
objective <- function(x) { t(x) %*% C %*% x }
gradient  <- function(x) { 2 * C %*% x }
constrOptim(start, objective, gradient, ui = ui, ci = ci)
Try this: lm.fit finds a least-squares solution of ui %*% x = ci, and replacing the NA coefficients with 0 gives a starting point that satisfies the equality constraints:
co <- coef(lm.fit(ui, ci))
co[is.na(co)] <- 0
res <- nloptr(x0 = co, eval_f = objective, eval_g_eq = constraints,
              opts = list(algorithm = "NLOPT_LD_SLSQP"))
giving:
> res
Call:
nloptr(x0 = co, eval_f = objective, eval_g_eq = constraints,
opts = list(algorithm = "NLOPT_LD_SLSQP"))
Minimization using NLopt version 2.4.0
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because
xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 22
Termination conditions: relative x-tolerance = 1e-04 (DEFAULT)
Number of inequality constraints: 0
Number of equality constraints: 5
Optimal value of objective function: 37378.6963822218
Optimal value of controls: 28.62408 -29.80155 21.17747 -1.375917 -17.54977 -23.6277 -16.3723
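A quick check (my addition, not part of the original answer): the equality constraints should be satisfied, i.e. close to zero, at the returned solution.
round(ui %*% res$solution - ci, 6)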

Why doesn't solve.QP and portfolio.optim generate identical results?

The documentation for portfolio.optim {tseries} says that solve.QP {quadprog} is used to generate the solution for finding the tangency portfolio that maximizes the Sharpe ratio. That implies the results should be identical with either function. I'm probably overlooking something, but in this simple example I get similar, yet not identical, solutions for the optimal portfolio weights from portfolio.optim and solve.QP. Shouldn't the results be identical? If so, where am I going wrong? Here's the code:
library(tseries)
library(quadprog)
# 1. Generate solution with solve.QP via: comisef.wikidot.com/tutorial:tangencyportfolio
# create artifical data
set.seed(1)
nO <- 100 # number of observations
nA <- 10 # number of assets
mData <- array(rnorm(nO * nA, mean = 0.001, sd = 0.01), dim = c(nO, nA))
rf <- 0.0001 # riskfree rate (2.5% pa)
mu <- apply(mData, 2, mean) # means
mu2 <- mu - rf # excess means
# qp
aMat <- as.matrix(mu2)
bVec <- 1
zeros <- array(0, dim = c(nA,1))
solQP <- solve.QP(cov(mData), zeros, aMat, bVec, meq = 1)
# rescale variables to obtain weights
w <- as.matrix(solQP$solution/sum(solQP$solution))
# 2. Generate solution with portfolio.optim (using artificial data from above)
port.1 <- portfolio.optim(mData, riskless=rf)
port.1.w <- port.1$pw
port.1.w <- matrix(port.1.w)
# 3. Compare portfolio weights from the two methodologies:
compare <- cbind(w, port.1$pw)
compare
[,1] [,2]
[1,] 0.337932967 0.181547633
[2,] 0.073836572 0.055100415
[3,] 0.160612441 0.095800361
[4,] 0.164491490 0.102811562
[5,] 0.005034532 0.003214622
[6,] 0.147473396 0.088792283
[7,] -0.122882875 0.000000000
[8,] 0.127924865 0.067705050
[9,] 0.026626940 0.012507530
[10,] 0.078949672 0.054834759
The one and only way to deal with such situations is to browse the source. In your case, it is accessible via tseries:::portfolio.optim.default.
Now, to find the difference between those two calls, we may narrow down the issue by defining an equivalent helper function:
foo <- function(x, pm = mean(x), covmat = cov(x), riskless = FALSE, rf = 0)
{
  x <- mData
  pm <- mean(x)
  covmat <- cov(x)
  k <- dim(x)[2]
  Dmat <- covmat
  dvec <- rep.int(0, k)
  a1 <- colMeans(x) - rf
  a2 <- matrix(0, k, k)
  diag(a2) <- 1
  b2 <- rep.int(0, k)
  Amat <- t(rbind(a1, a2))
  b0 <- c(pm - rf, b2)
  solve.QP(Dmat, dvec, Amat, bvec = b0, meq = 1)$sol
}
identical(portfolio.optim(mData, riskless=TRUE, rf=rf)$pw,
foo(mData, riskless=TRUE, rf=rf))
#[1] TRUE
With that, you can see that (1) riskless=rf is not the intended usage; riskless=TRUE, rf=rf is the correct one; and (2) there are several differences in Amat and bvec.
I am not an expert in portfolio optimization, so I do not know what's the explanation behind these additional constraints and if they should be there in the first place, but at least you can see what exactly causes the difference.
The difference in your example occurs due to the default value 'shorts=FALSE' in tseries::portfolio.optim(). Therefore you would have to either change the argument or add a non-negativity restriction in your solve.QP problem to reach the same results.
EDIT: While the answer still holds true, there seem to be some other odd default values in tseries::portfolio.optim(). For example, it sets the minimum return requirement to pm = mean(x), which yields an arbitrary portfolio on the efficient frontier instead of the global minimum variance portfolio when no return requirement is given. Bottom line: stay with your quadprog::solve.QP solution. Below is an example of the wrapper function I use (I just started working with R, and while I am quite sure this delivers mathematically correct results, it might not be the cleanest piece of code):
# --------------------------------------------------------------------------
#' Quadratic Optimization
#' @description Wrapper for quadratic optimization to calculate the general
#'   mean-variance portfolio.
#' @param S [matrix] Covariance matrix.
#' @param mu [numeric] Optional. Vector of expected returns.
#' @param wmin [numeric] Optional. Min weight per asset.
#' @param wmax [numeric] Optional. Max weight per asset.
#' @param mu_target [numeric] Optional. Required return; if empty the optimization returns the global minimum variance portfolio.
#' @return Returns the mean-variance portfolio or the global minimum variance portfolio.
# --------------------------------------------------------------------------
meanvar.pf <- function(S,
                       mu = NULL,
                       wmin = -1000,
                       wmax = 1000,
                       mu_target = NULL) {
  if (!try(require(quadprog)))
    stop("Execute 'install.packages('quadprog')' and try again")
  if (missing(S))
    stop("Covariance matrix is missing")
  if (!is.null(mu) & dim(S)[1] != length(mu))
    stop("S and mu have non-conformable dimensions")
  N <- ncol(S)
  if (wmin >= 1/N)
    stop("wmin >= 1/N is not feasible")
  if (wmax <= 1/N)
    stop("wmax <= 1/N is not feasible")
  meq <- 1
  bvec <- c(1, rep(wmin, N), -rep(wmax, N))
  Amat <- cbind(rep(1, N), diag(N), -diag(N))
  if (!is.null(mu_target)) {
    if (is.null(mu))
      stop("Vector of asset returns is missing")
    Amat <- cbind(mu, Amat)
    bvec <- c(mu_target, bvec)
    meq <- 2
  }
  result <- quadprog::solve.QP(Dmat = S,
                               dvec = rep(0, N),
                               Amat = Amat,
                               bvec = bvec,
                               meq = meq)
  return(result)
}
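For instance, a usage sketch of my own (not part of the original answer), reusing mData from the question above to get a long-only global minimum variance portfolio:
S <- cov(mData)
gmv <- meanvar.pf(S, wmin = 0, wmax = 1)
round(gmv$solution, 4)   # weights, summing to 1

## adding a (hypothetical) return target turns it into a mean-variance portfolio:
## meanvar.pf(S, mu = colMeans(mData), wmin = 0, wmax = 1, mu_target = 0.001)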
