quadprog optimization - r

Here's an interesting puzzle.
Below is an R snippet that identifies the tangency point of a quadratic function with respect to a line drawn from the point (0,rf) on the y-axis.
For those familiar with portfolio theory, this point lies in return-risk space, and the solution is the set of weights that defines the tangency portfolio (maximum Sharpe ratio). The snippet allows negative weights (i.e. shorts), and there is one equality constraint, which requires that the weights sum to 1.
# create artificial data
nO <- 100 # number of observations
nA <- 10 # number of assets
mData <- array(rnorm(nO * nA, mean = 0.001, sd = 0.01), dim = c(nO, nA))
rf <- 0.0001 # riskfree rate (2.5% pa)
mu <- apply(mData, 2, mean) # means
mu2 <- mu - rf # excess means
# qp
library(quadprog) # provides solve.QP
aMat <- as.matrix(mu2)
bVec <- 1 # set expectation of portfolio excess return to 1
zeros <- array(0, dim = c(nA,1))
solQP <- solve.QP(cov(mData), zeros, aMat, bVec, meq = 1)
# rescale variables to obtain weights
w <- as.matrix(solQP$solution/sum(solQP$solution))
# compute sharpe ratio
SR <- t(w) %*% mu2 / sqrt(t(w) %*% cov(mData) %*% w)
My question: how can I adapt the code to solve for the optimal weights such that they sum to an arbitrary number (including the corner case of a self-financing portfolio, where the weights sum to 0) rather than to unity?
Alternatively, one might consider adding a 'cash' element to the covariance matrix with zero variance and covariances, and adding an equality constraint requiring the weight on cash to equal 1. However, this matrix would be only positive semi-definite, not positive definite as solve.QP requires. I also suspect the non-cash weights might be trivially zero.

Let us first explain why this
actually produces the maximum Sharpe ratio portfolio.
We want w to maximize w' mu / sqrt( w' V w ).
But that quantity is unchanged if we multiply w by a number
(it is "homogeneous of degree 0"):
we can therefore impose w' mu = 1, and the problem
of maximizing 1 / sqrt( w' V w ) is equivalent
to minimizing w' V w.
The maximum Sharpe ratio portfolio is not unique: the solutions form a ray (all positive multiples of one solution).
If we want the weights to sum up to 1 (or any other non-zero number),
we just have to rescale them.
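As a quick check of the scale-invariance argument, here is a minimal sketch on synthetic data. The seed, the dimensions and the helper name sharpe are made up for illustration; the closed form used for the constrained minimum is the standard Lagrangian solution, not something taken from the question's code.

```r
set.seed(42)
mData <- matrix(rnorm(100 * 5, mean = 0.001, sd = 0.01), 100, 5)  # synthetic returns
V     <- cov(mData)
mu2   <- colMeans(mData) - 0.0001        # excess means

sharpe <- function(w) drop(crossprod(w, mu2) / sqrt(t(w) %*% V %*% w))

# Minimizing w' V w subject to w' mu2 = 1 gives w proportional to V^{-1} mu2
w_raw <- solve(V, mu2)
w_raw <- w_raw / sum(w_raw * mu2)        # now w_raw' mu2 = 1

sr_raw    <- sharpe(w_raw)
sr_scaled <- sharpe(3 * w_raw)           # same ratio after positive rescaling
sr_equal  <- sharpe(rep(1 / 5, 5))       # equal weights, for comparison
```

Note that the invariance only holds for positive multipliers: if sum(w_raw) happens to be negative, dividing by it flips the sign of the Sharpe ratio.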
If we want the weights to sum up to 0,
we can add that constraint to the problem
-- it only works because the constraint is also homogeneous of degree 0.
You will still need to rescale the weights, e.g., to be 100% long and 100% short.
solQP <- solve.QP(cov(mData), zeros,
  cbind(aMat, 1), # two constraints: w' mu2 = 1 and sum(w) = 0
  c(bVec, 0),
  meq = 2
)
# Let us compare with another solver
library(Rsolnp)
V <- cov(mData)
r <- solnp(
  rep(1/length(mu), length(mu)),
  function(w) - t(w) %*% mu2 / sqrt( t(w) %*% V %*% w ),
  eqfun = function(w) sum(w),
  eqB = 0,
  LB = rep(-1, length(mu))
)
solQP$solution / r$pars # constant
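For a cross-check that needs no solver at all, the two equality constraints can also be handled with the textbook Lagrangian formula w = V^{-1} A (A' V^{-1} A)^{-1} b. The sketch below uses synthetic data, and the names A, b, w0 and w_ls are illustrative rather than taken from the code above.

```r
set.seed(1)
mData <- matrix(rnorm(100 * 5, mean = 0.001, sd = 0.01), 100, 5)  # synthetic returns
V   <- cov(mData)
mu2 <- colMeans(mData) - 0.0001

A <- cbind(mu2, 1)   # columns: excess-return constraint, sum-of-weights constraint
b <- c(1, 0)         # w' mu2 = 1 and sum(w) = 0

ViA <- solve(V, A)                            # V^{-1} A
w0  <- drop(ViA %*% solve(t(A) %*% ViA, b))   # minimizer of w' V w under both constraints

# Rescale to 100% long and 100% short
w_ls <- w0 / sum(w0[w0 > 0])
```

The raw weights w0 are large because the expected excess return is forced to 1; only their proportions matter.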

Looking at the link you have included, the role of aMat, bVec and meq = 1 inside the solve.QP call is apparently to fix the value of the numerator (your return) in the Sharpe ratio formula, so that the optimization focuses on minimizing the denominator. It is perfectly legitimate to fix the numerator; it is like fixing the total size of your portfolio. Your portfolio can later be scaled up or down and will keep the same Sharpe ratio. To convince yourself, run your code above for any value of bVec (granted, other than zero) and you will get the same weights w and the same Sharpe ratio SR.
So I feel you might be misinterpreting the notion of "portfolio weights". They are ratios representing what your portfolio is made of, and they should sum to one. Once you have found the optimal weights, which you already did, you are free to scale your portfolio to whatever level you want, just multiply w by the current value you want for your portfolio.

This is not a good technique for long-only portfolios. Even portfolios that can short stocks end up with allocation weights of the wrong sign after normalizing by the sum of the weights.
These situations arise with negative excess returns. Forcing w'mu = 1 puts the solution to the left of the origin (negative risk) in these cases.
nA = 2 # two assets
mu2 = c(-.1,.1) # one negative excess return
Dmat = matrix(c(1,0,0,10),2,2)
aMat <- as.matrix(mu2)
bVec <- 1 # set expectation of portfolio excess return to 1
zeros <- array(0, dim = c(nA,1))
solQP <- solve.QP(Dmat, zeros, aMat, bVec, meq = 1)
rawW = solQP$solution
cat('\nraw weights ', rawW)
netW = rawW/sum(rawW)
cat('\nnormalized weights ', netW)
portfReturn = sum(netW*mu2)
cat('\nportfolio excess return ', portfReturn)


Cholesky Decomposition of a random exponential correlation matrix in R

I have a set of exponential correlation matrices created using the following code.
for (j in 1:n)
for(k in 1:n)
and now I want to get their Cholesky decomposition. But many of these are negative definite. How could I resolve this?
The exponential correlation matrix used in spatial or temporal modeling has a factor alpha that controls the speed of decay:
exp(- alpha * (x[i] - x[j]) ^ 2)
You have fixed this factor at 1, but in practice it is estimated from data.
Note that alpha is necessary to ensure numerical positive definiteness. This matrix is in principle positive definite, but numerically not if alpha is not large enough for a fast decay.
Given that x <- runif(n, 0, 1), the distance between x[i] and x[j] is clustered in a short range [0, 1]. This is not a big range to see a decay in correlation, and maybe you want to try alpha = 10000.
Alternatively if you want to stay with alpha = 1, you need to make distance more spread out. Try x <- runif(n, 0, 100). The decay is very fast, even with alpha = 1.
So we see a duality between distance and alpha. This is also the reason why such a correlation matrix can be used stably in statistical modeling: when alpha is to be estimated, it can adapt to the distances, so that the correlation matrix stays numerically positive definite.
f <- function (xi, xj, alpha) exp(- alpha * (xi - xj) ^ 2)
n <- 100
# large alpha, small distance
x <- runif(n, 0, 1)
A <- outer(x, x, f, alpha = 10000)
R <- chol(A)
# small alpha, large distance
x <- runif(n, 0, 100)
A <- outer(x, x, f, alpha = 1)
R <- chol(A)
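To see the numerical side of this, one can look at the smallest eigenvalue of the matrix for a slow and a fast decay. The sketch below uses evenly spaced points instead of runif so that it is reproducible; the helper names min_eig and chol_ok are made up for illustration.

```r
f <- function(xi, xj, alpha) exp(-alpha * (xi - xj)^2)
x <- seq(0, 1, length.out = 100)   # evenly spaced points, for reproducibility

# Smallest eigenvalue of the correlation matrix for a given alpha
min_eig <- function(alpha) {
  A <- outer(x, x, f, alpha = alpha)
  min(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
}

# TRUE if chol() succeeds for a given alpha
chol_ok <- function(alpha) {
  A <- outer(x, x, f, alpha = alpha)
  !inherits(try(chol(A), silent = TRUE), "try-error")
}

e_slow  <- min_eig(1)       # slow decay: smallest eigenvalue at round-off level
e_fast  <- min_eig(10000)   # fast decay: smallest eigenvalue well away from zero
ok_fast <- chol_ok(10000)
```

With alpha = 1 the rows of the matrix are nearly identical, which is what makes it numerically singular even though it is positive definite in exact arithmetic.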
Try using this to construct the positive definite matrix

Maximum Likelihood Estimation for three-parameter Weibull distribution in r

I want to estimate the scale, shape and threshold parameters of a 3p Weibull distribution.
What I've done so far is the following:
Referring to this post, Fitting a 3 parameter Weibull distribution in R
I've used the functions
EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers
llik.weibull <- function(shape, scale, thres, x) {
  sum(dweibull(x - thres, shape, scale, log=TRUE))
}
thetahat.weibull <- function(x) {
  if(any(x <= 0)) stop("x values must be positive")
  toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
  mu = mean(log(x))
  sigma2 = var(log(x))
  shape.guess = 1.2 / sqrt(sigma2)
  scale.guess = exp(mu + (0.572 / shape.guess))
  thres.guess = 1
  res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower=EPS)
  c(shape=res$par[1], scale=res$par[2], thres=res$par[3])
}
to "pre-estimate" my Weibull parameters, such that I can use them as initial values for the argument "start" in the "fitdistr" function of the MASS-Package.
You might ask why I want to estimate the parameters twice... reason is that I need the variance-covariance-matrix of the estimates which is also estimated by the fitdistr function.
thres <- 450
dat <- rweibull(1000, 2.78, 750) + thres
pre_mle <- thetahat.weibull(dat)
library(MASS)
my_wb <- function(x, shape, scale, thres) {
  dweibull(x - thres, shape, scale)
}
ml <- fitdistr(dat, densfun = my_wb,
               start = list(shape = round(pre_mle[1], digits = 0),
                            scale = round(pre_mle[2], digits = 0),
                            thres = round(pre_mle[3], digits = 0)))
> ml
      shape        scale        thres
    2.942548   779.997177   419.996196
 (  0.152129) ( 32.194294) ( 28.729323)
> ml$vcov
shape scale thres
shape 0.02314322 4.335239 -3.836873
scale 4.33523868 1036.472551 -889.497580
thres -3.83687258 -889.497580 825.374029
This works quite well for cases where the shape parameter is above 1. Unfortunately my approach should deal with the cases where the shape parameter could be smaller than 1.
The reason why this is not possible for shape parameters that are smaller than 1 is described here: http://www.weibull.com/hotwire/issue148/hottopics148.htm
There, under "Case 1: All three parameters are unknown", the following is said:
"Define the smallest failure time of ti to be tmin. Then when γ → tmin, ln(tmin - γ) → -∞. If β is less than 1, then (β - 1)ln(tmin - γ) goes to +∞ . For a given solution of β, η and γ, we can always find another set of solutions (for example, by making γ closer to tmin) that will give a larger likelihood value. Therefore, there is no MLE solution for β, η and γ."
This makes a lot of sense. For this very reason I want to do it the way they described it on this page.
"In Weibull++, a gradient-based algorithm is used to find the MLE solution for β, η and γ. The upper bound of the range for γ is arbitrarily set to be 0.99 of tmin. Depending on the data set, either a local optimal or 0.99tmin is returned as the MLE solution for γ."
I want to set a feasible interval for gamma (in my code called 'thres') such that the solution is between (0, .99 * tmin).
Does anyone have an idea how to solve this problem?
In the function fitdistr there seems to be no opportunity doing a constrained MLE, constraining one parameter.
Another way to go could be to estimate the asymptotic variance via the outer product of the score vectors. The score vector could be taken from the function thetahat.weibull(x) used above. But calculating the outer product manually (without a ready-made function) seems to be very time consuming and does not solve the problem of the constrained ML estimation.
It's not too hard to set up a constrained MLE. I'm going to do this in bbmle::mle2; you could also do it in stats4::mle, but bbmle has some additional features.
The larger issue is that it's theoretically difficult to define the sampling variance of an estimate when it's on the boundary of the allowed space; the theory behind Wald variance estimates breaks down. You can still calculate confidence intervals by likelihood profiling ... or you could bootstrap. I ran into a variety of optimization issues when doing this; I haven't really thought about whether there are specific reasons for them.
Reformat three-parameter Weibull function for mle2 use (takes x as first argument, takes log as an argument):
dweib3 <- function(x, shape, scale, thres, log=TRUE) {
  dweibull(x - thres, shape, scale, log=log)
}
Starting function (slightly reformatted):
weib3_start <- function(x) {
  mu <- mean(log(x))
  sigma2 <- var(log(x))
  logshape <- log(1.2 / sqrt(sigma2))
  logscale <- mu + (0.572 / exp(logshape)) # log of exp(mu + 0.572 / shape.guess)
  logthres <- log(0.5*min(x))
  list(logshape = logshape, logsc = logscale, logthres = logthres)
}
Generate data:
dat <- data.frame(x=rweibull(1000, 2.78, 750) + 450)
Fit model: I'm fitting the parameters on the log scale for convenience and stability, but you could use boundaries at zero as well.
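library(bbmle)
tmin <- log(0.99*min(dat$x))
m1 <- mle2(x~dweib3(exp(logshape),exp(logsc),exp(logthres)),
           data = dat,
           upper = c(logshape = Inf, logsc = Inf, logthres = tmin),
           start = weib3_start(dat$x),
           method = "L-BFGS-B")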
vcov(m1), which should normally provide a variance-covariance estimate (unless the estimate is on the boundary, which is not the case here) gives NaN values ... not sure why without more digging.
tmpf <- function(x,y) m1@minuslogl(logshape=x,
s1 <- curve3d(tmpf,
h <- lme4:::hessian(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1))
vv <- solve(h)
diag(vv) ## [1] 0.002672240 0.001703674 0.004674833
(se <- sqrt(diag(vv))) ## standard errors
## [1] 0.05169371 0.04127558 0.06837275
cov2cor(vv) ## correlation matrix
## [,1] [,2] [,3]
## [1,] 1.0000000 0.8852090 -0.8778424
## [2,] 0.8852090 1.0000000 -0.9616941
## [3,] -0.8778424 -0.9616941 1.0000000
This is the variance-covariance matrix of the log-scaled variables. To convert it to the variance-covariance matrix on the original scale, multiply element (i, j) by exp(x_i)*exp(x_j), where x is the vector of log-scale estimates (i.e. scale by the derivatives of the transformation exp(x)).
outer(exp(coef(m1)),exp(coef(m1))) * vv
## logshape logsc logthres
## logshape 0.02312803 4.332993 -3.834145
## logsc 4.33299307 1035.966372 -888.980794
## logthres -3.83414498 -888.980794 824.831463
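The rescaling is the usual delta-method formula J V J', where J is the (diagonal) Jacobian of exp() at the estimates. A small self-contained sketch, with made-up log-scale estimates and covariance used purely for illustration, shows that the elementwise form and the matrix form agree:

```r
# Hypothetical log-scale estimates and covariance, for illustration only
est_log <- c(logshape = log(2), logsc = log(10))
vv_log  <- matrix(c(0.04, 0.01,
                    0.01, 0.09), 2, 2,
                  dimnames = list(names(est_log), names(est_log)))

J <- diag(exp(est_log))                                    # Jacobian of exp() at the estimates
vcov_jac   <- J %*% vv_log %*% J                           # matrix form: J V J'
vcov_outer <- outer(exp(est_log), exp(est_log)) * vv_log   # elementwise form
```

This is a first-order approximation, so it is only as good as the log-scale covariance it starts from.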
I don't know why this doesn't work with numDeriv; I would be very careful with the variance estimates above. (Maybe too close to the boundary for Richardson extrapolation to work?)
library(numDeriv)
grad(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1)) ## looks OK
The profiles look OK ... (we have to supply std.err because the Hessian isn't invertible)
pp <- profile(m1,std.err=c(0.01,0.01,0.01))
confint(pp)
## 2.5 % 97.5 %
## logshape 0.9899645 1.193571
## logsc 6.5933070 6.755399
## logthres 5.8508827 6.134346
Alternately, we can do this on the original scale ... one possibility would be to use the log-scaling to fit, then refit starting from those parameters on the original scale.
wstart <- as.list(exp(unlist(weib3_start(dat$x))))
names(wstart) <- gsub("log","",names(wstart))
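m2 <- mle2(x~dweib3(shape,sc,thres),
           data = dat,
           lower = c(shape = 0.001, sc = 0.001, thres = 0.001),
           upper = c(shape = Inf, sc = Inf, thres = exp(tmin)),
           start = wstart,
           method = "L-BFGS-B")
vcov(m2)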
## shape sc thres
## shape 0.02312399 4.332057 -3.833264
## sc 4.33205658 1035.743511 -888.770787
## thres -3.83326390 -888.770787 824.633714
About the same as the values above.
We can fit with a small shape if we are a little more careful to bound the parameters, but now we end up on the boundary for the threshold, which will cause lots of problems for the variance calculations.
dat <- data.frame(x = rweibull(1000, .53, 365) + 100)
tmin <- log(0.99 * min(dat$x))
m1 <- mle2(x ~ dweib3(exp(logshape), exp(logsc), exp(logthres)),
upper = c(logshape = 20, logsc = 20, logthres = tmin),
data = dat,
start = weib3_start(dat$x), method = "L-BFGS-B")
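If you would rather stay in base R, a bounded fit along the same lines can be sketched with optim and method = "L-BFGS-B". This is only an illustration on the original parameter scale: the bounds, starting values and parscale settings below are assumptions chosen to keep the likelihood finite, not values taken from the fit above.

```r
set.seed(101)
x <- rweibull(1000, shape = 0.53, scale = 365) + 100   # simulated data, as above

# Negative log-likelihood in p = (shape, scale, thres)
nll <- function(p) -sum(dweibull(x - p[3], shape = p[1], scale = p[2], log = TRUE))

thres_max <- 0.99 * min(x)   # upper bound on the threshold
start <- c(shape = 1, scale = mean(x) - 0.5 * min(x), thres = 0.5 * min(x))

fit <- optim(start, nll, method = "L-BFGS-B",
             lower = c(0.05, 1, 0),             # assumed lower bounds
             upper = c(20, 1e5, thres_max),     # threshold capped at 0.99 * min(x)
             control = list(parscale = c(1, 100, 100)),
             hessian = TRUE)
fit$par
```

When the threshold ends up on its upper bound, solve(fit$hessian) should not be trusted as a variance estimate, for the reasons given above.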
For censored data, you need to replace dweibull with pweibull; see Errors running Maximum Likelihood Estimation on a three parameter Weibull cdf for some hints.
Another possible solution is to do Bayesian inference. Using scale priors on the shape and scale parameters and a uniform prior on the location parameter, you can easily run Metropolis-Hastings as follows. It might be advisable to reparameterize in terms of log(shape), log(scale) and log(y_min - location) because the posterior for some of the parameters becomes strongly skewed, in particular for the location parameter. Note that the output below shows the posterior for the backtransformed parameters.
library(MCMCpack)
logposterior <- function(par,y) {
  gamma <- min(y) - exp(par[3])
  sum(dweibull(y-gamma,exp(par[1]),exp(par[2]),log=TRUE)) + par[3]
}
y <- rweibull(100,shape=.8,scale=10) + 1
chain0 <- MCMCmetrop1R(logposterior, rep(0,3), y=y, V=.01*diag(3))
chain <- MCMCmetrop1R(logposterior, rep(0,3), y=y, V=var(chain0))
This produces the following output
The Metropolis acceptance rate was 0.43717
Iterations = 501:20500
Thinning interval = 1
Number of chains = 1
Sample size per chain = 20000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
[1,] 0.81530 0.06767 0.0004785 0.001668
[2,] 10.59015 1.39636 0.0098738 0.034495
[3,] 0.04236 0.05642 0.0003990 0.001174
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
var1 0.6886083 0.768054 0.81236 0.8608 0.9498
var2 8.0756210 9.637392 10.50210 11.4631 13.5353
var3 0.0003397 0.007525 0.02221 0.0548 0.1939
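For readers without MCMCpack, the same log-posterior can be explored with a hand-written random-walk Metropolis sampler. The sketch below is a bare-bones illustration: rw_metropolis, the step size and the chain length are my own choices, there is no burn-in or tuning, and it is not what MCMCmetrop1R does internally.

```r
set.seed(1)
y <- rweibull(100, shape = 0.8, scale = 10) + 1

logposterior <- function(par, y) {
  gamma <- min(y) - exp(par[3])   # location parameter, always below min(y)
  sum(dweibull(y - gamma, exp(par[1]), exp(par[2]), log = TRUE)) + par[3]
}

# Random-walk Metropolis with independent normal proposals (hypothetical helper)
rw_metropolis <- function(logpost, init, n_iter, step, ...) {
  chain  <- matrix(NA_real_, n_iter, length(init))
  cur    <- init
  cur_lp <- logpost(cur, ...)
  n_acc  <- 0
  for (i in seq_len(n_iter)) {
    prop    <- cur + rnorm(length(cur), sd = step)
    prop_lp <- logpost(prop, ...)
    if (is.finite(prop_lp) && log(runif(1)) < prop_lp - cur_lp) {
      cur <- prop; cur_lp <- prop_lp; n_acc <- n_acc + 1
    }
    chain[i, ] <- cur
  }
  list(chain = chain, accept = n_acc / n_iter)
}

out <- rw_metropolis(logposterior, rep(0, 3), n_iter = 2000, step = 0.1, y = y)

# Back-transform to shape, scale and location
post <- cbind(shape = exp(out$chain[, 1]),
              scale = exp(out$chain[, 2]),
              location = min(y) - exp(out$chain[, 3]))
```

In practice you would discard an initial stretch of the chain and tune the step size before summarizing post.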

Constrained optimization of a vector

I have a (non-symmetric) probability matrix, and an observed vector of integer outcomes. I would like to find a vector that maximises the probability of the outcomes, given the transition matrix. Simply, I am trying to estimate a distribution of particles at sea given their ultimate distribution on land, and a matrix of probabilities of a particle released from a given point in the ocean ending up at a given point on the land.
The vector that I want to find is subject to the constraint that all components must be between 0-1, and the sum of the components must equal 1. I am trying to figure out the best optimisation approach for the problem.
My transition matrix and data set are quite large, but I have created a smaller one here:
I used a simulated known at-sea distribution (msim) and a simulated probability matrix (t) to come up with an estimated coastal data set (Datasim2), as follows:
msim<-c(.3,.2,.1,.3,.1,0)
t<-matrix (c(0,.1,.1,.1,.1,.2,0,.1,0,0,.3,0,0,0,0,.4,.1,.3,0,.1,0,.1,.4,0,0,0,.1,0,.1,.1),
nrow=5,ncol=6, byrow=T)
rownames(t)<-c("C1","C2","C3","C4","C5") ### locations on land
colnames(t)<-c("S1","S2","S3","S4","S5","S6") ### locations at sea
Datasim<-as.numeric (round((t %*% msim)*500))
Datasim2<-c(rep("C1",95), rep("C2",35), rep("C3",90),rep("C4",15),rep("C5",30))
M <-c(0.1,0.1,0.1,0.1,0.1,0.1) ## starting M
I started with a straightforward function as follows:
TotalLkhd<-rep(NA, times=dim(Data)[1])
for (j in 1:dim(Data)[1]){
ObsEstEndLkhd<-1-EstEndProbsall[1,] ## likelihood of particle NOT ending up at locations other than the location of interest
IndexC<-which(colnames(EstEndProbsall)==Data$LocationCode[j], arr.ind=T) ## likelihood of ending up at location of interest
#Total likelihood
DistributionEstimate <- optim(par = M, fn = EstimateSource3, Data = Datasim2, T=t,
control = list(fnscale = -1, trace=5, maxit=500), lower = 0, upper = 1)
To constrain the sum to 1, I tried using a few of the suggestions posted here: How to set parameters' sum to 1 in constrained optimization
e.g. adding M<-M/sum(M) or SumTotalLkhd<-SumTotalLkhd-(10*pwr) to the body of the function, but neither yielded anything like msim, and in fact, the 2nd solution came up with the error “L-BFGS-B needs finite values of 'fn'”
I thought perhaps the quadprog package might be of some help, but I don’t think I have a symmetric positive definite matrix…
Thanks in advance for your help!
What about that: Let D = distribution at land, M = at sea, T the transition matrix. You know D, T, you want to calculate M. You have
D' = M' T
hence D' T' = M' (T T')
and accordingly D'T'(T T')^(-1) = M'
Basically you solve it as when doing linear regression (seems SO does not support math notation: ' is transpose, ^(-1) is ordinary matrix inverse.)
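A tiny numerical sketch of that formula, with a made-up transition matrix (rows are sea locations, columns are land locations, so that D' = M' T holds as written); T_mat, M_true and M_hat are illustrative names only:

```r
# Hypothetical sea x land transition matrix: 2 sea cells, 3 land cells, rows sum to 1
T_mat  <- matrix(c(0.5, 0.3, 0.2,
                   0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
M_true <- c(0.4, 0.6)
D      <- drop(M_true %*% T_mat)   # implied distribution on land

# M' = D' T' (T T')^{-1}
M_hat <- drop(D %*% t(T_mat) %*% solve(T_mat %*% t(T_mat)))
```

Two caveats: nothing here forces the components of M_hat to lie in [0, 1] or to sum to 1, and T T' is only invertible when there are at least as many land locations as sea locations (with 6 sea and 5 land locations, as in the question, it is singular).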
Alternatively, D may be counts of particles, and now you can ask questions like: what is the most likely distribution of particles at sea. That needs a different approach though.
Well, I have never done such models but think along the following lines. Let M be of length 3 and D of length 2, and T is hence 3x2. We know T and we observe D_1 particles at location 1 and D_2 particles at location 2.
What is the probability of observing a particle at location 1? It is Pr(D = 1) = M_1 T_11 + M_2 T_21 + M_3 T_31. Analogously, Pr(D = 2) = M_1 T_12 + M_2 T_22 + M_3 T_32. Now you can easily write the log-likelihood of observing D_1 and D_2 particles at locations 1 and 2. The code might look like this:
library(maxLik)
loglik <- function(M) {
  # return NA outside the parameter space
  if(M[1] < 0 | M[1] > 1)
    return(NA)
  if(M[2] < 0 | M[2] > 1)
    return(NA)
  M3 <- 1 - M[1] - M[2]
  if(M3 < 0 | M3 > 1)
    return(NA)
  D[1]*log(T[1,1]*M[1] + T[2,1]*M[2] + T[3,1]*M3) +
    D[2]*log(T[1,2]*M[1] + T[2,2]*M[2] + T[3,2]*M3)
}
T <- matrix(c(0.1,0.2,0.3,0.9,0.8,0.7), 3, 2)
D <- c(100,200)
m <- maxLik(loglik, start=c(0.4,0.4), method="BFGS")
I get the answer (0, 0.2, 0.8) when I estimate it but standard errors are very large.
As I said, I have never done this, so I don't know if it makes sense.
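One way to avoid the explicit range checks is to optimize over unconstrained parameters and map them to the simplex with a softmax, so that the components of M are automatically in [0, 1] and sum to 1. The sketch below is a variation on the idea above using base R's optim; softmax, negloglik and T_mat are illustrative names, and the data are the same toy numbers.

```r
T_mat <- matrix(c(0.1, 0.2, 0.3, 0.9, 0.8, 0.7), 3, 2)   # sea x land, rows sum to 1
D     <- c(100, 200)                                     # counts observed on land

# Map unconstrained values to probabilities (numerically stable softmax)
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Negative multinomial log-likelihood; first softmax input fixed at 0 for identifiability
negloglik <- function(theta_free) {
  M      <- softmax(c(0, theta_free))
  p_land <- drop(M %*% T_mat)
  -sum(D * log(p_land))
}

fit   <- optim(c(0, 0), negloglik, method = "BFGS")
M_hat <- softmax(c(0, fit$par))
```

With these toy numbers the data barely identify M, which is consistent with the very large standard errors mentioned above.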

Portfolio optimization with Differential evolution

I am facing an optimization problem: I need to optimize a portfolio for the Omega measure of its returns. I found suggestions that this can be done with differential evolution through DEoptim (see Yollin's very nice slides on R tools for portfolio optimization; the original code can be found there).
I tried to adapt this method to my problem (I only changed the numbers, and I hope I didn't make any mistakes; full credit to the author for the idea):
optOmega <- function(x,ret,L){ # function I want to optimize
  retu = ret %*% x # x is the vector of asset weights
  obj = -Omega(retu,L=L,method="simple") # Omega from PerformanceAnalytics
  weight.penalty = 100*(1-sum(x))^2
  return( obj + weight.penalty )
}
L=0 # parameter which defines loss in the Omega calculation
lower = rep(0,30) # I want the weights to be within the bounds
upper = rep(1,30) # 0<=x<=1
res = DEoptim(optOmega,lower,upper, # I have 30 assets in StockReturn
              control=list(NP=2000,itermax=1000,F=0.2,CR=0.8),
              ret=coredata(StockReturn),L=L)
Omega is calculated as mean(pmax(retu-L,0))/mean(pmax(L-retu,0))
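Written out as a standalone function, that formula looks like the sketch below (omega_simple and the toy returns are made up for illustration; this is the formula quoted above, not the PerformanceAnalytics source):

```r
# Omega as the ratio of average gains above L to average losses below L
omega_simple <- function(retu, L = 0) {
  mean(pmax(retu - L, 0)) / mean(pmax(L - retu, 0))
}

r_toy <- c(-0.02, 0.01, 0.03)            # made-up returns
om        <- omega_simple(r_toy)         # gains 0.04 against losses 0.02
om_scaled <- omega_simple(3 * r_toy)     # unchanged when returns are scaled up
```

With L = 0 the ratio does not change when all weights are multiplied by the same positive constant, so the objective itself says nothing about the sum of the weights; only the penalty term does.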
When the number of assets is very small (5, for example), I get results that pretty much satisfy me: the asset weights add up to 0.999????, which is fairly close to one, and the Omega of the portfolio is greater than the Omega of any single asset (otherwise, why not invest everything in that single asset?). This can be reached with 100 iterations.
But when I increase the number of assets to 30, the result is not satisfying. The sum of the weights comes to 3 or more, and Omega is lower than that of some single assets. I thought this might be due to the small number of iterations (I used 1000), so I tried 10,000, which is painfully slow. But the result is pretty much the same: the weights add up to way more than 1, and Omega does not seem optimal. With 10 assets the algorithm seems to find weights that sum close to 1, but Omega is lower than that of a single asset.
My PC is quite old and has an Intel Core Duo at 2 GHz. Still, is it normal for such an optimization with 1000 iterations to run for ~40 minutes?
What might be the problem here? Is the number of iterations too small, or is my interpretation of the provided algorithm totally wrong? Thank you for your help!
If I comment out the control argument in your call to DEoptim, I have much better results:
the sum of the weights is closer to 1 (it was 3), and the objective is better than for the 1-asset portfolios (it was worse).
library(xts)
library(DEoptim)
library(PerformanceAnalytics)
# Sample data
n <- 600
k <- 26
StockReturn <- matrix( rnorm(n*k), nc=k )
colnames(StockReturn) <- LETTERS[1:k]
StockReturn <- xts( StockReturn, seq.Date(Sys.Date(), length=n, by=1) )
# Objective
optOmega <- function(x, ret = coredata(StockReturn), L=0) {
  penalty <- (1-sum(x))^2
  x <- x/sum(x)
  objective <- -Omega( ret %*% x, L=L, method="simple" )
  objective + penalty
}
# Optimization
lower <- rep(0,k)
upper <- rep(1,k)
res <- DEoptim(
  optOmega, lower, upper,
  # control = list(NP=2000, itermax=100, F=0.2, CR=0.8),
  ret = coredata(StockReturn), L = 0
)
# Check the results
w <- res$optim$bestmem
sum(w) # Close to 1
w <- w / sum(w)
optOmega(w) # Better (lower) than for the 1-asset portfolios
min( apply( diag(k), 2, optOmega ) )

How do you use solve.QP in R using quadprog for investment portfolio optimization with no short sales

In R, using quadprog, in the function solve.QP for investment portfolio optimization, how do you set the constraint for the sum of the weights to equal one, and every weight is non-negative, (no short sales)?
Let V be the variance matrix of the asset returns,
mu their expected returns,
and n the number of assets.
The following finds w that minimizes
(1/2) * t(w) %*% V %*% w - t(mu) %*% w
subject to the constraints
sum(w)=1 and w>=0.
library(quadprog)
A <- cbind( # One constraint per column
  matrix( rep(1,n), nrow=n ), # The weights sum up to 1
  diag(n) # No short-selling
)
b <- c(1, rep(0,n))
r <- solve.QP(V, mu, A, b, meq=1)
