I want to solve an optimization problem for a minimum-variance portfolio in R, as briefly described on this website: http://enricoschumann.net/R/minvar.htm
The problem: the matrix I want to use has more columns (= assets) than rows (= observations), which is why its covariance matrix is not positive definite and not invertible.
You can recreate this problem by swapping the values of nO and nA used on the website, which results in the following:
nO <- 10L ## number of observations
nA <- 100L ## number of assets
mData <- array(rnorm(nO * nA, sd = 0.05),
dim = c(nO, nA)) #Creating sample stock observations
library("quadprog")
aMat <- array(1, dim = c(1,nA))
bVec <- 1
zeros <- array(0, dim = c(nA,1))
solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) # solve the minimum-variance QP
solQP$solution
which results in the following error:
matrix D in quadratic function is not positive definite!
Does anybody know other functions to solve the optimization with mData, or ways to make the covariance matrix invertible without losing information?
The desired result is the weight of each asset in the minimum-variance portfolio.
You can try:
library(Matrix)
Q = nearPD(cov(mData))$mat
and then use Q instead of cov(mData).
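If solve.QP then complains about the class of the result, here is a minimal sketch of wiring this into the original call (nearPD returns a Matrix-class object, so coerce it back to a base matrix):
library(Matrix)
library(quadprog)
Q <- nearPD(cov(mData))$mat # nearest positive definite matrix (Higham's algorithm)
solQP <- solve.QP(as.matrix(Q), zeros, t(aMat), bVec, meq = 1)
solQP$solution # weights (may still be numerically delicate with nO << nA)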
There is also an alternative mean-variance model based on adjusted returns that handles your case directly. See link. Unfortunately, this is not so easy to implement using quadprog (link).
I need to find the minimum of an objective function by optimizing a vector. The problem is finance-related, if that helps: the function RC (provided below) computes the sum of squared differences of the risk contributions of different assets, where a risk contribution is the product of the input risk measure (RM, given) and the asset's weight.
The goal is to find such weights that the sum is zero, i.e. all assets have equal risk contributions.
RC = function (RM, w){
w = w/sum(w) # normalizing weights so they sum up to 1
nAssets = length(RM)
rc_matrix = matrix(nrow=1,ncol=nAssets)
rc_matrix = RM*w # risk contributions: RM (risk measure) multiplied by the
                 # asset's weight in the portfolio
rc_sum_squares = numeric(length=1) #placeholder
rc_sum_squares = sum(combn(
seq_along(RM),
2,
FUN = function(x)
(rc_matrix[ , x[1]] - rc_matrix[, x[2]]) ** 2
)) # this function sums the squared differences of the risk contributions
return(rc_sum_squares)
}
I searched and the solution seems to lie in the "optim" function, so I tried:
out <- optim(
par = rep(1 / length(RM), length(RM)), # initial guess
fn = RC,
RM = RM,
method = "L-BFGS-B",
lower = 0.00001,
upper = 1)
However, this returns an error message: "Error in rc_matrix[, x[1]] : incorrect number of dimensions"
I don't know how the optimization algorithm works, so I can't really wrap my head around it. The RC function itself works, though; here is a sample for reproducibility:
RM <- c(0.06006928, 0.06823795, 0.05716360, 0.08363529, 0.06491009, 0.06673174, 0.03103578, 0.05741140)
w <- matrix(0.125, nrow=1, ncol=8)
I also looked at the CVXR package, which crashes my RStudio for some reason, and at nlm(), which is a little more complicated and for which I can't write the function properly.
A solution might be not to do the funky summation of the squared differences, but to find the weights directly so that the risk contributions (RM*weight) are equal. I will be very glad for your help.
Note: the vector of the weights has to sum up to 1 and the values have to lie between 0 and 1.
Cheers
Daniel
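One direct route, sketched under the constraints stated above: equal risk contributions RM[i]*w[i] force the weights into closed form, w[i] proportional to 1/RM[i] and normalized to sum to 1.
# sketch: closed-form equal-risk-contribution weights for this setup,
# since equal RM[i] * w[i] across i implies w[i] proportional to 1 / RM[i]
RM <- c(0.06006928, 0.06823795, 0.05716360, 0.08363529,
        0.06491009, 0.06673174, 0.03103578, 0.05741140)
w <- (1 / RM) / sum(1 / RM) # weights in (0, 1) that sum to 1
RM * w                      # all risk contributions are equal
RC(RM, matrix(w, nrow = 1)) # the objective evaluates to (numerically) zero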
I am trying to compute the log likelihood ratio test in R, but am having some difficulties.
For some reason I keep getting a negative log-likelihood value, which isn't possible, and I do not know the reason.
This is the data I am using.
Here is the code so far:
I am trying to test the null hypothesis that the mean is equal to (1,1,1).
The reason I multiply the covariance matrix by (n4-1)/n4 is that I need to divide by n, not n-1, and R's cov function divides by n-1.
data <- read.csv('dat1.csv')
data <- data[, 2:4]
datamat <- as.matrix(data, nrow=25, ncol=3)
mu0_4 <- c(1,1,1)
n4 <- dim(datamat)[1]
xbar4 <- colMeans(datamat)
hs4 <- cov(datamat - xbar4)*(n4-1)/n4
det_hs4 <- det(hs4)
det_hs4
hs04 <- cov(datamat - mu0_4)*(n4-1)/n4
det_hs04 <- det(hs04)
det_hs04
LRS <- (det_hs4/det_hs04)^(n4/2)
l_lrs <- -2*log(LRS)
l_lrs
I am unsure of the reason I am getting a negative value, but if someone could please give me some advice that would be much appreciated.
Thank you for reading
I think you are making a mistake in the code. In (datamat - xbar4), R recycles the vector xbar4 down the columns, not across each row as you intend. Try
sweep(datamat, 2, xbar4, "-")
I do not know why you said negative log-likelihood values cannot be negative. Usually, they are negative. Maybe you meant the LRS.
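Building on that, a minimal sketch of the statistic with explicit column-wise centering, assuming the intended test is H0: mu = (1,1,1). Note that cov() re-centers its input, so the shift to mu0 must enter through the cross-product directly rather than inside cov():
n4 <- nrow(datamat)
xbar4 <- colMeans(datamat)
mu0_4 <- c(1, 1, 1)
S1 <- crossprod(sweep(datamat, 2, xbar4, "-")) / n4 # MLE covariance (divide by n)
S0 <- crossprod(sweep(datamat, 2, mu0_4, "-")) / n4 # covariance under H0
l_lrs <- n4 * log(det(S0) / det(S1)) # -2*log(LRS); always >= 0
l_lrs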
I have two input matrices, dt (10x3) and wt (3x3), that I need to use to find the optimal decision matrix (same dimension as dt), Par (10x3), so as to maximize an objective function. The R code below gives some direction on the problem (sample inputs used here):
#Input Matrices
dt <- matrix(runif(30),10,3)
wt <- matrix(c(1,0,0,0,2,0,0,0,1),3,3) #weights
#objective function
Obj <- function(Par) {
P = matrix(Par, nrow = 10, byrow=F) # Reshape
X = t((dt%*%wt)[,1])%*%P[,1]
Y = t((dt%*%wt)[,2])%*%P[,2]
Z = t((dt%*%wt)[,3])%*%P[,3]
as.numeric(X+Y+Z) #maximize
}
Now I am struggling to apply the following constraints to the problem :
1) Matrix, Par can only have binary values (0 or 1)
2) rowSums(Par) = 1 (Basically a row can only have 1 in one of the three columns)
3) sum(Par[,1]) <= 5, sum(Par[,2]) <= 6, & sum(Par[,3]) <= 4
4) X/(X+Y+Z) < 0.35, & Y/(X+Y+Z) < 0.4 (X,Y,Z are defined in the objective function)
I tried coding the constraints with constrOptim, but I am not sure how to input binary and integer constraints. I am reading up on lpSolve but am not able to figure it out. Any help much appreciated. Thanks!
I believe this is indeed a MIP, so there are no issues with convexity. If I am correct, the model can look like:
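A sketch of that model, reconstructed from the constraints in the question (constraint 4 linearizes after multiplying through by X+Y+Z, which is positive here; c = dt %*% wt are the objective coefficients):
max  sum(i,j) c[i,j]*x[i,j]
s.t. sum(j) x[i,j] = 1                                    for every row i
     sum(i) x[i,1] <= 5,  sum(i) x[i,2] <= 6,  sum(i) x[i,3] <= 4
     sum(i) c[i,1]*x[i,1] <= 0.35 * sum(i,j) c[i,j]*x[i,j]
     sum(i) c[i,2]*x[i,2] <= 0.40 * sum(i,j) c[i,j]*x[i,j]
     x[i,j] in {0, 1}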
This model can be easily transcribed into R. Note that LP/MIP solvers do not use functions for the objective and constraints (as opposed to NLP solvers); in R one typically builds up vectors or matrices of LP coefficients.
Note: I had to make the limits on the column sums much larger (I used 50,60,40).
Based on Erwin's response, I am able to formulate the model using the lpSolveAPI package in R. However, I am still struggling to add the final constraint to the model (the 4th constraint in my question above). Here's what I have been able to code so far:
#input dimension
r <- 10
c <- 3
#input matrices
dt <- matrix(runif(r*c),r,c)
wt <- matrix(c(1,0,0,0,2,0,0,0,1),3,3) #weights
#column controller
c.limit <- c(60,50,70)
library(lpSolveAPI) # provides make.lp, set.type, add.constraint, etc.
#create structure for the LP
ncol <- r*c
lp.create <- make.lp(ncol=ncol)
set.type(lp.create, columns=1:ncol, type = c("binary"))
#create objective values
obj.vals <- as.vector(t(dt%*%wt))
set.objfn(lp.create, obj.vals)
lp.control(lp.create,sense='max')
#Add constraints to ensure sum of parameters for every row (rowSum) <= 1
for (i in 1:r){
add.constraint(lp.create, xt=c(1,1,1),
indices=c(3*i-2,3*i-1,3*i), rhs=1, type="<=")
}
#Add constraints to ensure sum of parameters for every column (colSum) <= column limit (defined above)
for (i in 1:c){
add.constraint(lp.create, xt=rep(1,r),
indices=seq(i,ncol,by=c), rhs=c.limit[i], type="<=")
}
#Add constraints to ensure the column objective sums (t((dt%*%wt)[,i])%*%P[,i]) respect the limits defined in the problem
#NOT SURE HOW TO APPLY A CONSTRAINT THAT IS DEPENDENT ON THE OBJECTIVE FUNCTION
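# One possible way to add constraint 4 (a sketch; the strict "<" is relaxed to
# "<=", which LP solvers require): X/(X+Y+Z) < 0.35 multiplies out to
# 0.65*X - 0.35*Y - 0.35*Z <= 0, and similarly for Y with 0.40.
# X, Y, Z are linear in the binary variables with the same coefficients as the
# objective, so the constraint rows can reuse obj.vals.
col.idx <- rep(1:c, times = r) # column of Par each decision variable belongs to
add.constraint(lp.create, xt = obj.vals * ifelse(col.idx == 1, 1 - 0.35, -0.35),
               indices = 1:ncol, rhs = 0, type = "<=")
add.constraint(lp.create, xt = obj.vals * ifelse(col.idx == 2, 1 - 0.40, -0.40),
               indices = 1:ncol, rhs = 0, type = "<=")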
solve(lp.create)
get.objective(lp.create) #20
final.par <- matrix(get.variables(lp.create), ncol = c, byrow=T) # Reshape
Any help that can get me to the finish line is much appreciated :)
Thanks
I have a vector of numeric values as input (starting values for optim's par):
my.data.var <- c(10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25)
The optimization problem is a minimization problem.
The error function calculates the sum of square roots of the differences in values between
TWO MATRICES (the given-values matrix vs. the calculated matrix).
The calculated matrix is the one that uses the above vector.
Hence, in the error function, I stack the vector into a
matrix as my.data.var.mat <- matrix(my.data.var, nrow = 4, ncol = 6, byrow = TRUE)
The constraint that I must introduce is that colSums(my.data.var.mat) <= 1
The optim is defined as
sols <- optim(my.data.var, Error.func, method = "L-BFGS-B",
              upper = c(Inf,1,1,1,1,1, Inf,1,1,1,1,1, Inf,1,1,1,1,1, Inf,1,1,1,1,1),
              lower = rep(0, 24))
Error Function is defined as
Error.func <- function(my.data.var){
my.data.var.mat <- matrix(my.data.var,nrow = ncol(my.data.matrix.prod),ncol = ncol(my.data.matrix.inj)+1,byrow = TRUE)
Calc.Qjk.Value <- Qjk.Cal.func(my.data.timet0,my.data.qo,my.data.matrix.time,
my.data.matrix.inj, my.data.matrix.prod,my.data.var,my.data.var.mat)
diff.values <- my.data.matrix.prod-Calc.Qjk.Value #FIND DIFFERENCE BETWEEN CAL. MATRIX AND ORIGINAL MATRIX
Error <- ((colSums ((diff.values^2), na.rm = FALSE, dims = 1))/nrow(my.data.matrix.inj))^0.5 # column-wise root-mean-square difference
Error_total <- sum(Error,na.rm=FALSE)/ncol(my.data.matrix.prod) # total avg error
Error_total
}
Given Dataset: my.data.matrix.prod , my.data.timet0, my.data.qo, my.data.matrix.time, my.data.matrix.inj
So, my question is how and where should I introduce the matrix column-sum constraint? Or, to put it another way, how would optim vary the vector under a matrix column-sum constraint?
I realized that nloptr is a better option than optim, since my problem contains inequality constraints.
I modified the implementation as I explain in this post: "multiple inequality constraints" - Minimization with R nloptr package.
Hence, closing this thread.
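For reference, a minimal sketch of that nloptr setup (assuming the 4x6 stacking and the Error.func defined above; if the Inf-bounded first column is exempt from the constraint, drop it from the column sums):
library(nloptr)
# inequality constraints g(x) <= 0: each column sum of the 4x6 matrix <= 1
eval_g_ineq <- function(x) {
  colSums(matrix(x, nrow = 4, ncol = 6, byrow = TRUE)) - 1
}
sols <- nloptr(x0 = my.data.var,
               eval_f = Error.func, # objective defined above
               lb = rep(0, 24),
               ub = rep(c(Inf, rep(1, 5)), 4), # same bounds as the optim call
               eval_g_ineq = eval_g_ineq,
               opts = list(algorithm = "NLOPT_LN_COBYLA",
                           xtol_rel = 1e-6, maxeval = 5000))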
Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has gotten too large. In Excel, I would use Solver.
I have 5 variables and I want to create a weighted average of these 5 variables so that it has the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = A*wi1 + B*wi2 + C*wi3 + D*wi4 + E*wi5
where wi1 to wi5 are coefficients produced by Solver. In a separate cell, I would have CORREL(F,G).
This is all achieved with the following constraints in mind:
1. wi1 to wi5 have to be between 0 and 1
2. wi1+wi2+wi3+wi4+wi5 = 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w) cor(X %*% w / sum(w), Y)
# initial guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)
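As a quick sanity check on the result (using the objects above):
# correlation achieved by the normalized weights
cor(X %*% optim.w, Y)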