I have a numeric vector of input values (the starting values for optim's par):
my.data.var <- c(10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25,
10,0.25,0.25,0.25,0.25,0.25)
This is a minimization problem.
The error function computes the root-mean-square difference between
two matrices (the given-values matrix vs. the calculated matrix), averaged over columns.
The calculated matrix is the one that uses the parameter vector above.
Hence, in the error function, I stack the vector into a
matrix as my.data.var.mat <- matrix(my.data.var,nrow = 4,ncol = 6,byrow = TRUE)
The constraint that I must introduce is that colSums(my.data.var.mat) <= 1.
The optim call is defined as
sols <- optim(my.data.var, Error.func, method = "L-BFGS-B",
              upper = rep(c(Inf, 1, 1, 1, 1, 1), times = 4),
              lower = rep(0, 24))
Error Function is defined as
Error.func <- function(my.data.var) {
  # Stack the parameter vector row-wise into a matrix
  my.data.var.mat <- matrix(my.data.var, nrow = ncol(my.data.matrix.prod),
                            ncol = ncol(my.data.matrix.inj) + 1, byrow = TRUE)
  Calc.Qjk.Value <- Qjk.Cal.func(my.data.timet0, my.data.qo, my.data.matrix.time,
                                 my.data.matrix.inj, my.data.matrix.prod,
                                 my.data.var, my.data.var.mat)
  # Difference between the original matrix and the calculated matrix
  diff.values <- my.data.matrix.prod - Calc.Qjk.Value
  # Root-mean-square error per column
  Error <- (colSums(diff.values^2, na.rm = FALSE, dims = 1) / nrow(my.data.matrix.inj))^0.5
  # Average error across columns
  Error_total <- sum(Error, na.rm = FALSE) / ncol(my.data.matrix.prod)
  Error_total
}
Given Dataset: my.data.matrix.prod , my.data.timet0, my.data.qo, my.data.matrix.time, my.data.matrix.inj
So my question is: how and where should I introduce the column-sum constraint on the matrix? Put another way, how can optim vary the parameter vector subject to the matrix column-sum constraint?
I realized that nloptr is a better option than optim, since my problem involves inequality constraints.
I modified the implementation as I explain in this post: "multiple inequality constraints" - Minimization with R nloptr package
Hence, closing this thread.
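For readers who land here: because a column-sum limit is a linear inequality, base R's constrOptim() is another option besides nloptr. The sketch below is only an illustration under stated assumptions. The objective toy_error and its target vector are hypothetical stand-ins for Error.func (whose helper Qjk.Cal.func is not shown in the question), the constraint is applied to columns 2 to 6 only (the first column starts at 10, so it cannot satisfy a sum of at most 1), and the starting values are moved to 0.2 because constrOptim() needs a start strictly inside the feasible region.

```r
n_row <- 4
n_col <- 6
n_par <- n_row * n_col

# idx[i, j] is the position in the parameter vector of matrix element (i, j)
# when the vector is stacked with byrow = TRUE
idx <- matrix(seq_len(n_par), nrow = n_row, ncol = n_col, byrow = TRUE)

# Assumption: only columns 2..6 must satisfy colSums <= 1
constrained_cols <- 2:6

# constrOptim() expects constraints in the form ui %*% par - ci >= 0.
# Column-sum limit: -sum(column j) + 1 >= 0
ui_sum <- t(sapply(constrained_cols, function(j) {
  row <- numeric(n_par)
  row[idx[, j]] <- -1
  row
}))
ci_sum <- rep(-1, length(constrained_cols))

# Lower bounds: par >= 0
ui_low <- diag(n_par)
ci_low <- rep(0, n_par)

ui <- rbind(ui_sum, ui_low)
ci <- c(ci_sum, ci_low)

# Hypothetical objective standing in for Error.func
target <- rep(c(10, rep(0.4, 5)), times = n_row)
toy_error <- function(p) sum((p - target)^2)
toy_grad  <- function(p) 2 * (p - target)

# Start strictly inside the feasible region (column sums of 0.8)
start <- rep(c(10, rep(0.2, 5)), times = n_row)

fit <- constrOptim(theta = start, f = toy_error, grad = toy_grad,
                   ui = ui, ci = ci)
fit_mat <- matrix(fit$par, nrow = n_row, ncol = n_col, byrow = TRUE)
colSums(fit_mat)[constrained_cols]
```

Since the unconstrained optimum of this toy objective would give column sums of 1.6, the constraint is active and the fitted column sums end up just below 1.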
I'm trying to calculate the Euclidean distance between pairs of points in a dataframe in R, and there's an ID for each pair:
ID <- sample(1:10, 10, replace=FALSE)
P <- runif(10, min=1, max=3)
S <- runif(10, min=1, max=3)
testdf <- data.frame(ID, P, S)
I found several ways to calculate the Euclidean distance in R, but I'm either getting an error, returning only 1 value (so it's computing the distance between the entire vector), or I end up with a matrix when all I need is a 4th column with the distance between each pair (columns 'P' and 'S.') I'm a bit confused by matrices so I'm not sure how to work with that result.
Tried making a function and applying it to the 2 columns but I get an error:
testdf$V <- apply(testdf[ , c('P', 'S')], 1, function(P, S) sqrt(sum(P^2, S^2)))
# Error in FUN(newX[, i], ...) : argument "S" is missing, with no default
Then tried using the dist() function in the stats package but it only returns 1 value:
(Same problem if I follow the method here: https://www.statology.org/euclidean-distance-in-r/)
P <- testdf$P
S <- testdf$S
testProbMatrix <- rbind(P, S)
stats::dist(testProbMatrix, method = "euclidean")
# returns only 1 distance
This version returns a matrix:
(Here's a nice explanation why: Calculate the distances between pairs of points in r)
stats::dist(cbind(P, S), method = "euclidean")
But I'm confused how to pull the distances out of the matrix and attach them to the correct ID for each pair of points. I don't understand why I have to make a matrix instead of just applying the function to the dataframe - matrices have always confused me.
I think this is the same question as here (Finding euclidean distance between all pair of points) but for R instead of Python
Thanks for the help!
Try this if you just want to add another column to your data frame. Arithmetic on columns is vectorised, so no apply() or matrix is needed:
testdf$distance <- sqrt(testdf$P^2 + testdf$S^2)
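As a small sketch of why the apply() attempt in the question fails: with MARGIN = 1, apply() hands each row to the function as a single vector, so a function declared with two arguments never receives S. The column names norm_vec, norm_apply and abs_diff below are made up for illustration, and set.seed() is added only so the example is repeatable.

```r
set.seed(1)
testdf <- data.frame(
  ID = sample(1:10, 10, replace = FALSE),
  P  = runif(10, min = 1, max = 3),
  S  = runif(10, min = 1, max = 3)
)

# Vectorised version: one value per row, no matrix involved
testdf$norm_vec <- sqrt(testdf$P^2 + testdf$S^2)

# apply() version: the function takes ONE argument, the whole row (P and S together)
testdf$norm_apply <- apply(testdf[, c("P", "S")], 1,
                           function(row) sqrt(sum(row^2)))

# If P and S are instead two one-dimensional positions and the separation
# between them is wanted, the per-row distance is simply the absolute difference
testdf$abs_diff <- abs(testdf$P - testdf$S)

head(testdf)
```

Both norm_vec and norm_apply hold the same numbers; the vectorised form is shorter and faster.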
I need to find a minimum of an objective function by optimising a vector. The problem is finance related if that helps - the function RC (provided below) computes the sum of squared differences of risk contribution of different assets, where the risk contribution is a product of input Risk Measure (RM, given) and weights.
The goal is to find such weights that the sum is zero, i.e. all assets have equal risk contributions.
RC = function (RM, w){
w = w/sum(w) # normalizing weights so they sum up to 1
nAssets = length(RM)
rc_matrix = matrix(nrow=1,ncol=nAssets)
rc_matrix = RM*w # risk contributions: RM (risk measure) multiplied by the asset's weight in the portfolio
rc_sum_squares = numeric(length=1) #placeholder
rc_sum_squares = sum(combn(
seq_along(RM),
2,
FUN = function(x)
(rc_matrix[ , x[1]] - rc_matrix[, x[2]]) ** 2
)) # this function sums the squared differences of the risk contributions
return(rc_sum_squares)
}
I searched and the solution seems to lie in the "optim" function, so I tried:
out <- optim(
par = rep(1 / length(RM), length(RM)), # initial guess
fn = RC,
RM = RM,
method = "L-BFGS-B",
lower = 0.00001,
upper = 1)
However, this returns an error message: "Error in rc_matrix[, x[1]] : incorrect number of dimensions"
I don't know how the optimization algorithm works, so I can't really wrap my head around it. The RC function works though, here is a sample for replicability:
RM <- c(0.06006928, 0.06823795, 0.05716360, 0.08363529, 0.06491009, 0.06673174, 0.03103578, 0.05741140)
w <- matrix(0.125, nrow=1, ncol=8)
I also looked at the CVXR package, which crashes my RStudio for some reason, and at nlm(), which is a little more complicated and for which I can't write the function properly.
A solution might be not to do the funky summation of the squared differences, but finding the weights so that the risk contributions (RM*weight) are equal. I will be very glad for your help.
Note: the vector of the weights has to sum up to 1 and the values have to lie between 0 and 1.
Cheers
Daniel
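A likely cause of the "incorrect number of dimensions" error is that optim() passes par to the objective as a plain vector, so RM*w is a vector too and two-index subsetting such as rc_matrix[, x[1]] fails. The sketch below uses single-index subsetting instead. The names rc_objective, start, w_opt and w_closed are made up for illustration. It also checks the idea raised in the question: since each risk contribution is RM*weight, weights proportional to 1/RM make all contributions equal.

```r
RM <- c(0.06006928, 0.06823795, 0.05716360, 0.08363529,
        0.06491009, 0.06673174, 0.03103578, 0.05741140)

# Objective with the parameter vector first, as optim() expects
rc_objective <- function(w, RM) {
  w  <- w / sum(w)   # normalise so the weights sum to 1
  rc <- RM * w       # risk contributions, a plain vector
  # Sum of squared pairwise differences, using single-index subsetting
  sum(combn(seq_along(rc), 2,
            FUN = function(x) (rc[x[1]] - rc[x[2]])^2))
}

start <- rep(1 / length(RM), length(RM))
out <- optim(par = start, fn = rc_objective, RM = RM,
             method = "L-BFGS-B", lower = 1e-5, upper = 1)
w_opt <- out$par / sum(out$par)   # rescale the result the same way

# Closed-form check: equal contributions require w_i proportional to 1 / RM_i
w_closed <- (1 / RM) / sum(1 / RM)
RM * w_closed                     # all entries identical
```

Because the objective values here are very small, L-BFGS-B may stop before the contributions are fully equalised; tightening factr in control, or using the closed-form weights directly, are two ways around that.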
I'm simulating discrete character data using the function rTraitDisc {ape} in R using a variety of model matrices. I've not encountered any issues with scaling when all state changes are possible. However when I supply an ordered model with 8 or more possible states, the function breaks down and returns the following error:
## library
library(ape)
## read in tree
data("bird.orders")
## build model
model.matrix <- matrix(c(0,0.1,0,0,0,0,0,0,
0.1,0,0.1,0,0,0,0,0,
0,0.1,0,0.1,0,0,0,0,
0,0,0.1,0,0.1,0,0,0,
0,0,0,0.1,0,0.1,0,0,
0,0,0,0,0.1,0,0.1,0,
0,0,0,0,0,0.1,0,0.1,
0,0,0,0,0,0,0.1,0), 8)
## run function
rTraitDisc(phy = bird.orders, model = model.matrix)
Error message:
Error in sample.int(k, size = 1, FALSE, prob = p) : negative probability
Having dug a little deeper, it seems that when there are 8 or more states but only one possible transition (e.g. if the ancestral state is 0, only a transition to state 1 should be possible in an ordered matrix), the function matexpo produces a probability matrix with negative values for the shortest branch of the tree (0.5). As these probabilities are used by sample.int as the "prob" argument, the negative probabilities cause the function to break down.
## get number of states
k <- ncol(model.matrix)
## get equilibrium relative frequencies
freq = rep(1/k, k)
## match number of elements in model
freq <- rep(freq, each = k)
## get Q matrix
Q <- model.matrix * freq
diag(Q) <- 0
diag(Q) <- -rowSums(Q)
## get minimum edge length
min.el <- min(bird.orders$edge.length)
## run matexpo
matexpo(Q*min.el)
How do I deal with these negative values in this context? Is there a correction I can/should apply?
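One possible workaround, offered as a sketch and not as the way ape handles this: if the negative entries are only rounding-level noise, they can be clamped to zero and each row rescaled to sum to 1 before sampling. The example below avoids ape and rebuilds the same 8-state ordered Q matrix in base R. Because this particular Q is symmetric, the matrix exponential can be computed from an eigendecomposition. The names model_mat, branch_len and P_fixed are made up, and 0.5 is the shortest branch length quoted in the question.

```r
k    <- 8
rate <- 0.1

# Ordered model: transitions only between neighbouring states
model_mat <- matrix(0, k, k)
model_mat[abs(row(model_mat) - col(model_mat)) == 1] <- rate

# Same construction as in the question: scale by equal frequencies, fix the diagonal
Q <- model_mat * (1 / k)
diag(Q) <- -rowSums(Q)

branch_len <- 0.5   # shortest branch length quoted in the question

# Matrix exponential via eigendecomposition (valid here because Q is symmetric)
eig <- eigen(Q, symmetric = TRUE)
P <- eig$vectors %*% diag(exp(eig$values * branch_len)) %*% t(eig$vectors)

# Clamp rounding-level negatives to zero, then renormalise each row
P_fixed <- pmax(P, 0)
P_fixed <- P_fixed / rowSums(P_fixed)

range(P_fixed)
rowSums(P_fixed)
```

If the negative values are not tiny, clamping would hide a real numerical problem, and a different matrix exponential routine would be worth trying instead.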
I want to solve an optimization problem regarding a minimum variance portfolio using R as shortly described on this website: http://enricoschumann.net/R/minvar.htm
The problem is: the matrix I want to use has more columns (=assets) than rows (=observations), which is why it is not positive definite and non-invertible.
You can recreate this problem by swapping the values of nO and nA used on the website, which gives the following:
nO <- 10L ## number of observations
nA <- 100L ## number of assets
mData <- array(rnorm(nO * nA, sd = 0.05),
dim = c(nO, nA)) #Creating sample stock observations
library("quadprog")
aMat <- array(1, dim = c(1,nA))
bVec <- 1
zeros <- array(0, dim = c(nA,1))
solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) # solve the minimum-variance QP
solQP$solution
which results in the following error:
matrix D in quadratic function is not positive definite!
Does anybody know other functions to solve the optimization with mData or ways to make mData invertible without losing information?
The desired result are the weights for each asset for the minimum variance portfolio.
You can try:
library(Matrix)
Q = as.matrix(nearPD(cov(mData))$mat)
and then use Q instead of cov(mData). The as.matrix() call converts the Matrix-class result of nearPD into a base R matrix.
There is also an alternative Mean-Variance model based on adjusted returns that handles your case directly. See link. Unfortunately, this is not so easy to implement using QuadProg (link).
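To illustrate why the covariance matrix is singular here, and one simple alternative to nearPD, the sketch below adds a small ridge term to the diagonal (a different technique from the one suggested above, sometimes called diagonal loading). With only the sum-to-one constraint, the minimum-variance weights are proportional to the solution of S w = 1, so base R's solve() is enough. The value of lambda is a hypothetical choice and will influence the weights; set.seed() is added only for repeatability.

```r
set.seed(42)
nO <- 10L    # number of observations
nA <- 100L   # number of assets
mData <- array(rnorm(nO * nA, sd = 0.05), dim = c(nO, nA))

S <- cov(mData)
qr(S)$rank   # far below nA: the theoretical rank is at most nO - 1

# Ridge term on the diagonal makes the matrix positive definite
lambda <- 1e-4   # hypothetical value, to be tuned for real data
S_reg  <- S + lambda * diag(nA)

# Minimum variance with only sum(w) == 1: w is proportional to solve(S_reg, 1)
ones     <- rep(1, nA)
w_raw    <- solve(S_reg, ones)
w_minvar <- w_raw / sum(w_raw)

sum(w_minvar)
```

S_reg could also be passed to solve.QP in place of cov(mData) if further constraints, such as no short selling, are needed.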
This can be done in Excel, but my dataset has gotten too large. In Excel, I would use Solver.
I have 5 variables and I want to recreate a weighted average of these 5 variables so that they have the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = A*wi1 + B*wi2 + C*wi3 + D*wi4 + E*wi5
where wi1 to wi5 are the coefficients produced by Solver. In a separate cell, I would have correl(F,G).
This is all achieved with the following constraints in mind:
1. The weights wi1 to wi5 (for A, B, C, D, E) have to be between 0 and 1
2. The weights have to sum to 1: wi1 + wi2 + wi3 + wi4 + wi5 = 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
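# function to minimize: correlation of the weighted average with Y
# (dividing by sum(w) forces the effective weights to sum to 1)
correl <- function(w) cor(X %*% w / sum(w), Y)
# initial guess: equal weights
w0 <- rep(1 / n.var, n.var)
# optimize with non-negative weights
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
# rescale the result so the reported weights sum to 1
optim.w <- opt$par / sum(opt$par)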