I am trying to construct the minimum variance portfolio from a large set of stocks (890), which also satisfies some additional external constraints. For example, I want to check whether the resulting portfolio meets certain sector weight restrictions and, if it does not, look for a new one that does.
Here is the code I am currently using to find the minimum variance portfolio (using cov.shrink from the corpcor package and solve.QP from the quadprog package):
X <- as.matrix(LogReturn)
# Shrinkage estimator covariance matrix
covar <- cov.shrink(X)
N <- ncol(X)
zeros <- array(0, dim = c(N,1))
# Evaluate the optimization to generate minimum variance portfolio with no short selling and with max allocation of 0.05
aMat <- cbind(1, diag(N))
aMat <- cbind(aMat, -diag(N))
b0 <- c(1, rep(0, N))
b0 <- c(b0, rep(-0.05, N))
res <- solve.QP(covar, zeros, aMat, bvec=b0, meq = 1)
# Return portfolio attributes
y <- X %*% res$solution
port <- list(pw = round(res$solution,3), px = y, pm = mean(y), ps = sd(y))
port
And here is the code I planned to use to check whether the proposed portfolio meets my sector constraints:
Sedol <- cbind(SedolData, round(res$solution,3))
colnames(Sedol) <- c("SEDOL", "Sector", "Country", "Weight")
# Proposed sector data
L <- nrow(SectorKey)
Sector <- cbind(SectorKey, 0)
colnames(Sector) <- c("Name", "Key", "Parent", "Proposed")
bPass <- TRUE
for(i in 1:L){
for (x in 1:N){
if(Sedol[x,2] == Sector[i,2]){
Sector[i,4] <- Sector[i,4] + Sedol[x,4]
}
}
if(abs(Sector[i,3] - Sector[i,4])>0.05){
bPass <- FALSE
}
}
if(bPass == FALSE){
# add cost function?
}
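The nested loop above can also be written without loops. A minimal sketch, assuming the sector keys and weights are held in plain vectors (the toy data and the names `stock_sector`, `weights`, `sector_key` and `target` are hypothetical, not taken from the original data):

```r
# Hypothetical toy data: 6 stocks in 3 sectors.
stock_sector <- c("A", "A", "B", "B", "C", "C")   # sector key per stock
weights      <- c(0.10, 0.20, 0.25, 0.15, 0.20, 0.10)
sector_key   <- c("A", "B", "C")
target       <- c(0.30, 0.38, 0.32)               # target sector weights
tol          <- 0.05

# Sum the stock weights within each sector, in the order of sector_key.
proposed <- tapply(weights, factor(stock_sector, levels = sector_key), sum)
proposed <- as.numeric(proposed)
proposed[is.na(proposed)] <- 0   # sectors with no stocks get weight 0

# TRUE only if every sector is within the tolerance of its target.
bPass <- all(abs(target - proposed) <= tol)
```

This replaces the N x L comparisons with a single grouped sum, which matters little for 890 stocks but keeps the check short and easy to read.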
I am quite new to R and I was wondering whether someone could suggest how I should proceed. I was thinking I would iteratively penalise portfolios that do not satisfy my constraints with some sort of cost function, but as I do not know how to solve my problem without using solve.QP, I am not sure how to go about this.
LogReturn is a matrix of log returns for my 890 stocks with 120 observations.
SedolData is a key for which sector each stock is in and used to find the allocations of the proposed portfolio to each sector (matrix of 890 stocks with key for each sector in column two).
SectorKey is a matrix of sectors with target weights (tolerance 5%).
Any help would be greatly appreciated!
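Since a sector weight is a linear function of the stock weights, one possible route is to pass the sector limits to solve.QP as extra inequality constraints instead of checking them afterwards. The following is only a sketch under stated assumptions: the data are a hypothetical 6-stock toy problem, the diagonal `covar` stands in for the shrinkage estimate, and the per-stock cap is relaxed to 0.5 so that the toy problem is feasible. The solver call only runs if quadprog is installed.

```r
# Hypothetical toy problem: 6 stocks, 3 sectors (not the original data).
N            <- 6
stock_sector <- c("A", "A", "B", "B", "C", "C")
sector_key   <- c("A", "B", "C")
target       <- c(0.30, 0.40, 0.30)
tol          <- 0.05
covar        <- diag(c(0.04, 0.05, 0.03, 0.06, 0.02, 0.07))  # stand-in for cov.shrink(X)

# S[i, k] is 1 if stock i belongs to sector k, so t(S) %*% w gives sector weights.
S <- outer(stock_sector, sector_key, "==") * 1

# solve.QP expects constraints in the form t(Amat) %*% w >= bvec,
# with the first meq columns treated as equalities.
aMat <- cbind(1,          # weights sum to one (equality)
              diag(N),    # w_i >= 0
              -diag(N),   # w_i <= 0.5 (0.05 in the original; relaxed for 6 stocks)
              S,          # sector weight >= target - tol
              -S)         # sector weight <= target + tol
b0 <- c(1, rep(0, N), rep(-0.5, N), target - tol, -(target + tol))

if (requireNamespace("quadprog", quietly = TRUE)) {
  res <- quadprog::solve.QP(covar, rep(0, N), aMat, bvec = b0, meq = 1)
  sector_weights <- as.numeric(t(S) %*% res$solution)
}
```

If the solver reports that the constraints are inconsistent, the sector targets, tolerance and per-stock cap cannot all be met at once, which is useful information in itself.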
I would like to perform a Sobol sensitivity analysis in R
The package "sensitivity" should allow me to do so, but I don't understand how to generate the sampling matrices (X1, X2). I have a model that runs outside of R. I have 6 parameters with uniform distributions.
In my text book: N = (2k+2)*M ; M = 2^b ; b=[8,12] (New sampling method : Wu et al. 2012)
I had the feeling that I should create two sampling matrices, X1_{M,k} and X2_{M,k}, and feed both to the sobol function.
The dimension of the final sampling matrix x$X is then (k+2)*M, because:
X <- rbind(X1, X2)
for (i in 1:k) {
Xb <- X1
Xb[, i] <- X2[, i]
X <- rbind(X, Xb)
}
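The row count of this stacked design can be checked on a small example. A minimal sketch, using plain `runif` draws in place of the quasi-random samples (the sizes are illustrative only):

```r
k <- 6          # number of parameters
M <- 2^3        # base sample size (small here, for illustration)

set.seed(1)
X1 <- matrix(runif(M * k), nrow = M, ncol = k)
X2 <- matrix(runif(M * k), nrow = M, ncol = k)

# Stack X1, X2, and k copies of X1 that each take one column from X2.
X <- rbind(X1, X2)
for (i in 1:k) {
  Xb <- X1
  Xb[, i] <- X2[, i]
  X <- rbind(X, Xb)
}

nrow(X) == (k + 2) * M   # TRUE: 2*M rows plus k blocks of M rows
```

With X1 and X2 of M rows each, this layout therefore gives (k+2)*M model runs; reaching (2k+2)*M would need a further k blocks of M rows.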
How should I conduct my sampling to get the right number of runs, (2*k+2)*M?
This script is for the old method, but does anyone know whether the new method is already implemented in the sensitivity package? Feel free to comment on this procedure.
name = c("a" , "b" , "c" , "d" , "e", "f")
vals <- list(list(var="a",dist="unif",params=list(min=0.1,max=1.5)),
list(var="b",dist="unif",params=list(min=-0.3,max=0.4)),
list(var="c",dist="unif",params=list(min=-0.3,max=0.3)),
list(var="d",dist="unif",params=list(min=0,max=0.5)),
list(var="e",dist="unif",params=list(min=2.4E-5,max=2.4E-3)),
list(var="f",dist="unif",params=list(min=3E-5,max=3E-3)))
k = 6
b = 8
M = 2^b
n <- 2*M
X1 <- makeMCSample(n,vals, p = 1)
X2 <- makeMCSample(n,vals, p = 2)
x <- sobol2007(model = NULL, X1, X2, nboot = 200)
If I understand correctly, I should provide a y for each x$X sampling combination.
Then I can use the function "tell", which will generate the Sobol' first-order indices as well as the total indices:
tell(x,y)
ggplot(x)
Supplemental R function SobolR
# p selects which seeded Sobol' sequence is generated (1 or 2)
makeMCSample <- function(n, vals, p = 1) {
# Packages to generate quasi-random sequences
# and rearrange the data
require(randtoolbox)
require(plyr)
# Generate a Sobol' sequence
if (p == 2){ sob <- sobol(n, length(vals), seed = 4321, scrambling = 1)
}else{sob <- sobol(n, length(vals), seed = 1234, scrambling = 1)}
# Fill a matrix with the values
# inverted from uniform values to
# distributions of choice
samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n)
samp[,1] <- 1:n
for (i in 1:length(vals)) {
# i=1
l <- vals[[i]]
dist <- l$dist
params <- l$params
fname <- paste("q",dist,sep="")
samp[,i+1] <- do.call(fname,c(list(p=sob[,i]),params))
}
# Convert matrix to data frame and add labels
samp <- as.data.frame(samp)
names(samp) <- c("n",laply(vals, function(l) l$var))
return(samp)
}
ref: Qiong-Li Wu, Paul-Henry Cournède, Amélie Mathieu, 2012, Efficient computational method for global sensitivity analysis and its application to tree growth modelling
I have constructed a discrete-time SIR model using a loop within a function (I have added my code below).
Currently the results of the iterations come out as a list which seems to show all the S values first, followed by the I values and then the R values, which I have deduced from the nature of the values.
I need the output as a data frame with the column names 'Iteration', 'S', 'I' and 'R' from left to right and the corresponding values underneath, so that each row gives the iteration and the values of S, I and R at that iteration.
I do not know how to construct a data frame that returns the output values in this way. I only started learning R a few weeks ago and am not yet proficient, so any help would be HUGELY appreciated.
Thank you in advance.
#INITIAL CONDITIONS
S=999
I=1
R=0
#PARAMETERS
beta = 0.003 # infectious contact rate (/person/day)
gamma = 0.2 # recovery rate (/day)
#SIR MODEL WITH POISSON SAMPLING
discrete_SIR_model <- function(){
for(i in 1:30){ #the number of iterations of loop indicates the
#duration of the model in days
# i.e. 'i in 1:30' constitutes 30 days
deltaI<- rpois(1,beta * I * S) #rate at which individuals in the
#population are becoming infected
deltaR<-rpois(1,gamma * I)#rate at which infected individuals are
#recovering
S[i+1]<-S[i] -deltaI
I[i+1] <-I[i] + deltaI -deltaR
R[i+1]<-R[i]+deltaR
}
}
output <- list(c(S, I, R))
output
If a for loop is used, one can define vectors or a data frame beforehand where the results are stored:
beta <- 0.001 # infectious contact rate (/person/day)
gamma <- 0.2 # recovery rate (/day)
S <- I <- R <- numeric(31)
S[1] <- 999
I[1] <- 1
R[1] <- 0
set.seed(123) # makes the example reproducible
for(i in 1:30){
deltaI <- rpois(1, beta * I[i] * S[i])
deltaR <- rpois(1, gamma * I[i])
S[i+1] <- S[i] - deltaI
I[i+1] <- I[i] + deltaI - deltaR
R[i+1] <- R[i] + deltaR
}
output <- data.frame(S, I, R)
output
matplot(output)
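To get the column layout asked for in the question ('Iteration', 'S', 'I', 'R'), the same loop can be wrapped in a function that returns the data frame. This is a sketch rather than a definitive version: the function name `simulate_sir` and its arguments are hypothetical, and the `min()` clamps are an added assumption that stops S and I from going negative when a Poisson draw exceeds the number of individuals available.

```r
simulate_sir <- function(days = 30, S0 = 999, I0 = 1, R0 = 0,
                         beta = 0.001, gamma = 0.2) {
  S <- I <- R <- numeric(days + 1)
  S[1] <- S0; I[1] <- I0; R[1] <- R0
  for (i in seq_len(days)) {
    # Clamping with min() is an added assumption: it keeps S and I non-negative.
    deltaI <- min(S[i], rpois(1, beta * I[i] * S[i]))
    deltaR <- min(I[i], rpois(1, gamma * I[i]))
    S[i + 1] <- S[i] - deltaI
    I[i + 1] <- I[i] + deltaI - deltaR
    R[i + 1] <- R[i] + deltaR
  }
  # Iteration 0 holds the initial conditions.
  data.frame(Iteration = 0:days, S = S, I = I, R = R)
}

set.seed(123)   # makes the example reproducible
output <- simulate_sir()
head(output)
```

Each row then reads as one time step, and S + I + R stays equal to the initial population in every row.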
As an alternative, it is also possible to employ a package for this. Package deSolve is intended for differential equations, but it can also solve the discrete case with method "euler":
library(deSolve)
discrete_SIR_model <- function(t, y, p) {
with(as.list(c(y, p)), {
deltaI <- rpois(1, beta * I * S)
deltaR <- rpois(1, gamma * I)
list(as.double(c(-deltaI, deltaI - deltaR, deltaR)))
})
}
y0 <- c(S = 999.0, I=1, R=0)
p <- c(
beta = 0.001, # infectious contact rate (/person/day)
gamma = 0.2 # recovery rate (/day)
)
times <- 1:30
set.seed(576) # to make the example reproducible
output <- ode(y0, times, discrete_SIR_model, p, method="euler")
plot(output, mfrow=c(1,3))
Note: I reduced beta, otherwise the discrete model would become unstable.
I am trying to write code to solve the following problem (as stated in HW5 of the Caltech course Learning from Data):
In this problem you will create your own target function f
(probability in this case) and data set D to see how Logistic
Regression works. For simplicity, we will take f to be a 0/1
probability so y is a deterministic function of x. Take d = 2 so you
can visualize the problem, and let X = [-1, 1] × [-1, 1] with uniform
probability of picking each x ∈ X. Choose a line in the plane as the
boundary between f(x) = 1 (where y has to be +1) and f(x) = 0 (where y
has to be -1) by taking two random, uniformly distributed points from
X and taking the line passing through them as the boundary between y =
±1. Pick N = 100 training points at random from X , and evaluate the
outputs yn for each of these points xn. Run Logistic Regression with
Stochastic Gradient Descent to find g, and estimate Eout (the cross
entropy error) by generating a sufficiently large, separate set of
points to evaluate the error. Repeat the experiment for 100 runs with
different targets and take the average. Initialize the weight vector
of Logistic Regression to all zeros in each run. Stop the algorithm
when |w(t-1) - w(t)| < 0.01, where w(t) denotes the weight vector at
the end of epoch t. An epoch is a full pass through the N data points
(use a random permutation of 1, 2, ..., N to present the data
points to the algorithm within each epoch, and use different
permutations for different epochs). Use a learning rate of 0.01.
I am required to calculate the nearest value to Eout for N=100, and the average number of epochs for the required criterion.
I wrote and ran the code but I'm not getting the right answers (according to the solutions, Eout is near 0.1 and the number of epochs is near 350). The number of epochs required for a weight change of 0.01 comes out far too small (around 10), leaving the error too big (around 2). I then tried replacing the criterion with |w(t-1) - w(t)| < 0.001 (rather than 0.01). Then the average number of epochs was about 250 and the out-of-sample error was about 0.35.
Is there something wrong with my code/solution, or is it possible that the answers provided are faulty? I've added comments to indicate what I intend to do at each step. Thanks in advance.
library(pracma)
h<- 0 # h will later be updated to number of required epochs
p<- 0 # p will later be updated to Eout
C <- matrix(ncol=10000, nrow=2) # Testing set, used to calculate out of sample error
d <- matrix(ncol=10000, nrow=1)
for(i in 1:10000){
C[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
d[1, i] <- sign(C[2, i] - f(C[1, i]))
}
for(g in 1:100){ # 100 runs of the experiment
x <- runif(2, min = -1, max = 1)
y <- runif(2, min = -1, max = 1)
fit = (lm(y~x))
t <- summary(fit)$coefficients[,1]
f <- function(x){ # Target function
t[2]*x + t[1]
}
A <- matrix(ncol=100, nrow=2) # Sample data
b <- matrix(ncol=100, nrow=1)
norm_vec <- function(x) {sqrt(sum(x^2))} # vector norm calculator
w <- c(0,0) # weights initialized to zero
for(i in 1:100){
A[, i] <- c(runif(2, min = -1, max = 1)) # Sample data
b[1, i] <- sign(A[2, i] - f(A[1, i]))
}
q <- matrix(nrow = 2, ncol = 1000) # q tracks the weight vector at the end of each epoch
l= 1
while(l < 1001){
E <- function(z){ # cross entropy error function
x = z[1]
y = z[2]
v = z[3]
return(log(1 + exp(-v*t(w)%*%c(x, y))))
}
err <- function(xn1, xn2, yn){ #gradient of error function
return(c(-yn*xn1, -yn*xn2)*(exp(-yn*t(w)*c(xn1,xn2))/(1+exp(-yn*t(w)*c(xn1,xn2)))))
}
e = matrix(nrow = 2, ncol = 100) # e will track the required gradient at each data point
e[,1:100] = 0
perm = sample(100, 100, replace = FALSE, prob = NULL) # Random permutation of the data indices
for(j in 1:100){ # One complete Epoch
r = A[,perm[j]] # pick the perm[j]th entry in A
s = b[perm[j]] # pick the perm[j]th entry in b
e[,perm[j]] = err(r[1], r[2], s) # Gradient of the error
w = w - 0.01*e[,perm[j]] # update the weight vector according to the formula involving step size, gradient
}
q[,l] = w # the lth entry is the weight vector at the end of the lth epoch
if(l > 1 & norm_vec(q[,l] - q[,l-1])<0.001){ # given criterion to terminate the algorithm
break
}
l = l+1 # move to the next epoch
}
for(n in 1:10000){
p[g] = mean(E(c(C[1,n], C[2, n], d[n]))) # average over 10000 data points, of the error function, in experiment no. g
}
h[g] = l #gth entry in the vector h, tracks the number of epochs in the gth iteration of the experiment
}
mean(h) # Mean number of epochs needed
mean(p) # average Eout, over 100 experiments
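For comparison, here is a minimal sketch of one way to follow the problem statement literally. It is not a verified solution to the homework, and it differs from the code above in a few places that are assumptions of the sketch: each point carries a leading 1 so that the weight vector has a bias term, the gradient uses the inner product of w and x (in the code above `t(w)*c(xn1,xn2)` is an element-wise product), the stopping tolerance is 0.01, and Eout is averaged over the whole test set. The names `run_once`, `label` and `make_points` are hypothetical.

```r
# One run of the experiment.
run_once <- function(n_train = 100, n_test = 1000, eta = 0.01, tol = 0.01,
                     max_epochs = 5000) {
  # Target: the line through two random points p1 and p2 in [-1, 1]^2.
  p1 <- runif(2, -1, 1); p2 <- runif(2, -1, 1)
  label <- function(X) {
    # Sign of the cross product tells on which side of the line a point lies.
    s <- sign((p2[1] - p1[1]) * (X[, 3] - p1[2]) -
              (p2[2] - p1[2]) * (X[, 2] - p1[1]))
    ifelse(s == 0, 1, s)
  }
  # Columns: constant 1 (bias), x-coordinate, y-coordinate.
  make_points <- function(n) cbind(1, matrix(runif(2 * n, -1, 1), ncol = 2))

  X <- make_points(n_train); y <- label(X)
  w <- c(0, 0, 0)
  epochs <- 0
  repeat {
    w_old <- w
    for (j in sample(n_train)) {            # fresh permutation each epoch
      # Gradient of log(1 + exp(-y * w.x)) with respect to w.
      grad <- -y[j] * X[j, ] / (1 + exp(y[j] * sum(w * X[j, ])))
      w <- w - eta * grad
    }
    epochs <- epochs + 1
    if (sqrt(sum((w_old - w)^2)) < tol || epochs >= max_epochs) break
  }

  # Cross-entropy error averaged over a separate test set.
  Xt <- make_points(n_test); yt <- label(Xt)
  eout <- mean(log(1 + exp(-yt * as.vector(Xt %*% w))))
  c(epochs = epochs, eout = eout)
}

set.seed(1)
res <- replicate(5, run_once())   # 5 runs for speed; the exercise asks for 100
rowMeans(res)                     # average epochs and average Eout
```

Drawing the test set inside each run, after the target has been chosen, also avoids labelling test points with a target function that has not been defined yet.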
Introduction to the problem
I am trying to write R code to obtain the weights of an Equally-Weighted Risk Contribution (ERC) portfolio. As some of you may know, this portfolio construction was presented by Maillard, Roncalli and Teiletche.
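Skipping technicalities, in order to find the optimal weights of an ERC portfolio one needs to solve the following Sequential Quadratic Programming problem:
$$x^{*} = \arg\min_{x} f(x) \quad \text{subject to} \quad \mathbf{1}^{\top} x = 1, \quad 0 \le x \le 1$$
with:
$$f(x) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( x_i (\Sigma x)_i - x_j (\Sigma x)_j \right)^2$$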
Suppose we are analysing N assets. In the above formulas, we have that x is a (N x 1) vector of portfolio weights and Σ is the (N x N) variance-covariance matrix of asset returns.
What I have done so far
Using the function slsqp of the package nloptr which solves SQP problems, I would like to solve the above minimisation problem. Here is my code. Firstly, the objective function to be minimised:
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
}
Secondly, the starting point (we start by an equally-weighted portfolio):
x0 <- matrix(1/N, nrow = N, ncol = 1)
Then, the equality constraint (weights must sum to one, that is: sum of the weights minus one equal zero):
heqERC <- function (x) {
h <- numeric(1)
h[1] <- (t(matrix(1, nrow = N, ncol = 1)) %*% x) - 1
return(h)
}
Finally, the lower and upper bounds constraints (weights cannot exceed one and cannot be lower than zero):
lowerERC <- matrix(0, nrow = N, ncol = 1)
upperERC <- matrix(1, nrow = N, ncol = 1)
So that the function which should output optimal weights is:
slsqp(x0 = x0, fn = ObjFuncERC, Sigma = Sigma, lower = lowerERC, upper = upperERC, heq = heqERC)
Unfortunately, I do not know how to share my variance-covariance matrix with you (it is named Sigma and is a (29 x 29) matrix, so N = 29) so that you can reproduce my result, but you can simulate one.
The output error
Running the above code yields the following error:
Error in nl.grad(x, fn) :
Function 'f' must be a univariate function of 2 variables.
I have no idea what to do guys. Probably, I have misunderstood how things must be written down in order for the function slsqp to understand what to do. Can someone help me understand how to fix the problem and get the result I want?
UPDATE ONE: as pointed out by @jogo in the comments, I have updated the code, but it still produces an error. The code and the error above are now updated.
UPDATE 2: as requested by @jaySf, here is the full code that allows you to reproduce my error.
## ERC Portfolio Test
# Preliminary Operations
rm(list=ls())
require(quantmod)
require(nloptr)
# Load Stock Data in R through Yahoo! Finance
stockData <- new.env()
start <- as.Date('2014-12-31')
end <- as.Date('2017-12-31')
tickers <-c('AAPL','AXP','BA','CAT','CSCO','CVX','DIS','GE','GS','HD','IBM','INTC','JNJ','JPM','KO','MCD','MMM','MRK','MSFT','NKE','PFE','PG','TRV','UNH','UTX','V','VZ','WMT','XOM')
getSymbols.yahoo(tickers, env = stockData, from = start, to = end, periodicity = 'monthly')
# Create a matrix containing the price of all assets
prices <- do.call(cbind,eapply(stockData, Op))
prices <- prices[-1, order(colnames(prices))]
colnames(prices) <- tickers
# Compute Returns
returns <- diff(prices)/lag(prices)[-1,]
# Compute variance-covariance matrix
Sigma <- var(returns)
N <- 29
# Set up the minimization problem
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
}
x0 <- matrix(1/N, nrow = N, ncol = 1)
heqERC <- function (x) {
h <- numeric(1)
h[1] <- t(matrix(1, nrow = N, ncol = 1)) %*% x - 1
}
lowerERC <- matrix(0, nrow = N, ncol = 1)
upperERC <- matrix(1, nrow = N, ncol = 1)
slsqp(x0 = x0, fn = ObjFuncERC, Sigma = Sigma, lower = lowerERC, upper = upperERC, heq = heqERC)
I spotted several mistakes in your code. For instance, ObjFuncERC is not returning any value. You should use the following instead:
# Set up the minimization problem
ObjFuncERC <- function (x, Sigma) {
sum <- 0
R <- Sigma %*% x
for (i in 1:N) {
for (j in 1:N) {
sum <- sum + (x[i]*R[i] - x[j]*R[j])^2
}
}
sum
}
heqERC doesn't return its value explicitly either (its last expression is an assignment, so the result comes back only invisibly). I also simplified the function a bit:
heqERC <- function (x) {
sum(x) - 1
}
I made those changes and tried slsqp without lower and upper and it worked. Still, another thing to consider is that you set lowerERC and upperERC as matrices. Use the following instead:
lowerERC <- rep(0,N)
upperERC <- rep(1,N)
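As a side note, the double loop in the objective can be replaced by a vectorised expression, which becomes noticeable when the optimiser evaluates the function many times. A small sketch with a simulated covariance matrix (the names `obj_loop` and `obj_vec` and the data are hypothetical; N is taken from the length of x instead of a global variable):

```r
# Loop form, as in the corrected function above (N taken from length(x) here).
obj_loop <- function(x, Sigma) {
  r <- as.vector(Sigma %*% x)
  total <- 0
  for (i in seq_along(x)) for (j in seq_along(x)) {
    total <- total + (x[i] * r[i] - x[j] * r[j])^2
  }
  total
}

# Vectorised form: all pairwise differences of x_i * (Sigma x)_i at once.
obj_vec <- function(x, Sigma) {
  rc <- as.vector(x) * as.vector(Sigma %*% x)
  sum(outer(rc, rc, "-")^2)
}

# Simulated covariance matrix (hypothetical data, N = 5).
set.seed(42)
A <- matrix(rnorm(50), nrow = 10, ncol = 5)
Sigma <- cov(A)
x0 <- rep(1 / 5, 5)

all.equal(obj_loop(x0, Sigma), obj_vec(x0, Sigma))
```

Either function can be passed as `fn` in the slsqp call; both take `x` first and `Sigma` as an extra argument.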
Hope this helps.
I have a system of ODEs and some data, and I am using the R packages deSolve and FME to fit the parameters of the ODE system to the data. I get a singular matrix result when I fit the full parameter set. So I went back and looked at the collinearity of the parameters, using a collinearity index cut-off of 20 as suggested in the FME package documentation, and then picked a few models with subsets of parameters to fit. When I run modFit on these, I get this error:
Error in approx(xMod, yMod, xout = xDat) :
need at least two non-NA values to interpolate
Can anyone enlighten me as to a fix for this? Everything else is working fine, so this is not a coding problem.
Here is a minimal working example (removing r=2 in modFit creates the error which I can fix in the minimal working example but not in my actual problem so I doubt a minimal working example helps here):
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
# You need these packages
library('deSolve')
library('FME')
## logistic growth model
TT <- seq(1, 100, 2.5)
N0 <- 0.1
r <- 0.5
K <- 100
## analytical solution
Ana <- cbind(time = TT, N = K/(1 + (K/N0 - 1) * exp(-r*TT)))
time <- 0:100
parms <- c(r = r, K = K)
x <- c(N = N0)
logist <- function(t, x, parms) {
with(as.list(parms), {
dx <- r * x[1] * (1 - x[1]/K)
list(dx)
})
}
## Run the model with initial guess: K = 10, r = 2
parms["K"] <- 10
parms["r"] <- 2
init <- ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost <- function(P) {
parms["K"] <- P[1]
parms["r"] <- P[2]
out <- ode(x, time, logist, parms)
return(modCost(out, Ana))
}
(Fit<-modFit(p = c(K = 10,r=2), f = Cost))
summary(Fit)
I think the problem is in your Cost function. If you don't provide both K and r, the cost function overwrites the start value of r with NA, because P[2] does not exist. You can test this:
Cost <- function(P) {
parms["K"] <- P[1]
parms["r"] <- P[2]
print(parms)
#out <- ode(x, time, logist, parms)
#return(modCost(out, Ana))
}
Cost(c(K=10, r = 2))
Cost(c(K=10))
This function works:
Cost <- function(P) {
parms[names(P)] <- P
out <- ode(x, time, logist, parms)
return(modCost(out, Ana))
}
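The difference between the two versions can be seen without running the ODE solver at all. A minimal base R sketch (the names `by_position` and `by_name` are only for illustration):

```r
parms <- c(r = 0.5, K = 100)
P <- c(K = 10)                 # only K is being fitted

# Positional assignment: P[2] does not exist, so r is overwritten with NA.
by_position <- parms
by_position["K"] <- P[1]
by_position["r"] <- P[2]

# Name-based assignment: only the parameters present in P are replaced.
by_name <- parms
by_name[names(P)] <- P

is.na(by_position[["r"]])   # TRUE
by_name                     # r stays 0.5, K becomes 10
```

Because the name-based version only touches the entries named in P, the same Cost function works for any subset of parameters chosen after the collinearity analysis.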
The vignette FMEdyna is very helpful: https://cran.r-project.org/web/packages/FME/vignettes/FMEdyna.pdf. See page 14 on how to specify the objective (cost) function.