CVaR minimization with the BDportfolio_optim function (PortfolioOptim package) - r

I'm trying to run example code using the BDportfolio_optim function from the PortfolioOptim package (https://cran.r-project.org/web/packages/PortfolioOptim/PortfolioOptim.pdf) in order to minimize the CVaR. The output should be the optimal weights for the minimum CVaR subject to some constraints.
However, I always run into an error saying that my first constraint matrix A has the wrong dimensions.
The code I use is very similar to the example in the PortfolioOptim manual (linked above):
library(PortfolioOptim)
library(fPortfolio)
library(quantmod)
library(pacman)
library(timeDate)
library(timeSeries)
getSymbols("EBAY", src="yahoo", from= "2011-01-01")
getSymbols("MSFT", src="yahoo", from= "2011-01-01")
getSymbols("INTC", src="yahoo", from= "2011-01-01")
getSymbols("KO", src="yahoo", from= "2011-01-01")
portfolio_1 = cbind(EBAY$EBAY.Close, MSFT$MSFT.Close, INTC$INTC.Close, KO$KO.Close)
ret_p_1 = cbind(dailyReturn(EBAY), dailyReturn(MSFT), dailyReturn(INTC), dailyReturn(KO))
tickers <- c("EBAY", "MSFT", "INTC", "KO")
colnames(ret_p_1) <- tickers
portfolioReturns_1 <- as.timeSeries(ret_p_1)
k = ncol(portfolioReturns_1)
num = nrow(portfolioReturns_1)
port_ret = 0.05 # target portfolio return
alpha_optim = 0.95
a0 <- rep(1,k)
A <- rbind(a0,-a0)
b <- rbind(1+1e-8, -1+1e-8)
LB <- rep(0, k)
UB <- rep(0.5 ,k)
result <- BDportfolio_optim(portfolioReturns_1, port_ret, "CVAR", alpha_optim,
Aconstr = A , bconstr = b, LB, UB, maxiter=10000, tol=1e-8)
The output of the console is always the same:
Error in BDportfolio_optim(portfolioReturns_1, port_ret, "CVAR", alpha_optim, :
Matrix A must have as many rows as constraints (=elements of vector b) and as many columns as variables (=assets).
However, when I check the dimensions of A, it's (2, 4) - corresponding exactly to the elements of vector b and asset variables (k=4).
Does someone have an idea what is going wrong?

It seems you missed part of the description of the BDportfolio_optim function's first parameter dat (your portfolioReturns_1), where it says:
dat Time series of returns data; dat = cbind(rr, pk), where rr is an array (time series)
of asset returns, for n returns and k assets it is an array with dim(rr) = (n, k),
pk is a vector of length n containing probabilities of returns.
Your portfolioReturns_1 consists only of the rr part and lacks the pk part.
It seems that BDportfolio_optim only works with a return (probability) distribution as input. Do you have an idea how I could assign one to portfolioReturns_1?
According to the example used in the PortfolioOptim manual, you could do this:
result <- BDportfolio_optim(cbind(portfolioReturns_1, matrix(1/num, num, 1)), port_ret, …
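As a minimal sketch of the `dat = cbind(rr, pk)` layout quoted from the manual, the base-R snippet below appends an equal-probability column to a returns matrix. The simulated returns and the names `rr`, `pk`, and `dat` are illustrative assumptions, not data from the question. It would be consistent with the error message if the function treats the last column as probabilities and therefore counts one asset fewer than A has columns, but that is an inference, not something confirmed from the package source.

```r
set.seed(1)

n <- 250  # number of return observations (assumed for illustration)
k <- 4    # number of assets

# Simulated stand-in for the asset returns rr, with dim(rr) = (n, k)
rr <- matrix(rnorm(n * k, mean = 0, sd = 0.01), nrow = n, ncol = k)
colnames(rr) <- c("EBAY", "MSFT", "INTC", "KO")

# Equal probability for every observation, as in the answer's 1/num column
pk <- rep(1 / n, n)

# Layout described in the manual: returns first, probabilities last
dat <- cbind(rr, pk)

dim(dat)          # n rows, k + 1 columns
sum(dat[, "pk"])  # probabilities should sum to 1
```

With this layout the constraint matrix A keeps k columns, one per asset, while `dat` has k + 1 columns.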

Related

curious behavior of set.seed inside function

I came across something today that I don't quite understand. I want to generate some uniformly distributed points in the plane and then assign each point an arrival rate. I want to be able to reproduce the same points but assign different arrival rates, and I figured I could use the set.seed function for this.
library(dplyr)
library(ggplot2)
seed = NULL
no_of_points = 50
interval = c("min" = -10, "max" = 10)
arv = c("min" = 1/80, "max" = 1)
plot_data <- function() {
id <- 1:no_of_points
# setting the seed here to be able to reproduce if desired
set.seed(seed)
x <- runif(no_of_points, min = interval["min"], max = interval["max"])
y <- runif(no_of_points, min = interval["min"], max = interval["max"])
# resetting the seed to give "random" arrival rates regardless of the seed
set.seed(NULL)
arrival_rate <- runif(no_of_points, min = arv["min"], max = arv["max"])
data <- tibble(
"Demand point id" = as.character(id),
"x" = x,
"y" = y,
"Arrival rate" = arrival_rate
)
}
ggplot(plot_data()) +
geom_point(aes(x, y, size = `Arrival rate`))
This works fine when I set a seed: I get a plot like this, which is what I would expect.
However, when I have seed = NULL as in the example code, I get a plot like this, where the arrival rates appear to be correlated with the x-axis.
How can this be explained? I also tried running the same code outside a function, and then I get the expected behavior. So I suspect it has something to do with the seed being set inside a function.
I don't think set.seed(NULL) is doing what you expect. In this case I think NULL is initializing the exact same random seed both times you call it. Therefore, the first random number generation after calling set.seed(NULL) (x) is correlated with the first random number generation after you call set.seed(NULL) again (Arrival rate) (but not the second generation of the first instance - y). In this simple example, you can see that the nth random generation after setting a particular seed is correlated with the nth random generation after setting that same seed again, and that using NULL and NULL is basically the same as using 1 and 1.
f <- function(s1 = NULL, s2 = NULL) {
set.seed(s1)
a <- runif(50)
b <- runif(50)
c <- runif(50)
set.seed(s2)
d <- runif(50)
e <- runif(50)
f <- runif(50)
x <- data.frame(a, b, c, d, e, f)
plot(x)
}
f(NULL, NULL)
f(1, 1)
f(1, 2)
Created on 2022-01-04 by the reprex package (v2.0.1)
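The R documentation describes set.seed(NULL) as re-initializing the generator as if no seed had yet been set, so two calls made moments apart are not a reliable way to obtain independent streams. One possible workaround, sketched below, is to avoid set.seed(NULL) entirely: save the current RNG state, set the seed only for the coordinates, then restore the saved state before drawing the arrival rates. The function name `gen_points` and its arguments are hypothetical, and the ranges are copied from the question.

```r
gen_points <- function(n, seed = NULL) {
  if (is.null(seed)) {
    # No seed requested: draw from the current RNG stream
    x <- runif(n, min = -10, max = 10)
    y <- runif(n, min = -10, max = 10)
  } else {
    # Make sure an RNG state exists, then remember it
    if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) runif(1)
    old_state <- get(".Random.seed", envir = globalenv())

    # Reproducible coordinates
    set.seed(seed)
    x <- runif(n, min = -10, max = 10)
    y <- runif(n, min = -10, max = 10)

    # Put the earlier state back so later draws do not depend on the seed
    assign(".Random.seed", old_state, envir = globalenv())
  }
  # Arrival rates always come from the unseeded stream
  arrival_rate <- runif(n, min = 1 / 80, max = 1)
  data.frame(x = x, y = y, arrival_rate = arrival_rate)
}

a <- gen_points(50, seed = 42)
b <- gen_points(50, seed = 42)
identical(a$x, b$x)                        # same points
identical(a$arrival_rate, b$arrival_rate)  # different arrival rates
```

Restoring the state rather than reseeding means repeated calls with the same seed still advance the global stream, so the arrival rates differ from call to call.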

Apply Machine learning process simultaneously on multiple datasets with R

I want to delete correlated variables and perform lasso regression on multiple datasets. So I divided my data into two lists: the first list contains the variables and the second contains the targets.
I also want to divide my data into train and test sets before applying lasso, make predictions, and store the results in a final data frame.
The main steps:
1- Correlation: (delete correlated variables)
2- divide data into train and test
3- Perform LASSO
4- Make predictions
5- store predictions in a dataframe with their labels
Thanks!
set.seed(99)
library("caret")
# Create data frames
H <- data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
C <- data.frame(replicate(5,sample(0:100,10,rep=FALSE)))
R <- data.frame(replicate(7,sample(0:30,10,rep=TRUE)))
E <- data.frame(replicate(4,sample(0:40,10,rep=FALSE)))
# Create target variables
Y_H <- data.frame(replicate(1,sample(20:35, 10, rep = TRUE)))
Y_H
names(Y_H) <- "label_1"
Y_C <- data.frame(replicate(1,sample(15:65, 10, rep = TRUE)))
names(Y_C) <- "label_2"
Y_R <- data.frame(replicate(1,sample(25:45, 10, rep = TRUE)))
names(Y_R) <- "label_3"
Y_E <- data.frame(replicate(1,sample(21:80, 10, rep = TRUE)))
names(Y_E) <- "label_4"
# Store observations and targets in lists
inputs <- list(H, C, R, E)
targets <- list(Y_H, Y_C, Y_R, Y_E)
# Perform correlation
outputs <- list()
for(df in inputs){
data.cor <- cor(df)
high.cor <- findCorrelation(data.cor, cutoff=0.40)
outputs <- append(outputs, list(df[,-high.cor]))
}
library("glmnet")
lasso_cv <- list()
lasso_model <- list()
for(i in outputs){
for(j in targets){
lasso_cv[i] <- cv.glmnet(as.matrix(outputs[[i]]), as.matrix(targets[[j]]), standardize = TRUE, type.measure="mse", alpha = 1,nfolds = 3)
lasso_model[i] <- glmnet(as.matrix(outputs[[i]]), as.matrix(targets[[j]]),lambda = lasso_cv[i]$lambda_cv, alpha = 1, standardize = TRUE)
}
}
When I run my for loop, it gives this error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid subscript type 'list'
It seems to me that the error is in the range of the last for loop.
You wrote for(i in outputs), and then used as.matrix(outputs[[i]]). So at the first iteration you are effectively calling as.matrix(outputs[[outputs[[1]]]]), which does not make sense. The same reasoning applies to for(j in targets).
Try replacing the loops I quoted with for(i in seq_len(length(outputs))) and for(j in seq_len(length(targets))). That should work: at the first iteration as.matrix(outputs[[i]]) becomes as.matrix(outputs[[1]]), and similarly for the other line, which seems to be what you intended.
P.S. I am not sure about the rest of your code. lasso_cv[i]$lambda_cv returns NULL for every i, so you may want to look into that.
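The index-based loop can be sketched in base R as follows. To keep the example self-contained, `lm.fit` from the stats package stands in for the `cv.glmnet` call, the data are simulated, and each dataset is paired with the target at the same position instead of looping over every combination; all three choices are assumptions made for illustration. Results are stored with `[[i]]`, which is the usual way to place a whole fitted object into a list slot.

```r
set.seed(99)

# Simulated stand-ins for the filtered predictor sets and their targets
outputs <- list(
  data.frame(matrix(rnorm(20), nrow = 10, ncol = 2)),
  data.frame(matrix(rnorm(30), nrow = 10, ncol = 3))
)
targets <- list(rnorm(10), rnorm(10))

# Iterating over the list itself yields its elements, not its positions
for (el in outputs) {
  first_element <- el
  break
}
is.data.frame(first_element)  # TRUE: unusable as an index into outputs

# Iterating over positions lets the same index address both lists
fits <- vector("list", length(outputs))
for (i in seq_along(outputs)) {
  x <- as.matrix(outputs[[i]])
  y <- targets[[i]]
  # lm.fit is only a placeholder for the model-fitting step
  fits[[i]] <- lm.fit(cbind(Intercept = 1, x), y)
}

lengths(lapply(fits, function(f) f$coefficients))
```

The same pattern should carry over to the glmnet calls by replacing the placeholder line with the model fit of your choice.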

Custom expected returns in the Portfolio Analytics package

I have trouble incorporating custom expected returns in the Portfolio Analytics package. Usually expected returns are professional expectations or views, or are calculated separately from fundamental indicators. Portfolio Analytics allows you to create a custom moments function to calculate moments from past returns, but I don't understand how to incorporate already calculated returns into the optimization problem. Any help is appreciated. Here is a small example dataset:
#Download package and sample returns
library(PortfolioAnalytics)
library(PerformanceAnalytics)
data(edhec)
returns <- tail(edhec[,1:4], 10)
#Example expected return xts that I'm usually working with. Calculated separately.
N <- 10
M <- 4
views <- as.xts(data.frame(matrix(rnorm(N*M,mean=0,sd=0.05), N, M)), order.by = index(returns))
colnames(views) <- colnames(returns)
Let's create a basic portfolio with some objectives.
pf <- portfolio.spec(assets = colnames(returns))
pf <- add.constraint(portfolio = pf, type = "full_investment")
pf <- add.constraint(portfolio = pf, type = "long_only")
pf <- add.objective(portfolio = pf, type = "return", name = "mean")
pf <- add.objective(portfolio = pf, type = "risk", name = "StdDev")
Now I would like to optimize portfolio pf at each period and take the views (expected returns for that period) into account, but I'm running out of ideas at this point.
I realise now, after setting the bounty, that the question has already been answered here. I'll summarise it as best I understand it.
When you call optimize.portfolio, there is an optional parameter momentFUN, which defines the moments of your portfolio. One of its arguments is momentargs, which you can pass through in optimize.portfolio.
First, you need to choose a set of expected returns. I'll assume the last entry in your views time series:
my.expected.returns = views["2009-08-31"]
You'll also need your own covariance matrix. I'll compute it from your returns:
my.covariance.matrix = cov(returns)
Finally, you'll need to define momentargs, which is a list consisting of mu (your expected returns), sigma (your covariance matrix), and third and fourth moments (which we'll set to zero):
num_assets = ncol(my.expected.returns)
momentargs = list()
momentargs$mu = my.expected.returns
momentargs$sigma = my.covariance.matrix
momentargs$m3 = matrix(0, nrow = num_assets, ncol = num_assets ^ 2)
momentargs$m4 = matrix(0, nrow = num_assets, ncol = num_assets ^ 3)
Now you're ready to optimize your portfolio:
o = optimize.portfolio(R = returns, portfolio = pf, momentargs = momentargs)
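Since the question asks for an optimization at each period, one way to extend the steps above is to build one `momentargs` list per row of the views. The base-R sketch below does only that part; it does not call PortfolioAnalytics. The helper `make_momentargs` is hypothetical, the data are simulated, and storing `mu` as a one-column matrix is an assumption about the shape the package expects.

```r
set.seed(7)

n_periods <- 10
k <- 4

# Simulated stand-ins for the historical returns and the per-period views
returns_mat <- matrix(rnorm(n_periods * k, mean = 0, sd = 0.02), n_periods, k)
views_mat   <- matrix(rnorm(n_periods * k, mean = 0, sd = 0.05), n_periods, k)

# Hypothetical helper: custom expected returns plus a sample covariance,
# with the third and fourth moments set to zero as in the answer
make_momentargs <- function(mu_row, R) {
  k <- ncol(R)
  list(
    mu    = matrix(mu_row, ncol = 1),
    sigma = cov(R),
    m3    = matrix(0, nrow = k, ncol = k^2),
    m4    = matrix(0, nrow = k, ncol = k^3)
  )
}

# One momentargs list per period
momentargs_by_period <- lapply(
  seq_len(n_periods),
  function(t) make_momentargs(views_mat[t, ], returns_mat)
)

str(momentargs_by_period[[1]], max.level = 1)
```

Each element could then be passed as the momentargs argument of a separate optimize.portfolio call, as in the single-period example.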
If you optimize with ROI and pass your expectations as momentargs = expectations, you also need to add a return objective:
add.objective(eff.port, type = "return", name = "mean")
The source of optimize.portfolio (around line 382) only uses the momentargs argument when a return objective is present, even if you have already specified the return as a constraint.

Value-at-Risk (Extreme-Value Theory) using Monte Carlo Simulation in R

I have code that successfully calculates VaR based on Extreme Value Theory using historical data. I'm trying to run this same code on multiple simulated price paths (i.e. calculating a VaR for each path) and then taking the median or average of those VaRs.
Every example I could find online had the simulation function return the price at the end of the period and then replicated the function many times. That makes sense to me, except that I need to calculate value-at-risk for each simulated path. Below is the code I have so far. The code works when using historical data (i.e. the "evt" function works fine and the data table is populated correctly when the lossOnly, u, and evtVar lines aren't in a function). However, I've been trying to implement the simulation in the second function with various combinations, all of which have failed.
library('RODBC')
library('nor1mix')
library('fExtremes')
library('QRM')
library('fGarch')
#function for computing the EVT VaR
evt <- function(data,u){
#fit excess returns to gpd to get estimates
gpdfit = tryCatch({
gpdfit <- gpdFit(data,u,type="mle")
}, warning = function(w) {
gpdfit <- gpdFit(data,u,type="mle",optfunc="nlminb")
return(gpdfit)
}, error = function(e) {
gpdfit <- gpdFit(data,u,type="pwm",optfunc="nlminb")
return(gpdfit)
}, finally = {})
#now calculate VaRs
xi <- gpdfit@fit$par.ests["xi"]
beta <- gpdfit@fit$par.ests["beta"]
Nu <- length(gpdfit@data$exceedances)
n <- length(data)
evtVar95 <- (u+((beta/xi)*(((n/Nu)*.05)^(-xi) - 1.)))*100
evtVar99 <- (u+((beta/xi)*(((n/Nu)*.01)^(-xi) - 1.)))*100
evtVar997 <- (u+((beta/xi)*(((n/Nu)*.003)^(-xi) - 1.)))*100
evtVar999 <- (u+((beta/xi)*(((n/Nu)*.001)^(-xi) - 1.)))*100
#return calculations
return(cbind(evtVar95,evtVar99,evtVar997,evtVar999,u,xi,beta,Nu,n))
}
#data <- read.table("pricedata.txt")
prices <- data$V1
returns <- diff(log(prices)) #or returns <- log(prices[-1]/prices[-n])
xi <- mean(returns)
std <- sd(returns)
N <- length(prices)
lstval <- prices[N]
options(scipen = 999)
p <- c(lstval, rep(NA, N-1))
gen.path <- function(){
N <- length(prices)
for(i in 2:N)
p[i] <- p[i-1] * exp(rnorm(1, xi, std))
# plot(p, type = "l", col = "brown", main = "Simulated Price")
#evt calculation
#first get only the losses and then make them absolute
lossOnly <- abs(p[p<0])
#get threshold
u <- quantile(lossOnly, probs = 0.9, names=FALSE)
evtVar <- evt(lossOnly,u)
return(evtVar)
}
runs <- 10
sim.evtVar <- replicate(runs, gen.path())
evtVar <- mean(sim.evtVar)
#add data to total table
VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
In short, I'm trying to run the value-at-risk function (first function) within the Monte Carlo function (second function) and put the average simulated values into a data table. I know the first function works, but the second function is driving me crazy. These are the errors I'm getting:
> sim.evtVar <- replicate(runs, gen.path())
Error in if (xi > 0.5) { : missing value where TRUE/FALSE needed
Called from: .gpdpwmFit(x, u)
Browse[1]> evtVar <- mean(sim.evtVar)
Error during wrapup: object 'sim.evtVar' not found
Browse[1]>
> #add data to total table
> VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
Error: object 'evtVar' not found
> DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "function" to a data.frame
Any help you can provide is greatly appreciated! Thank you in advance!
I think the problem is this line:
lstval <- prices[N]
A stock price can never be negative, so this line in your function produces an empty vector:
lossOnly <- abs(p[p<0])
Try this instead:
lstval <- min(returns)
if you want the largest negative return in your dataset.
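An alternative to changing the starting value is to compute the losses from the returns of each simulated path instead of from the prices. This is a different approach from the one suggested above. The base-R sketch below simulates a path, takes the negative log returns as losses, and averages per-path statistics across runs. The parameter values and the helper `sim_path_stats` are assumptions for illustration, and the GPD fit is left out so the example does not depend on fExtremes.

```r
set.seed(123)

mu          <- 0.0005  # assumed mean daily log return
sigma       <- 0.01    # assumed daily volatility
n_steps     <- 500
start_price <- 100

# Hypothetical helper: simulate one path and summarise its losses
sim_path_stats <- function() {
  log_ret <- rnorm(n_steps, mean = mu, sd = sigma)
  path <- start_price * exp(cumsum(log_ret))

  # Prices stay positive, so path[path < 0] would always be empty
  stopifnot(all(path > 0))

  # Losses come from the returns of the simulated path, made positive
  sim_ret <- diff(log(c(start_price, path)))
  losses <- -sim_ret[sim_ret < 0]

  # Threshold at the 90% quantile of the losses, as in the question
  u <- quantile(losses, probs = 0.9, names = FALSE)
  c(u = u, n_losses = length(losses))
}

runs <- 10
# replicate() returns one column per run when the result is a vector
sim_stats <- replicate(runs, sim_path_stats())
avg_stats <- rowMeans(sim_stats)
avg_stats
```

Returning a plain named vector from the per-path function keeps the replicate() output a matrix, so rowMeans() averages each statistic separately, whereas mean() would collapse everything into one number.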

In R, how can I populate two columns with the results of the CIr function based on inputs from two other columns?

Thanks for any help in advance. I have a dataset with correlation values in a column called 'exit' and corresponding sample sizes (n) in a column called 'samplesize' in a data frame called 'dataset'.
My task is to create an R script to populate two full columns (CIleft and CIright) with the confidence interval outputs using the CIr function within the "psychometric" package for each row of data. This CIr function operates as follows, outputting the left and right confidence interval values:
CIr(r = .9, n = 100, level = .95)
[1] 0.8546667 0.9317133
Below is my unsuccessful script.
CI <- function(x)
{
require(psychometric)
library(psychometric)
r <- x["dataset$exit"];
n <- x["dataset$samplesize"];
results <- CIr(r, n, level = .95);
x["dataset$CIleft"] <- results[1];
x["dataset$CIright"] <- results[2];
}
One complication (which I believe may be relevant) is that test runs of "CI(x)" in the console produce the following errors:
// Error in CIz(z, n, level) : (list) object cannot be coerced to type 'double'
Then entering dataset2 <- as.matrix(dataset) and trying CI(x) again yields:
Error in dataset2$exit : $ operator is invalid for atomic vectors
And for
dataset3 <- lapply(dataset$exit, as.numeric)
dataset4 <- lapply(dataset$samplesize, as.numeric)
trying CI(x) again yields:
Error in 1 + x : non-numeric argument to binary operator //
Can anyone assist in helping me populate each row of my data frame with the appropriate output for CIleft and CIright, given that r = 'exit', and n = 'samplesize'?
I don't think you need a function.
library("psychometric")
# Pre-allocate the result columns
dataset$lwr <- NA_real_
dataset$upr <- NA_real_
for (row in seq_len(nrow(dataset))) {
# Compute the interval once per row, then store both bounds
ci <- CIr(r = dataset[["exit"]][row], n = dataset[["samplesize"]][row], level = .95)
dataset[["lwr"]][row] <- ci[1]
dataset[["upr"]][row] <- ci[2]
}
I will note though that it's generally advisable to avoid for loops in R because of its architecture (i.e., they're slow). Perhaps someone else can provide a solution with something else, e.g., apply. However, if you only have a small dataframe, the speed cost of using a for loop is unlikely to be noticeable.
Test Data:
set.seed(55); m = rnorm(26, 20, 40); dataset = data.frame( exit = seq(0, 1, 0.04), samplesize = abs(round(m)))
dataset$samplesize[dataset$samplesize == 0] = 5
dataset$exit[dataset$exit == 1] = 0.99
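For a loop-free version, one option is to compute the interval directly with the Fisher z transformation, which reproduces the CIr example quoted in the question. The sketch below is base R only and does not call the psychometric package; the helper name `fisher_ci` and the small example data frame are assumptions, and the formula requires n > 3.

```r
# Hypothetical helper: confidence interval for a correlation via Fisher's z
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                 # Fisher z transform of r
  se   <- 1 / sqrt(n - 3)          # standard error of z (needs n > 3)
  crit <- qnorm(1 - (1 - level) / 2)
  data.frame(
    CIleft  = tanh(z - crit * se), # back-transform the lower bound
    CIright = tanh(z + crit * se)  # back-transform the upper bound
  )
}

# Check against the example from the question: CIr(r = .9, n = 100, level = .95)
check <- fisher_ci(0.9, 100)
round(check, 7)

# Vectorised use on a small illustrative data frame
dataset <- data.frame(
  exit       = c(0.10, 0.45, 0.90),
  samplesize = c(25, 60, 100)
)
dataset <- cbind(dataset, fisher_ci(dataset$exit, dataset$samplesize))
dataset
```

Because atanh, tanh, and sqrt are all vectorised, the whole column is processed in one call and no explicit loop or apply is needed.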

Resources