I am running a Monte Carlo simulation of a portfolio with 4 assets (bond, equity, equity, cash market).
I use monthly steps and my simulation horizon is 10 years, i.e. 120 steps. My final goal is to compute the yearly expected shortfall, i.e. the average of the worst 5% of the portfolio returns.
At first glance the simulation seems fine. However, I have the impression that the drift dominates the process over time, so my expected shortfall is even positive at the long end. The expected shortfall also eventually decreases when I increase the equity weights, and this remains true when I set the expected return of each asset to zero, in which case the increased risk should drag the expected shortfall down.
I suspect a bug in my code but can't see it. Any advice is highly appreciated!
#maturity in years
maturity <- 10
#Using monthly steps
nsteps <- maturity*12
dt <- maturity / nsteps
#number of assets
nAssets = 4
#number of simulations
nTrails = 10000
#expected return p.a. for each asset, stored in vector BM.mu
BM.mu <- rep(NA,nAssets)
BM.mu[1] <- 0.0072
BM.mu[2] <- 0.0365
BM.mu[3] <- 0.04702
BM.mu[4] <- 0.0005
#defining variable size
simulated.Returns <- array(NA, dim = c(nsteps+1, nTrails, nAssets))
cumulative.PortReturns <- matrix(rep(NA,nsteps*nTrails), nrow = nsteps, ncol = nTrails)
ES <- rep(NA, maturity)
#defining my monthly correlation and covariance matrix
corr_matrix <- matrix(c( 1.000000000, -0.05081574, -0.07697585,  0.0051,
                        -0.050815743,  1.00000000,  0.80997805, -0.3540,
                        -0.076975850,  0.80997805,  1.00000000, -0.3130,
                         0.005095699, -0.35365332, -0.31278506,  1.0000), nrow = 4, ncol = 4)
cov_matrix <- matrix(c( 1.44e-04, -2.20e-05, -3.86e-05,  8.44e-08,
                       -2.20e-05,  1.30e-03,  1.22e-03, -1.76e-05,
                       -3.86e-05,  1.22e-03,  1.75e-03, -1.81e-05,
                        8.44e-08, -1.76e-05, -1.81e-05,  1.90e-06), nrow = 4, ncol = 4)
#defining my portfolio weights
port.weights <- c(0.72, 0.07, 0.07, 0.14)
#performing the Cholesky decomposition of the correlation matrix
R <- chol(corr_matrix)
#generating standard normal random variables
x <- array(rnorm(nsteps*nTrails*nAssets), c(nsteps*nTrails, nAssets))
#generating correlated standard normal random variables
ep <- x %*% R
#defining the drift
drift <- BM.mu - 0.5 * diag(cov_matrix)
#generating asset paths
temp = array(exp(as.vector(drift %*% t(dt)) + t(ep *sqrt(diag(cov_matrix)))), c(nAssets,nsteps,nTrails))
for(i in 2:nsteps) temp[,i,] = temp[,i,] * temp[,(i-1),]
#changing dimension of the array temp from dim(nAssets, nsteps, nTrails) to dim(nsteps, nAssets, nTrails)
simulated.Returns <- aperm(temp,c(2,1,3))
#computing portfolio returns for each simulation (nTrails); each step is weighted with "port.weights"
#Since I generate continuous (log) returns, I first transform them into discrete returns, apply the weights and then transform back into continuous returns.
for (z in 1:nTrails) {
for (i in 1:nsteps) cumulative.PortReturns[i,z] = log(1+((exp(simulated.Returns[i,,z]-1)-1) %*% port.weights))
}
#Finally I compute the expected shortfall (5% level) by averaging the 5% worst cumulative portfolio returns
#I step in increments of 12 because I calculate the ES at the end of each year
z = 0
for (i in seq(12, nsteps, by = 12 )) {
z = z + 1
ES[z] <- mean(sort(cumulative.PortReturns[i,]) [1:(0.05*nTrails)])
}
#plotting a sample of simulated portfolio return paths
#library(QRM) (or library(timeSeries)) is needed for as.timeSeries()
plot(as.timeSeries(cumulative.PortReturns[,1:100]), plot.type = 'single')
From your comments, BM.mu is defined as the annual expected return for each asset. However, you are simulating each sample path with monthly rather than annual steps. This needs to be reflected in your drift variable by scaling BM.mu down to the expected monthly return:
#defining the drift
drift <- BM.mu/12 - 0.5 * diag(cov_matrix)
Without this, you are computing the drift from an annual expected return and a monthly covariance matrix. This results in a larger drift than you expect, which will distort the results you are seeing.
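As a quick sanity check (a sketch that only reuses the BM.mu and cov_matrix already defined above), you can compare the drift your original code computed against the monthly-scaled version; the unscaled drift is considerably larger for the equity assets:
#quick check (sketch): unscaled vs. monthly-scaled drift
drift.unscaled <- BM.mu    - 0.5 * diag(cov_matrix)  #what the original code computes
drift.monthly  <- BM.mu/12 - 0.5 * diag(cov_matrix)  #after scaling mu to monthly
round(cbind(drift.unscaled, drift.monthly), 5)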
This is an extension of questions covered here (Method #1: http://santiago.begueria.es/2010/10/generating-spatially-correlated-random-fields-with-r/) and here (Method #2: https://gist.github.com/brentp/1306786). Both sites cover the topic very well (thanks!) for a relatively small dimension (e.g., 1000x1). I am trying to generate spatially clustered binary data with a much larger dimension, >= 100000x1, for example c(1,1,1,1,0,1,0,0,0,0, …, 0,0,0,0,0,0,0,0,0,0,0,0), 1000 times per case study. Here are slightly modified versions of the code from those sites.
# Method #1
library(gstat)  # for gstat() and vgm()
dim1 <- 1000
dim2 <- 1
xy <- expand.grid(seq_len(dim1), seq_len(dim2))
colnames(xy) <- c("x", "y")
# Spatial correlation model for unconditional simulation
geo.model <- gstat(formula = z ~ x + y, locations = ~x + y, dummy = TRUE, beta = 0,
                   model = vgm(psill = 1, "Exp", range = dim1),  # range parameter
                   nmax = 30)
sim.mat <- predict(geo.model, newdata = xy, nsim = 1)
sim.mat[,3] <- ifelse(sim.mat[,3] > quantile(sim.mat[,3], .1), 0, 1)
plot(sim.mat[, 3])
# Method #2
# generate autocorrelated data.
nLags = 1000 # number of lags (size of region)
# fake, uncorrelated observations
X = rnorm(nLags)
# fake sigma: correlation decreases with distance.
sigma = diag(nLags)
corr = .999
sigma <- corr ^ abs(row(sigma)-col(sigma))
#sigma
# Y is autocorrelated...
Y <- t(X %*% chol(sigma))
y <- ifelse(Y >= quantile(Y, probs=.9), 1, 0)[, 1]
plot(y)
Both methods work very well for generating binary data when dim1 is less than 10000. However, when I try several hundred thousand (e.g., >= 100,000), they either take a very long time or run into memory problems.
For example, when I used "nLags = 50000" in Method #2, I got an error ("Error: cannot allocate vector of size 9.3 Gb") at the line "sigma <- corr ^ abs(row(sigma)-col(sigma))".
I would like to find an efficient (time- and memory-saving) way to generate such spatially clustered binary data 1000 times (especially with dim1 >= 100000) for each case study (about 200 cases).
I have thought about supplying varying probabilities to the "sample" function or using a probability distribution, but I am not sure how, and it is beyond my scope.
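Not a full answer, but one possible direction: in Method #2 the covariance corr^|i-j| is exactly the autocorrelation of a stationary AR(1) process with coefficient corr, so the Gaussian series can be simulated recursively in O(nLags) time and memory, without ever building the nLags x nLags sigma matrix. A minimal sketch (reusing the nLags, corr and 0.9 quantile from Method #2, with nLags raised to 100000):
# Sketch: simulate the AR(1) process implied by sigma[i,j] = corr^|i-j| directly
nLags <- 100000
corr  <- 0.999
e <- rnorm(nLags)
Y <- numeric(nLags)
Y[1] <- e[1]  # stationary start: Var(Y[1]) = 1
for (t in 2:nLags) {
  Y[t] <- corr * Y[t-1] + sqrt(1 - corr^2) * e[t]
}
y <- ifelse(Y >= quantile(Y, probs = 0.9), 1, 0)
plot(y, type = "l")
The loop can also be replaced by stats::filter(sqrt(1 - corr^2) * e, corr, method = "recursive"), which runs essentially the same recursion and may be faster when you need 1000 replications per case.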
I am trying to compute the Bayes factor for an A/B test dataset that can be found here. However, I end up with NaN because the beta function evaluates to zero. For the likelihoods, I assume the data follow a binomial distribution, so I am using this formula:
likelihood = choose(n,k) * Beta(k+1,n-k+1)
The code can be found below
data <- read.csv(file="ab_data.csv", header=TRUE, sep=",")
control <- data[which(data$group == "control"),]
treatment <- data[which(data$group == "treatment"),]
#compute bayes factor
n1 = nrow(control)
r1 = sum(control$converted)
n2 = nrow(treatment)
r2 = sum(treatment$converted)
likelihood_control <- choose(n1,r1) * beta(r1+1, n1-r1+1)
likelihood_treatment <- choose(n2,r2) * beta(r2+1, n2-r2+1)
bayes_factor <- likelihood_control/ likelihood_treatment
beta(r1+1, n1-r1+1)
beta(r2+1, n2-r2+1)
bayes_factor
As you observed, the problem is that the beta function is returning 0. This is not because the likelihood is actually 0; it is just so small that the computer stores it as 0. The second issue is that choose is returning Inf. Again, the value is not actually infinite; R simply cannot store numbers that large internally. The solution is to work with logarithms, which grow much more slowly, and exponentiate at the end. The code below should work (I tested the logchoose function and it seems correct):
logchoose <- function(n, k){
  num   <- sum(log(seq(n - k + 1, n)))
  denom <- sum(log(1:k))
  return(num - denom)
}
likelihood_control <- logchoose(n1,r1) + lbeta(r1+1, n1-r1+1)
likelihood_treatment <- logchoose(n2,r2) + lbeta(r2+1, n2-r2+1)
bayes_factor <- exp(likelihood_control - likelihood_treatment)
bayes_factor
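For what it's worth, base R already provides lchoose(), which computes log(choose(n, k)) directly, so an equivalent version without the custom helper would be (a small sketch using the same n1, r1, n2, r2 as above):
likelihood_control   <- lchoose(n1, r1) + lbeta(r1 + 1, n1 - r1 + 1)
likelihood_treatment <- lchoose(n2, r2) + lbeta(r2 + 1, n2 - r2 + 1)
bayes_factor <- exp(likelihood_control - likelihood_treatment)
bayes_factor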
I am trying to carry out an analysis similar to McNeil & Frey in their paper 'Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach', but I am stuck on a problem when implementing the models.
The approach is to fit an AR(1)-GARCH(1,1) model and use it to produce the one-day-ahead forecast of the VaR, using a rolling window of 1000 observations.
I have simulated data that should work fine with my model, and I assume that if I were doing this correctly, the observed coverage rate would be close to the theoretical one. However, it is always below the theoretical coverage rate, and I don't know why.
I believe the estimated VaR is calculated as
VaR_hat = mu_hat + sigma_hat * qnorm(alpha)
but I might be wrong. I have tried to find related questions here but have not found any.
How I approach this can be summarized in three steps.
Simulate 2000 AR(1)-GARCH(1,1) observations, fit a corresponding model on a rolling window of 1000 observations, and extract the one-day-ahead prediction of the conditional mean and standard deviation (thereby making 1000 predictions).
Use the predicted values and the normal quantile to calculate the VaR for the wanted confidence level.
Check if the coverage rate is close to the theoretical one.
If someone could help me I would be extremely thankful, and if my formulation is unclear please tell me and I will try to explain the problem better.
The code I'm using is attached below.
Thank you in advance
library(fGarch)
nObs <- 2000 # Number of observations.
quantileLevel <- 0.95 # Since we expect 5% exceedances.
from <- seq(1,1000) # Lower index vector for observations in model.
to <- seq(1001,2000) # Upper index vector for observations in model.
VaR_vec <- rep(0,(nObs-1000)) # Empty vector for storage of 1000 VaR estimates.
# Specs for simulated data (including the AR(1) component and all GARCH(1,1) components).
spec = garchSpec(model = list(omega = 1e-6, alpha = 0.08, beta = 0.91, ar = 0.10),
cond.dist = 'norm')
# Simulate 2000 data points.
data_sim <- c(garchSim(spec, n = nObs, n.start = 1000))
for (i in 1:1000){
# The rolling window of 1000 observations.
data_insert <- data_sim[from[i]:to[i]]
# Fitting an AR(1)-GARCH(1,1) model with normal cond.dist.
fitted_model <- garchFit(~ arma(1,0) + garch(1,1), data_insert,
trace = FALSE,
cond.dist = "norm")
  # One-day-ahead forecast of the conditional mean and standard deviation.
  prediction_model <- predict(fitted_model, n.ahead = 1)
mu_pred <- prediction_model$meanForecast
sigma_pred <- prediction_model$standardDeviation
# Calculate VaR forecast
VaR_vec[i] <- mu_pred + sigma_pred*qnorm(quantileLevel)
if (length(to)-i != 0){
print(c('Countdown, just',(length(to) - i),'iterations left'))
} else {
print(c('Done!'))
}
}
# Extract only the estimates related to the forecasts.
compare_data_sim <- data_sim[1001:length(data_sim)]
hit <- rep(0,length(VaR_vec))
# Count the amount of exceedances.
for (i in 1:length(VaR_vec)){
hit[i] <- sum(VaR_vec[i] <= compare_data_sim[i])
}
plot(data_sim[1001:2000], type = 'l',
ylab = 'Simulated data', main = 'Illustration of one day ahead prediction of 95%-VaR')
lines(VaR_vec, col = 'red')
cover_prop <- sum(hit)/length(hit)
print(sprintf("Diff theoretical level and VaR coverage = %f", (1-quantileLevel) - cover_prop))
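Not an answer to where the discrepancy comes from, but when judging whether the observed coverage is "close" to the theoretical 5%, it may help to account for the sampling noise in 1000 out-of-sample points. A small sketch, assuming the exceedances are roughly independent (which they should be if the model is adequate):
# Sketch: is the observed exceedance rate compatible with the nominal 5%?
n_forecasts <- length(hit)
binom.test(sum(hit), n_forecasts, p = 1 - quantileLevel)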
I am studying the properties of a simulated time series z_t. In brief, z_t is generated as follows: y_t is an AR(1) process with innovations e_t, and z_t = y_t * e_t. I expect z_t to show lag-1 correlation.
I performed the Ljung-Box test for z_t in R and found that my result depends a lot on the seed value (for some seeds I get a p-value < 0.001, for others a p-value near 1; I tested 10 seeds). I also computed the p-value explicitly from a possible definition of the LB test and found that it is always small. The same analysis done in Python also gives low p-values. I suspect that I may be using the Box.test() function incorrectly.
Here is some code to reproduce the problem; the last 3 lines are my hand-rolled LB test:
n <- 1000; phi <- 0.9; set.seed(9)
errors <- rnorm(n); Y <- rep(0, times = n)
for (k in 2:n){
  Y[k] <- phi * Y[k-1] + errors[k]
}
y <- ts(Y)
z <- y * errors
Box.test(z, lag=1, type="Ljung-Box", fitdf = 0)
# note: result doesn't depend very much on number of lags
# sometimes result > 0.1.
r1 <- sum(z*lag(z,-1)) / sum(z**2) #1st order autocorrelation
LB <- n * ((n+2)/(n-1))*r1**2 #LB statistic for lag = 1.
1 - pchisq(LB, 1) # p-value of the LB test; always << 1.
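As a cross-check (a sketch, not a diagnosis of the discrepancy), you could compare the hand-rolled r1 with the lag-1 autocorrelation returned by acf(), which demeans the series and handles the lag alignment itself; small numerical differences are expected because of the different normalisation:
# Sketch: lag-1 autocorrelation and LB statistic via acf()
r1_acf <- acf(z, lag.max = 1, plot = FALSE)$acf[2]
c(manual = r1, acf = r1_acf)
LB_acf <- n * (n + 2) / (n - 1) * r1_acf^2
1 - pchisq(LB_acf, df = 1)  # p-value based on acf()'s estimate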
I have a problem with lsoda from the deSolve package in R (it might apply to the ode function too). I am modelling the dynamics of a food web using a set of ODEs that calculate the abundances of 5 species in two identical food webs connected through dispersal.
The abundances are calculated over 2000 time steps, and they are not supposed to become negative or smaller than 1e-6; in that case the result should be set to 0. I could not find any lsoda parameter that turns negative results into zero, so I tried the following trick in my ODE function:
solve.model <- function(t, y, parms){
  y <- ifelse(y < 1e-6, 0, y)
  #ODE functions here
  #...
  #...
  return(list(dy))
}
but it does not seem to work. Below is a sample of the abundances of species in one web.
I will be very grateful for your help, and I hope the sample code gives enough information about my problem.
Babak
P.S. I am solving the following ODE set for the abundances of the species (the first two equations) and the resource change (the third equation).
The corresponding code for the function is below:
solve.model <- function(t, y, parms){
  y <- ifelse(y < 1e-6, 0, y)
  with(parms, {
    # return from vector form into matrix form for calculations
    (R <- as.matrix(y[(max(no.species)*length(no.species)+1):length(y)]))
    (N <- matrix(y[1:(max(no.species)*length(no.species))], ncol=length(no.species)))
    dy1 <- matrix(nrow=max(no.species), ncol=length(no.species))
    dy2 <- matrix(nrow=length(no.species), ncol=1)
    for (i in 1:no.webs){
      species <- no.species[i]
      (abundance <- N[1:species,i])
      adj <- as.matrix(webs[[i]])
      a.temp <- a[1:species, 1:species]*adj
      b.temp <- b[1:species, 1:species]*adj
      h.temp <- h[1:species, 1:species]*adj
      # Calculating the sums in the denominator of the Holling type II functional response
      (sum.over.preys <- abundance%*%(a.temp*h.temp))
      (sum.over.predators <- (a.temp*h.temp)%*%abundance)
      # Calculating growth of basal species
      (basal.growth <- basals[,i]*N[,i]*(mu*R[i]/(K+R[i])-m))
      # Calculating growth for non-basal species
      no.basal <- rep(1, len=species) - basals[1:species]
      predator.growth <- rep(0, max(no.species))
      (predator.growth[1:species] <- ((abundance%*%(a.temp*b.temp))/(1+sum.over.preys)-m*no.basal)*abundance)
      predation <- rep(0, max(no.species))
      (predation[1:species] <- (((a.temp*b.temp)%*%abundance)/t(1+sum.over.preys))*abundance)
      (pop <- basal.growth + predator.growth - predation)
      dy1[,i] <- pop
      dy2[i] <- 0.0005  # Change in the resource
    }
    # Calculating dispersals; they can easily be replaced
    # by adjacency maps of connections between food webs.
    # added to address the problem of negative abundances
    deltas <- append(c(dy1), dy2)
    return(list(append(c(dy1), dy2)))
  })
}
This function is passed to lsoda with the following call:
temp.abund[[j]] <- lsoda(y=initials, func=solve.model, times=0:max.time, parms=parms)
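In case it is useful, deSolve also has an events mechanism that can modify the state vector at specified times, which may be a cleaner place to apply the 1e-6 clamping than inside the derivative function. A sketch, assuming clamping at every integer output time is acceptable (clamp.event is a hypothetical helper name, not part of the model above):
library(deSolve)

# Sketch: event function that zeroes any state below the 1e-6 threshold
clamp.event <- function(t, y, parms){
  y[y < 1e-6] <- 0
  y
}

temp.abund[[j]] <- lsoda(y = initials, func = solve.model, times = 0:max.time,
                         parms = parms,
                         events = list(func = clamp.event, time = 1:max.time))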