Fit a copula model in R

I want to construct an optimal portfolio of stocks whose joint returns are modeled using copulas.
I have data on the returns of 4 stocks:
s1 <- read.csv('s1.csv',header=F)$V2
s2 <- read.csv('s2.csv',header=F)$V2
s3 <- read.csv('s3.csv',header=F)$V2
s4 <- read.csv('s4.csv',header=F)$V2
Then I tried to fit a t-copula and plot its density:
library(copula)

t.cop <- tCopula(dim = 4)
set.seed(500)
m <- pobs(as.matrix(cbind(s1, s2, s3, s4)))  # pseudo-observations
fit <- fitCopula(t.cop, m, method = 'ml')
coef(fit)
rho <- coef(fit)[1]
df <- coef(fit)[2]
# persp() draws a bivariate surface, so the density is shown for a
# 2-dimensional t-copula with the fitted parameters
persp(tCopula(dim = 2, rho, df = df), dCopula)
But I can't understand how to build other types of copulas (vine copulas, for example). And how can I find an optimal portfolio?
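One possible route for the vine-copula part, sketched with the VineCopula package; the familyset, the simulation size, and the mean-variance objective below are illustrative assumptions rather than the only way to do this:
library(VineCopula)
# select a regular vine structure, pair-copula families and parameters
# from the pseudo-observations m computed above
rvm <- RVineStructureSelect(m, familyset = NA)  # NA = consider all families
summary(rvm)
# other parametric families are fitted the same way as the t-copula, e.g.:
# fitCopula(normalCopula(dim = 4, dispstr = "un"), m, method = 'ml')
# fitCopula(claytonCopula(dim = 4), m, method = 'ml')
# simulate joint uniforms from the fitted vine, then map them back to
# return space with the empirical quantiles of each stock
u <- RVineSim(1000, rvm)
sim <- cbind(quantile(s1, u[, 1]), quantile(s2, u[, 2]),
             quantile(s3, u[, 3]), quantile(s4, u[, 4]))
# a simple mean-variance portfolio on the simulated returns
# (any other risk measure, e.g. CVaR, could be optimized instead)
mu <- colMeans(sim)
S  <- cov(sim)
w  <- solve(S, mu)  # tangency-style weights, up to scaling
w  <- w / sum(w)
w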

Related

ARIMA giving forecasts with higher RMSE than AR

I am trying to argue that ARIMA models are better than AR models, i.e. since AR is a subset of ARIMA, the best ARIMA model will be no worse than the best AR model, and may be better. I have used an AR(6) model, and then used auto.arima() in R, which told me that an ARIMA(1,0,2) model is optimal by AICc. I have used both of these to do a rolling-window forecast, but I am getting an RMSE of 3.901 for AR(6) and 4.503 for ARIMA(1,0,2). My code for the forecasting is below (I know it is not very advanced, but I'm a beginner and this is the best way I could find - it matches my results by hand):
# find moving averages and residual errors
ma <- rep(NA, 14976)
for (i in 3:14976) {
  ma[i] <- mean(ds[(i - 2):(i - 1)])
}
frame <- ds - ma

# fit model
model <- arima(ds[1:14676], order = c(1, 0, 2), include.mean = TRUE, method = "ML")

# in-place addition operator
`%+=%` <- function(e1, e2) { eval.parent(substitute(e1 <- e1 + e2)) }

training_data <- data[1:14676]
test_data <- data[14677:14976]
window <- 1    # AR order of the ARIMA(1,0,2)
window1 <- 2   # MA order
coef <- model$coef
history <- training_data[(length(training_data) - window + 1):14676]
predictions <- list()
for (i in (1:length(test_data))) {
  length <- length(history)
  lag <- array()
  for (d in ((length - window + 1):length)) {
    lag[d - i + 1] <- history[d]
  }
  yhat <- coef[length(coef)] - 1
  # AR part
  for (t in (1:window)) {
    yhat %+=% (coef[t] * lag[window - t + 1])
  }
  # MA part, using the residual proxies in `frame`
  if (window1 != 0) {
    for (j in ((window + 1):(window + window1))) {
      yhat %+=% (coef[j] * frame[14676 + i - j + 1])
    }
  }
  obs <- test_data[i]
  predictions <- append(predictions, yhat)
  history <- append(history, obs)
  print(predictions)
}
The graph that comes out for the ARIMA(1,0,2) forecast (compared to the actual values in the test set) looks better, but sits noticeably above the actuals. It seems like the intercept needs to be lower; lowering it by hand does give a better RMSE, but arima() gave the intercept it did, so I haven't changed it.
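For comparison, a minimal sketch of the same rolling one-step forecast done with the forecast package (assuming ds is the full series as above): Arima() with model= re-applies already-estimated coefficients to the extended data without refitting, and fitted() then holds the one-step-ahead predictions.
library(forecast)
fit <- Arima(ds[1:14676], order = c(1, 0, 2), include.mean = TRUE, method = "ML")
# re-apply the fitted coefficients to the full series (no re-estimation),
# then keep the one-step-ahead fitted values over the test window
refit <- Arima(ds[1:14976], model = fit)
yhat <- fitted(refit)[14677:14976]
sqrt(mean((ds[14677:14976] - yhat)^2))  # RMSE on the test window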

Two methods of recovering fitted values from a Bayesian Structural Time Series model yield different results

Two conceptually plausible methods of retrieving in-sample predictions (or "conditional expectations") of y[t] given y[t-1] from a bsts model yield different results, and I don't understand why.
One method uses the prediction errors returned by bsts (defined as e=y[t] - E(y[t]|y[t-1]); source: https://rdrr.io/cran/bsts/man/one.step.prediction.errors.html):
library(bsts)

get_yhats1 <- function(fit){
  # One-step prediction errors are defined as e = y[t] - yhat (see the link above).
  # Recover yhat as y - e.
  bsts.pred.errors <- bsts.prediction.errors(fit, burn = SuggestBurn(0.1, fit))$in.sample
  predictions <- t(apply(bsts.pred.errors, 1, function(e){fit$original.series - e}))
  return(predictions)
}
Another sums the contributions of all model components at time t:
get_yhats2 <- function(fit){
  burn <- SuggestBurn(0.1, fit)
  X <- fit$state.contributions
  niter <- dim(X)[1]
  ncomp <- dim(X)[2]
  nobs <- dim(X)[3]
  # initialize the prediction matrix with zeros
  predictions <- matrix(data = 0, nrow = niter - burn, ncol = nobs)
  comps <- seq_len(ncomp)
  for (comp in comps) {
    # pull out the post-burn-in state contributions for this component,
    # a (niter - burn) x nobs matrix
    compX <- X[-seq_len(burn), comp, ]
    # accumulate the predictions across components
    predictions <- predictions + compX
  }
  return(predictions)
}
Fit a model:
## Air passengers data
data("AirPassengers")
# 11 years, monthly data (timestep=monthly) --> 132 observations
Y <- stats::window(AirPassengers, start=c(1949,1), end=c(1959,12))
y <- log(Y)
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons=12, season.duration=1)
bsts.model <- bsts(y, state.specification=ss, niter=500, family='gaussian')
Compute and compare the predictions from each of the two functions:
p1 <- get_yhats1(bsts.model)
p2 <- get_yhats2(bsts.model)
# Compare predictions for t=1:5, first MCMC iteration:
p1[1,1:5]; p2[1,1:5]
I'm the author of bsts.
The 'prediction errors' in bsts come from the filtering distribution. That is, they come from p(state | past data). The state contributions come from the smoothing distribution, i.e. p(state | all data). The filtering distribution looks backward in time, while the smoothing distribution looks both forward and backward. One typically needs the filtering distribution while using a fitted model, and the smoothing distribution while fitting the model in the first place.
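A quick numerical check of this distinction, using the objects computed above: the smoothing-based fit conditions on all the data, so its in-sample error should be smaller than that of the one-step-ahead filtering-based fit.
# posterior means over MCMC iterations
f1 <- colMeans(p1)  # filtering-based (one-step-ahead)
f2 <- colMeans(p2)  # smoothing-based (summed state contributions)
sqrt(mean((as.numeric(y) - f1)^2))
sqrt(mean((as.numeric(y) - f2)^2))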

Random forest variable importance AND direction of correlation for binomial response

I am using the randomForest package in R, but am not partial to solutions using other packages.
My RF model uses various continuous and categorical variables to predict extinction risk (Threatened, Non_Threatened). I would like to be able to show the direction of variable importance for the predictors used in my RF model. Other publications have done exactly this: see Figure 1 in https://www.pnas.org/content/pnas/109/9/3395.full.pdf
Any ideas on how to do something similar? One suggestion I read said to simply compare the difference between two partial dependence plots (example below), but I feel this may not be the best way.
Any help would be greatly appreciated.
partialPlot(final_rf, rf_train, size_mat,"Threatened")
partialPlot(final_rf, rf_train, size_mat,"Non_Threatened")
[The two partial dependence plots are not reproduced here; one is titled "response = Threatened" and the other "response = Non_Threatened".]
You could use something like an average marginal effect (or, as below, an average first difference) approach.
First, I'll make some data:
set.seed(11)
n <- 200
p <- 5
X <- data.frame(matrix(runif(n * p), ncol = p))
yhat <- 10 * sin(pi * X[, 1] * X[, 2]) + 20 * (X[, 3] - .5)^2 +
  10 * -X[, 4] + 5 * -X[, 5]
y <- as.numeric((yhat + rnorm(n)) > mean(yhat))
df <- as.data.frame(cbind(X, y))
Next, we'll estimate the RF model:
library(randomForest)
rf <- randomForest(as.factor(y) ~ ., data=df)
Next, we can loop through each variable; each time through the loop, we add one standard deviation to a single x variable for all observations. In your application, you could instead switch a categorical variable from one level to another (a sketch follows the output below). We then predict the probability of a positive response under both conditions - the original data and the data with a standard deviation added to the variable in question - take the difference, and summarize.
nx <- names(df)
nx <- nx[-which(nx == "y")]
# baseline predicted probabilities on the original data
p1 <- predict(rf, newdata = df, type = "prob")
res <- NULL
for (i in 1:length(nx)) {
  # shift one predictor up by one standard deviation
  df2 <- df
  df2[[nx[i]]] <- df2[[nx[i]]] + sd(df2[[nx[i]]])
  p2 <- predict(rf, newdata = df2, type = "prob")
  # first difference in the predicted probability of the positive class
  diff <- (p2 - p1)[, 2]
  res <- rbind(res, c(mean(diff), sd(diff)))
}
colnames(res) <- c("effect", "sd")
rownames(res) <- nx
res
# effect sd
# X1 0.11079 0.18491252
# X2 0.10265 0.16552070
# X3 0.02015 0.07951409
# X4 -0.11687 0.16671916
# X5 -0.04704 0.10274836
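The same loop carries over to categorical predictors by switching every observation from one level to another instead of adding a standard deviation. A sketch, assuming a hypothetical factor x_cat with levels "A" and "B" that was included when fitting rf (no such variable exists in the simulated data above):
# hypothetical categorical predictor: average first difference of
# moving everyone from level "A" to level "B"
df2 <- df
df2$x_cat <- factor("A", levels = c("A", "B"))
pA <- predict(rf, newdata = df2, type = "prob")
df2$x_cat <- factor("B", levels = c("A", "B"))
pB <- predict(rf, newdata = df2, type = "prob")
diff <- (pB - pA)[, 2]
c(effect = mean(diff), sd = sd(diff))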

Standard errors of impacts in a spatial regression lagsarlm

I am using spatial lag and Durbin regression models and I would like to estimate the standard errors of the impacts. Any ideas on how to do this?
Reproducible example below using a Durbin model:
# libraries: lagsarlm(), trW() and impacts() live in spatialreg;
# the data set and nb2listw() come with spdep
library(spdep)
library(spatialreg)
# data
data(oldcol)
# neighbours list
lw <- nb2listw(COL.nb, style = "W")
# regression
fit_durb <- lagsarlm(CRIME ~ INC + HOVAL, data = COL.OLD, type = "Durbin",
                     listw = lw, method = "eigen",
                     zero.policy = TRUE, na.action = "na.omit")
# power traces
W <- as(lw, "CsparseMatrix")
trMC <- trW(W, type = "MC", listw = lw)
# impacts
imp <- summary(impacts(fit_durb, tr = trMC, R = 1000), zstats = TRUE, short = TRUE)
You should be able to use the MC samples stored in the imp object to get the standard errors, for instance:
test1 <- lapply(imp$sres, function(x){apply(x, 2, mean)})
test2 <- lapply(imp$sres, function(x){apply(x, 2, sd)})
test1$direct / test2$direct
This gives the same z values as those returned by imp.
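The corresponding z statistics and two-sided p-values then follow directly from those posterior summaries (a short follow-up using the objects above; mapply simplifies the result to a matrix with one column per impact type):
# z statistics and two-sided p-values for direct/indirect/total impacts
zmat <- mapply(function(m, s) m / s, test1, test2)
pmat <- 2 * pnorm(abs(zmat), lower.tail = FALSE)
zmat; pmat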

R - Modelling Multivariate GARCH (rugarch and ccgarch)

First time asking a question here, so I'll do my best to be explicit - but let me know if I should provide more info! Also, this is a long question... hopefully simple to solve for someone ;)! Using R, I'm modelling multivariate GARCH models based on a paper (Manera et al. 2012).
I model the Constant Conditional Correlation (CCC) and Dynamic Conditional Correlation (DCC) models with external regressors in the mean equations, using R version 3.0.1 with the "rugarch" package (version 1.2-2) for the univariate GARCH with external regressors, and the "ccgarch" package (version 0.2.0-2) for the CCC/DCC models. (I'm currently looking into the "rmgarch" package, but it seems to cover only DCC and I need the CCC model too.)
I have a problem with the mean equations of my models. In the paper mentioned above, the parameter estimates of the mean equation change between the CCC and DCC models, and I don't know how to achieve that in R...
(I'm currently looking on Google and into Tsay's book "Analysis of Financial Time Series" and Engle's book "Anticipating Correlations" to find my mistake.)
What I mean by "my mean equations don't change between CCC and DCC models" is the following: I specify the univariate GARCH for my n=5 time series with the rugarch package. Then I take the estimated GARCH parameters (ARCH + GARCH terms) and pass them to the CCC and DCC functions eccc.sim() and dcc.sim(). From the eccc.estimation() and dcc.estimation() functions I can then retrieve the estimates for the variance equations as well as the correlation matrices - but not for the mean equations.
I post the R code (a reproducible version and my original one) for the univariate models and the CCC model only. Thank you for reading my post!
Note: in the code below, "data.repl" is a "zoo" object of dim 843x22 (9 daily Commodities returns series and explanatory variables series). The multivariate GARCH is for 5 series only.
Reproducible code:
# libraries:
library(rugarch)
library(ccgarch)
library(quantmod)
library(psych)  # for corr.test()

# Creating fake data:
dataRegr <- matrix(rep(rnorm(3149, 11, 1), 1), ncol = 1, nrow = 3149)
dataFuelsLag1 <- matrix(rep(rnorm(3149, 24, 8), 2), ncol = 2, nrow = 3149)

# S&P 500 via quantmod and Yahoo Finance
T0 <- "2000-06-23"
T1 <- "2012-12-31"
getSymbols("^GSPC", src = "yahoo", from = T0, to = T1)
sp500.close <- GSPC[, "GSPC.Close"]
getSymbols("UBS", src = "yahoo", from = T0, to = T1)
ubs.close <- UBS[, "UBS.Close"]
dataReplic <- merge(sp500.close, ubs.close, all = TRUE)
dataReplic[which(is.na(dataReplic[, 2])), 2] <- 0  # replace NA

### (G)ARCH modelling ###
#########################
# External regressors: macrovariables and all fuels+biofuel Working's T index
ext.regr.ext <- dataRegr
regre.fuels <- cbind(dataFuelsLag1, dataRegr)

### spec of GARCH(1,1) with AR(1) ###
garch11.fuels <- as.list(1:2)
for (i in 1:2) {
  garch11.fuels[[i]] <- ugarchspec(mean.model = list(armaOrder = c(1, 0),
                                   external.regressors = as.matrix(regre.fuels[, -i])))
}

### fit of GARCH(1,1) AR(1) ###
garch11.fuels.fit <- as.list(1:2)
for (i in 1:2) {
  garch11.fuels.fit[[i]] <- ugarchfit(garch11.fuels[[i]], dataReplic[, i])
}

##################################################################
#### CCC fuels: with external regressors in the mean equation ####
##################################################################
nObs <- nrow(dataReplic)  # length(data.repl[-1, 1]) in the original code
coef.unlist <- sapply(garch11.fuels.fit, coef)
cccFuels.a <- rep(0.1, 2)
cccFuels.A <- diag(coef.unlist[6, ])  # alpha1 (ARCH) terms
cccFuels.B <- diag(coef.unlist[7, ])  # beta1 (GARCH) terms
cccFuels.R <- corr.test(as.data.frame(dataReplic))$r  # data.repl[, fuels.ind] in the original
# model="extended" (Jeantheau (1998))
ccc.fuels.sim <- eccc.sim(nobs = nObs, a = cccFuels.a, A = cccFuels.A,
                          B = cccFuels.B, R = cccFuels.R, model = "extended")
ccc.fuels.eps <- ccc.fuels.sim$eps
ccc.fuels.est <- eccc.estimation(a = cccFuels.a, A = cccFuels.A,
                                 B = cccFuels.B, R = cccFuels.R,
                                 dvar = ccc.fuels.eps, model = "extended")
ccc.fuels.condCorr <- round(corr.test(as.data.frame(ccc.fuels.est$std.resid))$r,
                            digits = 3)
My original code:
### (G)ARCH modelling ###
#########################
# External regressors: macrovariables and all fuels+biofuel Working's T index
ext.regr.ext <- as.matrix(data.repl[-1, c(10:13, 16, 19:22)])
regre.fuels <- cbind(fuel.lag1, ext.regr.ext)  # fuel.lag1 is the pre-lagged series

### spec of GARCH(1,1) with AR(1) ###
garch11.fuels <- as.list(1:5)
for (i in 1:5) {
  garch11.fuels[[i]] <- ugarchspec(mean.model = list(armaOrder = c(1, 0),
                                   external.regressors = as.matrix(regre.fuels[, -i])))
}  # regre.fuels[,-i] => "-i" because I model an AR(1) for each mean equation

### fit of GARCH(1,1) AR(1) ###
garch11.fuels.fit <- as.list(1:5)
for (i in 1:5) {
  j <- i
  if (j == 5) { j <- 7 }  # because the 5th "fuel" is actually column #7 in data.repl
  garch11.fuels.fit[[i]] <- ugarchfit(garch11.fuels[[i]], as.matrix(data.repl[-1, j]))
}

#fuelsLag1.names <- paste(cmdty.names[fuels.ind], "(-1)")
fuelsLag1.names <- cmdty.names[fuels.ind]
rowNames.ext <- c("Constant", fuelsLag1.names, "Working's T Gasoline", "Working's T Heating Oil",
                  "Working's T Natural Gas", "Working's T Crude Oil",
                  "Working's T Soybean Oil", "Junk Bond", "T-bill",
                  "SP500", "Exch.Rate")
ic.n <- c("Akaike", "Bayes")
garch11.ext.univSpec <- univ.spec(garch11.fuels.fit, ols.fit.ext, rowNames.ext,
                                  rowNum = c(1:15), colNames = cmdty.names[fuels.ind],
                                  ccc = TRUE)

##################################################################
#### CCC fuels: with external regressors in the mean equation ####
##################################################################
# From my GARCH(1,1)-AR(1) model, I extract the ARCH and GARCH terms
# in order to model a CCC GARCH model:
nObs <- length(data.repl[-1, 1])
coef.unlist <- sapply(garch11.fuels.fit, coef)
cccFuels.a <- rep(0.1, length(fuels.ind))
cccFuels.A <- diag(coef.unlist[17, ])
cccFuels.B <- diag(coef.unlist[18, ])
# based on Engle (2009), page 31:
cccFuels.R <- corr.test(data.repl[, fuels.ind], data.repl[, fuels.ind])$r
# model="extended" (Jeantheau (1998)): "allow the squared errors and variances
# of the series to affect the dynamics of the individual conditional variances"
ccc.fuels.sim <- eccc.sim(nobs = nObs, a = cccFuels.a, A = cccFuels.A,
                          B = cccFuels.B, R = cccFuels.R, model = "extended")
ccc.fuels.eps <- ccc.fuels.sim$eps
ccc.fuels.est <- eccc.estimation(a = cccFuels.a, A = cccFuels.A,
                                 B = cccFuels.B, R = cccFuels.R,
                                 dvar = ccc.fuels.eps, model = "extended")
ccc.fuels.condCorr <- round(corr.test(ccc.fuels.est$std.resid,
                                      ccc.fuels.est$std.resid)$r, digits = 3)
colnames(ccc.fuels.condCorr) <- cmdty.names[fuels.ind]
rownames(ccc.fuels.condCorr) <- cmdty.names[fuels.ind]
lowerTri(ccc.fuels.condCorr, rep = NA)
Are you aware that there is a whole package rmgarch for multivariate GARCH models?
Per its DESCRIPTION, it covers
Feasible multivariate GARCH models including DCC, GO-GARCH and
Copula-GARCH.
Well, I hope this is not too late. Here is what I found from the rmgarch manual: "the CCC model is calculated using a static GARCH copula (Normal) model".
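If rmgarch works for you, both models can be specified with the mean equations included, so the mean estimates are free to differ between the CCC and DCC fits. A minimal sketch with the two fake series from above and no external regressors (they could be added via external.regressors in mean.model, as in your rugarch specs); per the manual quote, the CCC case is the static normal copula-GARCH:
library(rugarch)
library(rmgarch)
# one univariate GARCH(1,1)-AR(1) spec per series
uspec <- multispec(replicate(2, ugarchspec(mean.model = list(armaOrder = c(1, 0)))))
# DCC(1,1): the stage-1 univariate fits carry the mean equations
dcc.fit <- dccfit(dccspec(uspec, dccOrder = c(1, 1), distribution = "mvnorm"),
                  data = dataReplic)
# CCC as a static (time.varying = FALSE) normal copula-GARCH
ccc.spec <- cgarchspec(uspec,
                       distribution.model = list(copula = "mvnorm",
                                                 method = "Kendall",
                                                 time.varying = FALSE,
                                                 transformation = "parametric"))
ccc.fit <- cgarchfit(ccc.spec, data = dataReplic)
# compare the mean-equation coefficients of the two fits
coef(dcc.fit)
coef(ccc.fit)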
