Using tsboot to obtain confidence interval from a regression with lags - r

I would like to bootstrap the regression coefficients in a return model that includes two lags.
I have a vector snp_ret of returns obtained from quantmod. The data looks like this:
head(snp_ret)
ret
1998-10-13 -0.2920975
1998-10-14 1.0728374
1998-10-15 4.0882022
1998-10-16 0.8489058
1998-10-19 0.5635226
1998-10-20 0.1448549
Obtaining bootstrap for coefficients should be simple:
getC=function(myData){
return(coef(lm(formula = dyn(ret ~ lag(ret, c(-1,-9))), data=myData) ))
}
tsboot(snp_ret, getC, R = 100, l = 18, sim = "fixed")
The following error appears:
Error in merge.zoo(ret, lag(ret, c(-1, -9)), retclass = "list", all
= TRUE) : series cannot be merged with non-unique index entries in a series
I suspect that it has to do with the fact that regression has two lags, but do not know how to proceed.
If possible, please help.

All right, I found a workaround, so maybe this will be interesting to somebody else... Using arima function instead of lag operators helped.
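getC <- function(myData) {
  # AR(9) with lags 2-8 fixed at zero; NA marks coefficients to estimate
  reg <- suppressWarnings(arima(myData, order = c(9, 0, 0), fixed = c(NA, 0,0,0,0,0,0,0,NA,NA)))
  # keep lag 1, lag 9 and the intercept
  return(coef(reg)[c(1,9,10)])
}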
Note that arima has an unusual way of selecting lags: you have to fix at zero the coefficients on the lags that you don't want to include.
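A self-contained sketch of the same workaround, run on simulated data rather than the S&P returns. The series `ret`, the AR coefficient 0.3, the replicate count, and the percentile intervals at the end are assumptions made for illustration; only the `arima(..., fixed = ...)` idea and the `tsboot` call come from the post.

```r
library(boot)  # recommended package that ships with R

set.seed(1)
# Simulated stand-in for the return series (assumption, not the real data)
ret <- arima.sim(model = list(ar = 0.3), n = 300)

# AR(9) with lags 2-8 fixed at zero: only lag 1, lag 9 and the intercept are estimated
getC <- function(x) {
  fit <- tryCatch(
    suppressWarnings(
      arima(x, order = c(9, 0, 0),
            fixed = c(NA, rep(0, 7), NA, NA),
            transform.pars = FALSE)
    ),
    error = function(e) NULL
  )
  # Return NAs if a replicate fails to fit, so the bootstrap keeps running
  if (is.null(fit)) return(rep(NA_real_, 3))
  coef(fit)[c("ar1", "ar9", "intercept")]
}

# Fixed-length block bootstrap, as in the question
bs <- tsboot(ret, getC, R = 50, l = 18, sim = "fixed")

# Simple percentile intervals, one column per coefficient
ci <- apply(bs$t, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
colnames(ci) <- c("ar1", "ar9", "intercept")
ci
```

Because the lags are built inside `arima` from the resampled values, no date index is merged, which is presumably why the `merge.zoo` error does not occur with this approach.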

Related

Error with svyglm function in survey package in R: "all variables must be in design=argument"

New to stackoverflow. I'm working on a project with NHIS data, but I cannot get the svyglm function to work even for a simple, unadjusted logistic regression with a binary predictor and binary outcome variable (ultimately I'd like to use multiple categorical predictors, but one step at a time).
El_under_glm<-svyglm(ElUnder~SO2, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in eval(extras, data, env) :
object '.survey.prob.weights' not found
I changed the variables to 0 and 1 instead:
Under_narm$SO2REG<-ifelse(Under_narm$SO2=="Heterosexual", 0, 1)
Under_narm$ElUnderREG<-ifelse(Under_narm$ElUnder=="No", 0, 1)
But then get a different issue:
El_under_glm<-svyglm(ElUnderREG~SO2REG, design=SAMPdesign, subset=NULL, family=binomial(link="logit"), rescale=FALSE, correlation=TRUE)
Error in svyglm.survey.design(ElUnderREG ~ SO2REG, design = SAMPdesign, :
all variables must be in design= argument
This is the design I'm using to account for the weights -- I'm pretty sure it's correct:
SAMPdesign=svydesign(data=Under_narm, id= ~NHISPID, weight= ~SAMPWEIGHT)
Any and all assistance appreciated! I've got a good grasp of stats but am a slow coder. Let me know if I can provide any other information.
Using some make-believe sample data I was able to get your model to run by setting rescale = TRUE. The documentation states
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use FALSE to not rescale
weights.
So, one solution may be just to set rescale = TRUE.
library(survey)
# sample data
Under_narm <- data.frame(SO2 = factor(rep(1:2, 1000)),
ElUnder = sample(0:1, 1000, replace = TRUE),
NHISPID = paste0("id", 1:1000),
SAMPWEIGHT = sample(c(0.5, 2), 1000, replace = TRUE))
# with 'rescale' = TRUE
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(), # this family avoids warnings
rescale=TRUE) # Weights rescaled to the sum of the sample size.
summary(El_under_glm, correlation = TRUE) # use correlation with summary()
Otherwise, looking at the code for this function's method with 'survey:::svyglm.survey.design', it seems like there may be a bug. I could be wrong, but by my read, when 'rescale' is FALSE, .survey.prob.weights never gets assigned a value.
if (is.null(g$weights))
g$weights <- quote(.survey.prob.weights)
else g$weights <- bquote(.survey.prob.weights * .(g$weights)) # bug?
g$data <- quote(data)
g[[1]] <- quote(glm)
if (rescale)
data$.survey.prob.weights <- (1/design$prob)/mean(1/design$prob)
There may be a workaround if you assign a numeric vector to .survey.prob.weights in the global environment. I have no idea what these values should be, but your error goes away if you do something like the following. (.survey.prob.weights needs one value per row of the data; the sample data above has 2000 rows because SO2 has 2000 values and the shorter columns are recycled.)
SAMPdesign=svydesign(ids = ~NHISPID,
data=Under_narm,
weights = ~SAMPWEIGHT)
.survey.prob.weights <- rep(1, 2000)
El_under_glm<-svyglm(formula = ElUnder~SO2,
design=SAMPdesign,
family=quasibinomial(),
rescale=FALSE)
summary(El_under_glm, correlation = TRUE)
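To make the rescaling step in the excerpt above concrete, here is a small base-R sketch of the expression (1/prob)/mean(1/prob). The weights `w` are made-up values used only for illustration; the point is that dividing by the mean makes the weights sum to the sample size, as the documentation describes.

```r
# Hypothetical sampling weights (assumed values, for illustration only)
w <- c(0.5, 2, 2, 0.5, 2)

# svydesign stores inclusion probabilities, the reciprocals of the weights
prob <- 1 / w

# Same expression as in the svyglm.survey.design excerpt
rescaled <- (1 / prob) / mean(1 / prob)

sum(rescaled)   # equals the number of observations
length(w)
```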

R is only returning non-zero coefficient estimates when using the "poly" function to generate predictors. How do I get the zero values into a vector?

I'm using regsubsets from the leaps library to perform the best subset selection. I need to compare the coefficients it generates to the "true" coefficients I specified when simulating the data (by comparison, meaning, the difference between them squared, and the square root taken of the sum), for each number of predictors.
Since there are 16 different models that regsubsets generated, I use a loop to do this automatically. It would work except that when I extract the coefficients from the best model fit with x predictors, it only gives me the non-zero coefficients of the polynomial fit. This messes up the size of the coefi vector causing it to be smaller in size than the truecoef true coefficients vector.
If I could somehow force all coefficients to be spat out from the model, I wouldn't have an issue. But after looking extensively, I don't know how to do that.
Alternative ways of solving this problem would also be appreciated.
library(leaps)
regfit.train=regsubsets(y ~ poly(x,25, raw = TRUE), data=mydata[train,], nvmax=25)
truecoef = c(3,0,-7,4,-2,8,0,-5,0,2,0,4,5,6,3,2,2,0,3,1,1)
coef.errors = rep(NA, 16)
for (i in 1:16) {
coefi = coef(regfit.train, id=i)
coef.errors[i] = mean((truecoef-coefi)^2)
}
The quantity I'm trying to compute, where j indexes the coefficients and r refers to the best model containing "r" coefficients:
$$\sqrt{\sum_{j=1}^{p} \left(\beta_j - \hat{\beta}_j^{\,r}\right)^2}$$
Thanks!
This is how I ended up solving it (with some help):
The loop checks which coefficients are available and performs the subtraction for those. Coefficients that are not in the model are assumed to be zero.
truecoef = c(3,0,-7,4,-2,8,0,-5,0,2,0,4,5,6,3,2,2,0,3,1,1)
val.errors = rep(NA, 16)
x_cols = colnames(x, do.NULL = FALSE, prefix = "x.")
for (i in 1:16) {
coefis = coef(regfit.train, id = i)
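  # squared error for the coefficients in the model, plus the squared true
  # values of the coefficients left out (their estimates are treated as zero)
  val.errors[i] = sqrt(sum((truecoef[x_cols %in% names(coefis)] -
    coefis[names(coefis) %in% x_cols])^2) + sum(truecoef[!(x_cols %in% names(coefis))]^2))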
}
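An alternative sketch of the same idea: pad each coefficient vector with zeros by name, so that it always has the same length as the true coefficients. The names and numbers below are hypothetical stand-ins (the vector `coefi` imitates what `coef(regfit.train, id = i)` returns, namely only the selected terms); they are not taken from the original data.

```r
# Hypothetical term names and true coefficients (assumptions for illustration)
all_names <- c("(Intercept)", paste0("x.", 1:4))
truecoef  <- setNames(c(3, 0, -7, 4, -2), all_names)

# Stand-in for coef(regfit.train, id = 2): only non-zero terms are returned
coefi <- c("(Intercept)" = 2.9, "x.2" = -6.8, "x.3" = 4.1)

# Start from a full-length vector of zeros, then fill in the estimates by name
full <- setNames(numeric(length(all_names)), all_names)
full[names(coefi)] <- coefi

# Root of the summed squared differences over all coefficients
err <- sqrt(sum((truecoef - full)^2))
full
err
```

Matching by name avoids any dependence on the order in which regsubsets reports the terms.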

Multivariate Copulas and uniroot error in R

I am trying to construct a multivariate copula model from time series (19 observations) for 7 risk indicators. First I would like to find the most suitable copula for this dataset. I would expect the best fit to be the survival Gumbel copula, but I still want to make sure that I am on the right track. For this, I am using the VineCopula package in R. To begin with, I convert my data to uniform marginals using the rank method:
umr= apply(dataset,2,rank)/(nrow(dataset)+1)
Then, I try to run the following function:
st_rvine= RVineStructureSelect(dataset, familyset = NA, type = 0, selectioncrit = "AIC", indeptest = FALSE, level = NA, progress = FALSE, weights = NA, treecrit = "tau",rotations = TRUE, se=FALSE, presel = TRUE, method = "mle",cores = 1)
After this however I get the following error:
Error in uniroot(function(x) tau - frankTau(x), lower = 0 + .Machine$double.eps^0.5, :
f() values at end points not of opposite sign
I still haven't found what I may be doing wrong. Can anyone guide me on this, please? Is there a mistake in my formulas, or is there a step that I am missing?
Thank you in advance.
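One thing that may be worth checking: the call above passes `dataset`, not the rank-transformed `umr`, while the VineCopula documentation describes the data argument as a matrix with uniform margins. The sketch below uses simulated numbers (an assumption, not the original indicators) to show that the rank transform yields values strictly inside (0, 1). Whether this resolves the uniroot error is not confirmed here.

```r
set.seed(42)
# Simulated stand-in: 19 observations of 7 indicators (assumed data)
dataset <- matrix(rnorm(19 * 7), nrow = 19, ncol = 7)

# Rank-based pseudo-observations, as in the question
umr <- apply(dataset, 2, rank) / (nrow(dataset) + 1)

range(umr)      # every value lies strictly between 0 and 1
range(dataset)  # the raw data does not

# The transformed matrix is what would be passed on, e.g. (not run here):
# VineCopula::RVineStructureSelect(umr, familyset = NA)
```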

How to use covariates in rddtools rdd_reg_lm function?

I am trying to run a parametric RD regression using the rddtools R package. However, the package documentation is not very clear to me.
First: the function to define an RD object is:
rdd_data(y, x, covar, cutpoint, z, labels, data)
where covar, in the help file, means only "Exogeneous variables" . But what type? A data frame? A list?
Second: The function rdd_reg_lm again demands informing covariates in this way:
rdd_reg_lm(rdd_object, covariates = NULL, order = 1, bw = NULL,
slope = c("separate", "same"), covar.opt = list(strategy = c("include",
"residual"), slope = c("same", "separate"), bw = NULL),
covar.strat = c("include", "residual"), weights)
Where, according to the help file, the covariates argument means simply "Formula to include covariates". Again, it is not clear to me what is exactly the correct way of applying these covariates.
Moreover, is it possible to include multiple covariates in this function rdd_data() and rdd_reg_lm()?
I appreciate some help here. I have already read the help and vignette files again and again, searched in many blogs and still nothing.
I have already checked this topic below
How to include a linear trend in a regression discontinuity design using rddtools
which showed me the following example:
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = ageyrs, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates = 'ageyrs', slope =
("same"), covar.opt = list("include"))
Even so, the syntax is still not clear to me, as I am trying to add multiple covariates without success
Thanks!
You can create a data frame with your covariates and then include it in rdd_data.
covariates<-data.frame(z1=ageyrs, z2=ageyrs2)
rd.medic <- rdd_data(y = er ,x = ageyrs, covar = covariates, cutpoint=65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object=rd.medic, covariates =TRUE, slope =("same"))

Weighted Portmanteau Test for Fitted GARCH process

I have fitted a GARCH process to a time series and analyzed the ACF of the squared and absolute residuals to check the model's goodness of fit. But I also want to do a formal test, and after searching the internet, the weighted Portmanteau test (originally by Li and Mak) seems to be the one.
It's from the WeightedPortTest package and is one of the few (perhaps the only one?) that properly tests the GARCH residuals.
While going through the instructions in various documents I can't wrap my head around what the "h.t" argument wants. It says in the info in R that I need to assign "a numeric vector of the conditional variances". This may be simple to an experienced user, though I'm struggling to understand. What is it that I need to do and preferably how would I code it in R?
Thankful for any kind of help
Taken directly from the documentation:
h.t: a numeric vector of the conditional variances
A little toy example using the fGarch package follows:
library(fGarch)
library(WeightedPortTest)
spec <- garchSpec(model = list(alpha = 0.6, beta = 0))
simGarch11 <- garchSim(spec, n = 300)
fit <- garchFit(formula = ~ garch(1, 0), data = simGarch11)
Weighted.LM.test(fit@residuals, fit@h.t, lag = 10)
And using garch() from the tseries package:
library(tseries)
fit2 <- garch(as.numeric(simGarch11), order = c(0, 1))
summary(fit2)
# comparison of fitted values:
tail(fit2$fitted.values[,1]^2)
tail(fit@h.t)
# comparison of residuals after unstandardizing:
unstd <- fit2$residuals*fit2$fitted.values[,1]
tail(unstd)
tail(fit@residuals)
Weighted.LM.test(unstd, fit2$fitted.values[,1]^2, lag = 10)
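To illustrate what "conditional variances" means, here is a base-R sketch that simulates an ARCH(1) process by hand and keeps the variance series. The parameter values `omega` and `alpha` and the starting value are assumptions chosen for illustration. In a fitted model, the same recursion evaluated at the estimated parameters gives the vector that goes into h.t.

```r
set.seed(123)
omega <- 0.1   # assumed constant term
alpha <- 0.6   # assumed ARCH(1) coefficient
n <- 300

e <- numeric(n)  # innovations (unstandardized residuals)
h <- numeric(n)  # conditional variances

h[1] <- omega / (1 - alpha)        # unconditional variance as a starting value
e[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t] <- omega + alpha * e[t - 1]^2   # variance given the previous shock
  e[t] <- sqrt(h[t]) * rnorm(1)
}

# e corresponds to the residual argument, h to the h.t argument
head(cbind(residual = e, h.t = h))
```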
