Estimating multiple OLS with AR residuals - r

I am new to modeling in R, so I'm stumbling a bit...
I have a model in EViews, which I have to translate to R and then extend.
The model is a multiple OLS regression with AR(1) residuals.
I implemented it like this:
model1 <- lm(y ~ x1 + x2 + x3, data)
data$e <- dplyr::lag(residuals(model1), 1)
model2 <- lm(y ~ x1 + x2 + x3 + e, data)
My issue is the same as in this thread, and I expected it: while the parameter estimates are similar, they are different enough that I cannot use them.
I am planning to use ARIMA from the stats package, but the problem is the implementation. How do I put an AR(1) on the residuals and keep the other variables as they are?

Provided I understood you correctly, you can supply external regressors to your arima model through the xreg argument.
You don't provide sample data so I don't have anything to play with, but your model should translate to something like
model <- arima(data$y, xreg = as.matrix(data[, c("x1", "x2", "x3")]), order = c(1, 0, 0))
Explanation: The first argument data$y contains your time series data. xreg contains your external regressors as a matrix, with every column containing as many observations for that regressor as you have time points. order = c(1, 0, 0) defines an AR(1) model.
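To see what this fits, here is a minimal sketch on simulated data (the data, seed, and coefficient values are made up for illustration, they are not from the question). It generates a regression whose errors follow an AR(1) process and estimates everything jointly with arima(). Note that the AR(1) term applies to the regression errors rather than entering as a lagged-residual regressor, which may be why the two-step lm() approach gives different numbers.

```r
# Simulated example (all values are assumptions for illustration)
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# Regression errors follow an AR(1) process with coefficient 0.6
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 0.5 * x1 - 0.3 * x2 + 0.8 * x3 + u
data <- data.frame(y, x1, x2, x3)

# Regression on x1, x2, x3 with AR(1) errors, estimated in one step
fit <- arima(data$y,
             xreg  = as.matrix(data[, c("x1", "x2", "x3")]),
             order = c(1, 0, 0))

coef(fit)  # ar1, intercept, then one coefficient per regressor
```

arima() uses method = "CSS-ML" by default, so small differences from other software can remain if it uses a different estimator.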

Related

Clustered standard errors with imputed and weighted data in R

I am attempting to get clustered SEs (at the school level in my data) with data that is both imputed (MICE) and weighted (CBPS). I have tried a couple different approaches that have thrown different errors.
This is what I have to start, which works fine:
library(tidyverse)
library(mice)
library(MatchThem)
library(CBPS)
tempdata <- mice(d, m = 10, maxit = 50, meth = "pmm", seed = 99)
weighted_data <- weightthem(trtmnt ~ x1 + x2 + x3,
data = tempdata,
method = "cbps",
estimand = "ATT")
Using this (https://www.r-bloggers.com/2021/05/clustered-standard-errors-with-r/) as a guide, I attempted all three approaches it describes, and each one resulted in a different error message.
My data is in a restricted server so unfortunately I can't bring it into here to reproduce things exactly, although if it's useful I could attempt to recreate some sample data.
So attempting with estimatr first, I get this error:
m1 <- estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
clusters = schoolID,
data = weighted_data)
Error in eval_tidy(mfargs[[da]], data = data) :
object 'schoolID' not found
I have no clue where the schoolID variable would have dropped out/not be recognized. It isn't part of the weighting procedure but it should still be in the data frame...if I use it as a covariate in a standard model without clustering, it's there.
I also attempted with miceadds and got this error:
m2 <- miceadds::lm.cluster(outcome ~ trtmnt + x1 + x2 + x3,
cluster = "schoolID",
data = weighted_data)
Error in as.data.frame.default(data) :
cannot coerce class '"wimids"' to a data.frame
And finally, with sandwich and lmtest:
library(sandwich)
library(lmtest)
m3 <- weighted_models <- with(weighted_data,
exp=lm(outcome ~ trtmnt + x1 + x2 + x3))
msandwich <- coeftest(m3, vcov = vcovCL, cluster = ~schoolID)
Error in UseMethod("estfun") :
no applicable method for 'estfun' applied to an object of class "c('mimira', 'mira')"
Any ideas on any of the above methods, or where to go next?
You were really close. You need to use with(weighted_data, .) to fit a model in your weighted datasets, and you need to use estimatr::lm_robust() to get the clustered standard errors. So try the following:
weighted_models <- with(weighted_data,
                        estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                                            clusters = schoolID))
Your first and second approaches were incorrect because you supplied weighted_data to a single model as if it were a data frame, but it's not; it's a complicated wimids object. You need to use the with() infrastructure to fit a model to the imputed weighted data.
Your third approach was close, but coeftest() needs to be used on a single model, not a mimira object, which contains all the models fit to the imputed datasets. Although you can use coeftest() inside with() with mira objects, you cannot do so with mimira objects from MatchThem. This is where estimatr::lm_robust() comes in, since it is able to apply the clustering within each imputed dataset.
I also recommend you take a look at this blog post on estimating treatment effects after weighting with multiply imputed data. The only difference in your case to the code presented in the post is that you would change vcov = "HC3" to vcov = ~schoolID in whichever function you use.

Validating a model and introducing a new predictor in glm

I am hitting my head against the computer...
I have a prediction model in R that goes like this
m.final.glm <- glm(binary_outcome ~ rcs(PredictorA, parms=kn.a) + rcs(PredictorB, parms=kn.b) + PredictorC , family = "binomial", data = train_data)
I want to validate this model on test_data2 - first by updating the linear predictor (lp)
train_data$lp <- predict(m.final.glm, train_data)
test_data2$lp <- predict(m.final.glm, test_data2)
lp2 <- predict(m.final.glm, test_data2)
m.update2.lp <- glm(binary_outcome ~ 1, family="binomial", offset=lp2, data=test_data2)
m.update2.lp$coefficients[1]
m.final.update2.lp <- m.final.glm
m.final.update2.lp$coefficients[1] <- m.final.update2.lp$coefficients[1] + m.update2.lp$coefficients[1]
m.final.update2.lp$coefficients[1]
p2.update.lp <- predict(m.final.update2.lp, test_data2, type="response")
This gets me to the point where I have updated the linear predictor, i.e. in the summary of the model only the intercept is different, but the coefficients of each predictor are the same.
Next, I want to include a new predictor (it is categorical, if that matters), PredictorD, into the updated model. This means that the model has to have the updated linear predictor and the same coefficients for Predictors A, B and C but the model also has to contain Predictor D and estimate its significance.
How do I do this? I will be very grateful if you could help me with this. Thanks!!!
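One possible way to sketch this step, assuming the linear predictor from the original model is kept fixed through offset(): the coefficients of Predictors A, B and C then cannot change, the intercept is re-estimated, and only PredictorD gets a new coefficient. The data below are simulated, and the names m.null and m.update2.d are made up for illustration; in the real analysis lp would be predict(m.final.glm, test_data2) as above.

```r
# Simulated stand-in for test_data2 (values are assumptions for illustration)
set.seed(1)
n <- 500
test_data2 <- data.frame(
  lp         = rnorm(n),  # stands in for predict(m.final.glm, test_data2)
  PredictorD = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)
eta <- -0.3 + test_data2$lp + 0.7 * (test_data2$PredictorD == "high")
test_data2$binary_outcome <- rbinom(n, 1, plogis(eta))

# Intercept update only: lp enters with its coefficient fixed at 1
m.null <- glm(binary_outcome ~ 1 + offset(lp),
              family = binomial, data = test_data2)

# Same offset plus the new categorical predictor
m.update2.d <- glm(binary_outcome ~ PredictorD + offset(lp),
                   family = binomial, data = test_data2)

summary(m.update2.d)                        # Wald tests per level of PredictorD
anova(m.null, m.update2.d, test = "Chisq")  # overall likelihood-ratio test
```

Because lp is an offset, it does not appear among the estimated coefficients; whether this is the appropriate validation strategy depends on the study design.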

Lagged Residual as Independent Variable in R

I am building a factor model to estimate future equity returns, and I'd like to include an autoregressive residual term in it. Specifically, I'd like yesterday's error (the difference between yesterday's predicted return and actual return) to be included in the regression as an independent variable. What is this type of autoregressive model called? I've searched through various time series econometrics texts and have not found this particular model described. My current solution in R is to rerun the regression at every discrete time step (t) and manually include yesterday's residual, but I am curious whether there is a more efficient method or package that does this.
Below is some sample code without the residual term included:
Data:
# fake data
set.seed(333)
df <- data.frame(seq(as.Date("2017/1/1"), as.Date("2017/2/19"), "days"),
matrix(runif(50*506), nrow = 50, ncol = 506))
names(df) <- c("Date", paste0("var", 1:503), c("mktrf", "smb", "hml"))
Then I store my necessary variables for regression:
1. All the independent variables
x <- df[, 505:507]
2. All the dependent variables
y <- df[, 2:504]
3. Fit all the models
list_models_AR= lapply(y, function(y)
with(x, lm(y ~ mktrf + smb + hml , na.action = na.exclude)))
It's an ARIMA(0, 0, 1) model with regressors, i.e. a regression with MA(1) errors.
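A minimal sketch of such a fit for a single return series, using simulated data (the factor values, coefficients, and the names ret, factors and fit_ma are assumptions for illustration):

```r
# Simulated factors and one return series with MA(1) errors
set.seed(333)
n <- 250
factors <- cbind(mktrf = rnorm(n), smb = rnorm(n), hml = rnorm(n))

e <- rnorm(n)
u <- e + 0.4 * c(0, e[-n])  # today's shock plus 0.4 times yesterday's shock
ret <- as.numeric(0.1 + factors %*% c(1.0, 0.3, -0.2) + u)

# Regression on the three factors with an MA(1) error term,
# estimated jointly instead of refitting at every time step
fit_ma <- arima(ret, order = c(0, 0, 1), xreg = factors)

coef(fit_ma)  # ma1, intercept, mktrf, smb, hml
```

The same call could be wrapped in lapply() over the columns of dependent variables, as in the question's code.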

Replacing intercept with dummy variables in ARIMAX models in R

I am attempting to fit an ARIMAX model to daily consumption data in R. When I perform an OLS regression with lm() I am able to include a dummy variable for each unit and remove the constant term (intercept) to avoid a less than full rank matrix.
lm1 <- lm(y ~ -1 + x1 + x2 + x3, data = dat)
I have not found a way to do this with arima() which forces me to use the constant term and exclude one of the dummy variables.
with(dat, arima(y, xreg = cbind(x1, x2)))
Is there a specific reason why arima() doesn't allow this and is there a way to bypass?
See the documentation for the argument include.mean in ?arima; it seems you want the following: arima(y, xreg = cbind(x1, x2, x3), include.mean = FALSE).
Also be aware of the definition of the model fitted by arima(), as pointed out by @RichardHardy.
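As a small sketch of this on simulated data (the factor period, the level means, and the object names are made up for illustration): with include.mean = FALSE the full set of dummies can go into xreg, and no intercept is estimated.

```r
# Simulated series with three groups of observations (assumed for illustration)
set.seed(7)
n <- 150
period <- factor(rep(c("A", "B", "C"), length.out = n))

# One dummy column per level, no reference level dropped
D <- model.matrix(~ 0 + period)  # columns periodA, periodB, periodC

noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
y <- unname(c(A = 2, B = 5, C = 9)[as.character(period)]) + noise

# AR(1) errors, all three dummies, and no constant term
fit_dummies <- arima(y, order = c(1, 0, 0), xreg = D, include.mean = FALSE)

coef(fit_dummies)  # ar1 plus one coefficient per dummy, no "intercept"
```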

MCMClogit confusion

Could anybody explain to me why
simulatedCase <- rbinom(100,1,0.5)
simDf <- data.frame(CASE = simulatedCase)
posterior_m0 <- MCMClogit(CASE ~ 1, data = simDf, b0 = 0, B0 = 1)
always results in a MCMC acceptance ratio of 0? Any explanation would be greatly appreciated!
I think your problem is the model formula, since logistic regression models have no error term. Thus your model CASE ~ 1 should be replaced by something like CASE ~ x (the predictor variable x is mandatory). Here is your example, modified:
CASE <- rbinom(100,1,0.5)
x <- 1:100
posterior_m0 <- MCMClogit(CASE ~ x, b0 = 0, B0 = 1)
classic_m0 <- glm(CASE ~ x, family = binomial(link = "logit"), na.action = na.pass)
So I think your problem is not related to the MCMCpack library (disclaimer: I have never used this package).
For anyone stumbling into this same problem:
It seems that the MCMClogit function cannot handle anything but B0=0 if your model only has an intercept.
If you add a covariate, then you can specify a precision just fine.
I would consider other packages (such as arm or rjags) if you really want to sample from this model. For a list of options available for Bayesian regression, see http://cran.r-project.org/web/views/Bayesian.html
