R - fixed effect of panel data analysis and robust standard errors - r

I am working with panel data using the plm package in R. I am considering fixed effects models with individual (city) effects, time effects, and two-way (city and time) effects, respectively. Because the Breusch-Pagan test detected heteroskedasticity, I want to compute robust standard errors.
I read the help page ?vcovHC, but I could not fully understand how to use it with coeftest.
My current code is:
library(plm)
library(lmtest)
library(sandwich)
fem_city <- plm (z ~ x+y, data = rawdata, index = c("city","year"), model = "within", effect = "individual")
fem_year <- plm (z ~ x+y, data = rawdata, index = c("city","year"), model = "within", effect = "time")
fem_both <- plm (z ~ x+y, data = rawdata, index = c("city","year"), model = "within", effect = "twoways")
coeftest(fem_city, vcovHC(fem_city, type = 'HC3', cluster = 'group'))
coeftest(fem_year, vcovHC(fem_year, type = 'HC3', cluster = 'time'))
Are these coeftest calls appropriate for computing the robust standard errors? I am wondering how to set the cluster option for effect = 'individual' and effect = 'time', respectively.
For example, I set:
cluster = 'group' in coeftest for fem_city (plm with effect = 'individual')
cluster = 'time' in coeftest for fem_year (plm with effect = 'time')
Is this approach appropriate?
And how do I compute robust standard errors for the two-way model with both city and year effects?

Set cluster='group' if you want to cluster on the variable serving as the individual index (city in your example).
Set cluster='time' if you want to cluster on the variable serving as the time index (year in your example).
You can cluster on the time index even for a fixed effects one-way individual model.
You cannot cluster on both index variables with plm::vcovHC. Look at vcovDC from the same package, which provides double clustering (DC = double clustering), e.g.,
coeftest(fem_city, vcovDC(fem_city))
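As a minimal sketch of the three options above, the following uses the Grunfeld data shipped with plm as a stand-in for rawdata (an assumption, since the original data are not available; the object names fe_firm, vc_group, vc_time and vc_dc are made up for illustration). It fits a one-way individual fixed effects model and computes the covariance matrix clustered by group, by time, and by both:

```r
library(plm)
library(lmtest)

# Assumption: Grunfeld (firm/year panel from plm) stands in for 'rawdata'
data("Grunfeld", package = "plm")

# One-way individual (firm) fixed effects model
fe_firm <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"),
               model = "within", effect = "individual")

# Covariance clustered on the individual index (firm)
vc_group <- vcovHC(fe_firm, type = "HC3", cluster = "group")
# Covariance clustered on the time index (year), same model
vc_time <- vcovHC(fe_firm, type = "HC3", cluster = "time")
# Double clustering on both firm and year
vc_dc <- vcovDC(fe_firm, type = "HC3")

# Coefficient tests using each covariance matrix
coeftest(fe_firm, vcov. = vc_group)
coeftest(fe_firm, vcov. = vc_time)
coeftest(fe_firm, vcov. = vc_dc)
```

The choice of cluster is independent of the effect argument used in plm, so the same three calls should work on a time or two-way model as well.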

Related

Cluster robust standard errors for mixed effect/LMER models?

I'm estimating a mixed effects model using simulated data. The basis of this is a conjoint experiment: there are N countries in the study with P participants each, and each respondent is shown the experiment twice. This means that there are NxPx2 observations. Heterogeneity is introduced into the data at the country level, so I run a mixed effects model using lmer with random effects varying by country to account for this variance. However, because each respondent does the experiment twice, I also want to cluster my standard errors at the individual level. My data and model look something like this:
library(lme4)
library(dplyr) # provides %>%, mutate() and n() used below
data(iris)
# generating IDs for observations
iris <- iris %>% mutate(id = rep(1:(n()/2), each = 2))
#run model
mod <- lmer(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width + (Sepal.Width+Petal.Length+Petal.Width || Species), data=iris, REML = F, control = lmerControl(optimizer = 'bobyqa'))
I then attempt to get clustered SEs using the parameters package:
library(parameters)
param <- model_parameters(
mod,
robust = TRUE,
vcov_estimation = "CR",
vcov_type = "CR1",
vcov_args = list(cluster = iris$id)
)
This returns an error:
Error in vcovCR.lmerMod(obj = new("lmerModLmerTest", vcov_varpar = c(0.00740122363004, : Non-nested random effects detected. clubSandwich methods are not available for such models.
I'm not married to any one method. I just want to obtain clustered SEs for this type of model specification. As of now I can't find any package that does this. Does anyone know how this can be done, or whether such a model even makes sense? I'm new to MLMs, but I was thinking that if I were to run this as a simple linear model I would use lm_robust and cluster by individual, so it makes sense to me that I should do the same here as well.

How to code Fixed effects Poisson model in R?

I am trying to fit a fixed effects Poisson model in R using the pglm function. I need to include both individual and time fixed effects in the model. How can I include both of them?
My example data are:
library(pglm)
library(readstata13)
library(lmtest)
library(MASS)
ships <- readstata13::read.dta13("http://www.stata-press.com/data/r13/ships.dta")
ships$lnservice=log(ships$service)
ships$time <- rep(1:8, 5)
I have tried using the following code to estimate the model with time and individual fixed effects:
res <- pglm(accident ~ op_75_79 + co_65_69 + co_70_74 + co_75_79 + lnservice,
            family = poisson, data = ships, effect = "twoways", model = "within",
            index = c("ship", "time"))
summary(res)
The code runs; however, I get the same results as with the model that has only individual effects and no time effects:
res <- pglm(accident ~ op_75_79 + co_65_69 + co_70_74 + co_75_79 + lnservice,
            family = poisson, data = ships, effect = "individual", model = "within",
            index = c("ship"))
summary(res)
Do you know where the problem could be? I don't expect the two models to produce the same estimates.

Linear Mixed-Effects Models for a big spatial auto-correlated dataset

I am working with a big dataset (55965 points). I am trying to fit an LME model that accounts for spatial correlation, but R returns this error:
Error: 'sumLenSq := sum(table(groups)^2)' = 3.13208e+09 is too large.
Too large or no groups in your correlation structure?
I cannot subset the data because I need all the points. My questions are:
Is there some setting I can change in the function?
If not, is there another package with a similar function that can handle such a big dataset?
Here is a reproducible example:
require(nlme)
my.data<- matrix(data = 0, nrow = 55965, ncol = 3)
my.data<- as.data.frame(my.data)
dummy <- rep(1, 55965)
my.data$dummy<- dummy
my.data$V1<- seq(780, 56744)
my.data$V2<- seq(1:55965)
my.data$X<- seq(49.708, 56013.708)
my.data$Y<-seq(-12.74094, -55977.7409)
null.model <- lme(fixed = V1~ V2, data = my.data, random = ~ 1 | dummy, method = "ML")
spatial_model <- update(null.model, correlation = corGaus(1, form = ~ X + Y), method = "ML")
Since you have assigned a grouping factor with only one level, there are no groups in the data, which is what the error message reports. If you just want to account for spatial autocorrelation, with no other random effects, use gls from the same package.
Edit: A further note on two different approaches to modelling spatial autocorrelation: corGaus (and the other corSpatial-type functions) implement spatial correlation models for the regression residuals, which is different from, say, a spatial random effect added to the model based on county/district/grid identity.
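A minimal sketch of the gls suggestion, on a small simulated dataset rather than the full 55965 points (the data frame toy and the object spatial_gls are made up for illustration; the column names mirror the question). It only shows the call syntax: without a grouping factor the correlation structure still spans all observations at once, so the size limit quoted in the error message (55965^2 is about 3.13e9) may still apply to the full data.

```r
library(nlme)

set.seed(42)
n <- 200

# Assumption: small toy data with unique coordinates X, Y
toy <- data.frame(X = runif(n), Y = runif(n), V2 = rnorm(n))
toy$V1 <- 1 + 2 * toy$V2 + rnorm(n)

# gls with a Gaussian spatial correlation structure on the residuals;
# no random effect and no dummy grouping factor is needed
spatial_gls <- gls(V1 ~ V2, data = toy,
                   correlation = corGaus(form = ~ X + Y),
                   method = "ML")

summary(spatial_gls)
```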

R equivalent to Stata's xtregar

I'm doing a replication of an estimation done with Stata's xtregar command, but I'm using R instead.
The xtregar command implements the method from Baltagi and Wu (1999) "Unequally spaced panel data regressions with AR(1) disturbances" paper. As Stata describes it:
xtregar fits cross-sectional time-series regression models when the disturbance term is first-order autoregressive. xtregar offers a within estimator for fixed-effects models and a GLS estimator for random-effects models. xtregar can accommodate unbalanced panels whose observations are unequally spaced over time.
So far, for the fixed-effects model, I used the plm package for R. The attempt looks like this:
plm(data=A, y ~ x1 + x2, effect = "twoways", model = "within")
Nevertheless, this is not complete (compared to the xtregar description), and the results are not quite like the ones Stata provides. Furthermore, Stata's command requires setting a panel variable and a time variable, a feature that is (as far as I can tell) absent from the plm environment.
Should I settle with plm or is there another way of doing this?
PS: I searched several websites thoroughly but failed to find an equivalent to Stata's xtregar.
Update
After reading Croissant and Millo (2008) "Panel Data Econometrics in R: The plm Package", specifically section 7.4 "Some useful 'econometric' models in nlme", I used something like this for the random effects part of the estimation:
gls(data=A, y ~ x1 + x2, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Nevertheless, the following gives results closer to those of Stata:
lme(data=A, y ~ x1 + x2, random = ~ 1 | pays, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Try {panelAR}. This is a package for panel data regressions that addresses AR(1)-type autocorrelation.
Unfortunately, I do not own Stata, so I cannot test which correlation method to set in panelCorrMethod to replicate the results.
library(panelAR)
model <-
panelAR(formula = y ~ x1 + x2,
data = A,
panelVar = 'pays',
timeVar = 'year',
autoCorr = 'ar1',
rho.na = TRUE,
bound.rho = TRUE,
panelCorrMethod = 'phet' # You might need to change this parameter. 'phet' uses the Huber-White sandwich estimator for heteroskedasticity, but others are available.
)

Converting a mixed model with repeated and random effects and different covariance structures from SAS to R

I have a model, created in SAS by a colleague, with a repeated effect that has an ARH1 (autoregressive heterogeneous variances) covariance structure and a random effect (with a variance components covariance structure) that I am trying to re-create in R, where I have more experience.
The original SAS code is:
PROC MIXED DATA=mylib.sep_cover_data plots=all ALPHA=0.15 CL COVTEST;
CLASS soil_grp dummy_year plot pasture;
MODEL cover = pcp soil_grp / ALPHA=0.15 CL residual SOLUTION;
RANDOM pasture / ALPHA=0.15 CL SOLUTION;
REPEATED dummy_year / subject = plot type = ARH(1);
After looking through other similar questions on here, I'm pretty sure I'm able to re-create the repeated statement in R, including the covariance structure, using the nlme library:
library(nlme)
cover.data <- read.csv("https://drive.google.com/uc?export=donload&id=0Bxdatltmq5ljMVlObXh1NHFGck0", header = TRUE)
cover.data <- cover.data[cover.data$Flag=="data",]
cover.data$soil_grp <- factor(cover.data$soil_grp)
cover.data$dummy_year <- factor(cover.data$dummy_year)
cover.data$pcp <- as.numeric(as.character(cover.data$pcp))
cover.data$cover <- as.numeric(as.character(cover.data$cover))
model.test1 <- gls(cover ~ Pcp + soil_grp, corr = corAR1(, form = ~ 1 | year/plot), weights = varIdent(form = ~ 1 | dummy_year), data = cover.data, na.action = "na.omit")
but I can't figure out how to add in the random effect (with a different covariance structure) into this model.
Additionally, I can put both the repeated and random variables into the same model but don't know how to specify the covariance structures for them.
model.test2 <- lme(cover ~ Pcp_jan_jun + soil, random = list( plot = ~ 1, pasture = ~1), data = cover.data, na.action = "na.omit")
Is it possible to put both a repeated and a random effect with different covariance structures into the same model in R? If so, how do I code it in R?
The data can be downloaded from
https://drive.google.com/uc?export=donload&id=0Bxdatltmq5ljMVlObXh1NHFGck0
