I'm doing a replication of an estimation done with Stata's xtregar command, but I'm using R instead.
The xtregar command implements the method from Baltagi and Wu (1999), "Unequally spaced panel data regressions with AR(1) disturbances". As Stata describes it:
xtregar fits cross-sectional time-series regression models when the disturbance term is first-order autoregressive. xtregar offers a within estimator for fixed-effects models and a GLS estimator for random-effects models. xtregar can accommodate unbalanced panels whose observations are unequally spaced over time.
So far, for the fixed-effects model, I used the plm package for R. The attempt looks like this:
plm(y ~ x1 + x2, data = A, effect = "twoways", model = "within")
Nevertheless, it is not complete (compared to the xtregar description) and the results are not quite like the ones Stata provides. Furthermore, Stata's command requires setting a panel variable and a time variable; plm can declare these through its index argument (or via a pdata.frame), but that still leaves the AR(1) transformation unaccounted for.
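A minimal sketch of the index declaration (assuming the panel and time columns are named pays and year):
library(plm)
# declare panel (pays) and time (year) dimensions explicitly
fe <- plm(y ~ x1 + x2, data = A, index = c("pays", "year"),
          effect = "twoways", model = "within")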
Should I settle for plm, or is there another way of doing this?
PS: I searched thoroughly across different websites but failed to find an equivalent to Stata's xtregar.
Update
After reading Croissant and Millo (2008), "Panel Data Econometrics in R: The plm Package", specifically Section 7.4, "Some useful 'econometric' models in nlme", I used something like this for the random-effects part of the estimation:
gls(y ~ x1 + x2, data = A, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Nevertheless, the following gives results closer to Stata's, which makes sense: lme adds the individual random intercept that the random-effects estimator assumes, on top of the AR(1) errors.
lme(y ~ x1 + x2, data = A, random = ~ 1 | pays, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Try {panelAR}. This is a package for panel-data regressions that addresses AR(1)-type autocorrelation.
Unfortunately, I do not own Stata, so I cannot test which correlation method in panelCorrMethod replicates Stata's.
library(panelAR)
model <- panelAR(formula = y ~ x1 + x2,
                 data = A,
                 panelVar = 'pays',
                 timeVar = 'year',
                 autoCorr = 'ar1',
                 rho.na.rm = TRUE,
                 bound.rho = TRUE,
                 # 'phet' uses the Huber-White sandwich estimator for
                 # heteroskedastic panels; you might need to change this
                 # parameter, as other methods (e.g. 'pcse') are available.
                 panelCorrMethod = 'phet')
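After fitting, summary(model) should report the estimated AR(1) coefficient alongside the coefficient table, which gives you a point of comparison with the rho that xtregar reports.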
Related
I'm estimating a mixed-effects model using simulated data. The basis of this is a conjoint experiment: there are N countries in the study with P participants, and each respondent is shown the experiment twice, so there are N x P x 2 observations. Heterogeneity is introduced into the data at the country level, so I run a mixed-effects model using lmer with random effects varying by country to account for this variance. However, because each respondent does the experiment twice, I also want to cluster my standard errors at the individual level. My data and model look something like this:
library(lme4)
library(dplyr)  # for %>% and mutate() below

data(iris)

# generate IDs: every two consecutive rows share one respondent ID
iris <- iris %>% mutate(id = rep(1:(n() / 2), each = 2))

# run model
mod <- lmer(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width +
              (Sepal.Width + Petal.Length + Petal.Width || Species),
            data = iris, REML = FALSE,
            control = lmerControl(optimizer = 'bobyqa'))
I then attempt to get clustered SEs using the parameters package:
library(parameters)
param <- model_parameters(
mod,
robust = TRUE,
vcov_estimation = "CR",
vcov_type = "CR1",
vcov_args = list(cluster = iris$id)
)
This returns an error:
Error in vcovCR.lmerMod(obj = new("lmerModLmerTest", vcov_varpar = c(0.00740122363004, : Non-nested random effects detected. clubSandwich methods are not available for such models.
I'm not married to any one method. I just want to return clustered SEs for this type of model specification, and as of now I can't find any package that does this. Does anyone know how this can be done, or whether such a model even makes sense? I'm new to MLMs, but if I were to run this as a simple linear model I would use lm_robust and cluster by individual, so it makes sense to me that I should do the same here as well.
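For reference, the non-multilevel version I have in mind would be something like this (estimatr package, clustering on the id generated above):
# simple linear model with standard errors clustered by respondent id
library(estimatr)
mod_lm <- lm_robust(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, clusters = id, se_type = "stata")
summary(mod_lm)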
I have data from an experience sampling study, which consists of 8140 observations nested in 106 participants. I want to test for mediation, and I also want to compare the predictors (X1 = socialInteraction_tech, X2 = socialInteraction_ftf, M = MPEE_int, Y = wellbeing). X1, X2, and M are person-mean centred in order to obtain the within-person effects. To account for autocorrelation, I have fit a model with an ARMA(2,1) error structure. We control for time with the variable obs.
This is the final model including all variables of interest:
fit_mainH1xmy <- lme(fixed = wellbeing ~ 1 + obs +  # control for time
                       MPEE_int_centred + socialInteraction_tech_centred +
                       socialInteraction_ftf_centred,
                     random = ~ 1 + obs | ID,
                     correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
                     data = file, method = "ML", na.action = na.exclude)
summary(fit_mainH1xmy)
The mediation is partial, as my predictor X still significantly predicts Y after adding M.
However, I can't find a way to calculate the indirect effect (the a*b product, i.e. the total effect c minus the direct effect c').
I have found the mlma package, but it looks unwieldy and requires me to transform my data.
I have tried melting the data in a long format and using lmer() to fit the model (following https://quantdev.ssri.psu.edu/sites/qdev/files/ILD_Ch07_2017_Within-PersonMedationWithMLM.html), but lmer() does not let me take into account the moving average (MA-part of the ARMA(2,1) structure).
Does anyone know how I could now obtain the indirect effect?
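For concreteness, what I am after is the product-of-coefficients quantity, something like the sketch below (same variable names as above; a is the X1 -> M path from a separate a-path model, b is the M -> Y path from the model already fit, and the Monte Carlo interval treats the two estimates as independent because they come from separate fits):
library(nlme)

# a-path model: M ~ X, with the same ARMA(2,1) error structure as the main model
fit_a <- lme(fixed = MPEE_int_centred ~ 1 + obs +
               socialInteraction_tech_centred + socialInteraction_ftf_centred,
             random = ~ 1 + obs | ID,
             correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
             data = file, method = "ML", na.action = na.exclude)

a <- fixef(fit_a)["socialInteraction_tech_centred"]  # X1 -> M
b <- fixef(fit_mainH1xmy)["MPEE_int_centred"]        # M -> Y, controlling for X
ab <- a * b                                          # indirect effect of X1

# Monte Carlo confidence interval for a*b
se_a <- sqrt(vcov(fit_a)["socialInteraction_tech_centred",
                         "socialInteraction_tech_centred"])
se_b <- sqrt(vcov(fit_mainH1xmy)["MPEE_int_centred", "MPEE_int_centred"])
sims <- rnorm(1e4, a, se_a) * rnorm(1e4, b, se_b)
quantile(sims, c(0.025, 0.975))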
I want to estimate a fixed-effects model with panel-corrected standard errors and a Prais-Winsten (AR1) transformation, in order to address panel heteroskedasticity, contemporaneous spatial correlation, and autocorrelation.
I have time-series cross-section data and want to perform regression analysis. I was able to estimate a fixed-effects model, panel-corrected standard errors, and Prais-Winsten estimates individually, and I was able to include panel-corrected standard errors in a fixed-effects model. But I want all three at once.
# Basic OLS model
ols1 <- lm(y ~ x1 + x2, data = data)
summary(ols1)

# Fixed effects model
library(plm)
plm1 <- plm(y ~ x1 + x2, data = data, model = 'within')
summary(plm1)

# Panel-corrected standard errors (groupN/groupT take vectors)
library(pcse)
lm.pcse1 <- pcse(ols1, groupN = data$Country, groupT = data$Time)
summary(lm.pcse1)

# Prais-Winsten estimates
library(prais)
prais1 <- prais_winsten(y ~ x1 + x2, data = data)
summary(prais1)

# Combination of fixed effects and panel-corrected standard errors
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
pcse.fe <- pcse(ols.fe, groupN = data$Country, groupT = data$Time)
summary(pcse.fe)
In Stata, the xtpcse command can combine panel-corrected standard errors with Prais-Winsten corrected estimates, with something along the lines of the following code:
xtpcse y x x x i.cc, c(ar1)
I would like to achieve this in R as well.
I am not sure that my answer will completely address your concern; these days I have been trying to deal with the same problem that you mention.
In my case, I ran the Prais-Winsten function from the prais package and included the fixed effects in the model. Afterwards, I corrected for heteroskedasticity using the function vcovHC.prais, which is analogous to the vcovHC function from the sandwich package.
This gives you White's/sandwich heteroskedasticity-consistent covariance matrix which, if you pass it to the coeftest function from the lmtest package, yields the table output with the corrected standard errors. Taking your posted example, see below the code that I have used:
# Prais-Winsten estimates with fixed effects
library(prais)
prais.fe <- prais_winsten(y ~ x1 + x2 + factor(Country), data = data)

# heteroskedasticity-consistent standard errors
library(lmtest)
prais.fe.w <- coeftest(prais.fe, vcov = vcovHC.prais(prais.fe, "HC1"))
prais.fe.w  # print the object to see the output with the corrected standard errors
Alas, I am aware that the sandwich heteroskedasticity-consistent standard errors are not exactly the same as Beck and Katz's PCSEs: PCSEs deal with panel heteroskedasticity, while sandwich SEs address overall heteroskedasticity. I am not totally sure how much these two differ in practice, but something is something.
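If you want something closer to Beck and Katz's actual PCSEs, the sandwich package also provides vcovPC, which you could combine with the dummy-variable fixed-effects fit from your question. A sketch (this covers only the PCSE part; the Prais-Winsten transformation would still come from prais):
# Beck-Katz panel-corrected SEs on the dummy-variable fixed-effects fit
library(sandwich)
library(lmtest)
ols.fe <- lm(y ~ x1 + x2 + factor(Country) - 1, data = data)
coeftest(ols.fe, vcov = vcovPC(ols.fe, cluster = data$Country,
                               order.by = data$Time))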
I hope my answer was somewhat helpful; this is actually my very first answer :D
I have a problem with the estimation of a panel data model in R.
I want to estimate the effect of a change in the Real GDP and the relative price level in the respective country on the contribution of the tourism sector.
If I use the command
Y <- cbind(ln_Differences_Contribution)
X <- cbind(ln_price_differences, Differences_ln_gdp)
and then
fixed <- plm(Y~X, data=pdata, model = "within")
I do not get an effect for the different years.
Is there any way I can add a time variable?
If you want to control for the effect of the time period ("years"), you can use the effect argument to additionally specify time effects in a two-way within model:
plm(y ~ x1 + x2, data = pdata, model = "within", effect = "twoways")
Another way would be to include the time variable explicitly in the formula as a factor and to specify a one-way individual model as in your question, assuming your time variable is called year:
plm(y ~ x1 + x2 + factor(year), data = pdata, model = "within")
The output would be a bit "polluted" by the explicit estimates for the years (usually, one is not interested in those).
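Both routes give the same slope estimates (here, for a balanced panel); a quick check on the Grunfeld data that ships with plm:
library(plm)
data("Grunfeld", package = "plm")
m_tw <- plm(inv ~ value + capital, data = Grunfeld,
            model = "within", effect = "twoways")
m_fy <- plm(inv ~ value + capital + factor(year), data = Grunfeld,
            model = "within")
coef(m_tw)
coef(m_fy)[c("value", "capital")]  # identical slopes; years absorbed vs. explicit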
This question regards how to code variable selection in a probit model with marginal effects (either directly or by calling some pre-existing package).
I'm conducting a little probit regression of the effects of free and commercial availability of films on the level of piracy of those films, for a TLAPD-related blog post.
The easy way of running a probit in R is typically through glm, i.e.:
probit <- glm(y ~ x1 + x2, data = data, family = binomial(link = "probit"))
but that's problematic for interpretation because it doesn't supply marginal effects.
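I'm aware there are packages that compute marginal effects from a glm fit directly, e.g. margins, or probitmfx from the mfx package, along these lines (but these don't handle the variable-selection part either):
# average marginal effects from the glm fit above, via the margins package
library(margins)
summary(margins(probit))

# or via the mfx package, which reports probit marginal effects directly
library(mfx)
probitmfx(y ~ x1 + x2, data = data)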
Typically, if I want marginal effects from a probit regression I define this function (I don't recall the original source, but it's a popular function that gets re-posted a lot):
mfxboot <- function(modform, dist, data, boot = 500, digits = 3) {
  x <- glm(modform, family = binomial(link = dist), data = data)
  # marginal effects: scale coefficients by the average density of the link
  pdf <- ifelse(dist == "probit",
                mean(dnorm(predict(x, type = "link"))),
                mean(dlogis(predict(x, type = "link"))))
  marginal.effects <- pdf * coef(x)
  # start bootstrap
  bootvals <- matrix(rep(NA, boot * length(coef(x))), nrow = boot)
  set.seed(1111)
  for (i in 1:boot) {
    samp1 <- data[sample(nrow(data), nrow(data), replace = TRUE), ]
    x1 <- glm(modform, family = binomial(link = dist), data = samp1)
    # recompute the density from the resampled fit x1 (not the original x)
    pdf1 <- ifelse(dist == "probit",
                   mean(dnorm(predict(x1, type = "link"))),
                   mean(dlogis(predict(x1, type = "link"))))
    bootvals[i, ] <- pdf1 * coef(x1)
  }
  res <- cbind(marginal.effects,
               apply(bootvals, 2, sd),
               marginal.effects / apply(bootvals, 2, sd))
  # drop the intercept row before formatting, if present
  if (names(x$coefficients[1]) == "(Intercept)") {
    res1 <- res[2:nrow(res), ]
    res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res1)),
                   nrow = dim(res1)[1])
    rownames(res2) <- rownames(res1)
  } else {
    res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res)),
                   nrow = dim(res)[1])
    rownames(res2) <- rownames(res)
  }
  colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
  return(res2)
}
Then run the regression like this:
mfxboot(modform = "y ~ x1 + x2",
dist = "probit",
data = piracy)
but using that approach I don't know that I can run any variable selection algorithms like forward, backward, stepwise, etc.
What's the best way to solve this problem? Is there a better way of running probits in R that reports marginal effects and also allows for automated model selection? Or should I focus on using mfxboot and doing variable selection with that function?
Thanks!
It is not clear why there is a problem: model (variable) selection and computing marginal effects for a given model are sequential steps, and there is no reason to try to combine the two.
Here is how you might go about computing marginal effects and their bootstrapped standard errors post model (variable) selection:
Perform variable selection using your preferred model selection procedure (including bootstrap model selection techniques as discussed below, not to be confused with the bootstrap you will use to compute the standard errors of the marginal effects for the final model).
Here is an example on the dataset supplied in this question. Note also that this is in no way an endorsement of the use of stepwise variable selection methods.
#================================================
# read in data, and perform variable selection for
# a probit model
#================================================
dfE = read.csv("ENAE_Probit.csv")
formE = emploi ~ genre +
filiere + satisfaction + competence + anglais
glmE = glm(formula = formE,
family = binomial(link = "probit"),
data = dfE)
# perform model (variable) selection
glmStepE = step(object = glmE)
Now pass the selected model to a function that computes the marginal effects.
#================================================
# function: compute marginal effects for logit and probit models
# NOTE: this assumes that an intercept has been included by default
#================================================
fnMargEffBin = function(objBinGLM) {
stopifnot(objBinGLM$family$family == "binomial")
vMargEff = switch(objBinGLM$family$link,
probit = colMeans(outer(dnorm(predict(objBinGLM,
type = "link")),
coef(objBinGLM))[, -1]),
logit = colMeans(outer(dlogis(predict(objBinGLM,
type = "link")),
coef(objBinGLM))[, -1])
)
return(vMargEff)
}
# test the function
fnMargEffBin(glmStepE)
Here is the output:
> fnMargEffBin(glmStepE)
genre filiere
0.06951617 0.04571239
To get at the standard errors of the marginal effects, you could bootstrap the marginal effects using, for example, the Boot function from the car package, since it provides such a neat interface to bootstrap derived statistics from glm estimates.
#================================================
# compute bootstrap std. err. for the marginal effects
#================================================
library(car)
margEffBootE = Boot(object = glmStepE, f = fnMargEffBin,
                    labels = names(coef(glmStepE))[-1], R = 100)
summary(margEffBootE)
Here is the output:
> summary(margEffBootE)
R original bootBias bootSE bootMed
genre 100 0.069516 0.0049706 0.045032 0.065125
filiere 100 0.045712 0.0013197 0.011714 0.048900
Appendix:
As a matter of theoretical interest, there are two ways to interpret your bootstrapped variable-selection request.
You can perform model selection (variable selection) by using as a measure of fit a bootstrap model fit criteria. The theory for this is outlined in Shao (1996), and requires a subsampling approach.
You then compute marginal effects and their bootstrap standard errors conditional on the best model selected above.
You can perform variable selection on multiple bootstrap samples, and arrive at either one best model by looking at the variables retained across the multiple bootstrap model selections, or use a model averaging estimator. The theory for this is discussed in Austin and Tu (2004).
You then compute marginal effects and their bootstrap standard errors conditional on the best model selected above.
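For the second approach, here is a rough sketch of the resampling loop (hypothetical code, reusing dfE and formE from above) that tabulates how often each variable survives selection:
# tally how often each variable survives step() across bootstrap resamples
set.seed(42)
R <- 200
selected <- lapply(seq_len(R), function(i) {
  dfB <- dfE[sample(nrow(dfE), replace = TRUE), ]
  glmB <- glm(formE, family = binomial(link = "probit"), data = dfB)
  selB <- step(glmB, trace = 0)
  all.vars(formula(selB))[-1]   # variables retained in resample i
})
sort(table(unlist(selected)) / R, decreasing = TRUE)  # inclusion frequencies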