Panel data in R plm package - r

I have a problem with the estimation of a panel data model in R.
I want to estimate the effect of a change in the Real GDP and the relative price level in the respective country on the contribution of the tourism sector.
If I use the command
Y <- cbind(ln_Differences_Contribution)
X <- cbind(ln_price_differences, Differences_ln_gdp)
and then
fixed <- plm(Y~X, data=pdata, model = "within")
I do not have an effect for the different years.
Is there any way I can add a time variable?

If you want to control for the effect of the time period ("years"), you can use the effect argument to additionally specify time effects in a two-way within model:
plm(y ~ x1 + x2, data = pdata, model = "within", effect = "twoways")
Another way is to explicitly include the time variable in the formula as a factor and specify a one-way individual model as in your question (assuming your time variable is called year):
plm(y ~ x1 + x2 + factor(year), data = pdata, model = "within")
The output would be a bit "polluted" by the explicit estimates for the years (usually, one is not interested in those).
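To see that both specifications give the same slope estimates, here is a minimal sketch on simulated data. The data frame, the column names (country, year, x1, x2, y) and the coefficient values are made up for illustration and are not taken from the question. The panel is balanced, which is the case where the two sets of slopes coincide exactly.

```r
library(plm)

set.seed(1)

# Hypothetical balanced panel: 5 countries observed over 6 years
df <- data.frame(
  country = rep(1:5, each = 6),
  year    = rep(2001:2006, times = 5)
)
df$x1 <- rnorm(nrow(df))
df$x2 <- rnorm(nrow(df))
df$y  <- 0.5 * df$x1 - 0.3 * df$x2 + df$country +
  0.1 * (df$year - 2000) + rnorm(nrow(df))

pdata <- pdata.frame(df, index = c("country", "year"))

# Two-way within model: country and year effects are both swept out
fe_twoways <- plm(y ~ x1 + x2, data = pdata,
                  model = "within", effect = "twoways")

# One-way within model with explicit year dummies
fe_dummies <- plm(y ~ x1 + x2 + factor(year), data = pdata,
                  model = "within")

coef(fe_twoways)                 # only x1 and x2
coef(fe_dummies)[c("x1", "x2")]  # same slopes; year dummies are estimated too
```

The first form keeps the summary output short; the second is handy if you do want to look at the year estimates.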

Related

How to add coefficient to linear model

I want to manually set a coefficient for a variable that is not input in my linear model so that I can perform a spatial prediction.
I'm going to try to expose my question in the most simple and clear way.
What I have:
a raster stack with 4 binary variables for soil cover: agro, open, tran and urb
a linear model lm(formula = no2 ~ open + tran + urb, data = df)
The reason why I only used 3 of the variables in my linear model was to prevent multicollinearity, because they are proportions of land coverage that add up to 100%.
So, my goal is to add a coefficient to my model for the agro variable, so that all of the 4 variables are used correctly in raster::predict()
You can use the offset term in the formula and include the desired coefficient and variable therein:
lm(formula = no2 ~ open + tran + urb + offset(agro*400), data = df)
So this is regressing no2 on open, tran and urb plus the fixed term agro * 400. For more than one given coefficient, add the appropriate additional offset() terms.
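A small sketch of how the offset behaves, using simulated data. The data frame and the data-generating coefficients are invented for illustration; 400 is just the example value from above. The offset term gets no estimated coefficient, but predict() still adds it back when given new data.

```r
set.seed(42)
n <- 50

# Hypothetical land-cover proportions that sum to 1 in every row
raw   <- matrix(runif(4 * n), ncol = 4)
props <- raw / rowSums(raw)
df <- data.frame(agro = props[, 1], open = props[, 2],
                 tran = props[, 3], urb  = props[, 4])
df$no2 <- 400 * df$agro + 10 * df$open + 30 * df$tran +
  50 * df$urb + rnorm(n)

# agro enters with a fixed coefficient of 400 via offset()
fit <- lm(no2 ~ open + tran + urb + offset(400 * agro), data = df)

names(coef(fit))  # no entry for agro: its coefficient is fixed, not estimated

# predict() evaluates the offset on the new data and adds it to the prediction
pred <- predict(fit, newdata = df)

# The same prediction built by hand from the estimated part plus the fixed term
manual <- drop(cbind(1, as.matrix(df[, c("open", "tran", "urb")])) %*% coef(fit)) +
  400 * df$agro
all.equal(unname(pred), unname(manual))
```

Whether a spatial prediction function passes the agro layer through in the same way is worth checking on a few cells before trusting the full map.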
You can avoid the collinearity by leaving the intercept out of your model. Use
lm(formula = no2 ~ open + tran + urb + agro - 1, data = df)
and you'll be able to estimate coefficients for all of the predictors (but no intercept term).
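The following sketch, again on made-up proportions that sum to 1, shows the difference: with an intercept, lm() cannot estimate the fourth coefficient and reports NA, while the model without intercept returns all four. The variable names follow the question; the data and true coefficients are hypothetical.

```r
set.seed(42)
n <- 50

# Hypothetical land-cover proportions that sum to 1 in every row
raw   <- matrix(runif(4 * n), ncol = 4)
props <- raw / rowSums(raw)
df <- data.frame(agro = props[, 1], open = props[, 2],
                 tran = props[, 3], urb  = props[, 4])
df$no2 <- 5 * df$agro + 10 * df$open + 30 * df$tran +
  50 * df$urb + rnorm(n)

# With an intercept the four proportions are perfectly collinear with it,
# so the last one entered (agro) cannot be estimated
fit_int <- lm(no2 ~ open + tran + urb + agro, data = df)
coef(fit_int)

# Dropping the intercept lets all four coefficients be estimated
fit_noint <- lm(no2 ~ open + tran + urb + agro - 1, data = df)
coef(fit_noint)
```

Both models produce the same fitted values; only the parameterisation changes. Without an intercept, each coefficient can be read as the expected no2 for a cell fully covered by that class.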

Calculate indirect effect of 1-1-1 (within-person, multilevel) mediation analyses

I have data from an Experience Sampling Study, which consists of 8140 observations nested in 106 participants. I want to test if there is a mediation, in which I also want to compare the predictors (X1= socialInteraction_tech, X2= socialInteraction_ftf, M = MPEE_int, Y= wellbeing). X1, X2, and M are person-mean centred in order to obtain the within-person effects. To account for the autocorrelation I have fit a model with an ARMA(2,1) structure. We control for time with the variable "obs".
This is the final model including all variables of interest:
fit_mainH1xmy <- lme(fixed = wellbeing ~ 1 + obs # Controls
+ MPEE_int_centred + socialInteraction_tech_centred + socialInteraction_ftf_centred,
random = ~ 1 + obs | ID, correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
data = file, method = "ML", na.action=na.exclude)
summary(fit_mainH1xmy)
The mediation is partial, as my predictor X still significantly predicts Y after adding M.
However, I can't find a way to calculate c'(cprime), the indirect effect.
I have found the mlma package, but it looks weird and requires me to do transformations to my data.
I have tried melting the data in a long format and using lmer() to fit the model (following https://quantdev.ssri.psu.edu/sites/qdev/files/ILD_Ch07_2017_Within-PersonMedationWithMLM.html), but lmer() does not let me take into account the moving average (MA-part of the ARMA(2,1) structure).
Does anyone know how I could now obtain the indirect effect?

Why can't I include country dummies in my fixed effects model?

First of all I build the following dataframe (country_Id as factor variable and year as numeric):
mydata = pdata.frame(mydata, index = c("country_Id","year"),row.names = TRUE)
Then I check it with:
index(mydata)
pdim(mydata)
is.pconsecutive(mydata)
class(mydata)
Everything seems to be fine, but when I try to include country dummies in the fixed-effects model, it does not work:
femodel_1 <- plm(y~x + ldvx + factor(country_Id) , data= mydata, model = "within")
And another problem is that my random model shows that the individual variance is 0
remodel_1 <- plm(y~x + ldvx , data= mydata, model = "random")
Unfortunately I can not find the problem.
Your one-way fixed effect model already takes care of the country fixed effects. This is why you cannot add them to the model's formula again: it would not make sense, so they just disappear. So you are fine with:
femodel_1 <- plm(y~x + ldvx, data= mydata, model = "within")
If you want the values of the country fixed effects, use fixef(femodel_1).
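As an illustration, the sketch below fits a within model on simulated data and compares fixef() with the coefficients of explicit country dummies in a plain lm(). The data are hypothetical and, unlike the question, contain no lagged dependent variable; only the object and column names are borrowed from above.

```r
library(plm)

set.seed(7)

# Hypothetical balanced panel: 4 countries, 8 years
df <- data.frame(
  country_Id = factor(rep(c("A", "B", "C", "D"), each = 8)),
  year       = rep(2001:2008, times = 4)
)
df$x <- rnorm(nrow(df))
df$y <- 2 * df$x + as.integer(df$country_Id) + rnorm(nrow(df))

mydata <- pdata.frame(df, index = c("country_Id", "year"))

# Within model: country effects are removed by demeaning, not estimated as dummies
femodel_1 <- plm(y ~ x, data = mydata, model = "within")
fixef(femodel_1)  # one estimated level per country

# Least-squares dummy variable model: the same thing with explicit dummies
lsdv <- lm(y ~ x + country_Id - 1, data = df)
coef(lsdv)
```

The slope on x and the four country levels agree between the two fits, which is why adding the dummies to the within model contributes nothing.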
About your random effect model:
The Swamy-Arora RE estimator does not guarantee positive variance estimates (read up on this in a good econometrics textbook or look here: https://stats.stackexchange.com/questions/176827/error-in-plm-random-effects-swamy-arora-swar-estimator-with-lagged-dependent/181444#181444). You can try to change your model (if it makes sense) and/or change your data a bit (e.g., more observations). Also, you can try to switch to a different RE estimator: plm offers a few via its random.method argument.
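A minimal sketch of switching the estimator of the variance components, on simulated data (the data frame and its columns are hypothetical). Swamy-Arora ("swar") is plm's default; "walhus" and "amemiya" are two of the alternatives accepted by random.method.

```r
library(plm)

set.seed(11)

# Hypothetical balanced panel: 10 individuals, 6 years
df <- data.frame(id = rep(1:10, each = 6), year = rep(2001:2006, times = 10))
alpha <- rnorm(10, sd = 2)  # individual effects
df$x  <- rnorm(nrow(df))
df$y  <- 1 + 0.8 * df$x + alpha[df$id] + rnorm(nrow(df))

# Default: Swamy-Arora
re_swar <- plm(y ~ x, data = df, index = c("id", "year"), model = "random")

# Alternatives selected through random.method
re_walhus  <- plm(y ~ x, data = df, index = c("id", "year"),
                  model = "random", random.method = "walhus")
re_amemiya <- plm(y ~ x, data = df, index = c("id", "year"),
                  model = "random", random.method = "amemiya")

# Compare the estimated variance components
ercomp(re_swar)
ercomp(re_walhus)
ercomp(re_amemiya)
```

If the individual variance stays at zero under every method, that points to the data or model rather than to the estimator.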

R equivalent to Stata's xtregar

I'm doing a replication of an estimation done with Stata's xtregar command, but I'm using R instead.
The xtregar command implements the method from Baltagi and Wu (1999) "Unequally spaced panel data regressions with AR(1) disturbances" paper. As Stata describes it:
xtregar fits cross-sectional time-series regression models when the disturbance term is first-order autoregressive. xtregar offers a within estimator for fixed-effects models and a GLS estimator for random-effects models. xtregar can accommodate unbalanced panels whose observations are unequally spaced over time.
So far, for the fixed-effects model, I used the plm package for R. The attempt looks like this:
plm(data=A, y ~ x1 + x2, effect = "twoways", model = "within")
Nevertheless, it is not complete (compared to the xtregar description) and the results are not quite like the ones Stata provides. Furthermore, Stata's command needs a panel variable and a time variable to be set, a feature that is (as far as I can tell) absent in the plm environment.
Should I settle with plm or is there another way of doing this?
PS: I searched different websites thoroughly but failed to find an equivalent to Stata's xtregar.
Update
After reading Croissant and Millo (2008) "Panel Data Econometrics in R: The plm Package", specifically section 7.4 "Some useful 'econometric' models in nlme", I used something like this for the Random Effects part of the estimation:
gls(data=A, y ~ x1 + x2, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Nevertheless the following has results closer to those of Stata
lme(data=A, y ~ x1 + x2, random = ~ 1 | pays, correlation = corAR1(0, form = ~ year | pays), na.action = na.exclude)
Try {panelAR}. This is a package for regressions in panel data that addresses AR1 type of autocorrelations.
Unfortunately, I do not own Stata, so I cannot test which panelCorrMethod setting replicates its results.
library(panelAR)
model <- panelAR(
  formula = y ~ x1 + x2,
  data = A,
  panelVar = 'pays',
  timeVar = 'year',
  autoCorr = 'ar1',
  rho.na = TRUE,
  bound.rho = TRUE,
  # You might need to change this parameter. 'phet' uses the HW sandwich
  # estimator for heteroskedasticity, but other methods are available.
  panelCorrMethod = 'phet'
)
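As a side note on the remark in the question about panel and time variables: plm does accept them, through its index argument (or through pdata.frame()). The sketch below uses simulated data; only the column names pays and year are borrowed from the question. It does not add the AR(1) disturbance that xtregar models, so it is not a replication of xtregar.

```r
library(plm)

set.seed(3)

# Hypothetical balanced panel: 3 countries ("pays"), 5 years
A <- data.frame(
  pays = rep(c("DE", "FR", "IT"), each = 5),
  year = rep(2001:2005, times = 3)
)
A$x1 <- rnorm(nrow(A))
A$x2 <- rnorm(nrow(A))
A$y  <- A$x1 + 0.5 * A$x2 + rnorm(nrow(A))

# index = c(<panel variable>, <time variable>) plays the role of Stata's xtset
fe <- plm(y ~ x1 + x2, data = A, index = c("pays", "year"),
          model = "within", effect = "twoways")

# Check that plm picked up the intended panel structure
pdim(fe)
```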

How to include a year fixed effect (in a year-quarter panel data) in R using plm function?

Thank you all in advance for your help. My question is essentially a "bump" of the following question: R: plm -- year fixed effects -- year and quarter data.
Basically, I was wondering if there is any way, using the plm function in R, to include a fixed effect that is not at the same level as the data. For example, suppose you have the following data
library(plm)
id <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
year <- c(1999,1999,1999,1999,2000,2000,2000,2000,1999,1999,1999,1999,2000,2000,2000,2000)
qtr <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
y <- rnorm(16, mean=0, sd=1)
x <- rnorm(16, mean=0, sd=1)
data <- data.frame(id=id,year=year,qtr=qtr,y_q=paste(year,qtr,sep="_"),y=y,x=x)
This is a panel data set, with the cross sectional unit marked as "id" and the time unit at the year-quarter level. However, I only want to actually include a fixed effect for year, I do not want to include a fixed effect for year-quarter. However, if you try running this regression,
reg1 <- plm(y ~ x, data=data,index=c("id", "year"), model="within",effect="time")
I get the following error:
duplicate couples (time-id)
Error in pdim.default(index[[1]], index[[2]]) :
Now, to add to the post I previously linked, if you are using a fixed effects model, one way to get around this is to manually put in the fixed effects as a vector of dummy variables, and just use pooled cross section regression. For example,
reg1 <- plm(y ~ x + factor(id) + factor(year), data=data,index=c("id", "year"), model="pooling",effect="time")
If that works for you, then great! However, this solution does not work for me because I definitely need to use the plm function. The reason why is because I actually want to put in a year random effect, and I'm not sure how to do that "manually". Is there a work around for this using the plm function?
Thanks!
Vincent
You will need to make the combination of year and quarter the time dimension of your data set, i.e., use y_q as the second index variable.
This model:
reg_q <- plm(y ~ x, data=data, index=c("id", "y_q"), model="within", effect="time")
will take care of quarterly effects only.
This model:
reg_ind_year <- plm(y ~ x + factor(year), data=data, index=c("id", "y_q"), model="within", effect="individual")
will take care of individual effects and yearly effects (note the inclusion of factor(year) in the model). It does not take quarterly effects into account.
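Here is a sketch of the second model on data shaped like the example in the question (a seed is added so the run is reproducible). It checks the result against a plain lm() with id and year dummies, which should give the same estimates for x and for the year dummy.

```r
library(plm)

set.seed(123)

# Same layout as in the question: 2 ids, 2 years, 4 quarters each
id   <- rep(1:2, each = 8)
year <- rep(rep(1999:2000, each = 4), times = 2)
qtr  <- rep(1:4, times = 4)
data <- data.frame(id = id, year = year, qtr = qtr,
                   y_q = paste(year, qtr, sep = "_"),
                   y = rnorm(16), x = rnorm(16))

# y_q is unique within each id, so the "duplicate couples" error does not occur
reg_ind_year <- plm(y ~ x + factor(year), data = data,
                    index = c("id", "y_q"),
                    model = "within", effect = "individual")
coef(reg_ind_year)

# Dummy-variable check with plain lm()
lsdv <- lm(y ~ x + factor(id) + factor(year), data = data)
coef(lsdv)[c("x", "factor(year)2000")]
```

This covers fixed year effects only; it does not give the year random effect asked about in the question.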

Resources