How to estimate a Spatial Autoregressive model in R?

I am trying to estimate some spatial models in R using the data from a paper on spatial econometric models using cross-section time series data by Franzese & Hays (2007).
I focus on their results reported in Table 4 of that paper.
Using lm I am able to replicate their results for the OLS, S-OLS, and S-2SLS models.
However, in trying to estimate the S-ML (Spatial Maximum Likelihood) model I run into trouble.
If I use glm, the estimates for most explanatory variables differ only slightly from theirs, but there is quite a large discrepancy in the estimated coefficient for the spatial lag (output shown below).
I'm not entirely sure why GLM is not the right estimation method in this case.
Using GLS I get results similar to GLM (possibly related).
# glm() lives in base R's stats package, so no extra library is needed
m4 <- glm(lnlmtue ~ lnlmtue_1 + SpatLag + DENSITY + DEIND + lngdp_pc + UR +
            TRADE + FDI + LLVOTE + LEFTC + TCDEMC + GOVCON + OLDAGE +
            factor(cc) + factor(year),
          family = gaussian, data = fh)
summary(m4)
Coefficients:
                 Estimate  Std. Error t value    Pr(>|t|)
(Intercept)   7.199091355 3.924227850   1.835    0.068684 .
lnlmtue_1     0.435487985 0.080844033   5.387 0.000000293 ***
SpatLag      -0.437680018 0.101078950  -4.330 0.000028105 ***
DENSITY       0.007633016 0.010268468   0.743    0.458510
DEIND         0.040270153 0.032304496   1.247    0.214618
(remaining rows omitted)
I tried using the splm package, but this leads to even larger discrepancies (output shown below).
Moreover, I'm not able to include fixed effects in the model.
require(splm)
m4a <- spml(lnlmtue ~ lnlmtue_1 + DENSITY + DEIND + lngdp_pc + UR + TRADE +
              FDI + LLVOTE + LEFTC + TCDEMC + GOVCON + OLDAGE,
            data = fh, index = c("cc", "year"), listw = mat2listw(wmat),
            model = "pooling", spatial.error = "none", lag = TRUE)
summary(m4a)
Coefficients:
               Estimate  Std. Error  t-value Pr(>|t|)
(Intercept)  1.79439070  0.78042284   2.2993  0.02149 *
lnlmtue_1    0.75795987  0.04828145  15.6988  < 2e-16 ***
DENSITY     -0.00026038  0.00203002  -0.1283  0.89794
DEIND       -0.00489516  0.01414457  -0.3461  0.72928
(remaining rows omitted)
So my question is: how does one properly estimate a SAR model with cross-section time-series data in R?
R-code
Replication data
Adjacency matrix

Is it critical that you use R?
I suggest that you examine the features of GeoDa, a free spatial analysis package available from Arizona State University.
Though I have only used it to run basic spatial OLS (not 2SLS), I was pleased with GeoDa's flexibility and visualization tools. I encourage you to skim the documentation and consider downloading the latest release.
If you must use R, I suggest exploring the GeoXp package (http://cran.r-project.org/web/packages/GeoXp/index.html).
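If you do stay in R for the estimation itself, a minimal sketch with splm (the package already tried above) would be to let spml absorb the country and year effects instead of entering them as factor() terms. model = "within" with effect = "twoways" is my assumption for reproducing the fixed effects, and fh and wmat are the replication data and adjacency matrix linked in the question:

require(splm)
require(spdep)  # for mat2listw()
# Hedged sketch: spatial-lag ML with two-way (country and year) fixed
# effects; the spatial lag is built internally from listw, so SpatLag
# is not included in the formula
m4b <- spml(lnlmtue ~ lnlmtue_1 + DENSITY + DEIND + lngdp_pc + UR + TRADE +
              FDI + LLVOTE + LEFTC + TCDEMC + GOVCON + OLDAGE,
            data = fh, index = c("cc", "year"),
            listw = mat2listw(wmat),
            model = "within", effect = "twoways",
            lag = TRUE, spatial.error = "none")
summary(m4b)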

Related

R formula for average partial effect of linear regression with interaction terms

What is the most efficient way to obtain the average partial effect for a variable in a multiple linear regression model that has interaction terms?
I can do this by manually finding the mean of each interaction variable and subtracting that value in a new regression, but there must be a better way.
This is the model.
install.packages('wooldridge')
data(catholic, package = 'wooldridge')
model <- lm(math12 ~ cathhs + cathhs*lfaminc + cathhs*motheduc + cathhs*fatheduc,
            data = catholic)
Is there a way to get the average partial effect for the variable "cathhs" without manually subtracting the mean from each interaction term in a new regression model?
You can try the marginaleffects package (disclaimer: I am the author). It can compute many different quantities of interest, including what the documentation calls “Average Marginal Effects” (or average slopes), which sounds like what you are looking for (the terminology in this area is very inconsistent):
library(marginaleffects)
library(wooldridge)
data(catholic, package='wooldridge')
model <- lm(math12 ~ cathhs + cathhs*lfaminc + cathhs*motheduc + cathhs*fatheduc,
            data = catholic)
mfx <- marginaleffects(model)
summary(mfx)
Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 cathhs 1.7850 0.46538 3.836 0.00012524 0.8729 2.6972
2 lfaminc 1.8461 0.14268 12.939 < 2.22e-16 1.5665 2.1257
3 motheduc 0.7125 0.06216 11.463 < 2.22e-16 0.5906 0.8343
4 fatheduc 0.8928 0.05618 15.891 < 2.22e-16 0.7827 1.0029
Model type: lm
Prediction type: response
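As a cross-check, the manual centering approach described in the question can be sketched as follows. Because the model is linear, the effect evaluated at the moderator means equals the average of the observation-level effects, so the cathhs coefficient below should be close to the 1.785 reported above (assuming the means are taken over the same estimation sample):

# Manual cross-check (sketch): center the moderators so the cathhs
# main effect becomes its average partial effect
library(wooldridge)
data(catholic, package = 'wooldridge')
catholic$lfaminc_c  <- catholic$lfaminc  - mean(catholic$lfaminc,  na.rm = TRUE)
catholic$motheduc_c <- catholic$motheduc - mean(catholic$motheduc, na.rm = TRUE)
catholic$fatheduc_c <- catholic$fatheduc - mean(catholic$fatheduc, na.rm = TRUE)
model_c <- lm(math12 ~ cathhs*lfaminc_c + cathhs*motheduc_c + cathhs*fatheduc_c,
              data = catholic)
coef(model_c)["cathhs"]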

Model averaging with MuMIn: interpretation of coefficient-names in results

I am doing model averaging with MuMIn and trying to interpret the results.
Everything works fine, but I am wondering about the names of my coefficients in the results:
Model-averaged coefficients:
(full average)
                     Estimate Std. Error Adjusted SE
cond((Int))         0.9552775  0.0967964   0.0969705
cond(Distanzpunkt) -0.0001217  0.0001451   0.0001453
cond(area_km2)      0.0022712  0.0030379   0.0030422
cond(prop)          0.0487036  0.1058994   0.1060808
Does anyone know what "cond()" means and why it appears in the model output?
Within the models, the coefficients are named "Distanzpunkt", "area_km2" and "prop".
Were you fitting a zero-inflation model with glmmTMB? If so, cond() refers to the terms in the conditional model, as opposed to the zero-inflation model.
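A minimal sketch of where those prefixes come from, using glmmTMB's built-in Salamanders data (the models here are purely illustrative):

library(glmmTMB)
library(MuMIn)
# Two zero-inflated Poisson fits to average; each has a conditional
# (count) part and a zero-inflation part
m1 <- glmmTMB(count ~ mined + (1 | site), ziformula = ~1,
              family = poisson, data = Salamanders)
m2 <- glmmTMB(count ~ spp + (1 | site), ziformula = ~1,
              family = poisson, data = Salamanders)
summary(model.avg(m1, m2))
# Coefficients print as cond((Int)), zi((Int)), and so on: cond() marks
# the conditional-model terms, zi() the zero-inflation terms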

Logit regression: glmer vs bife

I am working on a panel dataset and trying to run a logit regression with fixed effects.
I found that glmer models from the lme4 package and the bife package are suited for this kind of work.
However, when I run a regression with each model, I do not get the same results (estimates, standard errors, etc.).
Here is the code and results for the glmer model with an intercept:
glmer_1 <- glmer(CVC_dummy~at_log + (1|year), data=own, family=binomial(link="logit"))
summary(glmer_1)
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.43327    0.09635  -66.77   <2e-16 ***
at_log       0.46335    0.01101   42.09   <2e-16 ***
Without an intercept:
glmer_2 <- glmer(CVC_dummy~at_log + (1|year)-1, data=own, family=binomial(link="logit"))
summary(glmer_2)
       Estimate Std. Error z value Pr(>|z|)
at_log  0.46554    0.01099   42.36   <2e-16 ***
And with the bife package:
bife_1 <- bife(CVC_dummy~at_log | year, data=own, model="logit")
summary(bife_1)
       Estimate Std. error t-value Pr(> t)
at_log  0.4679     0.0110    42.54  <2e-16 ***
Why are the estimated coefficients of at_log different between the two packages?
Which package should I use?
There is quite a lot of confusion around the terms fixed effects and random effects. From your first sentence, I gather that you intend to fit a fixed-effects model.
However, while bife estimates fixed-effects models, glmer estimates random-effects (mixed-effects) models.
The two often get confused because random-effects models distinguish between fixed effects (your usual coefficients, the independent variables you are interested in) and random effects (the variances/standard deviations of your random intercepts and/or random slopes).
Fixed-effects models, on the other hand, are called that because they cancel out individual differences by including a dummy variable for each group, i.e. a fixed effect for each group.
However, not all fixed-effects estimators work by including indicator variables: bife uses pseudo-demeaning, yet the results are equivalent and it is still called a fixed-effects model (see the sketch below).
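To see the dummy-variable version in practice, here is a hedged sketch using the question's own objects (CVC_dummy, at_log, year, and own are assumed to exist); with reasonably many observations per year it should land close to bife's pseudo-demeaned estimate:

# Fixed-effects logit via explicit year dummies (sketch); -1 drops the
# common intercept so each year gets its own fixed effect
glm_fe <- glm(CVC_dummy ~ at_log + factor(year) - 1,
              data = own, family = binomial(link = "logit"))
summary(glm_fe)$coefficients["at_log", ]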

Autocorrelation in Panel Data

I fitted a random-effects plm model (from the plm package) to estimate my coefficients, and I used vcovHC to correct for heteroskedasticity.
However, how can I also correct for autocorrelation? Should I use vcovNW or vcovSCC? I tried vcovHC with the Arellano method, but, as might be expected, I obtained the same results as with vcovHC alone.
How can I take residual autocorrelation into account in such a model?
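For what it's worth, plm's vcovHC for panel models already defaults to the Arellano estimator, which would explain the identical results. A minimal sketch of the two alternatives mentioned, assuming model_re stands in for the fitted random-effects model:

library(plm)
library(lmtest)
# Driscoll-Kraay covariance: robust to heteroskedasticity, serial
# correlation, and cross-sectional dependence
coeftest(model_re, vcov = vcovSCC)
# Newey-West style covariance: heteroskedasticity- and
# autocorrelation-robust within each cross-section
coeftest(model_re, vcov = vcovNW)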

predict() functionality usage in rms package

I have created a regression model using ols() from the rms package:
data_Trans <- ols(Check ~ rcs(data_XVar, 6))
Since this is built with a restricted cubic spline with 6 knots, I get 5 coefficients plus one intercept.
Now I cannot work out how to apply this model to new sets of predictor values; an example of how to do this would be really helpful. Further, I am not sure whether I have to specify the knot positions again, or whether the model stores the knot positions it chose when it was built.
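A minimal sketch of predicting on new data (simulated here, since the original data frame isn't shown); predict() reuses the knot locations stored in the fitted object, so they do not need to be respecified:

library(rms)
set.seed(1)
# Simulated stand-ins for the question's data_XVar and Check
d <- data.frame(data_XVar = runif(200, 0, 10))
d$Check <- sin(d$data_XVar) + rnorm(200, sd = 0.2)
fit <- ols(Check ~ rcs(data_XVar, 6), data = d)
# New data only needs raw predictor values; the 6 knot positions chosen
# at fit time travel with the model object
newd <- data.frame(data_XVar = c(1.5, 4.2, 7.7))
predict(fit, newdata = newd)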
