I'm trying to understand why the R packages plm and fixest give me different standard errors when I estimate a panel model with heteroskedasticity-robust standard errors ("HC1") and state fixed effects.
Does anyone have a hint for me?
Here is the code:
library(AER) # For the Fatality Dataset
library(plm) # PLM
library(fixest) # Fixest
library(tidyverse) # Data Management
data("Fatalities")
# Create new variable : fatality rate
Fatalities <- Fatalities %>%
mutate(fatality_rate = (fatal/pop)*10000)
# Estimate Fixed Effects model using the plm package
plm_reg <- plm(fatality_rate ~ beertax,
data = Fatalities,
index = c("state", "year"),
effect = "individual")
# Print Table with adjusted standard errors
coeftest(plm_reg, vcov. = vcovHC, type = "HC1")
# Output
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.65587 0.28880 -2.271 0.02388 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Estimate the very same model using the fixest package
# fixest is much faster and more user-friendly (in my opinion)
fixest_reg <- feols(fatality_rate ~ beertax | state ,
data = Fatalities,
vcov = "HC1",
panel.id = ~ state + year)
# print table
etable(fixest_reg)
# Output
                   fixest_reg
Dependent Var.: fatality_rate
beertax -0.6559** (0.2033)
Fixed-Effects: ------------------
state Yes
_______________ __________________
S.E. type Heteroskedas.-rob.
Observations 336
R2 0.90501
Within R2 0.04075
In this example, the standard error is larger with plm than with fixest (the same is true if state + year fixed effects are used). Does anyone know why this happens?
Actually the VCOVs are different.
In plm, vcovHC defaults to the Arellano (1987) estimator, which is also robust to serial correlation; see the vcovHC.plm documentation.
If you add the argument method = "white1", you end up with the same type of VCOV.
Finally, you also need to change how the fixed effects are treated in fixest's small-sample correction to obtain the same standard errors (see the fixest documentation on small-sample corrections for details).
Here are the results:
# Requesting "White" VCOV
coeftest(plm_reg, vcov. = vcovHC, type = "HC1", method = "white1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> beertax -0.65587 0.18815 -3.4858 0.0005673 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Changing the small sample correction in fixest (discarding the fixed-effects)
etable(fixest_reg, vcov = list("hc1", hc1 ~ ssc(fixef.K = "none")), fitstat = NA)
#> fixest_reg fixest_reg
#> Dependent Var.: fatality_rate fatality_rate
#>
#> beertax -0.6559** (0.2033) -0.6559*** (0.1882)
#> Fixed-Effects: ------------------ -------------------
#> state Yes Yes
#> _______________ __________________ ___________________
#> S.E. type Heteroskedas.-rob. Heteroskedast.-rob.
# Final comparison
rbind(se(vcovHC(plm_reg, type = "HC1", method = "white1")),
se(fixest_reg, hc1 ~ ssc(fixef.K = "none")))
#> beertax
#> [1,] 0.1881536
#> [2,] 0.1881536
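Conversely, plm's Arellano default is cluster-robust by the panel index, so it should line up with clustered standard errors in fixest. A minimal sketch (the numbers may still differ slightly until the small-sample corrections are aligned via ssc()):
# plm's Arellano default ~ clustering by state in fixest
rbind(se(vcovHC(plm_reg, type = "HC1")),  # Arellano default in plm
      se(fixest_reg, vcov = ~state))      # clustered by state in fixest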
Related
I am trying to use the linearHypothesis() function from the car package to test coefficients of a model estimated with ols() from the rms package. The function works with lrm objects but not with ols objects. Does anyone have an alternative? I know that using lm() would solve the issue, but I want to use ols() because it makes getting clustered standard errors easier.
You can use glht from the multcomp package.
library(rms)
library(multcomp)
d <- datadist(swiss); options(datadist = "d")  # rms models need a datadist
fit <- ols(Fertility ~ ., data = swiss, x = TRUE)  # x = TRUE stores the design matrix (see the Fit line below)
summary(fit)
test <- glht(fit, linfct = "Agriculture = 0")
summary(test)
# Fit: ols(formula = Fertility ~ ., data = swiss, x = TRUE)
#
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# Agriculture == 0 -0.1721 0.0703 -2.448 0.0144 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
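If you only need a single-coefficient test, you can also cross-check glht() by hand, since rms supplies coef() and vcov() methods for ols fits. A minimal sketch of the Wald z test mirroring the glht output above:
# Hand-rolled Wald test of Agriculture = 0, using the model-based vcov
b <- coef(fit)["Agriculture"]
se_b <- sqrt(vcov(fit)["Agriculture", "Agriculture"])
z <- b / se_b
2 * pnorm(-abs(z))  # two-sided p-value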
I'm using the multcomp package to generate contrasts for a geeglm (binomial(link = "logit")) model in R. I fit the geeglm model with the following script.
library(geepack)
u1 <- geeglm(outcome ~ px_race_jama, id = npi_gp,
             family = binomial(link = "logit"), data = mf)
summary(u1)
Call:
geeglm(formula = outcome ~ px_race_jama, family = binomial(link = "logit"),
data = mf, id = npi_gp)
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -0.4671 0.1541 9.19 0.0024 **
px_race_jama1 0.0959 0.1155 0.69 0.4067
px_race_jama2 -0.0293 0.1503 0.04 0.8453
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 1 0.0506
Correlation: Structure = independence
Number of clusters: 83   Maximum cluster size: 792
To get the contrasts for the model I run:
library(multcomp)
glht(u1, mcp(px_race_jama = "Tukey"))
I receive the errors:
Error in match.arg(type) :
'arg' should be one of “pearson”, “working”, “response”
Error in modelparm.default(model, ...) :
no ‘vcov’ method for ‘model’ found!
Alternatively, I have tried creating a contrast matrix:
contrast.matrix <- rbind(
`Other-Black` = c(0, -1, 1))
comps <- glht(u1, contrast.matrix)
summary(comps)
However, I receive the same error. Any help on how to correctly generate the contrasts would be greatly appreciated.
Something like this?
library(dplyr)  # for mutate()
contrast.matrix <- matrix(c(0, -1, 1,
                            0,  1, -1), nrow = 2, byrow = TRUE)
contrasts_geeglm <- function(fit, contrast_matrix, vcov_type = "robust") {
  # robust (sandwich) or naive (model-based) covariance from the geese fit
  vcov_gee <- if (vcov_type == "robust") fit$geese$vbeta else fit$geese$vbeta.naiv
  contrast_est <- contrast_matrix %*% coef(fit)
  # SEs are the square roots of the diagonal of L V L'
  contrast_se <- sqrt(diag(contrast_matrix %*% vcov_gee %*% t(contrast_matrix)))
  data.frame(Estimate = as.vector(contrast_est),
             SE = contrast_se) %>%
    mutate(LCI = Estimate - 1.96 * SE,
           UCI = Estimate + 1.96 * SE)
}
contrasts_geeglm(u1, contrast.matrix, vcov_type = "robust")
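As an alternative to a hand-rolled function, the emmeans package documents support for geepack models, so pairwise contrasts can likely be requested directly; a hedged sketch:
# Sketch: emmeans lists geeglm among its supported model classes and
# uses the robust covariance for GEE fits (check ?emmeans::emmeans)
library(emmeans)
emmeans(u1, pairwise ~ px_race_jama)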
I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think the problem is that Group1 contains NaN values, but I thought linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this. This reproducible example uses lm and update, but the same approach should work for lmer objects:
# 1st approach
# define a convenience wrapper that refits on the data actually used by
# the original fit (object$model is the model frame stored by lm)
update_nested <- function(object, formula., ..., evaluate = TRUE){
  update(object = object, formula. = formula., data = object$model,
         ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you're only interested in testing a single variable (e.g. Group2), then Anova() or linearHypothesis() from the car package may work as well for this use case, as sketched below.
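For instance, with the toy lm fit above, a single call tests each term without refitting a null model (a sketch, assuming car is installed):
# Type-II test for each term of the full model; the cyl row answers the
# single-variable question directly
library(car)
Anova(xa)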
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model@frame as the data argument:
Resp.null = lmer(Response ~ Group1 + Gender + Age + BMI + (1|Subject),
                 data = Resp.model@frame, REML = FALSE)
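For completeness, a manual equivalent (a sketch using the variable names from the question): restrict the data to complete cases on every variable the full model uses, then fit both models on that subset.
library(lme4)
# keep only rows complete on all variables used by the *full* model,
# so both fits see exactly the same data
vars <- c("Response", "Group1", "Group2", "Gender", "Age", "BMI", "Subject")
complete <- na.omit(mydata[, vars])
Resp.null <- lmer(Response ~ Group1 + Gender + Age + BMI + (1|Subject),
                  data = complete, REML = FALSE)
Resp.model <- lmer(Response ~ Group1 + Group2 + Gender + Age + BMI + (1|Subject),
                   data = complete, REML = FALSE)
anova(Resp.null, Resp.model)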
I would like to get pairwise comparisons of adjusted means using lsmeans(), while supplying a robust coefficient-covariance matrix (e.g. vcovHC). Usually functions on regression models provide a vcov argument, but I can't seem to find any such argument in the lsmeans package.
Consider this dummy example, originally from the car package:
require(car)
require(lmtest)
require(sandwich)
require(lsmeans)
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
coeftest(mod.moore.2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 1.372669 7.4292 4.111e-09 ***
## fcategorymedium -1.176000 1.902026 -0.6183 0.539805
## fcategoryhigh -0.080889 1.809187 -0.0447 0.964555
## partner.statushigh 4.606667 1.556460 2.9597 0.005098 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coeftest(mod.moore.2, vcov.=vcovHAC)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
## fcategorymedium -1.176000 1.574682 -0.7468 0.459435
## fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
## partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lsmeans(mod.moore.2, list(pairwise ~ fcategory), adjust="none")[[2]]
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.902026 41 0.618 0.5398
## low - high 0.08088889 1.809187 41 0.045 0.9646
## medium - high -1.09511111 1.844549 41 -0.594 0.5560
##
## Results are averaged over the levels of: partner.status
As you can see, lsmeans() estimates p-values using the default variance-covariance matrix.
How can I obtain pairwise contrasts using the vcovHAC variance estimate?
It turns out that there is a wonderful and seamless interface between the lsmeans and multcomp packages (see ?lsm): lsmeans provides direct support for glht().
require(multcomp)
x <- glht(mod.moore.2, lsm(pairwise ~ fcategory), vcov=vcovHAC)
## Note: df set to 41
summary(x, test=adjusted("none"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = conformity ~ fcategory + partner.status, data = Moore)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## low - medium == 0 1.17600 1.57468 0.747 0.459
## low - high == 0 0.08089 2.14610 0.038 0.970
## medium - high == 0 -1.09511 1.86197 -0.588 0.560
## (Adjusted p values reported -- none method)
This is at least one way to achieve this. I'm still hoping someone knows of an approach using lsmeans only...
Another way to approach this is to hack into the lsmeans object and manually replace the variance-covariance matrix before summarizing the object.
mod.lsm <- lsmeans(mod.moore.2, ~ fcategory)
mod.lsm@V <- vcovHAC(mod.moore.2) ## replace default vcov with custom vcov
pairs(mod.lsm, adjust = "none")
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.574682 41 0.747 0.4594
## low - high 0.08088889 2.146102 41 0.038 0.9701
## medium - high -1.09511111 1.861969 41 -0.588 0.5597
##
## Results are averaged over the levels of: partner.status
I'm not sure whether this was possible with the old lsmeans package, but it is with its successor, the emmeans package.
Moore <- within(carData::Moore, {
partner.status <- factor(partner.status, c("low", "high"))
fcategory <- factor(fcategory, c("low", "medium", "high"))
})
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
lmtest::coeftest(mod.moore.2, vcov.= sandwich::vcovHAC)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
#> fcategorymedium -1.176000 1.574682 -0.7468 0.459435
#> fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
#> partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC(mod.moore.2),
adjust = "none")$contrasts
#> contrast estimate SE df t.ratio p.value
#> medium - low -1.1760 1.57 41 -0.747 0.4594
#> high - low -0.0809 2.15 41 -0.038 0.9701
#>
#> Results are averaged over the levels of: partner.status
Created on 2021-07-08 by the reprex package (v0.3.0)
Note, you can't just write the following
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC,
adjust = "none")$contrasts
because when the function sandwich::vcovHAC (rather than a precomputed matrix) is passed, the adjust argument also matches an argument of that function, and the two uses conflict. (I had incorrectly thought this was a bug.)
Alternatively, use update() to inject a custom vcov matrix into your emmeans/emmGrid object.
Example:
# create an emmeans object from your fitted model
emmob <- emmeans(thismod, ~ predictor)
# generate a robust vcov matrix using a function
# from the sandwich or clubSandwich package
vcovR <- vcovHC(thismod, type="HC3")
# turn the resulting object into a (square) matrix
vcovRm <- matrix(vcovR, ncol=ncol(vcovR))
# update the V slot of the emmeans/emmGrid object
emmob <- update(emmob, V=vcovRm)
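As a quick sanity check (a sketch reusing the hypothetical thismod from above), the robust results from the updated object should be consistent with coeftest() under the same matrix:
# compare against coeftest() with the identical robust vcov matrix
lmtest::coeftest(thismod, vcov. = vcovR)
summary(emmob)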
I want to extract the standard errors from the output of the tsls command in the sem R package.
Using some generic code as an example:
fit = tsls(Y ~ X, ~Z)
summary(fit)
The summary function outputs several things besides the regression estimates (e.g., model formulas, summary of the residuals).
I want an equivalent of fit$coef that outputs standard errors, but that doesn't seem to be an option, and the code that does the equivalent for glm and lm output doesn't work here. Is there any way to hack the output?
Sometimes it takes a little digging to find where these values come from. If str(fit) gives you no clues, the best place to look is at what summary.tsls is doing.
With some help from getAnywhere("summary.tsls"), we see:
getAnywhere("summary.tsls")
# A single object matching ‘summary.tsls’ was found
# It was found in the following places
# registered S3 method for summary from namespace sem
# namespace:sem
# with value
#
# function (object, digits = getOption("digits"), ...)
# {
# ###
# ### \\\SNIP///
# ###
# std.errors <- sqrt(diag(object$V))
# ###
# ### \\\SNIP///
# ###
# }
# <bytecode: 0x503c530>
# <environment: namespace:sem>
So, to get the value you are looking for, you need to calculate it yourself with:
sqrt(diag(fit$V))
A reproducible example:
library(sem)
fit <- tsls(Q ~ P + D, ~ D + F + A, data=Kmenta)
summary(fit)
#
# 2SLS Estimates
#
# Model Formula: Q ~ P + D
#
# Instruments: ~D + F + A
#
# Residuals:
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -3.4300 -1.2430 -0.1895 0.0000 1.5760 2.4920
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 94.63330387 7.92083831 11.94738 1.0762e-09 ***
# P -0.24355654 0.09648429 -2.52431 0.021832 *
# D 0.31399179 0.04694366 6.68869 3.8109e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.9663207 on 17 degrees of freedom
sqrt(diag(fit$V))
# (Intercept) P D
# 7.92083831 0.09648429 0.04694366
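If you want the whole coefficient table rather than just the standard errors, a small helper can rebuild it from the same pieces summary.tsls uses (a sketch; it assumes the fit stores coefficients, V, and residuals, as the snippet from the sem source above suggests):
# rebuild the coefficient table from the stored pieces
tsls_table <- function(fit) {
  est <- fit$coefficients
  se <- sqrt(diag(fit$V))
  df <- length(fit$residuals) - length(est)  # residual df (20 - 3 = 17 here)
  tval <- est / se
  cbind(Estimate = est, `Std. Error` = se, `t value` = tval,
        `Pr(>|t|)` = 2 * pt(-abs(tval), df))
}
tsls_table(fit)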