How to obtain lsmeans() pairwise contrasts with custom vcov? - r

I would like to get pairwise comparisons of adjusted means using lsmeans(), while supplying a robust coefficient-covariance matrix (e.g. vcovHC). Usually functions on regression models provide a vcov argument, but I can't seem to find any such argument in the lsmeans package.
Consider this dummy example, originally from CAR:
require(car)
require(lmtest)
require(sandwich)
require(lsmeans)
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
coeftest(mod.moore.2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 1.372669 7.4292 4.111e-09 ***
## fcategorymedium -1.176000 1.902026 -0.6183 0.539805
## fcategoryhigh -0.080889 1.809187 -0.0447 0.964555
## partner.statushigh 4.606667 1.556460 2.9597 0.005098 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coeftest(mod.moore.2, vcov.=vcovHAC)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
## fcategorymedium -1.176000 1.574682 -0.7468 0.459435
## fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
## partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lsmeans(mod.moore.2, list(pairwise ~ fcategory), adjust="none")[[2]]
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.902026 41 0.618 0.5398
## low - high 0.08088889 1.809187 41 0.045 0.9646
## medium - high -1.09511111 1.844549 41 -0.594 0.5560
##
## Results are averaged over the levels of: partner.status
As you can see, lsmeans() estimates p-values using the default variance-covariance matrix.
How can I obtain pairwise contrasts using the vcovHAC variance estimate?

It turns out that there is a wonderful and seamless interface between lsmeans and multcomp packages (see ?lsm), whereas lsmeans provides support for glht().
require(multcomp)
x <- glht(mod.moore.2, lsm(pairwise ~ fcategory), vcov=vcovHAC)
## Note: df set to 41
summary(x, test=adjusted("none"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lm(formula = conformity ~ fcategory + partner.status, data = Moore)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## low - medium == 0 1.17600 1.57468 0.747 0.459
## low - high == 0 0.08089 2.14610 0.038 0.970
## medium - high == 0 -1.09511 1.86197 -0.588 0.560
## (Adjusted p values reported -- none method)
This is at least one way to achieve this. I'm still hoping someone knows of an approach using lsmeans only...
Another way to approach this is to hack into the lsmeans object, and manually replace the variance-covariance matrix prior to summary-ing the object.
mod.lsm <- lsmeans(mod.moore.2, ~ fcategory)
mod.lsm#V <- vcovHAC(mod.moore.2) ##replace default vcov with custom vcov
pairs(mod.lsm, adjust = "none")
## contrast estimate SE df t.ratio p.value
## low - medium 1.17600000 1.574682 41 0.747 0.4594
## low - high 0.08088889 2.146102 41 0.038 0.9701
## medium - high -1.09511111 1.861969 41 -0.588 0.5597
##
## Results are averaged over the levels of: partner.status

I'm not sure if this was possible using the 'lsmeans' package but it is using the updated emmeans package.
Moore <- within(carData::Moore, {
partner.status <- factor(partner.status, c("low", "high"))
fcategory <- factor(fcategory, c("low", "medium", "high"))
})
mod.moore.2 <- lm(conformity ~ fcategory + partner.status, data=Moore)
lmtest::coeftest(mod.moore.2, vcov.= sandwich::vcovHAC)
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 10.197778 0.980425 10.4014 4.565e-13 ***
#> fcategorymedium -1.176000 1.574682 -0.7468 0.459435
#> fcategoryhigh -0.080889 2.146102 -0.0377 0.970117
#> partner.statushigh 4.606667 1.437955 3.2036 0.002626 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC(mod.moore.2),
adjust = "none")$contrasts
#> contrast estimate SE df t.ratio p.value
#> medium - low -1.1760 1.57 41 -0.747 0.4594
#> high - low -0.0809 2.15 41 -0.038 0.9701
#>
#> Results are averaged over the levels of: partner.status
Created on 2021-07-08 by the reprex package (v0.3.0)
Note, you can't just write the following
emmeans::emmeans(
mod.moore.2, trt.vs.ctrl ~ fcategory,
vcov = sandwich::vcovHAC,
adjust = "none")$contrasts
due to conflict with the sandwich::vcovHAC command which also has an adjust option. (I had incorrectly thought this was a bug).

OR
use update to inject a custom vcov matrix into your emmeans/emmGrid object.
Example:
# create an emmeans object from your fitted model
emmob <- emmeans(thismod, ~ predictor)
# generate a robust vcov matrix using a function
# from the sandwich or clubSandwich package
vcovR <- vcovHC(thismod, type="HC3")
# turn the resulting object into a (square) matrix
vcovRm <- matrix(vcovR, ncol=ncol(vcovR))
# update the V slot of the emmeans/emmGrid object
emmob <- update(emmob, V=vcovRm)

Related

R plm vs fixest package - different results?

I'm trying to understand why R packages "plm" and "fixest" give me different standard errors when I'm estimating a panel model using heteroscedasticity-robust standard errors ("HC1") and state fixed effects.
Does anyone have a hint for me?
Here is the code:
library(AER) # For the Fatality Dataset
library(plm) # PLM
library(fixest) # Fixest
library(tidyverse) # Data Management
data("Fatalities")
# Create new variable : fatality rate
Fatalities <- Fatalities %>%
mutate(fatality_rate = (fatal/pop)*10000)
# Estimate Fixed Effects model using the plm package
plm_reg <- plm(fatality_rate ~ beertax,
data = Fatalities,
index = c("state", "year"),
effect = "individual")
# Print Table with adjusted standard errors
coeftest(plm_reg, vcov. = vcovHC, type = "HC1")
# Output
>t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
beertax -0.65587 0.28880 -2.271 0.02388 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Estimate the very same model using the fixest package
# fixest is much faster and user friendly (in my opinion)
fixest_reg <- feols(fatality_rate ~ beertax | state ,
data = Fatalities,
vcov = "HC1",
panel.id = ~ state + year)
# print table
etable(fixest_reg)
#output
> fixest_reg
Dependent Var.: fatality_rate
beertax -0.6559** (0.2033)
Fixed-Effects: ------------------
state Yes
_______________ __________________
S.E. type Heteroskedas.-rob.
Observations 336
R2 0.90501
Within R2 0.04075
In this example, the standard error is larger when using plm compared to the fixest results (the same is true if state+year fixed effects are used). Does anyone know the reason for this to happen?
Actually the VCOVs are different.
In plm vcovHC defaults to Arellano (1987) which also takes into account serial correlation. See documentation here.
If you add the argument method = "white1", you end up with the same type of VCOV.
Finally, you also need to change how the fixed-effects are accounted for in fixest to obtain the same standard-errors (see details on small sample correction here).
Here are the results:
# Requesting "White" VCOV
coeftest(plm_reg, vcov. = vcovHC, type = "HC1", method = "white1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> beertax -0.65587 0.18815 -3.4858 0.0005673 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Changing the small sample correction in fixest (discarding the fixed-effects)
etable(fixest_reg, vcov = list("hc1", hc1 ~ ssc(fixef.K = "none")), fitstat = NA)
#> fixest_reg fixest_reg
#> Dependent Var.: fatality_rate fatality_rate
#>
#> beertax -0.6559** (0.2033) -0.6559*** (0.1882)
#> Fixed-Effects: ------------------ -------------------
#> state Yes Yes
#> _______________ __________________ ___________________
#> S.E. type Heteroskedas.-rob. Heteroskedast.-rob.
# Final comparison
rbind(se(vcovHC(plm_reg, type = "HC1", method = "white1")),
se(fixest_reg, hc1 ~ ssc(fixef.K = "none")))
#> beertax
#> [1,] 0.1881536
#> [2,] 0.1881536

SLR of transformed data in R

For Y = % of population with income below poverty level and X = per capita income of population, I have constructed a box-cox plot and found that the lambda = 0.02020:
bc <- boxcox(lm(Percent_below_poverty_level ~ Per_capita_income, data=tidy.CDI), plotit=T)
bc$x[which.max(bc$y)] # gives lambda
Now I want to fit a simple linear regression using the transformed data, so I've entered this code
transform <- lm((Percent_below_poverty_level**0.02020) ~ (Per_capita_income**0.02020))
transform
But all I get is the error message
'Error in terms.formula(formula, data = data) : invalid power in formula'. What is my mistake?
You could use bcPower() from the car package.
## make sure you do install.packages("car") if you haven't already
library(car)
data(Prestige)
p <- powerTransform(prestige ~ income + education + type ,
data=Prestige,
family="bcPower")
summary(p)
# bcPower Transformation to Normality
# Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
# Y1 1.3052 1 0.9408 1.6696
#
# Likelihood ratio test that transformation parameter is equal to 0
# (log transformation)
# LRT df pval
# LR test, lambda = (0) 41.67724 1 1.0765e-10
#
# Likelihood ratio test that no transformation is needed
# LRT df pval
# LR test, lambda = (1) 2.623915 1 0.10526
mod <- lm(bcPower(prestige, 1.3052) ~ income + education + type, data=Prestige)
summary(mod)
#
# Call:
# lm(formula = bcPower(prestige, 1.3052) ~ income + education +
# type, data = Prestige)
#
# Residuals:
# Min 1Q Median 3Q Max
# -44.843 -13.102 0.287 15.073 62.889
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -3.736e+01 1.639e+01 -2.279 0.0250 *
# income 3.363e-03 6.928e-04 4.854 4.87e-06 ***
# education 1.205e+01 2.009e+00 5.999 3.78e-08 ***
# typeprof 2.027e+01 1.213e+01 1.672 0.0979 .
# typewc -1.078e+01 7.884e+00 -1.368 0.1746
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 22.25 on 93 degrees of freedom
# (4 observations deleted due to missingness)
# Multiple R-squared: 0.8492, Adjusted R-squared: 0.8427
# F-statistic: 131 on 4 and 93 DF, p-value: < 2.2e-16
Powers (more often represented by ^ than ** in R, FWIW) have a special meaning inside formulas [they represent interactions among variables rather than mathematical operations]. So if you did want to power-transform both sides of your equation you would use the I() or "as-is" operator:
I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020)
However, I think you should do what #DaveArmstrong suggested anyway:
it's only the predictor variable that gets transformed
the Box-Cox transformation is actually (y^lambda-1)/lambda (although the shift and scale might not matter for your results)

How can I fit the randon effect in the gamm (or gamm4) in r?

I would like to know if my response variable (y) is significantly affected by a categorical variable ("x"), with two discrete values ("high" and "low").
This response value "y" has spatial autocorrelation, which are numerical spatial eigenvector, named as "MEM2" and "MEM5". I'm trying to fix these spatial eigenvectors into the function using a gamm4. But I don't know where I fit them into the formula.
I've made by two ways:
(1) First:
m2 <- gamm(y ~ x, random=list(MEM2=~1, MEM5=~1), family=poisson )
summary(m2)
summary(m2$gam)
The output
Family: poisson
Link function: log
Formula:
y ~ x
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4574 0.2029 2.254 0.0300 *
xbaixa -0.6330 0.3338 -1.897 0.0655 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0625
Scale est. = 1 n = 40
(2) Second:
m2 <- gamm(y ~ x +s(MEM2) + s(MEM5),
random=list(MEM2=~1, MEM5=~1), family=poisson )
summary(m2)
summary(m2$gam)
The output:
Family: poisson
Link function: log
Formula:
y ~ x + s(MEM2) + s(MEM5)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3434 0.2275 1.509 0.140
xbaixa -0.4438 0.3689 -1.203 0.237
Approximate significance of smooth terms:
edf Ref.df F p-value
s(MEM2) 1 1 1.343 0.254
s(MEM5) 1 1 0.994 0.325
R-sq.(adj) = 0.0567
Scale est. = 1 n = 40
Anyone knows if it's right? And which I should use?

F-score and standardized Beta for heteroscedasticity-corrected covariance matrix (hccm) in R

I have multiple regression models which failed Breusch-Pagan tests, and so I've recalculated the variance using a heteroscedasticity-corrected covariance matrix, like this: coeftest(lm.model,vcov=hccm(lm.model)). coeftest() is from the lmtest package, while hccm() is from the car package.
I'd like to provide F-scores and standardized betas, but am not sure how to do this, because the output looks like this...
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000261 0.038824 0.01 0.995
age 0.004410 0.041614 0.11 0.916
exercise -0.044727 0.023621 -1.89 0.059 .
tR -0.038375 0.037531 -1.02 0.307
allele1_num 0.013671 0.038017 0.36 0.719
tR:allele1_num -0.010077 0.038926 -0.26 0.796
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Any advice on how to report these so they are as consistent as possible with the standard summary() and Anova() output from R and car, and the function std_beta() from the sjmisc package?
In case anyone else has this question, here was my solution. It is not particularly elegant, but it works.
I simply used the function for std_beta as a template, and then changed the input for the standard error to that derived from the std_beta() function.
# This is taken from std_beta function from sj_misc package.
# =====================================
b <-coef(lm.model) # Same Estimate
b <-b[-1] # Same intercept
fit.data <- as.data.frame(stats::model.matrix(lm.model)) # Same model.
fit.data <- fit.data[, -1] # Keep intercept
fit.data <- as.data.frame(sapply(fit.data, function(x) if (is.factor(x))
to_value(x, keep.labels = F)
else x))
sx <- sapply(fit.data, sd, na.rm = T)
sy <- sapply(as.data.frame(lm.model$model)[1], sd, na.rm = T)
beta <- b * sx/sy
se <-coeftest(lm.model,vcov=hccm(lm.model))[,2] # ** USE HCCM covariance for SE **
se <- se[-1]
beta.se <- se * sx/sy
data.frame(beta = beta, ci.low = (beta - beta.se *
1.96), ci.hi = (beta + beta.se * 1.96))
For the F-scores, I just squared the t-values.
I hope this saves someone some time.

Inference about Slope coefficient in R

By default lm summary test slope coefficient equal to zero. My question is very basic. I want to know how to test slope coefficient equal to non-zero value. One approach could be to use confint but this does not provide p-value. I also wonder how to do one-sided test with lm.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
summary(lm.D9)
Call:
lm(formula = weight ~ group)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4938 0.0685 0.2462 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
groupTrt -0.3710 0.3114 -1.191 0.249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
confint(lm.D9)
2.5 % 97.5 %
(Intercept) 4.56934 5.4946602
groupTrt -1.02530 0.2833003
Thanks for your time and effort.
as #power says, you can do by your hand.
here is an example:
> est <- summary.lm(lm.D9)$coef[2, 1]
> se <- summary.lm(lm.D9)$coef[2, 2]
> df <- summary.lm(lm.D9)$df[2]
>
> m <- 0
> 2 * abs(pt((est-m)/se, df))
[1] 0.2490232
>
> m <- 0.2
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
and you can do one-side test by omitting 2*.
UPDATES
here is an example of two-side and one-side probability:
> m <- 0.2
>
> # two-side probability
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
>
> # one-side, upper (i.e., greater than 0.2)
> pt((est-m)/se, df, lower.tail = FALSE)
[1] 0.9583367
>
> # one-side, lower (i.e., less than 0.2)
> pt((est-m)/se, df, lower.tail = TRUE)
[1] 0.0416633
note that sum of upper and lower probabilities is exactly 1.
Use the linearHypothesis function from car package. For instance, you can check if the coefficient of groupTrt equals -1 using.
linearHypothesis(lm.D9, "groupTrt = -1")
Linear hypothesis test
Hypothesis:
groupTrt = - 1
Model 1: restricted model
Model 2: weight ~ group
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 10.7075
2 18 8.7292 1 1.9782 4.0791 0.05856 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The smatr package has a slope.test() function with which you can use OLS.
In addition to all the other good answers, you could use an offset. It's a little trickier with categorical predictors, because you need to know the coding.
lm(weight~group+offset(1*(group=="Trt")))
The 1* here is unnecessary but is put in to emphasize that you are testing against the hypothesis that the difference is 1 (if you want to test against a hypothesis of a difference of d, then use d*(group=="Trt")
You can use t.test to do this for your data. The mu parameter sets the hypothesis for the difference of group means. The alternative parameter lets you choose between one and two-sided tests.
t.test(weight~group,var.equal=TRUE)
Two Sample t-test
data: weight by group
t = 1.1913, df = 18, p-value = 0.249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
t.test(weight~group,var.equal=TRUE,mu=-1)
Two Sample t-test
data: weight by group
t = 4.4022, df = 18, p-value = 0.0003438
alternative hypothesis: true difference in means is not equal to -1
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
Code up your own test. You know the estimated coeffiecient and you know the standard error. You could construct your own test stat.

Resources