Contrast between variables in glmmTMB

As a reproducible example, let's use the following nonsense model:
> library(glmmTMB)
> summary(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars))
Family: gaussian ( identity )
Formula: am ~ disp + hp + (1 | carb)
Data: mtcars
AIC BIC logLik deviance df.resid
34.1 41.5 -12.1 24.1 27
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
carb (Intercept) 2.011e-11 4.485e-06
Residual 1.244e-01 3.528e-01
Number of obs: 32, groups: carb, 6
Dispersion estimate for gaussian family (sigma^2): 0.124
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.7559286 0.1502385 5.032 4.87e-07 ***
disp -0.0042892 0.0008355 -5.134 2.84e-07 ***
hp 0.0043626 0.0015103 2.889 0.00387 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Actually, my real model's family is nbinom2. I want to test a contrast between disp and hp, so I try:
> glht(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars), linfct = matrix(c(0,1,-1)))
Error in glht.matrix(glmmTMB(am ~ disp + hp + (1 | carb), data = mtcars), :
‘ncol(linfct)’ is not equal to ‘length(coef(model))’
How can I avoid this error?
Thank you!

The problem is actually fairly simple: linfct needs to be a matrix with the number of columns equal to the number of parameters. You specified matrix(c(0,1,-1)) without specifying the number of rows or columns, so R made a one-column matrix by default. Adding nrow = 1 seems to work. You also need to define a modelparm method so that multcomp knows how to extract the coefficients and covariance matrix from a glmmTMB fit:
library(glmmTMB)
library(multcomp)
m1 <- glmmTMB(am ~ disp + hp + (1|carb), data = mtcars)
## Method that tells multcomp how to extract the fixed effects and their
## covariance matrix from a glmmTMB fit ("cond" = the conditional model)
modelparm.glmmTMB <- function(model,
                              coef. = function(x) fixef(x)[[component]],
                              vcov. = function(x) vcov(x)[[component]],
                              df = NULL, component = "cond", ...) {
    multcomp:::modelparm.default(model, coef. = coef., vcov. = vcov.,
                                 df = df, ...)
}
glht(m1, linfct = matrix(c(0,1,-1),nrow=1))
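If you prefer a labeled contrast, here is a minimal sketch (assuming the modelparm method above is already defined) that builds the same matrix from the fixed-effect names, so the columns are guaranteed to line up with the coefficients:
K <- matrix(0, nrow = 1, ncol = length(fixef(m1)$cond),
            dimnames = list("disp - hp", names(fixef(m1)$cond)))
K[, "disp"] <- 1   # +1 on disp
K[, "hp"] <- -1    # -1 on hp
summary(glht(m1, linfct = K))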


Full versus partial marginal effect using fixest package

I would like to know the full marginal effect of the continuous variable provtariff, given the interaction term Female * provtariff, on the outcome variable log(totalinc), as well as the coefficient of the interaction term.
Using the code:
feols(log(totalinc) ~ i(Female, provtariff) | hhid02 + year,
data = inc0402_p,
weights = ~hhwt,
vcov = ~tinh)
I got the following results
OLS estimation, Dep. Var.: log(totalinc)
Observations: 24,966
Weights: hhwt
Fixed-effects: hhid02: 11,018, year: 2
Standard-errors: Clustered (tinh)
Estimate Std. Error t value Pr(>|t|)
Female::0:provtariff 5.79524 1.84811 3.13577 0.0026542 **
Female::1:provtariff 2.66994 2.09540 1.27419 0.2075088
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.61702 Adj. R2: 0.670289
Within R2: 0.045238
However, when I implement the following code
feols(log(totalinc) ~ Female*provtariff | hhid02 + year,
data = inc0402_p,
weights = ~hhwt,
vcov = ~tinh)
I get the following results
OLS estimation, Dep. Var.: log(totalinc)
Observations: 24,966
Weights: hhwt
Fixed-effects: hhid02: 11,018, year: 2
Standard-errors: Clustered (tinh)
Estimate Std. Error t value Pr(>|t|)
Female -0.290019 0.029894 -9.70142 6.6491e-14 ***
provtariff 4.499561 1.884625 2.38751 2.0130e-02 *
Female:provtariff -0.433963 0.170505 -2.54516 1.3512e-02 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.52022 Adj. R2: 0.678592
Within R2: 0.069349
Should the provtariff coefficient in the latter model not be the same as the coefficient for Female::0:provtariff in the first model?
No, the two models are clearly different: one includes two parameters and the other includes three, so they won't produce equivalent results. More specifically, one of your models includes only the interactions but no “constitutive” term, whereas the other model includes both.
Here is a reproducible example with a third model that reproduces your asterisk (*) model but uses the fixest interaction syntax with i(). You'll see that some of the coefficients and standard errors are exactly identical to those in the second model, and that the R2 values are the same. This suggests that m2 and m3 are equivalent:
library(fixest)
library(modelsummary)
library(marginaleffects)
# The two models from the question, plus an equivalent third parameterization
m1 <- feols(mpg ~ i(am, hp) | gear, data = mtcars)
m2 <- feols(mpg ~ am * hp | gear, data = mtcars)
m3 <- feols(mpg ~ am + i(am, hp) | gear, data = mtcars)
models <- list(m1, m2, m3)
modelsummary(models)
             (1)        (2)        (3)
am = 0 × hp  -0.076                -0.056
             (0.025)               (0.006)
am = 1 × hp  -0.059                -0.071
             (0.009)               (0.021)
am                      5.568      5.568
                       (1.575)    (1.575)
hp                     -0.056
                       (0.006)
am × hp                -0.015
                       (0.019)
Num.Obs.     32         32         32
R2           0.763      0.797      0.797
Std.Errors   by: gear   by: gear   by: gear
FE: gear     X          X          X
We can further check the equivalence between models 2 and 3 by computing the partial derivative of the outcome with respect to one of the predictors. In economics they call this slope a “marginal effect”, although the terminology changes across disciplines, and I am not sure if that is the quantity you are interested in when you say “marginal effects”:
marginaleffects(m2, variables = "hp") |> summary()
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 1 hp mean(dY/dX) -0.062 0.01087 -5.705 1.1665e-08 -0.0833 -0.0407
#>
#> Model type: fixest
#> Prediction type: response
marginaleffects(m3, variables = "hp") |> summary()
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 1 hp mean(dY/dX) -0.062 0.01087 -5.705 1.1665e-08 -0.0833 -0.0407
#>
#> Model type: fixest
#> Prediction type: response
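As one more check, a quick sketch assuming fixest's usual i() coefficient naming (verify the labels with names(coef(m3))): m3's slope for am == 1 should equal m2's base hp slope plus the interaction term.
coef(m3)["am::1:hp"]                 # group-1 slope from the i() model
coef(m2)["hp"] + coef(m2)["am:hp"]   # same quantity from the * model, ≈ -0.071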

Confidence interval for sigma in a purely fixed effect model

Is there a standard way to estimate a confidence interval for the variance parameter of a linear model with only fixed effects? E.g., given:
reg=lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
how can I get the confidence interval for the variance parameter? confint only covers the fixed effects, and lmer from lme4 does not accept a model without a level-2 random effect, which is my case here.
Unfortunately, you have to implement it yourself.
Like so:
reg <- lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
alpha <- 0.05
df.res <- df.residual(reg)      # residual degrees of freedom: n - p = 27 here
s2 <- summary(reg)$sigma^2      # unbiased estimate of sigma^2
ci.var <- df.res * s2 / qchisq(c(1 - alpha/2, alpha/2), df = df.res)
ci.var        # CI for sigma^2: approximately (0.285, 0.845)
sqrt(ci.var)  # CI for sigma:   approximately (0.534, 0.919)
It comes from the relation
(n - p) * s^2 / sigma^2 ~ chi-squared(n - p),
which you invert at the alpha/2 and 1 - alpha/2 quantiles to get the interval for sigma^2.
I assume you are looking for the summary() function.
The code below shows the following:
data(mtcars)
reg<-lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
summary(reg)
# Call:
# lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.6923 -0.3901 0.0579 0.3649 1.2608
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.740648 0.738594 1.003 0.32487
# disp 0.002703 0.002715 0.996 0.32832
# hp 0.005275 0.003253 1.621 0.11657
# wt 1.001303 0.302761 3.307 0.00267 **
# am 0.155815 0.375515 0.415 0.68147
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.6754 on 27 degrees of freedom
# Multiple R-squared: 0.8527, Adjusted R-squared: 0.8309
# F-statistic: 39.08 on 4 and 27 DF, p-value: 7.369e-11
To extract these values, you can store the summary in a variable and select its coefficients:
summa <- summary(reg)
summa$coefficients
With that, you can select the standard error of the coefficient you want and build a confidence interval at the level of interest. To learn how the confidence interval is computed, one can read how it is done here.
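For instance, a minimal sketch of the hand-rolled version (wt is an arbitrary pick; confint() below does the same thing with t quantiles):
est <- summa$coefficients["wt", "Estimate"]               # point estimate
se <- summa$coefficients["wt", "Std. Error"]              # its standard error
est + c(-1, 1) * qt(0.975, df = df.residual(reg)) * se    # 95% Wald-type CI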
R does it automatically using confint(object, parm, level).
In your case, confint(reg, level = 0.95).

How to fix coefficients in R for categorical variables

I would like to know how to put offsets (or fixed coefficients) on a categorical variable in a model, with a different value for each level, and see how that affects the other variables. I'm not sure how exactly to code that.
library(tidyverse)
mtcars <- as_tibble(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
model1 <- glm(mpg ~ cyl + hp, data = mtcars)
summary(model1)
This gives the following:
Call:
glm(formula = mpg ~ cyl + hp, data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.818 -1.959 0.080 1.627 6.812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.65012 1.58779 18.044 < 2e-16 ***
cyl6 -5.96766 1.63928 -3.640 0.00109 **
cyl8 -8.52085 2.32607 -3.663 0.00103 **
hp -0.02404 0.01541 -1.560 0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 9.898847)
Null deviance: 1126.05 on 31 degrees of freedom
Residual deviance: 277.17 on 28 degrees of freedom
AIC: 169.9
Number of Fisher Scoring iterations: 2
I would like to set the cylinder levels to different offsets, say 6 cylinders to -4 and 8 cylinders to -9, so I can see what that does to horsepower. I tried this in the code below but get an error, so I'm not sure of the correct way to fix the coefficient for even one level of a categorical variable, much less several.
model2 <- glm(mpg ~ offset(I(-4 * cyl[6]))+ hp, data = mtcars)
Would anyone help me figure out how to correctly do this?
In a fresh R session (so that mtcars is the unmodified built-in data set):
glm(mpg ~ offset(I(-4 * (cyl == 6) + -9 * (cyl == 8))) + hp, data = mtcars)
# Call: glm(formula = mpg ~ offset(I(-4 * (cyl == 6) + -9 * (cyl == 8))) +
# hp, data = mtcars)
#
# Coefficients:
# (Intercept) hp
# 27.66881 -0.01885
#
# Degrees of Freedom: 31 Total (i.e. Null); 30 Residual
# Null Deviance: 353.8
# Residual Deviance: 302 AIC: 168.6
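If you have many levels, a lookup-table version of the same offset may be less error-prone. A sketch, assuming cyl was converted to a factor as in the question's code (cyl_off is a hypothetical helper column):
fixed <- c("4" = 0, "6" = -4, "8" = -9)           # fixed coefficient per level
mtcars$cyl_off <- fixed[as.character(mtcars$cyl)] # look up each row's offset
glm(mpg ~ hp + offset(cyl_off), data = mtcars)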

r lm vectorised control variables

I often have to write long model formulas with control variables that do not change.
For instance, hp is my variable of interest (x) that changes between models, and vs + am + gear + carb are my controls:
lm(disp ~ hp + vs + am + gear + carb, mtcars)
Then my x is drat, and then wt, but my controls stay the same:
lm(disp ~ drat + vs + am + gear + carb, mtcars)
lm(disp ~ wt + vs + am + gear + carb, mtcars)
I would find it quite useful sometimes to be able to reduce the equations to something like
y = 'disp'
x = 'hp'
controls = 'vs + am + gear + carb'
lm(y ~ x + controls, mtcars)
Any idea how I could achieve that?
The code below constructs a string formula (with a small edit to #ZheyuanLi's comment) to feed to lm and also uses the map function from purrr (a tidyverse package) to create a separate model for each variable in the x vector. Each element of the list models contains the model object and the name of the element is the value of x that was used in the model formula.
library(tidyverse)
y = 'disp'
x = c('hp','wt')
controls=c("vs","am","gear","carb")
models = map(setNames(x, x),
             ~ lm(paste(y, paste(c(.x, controls), collapse = "+"), sep = "~"),
                  data = mtcars))
map(models, summary)
$hp
Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
sep = "~"), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-85.524 -19.153 1.109 14.957 115.804
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.9238 73.2477 3.576 0.0014 **
hp 1.2021 0.2453 4.900 4.38e-05 ***
vs -63.7135 26.5957 -2.396 0.0241 *
am -56.0468 30.7338 -1.824 0.0797 .
gear -31.6231 23.4816 -1.347 0.1897
carb -14.3237 10.1169 -1.416 0.1687
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 47.97 on 26 degrees of freedom
Multiple R-squared: 0.8743, Adjusted R-squared: 0.8502
F-statistic: 36.18 on 5 and 26 DF, p-value: 6.547e-11
$wt
Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
sep = "~"), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-74.153 -36.993 -2.097 30.616 102.331
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.875 108.220 0.267 0.79172
wt 88.577 18.810 4.709 7.25e-05 ***
vs -92.669 25.186 -3.679 0.00107 **
am -3.734 34.662 -0.108 0.91503
gear -4.688 25.271 -0.186 0.85427
carb -8.455 9.662 -0.875 0.38955
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 48.88 on 26 degrees of freedom
Multiple R-squared: 0.8695, Adjusted R-squared: 0.8445
F-statistic: 34.66 on 5 and 26 DF, p-value: 1.056e-10
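A base-R variant of the same idea, as a sketch: reformulate() assembles the formula from character vectors, avoiding the string pasting:
models <- lapply(setNames(x, x), function(v)
  lm(reformulate(c(v, controls), response = y), data = mtcars))  # y ~ v + controls
lapply(models, summary)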

Testing the equality of multiple coefficients in R

I have the following model:
y = b1_group1*X1 + b1_group2*X1 + b2_group1*X2 + b2_group2*X2 + ... +
b10_group1*X10 + b10_group2*X10
Easily made in R as follows:
OLS <- lm(Y ~ t1:Group + t2:Group + t3:Group + t4:Group + t5:Group + t6:Group +
              t7:Group + t8:Group + t9:Group + t10:Group,
          weights = weight, data = Alldata)
In Stata, I can now do the following test:
test (b1_group1=b1_group2) (b2_group1=b2_group2) (b3_group1=b3_group2)
b1_group1 - b1_group2 = 0
b2_group1 - b2_group2 = 0
b3_group1 - b3_group2 = 0
This tells me whether the coefficients on X1, X2, and X3 are jointly different between Group 1 and Group 2, by means of an F test.
Can somebody please tell how how to do this in R? Thanks!
Look at this example:
library(car)
mod <- lm(mpg ~ disp + hp + drat*wt, mtcars)
linearHypothesis(mod, c("disp = hp", "disp = drat", "disp = drat:wt" ))
Linear hypothesis test
Hypothesis:
disp - hp = 0
disp - drat = 0
disp - drat:wt = 0
Model 1: restricted model
Model 2: mpg ~ disp + hp + drat * wt
Res.Df RSS Df Sum of Sq F Pr(>F)
1 29 211.80
2 26 164.67 3 47.129 2.4804 0.08337 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
See ?linearHypothesis for a variety of other ways to specify the test.
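For example, the same three restrictions can be passed as a hypothesis matrix; a sketch, with columns following the order of coef(mod) (intercept, disp, hp, drat, wt, drat:wt):
K <- rbind(c(0, 1, -1,  0, 0,  0),   # disp - hp      = 0
           c(0, 1,  0, -1, 0,  0),   # disp - drat    = 0
           c(0, 1,  0,  0, 0, -1))   # disp - drat:wt = 0
linearHypothesis(mod, K, rhs = c(0, 0, 0))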
Alternative:
The above shows you a quick and easy way to carry out hypothesis tests. Users with a solid understanding of the algebra of hypothesis tests may find the following approach more convenient, at least for simple versions of the test. Let's say we want to test whether or not the coefficients on cyl and carb are identical.
mod <- lm(mpg ~ disp + hp + cyl + carb, mtcars)
The following tests are equivalent:
Test one:
linearHypothesis(mod, c("cyl = carb" ))
Linear hypothesis test
Hypothesis:
cyl - carb = 0
Model 1: restricted model
Model 2: mpg ~ disp + hp + cyl + carb
Res.Df RSS Df Sum of Sq F Pr(>F)
1 28 238.83
2 27 238.71 1 0.12128 0.0137 0.9076
Test two:
rmod<- lm(mpg ~ disp + hp + I(cyl + carb), mtcars)
anova(mod, rmod)
Analysis of Variance Table
Model 1: mpg ~ disp + hp + cyl + carb
Model 2: mpg ~ disp + hp + I(cyl + carb)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 27 238.71
2 28 238.83 -1 -0.12128 0.0137 0.9076
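Applied to the model from the question, the joint test would look something like the sketch below; the coefficient labels are hypothetical and depend on your factor coding, so check names(coef(OLS)) first:
library(car)
## Joint F test that the t1, t2 and t3 slopes are equal across the two groups
linearHypothesis(OLS, c("t1:Group1 = t1:Group2",
                        "t2:Group1 = t2:Group2",
                        "t3:Group1 = t3:Group2"))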
