Full versus partial marginal effect using the fixest package in R

I would like to know the full marginal effect of the continuous variable provtariff, given the interaction term Female * provtariff, on the outcome variable log(totalinc), as well as the coefficient of the interaction term.
Using the code:
feols(log(totalinc) ~ i(Female, provtariff) | hhid02 + year,
      data = inc0402_p,
      weights = ~hhwt,
      vcov = ~tinh)
I got the following results:
OLS estimation, Dep. Var.: log(totalinc)
Observations: 24,966
Weights: hhwt
Fixed-effects: hhid02: 11,018, year: 2
Standard-errors: Clustered (tinh)
Estimate Std. Error t value Pr(>|t|)
Female::0:provtariff 5.79524 1.84811 3.13577 0.0026542 **
Female::1:provtariff 2.66994 2.09540 1.27419 0.2075088
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.61702 Adj. R2: 0.670289
Within R2: 0.045238
However, when I implement the following code:
feols(log(totalinc) ~ Female*provtariff | hhid02 + year,
      data = inc0402_p,
      weights = ~hhwt,
      vcov = ~tinh)
I get the following results:
OLS estimation, Dep. Var.: log(totalinc)
Observations: 24,966
Weights: hhwt
Fixed-effects: hhid02: 11,018, year: 2
Standard-errors: Clustered (tinh)
Estimate Std. Error t value Pr(>|t|)
Female -0.290019 0.029894 -9.70142 6.6491e-14 ***
provtariff 4.499561 1.884625 2.38751 2.0130e-02 *
Female:provtariff -0.433963 0.170505 -2.54516 1.3512e-02 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 7.52022 Adj. R2: 0.678592
Within R2: 0.069349
Should the provtariff coefficient in the latter model not be the same as the coefficient for Female::0:provtariff in the first model?

No. The two models are clearly different: the first estimates two parameters and the second estimates three, so they will not produce equivalent results. More specifically, your first model includes only the interaction terms but no "constitutive" (main-effect) term, whereas the second model includes both.
Here is a reproducible example with a third model that reproduces your second model (the one using the * operator), but uses the fixest interaction syntax i(). You will see that some of the coefficients and standard errors are exactly identical to those of the second model, and that the R2 values are the same. This shows that m2 and m3 are equivalent parameterizations:
library(fixest)
library(modelsummary)
library(marginaleffects)
# m1 and m2 mirror the two models from the question; m3 re-expresses m2 with i()
m1 <- feols(mpg ~ i(am, hp) | gear, data = mtcars)
m2 <- feols(mpg ~ am * hp | gear, data = mtcars)
m3 <- feols(mpg ~ am + i(am, hp) | gear, data = mtcars)
models <- list(m1, m2, m3)
modelsummary(models)
                 (1)         (2)         (3)
am = 0 × hp     -0.076                  -0.056
                (0.025)                 (0.006)
am = 1 × hp     -0.059                  -0.071
                (0.009)                 (0.021)
am                           5.568       5.568
                            (1.575)     (1.575)
hp                          -0.056
                            (0.006)
am × hp                     -0.015
                            (0.019)
Num.Obs.         32          32          32
R2               0.763       0.797       0.797
Std.Errors       by: gear    by: gear    by: gear
FE: gear         X           X           X
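The mapping between the two parameterizations can also be read off the coefficient vectors directly (a quick sketch; the coefficient names below follow fixest's printed output, so verify them with names(coef(m3)) on your version):
# m3's slope for am = 1 equals m2's hp slope plus the interaction term
coef(m2)["hp"] + coef(m2)["am:hp"]  # ≈ -0.071
coef(m3)["am::1:hp"]                # same value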
We can further check the equivalence between models 2 and 3 by computing the partial derivative of the outcome with respect to one of the predictors. In economics this slope is called a "marginal effect", although the terminology varies across disciplines, and I am not sure if that is the quantity you have in mind when you say "marginal effects":
marginaleffects(m2, variables = "hp") |> summary()
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 1 hp mean(dY/dX) -0.062 0.01087 -5.705 1.1665e-08 -0.0833 -0.0407
#>
#> Model type: fixest
#> Prediction type: response
marginaleffects(m3, variables = "hp") |> summary()
#> Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
#> 1 hp mean(dY/dX) -0.062 0.01087 -5.705 1.1665e-08 -0.0833 -0.0407
#>
#> Model type: fixest
#> Prediction type: response
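If the "full marginal effect" you are after is the slope of hp when am = 1 (in your application, the slope of provtariff when Female = 1), you can also compute it by hand from m2's coefficients and variance matrix via the delta method. A minimal base-R sketch, assuming fixest's default vcov for this model (clustered by the gear fixed effect):
b <- coef(m2)
V <- vcov(m2)  # the vcov fixest used at estimation time
est <- unname(b["hp"] + b["am:hp"])
se  <- sqrt(V["hp", "hp"] + V["am:hp", "am:hp"] + 2 * V["hp", "am:hp"])
c(estimate = est, std.error = se)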

Related

R Fixest Package: IV Estimation Without Further Exogenous Variables

I intend to run instrumental variable regressions with fixed effects using the fixest package's feols function. However, I am having issues with the syntax specifying an estimation without further exogenous controls.
Consider the following example:
# Load package
require("fixest")
# Load data
df <- airquality
I would like to do something like the following, i.e. explain the outcome via the instrumented endogenous variable and fixed effects:
feols(Temp | Month + Day | Ozone ~ Wind, df)
This, however, produces an error:
The dependent variable is a constant. Estimation cannot be done.
It only works, when I add further exogenous covariates (as in the documentation's examples):
feols(Temp ~ Solar.R | Month + Day | Ozone ~ Wind, df)
How do I fix this? How do I run the estimation without further controls, such as Solar.R in this case?
Note: I post this on Stack Overflow rather than Cross Validated because the question relates to a coding syntax issue, and not to the econometric techniques underlying the estimations.
Actually, there seems to be a misunderstanding of how to write the formula.
The syntax is: Dep_var ~ Exo_vars | Fixed-effects | Endo_vars ~ Instruments.
The parts Fixed-effects and Endo_vars ~ Instruments are optional. On the other hand, the Exo_vars part must always be there, even if it contains only the intercept.
Knowing that, the following works:
base = iris
names(base) = c("y", "x1", "x_endo", "x_inst", "fe")
feols(y ~ 1 | x_endo ~ x_inst, base)
#> TSLS estimation, Dep. Var.: y, Endo.: x_endo, Instr.: x_inst
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Standard-errors: Standard
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.345900 0.08096 53.679 < 2.2e-16 ***
#> fit_x_endo 0.398477 0.01964 20.289 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.404769 Adj. R2: 0.757834
#> F-test (1st stage): stat = 1,882.45 , p < 2.2e-16 , on 1 and 148 DoF.
#> Wu-Hausman: stat = 3.9663, p = 0.048272, on 1 and 147 DoF.
# Same with fixed-effect
feols(y ~ 1 | fe | x_endo ~ x_inst, base)
#> TSLS estimation, Dep. Var.: y, Endo.: x_endo, Instr.: x_inst
#> Second stage: Dep. Var.: y
#> Observations: 150
#> Fixed-effects: fe: 3
#> Standard-errors: Clustered (fe)
#> Estimate Std. Error t value Pr(>|t|)
#> fit_x_endo 0.900061 0.117798 7.6407 0.016701 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.333489 Adj. R2: 0.833363
#> Within R2: 0.57177
#> F-test (1st stage): stat = 44.77 , p = 4.409e-10, on 1 and 146 DoF.
#> Wu-Hausman: stat = 0.001472, p = 0.969447 , on 1 and 145 DoF.
Getting back to the original example:
feols(Temp | Month + Day | Ozone ~ Wind, df) means that the dependent variable will be Temp | Month + Day | Ozone, with the pipes here interpreted as logical ORs, which evaluates to TRUE (i.e. 1) for every observation. Hence the error message about a constant dependent variable.
To fix it and obtain an appropriate behavior, use feols(Temp ~ 1 | Month + Day | Ozone ~ Wind, df).
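A quick check of both points on the airquality data from the question (a sketch; fixest will report the NA rows it drops):
library(fixest)
df <- airquality
feols(Temp ~ 1 | Month + Day | Ozone ~ Wind, df)  # now estimates fine
# The old left-hand side parses as a logical OR, constant for every row:
head(with(df, Temp | Month + Day | Ozone))
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE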

Contrast between variables in glmmTMB

As a reproducible example, let's use the following nonsense model:
> library(glmmTMB)
> summary(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars))
Family: gaussian ( identity )
Formula: am ~ disp + hp + (1 | carb)
Data: mtcars
AIC BIC logLik deviance df.resid
34.1 41.5 -12.1 24.1 27
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
carb (Intercept) 2.011e-11 4.485e-06
Residual 1.244e-01 3.528e-01
Number of obs: 32, groups: carb, 6
Dispersion estimate for gaussian family (sigma^2): 0.124
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.7559286 0.1502385 5.032 4.87e-07 ***
disp -0.0042892 0.0008355 -5.134 2.84e-07 ***
hp 0.0043626 0.0015103 2.889 0.00387 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Actually, my real model family is nbinom2. I want to make a contrast test between disp and hp. So, I try:
> glht(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars), linfct = matrix(c(0,1,-1)))
Error in glht.matrix(glmmTMB(am ~ disp + hp + (1 | carb), data = mtcars), :
‘ncol(linfct)’ is not equal to ‘length(coef(model))’
How can I avoid this error?
Thank you!
The problem is actually fairly simple: linfct needs to be a matrix with the number of columns equal to the number of parameters. You specified matrix(c(0,1,-1)) without specifying numbers of rows or columns, so R made a column matrix by default. Adding nrow=1 seems to work.
library(glmmTMB)
library(multcomp)
m1<- glmmTMB(am ~ disp + hp + (1|carb), data = mtcars)
# glht() cannot extract coefficients from a glmmTMB fit on its own, so we
# register a modelparm method that pulls the conditional ("cond") fixed
# effects and their covariance matrix for multcomp to use
modelparm.glmmTMB <- function(model,
                              coef. = function(x) fixef(x)[[component]],
                              vcov. = function(x) vcov(x)[[component]],
                              df = NULL, component = "cond", ...) {
  multcomp:::modelparm.default(model, coef. = coef., vcov. = vcov.,
                               df = df, ...)
}
glht(m1, linfct = matrix(c(0, 1, -1), nrow = 1))
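For intuition, here is a by-hand sketch of what that single-row contrast tests, namely H0: beta_disp − beta_hp = 0, as a Wald z test like glht's (assuming the glmmTMB accessors shown above):
b <- fixef(m1)$cond
V <- vcov(m1)$cond
K <- matrix(c(0, 1, -1), nrow = 1)
z <- drop(K %*% b) / sqrt(drop(K %*% V %*% t(K)))
2 * pnorm(-abs(z))  # two-sided p-value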

How to fix coefficients in R for categorical variables

I would like to know how to put offsets (or fixed coefficients) on a categorical variable in a model, with a different value for each level, and see how that affects the other variables. I'm not sure how exactly to code that.
library(tidyverse)
mtcars <- as_tibble(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
model1 <- glm(mpg ~ cyl + hp, data = mtcars)
summary(model1)
This gives the following:
Call:
glm(formula = mpg ~ cyl + hp, data = mtcars)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.818 -1.959 0.080 1.627 6.812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.65012 1.58779 18.044 < 2e-16 ***
cyl6 -5.96766 1.63928 -3.640 0.00109 **
cyl8 -8.52085 2.32607 -3.663 0.00103 **
hp -0.02404 0.01541 -1.560 0.12995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 9.898847)
Null deviance: 1126.05 on 31 degrees of freedom
Residual deviance: 277.17 on 28 degrees of freedom
AIC: 169.9
Number of Fisher Scoring iterations: 2
I would like to set the cylinders to different offsets, say 6 cylinders to -4 and 8 cylinders to -9, so I can see what that does to horsepower. I tried this in the code below but get an error, so I'm not sure of the correct way to fix one level of a categorical variable, much less several.
model2 <- glm(mpg ~ offset(I(-4 * cyl[6]))+ hp, data = mtcars)
Would anyone help me figure out how to correctly do this?
In a fresh R session:
glm(mpg ~ offset(I(-4 * (cyl == 6) + -9 * (cyl == 8))) + hp, data = mtcars)
# Call: glm(formula = mpg ~ offset(I(-4 * (cyl == 6) + -9 * (cyl == 8))) +
# hp, data = mtcars)
#
# Coefficients:
# (Intercept) hp
# 27.66881 -0.01885
#
# Degrees of Freedom: 31 Total (i.e. Null); 30 Residual
# Null Deviance: 353.8
# Residual Deviance: 302 AIC: 168.6
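If you have more levels (or many such models), one alternative, not from the original answer, is to precompute the offset as its own column; a sketch of the same fit:
# Hypothetical helper column: map each cyl level to its fixed coefficient
off_by_cyl <- c("4" = 0, "6" = -4, "8" = -9)
mtcars$cyl_off <- off_by_cyl[as.character(mtcars$cyl)]
glm(mpg ~ hp + offset(cyl_off), data = mtcars)  # same model as above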

lm: use product of two variables as a single variable

I am running the following piece of code:
lm(ath ~ HAPP + IQ2 + OPEN2 + INCOME*EXPEC,data=data)
Which, of course, led me to the output:
Standardized weighted residuals 2:
Min 1Q Median 3Q Max
-3.2644 -0.5461 -0.0223 0.4158 3.2217
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.730e+00 3.141e+00 1.824 0.068112 .
HAPP -7.765e-02 8.958e-02 -0.867 0.386014
IQ2 5.080e-04 7.453e-05 6.816 9.38e-12 ***
OPEN2 -5.038e-06 5.114e-06 -0.985 0.324640
INCOME -1.837e-02 1.211e-01 -0.152 0.879395
EXPEC -3.336e-01 1.161e-01 -2.873 0.004067 **
INCOME:EXPEC 2.645e-03 7.597e-04 3.481 0.000499 ***
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi) 9.489 1.363 6.96 3.41e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 222.5 on 8 Df
Pseudo R-squared: 0.6938
Number of iterations: 36 (BFGS) + 4 (Fisher scoring)
I need to drop the INCOME and EXPEC lines (with Estimate, Std. Error, z value and Pr(>|z|)) from the regression output in a really elegant way (I need to run about a million models, so I can't do it by hand one by one). Please note that those variables (INCOME and EXPEC) were not included in the original set of individual variables. That is, ONLY the requested variables (and the demanded interactions, of course) should be printed.
Any piece of advice?
Thanks!!! :D
You can use the I() ("AsIs") function. See the example below:
fit <- lm(Sepal.Length ~ Sepal.Width + I(Petal.Length * Petal.Width) , data = iris)
fit
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + I(Petal.Length * Petal.Width),
# data = iris)
#
# Coefficients:
# (Intercept) Sepal.Width
# 4.1072 0.2688
# I(Petal.Length * Petal.Width)
# 0.1578
library(broom)
tidy(fit)
# term estimate std.error statistic p.value
# 1 (Intercept) 4.1072163 0.266529393 15.409994 1.702125e-32
# 2 Sepal.Width 0.2687704 0.081280587 3.306698 1.186597e-03
# 3 I(Petal.Length * Petal.Width) 0.1578160 0.007517941 20.991921 4.426899e-46
If you only need part of the coefficients you can just use the coef function from base R and subset the indices you like. For example:
a1 <- lm(Sepal.Length ~ Sepal.Width + I(Petal.Length * Petal.Width) , data = iris)
coefficients(a1)[1:2]
(Intercept) Sepal.Width
4.1072163 0.2687704
If you need the formula call as well you could do a1$call
a1$call
lm(formula = Sepal.Length ~ Sepal.Width + I(Petal.Length * Petal.Width),
data = iris)
Or if you need any other argument just take a look at str(a1) or str(summary(a1))
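One more note, not from the original answer: for numeric variables the : formula operator also includes only the product term, without the main effects, so it is an alternative to I():
# a:b for two numeric predictors fits the same single regressor as I(a * b)
fit2 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length:Petal.Width, data = iris)
coef(fit2)  # same estimates as the I() fit above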

summary dataframe from several multiple regression outputs

I am doing multiple OLS regressions. I have used the following lm function:
GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv", header=TRUE, sep=",", dec=".")
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF+SMB+HML+WML, data=GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)
summary(ModelGroupNetReturnsStockPickers)
Which gives me the summary output of:
Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)
Residuals:
Min 1Q Median 3Q Max
-0.029698 -0.005069 -0.000328 0.004546 0.041948
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.655e-05 5.981e-04 0.078 0.938
Mkt.RF -1.713e-03 1.202e-02 -0.142 0.887
SMB 3.006e-02 2.545e-02 1.181 0.239
HML 1.970e-02 2.350e-02 0.838 0.403
WML 1.107e-02 1.444e-02 0.766 0.444
Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared: 0.01033, Adjusted R-squared: -0.005445
F-statistic: 0.6548 on 4 and 251 DF, p-value: 0.624
This is perfect. However, I am doing a total of 10 multiple OLS regressions, and I wish to create my own summary output, in a data frame, where I extract the intercept estimate, the t-value, and the p-value for all 10 analyses individually. Hence it would be a 3×10 data frame, where the column names would be Model1, Model2, ..., Model10, and the row names: Value, t-value and p-value.
I appreciate any help.
There are a few packages that do this (stargazer and texreg), as well as standalone code for outreg.
In any case, if you are only interested in the intercept here is one approach:
# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))
# RESULT:
# model1 model2 model3
# Estimate 18.598727772 7.724604e+01 -64.342078929
# Std. Error 6.747904163 9.067507e+00 23.054724347
# t value 2.756222869 8.518995e+00 -2.790841389
# Pr(>|t|) 0.006856021 1.052118e-13 0.006226638
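If you want exactly the 3×10 layout from the question (estimate, t value, p-value only), subset the columns of the coefficient matrix before binding; a small sketch reusing the fits list above:
do.call(cbind, lapply(fits, function(z)
  summary(z)$coefficients["(Intercept)", c("Estimate", "t value", "Pr(>|t|)")]))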
Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models into rows and the different statistics into columns, and I understand that you would prefer the opposite, but you can work around that afterwards if it is really necessary.
To give you an example, the function tidy() converts a model output into a dataframe.
model <- lm(mpg ~ cyl, data=mtcars)
summary(model)
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
And
library(broom)
tidy(model)
yields the following data frame:
term estimate std.error statistic p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2 cyl -2.87579 0.3224089 -8.919699 6.112687e-10
Look at ?tidy.lm to see more options, for instance for confidence intervals, etc.
To combine the output of your ten models into one dataframe, you could use
library(dplyr)
bind_rows(one, two, three, ... , .id="models")
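For example, with the models stored in a named list (a hypothetical fits list, in the spirit of the earlier answer), you can tidy each one and stack the rows:
library(broom)
library(dplyr)
fits <- list(one = lm(mpg ~ cyl, data = mtcars),
             two = lm(mpg ~ cyl + hp, data = mtcars))
bind_rows(lapply(fits, tidy), .id = "model")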
Or, if your different models come from regressions on the same dataframe, you can combine them with dplyr:
models <- mtcars %>% group_by(gear) %>% do(data.frame(tidy(lm(mpg~cyl, data=.), conf.int=T)))
Source: local data frame [6 x 8]
Groups: gear
gear term estimate std.error statistic p.value conf.low conf.high
1 3 (Intercept) 29.783784 4.5468925 6.550360 1.852532e-05 19.960820 39.6067478
2 3 cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3 4 (Intercept) 41.275000 5.9927925 6.887440 4.259099e-05 27.922226 54.6277739
4 4 cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5 5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6 5 cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
