How to report a standardized model in the stargazer package?

I fitted the following simple regression model and used stargazer to output a table that shows the standardized and non-standardized regression coefficients, standard errors and p-values side by side.
library(lm.beta)
mod <- lm(mpg ~ cyl + disp, mtcars)
summary(mod)
mod_std <- lm.beta(mod)
summary(mod_std)$coe[, 2]
library(stargazer)
stargazer(mod, mod_std,
coef = list(mod$coefficients,
mod_std$standardized.coefficients),
type='text')
And this is the output:
==========================================================
Dependent variable:
----------------------------
mpg
(1) (2)
----------------------------------------------------------
cyl -1.587** -0.470
(0.712) (0.712)
disp -0.021** -0.423***
(0.010) (0.010)
Constant 34.661*** 0.000
(2.547) (2.547)
----------------------------------------------------------
Observations 32 32
R2 0.760 0.760
Adjusted R2 0.743 0.743
Residual Std. Error (df = 29) 3.055 3.055
F Statistic (df = 2; 29) 45.808*** 45.808***
==========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
As can be observed here, the standard errors that are reported by the stargazer (for the coefficients) are the same for the standardized model and the non-standardized one. This is not correct as standard errors should change with the standardization of coefficients. Is there a way to report the correct standard errors? Or if not, simply remove them?
Lastly, what also changes from the standardized to the non-standardized models are the significance levels (of the coefficients). These should not change as they are not affected by standardization. Is there a way to prevent stargazer from modifying them? p or p.auto arguments maybe would work but I have no idea how to use them.
Reference for lm.beta: Stefan Behrendt (2014). lm.beta: Add Standardized Regression Coefficients to lm-Objects. R package version 1.5-1. https://CRAN.R-project.org/package=lm.beta

You would need to enter the additional values by hand, as a list with one element per model, just as you started to do with the coefficients: the standardized standard errors via se=, the p-values (for the stars) via p=, and so on, as well as the GOFs (R2, adjusted R2, ...). See the options in the help page, ?stargazer.
However, lm.beta appears to add nothing but the standardized coefficients; the standardized standard errors are not calculated, so there is nothing to report for them yet.
Standardized standard errors are calculated using the formula SE*beta_star/beta.
So you could wrap a function, and calculate them, in order to fill them in the stargazer table:
std_se <- \(x) x[, 'Std. Error']*x[, 'Standardized']/x[, 'Estimate']
std_se(summary(mod_std)$coefficients)
# (Intercept) cyl disp
# 0.0000000 0.2109356 0.2109356
However, it is probably easier to fit an actual standardized model
mod_std2 <- lm(mpg ~ cyl + disp, as.data.frame(scale(mtcars)))
summary(mod_std2) |> getElement('coefficients')
# (values rounded to three decimals; the intercept is zero up to floating-point error)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   -0.000      0.090  -0.000    1.000
# cyl           -0.470      0.211  -2.230    0.034
# disp          -0.423      0.211  -2.007    0.054
and put that in:
stargazer(mod, mod_std2, type='text')
# ==========================================================
# Dependent variable:
# ----------------------------
# mpg
# (1) (2)
# ----------------------------------------------------------
# cyl -1.587** -0.470**
# (0.712) (0.211)
# disp -0.021* -0.423*
# (0.010) (0.211)
# Constant 34.661*** -0.000
# (2.547) (0.090)
# ----------------------------------------------------------
# Observations 32 32
# R2 0.760 0.760
# Adjusted R2 0.743 0.743
# Residual Std. Error (df = 29) 3.055 0.507
# F Statistic (df = 2; 29) 45.808*** 45.808***
# ==========================================================
# Note: *p<0.1; **p<0.05; ***p<0.01
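As a quick sanity check on the two routes above (the SE*beta_star/beta formula and refitting on scaled data), here is a minimal base-R sketch. The names mod_z, beta_star and se_star are only illustrative, and the standardized slope is computed by hand as beta * sd(x) / sd(y) rather than taken from lm.beta.

```r
# Raw model and a model fitted on fully standardized data (base R only)
mod   <- lm(mpg ~ cyl + disp, data = mtcars)
mod_z <- lm(mpg ~ cyl + disp, data = as.data.frame(scale(mtcars)))

raw <- summary(mod)$coefficients
z   <- summary(mod_z)$coefficients
vars <- c("cyl", "disp")

# Standardized slope: beta * sd(x) / sd(y)
sds       <- sapply(mtcars[vars], sd)
beta_star <- raw[vars, "Estimate"] * sds / sd(mtcars$mpg)

# Standardized standard error via SE * beta_star / beta
se_star <- raw[vars, "Std. Error"] * beta_star / raw[vars, "Estimate"]

# Both routes should agree, and the t statistics should not move at all
all.equal(beta_star, z[vars, "Estimate"])
all.equal(se_star,   z[vars, "Std. Error"])
all.equal(raw[vars, "t value"], z[vars, "t value"])
```

If all three comparisons return TRUE, the rescaled model and the hand formula are interchangeable for the slopes, and the p-values of the slopes are the same in both models.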

I managed to make the following script:
stargazer(mod_std,
coef=list(mod_std$standardized.coefficients),
se=list(summary(mod_std)$coe[, 2]),
p=list(summary(mod)$coe[, 4]),
type='text',
omit.stat = c("all"),
keep = c("cyl","disp"),
report = c('vcp'), notes.append = FALSE,
notes = "Coefficients are standardized")
With the following output:
===================================
Dependent variable:
-----------------------------
mpg
-----------------------------------
cyl -0.470
p = 0.034
disp -0.423
p = 0.055
===================================
===================================
Note: Coefficients are standardized
Here, the standardized coefficients are reported, together with the p-values from the original model (which should be unchanged across standardization).
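On why the stars moved in the question's table: as far as I can tell from ?stargazer, when coef= or se= is supplied and p= is not, the default p.auto = TRUE makes stargazer recompute the p-values from the supplied coefficients and standard errors using the standard normal distribution. A sketch of supplying everything explicitly follows. Passing the raw model twice and the label strings are my own choices for illustration, not something from the answers above.

```r
library(stargazer)

mod   <- lm(mpg ~ cyl + disp, data = mtcars)
mod_z <- lm(mpg ~ cyl + disp, data = as.data.frame(scale(mtcars)))

cf_raw <- summary(mod)$coefficients
cf_z   <- summary(mod_z)$coefficients

# Column 1 keeps the defaults (NULL); column 2 gets standardized
# coefficients and standard errors, plus the p-values of the raw model.
stargazer(mod, mod,
          coef = list(NULL, cf_z[, "Estimate"]),
          se   = list(NULL, cf_z[, "Std. Error"]),
          p    = list(NULL, cf_raw[, "Pr(>|t|)"]),
          p.auto = FALSE,                 # do not recompute p from coef/se
          keep = c("cyl", "disp"),        # hide the intercept row
          column.labels = c("Raw", "Standardized"),
          type = "text")
```

Because the vectors are named after the coefficients, stargazer can match them to the right rows. The fit statistics at the bottom still come from the raw model in both columns.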

Related

What is ref in fixest's feols used for fixed-effect estimation?

I am going through an R example of using interaction terms in a fixed effect model. The example can be found here.
The example uses the fixest package and uses the syntax var::fe(ref). I don't understand what ref is and what it does here. How do I select the value for ref?
I have come across this explanation on Google: "You can interact a numeric variable with a "factor-like" variable by using i(factor_var, continuous_var, ref), where continuous_var will be interacted with each value of factor_var and the argument ref is a value of factor_var taken as a reference (optional)." - I do not understand the role of this "reference" here.
Any insight will be highly appreciated.
When you estimate a model with a categorical predictor entered as a series of dummy variables or, equivalently, a fixed-effects model, you must always omit one of the dummies to avoid perfect collinearity. The dummy you omit is the “reference category”.
The choice of reference category is arbitrary: it does not change the predictions of the model, but it does affect how you interpret the coefficients of the remaining dummy variables, because each one is measured relative to the omitted category. This is well known and covered in most introductory regression textbooks.
In fixest, you can use the ref argument of the i() function to determine which category will be omitted. Below, you will see that the drat coefficient stays exactly the same, but that the other coefficients change because the reference category changes:
library(fixest)
library(modelsummary)
mod1 <- lm(mpg ~ drat + factor(cyl) * hp, data = mtcars)
mod2 <- feols(mpg ~ drat + hp * i(cyl), data = mtcars)
#> The variable 'hp:cyl::8' has been removed because of collinearity (see $collin.var).
mod3 <- feols(mpg ~ drat + hp * i(cyl, ref = 8), data = mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models, fmt = 6)
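|                   | Model 1    | Model 2    | Model 3    |
|-------------------|------------|------------|------------|
| (Intercept)       | 26.771696  | 26.771696  | 13.796313  |
|                   | (8.719507) | (8.719507) | (5.057123) |
| drat              | 1.939525   | 1.939525   | 1.939525   |
|                   | (1.646230) | (1.646230) | (1.646230) |
| factor(cyl)6      | -12.041741 |            |            |
|                   | (7.883606) |            |            |
| factor(cyl)8      | -12.975383 |            |            |
|                   | (6.689497) |            |            |
| hp                | -0.096854  | -0.023706  | -0.023706  |
|                   | (0.047378) | (0.018221) | (0.018221) |
| factor(cyl)6 × hp | 0.080976   |            |            |
|                   | (0.071010) |            |            |
| factor(cyl)8 × hp | 0.073149   |            |            |
|                   | (0.052855) |            |            |
| cyl = 6           |            | -12.041741 | 0.933642   |
|                   |            | (7.883606) | (7.341465) |
| cyl = 8           |            | -12.975383 |            |
|                   |            | (6.689497) |            |
| hp × cyl = 4      |            | -0.073149  | -0.073149  |
|                   |            | (0.052855) | (0.052855) |
| hp × cyl = 6      |            | 0.007828   | 0.007828   |
|                   |            | (0.053174) | (0.053174) |
| cyl = 4           |            |            | 12.975383  |
|                   |            |            | (6.689497) |
| Num.Obs.          | 32         | 32         | 32         |
| R2                | 0.799      | 0.799      | 0.799      |
| R2 Adj.           | 0.751      | 0.751      | 0.751      |
| AIC               | 169.4      | 169.4      | 169.4      |
| BIC               | 181.1      | 181.1      | 181.1      |
| Log.Lik.          | -76.677    |            |            |
| F                 | 16.601     |            |            |
| RMSE              | 2.66       | 2.66       | 2.66       |
| Std.Errors        |            | IID        | IID        |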
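The same point can be checked without fixest. The sketch below is base R only and uses relevel() to switch the reference category from 4 to 8 cylinders; the object names (d4, d8, m4, m8) are made up for the illustration.

```r
# Same model, two different reference categories for cyl
d4 <- transform(mtcars, cyl = factor(cyl))                 # reference: 4
d8 <- transform(d4, cyl = relevel(cyl, ref = "8"))         # reference: 8

m4 <- lm(mpg ~ drat + cyl * hp, data = d4)
m8 <- lm(mpg ~ drat + cyl * hp, data = d8)

# Predictions are identical whichever category is omitted
all.equal(fitted(m4), fitted(m8))

# drat does not interact with cyl, so its coefficient is unchanged
c(ref4 = unname(coef(m4)["drat"]), ref8 = unname(coef(m8)["drat"]))

# The new intercept is the old intercept plus the old cyl8 dummy
all.equal(unname(coef(m8)["(Intercept)"]),
          unname(coef(m4)["(Intercept)"] + coef(m4)["cyl8"]))
```

So ref only decides which group the intercept and the hp slope describe; every other cyl coefficient is then read as a difference from that group.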

How to have both lm and fixest model with different vcov standard errors in modelsummary package?

I'd like to create an LM with HC3 corrected standard errors and a fixest model with cluster robust standard errors in the same table.
see my MRE below:
df <- mtcars
models <- list()
models[[1]] <- lm(cyl~disp, data = df)
models[[2]] <- feols(cyl~disp|as.factor(gear), data = df)
library(modelsummary)
# this works
modelsummary::modelsummary(models)
# but these do not
modelsummary::modelsummary(models, vcov = c("HC3", "cluster"))
modelsummary::modelsummary(models, vcov = c(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(HC3, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, cluster))
modelsummary::modelsummary(models, vcov = list(vcovHC, vcovHC))
modelsummary::modelsummary(models, vcov = c(vcovHC, vcovHC))
Okay--I figured out a hack, but still leaving question open in case someone finds out a more slick solution.
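library(fixest)       # feols()
library(lmtest)       # coeftest()
library(sandwich)     # vcovHC()
library(modelsummary)
df <- mtcars
models <- list()
fit <- lm(cyl ~ disp, data = df)
# lm with HC3 standard errors
models[[1]] <- coeftest(fit, vcov. = vcovHC(fit, type = "HC3"))
# feols with cluster-robust standard errors
models[[2]] <- summary(feols(cyl ~ disp | as.factor(gear), data = df), "cluster")
# this works
modelsummary::modelsummary(models)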
By default fixest automatically computes cluster-robust standard errors when you include fixed effects. If you set vcov=NULL, modelsummary will return the default standard errors, which will then be cluster-robust.
Alternatively, you can set vcov=~gear to compute the standard errors using the sandwich::vcovCL function under the hood.
Using version 0.10.0 of modelsummary and 0.10.4 of fixest, we can do:
library(modelsummary)
library(fixest)
models <- list(
lm(cyl~disp, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars),
feols(cyl ~ disp | gear, data = mtcars))
modelsummary(models,
fmt = 6,
vcov = list("HC3", NULL, ~gear))
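|             | Model 1    | Model 2    | Model 3    |
|-------------|------------|------------|------------|
| (Intercept) | 3.188568   |            |            |
|             | (0.289130) |            |            |
| disp        | 0.012998   | 0.012061   | 0.012061   |
|             | (0.001310) | (0.002803) | (0.002803) |
| Num.Obs.    | 32         | 32         | 32         |
| R2          | 0.814      | 0.819      | 0.819      |
| R2 Adj.     | 0.807      | 0.800      | 0.800      |
| R2 Within   |            | 0.614      | 0.614      |
| R2 Pseudo   |            |            |            |
| AIC         | 79.1       | 80.2       | 80.2       |
| BIC         | 83.5       | 86.1       | 86.1       |
| Log.Lik.    | -36.573    | -36.107    | -36.107    |
| F           | 98.503     |            |            |
| Std.Errors  | HC3        | by: gear   | by: gear   |
| FE: gear    |            | X          | X          |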
Notice that the results are the same in the 2nd and 3rd column, and also equal to the plain fixest summary:
summary(models[[2]])
#> OLS estimation, Dep. Var.: cyl
#> Observations: 32
#> Fixed-effects: gear: 3
#> Standard-errors: Clustered (gear)
#> Estimate Std. Error t value Pr(>|t|)
#> disp 0.012061 0.002803 4.30299 0.049993 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.747808 Adj. R2: 0.799623
#> Within R2: 0.614334
In some past versions of modelsummary there were issues with labelling of the standard errors at the bottom of the table. I will soon be working on a more robust system to make sure the label matches the standard errors. In most cases, modelsummary calculates the robust standard errors itself, so it has full control over labelling and computation. Packages like fixest and estimatr make things a bit more tricky because they sometimes hold several SEs, and because the default is not always “classical”.
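For readers wondering what the "HC3" entry does for the lm column: it is the heteroskedasticity-consistent sandwich estimator in which each squared residual is divided by (1 - h)^2, with h the hat value. A minimal base-R sketch, written by hand rather than through sandwich::vcovHC, with illustrative object names:

```r
fit <- lm(cyl ~ disp, data = mtcars)

X <- model.matrix(fit)   # n x k design matrix
e <- residuals(fit)      # OLS residuals
h <- hatvalues(fit)      # leverage of each observation

bread <- solve(crossprod(X))            # (X'X)^-1
meat  <- crossprod(X * (e / (1 - h)))   # X' diag(e^2 / (1 - h)^2) X
vc_hc3 <- bread %*% meat %*% bread

se_hc3       <- sqrt(diag(vc_hc3))
se_classical <- sqrt(diag(vcov(fit)))

# Compare with the (Intercept) and disp standard errors of Model 1
round(cbind(classical = se_classical, HC3 = se_hc3), 6)
```

The HC3 column should line up with the standard errors shown for Model 1 in the table, while the point estimates are untouched.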

How to correctly input custom standard errors with stargazer

I'm doing some quantile regressions, and want to format the resulting tables with stargazer. However, stargazer doesn't have an option to calculate bootstrapped standard errors for clustered data, which is what I need. So I will need to input the standard errors manually, using stargazer's se = argument, but I'm not sure exactly how it works.
model <- lm(mpg ~ wt, data = mtcars)
stargazer(model, type = 'text', se = list(1, 1))
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
wt -5.344
Constant 37.285***
(1.000)
-----------------------------------------------
Observations 32
R2 0.753
Adjusted R2 0.745
Residual Std. Error 3.046 (df = 30)
F Statistic 91.375*** (df = 1; 30)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
In the code and output above, there are two coefficients, one for the wt variables and one for the intercept. In the se = argument, I entered two arbitrary standard errors, one for each coefficient. However, the output shows only the standard error for the intercept, but not for the other variable.
Any idea what is happening?
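A likely explanation, going by ?stargazer: se = expects a list with one numeric vector per model, not one list element per coefficient. list(1, 1) is then read as "two models, one standard error each", which appears to be why only the first coefficient of the model (the intercept) received a value. A minimal sketch with a single named vector follows; the name se_custom is just for illustration.

```r
library(stargazer)

model <- lm(mpg ~ wt, data = mtcars)

# One vector for the one model, named after the coefficients it replaces
se_custom <- setNames(c(1, 1), names(coef(model)))
se_custom

stargazer(model, type = "text", se = list(se_custom))
```

With several models, the list simply gets one such vector per model, in the same order as the models; a NULL element keeps that model's default standard errors.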

Omit output of a factor variable in stargazer?

If I'm running a fixed effects model in r using lm and the factor command, how can I suppress the factor variable coefficients in a stargazer model?
i.e. my model is:
m1<-lm(GDP~pop_growth + factor(city))
and I want to report findings with only an intercept and coefficient on pop_growth, not coefficients on every dummy variable for cities.
EDIT: Issue was, as it turns out, with variable name encoding. omit="city" works.
As the author said there is an omit option:
library(stargazer)
model<-lm(mpg~disp+factor(cyl), data=mtcars)
stargazer(model, type="text", omit="cyl")
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
disp -0.027**
(0.011)
Constant 29.535***
(1.427)
-----------------------------------------------
Observations 32
R2 0.784
Adjusted R2 0.760
Residual Std. Error 2.950 (df = 28)
F Statistic 33.807*** (df = 3; 28)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
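One caveat worth knowing: per ?stargazer, omit is treated as a vector of regular expressions, so omit="cyl" drops every coefficient whose name contains "cyl". A slightly more targeted sketch is below; the label text is arbitrary and chosen only for the example.

```r
library(stargazer)

model <- lm(mpg ~ disp + factor(cyl), data = mtcars)

# Check which coefficient names the pattern would match before using it
pattern <- "factor\\(cyl\\)"
dropped <- grep(pattern, names(coef(model)), value = TRUE)
dropped

# omit.labels adds a Yes/No row saying the dummies were included
stargazer(model, type = "text",
          omit = pattern,
          omit.labels = "Cylinder dummies")
```

Escaping the parentheses keeps the pattern from accidentally matching another variable that merely has "cyl" in its name.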

Appending statistics to coeftest output to include in stargazer tables

I have a glm model for which I use coeftest from the lmtest package to estimate robust standard errors. When I use stargazer to produce regression tables I get the correct results but without the number of observations and other relevant statistics like the null deviance and the model deviance.
Here's an example:
library(lmtest)
library(stargazer)
m1 <- glm(am ~ mpg + cyl + disp, mtcars, family = binomial)
# Simple binomial regression
# For whatever reason, let's say I want to use coeftest to estimate something
m <- coeftest(m1)
stargazer(m, type = "text", single.row = T) # This is fine, but I want to also include the number of observations
# the null deviance and the model deviance.
I'm specifically interested in the number of observations, the null deviance and the residual deviance.
I thought that If I replaced the old coefficient matrix with the new one, I'd get the correct estimates with the correct statistics and stargazer would recognize the model and print it correctly. For that, I've tried substituting the coefficients, SE's, z statistic and p values from the coeftest model in the m1 model but some of these statistics are computed with summary.glm and are not included in the m1 output. I could easily substitute these coefficients in the summary output but stargazer doesn't recognize summary type class. I've tried adding attributes to the m object with the specific statistics but they don't show up in the output and stargazer doesn't recognize it.
Note: I know stargazer can compute robust SE's but I'm also doing other computations, so the example needs to include the coeftest output.
Any help is appreciated.
It may be easiest to pass the original models into stargazer, and then use coeftest to pass in custom values for standard errors (se = ), confidence intervals (ci.custom = ) and/or p values (p = ). See below for how to easily handle a list containing multiple models.
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(stargazer))
mdls <- list(
m1 = glm(am ~ mpg, mtcars, family = poisson),
m2 = glm(am ~ mpg + cyl + disp, mtcars, family = poisson)
)
# Robust (sandwich) standard errors for a single model
se_robust <- function(x)
  coeftest(x, vcov. = sandwich::sandwich)[, "Std. Error"]
# Original SE
stargazer(mdls, type = "text", single.row = TRUE, report = "vcsp")
#>
#> ===============================================
#>                      Dependent variable:
#>                 -----------------------------
#>                              am
#>                      (1)            (2)
#> -----------------------------------------------
#> mpg             0.106 (0.042)  0.028 (0.083)
#>                   p = 0.012      p = 0.742
#> cyl                            0.435 (0.496)
#>                                  p = 0.381
#> disp                           -0.014 (0.009)
#>                                  p = 0.151
#> Constant        -3.247 (1.064) -1.488 (3.411)
#>                   p = 0.003      p = 0.663
#> -----------------------------------------------
#> Observations          32             32
#> Log Likelihood     -21.647        -20.299
#> Akaike Inf. Crit.   47.293         48.598
#> ===============================================
#> Note:               *p<0.1; **p<0.05; ***p<0.01
# With robust SE
stargazer(
mdls, type = "text", single.row = TRUE, report = "vcsp",
se = lapply(mdls, se_robust))
#>
#> ===============================================
#>                      Dependent variable:
#>                 -----------------------------
#>                              am
#>                      (1)            (2)
#> -----------------------------------------------
#> mpg             0.106 (0.025)  0.028 (0.047)
#>                  p = 0.00002     p = 0.560
#> cyl                            0.435 (0.292)
#>                                  p = 0.137
#> disp                           -0.014 (0.007)
#>                                  p = 0.042
#> Constant        -3.247 (0.737) -1.488 (2.162)
#>                  p = 0.00002     p = 0.492
#> -----------------------------------------------
#> Observations          32             32
#> Log Likelihood     -21.647        -20.299
#> Akaike Inf. Crit.   47.293         48.598
#> ===============================================
#> Note:               *p<0.1; **p<0.05; ***p<0.01
Created on 2020-11-09 by the reprex package (v0.3.0)
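For the two deviances the question asks about, one option is stargazer's add.lines argument, which appends custom rows to the table. The sketch below goes back to the question's binomial model; the row labels and the three-decimal formatting are my own choices.

```r
library(lmtest)
library(sandwich)
library(stargazer)

m1 <- glm(am ~ mpg + cyl + disp, data = mtcars, family = binomial)

# coeftest supplies the custom standard errors and p-values
ct <- coeftest(m1, vcov. = sandwich)

# Pass the glm itself so stargazer keeps the observation count,
# then add the deviances as extra rows
stargazer(m1, type = "text", single.row = TRUE,
          se = list(ct[, "Std. Error"]),
          p  = list(ct[, "Pr(>|z|)"]),
          add.lines = list(
            c("Null deviance",     sprintf("%.3f", m1$null.deviance)),
            c("Residual deviance", sprintf("%.3f", m1$deviance))
          ))
```

Each vector in add.lines is one row: the first element is the label and the remaining elements fill the model columns, so with several models you would add one value per model.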
If I understand you correctly, you could try the following:
First, assign your stargazer output to an object, like this:
stargazer.values <- stargazer(m, type = "text", single.row = TRUE)
Then check the code of the stargazer function with body(stargazer).
Hopefully you can find objects for values that stargazer uses but does not report. You can then access them like this (if there is, for example, an object named "null.deviance"):
stargazer.values$null.deviance
Or, if it is part of another data frame, say df, it could go like this:
stargazer.values$df$null.deviance
Maybe code like this could be helpful:
print(null.deviance <- stargazer.values$null.deviance)
Hope this helps!
