Reference category in regression table in R

I've got results from a linear regression model with a factor variable in R that I would like to pretty up and then output into LaTeX. Ideally, the factor variable would be presented in the table via a row that gives the name of the variable and the reference category but is otherwise blank, followed by indented rows below it giving the levels of the factor together with the corresponding estimates.
I've long used the stargazer package to get regression results from R into LaTeX but see no way of achieving the result I want with it. An example:
library(ggplot2)
library(stargazer)
levels(diamonds$cut)
options(contrasts = c("contr.treatment", "contr.treatment"))
model1 <- lm(price ~ cut, data = diamonds)
stargazer(model1, type = 'text')
This yields the default output:
===============================================
Dependent variable:
---------------------------
price
-----------------------------------------------
cutGood -429.893***
(113.849)
cutVery Good -376.998***
(105.164)
cutPremium 225.500**
(104.395)
cutIdeal -901.216***
(102.412)
Constant 4,358.758***
(98.788)
-----------------------------------------------
Observations 53,940
R2 0.013
Adjusted R2 0.013
Residual Std. Error 3,963.847 (df = 53935)
F Statistic 175.689*** (df = 4; 53935)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
Here's what I want:
===============================================
Dependent variable:
---------------------------
price
-----------------------------------------------
Cut (Reference: Fair)
Good -429.893***
(113.849)
Very Good -376.998***
(105.164)
Premium 225.500**
(104.395)
Ideal -901.216***
(102.412)
Constant 4,358.758***
(98.788)
-----------------------------------------------
Observations 53,940
R2 0.013
Adjusted R2 0.013
Residual Std. Error 3,963.847 (df = 53935)
F Statistic 175.689*** (df = 4; 53935)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
Is there any way to achieve this in stargazer without too much hackery? Are there other packages in which this would be simpler to do?

Not entirely what you wanted, but you can manually specify covariate labels via the covariate.labels argument. I haven't found a way to add a separate header row, though, so you would have to add the line break manually.
stargazer(model1, type = 'text',
          covariate.labels = c("Cut (Reference: Fair) Good",
                               ". Very good",
                               ". Premium",
                               ". Ideal"))
======================================================
Dependent variable:
---------------------------
price
------------------------------------------------------
Cut (Reference: Fair) Good -429.893***
(113.849)
. Very good -376.998***
(105.164)
. Premium 225.500**
(104.395)
. Ideal -901.216***
(102.412)
Constant 4,358.758***
(98.788)
------------------------------------------------------
Observations 53,940
R2 0.013
Adjusted R2 0.013
Residual Std. Error 3,963.847 (df = 53935)
F Statistic 175.689*** (df = 4; 53935)
======================================================
Note: *p<0.1; **p<0.05; ***p<0.01

This gets reasonably close to the desired ASCII output. Whether it works in LaTeX is something you will need to test; the handling of \n may not have the same side effects there.
stargazer(model1, type = 'text', column.labels = "\nCut (Reference: Fair)",
          covariate.labels = c(". Good",
                               ". Very good",
                               ". Premium",
                               ". Ideal"))
Console:
=================================================
Dependent variable:
---------------------------
price
Cut (Reference: Fair)
-------------------------------------------------
. Good -429.893***
(113.849)
. Very good -376.998***
(105.164)
. Premium 225.500**
(104.395)
. Ideal -901.216***
(102.412)
Constant 4,358.758***
(98.788)
-------------------------------------------------
Observations 53,940
R2 0.013
Adjusted R2 0.013
Residual Std. Error 3,963.847 (df = 53935)
F Statistic 175.689*** (df = 4; 53935)
=================================================
Note: *p<0.1; **p<0.05; ***p<0.01
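As for the other-packages part of the question: the modelsummary package (which also comes up further down in this thread) can rename, reorder, and subset coefficients with coef_map and insert extra rows with add_rows, which gets close to the desired layout. A minimal sketch, assuming the treatment-contrast setup from the question and modelsummary's add_rows/position interface; the label strings are only illustrative and the output would still need checking in LaTeX:
library(ggplot2)
library(modelsummary)

options(contrasts = c("contr.treatment", "contr.treatment"))
model1 <- lm(price ~ cut, data = diamonds)

# Header row for the factor; the 'position' attribute controls where it lands
ref_row <- data.frame("Cut (Reference: Fair)", "")
attr(ref_row, "position") <- 1

modelsummary(model1,
             coef_map = c("cutGood"      = "Good",
                          "cutVery Good" = "Very Good",
                          "cutPremium"   = "Premium",
                          "cutIdeal"     = "Ideal",
                          "(Intercept)"  = "Constant"),
             add_rows = ref_row,
             output   = "latex")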

Related

How to report a standardized model in stargazer package?

I fit the following simple regression model and used stargazer to output a table that reports the standardized vs. non-standardized regression coefficients, standard errors and p-values.
library(lm.beta)
mod <- lm(mpg ~ cyl + disp, mtcars)
summary(mod)
mod_std <- lm.beta(mod)
summary(mod_std)$coe[, 2]
library(stargazer)
stargazer(mod, mod_std,
          coef = list(mod$coefficients,
                      mod_std$standardized.coefficients),
          type = 'text')
And this is the output:
==========================================================
Dependent variable:
----------------------------
mpg
(1) (2)
----------------------------------------------------------
cyl -1.587** -0.470
(0.712) (0.712)
disp -0.021** -0.423***
(0.010) (0.010)
Constant 34.661*** 0.000
(2.547) (2.547)
----------------------------------------------------------
Observations 32 32
R2 0.760 0.760
Adjusted R2 0.743 0.743
Residual Std. Error (df = 29) 3.055 3.055
F Statistic (df = 2; 29) 45.808*** 45.808***
==========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
As can be observed here, the standard errors that stargazer reports for the coefficients are the same for the standardized model and the non-standardized one. This is not correct, as standard errors should change when the coefficients are standardized. Is there a way to report the correct standard errors? Or, if not, to simply remove them?
Lastly, the significance levels of the coefficients also differ between the standardized and non-standardized models. These should not change, as they are not affected by standardization. Is there a way to prevent stargazer from modifying them? The p or p.auto arguments might work, but I have no idea how to use them.
Reference for lm.beta: Stefan Behrendt (2014). lm.beta: Add Standardized Regression Coefficients to lm-Objects. R package version 1.5-1. https://CRAN.R-project.org/package=lm.beta
You would need to enter the additional values by hand, list-wise for each model, just as you started doing with the coefficients: the standardized se= values, the p= values (for the stars), and so on, as well as the goodness-of-fit statistics (R2, adjusted R2, ...); see the options in the help page, ?stargazer.
However, lm.beta appears to add nothing but the standardized coefficients, so none of these other quantities are calculated for you.
Standardized standard errors can be obtained as SE * beta_star / beta, i.e. the original standard error rescaled by the ratio of the standardized to the unstandardized coefficient.
So you could wrap that in a small function and calculate them in order to fill them into the stargazer table:
std_se <- \(x) x[, 'Std. Error']*x[, 'Standardized']/x[, 'Estimate']
std_se(summary(mod_std)$coefficients)
# (Intercept) cyl disp
# 0.0000000 0.2109356 0.2109356
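These values could then be entered into the table by hand; a rough sketch of that, reusing std_se and supplying the original model's p-values so the stars are not recomputed from the standardized values (check the p.auto option in ?stargazer):
stargazer(mod, mod_std,
          coef = list(coef(mod), mod_std$standardized.coefficients),
          se   = list(summary(mod)$coefficients[, 2],
                      std_se(summary(mod_std)$coefficients)),
          p    = list(summary(mod)$coefficients[, 4],
                      summary(mod)$coefficients[, 4]),
          type = 'text')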
However, it may well be easier to fit an actual standardized model:
mod_std2 <- lm(mpg ~ cyl + disp, as.data.frame(scale(mtcars)))
summary(mod_std2) |> getElement('coefficients')
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 34.66099474 2.54700388 13.608536 4.022869e-14
# cyl -1.58727681 0.71184427 -2.229809 3.366495e-02
# disp -0.02058363 0.01025748 -2.006696 5.418572e-02
and put that in:
stargazer(mod, mod_std2, type='text')
# ==========================================================
# Dependent variable:
# ----------------------------
# mpg
# (1) (2)
# ----------------------------------------------------------
# cyl -1.587** -0.470**
# (0.712) (0.211)
# disp -0.021* -0.423*
# (0.010) (0.211)
# Constant 34.661*** -0.000
# (2.547) (0.090)
# ----------------------------------------------------------
# Observations 32 32
# R2 0.760 0.760
# Adjusted R2 0.743 0.743
# Residual Std. Error (df = 29) 3.055 0.507
# F Statistic (df = 2; 29) 45.808*** 45.808***
# ==========================================================
# Note: *p<0.1; **p<0.05; ***p<0.01
I managed to make the following script:
stargazer(mod_std,
          coef = list(mod_std$standardized.coefficients),
          se = list(summary(mod_std)$coe[, 2]),
          p = list(summary(mod)$coe[, 4]),
          type = 'text',
          omit.stat = c("all"),
          keep = c("cyl", "disp"),
          report = c('vcp'), notes.append = FALSE,
          notes = "Coefficients are standardized")
With the following output:
===================================
Dependent variable:
-----------------------------
mpg
-----------------------------------
cyl -0.470
p = 0.034
disp -0.423
p = 0.055
===================================
===================================
Note: Coefficients are standardized
Here, the standardized coefficients are reported, together with the p-values from the original model (which should be unchanged across standardization).

How to correctly input custom standard errors with stargazer

I'm doing some quantile regressions and want to format the resulting tables with stargazer. However, stargazer doesn't have an option to calculate standard errors with bootstraps for clustered data, which is what I need. So I will need to input the standard errors manually, using stargazer's se = argument, but I'm not sure how exactly it works.
model <- lm(mpg ~ wt, data = mtcars)
stargazer(model, type = 'text', se = list(1, 1))
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
wt -5.344
Constant 37.285***
(1.000)
-----------------------------------------------
Observations 32
R2 0.753
Adjusted R2 0.745
Residual Std. Error 3.046 (df = 30)
F Statistic 91.375*** (df = 1; 30)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
In the code and output above, there are two coefficients, one for the wt variable and one for the intercept. In the se = argument, I entered two arbitrary standard errors, one for each coefficient. However, the output shows the standard error only for the intercept, not for the other variable.
Any idea what is happening?
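For what it's worth, the se argument expects a list with one numeric vector per model, and each vector should supply one standard error per coefficient, in the same order as coef(model). A length-one vector only covers the first coefficient (the intercept, displayed as Constant), which would explain the output above. A minimal sketch with made-up numbers standing in for the bootstrapped clustered standard errors:
library(stargazer)
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical custom SEs, one per coefficient, in coef(model) order:
# (Intercept), wt
custom_se <- setNames(c(2.5, 0.8), names(coef(model)))

stargazer(model, type = 'text', se = list(custom_se))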

How to keep just one variable in stargazer regression output? (opposite of "omit")

Does anyone know what the opposite of stargazer's "omit" argument would be when producing a regression table?
I'm trying to show just one (or a few) covariates from a regression. I know one could use "omit" and list all the variable names one doesn't want shown in the output, but is there any way to specify the variable names one actually wants to keep in the final table?
I'm having a hard time dealing with interactions between dummy variables specified directly within a linear model. For example, let's say I want to run the following model:
# Libraries
library(stargazer)
# Data:
data <- data.frame(
  "Y" = rnorm(100, 20, 45),
  "Dummy1" = sample(c(1, 0), 100, replace = T),
  "Dummy2" = sample(c(1, 0), 100, replace = T),
  "Dummy3" = sample(c(1, 0), 100, replace = T))
# Model:
model1 <- lm(Y ~ Dummy1*Dummy2*Dummy3, data)
And let's say I want to report only the triple interaction in the output stargazer table. But when I try, for example, to remove the results for the single variable "Dummy1", stargazer drops every term whose name contains "Dummy1", therefore also removing the triple interaction.
# Problem
stargazer(model1, type = "text", omit = "Dummy1")
===============================================
Dependent variable:
---------------------------
Y
-----------------------------------------------
Dummy2 23.705
(17.236)
Dummy3 19.221
(17.591)
Dummy2:Dummy3 -25.568
(23.908)
Constant 5.373
(12.188)
-----------------------------------------------
Observations 100
R2 0.099
Adjusted R2 0.031
Residual Std. Error 43.943 (df = 92)
F Statistic 1.450 (df = 7; 92)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
How do I make a table with just the triple interaction's result? Any ideas?
Instead of using omit, you can use keep to keep only the variables that you need.
stargazer::stargazer(model1, type = "text", keep = 'Dummy1:Dummy2:Dummy3')
================================================
Dependent variable:
---------------------------
Y
------------------------------------------------
Dummy1:Dummy2:Dummy3 42.430
(35.315)
------------------------------------------------
Observations 100
R2 0.145
Adjusted R2 0.080
Residual Std. Error 43.587 (df = 92)
F Statistic 2.222** (df = 7; 92)
================================================
Note: *p<0.1; **p<0.05; ***p<0.01
The main effects and the constant term can be matched with:
stargazer(model1, type = "text",omit="^.{6,8}$") # terms with length 6 to 8 characters
Or:
stargazer(model1, type = "text",keep="^[^:]+$") #not any :
The two-variable interactions can be matched with:
stargazer(model1, type = "text", omit = "^[^:]{6}[:][^:]{6}$") # six non-colon characters, a colon, then six non-colon characters
So the combination can be matched with:
stargazer(model1, type = "text",omit="^.{6,8}$|^[^:]{6}[:][^:]{6}$")
A more general version of Ronak Shah's approach of using keep patterns would be:
stargazer(model1, type = "text", keep = "[:].+[:]") # keeps any term with at least two colons, i.e. three-way (or higher) interactions

Can I add standard errors *and* confidence intervals in a single column using Stargazer?

I like Stargazer quite a bit but am running into some issues trying to report standard errors and confidence intervals all in a single table.
Consider this simple regression as a reproducible example:
set.seed(04152020)
x <- rnorm(100)
y <- 2*x + rnorm(100)
m1 <- lm(y~x)
I can report standard errors and confidence intervals in two separate tables no problem using the ci option.
library(stargazer)
# standard errors
stargazer(m1, type = "text")
# confidence intervals
stargazer(m1, ci = TRUE, type = "text")
A workaround to get them into a single table is to "report" the model twice, but then the coefficients are repeated unnecessarily. For example, the following code:
stargazer(list(m1, m1),
          ci = c(FALSE, TRUE),
          type = "text")
Produces:
==========================================================
Dependent variable:
----------------------------
y
(1) (2)
----------------------------------------------------------
x 1.981*** 1.981***
(0.110) (1.766, 2.196)
Constant -0.218** -0.218**
(0.104) (-0.421, -0.014)
----------------------------------------------------------
Observations 100 100
R2 0.769 0.769
Adjusted R2 0.766 0.766
Residual Std. Error (df = 98) 1.032 1.032
F Statistic (df = 1; 98) 325.893*** 325.893***
==========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Is there a way to put both standard errors and confidence intervals into a single column automatically, like you can do with p-values? E.g. this code:
stargazer(m1,
          ci = c(FALSE, TRUE),
          report = ('vcsp'),
          type = "text")
Produces exactly what I want, but with p-values; the documentation for the option that allows this, report, seems to offer the choice only for p-values, as indicated by this question and answer.
===============================================
Dependent variable:
---------------------------
y
-----------------------------------------------
x 1.981
(0.110)
p = 0.000
Constant -0.218
(0.104)
p = 0.039
-----------------------------------------------
Observations 100
R2 0.769
Adjusted R2 0.766
Residual Std. Error 1.032 (df = 98)
F Statistic 325.893*** (df = 1; 98)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
I don't know how to do this in stargazer, but you can easily achieve the desired result with the modelsummary package. (Disclaimer: I am the author.)
library(modelsummary)
set.seed(04152020)
x <- rnorm(100)
y <- 2*x + rnorm(100)
m1 <- lm(y~x)
modelsummary(m1, statistic = c("std.error", "conf.int"))
You can also do crazy things like this, as described on the website:
modelsummary(m1, gof_omit = ".*",
             statistic = c("conf.int",
                           "s.e. = {std.error}",
                           "t = {statistic}",
                           "p = {p.value}"))

Omit output of a factor variable in stargazer?

If I'm running a fixed-effects model in R using lm() and factor(), how can I suppress the factor-variable coefficients in the stargazer output?
i.e. my model is:
m1 <- lm(GDP ~ pop_growth + factor(city))
and I want to report findings with only an intercept and coefficient on pop_growth, not coefficients on every dummy variable for cities.
EDIT: Issue was, as it turns out, with variable name encoding. omit="city" works.
As the author said, there is an omit option:
library(stargazer)
model <- lm(mpg ~ disp + factor(cyl), data = mtcars)
stargazer(model, type = "text", omit = "cyl")
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
disp -0.027**
(0.011)
Constant 29.535***
(1.427)
-----------------------------------------------
Observations 32
R2 0.784
Adjusted R2 0.760
Residual Std. Error 2.950 (df = 28)
F Statistic 33.807*** (df = 3; 28)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
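If the table should still signal that the omitted factor is in the model, stargazer's add.lines argument can append an indicator row; a small sketch building on the example above, with the row label chosen freely:
# Flag the omitted cylinder dummies with an indicator row
stargazer(model, type = "text", omit = "cyl",
          add.lines = list(c("Cylinder dummies", "Yes")))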
