Formatting multiple models on top of each other in stargazer - r

I want to format multiple univariate model outputs on top of one another using stargazer (with the same dependent variable), and I can't get them to not show up side-by-side.
library(stargazer)
library(nnet)  # provides multinom()
data(iris)
stargazer(multinom(Species ~ Sepal.Length, data = iris),
          multinom(Species ~ Sepal.Width, data = iris),
          type = "text", apply.coef = exp, p.auto = FALSE, omit = "Constant")
Which gives the following output:
============================================================
                             Dependent variable:
                  ------------------------------------------
                  versicolor virginica  versicolor virginica
                     (1)        (2)        (3)        (4)
------------------------------------------------------------
Sepal.Length      123.479*** 941.955***
                   (0.907)    (1.022)
Sepal.Width                              0.002***   0.017***
                                         (0.991)    (0.844)
------------------------------------------------------------
Akaike Inf. Crit.  190.068    190.068    260.537    260.537
============================================================
Note:                            *p<0.1; **p<0.05; ***p<0.01
Rather than having "versicolor" and "virginica" repeated twice for the different models, I just want each of them once, with the different model predictors and estimates underneath one another.
Is there any way of doing this?

starpolishr can arrange model outputs in panels, but only for LaTeX output and only when the panels share the same model statistics.
install.packages("remotes")
remotes::install_github("ChandlerLutz/starpolishr")
## -- Regression example -- ##
library(stargazer)
library(starpolishr)
data(mtcars)
##First set up models without weight
mod.mtcars.1 <- lm(mpg ~ hp, mtcars)
mod.mtcars.2 <- lm(mpg ~ hp + cyl, mtcars)
star.out.1 <- stargazer(mod.mtcars.1, mod.mtcars.2, keep.stat = "n")
##Second set of models with weight as a regressor
mod.mtcars.3 <- lm(mpg ~ hp + wt, mtcars)
mod.mtcars.4 <- lm(mpg ~ hp + cyl + wt, mtcars)
star.out.2 <- stargazer(mod.mtcars.3, mod.mtcars.4, keep.stat = c("n", "rsq"))
##stargazer panel -- same summary statistics across panels.
star.panel.out <- star_panel(star.out.1, star.out.2,
panel.names = c("Without Weight", "With Weight")
)
print(star.panel.out)
The printed result is a character vector, so remove the quotation marks and line indices before compiling it as a .tex file.
It's still something of a workaround, and I can't see a visual benefit in aligning model outputs vertically in panels. The usual approach is to put the side-by-side table on a landscape page in the final document.
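If a plain coefficient table is enough, the stacking itself can be sketched without stargazer or LaTeX. The following is only a minimal illustration for the multinom models in the question, assuming the nnet package is available. The names fits and stacked are made up for this example, and standard errors and significance stars are left out.

```r
library(nnet)  # multinom(); a recommended package in standard R installs

# One univariate model per predictor, same dependent variable
fits <- list(
  multinom(Species ~ Sepal.Length, data = iris, trace = FALSE),
  multinom(Species ~ Sepal.Width,  data = iris, trace = FALSE)
)

# coef() returns a matrix with one row per outcome level and one column
# per term. Exponentiate, drop the intercept column, and transpose so that
# outcomes become columns and predictors become rows.
stacked <- do.call(rbind, lapply(fits, function(m) {
  t(exp(coef(m))[, -1, drop = FALSE])
}))

print(round(stacked, 3))
```

The result is an ordinary matrix with one column per outcome level, so it can be passed to any table-printing function of choice.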

Related

Adding coefficient group titles to modelsummary output

I'm trying to group coefficients together in a modelsummary output table and add row titles for these groups:
library(modelsummary)
ols1 <- lm(mpg ~ cyl + disp + hp,
data = mtcars,
na.action = na.omit
)
modelsummary(ols1,
title = "Table 1",
stars = TRUE
)
The modelsummary documentation (https://cran.r-project.org/web/packages/modelsummary/modelsummary.pdf) suggests this might be something to do with the shape and group_map arguments, but I can't really figure out how to use them.
Any guidance would be very helpful, thanks!
When the documentation mentions “groups”, it refers to models like multinomial logits where each predictor has one coefficient per outcome level. In this example, the “group” column is called “response”:
library(nnet)
library(modelsummary)
mod <- multinom(cyl ~ mpg + hp, data = mtcars, trace = FALSE)
modelsummary(
mod,
output = "markdown",
shape = response ~ model)
                      Model 1
(Intercept)    6      0.500
                      (41.760)
               8      8.400
                      (0.502)
mpg            6      -83.069
                      (416.777)
               8      -120.167
                      (508.775)
hp             6      16.230
                      (81.808)
               8      20.307
                      (87.777)
Num.Obs.              32
R2                    1.000
R2 Adj.               0.971
AIC                   12.0
BIC                   20.8
RMSE                  0.00
What you probably mean is something different: Adding manual labels to sets of coefficients. This is easy to achieve because modelsummary() produces a kableExtra or a gt table which can be customized in infinite ways.
https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
For example, you may want to look at the group_rows function from kableExtra:
library(kableExtra)
mod <- lm(mpg ~ cyl + disp + hp, data = mtcars)
modelsummary(mod) |>
group_rows(index = c("Uninteresting" = 4,
"Interesting" = 4,
"Other" = 7))

Best way to report multiple regression models on several dimensions (evolving model formulation and year of data)

Situation
I am fitting a series of evolving regression models. For the purposes of this question, we can think of these models as Model A, Model B, and Model C. All models share at least one covariate.
I am also fitting these models for two separate years of data. Again, for the purposes of this question, the years will be 2000 and 2010.
To simplify the reporting of results, I am trying to combine the regressions into a single table with something like the following format:
              2000    2010
Model A
  Coef Ex1
Model B
  Coef Ex1
  Coef Ex2
Model C
  Coef Ex1
  Coef Ex2
  Coef Ex3
The idea being that someone can look quickly at Coef Ex1 across several models and years.
What Have I Tried
I have tried to achieve the above table using both the stargazer and kable packages in R. With stargazer I can get the fully formatted table for a single model formulation across many years (e.g., stargazer(modelA2000, modelA2010)), but I cannot figure out how to stack additional model formulations on the rows.
With kable I have been able to stack the models for a single year, but I have not been able to add in additional years (e.g., coefs <- bind_rows(tidy(modelA2000), tidy(modelB2000), tidy(modelC2000)); coefs %>% kable()).
Question: how can I use stargazer or kable to report evolving regression models (which share the same covariates) in the rows but also with year of cross section on the column? I think I can somehow extend the answer posted here, although I'm not sure how.
Reproducible example
# Load the data
mtcars <- mtcars
# Create example results for models A, B, and C for 2000
modelA2000 <- lm(mpg ~ cyl, data = mtcars)
modelB2000 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2000 <- lm(mpg ~ cyl + wt + disp, data = mtcars)
# Slightly modify data for second set of results
mtcars$cyl <- mtcars$cyl*runif(1)
# Fit second set of results. Same models, pretending it's a different year.
modelA2010 <- lm(mpg ~ cyl, data = mtcars)
modelB2010 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2010 <- lm(mpg ~ cyl + wt + disp, data = mtcars)
Two notes before starting:
You want a pretty "custom" table, so it is almost inevitable that some manual operations will be required.
My answer relies on the development version of modelsummary, which you can install like this:
library(remotes)
install_github("vincentarelbundock/modelsummary")
We will need 4 concepts, many of them related to the broom package:
broom::tidy: a function that takes a statistical model and returns a data.frame of estimates with one row per coefficient.
broom::glance: a function that takes a statistical model and returns a one-row data.frame with model characteristics (e.g., number of observations).
modelsummary_list: a list with 2 elements called "tidy" and "glance", and with a class name of "modelsummary_list".
The modelsummary package allows you to draw regression tables. Under the hood, it uses broom::tidy and broom::glance to extract information from those models. Users can also supply their own information about a model by supplying a list to which we assign the class modelsummary_list, as documented here.
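To make the last two concepts concrete, here is a rough sketch of a modelsummary_list built by hand. It uses base R instead of broom so that it runs on its own. The object names (ti, gl, custom) are illustrative, and the column names follow broom's conventions (term, estimate, std.error, p.value).

```r
mod <- lm(mpg ~ cyl + wt, data = mtcars)

# "tidy" part: one row per coefficient, mimicking broom::tidy() output
est <- coef(summary(mod))
ti <- data.frame(
  term      = rownames(est),
  estimate  = est[, "Estimate"],
  std.error = est[, "Std. Error"],
  p.value   = est[, "Pr(>|t|)"],
  row.names = NULL
)

# "glance" part: a one-row data.frame of model-level statistics
gl <- data.frame(nobs = nobs(mod))

# Combine both parts and assign the class that modelsummary looks for
custom <- list(tidy = ti, glance = gl)
class(custom) <- "modelsummary_list"

str(custom)
# Assuming modelsummary is installed, modelsummary::modelsummary(custom)
# should then draw a table from this object.
```

Any extra column added to the tidy part, such as a grouping variable, travels with the object in the same way.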
EDIT: The recommended way to do this in modelsummary is now to use the group argument. Scroll to the end of this post for illustrative code.
Obsolete example with useful discussion
modelsummary_wide is a function that was initially designed to "stack" results from several models with several groups of coefficients. This is useful for things like multinomial models, but it also helps us in your case, where you have multiple models in multiple groups (here: years).
First, we load packages, tweak the data, and estimate our models:
library(modelsummary)
library(broom)
library(dplyr)
mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)
models <- list(
"A" = list(
lm(mpg ~ cyl, data = mtcars),
lm(mpg ~ cyl, data = mtcars2010)),
"B" = list(
lm(mpg ~ cyl + wt, data = mtcars),
lm(mpg ~ cyl + wt, data = mtcars2010)),
"C" = list(
lm(mpg ~ cyl + wt + disp, data = mtcars),
lm(mpg ~ cyl + wt + disp, data = mtcars2010)))
Notice that we saved our models in three groups, in a list of lists.
Then, we define a tidy_model function that accepts a list of two models (one per year), combines the information on those two models, and creates a modelsummary_list object (again, please refer to the documentation). Note that we assign the "year" information to a "group" column in the tidy object.
We apply this function to each of our three groups of models using lapply.
tidy_model <- function(model_list) {
# tidy estimates
tidy_2000 <- broom::tidy(model_list[[1]])
tidy_2010 <- broom::tidy(model_list[[2]])
# create a "group" column
tidy_2000$group <- 2000
tidy_2010$group <- 2010
ti <- bind_rows(tidy_2000, tidy_2010)
# glance estimates
gl <- data.frame("N" = stats::nobs(model_list[[1]]))
# output
out <- list(tidy = ti, glance = gl)
class(out) <- "modelsummary_list"
return(out)
}
models <- lapply(models, tidy_model)
Finally, we call modelsummary_wide with the stacking = "vertical" argument to obtain the table:
modelsummary_wide(models, stacking = "vertical")
Of course, the table can be adjusted, coefficients renamed, etc. using the other arguments of the modelsummary_wide function or with kableExtra or some other package supported by the output argument.
More modern example without detailed explanation
library("modelsummary")
library("broom")
library("quantreg")
mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)
models <- list(
"A" = list(
"2000" = rq(mpg ~ cyl, data = mtcars),
"2010" = rq(mpg ~ cyl, data = mtcars2010)),
"B" = list(
"2000" = rq(mpg ~ cyl + wt, data = mtcars),
"2010" = rq(mpg ~ cyl + wt, data = mtcars2010)),
"C" = list(
"2000" = rq(mpg ~ cyl + wt + disp, data = mtcars),
"2010" = rq(mpg ~ cyl + wt + disp, data = mtcars2010)))
tidy_model <- function(model_list) {
# tidy estimates
tidy_2000 <- broom::tidy(model_list[[1]])
tidy_2010 <- broom::tidy(model_list[[2]])
# create a "group" column
tidy_2000$group <- "2000"
tidy_2010$group <- "2010"
ti <- dplyr::bind_rows(tidy_2000, tidy_2010)  # dplyr is not attached in this example
# output
out <- list(tidy = ti, glance = data.frame("nobs 2010" = length(model_list[[1]]$fitted.values)))
class(out) <- "modelsummary_list"
return(out)
}
models <- lapply(models, tidy_model)
modelsummary(models,
group = model + term ~ group,
statistic = "conf.int")
                  2000                2010
A  (Intercept)    36.800              36.800
                  [30.034, 42.403]    [30.034, 42.403]
   cyl            -2.700              -67.944
                  [-3.465, -1.792]    [-87.204, -45.102]
B  (Intercept)    38.871              38.871
                  [30.972, 42.896]    [30.972, 42.896]
   cyl            -1.743              -43.858
                  [-2.154, -0.535]    [-54.215, -13.472]
   wt             -2.679              -2.679
                  [-5.313, -1.531]    [-5.313, -1.531]
C  (Intercept)    40.683              40.683
                  [31.235, 47.507]    [31.235, 47.507]
   cyl            -1.993              -50.162
                  [-3.137, -1.322]    [-78.948, -33.258]
   wt             -2.937              -2.937
                  [-5.443, -1.362]    [-5.443, -1.362]
   disp           0.003               0.003
                  [-0.009, 0.035]     [-0.009, 0.035]

using "at" argument of margins function in R for logit model

I want to be able to analyze the marginal effect of continuous and binary variables in a logit model. I am hoping for R to provide what the independent marginal effect of hp is at its mean (in this example that is at 200), while also finding the marginal effect of the vs variable equaling 1. I am hoping the output table also includes the SE, p value, and z score. I am having trouble with the table and when I have gotten it to run it doesn't evaluate the two variables independently. Here is an MRE below. Thank you!
library(margins)
mod2 <- glm(am ~ hp + factor(vs), data=mtcars, family=binomial)
margins(mod2)
#> Average marginal effects
#> glm(formula = am ~ hp + factor(vs), family = binomial, data = mtcars)
#> hp vs1
#> -0.00203 -0.03154
#code where I am trying to evaluate at the desired values.
margins(mod2, at=list(hp=200, vs=1))
This is because you've changed vs to a factor.
Consider the following
library(margins)
mod3 <- glm(am ~ hp + vs, data=mtcars, family=binomial)
margins(mod3, at=list(hp=200, vs=1))
# Average marginal effects at specified values
# glm(formula = am ~ hp + vs, family = binomial, data = mtcars)
#
# at(hp) at(vs) hp vs
# 200 1 -0.001783 -0.02803
There is no real reason to turn vs into a factor here; it's dichotomous.
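To see what these numbers represent, the effects can be approximated by hand. This is a sketch under the assumption of the plain logit model above, not a description of what margins does internally. For a logit, the derivative of the predicted probability with respect to a covariate is the coefficient times p(1 - p). Because vs is binary, the discrete change from vs = 0 to vs = 1 is also computed, which is often the more natural quantity. The variable names are made up for the example.

```r
mod3 <- glm(am ~ hp + vs, data = mtcars, family = binomial)

# Predicted probability at the evaluation point hp = 200, vs = 1
at_point <- data.frame(hp = 200, vs = 1)
p <- predict(mod3, newdata = at_point, type = "response")

# Derivative-based marginal effects at that point: beta * p * (1 - p)
me_hp <- unname(coef(mod3)["hp"] * p * (1 - p))
me_vs <- unname(coef(mod3)["vs"] * p * (1 - p))

# Discrete change in probability when vs goes from 0 to 1, holding hp = 200
p0 <- predict(mod3, newdata = data.frame(hp = 200, vs = 0), type = "response")
diff_vs <- unname(p - p0)

print(c(me_hp = me_hp, me_vs = me_vs, diff_vs = diff_vs))
```

The derivative-based values should be close to the margins output, while the discrete change for vs will generally differ from its derivative-based counterpart.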

margins.plot: using the 'which' argument to choose which margins to include in plot

I am trying to plot marginal effects in R based on a logistic regression. For example:
data <- mtcars
mod <- glm(am ~ cyl + hp + wt + mpg, family = binomial, data = data)
library(margins)
marg <- margins(mod, atmeans = TRUE)
summary(marg)
I can run the margins plot command:
plot(marg)
which plots marginal effects and confidence intervals for all of the IVs. I only want the plot to include cyl and hp, my explanatory variables of interest. According to the R documentation, this can be accomplished using the 'which' argument, which takes a character vector. However, the documentation doesn't say how to use this argument. Does anyone know how to use the 'which' argument to ask margins.plot to plot only selected marginal effects? Unfortunately, the margins plot help page does not have any examples.
Before plotting, we can specify the variables of interest with the variables argument of margins().
mod <- glm(am ~ cyl + hp + wt + mpg, family=binomial, data=mtcars)
library(margins)
marg <- margins(mod, variables=c("cyl", "hp"))
plot(marg)

lapply to estimate with many dependent variables then tabulate with Stargazer

I'm trying to: (1) estimate multiple models where only the dependent variable changes (2) Tabulate the results with the Stargazer package
The following code works, but I have to repeat a line of code for each model:
library(stargazer)
data(mtcars)
reg1 <- lm(mpg ~ am + gear + carb, data=mtcars)
reg2 <- lm(cyl ~ am + gear + carb, data=mtcars)
reg3 <- lm(disp ~ am + gear + carb, data=mtcars)
stargazer(reg1, reg2, reg3,
title="Regression Results", type="text",
df=FALSE, digits=3)
You can see that the (trimmed) output has the correct headings for the dependent variables (mpg, cyl, disp):
Regression Results
==================================================
                     Dependent variable:
             ------------------------------
                mpg        cyl        disp
                (1)        (2)        (3)
--------------------------------------------------
am            3.545*     -0.176     -40.223
             (1.897)    (0.615)    (48.081)
If I use lapply and paste, it ends up changing the headings of the dependent variables in stargazer:
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
fit <- lm(paste(x,'~', 'am + gear + carb'), data=mtcars)})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
gives the output where x is the heading for the dependent variables:
Regression Results
==================================================
                     Dependent variable:
             ------------------------------
                            x
                (1)        (2)        (3)
--------------------------------------------------
am            3.545*     -0.176     -40.223
             (1.897)    (0.615)    (48.081)
Is there any way for me to fix this? Thank you.
If you create the formula before you run the regression it should work. I just separated the formula creation and the regression.
dependents <- c('mpg', 'cyl', 'disp')
outs <- lapply(dependents, function(x) {
formula <- as.formula(paste(x,'~', 'am + gear + carb'))
fit <- lm(formula, data=mtcars)})
stargazer(outs[[1]], outs[[2]], outs[[3]],
title="Regression Results", type="text",
df=FALSE, digits=3)
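A variation on the same idea, offered as a sketch: stats::reformulate() builds the formula object directly from character vectors, which avoids pasting strings together. The names predictors, fits, and dv_found are illustrative. stargazer also accepts a list of models, so the list can be passed as is. The snippet only checks the dependent variable stored in each model and calls stargazer when it is installed.

```r
dependents <- c("mpg", "cyl", "disp")
predictors <- c("am", "gear", "carb")

# Build each formula as an object first, then fit the model
fits <- lapply(dependents, function(dv) {
  f <- reformulate(predictors, response = dv)
  lm(f, data = mtcars)
})

# The left-hand-side variable stored in each fitted model
dv_found <- vapply(fits, function(m) all.vars(formula(m))[1], character(1))
print(dv_found)

# Tabulate only if stargazer is installed; a list of models is accepted
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(fits, title = "Regression Results", type = "text",
                       df = FALSE, digits = 3)
}
```

Passing the whole list also removes the need to index outs[[1]], outs[[2]], and so on when the number of dependent variables changes.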
