I am analysing a panel dataset with dummy variable fixed effects.
For example, my model looks like
model1 <- lm_robust(y ~ x1 + x2 + factor(x3):factor(x4), clusters = clusters, data = data)
x3 and x4 have many categories, which generates hundreds of dummies in the table that I do not want to show.
Is there an easier way with texreg to keep only the estimates of x1 and x2 in the output table?
Because I am using lm_robust for clustering, stargazer is not an option here.
Thank you very much in advance.
I do not know how to do this in texreg, but here is one option using the modelsummary package for R (disclaimer: I am the author).
You can use the coef_map argument to select the coefficients to display. You may also want to look at the coef_omit argument.
library(estimatr)
library(modelsummary)
mod <- lm_robust(hp ~ ., data = mtcars)
modelsummary(mod, coef_map = c("cyl", "disp"))
             Model 1
cyl          8.204
             (10.203)
disp         0.439
             (0.179)
Num.Obs.     32
R2           0.903
R2 Adj.      0.857
Std.Errors   HC2
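For the original question, where the unwanted terms are factor dummies, coef_omit with a regular expression may be more convenient than listing the terms to keep. The following is a minimal sketch, not a definitive recipe: it uses plain lm() and mtcars as stand-ins (an assumption; the asker's data and lm_robust call are not available), previews the regex with base R's grepl(), and only calls modelsummary if the package is installed.

```r
# Stand-in model with factor dummies (assumption: mtcars instead of the asker's panel data)
mod <- lm(hp ~ mpg + wt + factor(cyl) + factor(gear), data = mtcars)

# Preview which terms a regex would drop before building the table
terms_all  <- names(coef(mod))
omit_regex <- "^factor\\("
terms_kept <- terms_all[!grepl(omit_regex, terms_all, perl = TRUE)]
print(terms_kept)

# Pass the same regex to coef_omit (runs only if modelsummary is installed)
if (requireNamespace("modelsummary", quietly = TRUE)) {
  modelsummary::modelsummary(mod, coef_omit = omit_regex, output = "markdown")
}
```

Checking the regex against names(coef(mod)) first makes it easy to see that only the dummy terms are matched before any table is produced.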
I'm trying to group coefficients together in a modelsummary output table and add row titles for these groups:
library(modelsummary)
ols1 <- lm(mpg ~ cyl + disp + hp,
data = mtcars,
na.action = na.omit
)
modelsummary(ols1,
title = "Table 1",
stars = TRUE
)
The modelsummary documentation (https://cran.r-project.org/web/packages/modelsummary/modelsummary.pdf) suggests this might be something to do with the shape and group_map arguments, but I can't really figure out how to use them.
Any guidance would be very helpful, thanks!
When the documentation mentions “groups”, it refers to models like multinomial logits where each predictor has one coefficient per outcome level. In this example, the “group” column is called “response”:
library(nnet)
library(modelsummary)
mod <- multinom(cyl ~ mpg + hp, data = mtcars, trace = FALSE)
modelsummary(
mod,
output = "markdown",
shape = response ~ model)
                     Model 1
(Intercept)   6      0.500
                     (41.760)
              8      8.400
                     (0.502)
mpg           6      -83.069
                     (416.777)
              8      -120.167
                     (508.775)
hp            6      16.230
                     (81.808)
              8      20.307
                     (87.777)
Num.Obs.             32
R2                   1.000
R2 Adj.              0.971
AIC                  12.0
BIC                  20.8
RMSE                 0.00
What you probably mean is something different: adding manual labels to sets of coefficients. This is easy to achieve because modelsummary() produces a kableExtra or a gt table, which can be customized in many ways.
https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
For example, you may want to look at the group_rows function from kableExtra:
library(kableExtra)
mod <- lm(mpg ~ cyl + disp + hp, data = mtcars)
modelsummary(mod) |>
group_rows(index = c("Uninteresting" = 4,
"Interesting" = 4,
"Other" = 7))
I JUST found out about this amazing R package, modelsummary.
It doesn't seem like it offers an ability to transpose regression outputs.
I know that you cannot do a transposition within kableExtra, which is my go-to for ordinary table outputs in R. Since modelsummary relies on kableExtra for post-processing, I'm wondering if this is possible. Has anyone else figured it out?
Ideally I'd like to preserve the stars of my regression output.
This is available in Stata (below):
Thanks in advance!
You can flip the order of the terms in the group argument formula. See documentation here and also here for many examples.
library(modelsummary)
mod <- list(
lm(mpg ~ hp, mtcars),
lm(mpg ~ hp + drat, mtcars))
modelsummary(mod, group = model ~ term)
          (Intercept)   hp        drat
Model 1   30.099        -0.068
          (1.634)       (0.010)
Model 2   10.790        -0.052    4.698
          (5.078)       (0.009)   (1.192)
The main problem with this strategy is that there is not (yet) an automatic way to append goodness of fit statistics. So you would probably have to rig something up by creating a data.frame and feeding it to the add_columns argument. For example:
N <- sapply(mod, function(x) get_gof(x)$nobs)
N <- data.frame(N = c(N[1], "", N[2], ""))
modelsummary(mod,
group = model ~ term,
add_columns = N,
align = "lcccc")
          (Intercept)   hp        drat      N
Model 1   30.099        -0.068              32
          (1.634)       (0.010)
Model 2   10.790        -0.052    4.698     32
          (5.078)       (0.009)   (1.192)
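The hand-typed vector c(N[1], "", N[2], "") only works for two models. A hedged sketch of one way to build the same column for any number of models follows; it uses base R's nobs() instead of get_gof() (a substitution made here so the block runs without modelsummary), and the name N_col is purely illustrative.

```r
mod <- list(
  lm(mpg ~ hp, mtcars),
  lm(mpg ~ hp + drat, mtcars))

# Number of observations per model, using base R's nobs()
N <- sapply(mod, nobs)

# Interleave each N with an empty string: one cell for the estimate row,
# one blank cell for the standard-error row underneath it
N_col <- data.frame(N = as.vector(rbind(N, "")), stringsAsFactors = FALSE)
print(N_col)
```

The resulting data.frame has two rows per model, so it lines up with the default estimate-plus-standard-error layout and can be passed to add_columns in the same way as the manual version.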
If you have ideas about the best default behavior for goodness of fit statistics, please file a feature request on GitHub.
Situation
I am fitting a series of evolving regression models. For the purposes of this question, we can think of these models in terms of Model A, Model B, and Model C. All models share at least one same covariate.
I am also fitting these models for two separate years of data. Again, for the purposes of this question, the years will be 2000 and 2010.
In an attempt to simplify the reporting of results, I am attempting to combine the reporting of the regressions into a single table that would have some kind of the following format:
2000 2010
Model A
Coef Ex1
Model B
Coef Ex1
Coef Ex2
Model C
Coef Ex1
Coef Ex2
Coef Ex3
The idea being that someone can look quickly at Coef Ex1 across several models and years.
What Have I Tried
I have tried to achieve the above table using both the R stargazer and kable packages. With stargazer I can get the fully formatted table for a single model formulation across many years (e.g., stargazer(modelA2000, modelA2010)), but I cannot figure out how to stack additional model formulations on the rows.
For kable I have been able to stack horizontal models, but I have not been able to add in additional years (e.g., coefs <- bind_rows(tidy(modelA2000), tidy(modelB2000), tidy(modelC2000)); coefs %>% kable()).
Question: how can I use stargazer or kable to report evolving regression models (which share the same covariates) in the rows but also with year of cross section on the column? I think I can somehow extend the answer posted here, although I'm not sure how.
Reproducible example
# Load the data
mtcars <- mtcars
# Create example results for models A, B, and C for 2000
modelA2000 <- lm(mpg ~ cyl, data = mtcars)
modelB2000 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2000 <- lm(mpg ~ cyl + wt + disp, data = mtcars)
# Slightly modify data for second set of results
mtcars$cyl <- mtcars$cyl*runif(1)
# Fit second set of results. Same models, pretending it's a different year.
modelA2010 <- lm(mpg ~ cyl, data = mtcars)
modelB2010 <- lm(mpg ~ cyl + wt, data = mtcars)
modelC2010 <- lm(mpg ~ cyl + wt + disp, data = mtcars)
Two notes before starting:
You want a pretty "custom" table, so it is almost inevitable that some manual operations will be required.
My answer relies on the development version of modelsummary, which you can install like this:
library(remotes)
install_github("vincentarelbundock/modelsummary")
We will need 4 concepts, many of them related to the broom package:
broom::tidy: a function that takes a statistical model and returns a data.frame of estimates with one row per coefficient.
broom::glance: a function that takes a statistical model and returns a one-row data.frame with model characteristics (e.g., number of observations).
modelsummary_list: a list with 2 elements called "tidy" and "glance", and with a class name of "modelsummary_list".
The modelsummary package allows you to draw regression tables. Under the hood, it uses broom::tidy and broom::glance to extract information from those models. Users can also supply their own information about a model by supplying a list to which we assign the class modelsummary_list, as documented here.
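To make the modelsummary_list idea concrete, here is a minimal sketch that builds one by hand for a single model, using only base R instead of broom (an assumption made to keep the block self-contained). The object names fit, ti, gl and custom are illustrative, and the final call runs only if modelsummary is installed.

```r
fit <- lm(mpg ~ cyl + wt, data = mtcars)
cf  <- summary(fit)$coefficients

# "tidy" part: one row per coefficient, with term and estimate columns
ti <- data.frame(
  term      = rownames(cf),
  estimate  = cf[, "Estimate"],
  std.error = cf[, "Std. Error"],
  p.value   = cf[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE)

# "glance" part: a one-row data.frame of model-level statistics
gl <- data.frame(nobs = nobs(fit), r.squared = summary(fit)$r.squared)

# Bundle both parts and assign the class that modelsummary looks for
custom <- list(tidy = ti, glance = gl)
class(custom) <- "modelsummary_list"

if (requireNamespace("modelsummary", quietly = TRUE)) {
  modelsummary::modelsummary(custom, output = "markdown")
}
```

Because the table is drawn from these two data frames, any extra column added to the tidy part (such as a grouping variable) travels with the estimates.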
EDIT: The recommended way to do this in modelsummary is now to use the group argument. Scroll to the end of this post for illustrative code.
Obsolete example with useful discussion
modelsummary_wide is a function that was initially designed to "stack" results from several models with several groups of coefficients. This is useful for things like multinomial models, but it also helps us in your case, where you have multiple models in multiple groups (here: years).
First, we load packages, tweak the data, and estimate our models:
library(modelsummary)
library(broom)
library(dplyr)
mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)
models <- list(
"A" = list(
lm(mpg ~ cyl, data = mtcars),
lm(mpg ~ cyl, data = mtcars2010)),
"B" = list(
lm(mpg ~ cyl + wt, data = mtcars),
lm(mpg ~ cyl + wt, data = mtcars2010)),
"C" = list(
lm(mpg ~ cyl + wt + disp, data = mtcars),
lm(mpg ~ cyl + wt + disp, data = mtcars2010)))
Notice that we saved our models in three groups, in a list of lists.
Then, we define a tidy_model function that accepts a list of two models (one per year), combines the information on those two models, and creates a modelsummary_list object (again, please refer to the documentation). Note that we assign the "year" information to a "group" column in the tidy object.
We apply this function to each of our three groups of models using lapply.
tidy_model <- function(model_list) {
  # tidy estimates
  tidy_2000 <- broom::tidy(model_list[[1]])
  tidy_2010 <- broom::tidy(model_list[[2]])
  # create a "group" column
  tidy_2000$group <- 2000
  tidy_2010$group <- 2010
  ti <- bind_rows(tidy_2000, tidy_2010)
  # glance estimates
  gl <- data.frame("N" = stats::nobs(model_list[[1]]))
  # output
  out <- list(tidy = ti, glance = gl)
  class(out) <- "modelsummary_list"
  return(out)
}
models <- lapply(models, tidy_model)
Finally, we call modelsummary_wide with the stacking = "vertical" argument to obtain this table:
modelsummary_wide(models, stacking = "vertical")
Of course, the table can be adjusted, coefficients renamed, etc. using the other arguments of the modelsummary_wide function or with kableExtra or some other package supported by the output argument.
More modern example without detailed explanation
library("modelsummary")
library("broom")
library("dplyr") # needed for bind_rows() below
library("quantreg")
mtcars2010 <- mtcars
mtcars2010$cyl <- mtcars$cyl * runif(1)
models <- list(
"A" = list(
"2000" = rq(mpg ~ cyl, data = mtcars),
"2010" = rq(mpg ~ cyl, data = mtcars2010)),
"B" = list(
"2000" = rq(mpg ~ cyl + wt, data = mtcars),
"2010" = rq(mpg ~ cyl + wt, data = mtcars2010)),
"C" = list(
"2000" = rq(mpg ~ cyl + wt + disp, data = mtcars),
"2010" = rq(mpg ~ cyl + wt + disp, data = mtcars2010)))
tidy_model <- function(model_list) {
  # tidy estimates
  tidy_2000 <- broom::tidy(model_list[[1]])
  tidy_2010 <- broom::tidy(model_list[[2]])
  # create a "group" column
  tidy_2000$group <- "2000"
  tidy_2010$group <- "2010"
  ti <- bind_rows(tidy_2000, tidy_2010)
  # glance: number of observations, taken from the fitted values
  gl <- data.frame("nobs 2010" = length(model_list[[1]]$fitted.values))
  # output
  out <- list(tidy = ti, glance = gl)
  class(out) <- "modelsummary_list"
  return(out)
}
models <- lapply(models, tidy_model)
modelsummary(models,
group = model + term ~ group,
statistic = "conf.int")
                    2000                2010
A   (Intercept)     36.800              36.800
                    [30.034, 42.403]    [30.034, 42.403]
    cyl             -2.700              -67.944
                    [-3.465, -1.792]    [-87.204, -45.102]
B   (Intercept)     38.871              38.871
                    [30.972, 42.896]    [30.972, 42.896]
    cyl             -1.743              -43.858
                    [-2.154, -0.535]    [-54.215, -13.472]
    wt              -2.679              -2.679
                    [-5.313, -1.531]    [-5.313, -1.531]
C   (Intercept)     40.683              40.683
                    [31.235, 47.507]    [31.235, 47.507]
    cyl             -1.993              -50.162
                    [-3.137, -1.322]    [-78.948, -33.258]
    wt              -2.937              -2.937
                    [-5.443, -1.362]    [-5.443, -1.362]
    disp            0.003               0.003
                    [-0.009, 0.035]     [-0.009, 0.035]
I have estimated a linear regression model using lm(x ~ y1 + y2 + ... + yn), and to deal with the heteroscedasticity that is present I had R estimate robust standard errors with
coeftest(model, vcov = vcovHC(model, type = "HC0"))
I know that (robust) R squared and F statistic from the "normal" model are still valid, but how do I get R to report them in the output? I want to fuse several regression outputs from different specifications together with stargazer, and it would become very chaotic if I had to enter the non-robust model alongside just to get these statistics. Ideally I want to enter a regression output into stargazer that already contains these statistics.
Thanks in advance for all answers
I don't have a solution with stargazer, but I do have a couple of viable alternatives for regression tables with robust standard errors:
Option 1
Use the modelsummary package to make your tables.
It has a statistic_override argument which allows you to supply a function that calculates a robust variance-covariance matrix (e.g., sandwich::vcovHC).
library(modelsummary)
library(sandwich)
mod1 <- lm(drat ~ mpg, mtcars)
mod2 <- lm(drat ~ mpg + vs, mtcars)
mod3 <- lm(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models, statistic_override = vcovHC)
Note 1: The screenshot above is from an HTML table, but the modelsummary package can also save Word, LaTeX or markdown tables.
Note 2: I am the author of this package, so please treat this as a potentially biased view.
Option 2
Use the estimatr::lm_robust function, which automatically includes robust standard errors. I believe that estimatr is not supported by stargazer, but I know that it is supported by modelsummary.
library(estimatr)
mod1 <- lm_robust(drat ~ mpg, mtcars)
mod2 <- lm_robust(drat ~ mpg + vs, mtcars)
mod3 <- lm_robust(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models)
This is how to go about it. You need to use a model object that is supported by stargazer as a template, and then you can provide a list of standard errors to be used:
library(dplyr)
library(lmtest)
library(sandwich) # provides vcovHC()
library(stargazer)
# Basic Model ---------------------------------------------------------------------------------
model1 <- lm(hp ~ factor(gear) + qsec + cyl + factor(am), data = mtcars)
summary(model1)
# Robust standard Errors ----------------------------------------------------------------------
model_robust <- coeftest(model1, vcov = vcovHC(model1, type = "HC0"))
# Get robust standard Errors (sqrt of diagonal element of variance-covariance matrix)
se = vcovHC(model1, type = "HC0") %>% diag() %>% sqrt()
stargazer(model1, model1,
se = list(NULL, se), type = 'text')
Using this approach you can use stargazer even for model objects that are not supported. You only need coefficients, standard errors and p-values as vectors. Then you can 'mechanically insert' even unsupported models.
One last note: you are correct that R-squared can still be used when heteroskedasticity is present. However, the overall F-test and the t-tests based on the conventional (non-robust) standard errors are NOT valid anymore.
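As a rough illustration of what the robust replacements look like, the sketch below computes HC0 standard errors and a heteroskedasticity-robust Wald test of all slopes by hand in base R. It is a minimal sketch under assumptions: the model mpg ~ wt + hp is a stand-in chosen here, the object names are illustrative, and in practice functions such as sandwich::vcovHC and lmtest::waldtest would normally be used instead of hand-rolled matrix algebra.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
e <- residuals(fit)

# HC0 sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)   # X * e multiplies row i of X by residual e_i
vc_hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(vc_hc0))

# Robust Wald test that all slopes are zero, expressed as an F statistic
slopes <- names(coef(fit)) != "(Intercept)"
b <- coef(fit)[slopes]
q <- length(b)
F_robust <- drop(t(b) %*% solve(vc_hc0[slopes, slopes]) %*% b) / q
p_robust <- pf(F_robust, q, df.residual(fit), lower.tail = FALSE)

print(round(se_hc0, 4))
print(c(F = F_robust, p = p_robust))
```

The vector se_hc0 is the kind of object that can be handed to stargazer through its se argument, as in the code above, while the robust F statistic would have to be added to the table manually.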
I have a dataset that is missing values. I imputed using the mice package and ran my linear model using lm and pool for the results. I only get unstandardized beta weights. Is there a way to get standardized beta weights?
There are two ways that I know of to do this (there may be more):
1) First method:
You first need to scale your data. Assuming you have already imputed your data, you can proceed as follows:
A toy example:
mtcars1 <- mtcars[,c("mpg", "disp", "hp", "wt", "qsec", "drat")]
mtcars_scaled <- data.frame(sapply(mtcars1, scale), stringsAsFactors=F) ##scaling for standardization,
model_fit_st <- lm(mpg ~ disp + wt + drat, data=mtcars_scaled)
Here model_fit_st is your standardized result. It still reports an intercept (lm fits one by default; on scaled data it is essentially zero), but if you compare the slope coefficients with those from QuantPsyc::lm.beta, the values will match.
2) Second Method:
Once you have installed the QuantPsyc package, QuantPsyc::lm.beta can be used to generate standardized betas directly, as below.
QuantPsyc::lm.beta(lm(mpg ~ disp + wt + drat, data=mtcars))
Of course, apart from the intercept (which has no meaning for standardized betas), the two sets of results (scaling and QuantPsyc) match.
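The equivalence between the two routes can be checked without any extra package, since a standardized beta is the raw slope multiplied by sd(x) / sd(y). The following is a small sketch on the same toy model; the object names are illustrative, and it does not address pooling across imputed datasets.

```r
vars <- c("mpg", "disp", "wt", "drat")

# Route 1: z-score every variable, then fit the model
scaled <- as.data.frame(scale(mtcars[vars]))
beta_scaled <- coef(lm(mpg ~ disp + wt + drat, data = scaled))[-1]

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y)
b <- coef(lm(mpg ~ disp + wt + drat, data = mtcars))[-1]
beta_formula <- b * sapply(mtcars[names(b)], sd) / sd(mtcars$mpg)

print(round(cbind(beta_scaled, beta_formula), 4))
print(all.equal(beta_scaled, beta_formula))
```

Both columns agree up to numerical tolerance, which is why the intercept can simply be ignored when reporting standardized coefficients.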