How to build a summary table of glm's parameters and AICcWt in R

I have a dataset that I am using to build generalised linear models. The response variable is binary (absence/presence) and the explanatory variables are categorical.
CODE
library(tidyverse)
library(AICcmodavg)
# Data
set.seed(123)
t <- tibble(ID = 1:100,
            A = as.factor(sample(c(0, 1), 100, replace = TRUE)),
            B = as.factor(sample(c("black", "white"), 100, replace = TRUE)),
            C = as.factor(sample(c("pos", "neg", "either"), 100, replace = TRUE)))
# Candidate set of models - Binomial family because response variable
# is binary (0 for absent & 1 for present)
# Global model is A ~ B_black + C_either
m1 <- glm(A ~ 1, binomial, t)
m2 <- glm(A ~ B, binomial, t)
m3 <- glm(A ~ C, binomial, t)
m4 <- glm(A ~ B + C, binomial, t)
# List with all models
ms <- list(null = m1, m_B = m2, m_C = m3, m_BC = m4)
# Summary table
aic_tbl <- aictab(ms)
PROBLEM
I want to build a table that summarises the coefficients, standard errors, and Akaike weights of the models within my candidate set (the desired layout was shown as an image in the original post).
QUESTION
Can anyone suggest how to best build this table using my list of models and AIC table?

Just to point it out: broom gets you halfway there by turning the model output into a data frame, which you can then reshape.
library(broom)
bind_rows(lapply(ms, tidy), .id="key")
    key        term    estimate std.error statistic p.value
1  null (Intercept)      -0.120     0.200    -0.600   0.549
2   m_B (Intercept)       0.000     0.283     0.000   1.000
3   m_B      Bwhite      -0.241     0.401    -0.601   0.548
4   m_C (Intercept)      -0.480     0.353    -1.359   0.174
5   m_C        Cneg       0.805     0.507     1.588   0.112
6   m_C        Cpos       0.308     0.490     0.629   0.530
7  m_BC (Intercept)      -0.363     0.399    -0.910   0.363
8  m_BC      Bwhite      -0.251     0.408    -0.615   0.538
9  m_BC        Cneg       0.811     0.508     1.597   0.110
10 m_BC        Cpos       0.327     0.492     0.665   0.506
And if you insist on the layout of your table, I came up with the following (arguably clumsy) way of rearranging everything:
out <- bind_rows(lapply(ms, tidy), .id = "mod")
# One wide table per statistic (models in rows), then transpose so models become columns
t1 <- out %>% select(mod, term, estimate) %>% spread(term, estimate) %>% base::t
t2 <- out %>% select(mod, term, std.error) %>% spread(term, std.error) %>% base::t
rownames(t2) <- paste0(rownames(t2), "_std_e")
# Stack estimates and standard errors; the first row still holds the model names
tmp <- rbind(t1, t2[-1, ])
new_t <- as.data.frame(tmp[-1, ])
colnames(new_t) <- tmp[1, ]
new_t
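That covers the coefficients and standard errors; for the Akaike-weight row you asked about, here is a minimal sketch, assuming the standard Modnames/AICcWt columns that AICcmodavg::aictab() returns (Modnames defaults to names(ms) for a named list):
# Sketch: append each model's Akaike weight to the reshaped table
wts <- setNames(aic_tbl$AICcWt, as.character(aic_tbl$Modnames))
new_t["AICcWt", ] <- round(wts[colnames(new_t)], 3)
new_t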
Alternatively, you may want to familiarise yourself with packages that are meant to display model output for publication, e.g. texreg or stargazer come to mind:
library(texreg)
screenreg(ms)
==================================================
                null       m_B        m_C        m_BC
--------------------------------------------------
(Intercept)     -0.12       0.00      -0.48      -0.36
                (0.20)     (0.28)     (0.35)     (0.40)
Bwhite                     -0.24                 -0.25
                           (0.40)                (0.41)
Cneg                                   0.80       0.81
                                      (0.51)     (0.51)
Cpos                                   0.31       0.33
                                      (0.49)     (0.49)
--------------------------------------------------
AIC            140.27     141.91     141.66     143.28
BIC            142.87     147.12     149.48     153.70
Log Likelihood -69.13     -68.95     -67.83     -67.64
Deviance       138.27     137.91     135.66     135.28
Num. obs.      100        100        100        100
==================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
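If you go the texreg route, recent texreg versions have (to my knowledge) a custom.gof.rows argument that can carry the Akaike weights in the goodness-of-fit block; a sketch, reusing wts from above:
# Sketch: show the aictab() AICc weights as a custom GOF row
# (assumes a texreg version that supports custom.gof.rows)
screenreg(ms, custom.gof.rows = list("AICc weight" = round(unname(wts[names(ms)]), 3)))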

Related

Visualize Generalized Additive Model (GAM) in R

I want to achieve a GAM plot that looks like this
Image from https://stats.stackexchange.com/questions/179947/statistical-differences-between-two-hourly-patterns/446048#446048
How can I accomplish this?
Model is
model = gam(y ~ s(t) + g, data = d)
The general way to do this is to compute model estimates (fitted values) over the range of the covariate(s) of interest for each group. The reproducible example below illustrates one way to do this, using {mgcv} to fit the GAM and my {gratia} package for some helper functions that facilitate the process.
library("gratia")
library("mgcv")
library("ggplot2")
eg_data <- data_sim("eg4", n = 400, dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ s(x2) + fac, data = eg_data, method = "REML")
ds <- data_slice(m, x2 = evenly(x2, n = 100), fac = evenly(fac))
fv <- fitted_values(m, data = ds)
The last line gets you fitted values from the model at the covariate combinations specified in the data slice:
> fv
# A tibble: 300 × 6
x2 fac fitted se lower upper
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 0.00131 1 -1.05 0.559 -2.15 0.0412
2 0.00131 2 -3.35 0.563 -4.45 -2.25
3 0.00131 3 1.13 0.557 0.0395 2.22
4 0.0114 1 -0.849 0.515 -1.86 0.160
5 0.0114 2 -3.14 0.519 -4.16 -2.13
6 0.0114 3 1.34 0.513 0.332 2.34
7 0.0215 1 -0.642 0.474 -1.57 0.287
8 0.0215 2 -2.94 0.480 -3.88 -2.00
9 0.0215 3 1.54 0.473 0.616 2.47
10 0.0316 1 -0.437 0.439 -1.30 0.424
# … with 290 more rows
# ℹ Use `print(n = ...)` to see more rows
This object is in a form suitable for plotting with ggplot():
fv |>
  ggplot(aes(x = x2, y = fitted, colour = fac)) +
  geom_point(data = eg_data, mapping = aes(y = y), size = 0.5) +
  geom_ribbon(aes(x = x2, ymin = lower, ymax = upper, fill = fac, colour = NULL),
              alpha = 0.2) +
  geom_line()
which produces the plot of fitted curves with their confidence ribbons (image not reproduced here).
You can enhance and/or modify this using your ggplot skills.
The basic point with this model is that you have a common smooth effect of a covariate (here x2) plus group means (for the factor fac). Hence the curves are "parallel".
Note that there's a lot of variation around the estimated curves in this model because the simulated data are from a richer model with group-specific smooths and smooth effects of other covariates.
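If you instead want non-parallel, group-specific curves, a sketch using mgcv's standard factor-by smooth syntax:
# Sketch: a factor 'by' smooth gives each level of fac its own smooth of x2,
# so the fitted curves are no longer constrained to be parallel
m2 <- gam(y ~ fac + s(x2, by = fac), data = eg_data, method = "REML")
fv2 <- fitted_values(m2, data = ds)  # the same data slice and plotting code apply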
gg.bs30 <- ggplot(data, aes(x = Predictor, y = Output, col = class)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ splines::bs(x, 30)) +
  facet_grid(class ~ .)
print(gg.bs30)
Code from https://github.com/mariocastro73/ML2020-2021/blob/master/scripts/gams-with-ggplot-classes.R

Extracting the p-value from an lmerTest model inside a loop function

I have a number of linear mixed models, which I have fitted with the lmerTest library so that the summary() of the model provides me with p-values of fixed effects.
I have written a loop function that extracts the fixed effects of gender:time and of gender:time:explanatory variable of interest.
I am now trying to also extract the p-value of the gender:time fixed effect (step 1) and of the gender:time:explanatory variable (step 2).
Normally I can extract the p-value with this code:
coef(summary(model))[,5]["genderfemale:time"]
But inside the loop function it doesn't work and gives the error: "Error in coef(summary(model))[, 5] : subscript out of bounds"
See code
library(lmerTest)
library(dplyr)  # for bind_rows() used in the loop below
# Create a list of models with interaction terms to loop over
models <- list(
  mixed_age_interaction,
  mixed_tnfi_year_interaction,
  mixed_crp_interaction
)
# Create a list of explanatory variables to loop over
explanatoryVariables <- list(
  "age_at_diagnosis",
  "bio_drug_start_year",
  "crp"
)
loop_function <- function(models, explanatoryVariables) {
  # Create an empty data frame to store the results
  coef_df <- data.frame(adj_coef_gender_sex = numeric(),
                        coef_interaction_term = numeric(),
                        explanatory_variable = character(),
                        adj_coef_pvalue = numeric())
  # Loop over the models and explanatory variables
  for (i in seq_along(models)) {
    model <- models[[i]]
    explanatoryVariable <- explanatoryVariables[[i]]
    # Extract the adjusted coefficients for the gender*time interaction
    adj_coef <- fixef(model)["genderfemale:time"]
    # Extract the fixed effect of the interaction term
    interaction_coef <- fixef(model)[paste0("genderfemale:time:", explanatoryVariable)]
    # Extract the p-value for the adjusted coefficient for gender*time
    adj_coef_pvalue <- coef(summary(model))[, 5]["genderfemale:time"]
    # Add a row to the data frame with the results for this model
    coef_df <- bind_rows(coef_df,
                         data.frame(adj_coef_gender_sex = adj_coef,
                                    coef_interaction_term = interaction_coef,
                                    explanatory_variable = explanatoryVariable,
                                    adj_coef_pvalue = adj_coef_pvalue))
  }
  return(coef_df)
}
# Loop over the models and extract the fixed effects
coef_df <- loop_function(models, explanatoryVariables)
coef_df
My question is how can I extract the p-values from the models for gender:time and gender:time:explanatory variable and add them to the final data.frame coef_df?
Also adding a summary of one of the models for reference
Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: basdai ~ 1 + gender + time + age_at_diagnosis + gender * time +
time * age_at_diagnosis + gender * age_at_diagnosis + gender *
time * age_at_diagnosis + (1 | ID) + (1 | country)
Data: dat
AIC BIC logLik deviance df.resid
254340.9 254431.8 -127159.5 254318.9 28557
Scaled residuals:
Min 1Q Median 3Q Max
-3.3170 -0.6463 -0.0233 0.6092 4.3180
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 154.62 12.434
country (Intercept) 32.44 5.695
Residual 316.74 17.797
Number of obs: 28568, groups: ID, 11207; country, 13
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.669e+01 1.792e+00 2.082e+01 26.048 < 2e-16 ***
genderfemale 2.368e+00 1.308e+00 1.999e+04 1.810 0.0703 .
time -1.451e+01 4.220e-01 2.164e+04 -34.382 < 2e-16 ***
age_at_diagnosis 9.907e-02 2.220e-02 1.963e+04 4.463 8.12e-06 ***
genderfemale:time 1.431e-01 7.391e-01 2.262e+04 0.194 0.8464
time:age_at_diagnosis 8.188e-02 1.172e-02 2.185e+04 6.986 2.90e-12 ***
genderfemale:age_at_diagnosis 8.547e-02 3.453e-02 2.006e+04 2.476 0.0133 *
genderfemale:time:age_at_diagnosis 4.852e-03 1.967e-02 2.274e+04 0.247 0.8052
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) gndrfm time ag_t_d gndrf: tm:g__ gnd:__
genderfemal -0.280
time -0.241 0.331
age_t_dgnss -0.434 0.587 0.511
gendrfml:tm 0.139 -0.519 -0.570 -0.293
tm:g_t_dgns 0.228 -0.313 -0.951 -0.533 0.543
gndrfml:g__ 0.276 -0.953 -0.329 -0.639 0.495 0.343
gndrfml::__ -0.137 0.491 0.567 0.319 -0.954 -0.596 -0.516
The internal function get_coefmat of {lmerTest} might be handy:
if fm is an example {lmer} model ...
library("lmerTest")
fm <- lmer(Informed.liking ~ Gender + Information * Product +
             (1 | Consumer) + (1 | Consumer:Product),
           data = ham)
... you can obtain the coefficients including p-values as a dataframe like so (note the triple colon to expose the internal function):
df_coeff <- lmerTest:::get_coefmat(fm) |>
as.data.frame()
output:
## > df_coeff
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.8490289 0.2842897 322.3364 20.5741844 1.173089e-60
## Gender2 -0.2442835 0.2605644 79.0000 -0.9375169 3.513501e-01
## Information2 0.1604938 0.2029095 320.0000 0.7909626 4.295517e-01
## Product2 -0.8271605 0.3453291 339.5123 -2.3952818 1.714885e-02
## Product3 0.1481481 0.3453291 339.5123 0.4290057 6.681912e-01
## ...
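To tie this back to your loop: a sketch of how the two p-values could be extracted inside loop_function(), assuming the term names match your summary output exactly:
# Sketch: index the coefficient matrix by row and column *name* instead of
# column position 5, which avoids the "subscript out of bounds" error when a
# model's summary lacks the expected fifth column
cm <- as.data.frame(lmerTest:::get_coefmat(model))
adj_coef_pvalue <- cm["genderfemale:time", "Pr(>|t|)"]
interaction_pvalue <- cm[paste0("genderfemale:time:", explanatoryVariable), "Pr(>|t|)"]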
Edit
Here's a snippet which will return the extracted coefficients for, e.g., models m1 and m2 as a combined dataframe:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
list('m1', 'm2') |>  ## observe the quotes
  map_dfr(~ list(
    model = .x,
    coeff = lmerTest:::get_coefmat(get(.x)) |>
      as.data.frame() |>
      rownames_to_column()
  )) |>
  as_tibble() |>
  unnest_wider(coeff)
output:
## + # A tibble: 18 x 7
## model rowname Estimate `Std. Error` df `t value` `Pr(>|t|)`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 m1 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 2 m1 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...
## 4 m1 Product2 -0.827 0.345 340. -2.40 1.71e- 2
## ...
## 8 m1 Information2:Product3 0.272 0.287 320. 0.946 3.45e- 1
## ...
## 10 m2 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 11 m2 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...

Simple slope estimates differ between emmeans package and interactions package?

I computed simple slopes for an interaction with the sim_slopes() function from the interactions package and with the emtrends() function from the emmeans package, and the results (both the estimates and the standard errors) differ slightly, even though both computations are based on the same linear model (fitted with lm()). Is there an explanation for this? I've pasted the code and output below. x is a continuous variable and z is a categorical variable with three levels.
model1 <- lm(DV ~ z * x, data = data)
> sim_slopes(model1, pred = x, modx = z, johnson_neyman = FALSE)
SIMPLE SLOPES ANALYSIS
Slope of x when z = 3:
Est. S.E. t val. p
------ ------ -------- ------
0.50 0.10 4.89 0.00
Slope of x when z = 2:
Est. S.E. t val. p
------ ------ -------- ------
0.74 0.09 7.83 0.00
Slope of x when z = 1:
Est. S.E. t val. p
------ ------ -------- ------
0.33 0.10 3.37 0.00
> emtrends(model1, ~ z, var="x")
NOTE: Results may be misleading due to involvement in interactions
z x.trend SE df lower.CL upper.CL
1 0.290 0.0669 1016 0.158 0.421
2 0.618 0.0611 1016 0.498 0.738
3 0.411 0.0612 1016 0.291 0.531
I tried to reproduce your results.
I modelled an interaction between z and x using the iris data, where
DV <- iris$Sepal.Length
z <- iris$Species # z is a categorical variable with 3 levels : setosa, virginica, and versicolor
x <- iris$Petal.Length #x is a continuous variable.
The resulting slope estimates and corresponding standard errors from sim_slopes and emtrends are almost identical.
model1 <- lm(DV ~ z * x, data = iris)
sim_slopes(model1, pred = x, modx = z, johnson_neyman = FALSE)
#SIMPLE SLOPES ANALYSIS
#Slope of x when z = virginica:
# Est. S.E. t val. p
#------ ------ -------- ------
# 1.00 0.09 11.43 0.00
# Slope of x when z = versicolor:
# Est. S.E. t val. p
#------ ------ -------- ------
# 0.83 0.10 8.10 0.00
# Slope of x when z = setosa:
# Est. S.E. t val. p
# ------ ------ -------- ------
# 0.54 0.28 1.96 0.05
emtrends(model1, ~ z, var="x")
# z x.trend SE df lower.CL upper.CL
# setosa 0.542 0.2768 144 -0.00476 1.09
# versicolor 0.828 0.1023 144 0.62611 1.03
# virginica 0.996 0.0871 144 0.82360 1.17
# Confidence level used: 0.95
However, the warning message NOTE: Results may be misleading due to involvement in interactions does not appear in my trial.
This result indicates that both packages use the same method to compute the slopes. Given the warning that appeared in your output, I suggest you check your z and x data again and reconsider how sensible the interaction between these variables is.
My suggestion is based on the explanation of the emmeans author here:
https://mran.microsoft.com/snapshot/2018-03-30/web/packages/emmeans/vignettes/interactions.html
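For what it's worth, a hedged guess at how that NOTE typically arises: if the real model contains a further interaction involving z beyond z:x (below, an invented covariate w), emtrends() averages over that extra term and flags the result, while sim_slopes() may condition on different values, which would explain small discrepancies:
# Sketch with an invented covariate w (not in the original question):
model2 <- lm(DV ~ z * x + z * w, data = data)
emtrends(model2, ~ z, var = "x")  # averaging over w triggers the NOTE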

How to run multiple linear regressions with different independent variables and dependent variables adding standardized coefficients in R?

I'm currently trying to run a loop performing linear regressions for multiple independent variables (n = 6) with multiple dependent variables (n = 1000).
Here is some example data, with age, sex, and education representing my independent variables of interest and testscore_* being my dependent variables.
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age = as.numeric(c('56', '43', '59', '74', '61', '62', '69', '80', '40', '55', '58')),
                 sex = as.numeric(c('0', '1', '0', '0', '1', '1', '0', '1', '0', '1', '0')),
                 testscore_1 = as.numeric(c('23', '28', '30', '15', '7', '18', '29', '27', '14', '22', '24')),
                 testscore_2 = as.numeric(c('1', '3', '2', '5', '8', '2', '5', '6', '7', '8', '2')),
                 testscore_3 = as.numeric(c('18', '20', '19', '15', '20', '23', '19', '25', '10', '14', '12')),
                 education = as.numeric(c('5', '4', '3', '5', '2', '1', '4', '4', '3', '5', '2')))
I have working code that allows me to run a regression model for multiple DVs (which I'm sure more experienced R users will dislike for its lack of efficiency):
library(lm.beta)  # for standardized coefficients
y <- as.matrix(df[4:6])
# model for age
lm_results <- lm(y ~ age, data = df)
write.csv(broom::tidy(lm_results), "lm_results_age.csv")
regression_results <- broom::tidy(lm_results)
standardized_coefficients <- lm.beta(lm_results)
age_standardize_results <- coef(standardized_coefficients)
write.csv(age_standardize_results, "lm_results_age_standardized_coefficients.csv")
I would then repeat all of this by manually replacing age with sex and education.
Does anyone have a more elegant way of running this, for example a loop over all IVs of interest (i.e. age, sex and education)?
I would also greatly appreciate suggestions for a quick way of combining broom::tidy(lm_results) with the standardized coefficients from lm.beta::lm.beta, i.e. combining the standardized regression coefficients with the main model output.
This is an adaptation of a similar workflow I had to use in the past. Remember to really penalize yourself for running a crazy number of models. I added a couple of predictor columns to your dataframe. Good luck!!
Solution:
# Creating predictor and outcome vectors
ivs_vec <- names(df)[c(2:6, 10)]
dvs_vec <- names(df)[7:9]
# Creating formulas and running the models
ivs <- paste0(" ~ ", ivs_vec)
dvs_ivs <- unlist(lapply(ivs, function(x) paste0(dvs_vec, x)))
formulas <- lapply(dvs_ivs, formula)
lm_results <- lapply(formulas, function(x) {
  lm(x, data = df)
})
# Creating / combining results
tidy_results <- lapply(lm_results, broom::tidy)
dv_list <- lapply(as.list(stringi::stri_extract_first_words(dvs_ivs)), rep, 2)
tidy_results <- Map(cbind, dv_list, tidy_results)
standardized_results <- lapply(lm_results, function(x) coef(lm.beta::lm.beta(x)))
combined_results <- Map(cbind, tidy_results, standardized_results)
# Cleaning up final results
names(combined_results) <- dvs_ivs
combined_results <- lapply(combined_results, function(x) {row.names(x) <- c(NULL); x})
new_names <- c("Outcome", "Term", "Estimate", "Std. Error", "Statistic", "P-value", "Standardized Estimate")
combined_results <- lapply(combined_results, setNames, new_names)
Results:
combined_results[1:5]
$`testscore_1 ~ age`
      Outcome        Term    Estimate Std. Error Statistic   P-value Standardized Estimate
1 testscore_1 (Intercept) 18.06027731 12.3493569 1.4624468 0.1776424            0.00000000
2 testscore_1         age  0.05835152  0.2031295 0.2872627 0.7804155            0.09531823
$`testscore_2 ~ age`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_2 (Intercept) 3.63788676 4.39014570 0.8286483 0.4287311 0.0000000
2 testscore_2 age 0.01367313 0.07221171 0.1893478 0.8540216 0.0629906
$`testscore_3 ~ age`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_3 (Intercept) 6.1215175 6.698083 0.9139208 0.3845886 0.0000000
2 testscore_3 age 0.1943125 0.110174 1.7636870 0.1116119 0.5068026
$`testscore_1 ~ sex`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_1 (Intercept) 22.5 3.099283 7.2597435 4.766069e-05 0.0000000
2 testscore_1 sex -2.1 4.596980 -0.4568217 6.586248e-01 -0.1505386
$`testscore_2 ~ sex`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_2 (Intercept) 3.666667 1.041129 3.521816 0.006496884 0.0000000
2 testscore_2 sex 1.733333 1.544245 1.122447 0.290723029 0.3504247
Data:
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age = as.numeric(c('56', '43', '59', '74', '61', '62', '69', '80', '40', '55', '58')),
                 sex = as.numeric(c('0', '1', '0', '0', '1', '1', '0', '1', '0', '1', '0')),
                 pred1 = sample(1:11, 11),
                 pred2 = sample(1:11, 11),
                 pred3 = sample(1:11, 11),
                 testscore_1 = as.numeric(c('23', '28', '30', '15', '7', '18', '29', '27', '14', '22', '24')),
                 testscore_2 = as.numeric(c('1', '3', '2', '5', '8', '2', '5', '6', '7', '8', '2')),
                 testscore_3 = as.numeric(c('18', '20', '19', '15', '20', '23', '19', '25', '10', '14', '12')),
                 education = as.numeric(c('5', '4', '3', '5', '2', '1', '4', '4', '3', '5', '2')))
Stumbled upon this a year later; documenting a tidyverse solution with the same data as @Andrew.
library(dplyr)
library(purrr)
library(tidyr)
library(stringi)
# Creating predictor and outcome vectors
ivs_vec <- names(df)[c(2:6, 10)]
dvs_vec <- names(df)[7:9]
# Creating formulas and running the models
ivs <- paste0(" ~ ", ivs_vec)
dvs_ivs <- unlist(map(ivs, ~paste0(dvs_vec, .x)))
models <- map(setNames(dvs_ivs, dvs_ivs),
              ~ lm(formula = as.formula(.x), data = df))
basics <-
  map(models, ~ broom::tidy(.)) %>%
  map2_df(., names(.), ~ mutate(.x, which_dependent = .y)) %>%
  select(which_dependent, everything()) %>%
  mutate(term = gsub("\\(Intercept\\)", "Intercept", term),
         which_dependent = stringi::stri_extract_first_words(which_dependent))
basics$std_estimate <-
  map_dfr(models, ~ coef(lm.beta::lm.beta(.)), .id = "which_dependent") %>%
  pivot_longer(cols = -which_dependent,
               names_to = "term",
               values_to = "std_estimate",
               values_drop_na = TRUE) %>%
  pull(std_estimate)
basics
#> # A tibble: 36 x 7
#> which_dependent term estimate std.error statistic p.value std_estimate
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 testscore_1 Intercept 18.1 12.3 1.46 0.178 0
#> 2 testscore_1 age 0.0584 0.203 0.287 0.780 0.0953
#> 3 testscore_2 Intercept 3.64 4.39 0.829 0.429 0
#> 4 testscore_2 age 0.0137 0.0722 0.189 0.854 0.0630
#> 5 testscore_3 Intercept 6.12 6.70 0.914 0.385 0
#> 6 testscore_3 age 0.194 0.110 1.76 0.112 0.507
#> 7 testscore_1 Intercept 22.5 3.10 7.26 0.0000477 0
#> 8 testscore_1 sex -2.10 4.60 -0.457 0.659 -0.151
#> 9 testscore_2 Intercept 3.67 1.04 3.52 0.00650 0
#> 10 testscore_2 sex 1.73 1.54 1.12 0.291 0.350
#> # … with 26 more rows

Formatted latex regression tables with multiple models from broom output?

I have several models such as the example below for which I have estimates, standard errors, p-values, r2 etc. as data.frames in tidy format, but I don't have the original model objects (analysis was run on a different machine).
require(broom)
model <- lm(mpg ~ hp + cyl, mtcars)
tidy_model <- tidy(model)
glance_model <- glance(model)
# tidy_model
# # A tibble: 3 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 36.9 2.19 16.8 1.62e-16
# 2 hp -0.0191 0.0150 -1.27 2.13e- 1
# 3 cyl -2.26 0.576 -3.93 4.80e- 4
# glance(model)
# # A tibble: 1 x 11
# r.squared adj.r.squared sigma ...
# * <dbl> <dbl> <dbl> ...
# 1 0.760 0.743 3.06 ...
There exist several packages (e.g. stargazer or texreg) which transform one or more model objects (lm, glm, etc.) into well-formatted regression tables side-by-side, see below for an example of texreg:
require(texreg)
# note: model1 is a placeholder here; the example output below corresponds to a different fit (mpg ~ cyl + disp)
screenreg(list(model1, model1))
# =================================
# Model 1 Model 2
# ---------------------------------
# (Intercept) 34.66 *** 34.66 ***
# (2.55) (2.55)
# cyl -1.59 * -1.59 *
# (0.71) (0.71)
# disp -0.02 -0.02
# (0.01) (0.01)
# ---------------------------------
# R^2 0.76 0.76
# Adj. R^2 0.74 0.74
# Num. obs. 32 32
# RMSE 3.06 3.06
# =================================
# *** p < 0.001, ** p < 0.01, * p < 0.05
Is there a similar package that uses tidy estimation results produced with broom as inputs rather than model objects to produce a table such as the above example?
Is there a similar package that uses tidy estimation results produced with broom as inputs
Not to my knowledge, but stargazer allows you to use custom inputs to generate regression tables. This allows us to create "fake" shell tables that we can populate with values from the tidy table. Using your example
# create fake models
dat <- lapply(tidy_model$term, function(...) rnorm(10))
dat <- as.data.frame(setNames(dat, c("mpg", tidy_model$term[-1])))
f <- as.formula(paste("mpg ~", paste(tidy_model$term[-1], collapse = " + ")))
fit <- lm(f, dat)
# set up model statistics
fit_stats <- data.frame(labels = names(glance_model),
                        mod1 = round(unlist(glance_model), 3),
                        mod2 = round(unlist(glance_model), 3),
                        row.names = NULL,
                        stringsAsFactors = FALSE)
We can then feed these values into stargazer:
library(stargazer)
stargazer(fit, fit, type = "text",
          coef = list(tidy_model$estimate, tidy_model$estimate),
          se = list(tidy_model$std.error, tidy_model$std.error),
          add.lines = lapply(1:nrow(fit_stats), function(i) unlist(fit_stats[i, ])),
          omit.table.layout = "s")
# ==========================================
# Dependent variable:
# ----------------------------
# mpg
# (1) (2)
# ------------------------------------------
# hp -0.019 -0.019
# (0.015) (0.015)
# cyl -2.265*** -2.265***
# (0.576) (0.576)
# Constant 36.908*** 36.908***
# (2.191) (2.191)
# ------------------------------------------
# r.squared 0.741 0.741
# adj.r.squared 0.723 0.723
# sigma 3.173 3.173
# statistic 41.422 41.422
# p.value 0 0
# df 3 3
# logLik -80.781 -80.781
# AIC 169.562 169.562
# BIC 175.425 175.425
# deviance 291.975 291.975
# df.residual 29 29
# ==========================================
# Note: *p<0.1; **p<0.05; ***p<0.01
I had another look at texreg, inspired by this answer, and there is a more native way to do this than the previous answer: defining an additional extraction method for texreg that consumes the tidy/glance data frames directly:
extract_broom <- function(tidy_model, glance_model) {
  # get estimates / standard errors from tidy
  coef <- tidy_model$estimate
  coef.names <- as.character(tidy_model$term)
  se <- tidy_model$std.error
  pvalues <- tidy_model$p.value
  # get goodness-of-fit statistics from glance, flattened to name/value pairs
  glance_transposed <- tibble::tibble(name = names(glance_model),
                                      value = unlist(glance_model))
  gof.names <- as.character(glance_transposed$name)
  gof <- as.double(glance_transposed$value)
  gof.decimal <- gof %% 1 > 0  # integer-valued statistics get no decimals
  tr_object <- texreg::createTexreg(coef.names = coef.names,
                                    coef = coef,
                                    se = se,
                                    pvalues = pvalues,
                                    gof.names = gof.names,
                                    gof = gof,
                                    gof.decimal = gof.decimal)
  return(tr_object)
}
This results in the following output:
texreg_model <- extract_broom(tidy_model, glance_model)
screenreg(list(texreg_model, texreg_model))
# =====================================
# Model 1 Model 2
# -------------------------------------
# (Intercept) 36.91 *** 36.91 ***
# (2.19) (2.19)
# hp -0.02 -0.02
# (0.02) (0.02)
# cyl -2.26 *** -2.26 ***
# (0.58) (0.58)
# -------------------------------------
# r.squared 0.74 0.74
# adj.r.squared 0.72 0.72
# sigma 3.17 3.17
# statistic 41.42 41.42
# p.value 0.00 0.00
# df 3 3
# logLik -80.78 -80.78
# AIC 169.56 169.56
# BIC 175.42 175.42
# deviance 291.97 291.97
# df.residual 29 29
# =====================================
# *** p < 0.001, ** p < 0.01, * p < 0.05
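And since the goal is LaTeX output: the same objects can be passed to texreg() (or htmlreg() for HTML) instead of screenreg() to get the LaTeX table itself, e.g.:
# Render as LaTeX instead of the text preview above
texreg(list(texreg_model, texreg_model),
       caption = "Models rebuilt from tidy/glance output",
       label = "tab:broom-models")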
