Can tbl_summary() be customized to display significance stars in the footer?

The gtsummary package in R has a neat new function, add_significance_stars(), which adds significance stars to coefficient estimates with small p-values in regression models. However, this function only operates on tbl_regression or tbl_uvregression objects.
Is there a similar method that can be applied to a tbl_summary object so that stars mark the significant summary statistics?
library(tidyverse)
library(gtsummary)
This is a tbl_summary object with p-values displayed in a column:
mtcars %>%
  select(gear, mpg, disp, hp, wt) %>%
  tbl_summary(by = "gear") %>%
  add_p()
[screenshot of the tbl_summary table]
This is a tbl_regression object with p-values displayed in the desired fashion in the footer:
mtcars %>%
  select(gear, mpg, disp, hp, wt) %>%
  lm(formula = gear ~ mpg + disp + hp + wt) %>%
  tbl_regression(intercept = TRUE) %>%
  add_significance_stars()
[screenshot of the tbl_regression table]

The purpose of add_significance_stars() is to replace the p-values with stars. If you'd like to add stars to the p-values in a tbl_summary(), you can define a function that appends stars to significant p-values. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# format p-values with gtsummary::style_pvalue() and append stars at the usual cutoffs
fmt_pvalue_with_stars <- function(x) {
  dplyr::case_when(
    x < 0.001 ~ paste0(style_pvalue(x), "***"),
    x < 0.01 ~ paste0(style_pvalue(x), "**"),
    x < 0.05 ~ paste0(style_pvalue(x), "*"),
    TRUE ~ style_pvalue(x)
  )
}

tbl <-
  mtcars %>%
  select(am, hp, cyl) %>%
  tbl_summary(by = am) %>%
  add_p(pvalue_fun = fmt_pvalue_with_stars) %>%
  modify_footnote(p.value ~ "*p<0.05; **p<0.01; ***p<0.001")
Created on 2021-04-24 by the reprex package (v2.0.0)
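You can also sanity-check the formatter on its own by calling it directly on a vector of p-values (a quick illustration):
# stars are appended according to the 0.05 / 0.01 / 0.001 cutoffs
fmt_pvalue_with_stars(c(0.0004, 0.02, 0.2))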

Related

How to modify variable labels in gtsummary table

As recommended in the tutorial for gtsummary's tbl_regression function, I am using the labelled package to assign attribute labels to my regression variables. However, when my regression formula includes a quadratic term, the resulting table includes the same variable label twice:
library(gtsummary)
library(labelled)
library(tidyverse)

df <- as_tibble(mtcars)
var_label(df) <- list(disp = "Displacement", vs = "Engine type")

c("disp", "disp + I(disp^2)") %>%
  map(
    ~ paste("vs", .x, sep = " ~ ") %>%
      as.formula() %>%
      glm(data = df,
          family = binomial(link = "logit")) %>%
      tbl_regression(exponentiate = TRUE)) %>%
  tbl_merge()
Is there a way to modify the label for the quadratic term in this case?
If you assign the labels inside the tbl_regression() function, you'll get what you want:
library(gtsummary)

c("disp", "disp + I(disp^2)") %>%
  purrr::map(
    ~ paste("vs", .x, sep = " ~ ") %>%
      as.formula() %>%
      glm(data = mtcars, family = binomial(link = "logit")) %>%
      tbl_regression(
        exponentiate = TRUE,
        label = list(
          disp = "Displacement",
          `I(disp^2)` = "Displacement^2"
        )
      )
  ) %>%
  tbl_merge() %>%
  as_kable()
#> ✖ `I(disp^2)` terms have not been found in `x`.
|Characteristic |OR   |95% CI     |p-value |OR   |95% CI     |p-value |
|:--------------|:----|:----------|:-------|:----|:----------|:-------|
|Displacement   |0.98 |0.96, 0.99 |0.002   |0.99 |0.92, 1.07 |0.8     |
|Displacement^2 |     |           |        |1.00 |1.00, 1.00 |0.8     |
Created on 2022-09-19 with reprex v2.0.2

Extracting the T Statistic from a function in R

I have a function from a textbook that runs a couple of linear regressions and then saves the p-value for each regression.
I would also like to save the t-statistic, but I am having a hard time finding the right syntax for the select() function.
Here is the current function:
models <- lapply(paste(factors, ' ~ a + b + c + d + e + f + g + h + j -', factors),
  function(f) {
    lm(as.formula(f), data = df) %>%              # Call lm(.)
      summary() %>%                               # Gather the output
      "$"(coef) %>%                               # Keep only the coefs
      data.frame() %>%                            # Convert to dataframe
      filter(rownames(.) == "(Intercept)") %>%    # Keep only the Intercept
      dplyr::select(Estimate, `Pr...t..`)         # Keep the coef & p-value
  })
I know I have to change the very last part of the function, dplyr::select(Estimate, `Pr...t..`), but after all my research and trial and error I am still stuck.
Here is a reproducible example using the mtcars data.
library(dplyr)

df <- mtcars
df <- df %>%
  select(1, 2, 3, 4, 5, 6, 7)

factors <- c("mpg", "cyl", "disp", "hp", "drat", "wt")

models <- lapply(paste(factors, ' ~ mpg + cyl + disp + hp + drat + wt -', factors),
  function(f) {
    lm(as.formula(f), data = df) %>%              # Call lm(.)
      summary() %>%                               # Gather the output
      "$"(coef) %>%                               # Keep only the coefs
      data.frame() %>%                            # Convert to dataframe
      filter(rownames(.) == "(Intercept)") %>%    # Keep only the Intercept
      dplyr::select(Estimate, `Pr...t..`)         # Keep the coef & p-value
  }
)

final <- matrix(unlist(models), ncol = 2, byrow = TRUE) %>%  # Switch from list to dataframe
  data.frame(row.names = factors)
Your example works for me. You can make this a little bit more "tidy" as follows:
library(broom)

sumfun <- function(f) {
  lm(as.formula(f), data = df) %>%
    tidy() %>%
    filter(term == "(Intercept)") %>%
    dplyr::select(estimate, statistic, p.value)   # 'statistic' is the t-statistic
}

pp <- paste(factors, ' ~ mpg + cyl + disp + hp + drat + wt -', factors)
names(pp) <- factors
final <- purrr::map_dfr(pp, sumfun, .id = "factor")
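If you'd rather keep the summary()-based pipeline from the question, note that data.frame() mangles the coefficient-matrix column names, so "t value" becomes t.value and "Pr(>|t|)" becomes Pr...t.. — a small sketch (sumfun2 is just an illustrative name):
sumfun2 <- function(f) {
  lm(as.formula(f), data = df) %>%
    summary() %>%
    "$"(coef) %>%
    data.frame() %>%                              # column names are mangled by make.names()
    filter(rownames(.) == "(Intercept)") %>%
    dplyr::select(Estimate, t.value, `Pr...t..`)  # coef, t-statistic & p-value
}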

Extract slope and r squared from grouped linear models using broom

I have a dataframe that I want to run linear models on by group, then use the broom package to extract the slope and r squared for each model. So far I am trying this:
library(tidyverse)
library(broom)

#read in the dataset
data(mtcars)

#add a group variable
mtcars <- mtcars %>% as_tibble() %>% mutate(LC = 1)

#create a second group
mtcars2 <- mtcars
mtcars2 <- mtcars2 %>% mutate(LC = 2)

#bind together
mtcars <- rbind(mtcars, mtcars2)

#group by and run regressions
all_regress <- mtcars %>% group_by(LC) %>%
  do(mod1 = lm(mpg ~ disp, data = .),
     mod2 = lm(mpg ~ wt, data = .))

#use broom to extract the slope and rsq per group
glance <- all_regress %>% mutate(tidy = map(mod1, broom::tidy),
                                 glance = map(mod1, broom::glance),
                                 augment = map(mod1, broom::augment),
                                 rsq = glance %>% map_dbl('r.squared'),
                                 slope = tidy %>% map_dbl(function(x) x$estimate[2]))
but this fails with:
Error: Problem with `mutate()` input `tidy`.
x No tidy method for objects of class qr
ℹ Input `tidy` is `map(mod1, broom::tidy)`.
ℹ The error occurred in row 1.
If I do this without groups such as:
#read in the dataset
data(mtcars)
mtcars <- mtcars %>% as_tibble()

#run regressions
all_regress <- mtcars %>%
  do(mod1 = lm(mpg ~ disp, data = .),
     mod2 = lm(mpg ~ wt, data = .))

#use broom to extract the slope and rsq per group
glance <- all_regress %>% mutate(tidy = map(mod1, broom::tidy),
                                 glance = map(mod1, broom::glance),
                                 augment = map(mod1, broom::augment),
                                 rsq = glance %>% map_dbl('r.squared'),
                                 slope = tidy %>% map_dbl(function(x) x$estimate[2]))
there is no error.
I think simply adding ungroup() achieves what you need:
all_regress <- mtcars %>% group_by(LC) %>%
  do(mod1 = lm(mpg ~ disp, data = .),
     mod2 = lm(mpg ~ wt, data = .)) %>%
  ungroup()

#use broom to extract the slope and rsq per group
glance <- all_regress %>% mutate(tidy = map(mod1, broom::tidy),
                                 glance = map(mod1, broom::glance),
                                 augment = map(mod1, broom::augment),
                                 rsq = glance %>% map_dbl('r.squared'),
                                 slope = tidy %>% map_dbl(function(x) x$estimate[2]))
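The list columns can then be dropped when you just want to read off the results per group (a quick check):
#inspect the per-group slope and r-squared
glance %>% select(LC, rsq, slope)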
I used this approach; it's longer, but I think there's more control over the individual steps. Finally, I created a tibble with list columns containing each model.
library(tidyverse)
library(broom)

#read in the dataset
data(mtcars)

#add a group variable
mtcars <- mtcars %>% as_tibble() %>% dplyr::select(-c(vs, am, gear, carb, cyl)) %>% mutate(LC = 1)

#create a second group
mtcars2 <- mtcars
mtcars2 <- mtcars2 %>% mutate(LC = 2)

#bind together
mtcars <- bind_rows(mtcars2, mtcars)

#group_split and run regressions
all_regress <- mtcars %>% group_split(LC) %>%
  map(~ list(mod1 = lm(mpg ~ disp, data = .),
             mod2 = lm(mpg ~ wt, data = .)))
# example <- all_regress[[2]][[1]] %>% glance()

#the list has 2 levels with 2 models each
data <- all_regress %>%
  map(~
    map(.x, function(model) {
      #list columns are needed because each function outputs a different kind of object
      tibble(mod = list(model),
             tidy = list(broom::tidy(model)),
             glance = list(broom::glance(model)),
             augment = list(broom::augment(model))) %>%
        mutate(
          rsq = list(glance[[1]]$r.squared),
          slope = list(tidy[[1]]$estimate[2]))
    }))

data_final <-
  data %>% map2(unique(mtcars$LC), ~
    map2(.x, .y, function(each_model, lc) {
      mutate(each_model, LC = lc)
    }))

#because of the list structure, bind the two models in each level and then bind the levels together
final_format <- map(data_final, ~ reduce(.x, rbind)) %>% reduce(rbind)

#access the data
final_format[1, 1][[1]]
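If you prefer ordinary numeric columns, one way (a small sketch) is to unlist the length-one list columns:
#turn the length-one list columns into plain numeric columns
final_format %>%
  mutate(rsq = unlist(rsq), slope = unlist(slope)) %>%
  select(LC, rsq, slope)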

Fit models with robust standard errors

I am using the following R code to run several linear regression models and extract the results to a dataframe:
library(tidyverse)
library(broom)

data <- mtcars
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")

models <- expand.grid(outcomes, exposures) %>%
  group_by(Var1) %>% rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")")) %>%
  group_by(model_id = row_number(), frm) %>%
  do(tidy(lm(.$frm, data = data))) %>%
  mutate(lci = estimate - (1.96 * std.error),
         uci = estimate + (1.96 * std.error))
How can I modify my code to use robust standard errors, similar to Stata?
* example of using robust standard errors in STATA
regress y x, robust
There is a comprehensive discussion about robust standard errors for lm models on Stack Exchange.
You can update your code in the following way:
library(sandwich)

models <- expand.grid(outcomes, exposures) %>%
  group_by(Var1) %>% rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")")) %>%
  group_by(model_id = row_number(), frm) %>%
  do(cbind(
    tidy(lm(.$frm, data = data)),
    robSE = sqrt(diag(vcovHC(lm(.$frm, data = data), type = "HC1"))))
  ) %>%
  mutate(
    lci = estimate - (1.96 * std.error),
    uci = estimate + (1.96 * std.error),
    lciR = estimate - (1.96 * robSE),
    uciR = estimate + (1.96 * robSE)
  )
The important line is this:
sqrt(diag(vcovHC(lm(.$frm, data = data), type = "HC1")))
The vcovHC() function returns the heteroskedasticity-consistent covariance matrix. You extract the variances on its diagonal with diag() and then compute the square root with sqrt() to get the robust standard errors.
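As a standalone illustration on a single model (a minimal sketch, not tied to the loop above), the same pieces look like this:
library(sandwich)

fit <- lm(mpg ~ factor(gear), data = mtcars)
vc <- vcovHC(fit, type = "HC1")   # HC1 matches Stata's default 'robust' small-sample correction
robSE <- sqrt(diag(vc))           # robust standard errors, one per coefficient
robSE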

dplyr, do(), extracting parameters from model without losing grouping variable

A slightly changed example from the R help for do():
by_cyl <- group_by(mtcars, cyl)
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
coefficients<-models %>% do(data.frame(coef = coef(.$mod)[[1]]))
The dataframe coefficients contains the first coefficient of the linear model for each cyl group. My question is: how can I produce a dataframe that contains not only a column with the coefficients, but also a column with the grouping variable?
Edit: I extend the example to try to make my problem clearer.
Let's suppose that I want to extract the coefficients of the model and some prediction. I can do this:
by_cyl <- group_by(mtcars, cyl)

getpars <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  data.frame(intercept = coef(fit)[1], slope = coef(fit)[2])
}

getprediction <- function(df){
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp = x), type = "response")
  data.frame(x, y)
}

pars <- by_cyl %>% do(getpars(.))
prediction <- by_cyl %>% do(getprediction(.))
The problem is that the code is redundant because I am fitting the model twice. My idea was to build a function that returns a list with all the information:
getAll <- function(df){
  results <- list()
  fit <- lm(mpg ~ disp, data = df)
  x <- df$disp
  y <- predict(fit, data.frame(disp = x), type = "response")
  results$pars <- data.frame(intercept = coef(fit)[1], slope = coef(fit)[2])
  results$prediction <- data.frame(x, y)
  results
}
The problem is that I don't know how to use do() with getAll() to obtain, for example, just a dataframe with the parameters (like the dataframe pars).
Like this?
coefficients <- models %>% do(data.frame(coef = coef(.$mod)[[1]], group = .[[1]]))
yielding
      coef group
1 40.87196     4
2 19.08199     6
3 22.03280     8
Using the approach of Hadley Wickham in this video:
library(dplyr)
library(tidyr)   # for nest()/unnest()
library(purrr)
library(broom)

fitmodel <- function(d) lm(mpg ~ disp, data = d)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(mod = map(data, fitmodel),
         pars = map(mod, tidy),
         pred = map(mod, augment))

pars <- by_cyl %>% unnest(pars)
prediction <- by_cyl %>% unnest(pred)
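Because the grouping variable travels with the nested results, you can then pull out, for example, the slope per cyl group (a small follow-up sketch):
#slope (the disp coefficient) for each cyl group, with the grouping variable kept
pars %>%
  filter(term == "disp") %>%
  select(cyl, estimate)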
