I am trying to figure out how to add a custom label column when using gtsummary. For example, I want a column that holds a heading for each group of summary statistics. I don't want this in the characteristic column; I want it to the left of the characteristic column.
I am not sure what the best way to achieve this with gtsummary is. I have the rest of the table, but I still need the column with the modifiable header (shown in yellow).
This is the code I have so far:
library(tidyverse)
library(gtsummary)
trial %>%
dplyr::select(age, trt) %>%
gtsummary::tbl_summary(.,
by = trt,
missing = "no",
type = age ~ "continuous2",
statistic = age ~ c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"),
digits = age ~ c(0, 1, 1, 1),
label = age ~ " ") %>%
gtsummary::add_overall() %>%
# This will add the Total column
gtsummary::add_stat_label(label = age ~ c("N",
"Mean (SD)",
"Median (Q1, Q3)",
"Min, Max")) %>%
gtsummary::modify_header(
label ~ "Summary Statistics",
stat_0 ~ "Total",
stat_1 ~ "Drug A",
stat_2 ~ "Drug B"
) %>%
gtsummary::modify_table_body(~ .x %>% dplyr::relocate(stat_0, .after = stat_2))
You're so close! See code example below :)
library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.5.2.9026'
trial %>%
dplyr::select(age, trt) %>%
gtsummary::tbl_summary(
by = trt,
missing = "no",
type = age ~ "continuous2",
statistic = age ~ c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"),
digits = age ~ c(0, 1, 1, 1)
) %>%
gtsummary::add_overall(last = TRUE) %>%
modify_table_body(
~ .x %>%
mutate(
new_label = ifelse(row_type == "label", label, ""),
label = ifelse(row_type == "label", "", label),
.before = label
)
) %>%
modify_header(
new_label ~ "Characteristic",
label ~ "Summary Statistics",
stat_0 ~ "Total",
stat_1 ~ "Drug A",
stat_2 ~ "Drug B"
) %>%
modify_column_alignment(new_label, "left") %>%
as_kable()
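| Characteristic | Summary Statistics | Drug A          | Drug B          | Total           |
|:---------------|:-------------------|:----------------|:----------------|:----------------|
| Age            |                    |                 |                 |                 |
|                | N                  | 91              | 98              | 189             |
|                | Mean (SD)          | 47.0 (14.7)     | 47.4 (14.0)     | 47.2 (14.3)     |
|                | Median (IQR)       | 46.0 (37, 59.0) | 48.0 (39, 56.0) | 47.0 (38, 57.0) |
|                | Range              | 6.0, 78.0       | 9.0, 83.0       | 6.0, 83.0       |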
Created on 2022-04-21 by the reprex package (v2.0.1)
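To see what the modify_table_body() step above is doing, here is a minimal sketch of the same mutate() call on a toy data frame. The toy_body object and its row_type values are made up for illustration; the real table body in a gtsummary object has more columns and may use different row types.

```r
library(dplyr)

# Toy stand-in for a gtsummary table body (hypothetical, for illustration only)
toy_body <- tibble(
  row_type = c("label", "level", "level"),
  label    = c("Age", "N", "Mean (SD)")
)

# Move the variable label into its own column and blank it out in `label`,
# leaving only the statistic labels behind
split_body <- toy_body %>%
  mutate(
    new_label = ifelse(row_type == "label", label, ""),
    label     = ifelse(row_type == "label", "", label),
    .before   = label
  )

split_body
```

Because new_label is computed before label is overwritten, the variable name survives in the new column while the original column keeps only the statistic labels.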
Related
I've got the following reprex
library(tidyverse)
library(gtsummary)
set.seed(50)
dat <- data.frame(exposed = sample(c("Unexposed","Exposed"), 100, TRUE),
year = rep(c(1985,1986), each = 50),
Age = rnorm(100, 85, 1),
Transit = sample(c("Bus", "Train", "Walk", "Car"), 100, TRUE))
dat %>%
tbl_strata(strata = year,
~ .x %>%
tbl_summary(
by = exposed,
include = c(Age, Transit),
statistic = list(Age ~ "{mean} ± {sd}"),
digits = Age ~ 1,
label = Age ~ "Age, mean ± SD"
)) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
modify_footnote(update = everything() ~ NA)
which produces this table:
But when I try to add new, separate footnotes, the previous ones get overwritten:
dat %>%
tbl_strata(strata = year,
~ .x %>%
tbl_summary(
by = exposed,
include = c(Age, Transit),
statistic = list(Age ~ "{mean} ± {sd}"),
digits = Age ~ 1,
label = Age ~ "Age, mean ± SD"
)) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
modify_table_styling(columns = label,
rows = variable == "Age",
footnote = "Footnote 1") %>%
modify_table_styling(columns = label,
rows = label == "Transit",
footnote = "Footnote 2") %>%
modify_table_styling(columns = label,
rows = label == "Transit",
footnote = "Footnote 3") %>%
modify_footnote(update = everything() ~ NA)
and my table looks like this.
I've tried using modify_footnote() as described here, but I don't understand from the documentation how to move the footnotes from the column headers to individual rows.
The final output should look something like this.
For example, I have two groups of data, Drug A and Drug B. I would like to add a column with the number of observations of each variable for Drug A only. How can I do that? I can't find a way to do it with add_n().
The code for producing example table:
tbl_summary_ex2 <- trial %>% select(age, grade, response, trt) %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
Here is one way to do it:
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.1'
# build table with only Drug A
tbl_summary_ex1 <-
trial %>%
dplyr::filter(trt == "Drug A") %>%
select(age, grade, response) %>%
tbl_summary(
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
) %>%
add_n(col_label = "**Drug A N**") %>%
modify_column_hide(all_stat_cols())
# build table split by treatment
tbl_summary_ex2 <-
trial %>%
select(age, grade, response, trt) %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
# merge tables together
tbl_final <-
list(tbl_summary_ex1, tbl_summary_ex2) %>%
tbl_merge(tab_spanner = FALSE)
Created on 2022-08-19 by the reprex package (v2.0.1)
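As a sanity check on the merged table, the Drug A counts can also be computed directly from the data. This is only a sketch: it assumes the N column reports the number of non-missing values per variable, and the drug_a_n name is made up here.

```r
library(gtsummary)  # provides the `trial` data set
library(dplyr)

# Count non-missing values per variable among Drug A patients
drug_a_n <-
  trial %>%
  filter(trt == "Drug A") %>%
  summarise(across(c(age, grade, response), ~ sum(!is.na(.x))))

drug_a_n
```

These numbers should line up with the "Drug A N" column in tbl_final.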
I want to create a function that automatically generates a table of summary statistics when I pass it different column names. I am trying to write this function with gtsummary and have tried enquo and deparse, but neither seems to help. Can somebody please tell me what I am doing wrong here?
get_stats <- function (var2) {
var2 <- dplyr::enquo(var2)
grp_val <- deparse(substitute(var2))
df %>%
gtsummary::tbl_summary(.,
by = trt,
missing = "no",
type =
list(!!var2 ~ "continuous2"),
statistic = list(
"{{var2}}" = c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
)
)
,
digits = !!var2 ~ c(0, 1, 1, 1)
)
}
The error I keep getting is Error: Error in type= argument input. Select from ‘age’, ‘trt’.
When I run the same code on the trial data outside a function, with the column name written out directly, it works fine.
trial %>%
dplyr::select(age, trt) %>%
dplyr::mutate_if(is.factor, as.character) %>%
gtsummary::tbl_summary(
by = trt,
missing = "no",
type =
list(age ~ "continuous2"),
statistic = list(
"age" = c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
))
,
digits = age ~ c(0, 1, 1, 1)
)
Expected output from the code
Using rlang::as_name and named lists you could do:
library(gtsummary)
get_stats <- function(df, var2) {
var2_str <- rlang::as_name(rlang::enquo(var2))
df %>%
gtsummary::tbl_summary(.,
by = trt,
missing = "no",
type = setNames(list(c("continuous2")), var2_str),
statistic = setNames(list(c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
)), var2_str
),
digits = setNames(list(c(0, 1, 1, 1)), var2_str)
)
}
trial %>%
select(age, trt) %>%
dplyr::mutate_if(is.factor, as.character) %>%
get_stats(age)
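The two building blocks in that function can be tried on their own. In this small sketch, col_name() is a hypothetical helper written just for illustration; it is not part of rlang or gtsummary.

```r
library(rlang)

# Hypothetical helper: capture a bare column name and return it as a string
col_name <- function(var) {
  as_name(enquo(var))
}

col_name(age)

# Build a one-element named list, the shape passed to `type =` in get_stats()
type_arg <- setNames(list("continuous2"), col_name(age))
str(type_arg)
```

Named lists sidestep the problem in the original attempt, where the unquoted variable inside a formula was not resolved to a column name.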
I want to generate several tbl_summary tables by looping (with lapply) over similar categorical variables (var1, var2, var3), using each one in turn as the "by = " argument, and assign each result an object name such as "tbl_var1", "tbl_var2" and "tbl_var3".
dflist <- c("var1",
"var2",
"var3")
vartbls = lapply(dflist, function(df) {
tbl_summary_ex2 <-
trial %>%
select(age, grade, response, trt) %>%
tbl_summary(
by = df,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
}
)
Here is a reprex with a working version of your function with code to set the names:
library(gtsummary)
dflist <- c("age", "grade")
vartbls <- lapply(dflist, function(x) {
tbl_summary_ex2 <-
trial %>%
select(age, grade, response, trt) %>%
tbl_summary(
by = dplyr::all_of(x),  # all_of() marks x as a character vector holding a column name
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
}
)
names(vartbls) <- paste0("tbl_", dflist)
Here is a version using {purrr} and setting names before iterating:
library(gtsummary)
library(purrr)
result <- c("trt", "grade") %>%
purrr::set_names(paste0("tbl_", .)) %>%
purrr::map(., ~ trial %>%
select(age, grade, response, trt) %>%
tbl_summary(
by = .x,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
))
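Both versions above return a named list rather than separate objects. If separate objects are really needed, one possible approach is list2env(). This is a sketch under that assumption; by_vars and tbls are names chosen for the example.

```r
library(gtsummary)

by_vars <- c("trt", "grade")

# lapply() keeps the names of its input, so name the vector first
tbls <- lapply(
  setNames(by_vars, paste0("tbl_", by_vars)),
  function(v) {
    tbl_summary(
      trial,
      by = dplyr::all_of(v),
      include = c(age, response)
    )
  }
)

names(tbls)

# Optional: copy each list element into the global environment,
# creating the objects tbl_trt and tbl_grade
invisible(list2env(tbls, envir = globalenv()))
```

Keeping the tables in the list (tbls$tbl_trt, tbls[["tbl_grade"]]) is usually easier to work with than scattering objects across the environment.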
When I do a tbl_stack, I'd like to show the total N of the combined tables in the tbl_stack in the header. At the moment the result appears to show the N of the first table in the stack.
tbl_summary_ex2 <-
trial %>%
select(age, grade, response, trt) %>%
filter(grade == "I") %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
tbl_summary_ex2a <-
trial %>%
select(age, grade, response, trt) %>%
filter(grade %in% c("II", "III", "IV")) %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1))
)
tbl_stack(tbls=list(tbl_summary_ex2, tbl_summary_ex2a))
Thanks for any tips,
Jeff
Yes, as the documentation of tbl_stack() indicates, the headers are retained from the first gtsummary table in the stack. You can, however, use the modify_header() function to change the headers. Additionally, these gtsummary tables have an internal object, .$df_by, that stores the Ns from each of your tables, and you can sum the Ns across tables using these internal data frames. The example below does this programmatically, but if it's easier you could simply hard-code the Ns.
library(gtsummary)
library(tidyverse)
tbl_summary_ex2 <-
trial %>%
select(age, grade, response, trt) %>%
filter(grade == "I") %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1)),
include = -grade
)
tbl_summary_ex2a <-
trial %>%
select(age, grade, response, trt) %>%
filter(grade %in% c("II", "III", "IV")) %>%
tbl_summary(
by = trt,
label = list(age ~ "Patient Age"),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(age ~ c(0, 1)),
include = -grade
)
# calculate the sum total Ns from both tables
list_N <-
tbl_summary_ex2$df_by %>%
bind_rows(tbl_summary_ex2a$df_by) %>%
select(by_col, by, n) %>%
group_by(by_col, by) %>%
summarise(n = sum(n)) %>%
mutate(
header_update =
str_glue("{by_col} ~ '**{by}**, N = {n}'") %>%
as.formula() %>%
list()
) %>%
pull(header_update)
list_N
tbl_stack(
tbls=list(tbl_summary_ex2, tbl_summary_ex2a),
group_header = c("Grade I", "Grade > I")
) %>%
modify_header(list_N)
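If you go the hard-coding route, the combined Ns can be read straight off the data instead of the internal df_by object. The sketch below assumes every row of trial falls into one of the two grade filters, so the per-arm totals equal the counts in the full data set; n_by_trt and headers are names made up for the example.

```r
library(gtsummary)  # provides the `trial` data set
library(dplyr)

# Per-arm totals computed directly from the data
n_by_trt <- trial %>% count(trt)
n_by_trt

# Header strings in the same style as the programmatic version
headers <- paste0("**", n_by_trt$trt, "**, N = ", n_by_trt$n)
headers
```

The resulting strings can then be typed into modify_header() for stat_1 and stat_2.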