How to arrange statistics as columns in gtsummary::tbl_summary function? - r

I have median, IQR, and mean as the statistics to compute for the variables in a dataset. The default output looks like this:
How do I add a line to the tbl_summary function call that makes the summary table look like the one below? (Only one of the variables is shown.)
summary_table <- df %>% tbl_summary(type = all_continuous() ~ "continuous2", statistic = list(all_continuous() ~ c("{mean}({sd})", "{median}", "({p25}, {p75})", "{min}, {max}")))

You can construct a table like this by merging multiple tbl_summary() tables. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
stats <- c("N" = "{length}",
           "Mean (SD)" = "{mean} ({sd})",
           "(IQR)" = "({p25}, {p75})",
           "Range" = "{min}, {max}")
tbl <-
  purrr::imap(
    stats,
    ~trial %>%
      tbl_summary(include = "age", missing = "no", statistic = ~.x) %>%
      modify_header(all_stat_cols() ~ stringr::str_glue("**{.y}**"))
  ) %>%
  tbl_merge(tab_spanner = FALSE) %>%
  modify_footnote(~NA)
Created on 2022-03-29 by the reprex package (v2.0.1)

Related

How to present interaction variables horizontally in tbl_regression in R

I want to present the coefficient of interaction horizontally rather than vertically with tbl_regression:
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'
tbl1 <- trial %>%
mutate(early=case_when(stage%in%c("T1","T2")~1,T~0)) %>%
glm(response ~ age * early , family = binomial, data=.) %>%
tbl_regression(
exponentiate = TRUE )
tbl1
tbl2 <- trial %>%
mutate(late=case_when(stage%in%c("T3","T4")~1,T~0)) %>%
glm(response ~ age * late, family = binomial, data=.) %>%
tbl_regression(
exponentiate = TRUE )
tbl2
tbl_stack (list(tbl1,tbl2))
Created on 2022-07-20 by the reprex package (v2.0.1)
I want the variables presented horizontally (similar to tbl_merge, but with the values coming from within the same model), so that I can stack different values beneath them.
This requires some customization, but is certainly possible.
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.6.1'
tbl1 <-
  trial %>%
  mutate(early = dplyr::case_when(stage %in% c("T1","T2") ~ 1, TRUE ~ 0)) %>%
  glm(response ~ age * early, family = binomial, data = .) %>%
  tbl_regression(exponentiate = TRUE)
tbl_final <-
  1:3 %>%
  purrr::map(
    function(i) {
      tbl1 %>%
        modify_table_body(
          ~ .x %>%
            dplyr::mutate(
              label = label[1],
              variable = variable[1],
              row_type = row_type[1]
            ) %>%
            dplyr::filter(dplyr::row_number() %in% i)
        )
    }
  ) %>%
  tbl_merge(tab_spanner = c("**Variable**", "**Early**", "**Interaction**"))
Created on 2022-07-20 by the reprex package (v2.0.1)

Creating a tbl_continuous with weight data using gtsummary

Problem:
I can't find a way to create a tbl_continuous of a weighted numerical variable. I'm using tbl_svysummary to create my categorical variable tables, but it's not useful when I try to do the same with numerical variable tables. Note: I'm not looking for a general mean of my numeric variable but for one separated, or grouped, by a categorical variable.
Attempt:
For example, I've created this table with the help of the function tbl_continuous, which does exactly what I want: the mean of my numeric variable by the levels of my categorical variable. The only problem is that I can't pass a weight variable to this function.
```{r}
base2 %>%
as_label() %>%
select(ing_cap, ano, nacional, dominio) %>%
tbl_continuous(variable = ing_cap,
by = ano,
statistic = list(everything() ~ "{median}"))
```
Also, I have been creating weighted data with the srvyr package in the following way:
base2 %>%
labelled::drop_unused_value_labels() %>%
as_label() %>%
as_survey_design(weight = fac500a)
This might be part of the solution.
Request:
Create this same table (shown in the image) but with a weight variable. My weight variable in my data is called fac500a.
My data:
My data can be downloaded from my GitHub repo and has the following dimensions:
> dim(base2)
[1] 108103 44
https://github.com/aito123/quarto_blog/raw/master/posts/tablas_tesis/base2.sav
(dput output is long)
My current packages:
I'm using these R packages so far: tidyverse, srvyr, gtsummary, sjlabelled, haven
Conclusion:
Let me know if it's necessary to provide more information.
The gtsummary package does not export an analogous function of tbl_continuous() for survey data. But you can construct the table. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'
svy_trial <- survey::svydesign(ids = ~1, data = trial, weights = ~1)
tbl <-
  svy_trial %>%
  tbl_strata2(
    strata = grade,
    ~ .x %>%
      tbl_svysummary(
        by = trt,
        include = age,
        missing = "no",
        label = list(age = .y)
      ) %>%
      modify_header(all_stat_cols() ~ "**{level}**"),
    .combine_with = "tbl_stack",
    .combine_args = list(group_header = NULL)
  ) %>%
  modify_table_body(
    ~ .x %>%
      mutate(variable = "grade", row_type = "level") %>%
      tibble::add_row(
        row_type = "label",
        variable = "grade",
        label = "Grade",
        .before = 1L
      )
  ) %>%
  modify_column_indent(columns = label, rows = row_type == "level") %>%
  bold_labels() %>%
  modify_spanning_header(all_stat_cols() ~ "**Treatment**") %>%
  modify_footnote(all_stat_cols() ~ "Age: Median (IQR)")
Created on 2022-05-16 by the reprex package (v2.0.1)
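When building a weighted table like this, it can help to cross-check a few cells by hand. The following is a minimal base R sketch, assuming a small made-up data frame (the columns group, value, and w are hypothetical stand-ins for a categorical variable, a numeric variable, and a weight such as fac500a). It computes a plain weighted mean per group, which is only a point-estimate check and not a replacement for the survey-design object.

```r
# Made-up data: `group`, `value`, and `w` are hypothetical column names.
dat <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 30, 50),
  w     = c(1, 3, 1, 1)
)

# Split the rows by group, then take the weighted mean within each piece.
by_group <- split(dat, dat$group)
wmeans <- sapply(by_group, function(d) weighted.mean(d$value, d$w))

wmeans
# a = (10*1 + 20*3) / 4 = 17.5, b = (30 + 50) / 2 = 40
```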

How to add pooled standard error in tbl_summary() and eta effect size?

I am trying to add the Pooled Standard Error (PSE) and eta squared to tbl_summary(). PSE is calculated as sqrt(mean(residuals^2)/n). I tried to calculate it step by step by extracting the residuals from either aov() or lm(), but I got an error saying: The dimension of respected variable and the added statistic do not match. Expecting statistic/dataframe to be length/ no. rows 1. Here is my code:
PSE <- function(data, variable, by,...) {
aov(data[["variable"]] ~ as.factor(data[[by]]))$residuals
}
Dataset_TPA_Full %>%
select(diet,hardness_g,adhesiveness_g_sec, resilence, cohesion, springiness, gumminess, chewiness, firmness_g_force_1, density_g_l)%>%
tbl_summary(
by = diet,
statistic = all_continuous() ~ "{mean} ± {sem}",
label = list(hardness_g = "Hardness (g)",
adhesiveness_g_sec = "Adhesiveness (g/ sec)",
resilence = "Resilience",
cohesion = "Cohesion",
springiness = "Springiness",
gumminess = "Gumminess",
chewiness = "Chewiness",
firmness_g_force_1 = "Firmness (g)",
density_g_l = "Density (g/ L)")
) %>%
add_p(
test = all_continuous() ~ "aov",
) %>%
add_stat(fns = all_continuous() ~ PSE) %>%
modify_header(label = "**Treatment**", p.value = "**p-value**") %>%
bold_labels() %>%
bold_levels()
Also, when I tried to add eta squared using the code below, it returned a missing data argument error when I put it in the add_stat() function:
my_ES_test <- function(data, variable, by, ...) {
aovmod = aov(data[[variable]] ~ data[[by]])
lsr::etaSquared(aovmod)[1,1]
}
Can you help me with this?
Thank you.
This should do it:
library(gtsummary)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
sem <- function(x){
  sqrt(var(x, na.rm=TRUE)/sum(!is.na(x)))
}
PSE <- function(data, variable, by,...) {
  e <- aov(data[[variable]] ~ as.factor(data[[by]]))$residuals
  sqrt(mean(e^2)/length(e))
}
mtcars %>%
select(cyl, mpg, hp, disp, drat, qsec)%>%
tbl_summary(
by = cyl,
statistic = all_continuous() ~ "{mean} ± {sem}",
label = list(mpg = "Miles per Gallon",
hp = "Horsepower",
disp = "Displacement",
drat = "Rear Axle Ratio",
qsec = "1/4 Mile Time")
) %>%
add_p(
test = all_continuous() ~ "aov",
) %>%
add_stat(fns = all_continuous() ~ PSE) %>%
modify_header(label = "**Treatment**", p.value = "**p-value**", add_stat_1 = "**PSE**") %>%
bold_labels() %>%
bold_levels()
Created on 2022-04-17 by the reprex package (v2.0.1)
Note, the PSE() function had two problems. First, the data[["variable"]] should be data[[variable]] (without the quotes around variable). Second, you had the function return the residuals, not the PSE calculation you described in the question. Now, it returns the appropriate result. I also am not sure where you got the sem() function, so I just made one that calculates the standard error of the mean.
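The first problem in the note above can be seen in isolation. A minimal base R sketch, assuming mtcars as stand-in data and "mpg" as the column name that add_stat() would pass in:

```r
variable <- "mpg"   # assumed example of what add_stat() passes for the mpg row

# Quoted: looks for a column literally named "variable", which mtcars lacks.
is.null(mtcars[["variable"]])
#> [1] TRUE

# Unquoted: uses the string stored in `variable`, so it returns the mpg column.
identical(mtcars[[variable]], mtcars$mpg)
#> [1] TRUE
```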
Updated PSE function
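PSE <- function(data, variable, by,...) {
  # per-group variance and size, then each group's sum of squared deviations
  s <- data %>%
    group_by(!!sym(by)) %>%
    summarise(s = var(!!sym(variable)),
              n = n()) %>%
    mutate(num = s*(n-1))
  # pooled standard deviation across the groups
  psd <- sqrt(sum(s$num)/(sum(s$n) - nrow(s)))
  # scale the pooled SD by the group sizes
  psd*sqrt(sum(1/s$n))
}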
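The updated function amounts to a pooled standard deviation scaled by the group sizes. A minimal base R sketch, assuming mtcars as stand-in data and a hypothetical helper named pooled_sd(), cross-checks the pooled SD against the residual standard error of the equivalent one-way model:

```r
# Hypothetical helper: pooled SD of x across the groups in g, base R only.
pooled_sd <- function(x, g) {
  g <- factor(g)
  n <- tapply(x, g, length)   # group sizes
  v <- tapply(x, g, var)      # group variances
  sqrt(sum((n - 1) * v) / (sum(n) - nlevels(g)))
}

psd <- pooled_sd(mtcars$mpg, mtcars$cyl)

# The residual standard error of a one-way model is the same pooled SD.
fit <- lm(mpg ~ factor(cyl), data = mtcars)
all.equal(psd, sigma(fit))
#> [1] TRUE

# Same final scaling step as in the updated PSE() above.
pse <- psd * sqrt(sum(1 / table(mtcars$cyl)))
```

The agreement with sigma() is a quick way to confirm the grouping and the degrees of freedom before wiring the function into add_stat().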

Grouping Rows in GTSummary

I am trying to group some rows/variables (both categorical and continuous) to help with the table readability in a large dataset.
Here is the dummy dataset:
library(gtsummary)
library(tidyverse)
library(gt)
set.seed(11012021)
# Create Dataset
PIR <-
tibble(
siteidn = sample(c("1324", "1329", "1333", "1334"), 5000, replace = TRUE, prob = c(0.2, 0.45, 0.15, 0.2)) %>% factor(),
countryname = sample(c("NZ", "Australia"), 5000, replace = TRUE, prob = c(0.3, 0.7)) %>% factor(),
hospt = sample(c("Metropolitan", "Rural"), 5000, replace = TRUE, prob = c(0.65, 0.35)) %>% factor(),
age = rnorm(5000, mean = 60, sd = 20),
apache2 = rnorm(5000, mean = 18.5, sd=10),
apache3 = rnorm(5000, mean = 55, sd=20),
mechvent = sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)) %>% factor(),
sex = sample(c("Female", "Male"), 5000, replace = TRUE) %>% factor(),
patient = TRUE
) %>%
mutate(patient_id = row_number())%>%
group_by(
siteidn) %>% mutate(
count_site = row_number() == 1L) %>%
ungroup()%>%
group_by(
patient_id) %>% mutate(
count_pt = row_number() == 1L) %>%
ungroup()
Then I use the following code to generate my table:
t1 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, countryname) %>%
tbl_summary(
by = countryname,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**") %>%
add_overall(col_label = "**Overall**")
t2 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, hospt) %>%
tbl_summary(
by = hospt,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**")
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA)
This produces the following table:
I would like to group certain rows together for ease of reading. Ideally, I would like the table to look like this:
I have attempted using the gt package, with the following code:
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA) %>%
as_gt() %>%
gt::tab_row_group(
group = "Severity of Illness Scores",
rows = 7:8) %>%
gt::tab_row_group(
group = "Patient Demographics",
rows = 3:6) %>%
gt::tab_row_group(
group = "Numbers",
rows = 1:2)
This produces the desired table:
There are a couple of issues I'm having with the way that I'm doing this.
1. When I try to use the row names (variables), an error message comes up (Can't subset columns that don't exist...). Is there a way to do this by using the variable names? With larger tables, I am getting into some trouble with using the row numbers method of assigning row names. This is particularly true when there is a single variable that loses its place as it's moved to the end to account for the grouped rows.
2. Is there a way to do this prior to piping into tbl_summary? Although I like the output of this table, I use Word as my output document for statistical reports and would like the ability to be able to format the tables in Word if need be (or by my collaborators). I usually use gtsummary::as_flextable for table output.
Thanks again,
Ben
1. When I try to use the row names (variables), an error message comes up (Can't subset columns that don't exist...). Is there a way to do this by using the variable names? With larger tables, I am getting into some trouble with using the row numbers method of assigning row names. This is particularly true when there is a single variable that loses its place as it's moved to the end to account for the grouped rows.
There are two ways to go about this: 1. build separate tables for each group and then stack them, or 2. add a grouping column to .$table_body and then group the tibble by the new variable.
library(gtsummary)
library(dplyr)
packageVersion("gtsummary")
#> '1.3.6'
# Method 1 - Stack separate tables
t1 <- trial %>% select(age) %>% tbl_summary()
t2 <- trial %>% select(grade) %>% tbl_summary()
tbl1 <-
  tbl_stack(
    list(t1, t2),
    group_header = c("Demographics", "Tumor Characteristics")
  ) %>%
  modify_footnote(all_stat_cols() ~ NA)
# Method 2 - build a grouping variable
tbl2 <-
  trial %>%
  select(age, grade) %>%
  tbl_summary() %>%
  modify_table_body(
    mutate,
    groupname_col = case_when(variable == "age" ~ "Demographics",
                              variable == "grade" ~ "Tumor Characteristics")
  )
2. Is there a way to do this prior to piping into tbl_summary? Although I like the output of this table, I use Word as my output document for statistical reports and would like the ability to be able to format the tables in Word if need be (or by my collaborators). I usually use gtsummary::as_flextable for table output.
The examples above modify the table before exporting to gt format, so you can export these examples to flextable. However, flextable does not have the same built-in header row functionality (or at least I am unaware of it, and don't use it in as_flex_table()), and the output would look like the table below. I recommend installing the dev version of gt from GitHub and exporting to RTF (supported by Word). They've made many updates to RTF output in recent months, and it may work for you.
I think I might have a solution for this (thanks, obviously, to Daniel Sjoberg and team for providing us with the modify_table_body function).
All you need to do is edit the underlying data frame to add a variable with your desired grouping row using modify_table_body, and then put it in the position you want it to be in, like this:
library(gtsummary)
library(dplyr)
packageVersion("gtsummary")
trial %>%
  select(age, stage, grade) %>%
  tbl_summary() %>%
  modify_table_body(
    ~ .x %>%
      # add your variable
      rbind(
        tibble(
          variable = "Demographics",
          var_type = NA,
          var_label = "Demographics",
          row_type = "label",
          label = "Demographics",
          stat_0 = NA)) %>% # expand the components of the tibble as needed if you have more columns
      # can add another one
      rbind(
        tibble(
          variable = "Tumor characteristics",
          var_type = NA,
          var_label = "Tumor characteristics",
          row_type = "label",
          label = "Tumor characteristics",
          stat_0 = NA)) %>%
      # specify the position you want these in
      arrange(factor(variable, levels = c("Demographics",
                                          "age",
                                          "Tumor characteristics",
                                          "stage",
                                          "grade")))) %>%
  # and you can then indent the actual variables
  modify_column_indent(columns = label, rows = variable %in% c("age",
                                                               "stage",
                                                               "grade")) %>%
  # and double indent their levels
  modify_column_indent(columns = label, rows = (variable %in% c("stage",
                                                                "grade")
                                                & row_type == "level"),
                       double_indent = TRUE)
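The positioning step above relies on sorting rows by a factor whose levels are listed in the desired order. A minimal base R sketch of that idea, assuming a made-up stand-in for the table body (the data frame below is hypothetical and much smaller than a real .$table_body):

```r
# Hypothetical stand-in for a table body: header rows appended at the end.
body <- data.frame(
  variable = c("age", "stage", "stage", "stage", "grade",
               "Demographics", "Tumor characteristics"),
  label    = c("Age", "T Stage", "T1", "T2", "Grade",
               "Demographics", "Tumor characteristics"),
  stringsAsFactors = FALSE
)

# Desired order of the variables, header rows included.
row_order <- c("Demographics", "age", "Tumor characteristics", "stage", "grade")

# Sorting by the factor's level order moves each header row above its block;
# ties keep their original order, so the levels stay under their own label.
sorted <- body[order(factor(body$variable, levels = row_order)), , drop = FALSE]
sorted$label
```

Any variable missing from row_order becomes NA in the factor and is sorted to the end, so it is worth listing every variable explicitly.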

How to generate effect size [90%CI] in the summary table using R package “gtsummary”?

I am working on creating a summary table using the R package "gtsummary", which is very good. The add_stat function gives you a lot of freedom to include add-ons. For example, in my area we want to report the effect size with a confidence interval (ES [90% CI]). So, I would like help including the CI range. The code I implemented works, but without digit control and without the CI range.
# Packages ----------------------------------------------------------------
library(gtsummary)
library(gt)
library(dplyr)
library(purrr)
# Example 1 ---------------------------------------------------------------
# fn returns ES value
my_EStest <- function(data, variable, by, ...) {
effsize::cohen.d(data[[variable]] ~ as.factor(data[[by]]),
conf.level=.90, pooled=TRUE, paired=FALSE,
hedges.correction=TRUE)$estimate
}
add_ES <-
trial %>%
select(trt, age) %>%
tbl_summary(by = trt, missing = "no",
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(all_continuous() ~ c(1,1))) %>%
add_p(test = everything() ~ t.test) %>%
add_stat(
fns = everything() ~ my_EStest,
fmt_fun = style_pvalue,
header = "**ES**"
)
add_ES
# counterproof
effsize::cohen.d(age ~ trt, data = trial, conf.level=.90, return.dm=TRUE, pooled=TRUE, paired=FALSE, hedges.correction=TRUE)
I think the easiest way to do this is to return the confidence interval together with the estimate, already formatted.
You can update the my_EStest function to return an already formatted statistic that includes both the estimate and the confidence interval. Does this output work for you?
library(tidyverse)
library(gtsummary)
my_EStest <- function(data, variable, by, ...) {
  # Cohen's D
  d <- effsize::cohen.d(data[[variable]] ~ as.factor(data[[by]]),
                        conf.level=.90, pooled=TRUE, paired=FALSE,
                        hedges.correction=TRUE)
  # Formatting statistic with CI
  est <- style_sigfig(d$estimate)
  ci <- style_sigfig(d$conf.int) %>% paste(collapse = ", ")
  # returning estimate with CI together
  str_glue("{est} ({ci})")
}
add_ES <-
trial %>%
select(trt, age) %>%
tbl_summary(by = trt, missing = "no",
statistic = list(all_continuous() ~ "{mean} ({sd})"),
digits = list(all_continuous() ~ c(1,1))) %>%
add_p(test = everything() ~ t.test) %>%
add_stat(
fns = everything() ~ my_EStest,
fmt_fun = NULL,
header = "**ES (90% CI)**"
) %>%
modify_footnote(add_stat_1 ~ "Cohen's D (90% CI)")
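The same pre-formatting idea can be sketched without gtsummary or tidyverse helpers, which also gives explicit control over the number of digits. This is a minimal base R sketch: format_es() is a hypothetical helper, and the numbers passed to it are made up for illustration rather than taken from the trial data.

```r
# Hypothetical helper: build "estimate (lower, upper)" with fixed decimals.
format_es <- function(estimate, conf_int, digits = 2) {
  est <- formatC(estimate, format = "f", digits = digits)
  ci  <- paste(formatC(conf_int, format = "f", digits = digits), collapse = ", ")
  paste0(est, " (", ci, ")")
}

# Made-up values, only to show the resulting string.
format_es(0.5, c(0.1, 0.9))
#> [1] "0.50 (0.10, 0.90)"
```

Because the function returns a character string, the column needs no further formatting function, which matches the fmt_fun = NULL setting used above.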
