I am trying to group some rows/variables (both categorical and continuous) to help with the table readability in a large dataset.
Here is the dummy dataset:
library(gtsummary)
library(tidyverse)
library(gt)
set.seed(11012021)
# Create Dataset
PIR <-
tibble(
siteidn = sample(c("1324", "1329", "1333", "1334"), 5000, replace = TRUE, prob = c(0.2, 0.45, 0.15, 0.2)) %>% factor(),
countryname = sample(c("NZ", "Australia"), 5000, replace = TRUE, prob = c(0.3, 0.7)) %>% factor(),
hospt = sample(c("Metropolitan", "Rural"), 5000, replace = TRUE, prob = c(0.65, 0.35)) %>% factor(),
age = rnorm(5000, mean = 60, sd = 20),
apache2 = rnorm(5000, mean = 18.5, sd=10),
apache3 = rnorm(5000, mean = 55, sd=20),
mechvent = sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)) %>% factor(),
sex = sample(c("Female", "Male"), 5000, replace = TRUE) %>% factor(),
patient = TRUE
) %>%
mutate(patient_id = row_number())%>%
group_by(
siteidn) %>% mutate(
count_site = row_number() == 1L) %>%
ungroup()%>%
group_by(
patient_id) %>% mutate(
count_pt = row_number() == 1L) %>%
ungroup()
Then I use the following code to generate my table:
t1 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, countryname) %>%
tbl_summary(
by = countryname,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**") %>%
add_overall(col_label = "**Overall**")
t2 <- PIR %>%
select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, apache2, mechvent, hospt) %>%
tbl_summary(
by = hospt,
missing = "no",
statistic = list(
patientn ~ "{n}",
siten ~ "{n}",
age ~ "{mean} ({sd})",
apache2 ~ "{mean} ({sd})",
mechvent ~ "{n} ({p}%)",
sex ~ "{n} ({p}%)",
apache3 ~ "{mean} ({sd})"),
label = list(
siten = "Number of ICUs",
patientn = "Number of Patients",
age = "Age",
apache2 = "APACHE II Score",
mechvent = "Mechanical Ventilation",
sex = "Sex",
apache3 = "APACHE III Score")) %>%
modify_header(stat_by = "**{level}**")
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA)
This produces the following table:
I would like to group certain rows together for ease of reading. Ideally, I would like the table to look like this:
I have attempted using the gt package, with the following code:
tbl <-
tbl_merge(
tbls = list(t1, t2),
tab_spanner = c("**Country**", "**Hospital Type**")
) %>%
modify_spanning_header(stat_0_1 ~ NA) %>%
modify_footnote(everything() ~ NA) %>%
as_gt() %>%
gt::tab_row_group(
group = "Severity of Illness Scores",
rows = 7:8) %>%
gt::tab_row_group(
group = "Patient Demographics",
rows = 3:6) %>%
gt::tab_row_group(
group = "Numbers",
rows = 1:2)
This produces the desired table:
There are a couple of issues I'm having with the way that I'm doing this.
When I try to use the row names (variables), an error message comes up (Can't subset columns that don't exist...). Is there a way to do this by using the variable names? With larger tables, I am getting into some trouble with using the row numbers method of assigning row names. This is particularly true when there is a single variable that loses its place as it's moved to the end to account for the grouped rows.
Is there a way to do this prior to piping into tbl_summary? Although I like the output of this table, I use Word as my output document for statistical reports and would like the ability to be able to format the tables in Word if need be (or by my collaborators). I usually use gtsummary::as_flextable for table output.
Thanks again,
Ben
When I try to use the row names (variables), an error message comes up (Can't subset columns that don't exist...). Is there a way to do this by using the variable names? With larger tables, I am getting into some trouble with using the row numbers method of assigning row names. This is particularly true when there is a single variable that loses its place as it's moved to the end to account for the grouped rows.
There are two ways to go about this, 1. build separate tables for each group, then stack them, and 2. add a grouping column to .$table_body then group the tibble by the new variable.
library(gtsummary)
library(dplyr)
packageVersion("gtsummary")
#> '1.3.6'
# Method 1 - Stack separate tables
t1 <- trial %>% select(age) %>% tbl_summary()
t2 <- trial %>% select(grade) %>% tbl_summary()
tbl1 <-
tbl_stack(
list(t1, t2),
group_header = c("Demographics", "Tumor Characteristics")
) %>%
modify_footnote(all_stat_cols() ~ NA)
# Method 2 - build a grouping variable
tbl2 <-
trial %>%
select(age, grade) %>%
tbl_summary() %>%
modify_table_body(
mutate,
groupname_col = case_when(variable == "age" ~ "Deomgraphics",
variable == "grade" ~ "Tumor Characteristics")
)
2.Is there a way to do this prior to piping into tbl_summary? Although I like the output of this table, I use Word as my output document for statistical reports and would like the ability to be able to format the tables in Word if need be (or by my collaborators). I usually use gtsummary::as_flextable for table output.
The examples above modify the table before exporting to gt format, so you can export these example to flextable. However, flextable does not have the same built-in header row functionality (or at least I am unaware of it, and don't use it in as_flex_table()), and the output would look like the table below. I recommend installing the dev version of gt from GitHub and export to RTF (supported by Word)--they've made many updates to RTF output in the last months, and it may work for you.
I think I might have a solution for this (thanks, obviously, to Daniel Sjoberg and team providing us with the modify_table_body function)
All you need to do is edit the underlying data frame to add a variable with your desired grouping row using modify_table_body, and then put it in the position you want it to be in, like this:
library(gtsummary)
library(dplyr)
packageVersion("gtsummary")
trial%>%
select(age, stage, grade)%>%
tbl_summary()%>%
modify_table_body(
~.x %>%
# add your variable
rbind(
tibble(
variable="Demographics",
var_type=NA,
var_label = "Demographics",
row_type="label",
label="Demographics",
stat_0= NA))%>% # expand the components of the tibble as needed if you have more columns
# can add another one
rbind(
tibble(
variable="Tumor characteristics",
var_type=NA,
var_label = "Tumor characteristics",
row_type="label",
label="Tumor characteristics",
stat_0= NA))%>%
# specify the position you want these in
arrange(factor(variable, levels=c("Demographics",
"age",
"Tumor characteristics",
"stage",
"grade"))))%>%
# and you can then indent the actual variables
modify_column_indent(columns=label, rows=variable%in%c("age",
"stage",
"grade"))%>%
# and double indent their levels
modify_column_indent(columns=label, rows= (variable%in%c("stage",
"grade")
& row_type=="level"),
double_indent=T)
Related
I've got the following reprex
library(tidyverse)
library(gtsummary)
set.seed(50)
dat <- data.frame(exposed = sample(c("Unexposed","Exposed"), 100, TRUE),
year = rep(c(1985,1986), each = 50),
Age = rnorm(100, 85, 1),
Transit = sample(c("Bus", "Train", "Walk", "Car"), 100, TRUE))
dat %>%
tbl_strata(strata = year,
~ .x %>%
tbl_summary(
by = exposed,
include = c(Age, Transit),
statistic = list(Age ~ "{mean} ± {sd}"),
digits = Age ~ 1,
label = Age ~ "Age, mean ± SD"
)) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
modify_footnote(update = everything() ~ NA)
which produces this table:
but when I try to add a new, separate footnote, the previous one gets overwritten
dat %>%
tbl_strata(strata = year,
~ .x %>%
tbl_summary(
by = exposed,
include = c(Age, Transit),
statistic = list(Age ~ "{mean} ± {sd}"),
digits = Age ~ 1,
label = Age ~ "Age, mean ± SD"
)) %>%
modify_header(all_stat_cols() ~ "**{level}**") %>%
modify_table_styling(columns = label,
rows = variable == "Age",
footnote = "Footnote 1") %>%
modify_table_styling(columns = label,
rows = label == "Transit",
footnote = "Footnote 2") %>%
modify_table_styling(columns = label,
rows = label == "Transit",
footnote = "Footnote 3") %>%
modify_footnote(update = everything() ~ NA)
and my table looks like this.
I've tried using modify_footnote as described here but I don't understand the documentation for how to get the footnotes out of the columns and into the rows.
The final output should look something like this.
Problem:
Can't find a way to create a tbl_continuous of a weighted numerical variable. I'm using tbl_svysummary to create my categorical variable tables but it's not useful when i try to do the same with numerical variable tables. Note: i'm not looking for a general mean of my numeric variable but separated by or groupped by a categorical variable.
Attempt:
For example, i've created this table with the help of the function tbl_continuous which does exactly what i want: the mean of my numeric variable but by the levels of my categorical variable. The only problem is that i can't insert a weight variable into this function.
```{r}
base2 %>%
as_label() %>%
select(ing_cap, ano, nacional, dominio) %>%
tbl_continuous(variable = ing_cap,
by = ano,
statistic = list(everything() ~ "{median}"))
```
Also, i have been creating weighted data with the srvyr package in the following way:
base2 %>%
labelled::drop_unused_value_labels() %>%
as_label() %>%
as_survey_design(weight = fac500a)
Could add up to the solution.
Request:
Create this same table (shown in the image) but with a weight variable. My weight variable in my data is called fac500a.
My data:
My data can be dowloaded from my github repo and has the following dimensions:
> dim(base2)
[1] 108103 44
https://github.com/aito123/quarto_blog/raw/master/posts/tablas_tesis/base2.sav
(dput output is long)
My current packages:
I'm using this r packages so far: tidyverse, srvyr, gtsummary, sjlabelled, haven
Conclusion:
Let me know if it's neccesary to provide more information.
The gtsummary package does not export an analogous function of tbl_continuous() for survey data. But you can construct the table. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'
svy_trial <- survey::svydesign(ids = ~1, data = trial, weights = ~1)
tbl <-
svy_trial %>%
tbl_strata2(
strata = grade,
~ .x %>%
tbl_svysummary(
by = trt,
include = age,
missing = "no",
label = list(age = .y)
) %>%
modify_header(all_stat_cols() ~ "**{level}**"),
.combine_with = "tbl_stack",
.combine_args = list(group_header = NULL)
) %>%
modify_table_body(
~ .x %>%
mutate(variable = "grade", row_type = "level") %>%
tibble::add_row(
row_type = "label",
variable = "grade",
label = "Grade",
.before = 1L
)
) %>%
modify_column_indent(columns = label, rows = row_type == "level") %>%
bold_labels() %>%
modify_spanning_header(all_stat_cols() ~ "**Treatment**") %>%
modify_footnote(all_stat_cols() ~ "Age: Median (IQR)")
Created on 2022-05-16 by the reprex package (v2.0.1)
I want to create a function that would automatically generate the tables with summary statistics when i parse different column names. I am trying to create a function for gtsummary I have tried enquo and deparse but both don't seem to help. Can somebody please guide me in what I am doing wrong here.
get_stats <- function (var2) {
var2 <- dplyr::enquo(var2)
grp_val <- deparse(substitute(var2))
df %>%
gtsummary::tbl_summary(.,
by = trt,
missing = "no",
type =
list(!!var2 ~ "continuous2"),
statistic = list(
"{{var2}}" = c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
)
)
,
digits = !!var2 ~ c(0, 1, 1, 1)
)
}
The error I keep getting is Error: Error in type= argument input. Select from ‘age’, ‘trt’.
When I use this with the trial data without parsing anything it works fine.
trial %>%
dplyr::select(age, trt) %>%
dplyr::mutate_if(is.factor, as.character()) %>%
gtsummary::tbl_summary(
by = trt,
missing = "no",
type =
list(age ~ "continuous2"),
statistic = list(
"age" = c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
))
,
digits = age ~ c(0, 1, 1, 1)
)
Expected output from the code
Using rlang::as_name and named lists you could do:
library(gtsummary)
get_stats <- function(df, var2) {
var2_str <- rlang::as_name(rlang::enquo(var2))
df %>%
gtsummary::tbl_summary(.,
by = trt,
missing = "no",
type = setNames(list(c("continuous2")), var2_str),
statistic = setNames(list(c(
"{N_nonmiss}",
"{mean} ({sd})",
"{median} ({p25}, {p75})",
"{min}, {max}"
)), var2_str
),
digits = setNames(list(c(0, 1, 1, 1)), var2_str),
)
}
trial %>%
select(age, trt) %>%
dplyr::mutate_if(is.factor, as.character()) %>%
get_stats(age)
I create some models like this using a nested tidyr dataframe:
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(purrr)
fits <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0, sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1, sample(10, replace = T), sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data=-group) %>%
mutate(fit= map(data, ~glm(formula = colA ~ colB + colC, data = .x, family="binomial"))) %>%
dplyr::select(group, fit) %>%
tibble::column_to_rownames("group")
I would like to use this data to create some quick marginal effects plots with sjPlot::plot_models like this
plot_models(as.list(fits), type = "pred", terms = c("colB", "colA", "colC"))
Unfortunately, I get the error
Error in if (fam.info$is_linear) tf <- NULL else tf <- "exp" :
argument is of length zero
In addition: Warning message:
Could not access model information.
I've played around a bit with the nesting of the data but I've been unable to get it into a format that sjPlot::plot_models will accept.
What I was expecting to get is a "Forest plot of multiple regression models" as described in the help file. Ultimately, the goal is to plot the marginal effects of regression models by group, which I was hoping the plot_models will do (please correct me if I'm wrong).
It think there are some issues with the original code as well as with the data. There are arguments from plot_model in the function call which are not supported in plot_models. I first show an example that shows how plot_models can be called and used with a nested tibble using {ggplot2}'s diamonds data set. Then I apply this approach to the OP's sample data, which doesn't yield useable results*. Finally, I create some new toy data to show how the approach could be applied to a binominal model.
(* In the original toy data the dependent variable is either always 0 or always 1 in each model so this is unlikely to yield useable results).
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(ggplot2)
# general example
fits <- tibble(id = c("x", "y", "z")) %>%
rowwise() %>%
mutate(fit = list(glm(reformulate(
termlabels = c("cut", "color", "depth", "table", "price", id),
response = "carat"),
data = diamonds)))
plot_models(fits$fit)
# OP's example data
fits2 <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0,
sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1,
sample(10, replace = T),
sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data = -group) %>%
rowwise() %>%
mutate(fit = list(glm(formula = colA ~ colB + colC, data = data, family="binomial")))
plot_models(fits2$fit)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 4 rows containing missing values (geom_point).
# new data for binominal model
n <- 500
g <- round(runif(n, 0L, 1L), 0)
x1 <- runif(n,0,100)
x2 <- runif(n,0,100)
y <- (x2 - x1 + rnorm(n,sd=20)) < 0
fits3 <- tibble(g, y, x1, x2) %>%
nest_by(g) %>%
mutate(fit = list(glm(formula = y ~ x1 + x2, data = data, family="binomial")))
plot_models(fits3$fit)
Created on 2021-01-23 by the reprex package (v0.3.0)
I am using gtsummary to summarise my linear regression results. I am trying to omit out the p-value for each sex (column.
Any support on this would be greatly appreciated. I have included dummy data to reproduce what I am trying to do, as well as an image of my linear reg table as it stand.
# install dev versions
remotes::install_github("ddsjoberg/gtsummary#mice_nnet")
remotes::install_github("larmarange/broom.helpers")
# load packages
library(gtsummary)
library(nnet)
theme_gtsummary_compact()
# dummy data
crime <-data.frame(city = sample(c("SF", "AR", "NYC","MN"),13000,replace = TRUE),
sex = sample(c("Male", "Female"),13000,replace = TRUE),
year = sample(as.numeric(sample(10:20, 13000, replace = TRUE)))
)
# serperate data sets by sex
crime_f <- crime %>%
filter(sex == "Female")
crime_m <- crime %>%
filter(sex == "Male")
# build model for females
mod_f <- lm(year ~ city, data = crime_f) %>%
tbl_regression(exponentiate = TRUE) %>%
modify_header(estimate ~ "**OR**")
# build model for males
mod_m <- lm(year ~ city, data = crime_m) %>%
tbl_regression(exponentiate = TRUE) %>%
modify_header(estimate ~ "**OR**")
# lm model tabulated with gtsummary
tbl <- tbl_merge(
tbls = list(mod_f, mod_m),
tab_spanner = c("**Female**", "**Male**")
)
tbl # check table
With the modify_table_header() function you can chose to hide columns in your output, including p-values:
tbl %>%
modify_table_header(
column = c(p.value_1, p.value_2),
hide = TRUE
)
Good luck!