Set values default theme and bold variables for gtsummary package - r

I have a question regarding setting the default theme for the gtsummary package.
library(tidyverse)
library(gtsummary)
library(gapminder)
gap <- gapminder %>%
dplyr::mutate_all(~ifelse(
sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
as.character(.),
NA)
) %>%
dplyr::mutate_at(vars(year:gdpPercap), ~as.numeric(.)
) %>%
dplyr::mutate(gdpPercap = ifelse(gdpPercap > median(gdpPercap, na.rm = TRUE), "high", "low"))
my_theme <-
list(
"tbl_summary-str:default_con_type" = "continuous2",
"tbl_summary-str:continuous_stat" = c("{median} ({p25} - {p75})",
"{mean} ({sd})",
"{min} - {max}"),
"tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
"style_number-arg:big.mark" = "",
"add_p.tbl_summary-attr:test.categorical" = "",
"tbl_summary-fn:percent_fun" = function(x) style_percent(x, digits = 3),
"add_p.tbl_summary-attr:test.categorical" = "chisq.test"
)
gap %>%
gtsummary::tbl_summary(
by = continent
)
I would like to know how I can set the default theme to add p values or for example make the labels bold. I tried the code above but it did not work. I know that I can add add_p() but I would like to know if I can do that in the theme so I don't have to type add_p when wanting to add p-values. Thank you for your help.

UPDATE:
As of gtsummary v1.4.0, you can set functions like add_p() and bold_labels() after each tbl_summary() using themes.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# theme to always run add_p() and bold_labels() after tbl_summary()
list(
"tbl_summary-fn:addnl-fn-to-run" =
function(x) {
if (!is.null(x$by)) x <- add_p(x) # add_p if there is a by variable
bold_labels(x) # bold labels and return table
}
) %>%
set_gtsummary_theme()
tbl <-
trial %>%
select(age, grade, trt) %>%
tbl_summary(by = trt)
Created on 2021-04-14 by the reprex package (v2.0.0)
OLD POST:
There is no way to directly run add_p() and bold_labels() automatically after tbl_summary() using themes. I think the your best bet may be to define a new function that runs the additional functions.
tbl_summary_p <- function(...) {
tbl_summary(...) %>%
add_p() %>%
bold_labels()
}
gap %>% tbl_summary_p(by = continent)
You can, however, utilize the themes to bold the labels. Themes allow you to include any formatting commands using theme element as_gt-lst:addl_cmds. If you add the theme element below to your theme list, gt tables will have the label rows bolded.
# bold labels for gt output
"as_gt-lst:addl_cmds" =
list(tab_spanner = expr(gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = gt::vars(label),
rows = row_type == "label"))))

Related

Is there a way of indenting levels in the new "as_forest_plot" function from gtsummary/bstfun?

Using the excellent GTsummary package to try and create a forest plot of model coefficients. I see an experimental function "as_forest_plot" has been added which works but I can't manage to indent the levels of categorical variables. So far I have tried using the modify_column_indent function but without success, (code and output below), not sure are the names of the rows I am calling incorrect and is it not possible at all yet? My hope is to be able to separate the different variables and also some way of showing which is the reference level.
Many thanks in advance
library(titanic)
library(tidyverse, warn.conflicts = FALSE)
library(gtsummary, warn.conflicts = FALSE)
library(bstfun, warn.conflicts = FALSE)
library(magrittr, warn.conflicts = FALSE)
library(janitor, warn.conflicts = FALSE)
titanic::titanic_train %>%
clean_names %>%
select(-c(name, parch, ticket, cabin, embarked)) %>%
mutate(pclass = factor(pclass)) %>%
glm(survived ~ . - passenger_id, data = ., family = "binomial") %>%
tbl_regression(exponentiate = TRUE) %>%
modify_column_indent(columns = label, rows = (header_row == TRUE)) %>%
modify_cols_merge(
pattern = "{estimate} ({ci})",
rows = !is.na(estimate)
) %>%
modify_header(estimate = "OR (95% CI)") %>%
as_forest_plot(
col_names = c("estimate", "p.value"),
col = forestplot::fpColors(box = "darkred"))
Yes, but you need to manually add the spaces. Example below!
library(gtsummary)
library(bstfun)
trial %>%
select(response, grade) %>%
mutate(grade = paste0(" ", grade)) %>%
tbl_uvregression(
method = glm,
y = response,
method.args = list(family = binomial),
exponentiate = TRUE
) %>%
as_forest_plot()

Calculating percent for missing values using gtsummary in Rstudio

My question is a bit similar to this one here.
I have this following codes:
library(gtsummary)
basicvars <- names(isoq) %in% c("homeless_nonself", "test_result")
basictable <- isoq[basicvars]
# summarize the data
table1 <- tbl_summary(basictable, missing = "always",
missing_text = "(Missing)",
percent = "cell",
type = all_dichotomous() ~"categorical"
) %>%
bold_labels()
############Selecting the order of variables
basiccompletetable <- basictable %>% select(test_result,homeless_nonself)
mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>%
table3 <- tbl_summary(basiccompletetable, #missing = "always", missing_text = "(Missing)",
percent = "cell",
label = list(
test_result ~ "COVID-19 Test Result",
homeless_nonself ~ "Homeless",
),
sort = list(
test_result ~ "frequency",
homeless_nonself ~ "frequency",
),
type = list(all_character() ~ "categorical")
) %>%
modify_spanning_header(starts_with("stat_") ~ "**All**") %>%
modify_header(label = "**Variable**") %>% # update the column header
#add_n() %>%
bold_labels() %>%
as_gt() %>%
gt::tab_source_note(gt::md("*This data is simulated*"))
table3
It spits the output (not the complete output)
I am trying to show the percentages for the missing values. Tried first with test_result. Used this line of code mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>% to what was suggested in the earlier question. However, I am seeing the same table as my output and there are no percentages on the missing values for the variable test_result.
Any suggestions why this is not working? Thanks

Complex tables with expss package

Hello to all expss experts (#Gregory Demin, if you read this message!), after few days discovering this package, I achieved nice things but still struggle a bit to create complex crosstabs with the tab_* family of functions, especially to create combinations with significance tests.
Let's start with an example given on the reference manual:
library(expss)
mtcars %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
tab_cols(total(), vs, am) %>%
tab_cells(cyl, gear) %>%
tab_stat_cpct() %>%
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_last_hstack("inside_columns") %>%
tab_pivot(stat_position = "inside_rows")
From this point, I do not know if the following actions are possible, and if so what scripts would do the trick:
1) It is quite simple with 'fre' function to display counts and percentages side by side, but is limited to this only purpose. How can we add the cases to the crosstab? (in the form of cases/percents/tests, in 3 distinct columns)
2) By default the significance tests output in this example is LETTERS, at 0.05 level. Both parameters can be changed. But is it possible to include two significance levels in a single table calculation? Something in the spirit of:
sig_level = c(0.01, 0.05)
sig_labels = c(LETTERS, letters)
3) Last (probably an easy one?), is there a possibility to force display of zeros? I have factor levels with frequencies=0, displayed with 0s in base R tables. With expss the label stays but the rows/columns remain empty.
Again, maybe what I am looking for does not exist with expss, but at least I will be sure of it.
Thank you!
Your second point (two-level significance) is not possible right now. However you can add second level significance with additional calculations on specially prepared table.
1 and 3 are quite easy:
library(expss)
data(mtcars)
mtcars %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
tab_cols(total(), vs, am) %>%
tab_cells(cyl, gear) %>%
# block for cases
tab_stat_cases(label = "cases") %>%
tab_last_add_sig_labels() %>%
# block for percent statistic
tab_stat_cpct(label = "%") %>% # percent
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_pivot(stat_position = "inside_columns") %>%
# converts NA to zero
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)
UPDATE:
You can specify parts of the chain as custom functions to avoid repetition:
library(expss)
data(mtcars)
### tab cols
my_banner = mtcars %>%
tab_cols(total(), vs, am)
### table and formattig
my_custom_table = . %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
# block for cases
tab_stat_cases(label = "cases") %>%
tab_last_add_sig_labels() %>%
# block for percent statistic
tab_stat_cpct(label = "%") %>% # percent
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_pivot(stat_position = "inside_columns") %>%
# converts NA to zero
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)
### here we build table
my_banner %>%
tab_cells(cyl, gear) %>%
my_custom_table()

Loop through columnames to format colums with R-Package formattable kableExtra (R dplyr)

Hei,
To compare several variants of data I produced a HTML report.
Given a special catagory some indexes in the database should be the same. To detect errors / incorrect entries in the database I compare the different categories in a table.
For better reading, it would be fine, to have coloured tables. This can be done easily with the formattable-Package.
My dataset:
require(tidyverse)
require(formattable)
require(kableExtra)
require(knitr)
df1 <- data.frame(V1 = c(68,sample(c("J","N"),size=15,replace = TRUE)),
V2 = c(10,sample(c("J","N"),size=15,replace = TRUE)),
V3 = c(1,sample(c("J","N"),size=15,replace = TRUE))
)
It has - in this example - 3 differnt variants. Only one is recomended. It is supposed, that the variant with the highest N (=first entry in each Vx-Column) is the real one.
My formated table is produced with this code:
df1 %>%
mutate(
V2 = ifelse((as.character(V2) == as.character(V1)) == FALSE,
cell_spec(V2, color = "red",bold = TRUE),
cell_spec(V2, color = "black",bold = FALSE)),
V3 = ifelse((as.character(V3) == as.character(V1)) == FALSE,
cell_spec(V3, color = "red",bold = TRUE),
cell_spec(V3, color = "black",bold = FALSE))
) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Two questions:
How to mutate in a loop?
This is necessary because the different categories I have to investigate can have up to 18 different variants. In each dataset, V1 is everytime the reference variant.
As you can see (run the code!) the first line (the "N"s) is coded in the wrong matter. Is it possible to compare from the second line on only (first line is set to TRUE by default)
This would be fine, because the first line is now formated in a matter that does not really make sense.
Thank you!
To answer your two questions:
Instead of looping over the columns, you can use mutate_all
Just take a copy of the first column and mutate it back in later
I have first made your cell_spec calls into functions to reduce clutter in the code.
red <- function(x) cell_spec(x, color = "red", bold = TRUE)
black <- function(x) cell_spec(x, color = "black", bold = FALSE)
c1 <- as.character(df1[[1]])
Now we can do this:
df1 %>%
select(-V1) %>%
mutate_all(function(x) ifelse(as.character(x) != df1[[1]], red(x), black(x))) %>%
mutate(V1 = black(c1)) %>%
mutate_all(function(x) `[<-`(x, 1, " ")) %>%
select(V1, V2, V3) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Which gives this result:
Thank you, #AllanCameron!
I 'm not familiar to the package purrr - I really should do more studies about it.
Your idea with purrr::map_dfc solved the problem.
Instead of the first column I need the first row (the digit-row), and of course with grepl it is possible to solve this. The condition in the ifelse-Statement is a little bit longer then.
My final solution is then:
df1 %>%
map_dfc(function(x) ifelse(as.character(x) != as.character(df1$V1) & !grepl("[[:digit:]]",x),
mark_true(x), mark_false(x))) %>%
select(V1, everything()) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Thank you very much!

Formatting Numbers in Flextable for Specific Columns

I'm using R version 3.6.1 in RStudio. I have flextable version 0.5.5 and officer version 0.3.5.
I'm having difficulty with formatting my numbers in flextables within RMarkdown. By default, all numbers show up with 3 decimal places. For some of my numbers, this is fine (and actually preferred), but for others, I want to remove the decimals.
Using the advice found here I was able to adjust my table so that all numbers are rounded to the nearest whole number. My code is below (example table used for reproduciblility; otherwise formatting is the same as my current code).
ft_test <- head(iris) %>% flextable() %>%
hline(part = 'header', border = fp_border(color = "black", width = 3)) %>%
align(align ='center', part = 'all') %>%
align(j = 1, align ='left', part = 'all') %>%
set_formatter_type(fmt_double = "%.0f")
ft_test
However, I only want certain columns to be whole numbers, and other columns to still have decimals. I've tried using the j argument to call certain columns:
ft_test <- head(iris) %>% flextable() %>%
hline(part = 'header', border = fp_border(color = "black", width = 3)) %>%
align(align ='center', part = 'all') %>%
align(j = 1, align ='left', part = 'all') %>%
set_formatter_type(fmt_double = "%.0f", j = 2)
ft_test
... but then I get an error telling me j = 2 is an unused argument.
Any suggestions for how to adjust the numbers of only some columns? Thanks in advance for your help!
You can not use argument j as it is not an argument of set_formatter_type. The function is setting formatters for one or several data type. In your case, it's better to use colformat_num.
library(flextable)
library(officer)
library(magrittr)
ft_test <- head(iris) %>% flextable() %>%
hline(part = 'header', border = fp_border(color = "black", width = 3)) %>%
align(align ='center', part = 'all') %>%
align(j = 1, align ='left', part = 'all') %>%
colformat_num(j = c("Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width"), digits = 1)
ft_test
You can learn more about formatting content here: https://davidgohel.github.io/flextable/articles/display.html

Resources