Hello to all expss experts (#Gregory Demin, if you read this message!), after few days discovering this package, I achieved nice things but still struggle a bit to create complex crosstabs with the tab_* family of functions, especially to create combinations with significance tests.
Let's start with an example given on the reference manual:
library(expss)
mtcars %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
tab_cols(total(), vs, am) %>%
tab_cells(cyl, gear) %>%
tab_stat_cpct() %>%
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_last_hstack("inside_columns") %>%
tab_pivot(stat_position = "inside_rows")
From this point, I do not know if the following actions are possible, and if so what scripts would do the trick:
1) It is quite simple with 'fre' function to display counts and percentages side by side, but is limited to this only purpose. How can we add the cases to the crosstab? (in the form of cases/percents/tests, in 3 distinct columns)
2) By default the significance tests output in this example is LETTERS, at 0.05 level. Both parameters can be changed. But is it possible to include two significance levels in a single table calculation? Something in the spirit of:
sig_level = c(0.01, 0.05)
sig_labels = c(LETTERS, letters)
3) Last (probably an easy one?), is there a possibility to force display of zeros? I have factor levels with frequencies=0, displayed with 0s in base R tables. With expss the label stays but the rows/columns remain empty.
Again, maybe what I am looking for does not exist with expss, but at least I will be sure of it.
Thank you!
Your second point (two-level significance) is not possible right now. However you can add second level significance with additional calculations on specially prepared table.
1 and 3 are quite easy:
library(expss)
data(mtcars)
mtcars %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
tab_cols(total(), vs, am) %>%
tab_cells(cyl, gear) %>%
# block for cases
tab_stat_cases(label = "cases") %>%
tab_last_add_sig_labels() %>%
# block for percent statistic
tab_stat_cpct(label = "%") %>% # percent
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_pivot(stat_position = "inside_columns") %>%
# converts NA to zero
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)
UPDATE:
You can specify parts of the chain as custom functions to avoid repetition:
library(expss)
data(mtcars)
### tab cols
my_banner = mtcars %>%
tab_cols(total(), vs, am)
### table and formattig
my_custom_table = . %>%
tab_significance_options(keep = "none", sig_labels = NULL, subtable_marks = "greater", mode = "append") %>%
# block for cases
tab_stat_cases(label = "cases") %>%
tab_last_add_sig_labels() %>%
# block for percent statistic
tab_stat_cpct(label = "%") %>% # percent
tab_last_add_sig_labels() %>%
tab_last_sig_cpct() %>%
tab_pivot(stat_position = "inside_columns") %>%
# converts NA to zero
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)
### here we build table
my_banner %>%
tab_cells(cyl, gear) %>%
my_custom_table()
Related
I am trying to get a table for my dataset that only shows rows which are above a certain value, but which still uses the numbers in those rows to get the means for the supersets. using the df diamonds, i have the following code. What I want is for all rows with column price less than 3000 to NOT show up, but to still have them figure into the means of price for superset rows of cut and color.
Kind of a separate question, but I'm also trying to figure out how to make it so that if there were to be only one row in any of the subset rows, then they would be on the same line as the superset rows- such as if cut Fair had only one color
library(knitr)
library(kableExtra)
df <- diamonds
a11<-df%>%
group_by(cut, color, clarity)%>%
summarize_at( .vars=c("price"),
.funs=~mean(.,na.rm=TRUE)) %>%
mutate(sort = 3)
b11<-df%>%
group_by(cut, color) %>%
summarize_at( .vars=c("price"),
.funs=~mean(.,na.rm=TRUE)) %>%
mutate(clarity="", sort = 2) %>%
select(cut, color, clarity,everything())
c11<-df %>%
group_by(cut) %>%
summarize_at( .vars=c("price"),
.funs=~mean(.,na.rm=TRUE)) %>%
mutate(color="", clarity= "", sort = 1) %>%
select(cut, color,everything())
table3<-rbind(a11,b11,c11)
table3%>%
arrange(cut, color, clarity) %>%
select(-sort)%>%
kbl(
caption = "Table",
longtable=T
) %>%
kable_paper(full_width = F) %>%
column_spec(1, bold = T) %>%
column_spec(3, italic = T) %>%
collapse_rows(columns = 1:3, valign = "top")%>%
add_header_above(header = c("Diamonds" = 4))```
Is there a way to programmatically select the last row of a table in a pipeline and highlight it with gt?
Highlighting a particular row index is trivial:
library(dplyr)
library(gt)
mtcars %>%
head(5) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "lightblue")),
locations = cells_body(columns = everything(), rows = 2)
)
But I would like to highlight the last row. Surprisingly, using tidyselect::last_col appears to work, but I do not understand why?
mtcars %>%
head(5) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "lightblue")),
locations = cells_body(columns = everything(), rows = tidyselect::last_col())
)
I have tried a few approaches like ~ nrow(.) and +Inf, but those fail. In addition, I would like to avoid having the index pre-determined because I would like to keep in a pipeline -- that is, I cannot do something like LAST_ROW <- nrow(mtcars) because of the number of rows is undetermined until the data goes through the pipeline.
This is one way to do it. Simply supply the length of one of the columns. I think it would be better to access the data frame from within the tab_style command, but I don't know if that's possible.
library(dplyr, warn.conflicts = FALSE)
library(gt)
mtcars %>%
head(5) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "lightblue")),
locations = cells_body(columns = everything(), rows = length(mpg))
)
I am trying to use R sparkline with gt. My question is very similar to this one Is it possible to use sparkline with gt?, but on top of simply using sparkline with gt as in the referenced question, I am trying to use it as part of the summary row. Below is the picture of what I have achieved so far. Here are my two questions:
How can I remove the two grey lines that are printed as part of the sparkline chart in the summary row?
Is there a better way to add sparkline to the summary row of a gt table?
library(tidyverse)
library(sparkline)
library(gt)
# toy data
df <- tibble(
name = rep(c("A", "B"), each = 20),
value = runif(40, min = -10, max = 10) %>% cumsum()
) %>%
group_by(name) %>%
mutate(
index = row_number()
) %>% ungroup()
# preparing the data for the standard sparkline
regular_sparkline_df <- df %>%
group_by(name) %>%
summarise(
chart = spk_chr(
value,
type="line"
)
)
# here I try to prepare the data for the summary row by getting the whole gt table and then removing the header
summary_row_sparkline_df <- df %>%
group_by(index) %>%
summarise(value = sum(value)) %>% ungroup() %>%
summarise(
chart = spk_chr(
value,
type="line"
)
) %>%
gt() %>%
fmt_markdown(columns = vars(chart)) %>%
gt:::as.tags.gt_tbl() %>%
htmltools::attachDependencies(htmlwidgets::getDependency("sparkline")) %>%
as.character() %>%
gsub('<thead.+</thead>', "", .) # removing the header of the table
# building the html and adding dependencies
p_html <- regular_sparkline_df %>%
gt() %>%
fmt_markdown(columns = vars(chart)) %>%
grand_summary_rows(
columns = "chart",
fns = list(Total = ~as.character(summary_row_sparkline_df)),
formatter = fmt_markdown
) %>%
gt:::as.tags.gt_tbl() %>%
htmltools::attachDependencies(htmlwidgets::getDependency("sparkline"))
# seeing the table in the RStudio
print(p_html, browse = interactive())
I have a question regarding setting the default theme for the gtsummary package.
library(tidyverse)
library(gtsummary)
library(gapminder)
gap <- gapminder %>%
dplyr::mutate_all(~ifelse(
sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
as.character(.),
NA)
) %>%
dplyr::mutate_at(vars(year:gdpPercap), ~as.numeric(.)
) %>%
dplyr::mutate(gdpPercap = ifelse(gdpPercap > median(gdpPercap, na.rm = TRUE), "high", "low"))
my_theme <-
list(
"tbl_summary-str:default_con_type" = "continuous2",
"tbl_summary-str:continuous_stat" = c("{median} ({p25} - {p75})",
"{mean} ({sd})",
"{min} - {max}"),
"tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
"style_number-arg:big.mark" = "",
"add_p.tbl_summary-attr:test.categorical" = "",
"tbl_summary-fn:percent_fun" = function(x) style_percent(x, digits = 3),
"add_p.tbl_summary-attr:test.categorical" = "chisq.test"
)
gap %>%
gtsummary::tbl_summary(
by = continent
)
I would like to know how I can set the default theme to add p values or for example make the labels bold. I tried the code above but it did not work. I know that I can add add_p() but I would like to know if I can do that in the theme so I don't have to type add_p when wanting to add p-values. Thank you for your help.
UPDATE:
As of gtsummary v1.4.0, you can set functions like add_p() and bold_labels() after each tbl_summary() using themes.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# theme to always run add_p() and bold_labels() after tbl_summary()
list(
"tbl_summary-fn:addnl-fn-to-run" =
function(x) {
if (!is.null(x$by)) x <- add_p(x) # add_p if there is a by variable
bold_labels(x) # bold labels and return table
}
) %>%
set_gtsummary_theme()
tbl <-
trial %>%
select(age, grade, trt) %>%
tbl_summary(by = trt)
Created on 2021-04-14 by the reprex package (v2.0.0)
OLD POST:
There is no way to directly run add_p() and bold_labels() automatically after tbl_summary() using themes. I think the your best bet may be to define a new function that runs the additional functions.
tbl_summary_p <- function(...) {
tbl_summary(...) %>%
add_p() %>%
bold_labels()
}
gap %>% tbl_summary_p(by = continent)
You can, however, utilize the themes to bold the labels. Themes allow you to include any formatting commands using theme element as_gt-lst:addl_cmds. If you add the theme element below to your theme list, gt tables will have the label rows bolded.
# bold labels for gt output
"as_gt-lst:addl_cmds" =
list(tab_spanner = expr(gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = gt::vars(label),
rows = row_type == "label"))))
as_flex_table_ex1 <-
trial %>%
select(trt, age, grade) %>%
tbl_summary(by = trt,missing = "no") %>%
add_p() %>%
as_flex_table() %>%
valign(j = 1,valign = "center")
save_as_docx(as_flex_table_ex1,path = "1.docx")
Now - when I open the table in word, the padding between grade I,II and III is very "airy". How can I reduce this padding? I tried using the padding(as_flex_table_ex1,padding.top=0,padding.bottom = 0,part="all"), but I cant really see a difference in the rendered docx.
So after a bit of research. This seems to do the trick.
as_flex_table_ex1 <-
trial %>%
select(trt, age, grade) %>%
tbl_summary(by = trt,missing = "no") %>%
add_p() %>%
as_flex_table() %>%
valign(j = 1,valign = "center")
as_flex_table_ex2 <- as_flex_table_ex1 %>%
hrule(rule = "exact") %>%
height(height=max(dim_pretty(.,part="body")$heights+0.3*max(dim_pretty(.,part="body")$heights)))
print(as_flex_table_ex2,preview="docx")
The dim_pretty factor multiplied by 0.3 is completely arbitrary. If someone is up to finding a legible height that automatically allows for different fonts and sizes it would be great.
I've run into the same issue. These are the commands I use to try to achieve a more compact table (much of what you've already tried). This is available in the function gtsummary::theme_gtsummary_compact().
x %>%
flextable::fontsize(size = 8, part = "all") %>%
flextable::padding(padding.top = 0, part = "all") %>%
flextable::padding(padding.bottom = 0, part = "all")
I have had more success using a R Markdown Word templates, which helps direct the formatting in the Word doc.
https://bookdown.org/yihui/rmarkdown-cookbook/word-template.html