Calculating percent for missing values using gtsummary in Rstudio - r

My question is a bit similar to this one here.
I have this following codes:
library(gtsummary)
basicvars <- names(isoq) %in% c("homeless_nonself", "test_result")
basictable <- isoq[basicvars]
# summarize the data
table1 <- tbl_summary(basictable, missing = "always",
missing_text = "(Missing)",
percent = "cell",
type = all_dichotomous() ~"categorical"
) %>%
bold_labels()
############Selecting the order of variables
basiccompletetable <- basictable %>% select(test_result,homeless_nonself)
mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>%
table3 <- tbl_summary(basiccompletetable, #missing = "always", missing_text = "(Missing)",
percent = "cell",
label = list(
test_result ~ "COVID-19 Test Result",
homeless_nonself ~ "Homeless",
),
sort = list(
test_result ~ "frequency",
homeless_nonself ~ "frequency",
),
type = list(all_character() ~ "categorical")
) %>%
modify_spanning_header(starts_with("stat_") ~ "**All**") %>%
modify_header(label = "**Variable**") %>% # update the column header
#add_n() %>%
bold_labels() %>%
as_gt() %>%
gt::tab_source_note(gt::md("*This data is simulated*"))
table3
It spits the output (not the complete output)
I am trying to show the percentages for the missing values. Tried first with test_result. Used this line of code mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>% to what was suggested in the earlier question. However, I am seeing the same table as my output and there are no percentages on the missing values for the variable test_result.
Any suggestions why this is not working? Thanks

Related

how to change % so it reads horizontally rather than vertically on Gtsummary tables?

I would like to reorientate my table so that the % ITN use is given by level, rather than % of observations in each level, for the ITN user and non-user group separately.
Basically, I would like the % to add up to 100 horizontally for rather than vertically
here is the code that I used to produce the table:
tbl_summary(pop.subtable1, by = ITN, missing = "no") %>%
modify_header(label = "Variable") %>%
modify_spanning_header(update = all_stat_cols() ~ "**ITN USE**") %>%
italicize_labels() %>% bold_labels() %>%
add_overall() %>%
modify_caption("Table 3. Bed Net Use Amongst Population Characteristics")
I'm wondering if there's a line of code that I can insert that will manually do this?
here's what the table looks like:
enter image description here
Based on #Mike's comment, here is a reproducible example so others can see the difference between percent = "column" (default) and percent = "row" for tbl_summary().
percent Indicates the type of percentage to return. Must be one of "column", "row", or "cell". Default is "column".
library(gtsummary)
data(trial)
## default (column)
trial %>%
select(trt, age, grade) %>%
tbl_summary(
by = trt,
percent = "column",
type = all_continuous() ~ "continuous2",
missing = "no"
)
## row
trial %>%
select(trt, age, grade) %>%
tbl_summary(
by = trt,
percent = "row",
type = all_continuous() ~ "continuous2",
missing = "no"
)

flextable and gtsummary: Title font is different from body font with save_as_docx()

I am trying to print regression tables to Microsoft Word files with gtsummary and flextable. However, despite specifying the styling whenever possible, the title of the table prints in a different font than the rest of the table. I want everything to be in Times New Roman/APA Style, but the title font keeps printing in Cambria. Outside of R, my default Microsoft Word font is Calibri.
I know there are other packages that can print regression tables to Microsoft Word, but I prefer gtsummary and flextable because my actual data is multiply imputed and I have found that gtsummary and flextable work well with multiply imputed data. This is a small issue, but any help is appreciated.
library(tidyverse)
library(gtsummary)
library(flextable)
packageVersion("gtsummary")
#> [1] '1.5.1'
packageVersion("flextable")
#> [1] '0.6.11.4'
# theme based on https://github.com/idea-labs/comsldpsy
apa_theme <- function (ft) {
ft %>%
flextable::font(fontname = "Times New Roman", part = "all") %>%
flextable::fontsize(size = 12, part = "all") %>%
flextable::align(align = "left", part = "all") %>%
flextable::align(align = "center", part = "header") %>%
flextable::rotate(rotation = "lrtb", align = "top", part = "body") %>%
flextable::border_remove() %>%
flextable::hline_top(border = officer::fp_border(width = 2), part = "all") %>%
flextable::hline_bottom(border = officer::fp_border(width = 2), part = "all") %>%
flextable::autofit()
}
set_flextable_defaults(font.family = "Times New Roman")
m1 <- lm(response ~ trt, data = trial) %>% tbl_regression()
m2 <- lm(response ~ trt + marker, data = trial) %>% tbl_regression()
m3 <- lm(response ~ trt + marker + age, data = trial) %>% tbl_regression()
tbl_merge(
tbls = list(m1, m2, m3)) %>%
modify_table_styling(align = "left") %>%
modify_caption("Why is the title in a different font?") %>%
as_flex_table() %>%
apa_theme() %>%
flextable::save_as_docx(path = "~/Desktop/weird_table.docx")
I was able to achieve the desired result by using flextable::add_header_lines instead of gtsummary::modify_caption and by revising apa_theme().
library(tidyverse)
library(gtsummary)
library(flextable)
# theme based on https://github.com/idea-labs/comsldpsy
apa_theme <- function (ft) {
ft %>%
flextable::font(fontname = "Times New Roman", part = "all") %>%
flextable::fontsize(size = 12, part = "all") %>%
flextable::align(align = "left", part = "body") %>%
flextable::align(align = "center", part = "header") %>%
flextable::rotate(rotation = "lrtb", align = "top", part = "body") %>%
flextable::border_remove() %>%
flextable::hline_top(border = officer::fp_border(width = 2),
part = "all") %>%
flextable::hline_bottom(border = officer::fp_border(width = 2),
part = "all") %>%
flextable::hline(i = 1, border = officer::fp_border(width = 1), part = "header") %>%
flextable::set_table_properties(layout = "autofit")
}
m1 <- lm(response ~ trt, data = trial) %>% tbl_regression()
m2 <- lm(response ~ trt + marker, data = trial) %>% tbl_regression()
m3 <- lm(response ~ trt + marker + age, data = trial) %>% tbl_regression()
tbl_merge(
tbls = list(m1, m2, m3)) %>%
modify_table_styling(align = "left") %>%
as_flex_table() %>%
add_header_lines(values = "Table looks better overall", top = TRUE) %>%
apa_theme() %>%
flextable::save_as_docx(path = "~/Desktop/good_table.docx")

How to put line breaks in the labels of the table output by gt_regression() and output it by LaTeX?

According to the documentation of the gtsummay package, you can use <br> in add_significance_stars() to break the labels of the table showing the results of the regression model in HTML, but it does not work for LaTeX.
I have tried other line break methods such as \n, but it still does not work. How can I make line breaks in LaTeX?
Here is an example in HTML.
df <-
mtcars %>%
lm(mpg ~ ., data = .)
df %>%
tbl_regression() %>%
add_significance_stars(
hide_se = TRUE,
pattern = "{estimate}{stars}<br>({std.error})"
) %>%
modify_header(estimate ~ "OLS<br>result")
And here is a LaTeX example.
df %>%
tbl_regression() %>%
add_significance_stars(
hide_se = TRUE,
pattern = "{estimate}{stars}<br>({std.error})"
) %>%
modify_header(estimate ~ "OLS<br>result") %>%
as_kable_extra(
format = "latex",
booktabs = TRUE
)
I created a table based on the answer, but I found that this method causes the layout to be broken when using tbl_merge().
I will present the problem code again.
# make nested dataframe
nest_df <-
mtcars %>%
tibble() %>%
group_nest(vs)
# make function
mod_fun <- function(df){lm(mpg ~ ., data = df)}
# map function
nest_df <-
nest_df %>%
mutate(model = map(data, mod_fun))
# make table
nest_df <-
nest_df %>%
mutate(
tbl = map(
.x = model,
~ tbl_regression(
.x,
) %>%
add_significance_stars(
hide_se = TRUE,
pattern = "{estimate}{stars}\\\\&({std.error})"
) %>%
modify_header(estimate ~ "OLS\\\\&result")
)
)
# merge table
nest_df_m <-
tbl_merge(
tbls = nest_df$tbl,
tab_spanner = c("type1", "type2")
)
# output merged table
nest_df_m %>%
as_kable_extra(
format = "latex",
booktabs = TRUE,
escape = FALSE
) %>%
kable_styling(position = "center")
Maybe this fits your need. You could a line break by
adding \\\\ (which gives an \\ in the latex code),
adding an & to put the std.error in the same column as the estimate,
setting escape=FALSE in as_kable_extra.
df %>%
tbl_regression() %>%
add_significance_stars(
hide_se = TRUE,
pattern = "{estimate}{stars}\\\\&({std.error})"
) %>%
modify_header(estimate ~ "OLS\\\\&result") %>%
as_kable_extra(
format = "latex",
booktabs = TRUE,
escape = FALSE
)

Set values default theme and bold variables for gtsummary package

I have a question regarding setting the default theme for the gtsummary package.
library(tidyverse)
library(gtsummary)
library(gapminder)
gap <- gapminder %>%
dplyr::mutate_all(~ifelse(
sample(c(TRUE, FALSE), size = length(.), replace = TRUE, prob = c(0.8, 0.2)),
as.character(.),
NA)
) %>%
dplyr::mutate_at(vars(year:gdpPercap), ~as.numeric(.)
) %>%
dplyr::mutate(gdpPercap = ifelse(gdpPercap > median(gdpPercap, na.rm = TRUE), "high", "low"))
my_theme <-
list(
"tbl_summary-str:default_con_type" = "continuous2",
"tbl_summary-str:continuous_stat" = c("{median} ({p25} - {p75})",
"{mean} ({sd})",
"{min} - {max}"),
"tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
"style_number-arg:big.mark" = "",
"add_p.tbl_summary-attr:test.categorical" = "",
"tbl_summary-fn:percent_fun" = function(x) style_percent(x, digits = 3),
"add_p.tbl_summary-attr:test.categorical" = "chisq.test"
)
gap %>%
gtsummary::tbl_summary(
by = continent
)
I would like to know how I can set the default theme to add p values or for example make the labels bold. I tried the code above but it did not work. I know that I can add add_p() but I would like to know if I can do that in the theme so I don't have to type add_p when wanting to add p-values. Thank you for your help.
UPDATE:
As of gtsummary v1.4.0, you can set functions like add_p() and bold_labels() after each tbl_summary() using themes.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# theme to always run add_p() and bold_labels() after tbl_summary()
list(
"tbl_summary-fn:addnl-fn-to-run" =
function(x) {
if (!is.null(x$by)) x <- add_p(x) # add_p if there is a by variable
bold_labels(x) # bold labels and return table
}
) %>%
set_gtsummary_theme()
tbl <-
trial %>%
select(age, grade, trt) %>%
tbl_summary(by = trt)
Created on 2021-04-14 by the reprex package (v2.0.0)
OLD POST:
There is no way to directly run add_p() and bold_labels() automatically after tbl_summary() using themes. I think the your best bet may be to define a new function that runs the additional functions.
tbl_summary_p <- function(...) {
tbl_summary(...) %>%
add_p() %>%
bold_labels()
}
gap %>% tbl_summary_p(by = continent)
You can, however, utilize the themes to bold the labels. Themes allow you to include any formatting commands using theme element as_gt-lst:addl_cmds. If you add the theme element below to your theme list, gt tables will have the label rows bolded.
# bold labels for gt output
"as_gt-lst:addl_cmds" =
list(tab_spanner = expr(gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = gt::vars(label),
rows = row_type == "label"))))

Disaggregate in the context of a time series

I have a dataset that I want to visualize overall and disaggregated by a few different variables. I created a flexdashboard with a toy shiny app to select the type of disaggregation, and working code to plot the correct subset.
My approach is repetitive, which is a hint to me that I'm missing out on a better way to do this. The piece that's tripping me up is the need to count by date and expand the matrix. I'm not sure how get group counts by week in one pipe. I do it in several steps and combine.
Thoughts?
(ps. I asked this question on RStudio Community, but I think it's probably more of a "SO question". I don't have permissions to delete it from RSC, so apologies for the cross-post.)
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- sample_n(dat, 80)
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = 1, "By Sex" = 2, "By Language" = 3),
selected = 1)
```
Page 1
=====================================
```{r}
# all
all <- reactive(
dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total = 0))
)
# males only
males <- reactive(
dat %>%
filter(sex=="male") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_m = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_m = 0))
)
# females only
females <- reactive(
dat %>%
filter(sex=="female") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_f = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_f = 0))
)
# english only
english <- reactive(
dat %>%
filter(lang=="english") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_e = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_e = 0))
)
# spanish only
spanish <- reactive(
dat %>%
filter(lang=="spanish") %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
as_tbl_time(index = date) %>%
select(date, new) %>%
collapse_by('1 week', side="start", clean=TRUE) %>%
group_by(date) %>%
mutate(total_s = sum(new, na.rm=TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(date = seq(date[1],
date[length(date)],
by = "1 week"),
fill = list(total_s = 0))
)
# combine
totals <- reactive({
all <- all()
females <- females()
males <- males()
english <- english()
spanish <- spanish()
all %>%
select(date, total) %>%
full_join(select(females, date, total_f), by = "date") %>%
full_join(select(males, date, total_m), by = "date") %>%
full_join(select(english, date, total_e), by = "date") %>%
full_join(select(spanish, date, total_s), by = "date")
})
# convert to xts
totals_ <- reactive({
totals <- totals()
xts(totals, order.by = totals$date)
})
# plot
renderDygraph({
totals_ <- totals_()
if (input$diss == 1) {
dygraph(totals_[, "total"],
main= "All") %>%
dySeries("total", label = "All") %>%
dyRangeSelector() %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else if (input$diss == 2) {
dygraph(totals_[, c("total_f", "total_m")],
main = "By sex") %>%
dyRangeSelector() %>%
dySeries("total_f", label = "Female") %>%
dySeries("total_m", label = "Male") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else {
dygraph(totals_[, c("total_e", "total_s")],
main = "By language") %>%
dyRangeSelector() %>%
dySeries("total_e", label = "English") %>%
dySeries("total_s", label = "Spanish") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
}
})
```
Update:
#Jon Spring suggested writing a function to reduce some repetition (applied below), which is a nice improvement. The basic approach is the same, however. Segment, calculate, combine, plot. Is there a way to do this without breaking apart and putting back together?
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- sample_n(dat, 80)
# Jon Spring's function
prep_dat <- function(filtered_dat, col_name = "total") {
filtered_dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm = TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(
date = seq(date[1], date[length(date)], by = "1 week"),
fill = list(total = 0)
)
}
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = 1, "By Sex" = 2, "By Language" = 3),
selected = 1)
```
Page 1
=====================================
```{r}
# all
all <- reactive(
prep_dat(dat)
)
# males only
males <- reactive(
prep_dat(
dat %>%
filter(sex == "male")
) %>%
rename("total_m" = "total")
)
# females only
females <- reactive(
prep_dat(
dat %>%
filter(sex == "female")
) %>%
rename("total_f" = "total")
)
# english only
english <- reactive(
prep_dat(
dat %>%
filter(lang == "english")
) %>%
rename("total_e" = "total")
)
# spanish only
spanish <- reactive(
prep_dat(
dat %>%
filter(lang == "spanish")
) %>%
rename("total_s" = "total")
)
# combine
totals <- reactive({
all <- all()
females <- females()
males <- males()
english <- english()
spanish <- spanish()
all %>%
select(date, total) %>%
full_join(select(females, date, total_f), by = "date") %>%
full_join(select(males, date, total_m), by = "date") %>%
full_join(select(english, date, total_e), by = "date") %>%
full_join(select(spanish, date, total_s), by = "date")
})
# convert to xts
totals_ <- reactive({
totals <- totals()
xts(totals, order.by = totals$date)
})
# plot
renderDygraph({
totals_ <- totals_()
if (input$diss == 1) {
dygraph(totals_[, "total"],
main= "All") %>%
dySeries("total", label = "All") %>%
dyRangeSelector() %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else if (input$diss == 2) {
dygraph(totals_[, c("total_f", "total_m")],
main = "By sex") %>%
dyRangeSelector() %>%
dySeries("total_f", label = "Female") %>%
dySeries("total_m", label = "Male") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
} else {
dygraph(totals_[, c("total_e", "total_s")],
main = "By language") %>%
dyRangeSelector() %>%
dySeries("total_e", label = "English") %>%
dySeries("total_s", label = "Spanish") %>%
dyOptions(useDataTimezone = FALSE,
stepPlot = TRUE,
drawGrid = FALSE,
fillGraph = TRUE)
}
})
```
Thanks for explaining more about your goals. I think the approach #simon-s-a suggests will simplify things. If we can run the grouping dynamically, and structure it so that we don't need to know the possible components in those groups beforehand, it will be a lot easier to maintain.
Here's a minimum viable product that rebuilds the plotting function to include the grouping logic inside it.
Once grouped by date and whatever our grouping variable is, it counts how many rows each group has, then spreads those so each group gets a column.
Then I use padr::pad to pad out any missing time rows in between, and replace all the NA's with zeros.
Finally, that data frame is converted to an xts object and fed into dygraph, which seems to handle the multiple columns automatically.
Here:
---
title: "test"
output:
flexdashboard::flex_dashboard:
theme: bootstrap
runtime: shiny
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```
```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"),
as.Date("2018-06-30"),
"days"),
sex = sample(c("male", "female"), 181, replace=TRUE),
lang = sample(c("english", "spanish"), 181, replace=TRUE),
age = sample(20:35, 181, replace=TRUE))
dat <- dplyr::sample_n(dat, 80)
```
Sidebar {.sidebar}
=====================================
```{r}
radioButtons("diss", label = "Disaggregation",
choices = list("All" = "Total",
"By Sex" = "sex",
"By Language" = "lang"),
selected = "Total")
```
Page 1
=====================================
```{r plot}
renderDygraph({
grp_col <- rlang::sym(input$diss) # This converts the input selection to a symbol
dat %>%
mutate(Total = 1) %>% # This is a hack to let us "group" by Total -- all one group
# Here's where we unquote the symbol so that dplyr can use it
# to refer to a column. In this case I make a dummy column
# that's a copy of whatever column we want to group
mutate(my_group = !!grp_col) %>%
# Now we make a group for every existing combination of week
# (using lubridate::floor_date) and level of our grouping column,
# count how many rows in each group, and spread that to wide format.
group_by(date = lubridate::floor_date(date, "1 week"), my_group) %>%
count() %>% spread(my_group, n) %>% ungroup() %>%
# padr:pad() fills in any missing weeks in the sequence with new rows
# Then we replace all the NA's with zeroes.
padr::pad() %>% replace(is.na(.), 0) %>%
# Finally we can convert to xts and feed the wide table into digraph.
xts::xts(order.by = .$date) %>%
dygraph() %>%
dyRangeSelector() %>%
dyOptions(
useDataTimezone = FALSE, stepPlot = TRUE,
drawGrid = FALSE, fillGraph = TRUE
)
})
```
This is a good place to make a function, to shorten your code and make it less prone to error.
http://r4ds.had.co.nz/functions.html
A complicating bit is that programming with dplyr often requires wading into a framework called tidyeval, which is very powerful but can be intimidating.
https://dplyr.tidyverse.org/articles/programming.html
(Here's an alternative approach that sidesteps tidyeval: https://cran.r-project.org/web/packages/seplyr/vignettes/using_seplyr.html)
In your scenario, it's possible to avoid these challenges entirely by doing a bit of manipulation before and after your function. It's not as elegant, but works.
BTW, I can't guarantee it'll work since you didn't share a verifiable reprex (e.g. including a sample of data with the same form as yours), but it worked with the fake data I made up. (See bottom.) Sorry, I missed the chunk where your sample data was provided.
prep_dat <- function(filtered_dat, col_name = "total") {
filtered_dat %>%
mutate(new = 1) %>%
arrange(date) %>%
# time series analysis
tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
select(date, new) %>%
tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
group_by(date) %>%
mutate(total = sum(new, na.rm = TRUE)) %>%
distinct(date, .keep_all = TRUE) %>%
ungroup() %>%
# expand matrix to include weeks without data
complete(
date = seq(date[1], date[length(date)], by = "1 week"),
fill = list(total = 0)
)
}
Then you could call it with your filtered data and the name of the total column. This fragment should be able to replace the ~20 lines you're currently using:
males <- prep_dat(dat_fake %>%
filter(sex == "male")) %>%
rename("total_m" = "total")
Fake data that I tested on:
dat_fake <- tibble(
date = as.Date("2018-01-01") + runif(500, 0, 100),
new = runif(500, 0, 100),
sex = sample(c("male", "female"),
500, replace = TRUE),
lang = sample(c("english", "french", "spanish", "portuguese", "tagalog"),
500, replace = TRUE)
)
I think you can make some gains by changing the order of your preparation. Right now the flow of your app is approximately:
Data => prepare all combinations => select desired visualization => make plot
Consider instead:
Data => select desired visualization => prepare required combination => make plot
This would make use of Shiny's reactivity to (re)prepare the data required for the requested plot in response to changes in the user's selection.
By way of code snippets (Sorry, I don't have sufficient familiarity with flexdashboard and tibbletime to ensure this code runs, but I hope it is enough to highlight the approach):
Your control selects the column you want to focus on (note we use "All" = "'1'" so this evaluates to a constant in the group-by, else it has to be handled separately):
radioButtons("diss", label = "Disaggregation",
choices = list("All" = "'1'",
"By Sex" = "sex",
"By Language" = "lang",
"By other" = "column_name_of_'other'"),
selected = 1)
And then use this in your group by to prepare only the data required for the present visualization (you'll need to adjust the function suggested by #Jon_Spring in response to this earlier group-by):
preped_dat = reactive({
dat %>%
group_by_(input$diss) %>%
# etc
})
Before plotting (you'll need to adjust the plotting function in response to the possible change in data format):
renderDygraph({
totals = preped_data()
dygraph(totals) %>%
dySeries("total", label = ) %>%
dyRangeSelector()
})
With regard to group_by you can use group_by_ if all your arguments are text strings, or group_by(!! sym(input$diss), other_column_name) if you want to mix the text string input from your control with other column names.
One possible disadvantage of this change in approach is reduced responsiveness during interactivity if your data set is large. The present approach does all the computation up front and then minimal computation each selection - this may be preferable if you have a large amount of processing. My suggested approach will have minimal up front processing and moderate computation each selection.

Resources