How can I overlay values on flextable minibars?

The closest I can get with flextable is this:
What I'd like to achieve is something like this - knocked up with ggplot2:
Any ideas? Or is this a feature request?
Code used to generate the flextable:
library(tibble)
library(flextable)
library(magrittr)  # provides %>%
tib <- tibble(v1 = letters[1:4],
              v2 = c(1, 3, 5, 2))
tib %>%
  flextable() %>%
  width(j = 1:2, width = c(0.5, 1.5)) %>%
  mk_par(j = 2,
         value = as_paragraph(as_chunk(v2, formater = function(x) sprintf("%.0f", x)),
                              " ",
                              minibar(value = v2, max = sum(v2))),
         part = "body")
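As far as I can tell, minibar() only draws the bar and has no option for printing the value on top of it. A possible workaround is to draw each bar as a tiny ggplot2 plot with its label and place it in the cell with flextable's gg_chunk(). This is a sketch, not a minibar feature: the make_bar() helper, the colours and the sizes are my own assumptions.

```r
library(tibble)
library(flextable)
library(ggplot2)

tib <- tibble(v1 = letters[1:4],
              v2 = c(1, 3, 5, 2))
bar_max <- sum(tib$v2)  # same scale as minibar(max = sum(v2)) in the question

# Hypothetical helper: one tiny ggplot per value, a bar from 0 to v
# with the value printed over the left end of the bar.
make_bar <- function(v, vmax) {
  ggplot() +
    annotate("rect", xmin = 0, xmax = v, ymin = 0, ymax = 1, fill = "#9ECAE1") +
    annotate("text", x = 0, y = 0.5, label = sprintf("%.0f", v),
             hjust = -0.5, size = 3) +
    scale_x_continuous(limits = c(0, vmax), expand = c(0, 0)) +
    scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
    theme_void()
}

# List column of plots: used as a data source, not displayed as a column
tib$gg <- lapply(tib$v2, make_bar, vmax = bar_max)

ft <- flextable(tib, col_keys = c("v1", "v2"))
ft <- width(ft, j = 1:2, width = c(0.5, 1.5))
ft <- mk_par(ft, j = "v2", part = "body",
             value = as_paragraph(gg_chunk(value = gg, width = 1.5, height = 0.25)))
# Print `ft` in an R Markdown chunk or in the viewer to see the result.
```

Because the label is part of the image, it scales with the bar; the trade-off is that the cell content is a raster image rather than text.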

Related

Two colorbars in a single R plotly scatter plot

I think I'm basically looking for an R plotly equivalent of this Python plotly post:
I have an XY data.frame that I'd like to plot using R's plotly, where each point belongs to one of two types ("a"/"b"), and nested within each type is a group; group labels are repeated across points.
My goal is to color-code the points by group frequency, using a different color scale for each type.
Here's the data.frame:
library(dplyr)
set.seed(1)
df <- rbind(
  data.frame(type = "a", group = paste0("a", sample(1000, 500, replace = TRUE))) %>%
    cbind(as.data.frame(MASS::mvrnorm(n = 1000, mu = c(-5, -5), Sigma = matrix(c(5, 3, 4, 4), ncol = 2)))),
  data.frame(type = "b", group = paste0("b", sample(500, 50, replace = TRUE))) %>%
    cbind(as.data.frame(MASS::mvrnorm(n = 500, mu = c(5, 5), Sigma = matrix(c(5, 3, 4, 4), ncol = 2))))
) %>%
  dplyr::rename(x = V1, y = V2)
Here I compute the frequency of each group within each type, and then add two artificial points per type with the global minimum and maximum frequency (f), so that both color scales use a common numeric range:
freq.df <- rbind(
  dplyr::group_by(dplyr::filter(df, type == "a"), type, group) %>%
    dplyr::tally() %>%
    dplyr::mutate(f = 100 * n / sum(n)) %>%
    dplyr::select(-n),
  dplyr::group_by(dplyr::filter(df, type == "b"), type, group) %>%
    dplyr::tally() %>%
    dplyr::mutate(f = 100 * n / sum(n)) %>%
    dplyr::select(-n)
) %>%
  dplyr::ungroup() %>%
  rbind(data.frame(type = c(rep("a", 2), rep("b", 2)),
                   group = c(rep("a", 2), rep("b", 2)),
                   f = rep(c(min(.$f), max(.$f)), 2),
                   stringsAsFactors = FALSE))
Then I join freq.df to df:
df <- df %>% dplyr::left_join(freq.df, by = c("type", "group"))
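The within-type frequencies can also be computed in a single pipeline with dplyr's count(). This is a sketch on a small made-up data frame (df_toy) that only mimics the structure of df; the artificial min/max rows are left out.

```r
library(dplyr)

# Made-up stand-in for df: two types, each with repeated group labels
df_toy <- data.frame(
  type  = rep(c("a", "b"), times = c(6, 4)),
  group = c("a1", "a1", "a2", "a3", "a3", "a3", "b1", "b1", "b2", "b2")
)

# Percentage of each group within its own type
freq_toy <- df_toy %>%
  count(type, group) %>%            # one row per type/group with its count n
  group_by(type) %>%
  mutate(f = 100 * n / sum(n)) %>%  # n / total points of that type
  ungroup() %>%
  select(-n)

df_toy <- left_join(df_toy, freq_toy, by = c("type", "group"))
```

Grouping by type before mutate() replaces the two filter()/rbind() branches, so the same code works for any number of types.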
Here's how I'm trying to plot it:
plotly::plot_ly(marker = list(size = 3), type = "scatter", mode = "markers", colors = viridis::viridis_pal(option = "D")(3),
                color = dplyr::filter(df, type == "a")$f, x = dplyr::filter(df, type == "a")$x, y = dplyr::filter(df, type == "a")$y) %>%
  plotly::add_trace(marker = list(size = 3), type = "scatter", mode = "markers", colors = viridis::viridis_pal(option = "A")(3),
                    color = dplyr::filter(df, type == "b")$f, x = dplyr::filter(df, type == "b")$x, y = dplyr::filter(df, type == "b")$y) %>%
  plotly::layout(xaxis = list(zeroline = FALSE, showticklabels = FALSE, showgrid = FALSE),
                 yaxis = list(zeroline = FALSE, showticklabels = FALSE, showgrid = FALSE))
This only gives me the colorbar of the first color scale (viridis, option "D"):
Any idea how to get both colorbars (viridis, option "D", and magma, option "A") to appear side by side?
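One approach worth trying, sketched here under assumptions: instead of the color/colors arguments, set the colorscale directly on each trace's marker, turn on marker.showscale for both traces, and move the second colorbar to the right with marker.colorbar.x. Setting marker.cmin/cmax to the same range should give both scales a common numeric range without the artificial points. The simulated pts_a/pts_b data, the x positions and the magma_scale construction are my own choices, not taken from the question.

```r
library(plotly)  # also provides %>%

# Simulated stand-in for the two point clouds (not the question's data)
set.seed(1)
pts_a <- data.frame(x = rnorm(300, -5), y = rnorm(300, -5), f = runif(300, 0, 2))
pts_b <- data.frame(x = rnorm(150,  5), y = rnorm(150,  5), f = runif(150, 0, 2))

# Shared colour range for both traces
f_range <- range(c(pts_a$f, pts_b$f))

# Custom magma colourscale: unnamed data frame of (position, colour) rows;
# substr() drops the alpha suffix that viridisLite adds to each hex code
magma_scale <- setNames(
  data.frame(c(0, 0.5, 1), substr(viridisLite::magma(3), 1, 7)),
  NULL
)

p <- plot_ly() %>%
  add_markers(x = pts_a$x, y = pts_a$y, name = "a",
              marker = list(size = 3, color = pts_a$f, colorscale = "Viridis",
                            cmin = f_range[1], cmax = f_range[2], showscale = TRUE,
                            colorbar = list(title = list(text = "a"), x = 1.02))) %>%
  add_markers(x = pts_b$x, y = pts_b$y, name = "b",
              marker = list(size = 3, color = pts_b$f, colorscale = magma_scale,
                            cmin = f_range[1], cmax = f_range[2], showscale = TRUE,
                            colorbar = list(title = list(text = "b"), x = 1.15))) %>%
  layout(showlegend = FALSE,
         xaxis = list(zeroline = FALSE, showticklabels = FALSE, showgrid = FALSE),
         yaxis = list(zeroline = FALSE, showticklabels = FALSE, showgrid = FALSE))
```

With the question's data, pts_a/pts_b would be dplyr::filter(df, type == "a") and dplyr::filter(df, type == "b").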

Using summary_rows() values to calculate group percentage with {gt} package?

I am trying to calculate the percentage for an entire group while using the summary_rows() function from the {gt} package. The problem I am encountering is how to create a function that uses summary_rows() values to calculate a percentage rowwise.
Sample df:
library(dplyr)
library(gt)
df <- tibble(
division = c("Science", "Science", "Science"),
department = c("Biology", "Biology", "Biology"),
course_num = c('101', '201', "301"),
widthraws = c(1, 2, 3),
unsucessful = c(0, 0 , 0),
successfull = c(1, 3, 4),
total_enrolled = c(2, 5, 7),
percent_successful = c(.50, .60, .57)
)
Sample of gt table:
df %>%
group_by(division, department) %>%
gt() %>%
summary_rows(
groups = TRUE,
columns = 4:7,
missing_text = " ",
fns = list(
total = ~sum(.))
)
What I would want is the total row of the percent_successful column to be .57. Open to other ideas that would help me achieve this too.
Compute the total percentage outside gt and add it as a second summary_rows() layer:
library(gt)
library(dplyr)
total_success_perc <- with(df, round(sum(successfull)/sum(total_enrolled), 2))
df %>%
group_by(division, department) %>%
gt() %>%
summary_rows(
groups = TRUE,
columns = 4:7,
missing_text = " ",
fns = list(
total = ~sum(.))
) %>%
summary_rows(groups = TRUE, columns = 8, missing_text = " ",
fns = list(total = ~ c(total_success_perc)))
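The key point is that the total percentage is a ratio of sums, sum(successfull) / sum(total_enrolled), not a sum or mean of the row percentages. Note that with() over the whole df gives one value for the entire table; if there are several division/department groups, a per-group version can be computed first. This is a sketch with dplyr; group_pct is a hypothetical name, and feeding per-group values into gt is not shown.

```r
library(dplyr)  # also re-exports tibble()

df <- tibble(
  division = c("Science", "Science", "Science"),
  department = c("Biology", "Biology", "Biology"),
  successfull = c(1, 3, 4),
  total_enrolled = c(2, 5, 7),
  percent_successful = c(.50, .60, .57)
)

# Ratio of sums per group: the value wanted in the total row
group_pct <- df %>%
  group_by(division, department) %>%
  summarise(percent_successful = round(sum(successfull) / sum(total_enrolled), 2),
            .groups = "drop")

group_pct$percent_successful  # 0.57 (8 / 14)

# Aggregating the row percentages gives something else
sum(df$percent_successful)    # 1.67
mean(df$percent_successful)   # about 0.557
```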
library(tidyverse)
library(gt)
df <- tibble(
division = c("Science", "Science", "Science"),
department = c("Biology", "Biology", "Biology"),
course_num = c('101', '201', "301"),
widthraws = c(1, 2, 3),
unsucessful = c(0, 0 , 0),
successfull = c(1, 3, 4),
total_enrolled = c(2, 5, 7),
percent_successful = c(.50, .60, .57)
)
df %>%
group_by(division, department) %>%
gt() %>%
summary_rows(
groups = TRUE,
columns = c(widthraws:percent_successful),
missing_text = " ",
fns = list(
total = ~sum(.),
max = ~max(.),
min = ~min(.),
median = ~median(.))
)

How can I bold the cell with the minimum value in each column in R Markdown?

I want to bold the cells holding the minimum values in a table built from two data frames that I combined into one. The minimum value in each column should be bold.
The R code below is my minimal working example (MWE). The table has six columns, each holding five values drawn from a normal distribution: the first three use sd = 1 and the last three use sd = 2. Columns differ by their seed label and their standard deviation.
---
title: "Min Value to be Bolded"
output: pdf_document
---
```{r}
set.seed(1)
df1 <- data.frame(
seed1 = rnorm(5, mean = 0, sd = 1),
seed2 = rnorm(5, mean = 0, sd = 1),
seed3 = rnorm(5, mean = 0, sd = 1)
)
set.seed(1)
df2 <- data.frame(
seed1 = rnorm(5, mean = 0, sd = 2),
seed2 = rnorm(5, mean = 0, sd = 2),
seed3 = rnorm(5, mean = 0, sd = 2)
)
df <- cbind(df1, df2)
df |>
knitr::kable(format = "html", table.attr = "style='width:100%;'", digits = 2, align = 'c', caption = "5 Random Numbers from the Normal Dist at diff set and std dev") |>
kableExtra::kable_styling(bootstrap_options = 'bordered') |>
kableExtra::add_header_above(c('$sd = 1$' = 3, '$sd$ = 2' = 3)) |>
kableExtra::row_spec(3, bold = TRUE, hline_after = TRUE)
```
In the output, the whole third row is bold instead of the minimum values. I can get the minimum values in R like this:
apply(df1, 2, FUN = min)
but I need help getting those cells bolded with kable or kableExtra.
Please note that I don't need this for the whole data frame, only for df1 or df2.
You may add library(kableExtra) (cell_spec() comes from kableExtra) and then
df |>
lapply(\(x){ifelse(x == min(x), cell_spec(round(x,2), bold = TRUE), round(x,2))}) |>
knitr::kable(format = "html", table.attr = "style='width:100%;'", digits = 2, align = 'c',
caption = "5 Random Numbers from the Normal Dist at diff set and std dev",
escape = FALSE) |>
kableExtra::kable_styling(bootstrap_options = 'bordered') |>
kableExtra::add_header_above(c('$sd = 1$' = 3, '$sd$ = 2' = 3))
Or, if you only want this for df1, start the pipeline with
df1 |> lapply(\(x){ifelse(x == min(x), cell_spec(round(x, 2), bold = TRUE), round(x, 2))}) |> cbind(df2) |>
and for df2 with
df1 |> cbind(df2 |> lapply(\(x){ifelse(x == min(x), cell_spec(round(x, 2), bold = TRUE), round(x, 2))})) |>
A remark: with my solution, the column names all become x. You could add as.data.frame() after lapply() to recover names, but here they would still change, because df has duplicated names, which is not good practice. Instead, you can set the desired names with col.names, for example:
df |>
lapply(\(x){ifelse(x == min(x), cell_spec(round(x,2), bold = TRUE), round(x,2))}) |>
as.data.frame() |>
knitr::kable(format = "html", table.attr = "style='width:100%;'", digits = 2, align = 'c',
caption = "5 Random Numbers from the Normal Dist at diff set and std dev",
escape = FALSE, col.names = rep(c("seed1", "seed2", "seed3"), 2)) |>
kableExtra::kable_styling(bootstrap_options = 'bordered') |>
kableExtra::add_header_above(c('$sd = 1$' = 3, '$sd$ = 2' = 3))
Note also that, as @phiver commented, I had to change output: pdf_document to output: html_document, since otherwise the code didn't work. To get a PDF, the last lines of the code should be as follows (then it works):
knitr::kable(format = "latex", digits = 2, align = 'c',
caption = "5 Random Numbers from the Normal Dist at diff set and std dev",
escape = FALSE, col.names = rep(c("seed1", "seed2", "seed3"), 2)) |>
kableExtra::kable_styling(bootstrap_options = 'bordered', full_width = TRUE) |>
kableExtra::add_header_above(c('$sd = 1$' = 3, '$sd$ = 2' = 3))
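A variant that sidesteps the column-name problem: assign the lapply() result into df1[] instead of piping it, which keeps the data frame, its row count and its names (duplicated names included). This sketch wraps the minimum in plain HTML <b> tags as a stand-in for kableExtra::cell_spec(), so it assumes HTML output; bold_min() and df1_marked are hypothetical names.

```r
set.seed(1)
df1 <- data.frame(seed1 = rnorm(5), seed2 = rnorm(5), seed3 = rnorm(5))

# Format every cell with 2 decimals and wrap the column minimum in <b> tags
bold_min <- function(x) {
  out <- formatC(x, format = "f", digits = 2)
  is_min <- x == min(x)
  out[is_min] <- paste0("<b>", out[is_min], "</b>")
  out
}

# Assigning into df1_marked[] keeps the data.frame class and the column names
df1_marked <- df1
df1_marked[] <- lapply(df1, bold_min)

# Then render with escape = FALSE so the tags reach the HTML, e.g.:
# knitr::kable(df1_marked, format = "html", escape = FALSE, align = "c")
```

Because the cells are already formatted strings, digits = 2 in kable() no longer has any effect on them; the rounding happens in formatC().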

How to color the required cells in a table using the "huxtable" library in R? What is a more elegant way to do it?

I need to mark some special cells in the table with a gray color. Something like this:
```{r}
library(huxtable)
library(magrittr)
sample_df <- data.frame(matrix(1:25, nrow = 5))
sample_df %>% as_huxtable() %>% set_all_borders(1) %>%
set_background_color(row = 2, col = 1, value = "grey") %>%
set_background_color(row = 3, col = 2, value = "grey") %>%
set_background_color(row = 4, col = 3, value = "grey") %>%
set_background_color(row = 5, col = 4, value = "grey") %>%
set_background_color(row = 6, col = 5, value = "grey")
```
After knitting to an HTML document, it gives me the following (screenshot):
That's what I need to get. But my question is: what is a more elegant way to do this than writing out one line per cell? I tried it like this:
my_fan <- function(.data) {
  for (i in c(2:6)) {
    set_background_color(.data, row = i, col = i - 1, value = "grey")
  }
  .data
}
sample_df %>%
as_huxtable() %>% set_all_borders %>%
my_fan()
But it doesn't give me any result at all. Any ideas?
You can use the old-school interface and a little-known fact about R subsetting:
library(huxtable)
sample_df <- data.frame(matrix(1:25, nrow = 5))
sample_df <- as_huxtable(sample_df)
background_color(sample_df)[matrix(c(2:6, 1:5), ncol = 2)] <- "grey"
sample_df
From ?Extract:
When indexing arrays by [ a single argument i can be a matrix with as
many columns as there are dimensions of x; the result is then a vector
with elements corresponding to the sets of indices in each row of i.
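The same indexing rule can be seen on a plain character matrix standing in for background_color(). In this sketch, m and idx are hypothetical names; each row of idx is one (row, column) pair.

```r
# 6 x 5 matrix: a header row plus the 5 data rows, like the huxtable above
m <- matrix(NA_character_, nrow = 6, ncol = 5)

# Each row of idx is one (row, col) pair: (2,1), (3,2), (4,3), (5,4), (6,5)
idx <- cbind(2:6, 1:5)
m[idx] <- "grey"

which(m == "grey", arr.ind = TRUE)  # lists exactly those five positions
```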
Or if you want to be really cool:
diag(background_color(sample_df[-1,])) <- "grey"
I'm amazed this worked :-)
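As for why the original loop had no effect: set_background_color() returns a modified copy of the table, and the loop discarded each copy and then returned .data unchanged. A sketch of the loop with the result reassigned on each iteration (the name ht is my own):

```r
library(huxtable)

sample_df <- data.frame(matrix(1:25, nrow = 5))

my_fan <- function(ht) {
  for (i in 2:6) {
    # keep the returned copy; otherwise the change is lost
    ht <- set_background_color(ht, row = i, col = i - 1, value = "grey")
  }
  ht
}

ht <- my_fan(as_huxtable(sample_df))
```

The same pattern (reassign the result) applies to any huxtable set_*() function used inside a loop.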

Use multiple functions in mutate efficiently in R

I use mutate to add new variables to my data frame. The code below is an example; in my real data I need to create lag1 to lag100 and lag01 to lag0100, and I don't want to type these one by one.
Is there an efficient way to do this?
library(dplyr)
library(zoo)
res <- mtcars %>% mutate_all(.funs = list(lag1 = ~lag(., 1),
lag2 = ~lag(., 2),
lag3 = ~lag(., 3),
lag4 = ~lag(., 4),
lag5 = ~lag(., 5),
lag6 = ~lag(., 6),
lag01 = ~rollmean(., 2, fill = NA, align = 'right'),
lag02 = ~rollmean(., 3, fill = NA, align = 'right'),
lag03 = ~rollmean(., 4, fill = NA, align = 'right'),
lag04 = ~rollmean(., 5, fill = NA, align = 'right'),
lag05 = ~rollmean(., 6, fill = NA, align = 'right'),
lag06 = ~rollmean(., 7, fill = NA, align = 'right')
))
Any help will be highly appreciated!
You can use across() from dplyr 1.0.0 to apply multiple functions to all columns. rollmean() with align = 'right' is the same as rollmeanr(). You can assign relevant names using the .names specification.
library(dplyr)
library(purrr)
library(zoo)
x <- 5
map_dfc(1:x, function(i) mtcars %>%
  transmute(across(everything(),
                   list(lag = ~lag(., i),
                        mean = ~rollmeanr(., i + 1, fill = NA)),
                   .names = paste0('{col}_{fn}_', i))))
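A quick check of the rollmean()/rollmeanr() equivalence on a made-up vector x:

```r
library(zoo)

x <- c(2, 4, 6, 8, 10)

# align = "right": each value is the mean of the current and the previous k - 1 values
r1 <- rollmean(x, 3, fill = NA, align = "right")
r2 <- rollmeanr(x, 3, fill = NA)

r1                 # NA NA 4 6 8
identical(r1, r2)  # TRUE
```

This is why the window is i + 1 above: lag01 in the question is a mean over 2 values, lag02 over 3, and so on.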
In the example that follows I have subset mtcars keeping only 20 rows and 5 columns.
The code uses bind_cols/Map to apply the functions lag and rollmeanr (rollmean, align = "right").
library(dplyr)
library(zoo)
bind_cols(
mtcars[1:20, 1:5] %>%
mutate_all(.funs = Map(function(n) ~lag(., n), 1:2)),
mtcars[1:20, 1:5] %>%
mutate_all(.funs = Map(function(n) ~rollmeanr(., n + 1, fill = NA), 1:2)) %>% # windows 2 and 3, like lag01/lag02
select(-(1:5))
)
Then it's a matter of assigning new column names.
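For comparison, a sketch of the same idea in base R only, with the column names built in the question's lagN / lag0N pattern (lag0N being the right-aligned mean over N + 1 values). The helpers lag_n(), trailing_mean() and add_lags_and_means() are hypothetical names; the trailing mean uses stats::filter() with sides = 1, which should match rollmeanr(..., fill = NA).

```r
# Shift x down by n positions, padding the start with NA
lag_n <- function(x, n) {
  n <- min(n, length(x))
  c(rep(NA, n), head(x, length(x) - n))
}

# Mean of the current and previous k - 1 values; NA until k values exist
trailing_mean <- function(x, k) {
  if (k > length(x)) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

add_lags_and_means <- function(data, max_n) {
  out <- data
  for (col in names(data)) {
    for (n in seq_len(max_n)) {
      out[[paste0(col, "_lag", n)]]  <- lag_n(data[[col]], n)
      out[[paste0(col, "_lag0", n)]] <- trailing_mean(data[[col]], n + 1)
    }
  }
  out
}

res <- add_lags_and_means(mtcars[, 1:2], max_n = 6)
names(res)[1:6]  # "mpg" "cyl" "mpg_lag1" "mpg_lag01" "mpg_lag2" "mpg_lag02"
```

Setting max_n = 100 on the full mtcars gives lag1 to lag100 and lag01 to lag0100; with only 32 rows, most of those columns are all NA.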
