R Highcharter - highlight same group in multiple stacked columns chart + order groups in columns

R newbie here :)
I have recently started using the R library Highcharter as an alternative to ggplot2.
This is the sample code I am currently working on:
library(highcharter)
library(dplyr)
## Sample dataframe
YEAR <- c(2019,2020,2021)
CATEGORY <- c("dog", "cat", "mouse")
SAMPLE_DATA <- expand.grid(YEAR, CATEGORY)
names(SAMPLE_DATA)[1] <- "CATEGORY"
names(SAMPLE_DATA)[2] <- "YEAR"
SAMPLE_DATA$VALUE <- runif(n = 9, min = 400, max = 900)
## Chart
SAMPLE_DATA <- SAMPLE_DATA %>%
group_by(YEAR, CATEGORY) %>%
summarise(VALUE = sum(VALUE, na.rm = T))
highchart() %>%
hc_add_series(data = SAMPLE_DATA, hcaes(x = YEAR, y = round(VALUE,0), group = CATEGORY), type = "column") %>%
hc_plotOptions(column = list(stacking = "normal"))
What I am trying to do is:
1. Sort how the "CATEGORY" groups are stacked within each column, by ascending/descending "VALUE"
2. Highlight the same group in all columns when hovering over it
Does anyone have an idea? Thank you!

This is a late answer but I believe this is what you want.
Adding the data again because I think you swapped some column names by accident:
YEAR <- c(2019,2020,2021)
CATEGORY <- c("dog", "cat", "mouse")
SAMPLE_DATA <- expand.grid(YEAR, CATEGORY)
names(SAMPLE_DATA)[1] <- "YEAR"
names(SAMPLE_DATA)[2] <- "CATEGORY"
SAMPLE_DATA$VALUE <- runif(n = 9, min = 400, max = 900)
## Chart
SAMPLE_DATA <- SAMPLE_DATA %>%
group_by(YEAR, CATEGORY) %>%
summarise(VALUE = sum(VALUE, na.rm = T))
Creating plot:
library(forcats)  # provides fct_reorder()

SAMPLE_DATA %>%
  ungroup() %>%
  mutate(YEAR = factor(YEAR) %>% fct_reorder(VALUE, .desc = TRUE)) %>%
  mutate(year_index = as.numeric(YEAR)) %>%
  hchart(
    type = "column",
    hcaes(x = year_index,
          y = VALUE,
          group = CATEGORY,
          name = YEAR)
  ) %>%
  hc_xAxis(type = "category", labels = list(step = 1)) %>%
  hc_plotOptions(series = list(stacking = "normal"))

Related

Change row group labels in gt table (with superscript/subscript and line breaks). Customising row group labels in R

I have the following data and table:
library(gt)
library(dplyr)
a <- rnorm(21, mean = 112, sd =12)
colour <- rep(c("Blue", "Red", "Green"), 7)
data <- data.frame(colour, a) %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = html("[H<sub>2</sub>O]")) %>%
rename(cat = colour)
b <- rnorm(21, mean = 60, sd =12)
day <- rep(c("2", "4", "6"), 7)
data2 <- data.frame(day, b) %>%
group_by(day) %>%
summarise(mean = mean(b), sd = sd(b), n = n()) %>%
mutate(grp = html("[H<sub>2</sub>O] Addition <br> (Days)")) %>%
rename(cat = day)
bind_rows(data, data2) %>%
group_by(grp) %>%
gt(rowname_col = "cat")
bind_rows(data, data2) %>%
group_by(grp) %>%
gt() %>%
tab_options(row_group.as_column = TRUE)
The row group labels appear literally as '[H<sub>2</sub>O]' rather than as [H2O] with a subscript 2. I am probably using html() incorrectly, or it needs to be combined with another package/function. I have also tried cols_label, but it doesn't recognise these labels as columns in the data frame.
Is there also a way to vertically center the row group column, rather than leaving it at the top where it currently is? And how do you make these row groups bold?
The html function won't work outside of a gt table, so you'll have to create the row groups using tab_row_group and add the html labels there.
data <- data.frame(colour, a) %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = "color") %>%
rename(cat = colour)
data2 <- data.frame(day, b) %>%
group_by(day) %>%
summarise(mean = mean(b), sd = sd(b), n = n()) %>%
mutate(grp = "day") %>%
rename(cat = day)
bind_rows(data, data2) %>%
gt() %>%
tab_row_group(
label = html("[H<sub>2</sub>O]"),
rows = grp == "color"
) %>%
tab_row_group(
label = html("[H<sub>2</sub>O] Addition <br> (Days)"),
rows = grp == "day"
) %>%
cols_hide(grp)

Skip "zero" level of dichotomous variables in expss tables

I want to create a summary table for some dichotomous variables using the expss package. Since the variables are dichotomous, one of the two levels would be sufficient to "show the picture".
I tried to use the function tab_net_cells, but was not able to get the right result. Here is some example code with BrCa (breast cancer) coded as 1 or 0. I only want to show the number of patients with breast cancer, not those without.
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_net_cells("BrCa" = eq(1)) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
The simplest way is to filter the resulting table:
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows") %>%
expss::where(grepl(1, row_labels))
Another way is to use mean and sum instead of cpct and cases:
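df %>%
  expss::tab_cols(group) %>%
  expss::tab_cells(BrCa*100) %>%   # mean of 0/1 * 100 gives the percentage
  expss::tab_stat_mean(label = "%") %>%
  expss::tab_cells(BrCa) %>%       # sum of the unscaled 0/1 variable gives the count
  expss::tab_stat_sum(label = "N") %>%
  expss::tab_pivot(stat_position = "inside_rows")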
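The reasoning behind the mean/sum approach can be checked without expss. This is a minimal base R sketch using the data from the question: for a 0/1 variable, the mean of the non-missing values times 100 is the percentage of ones, and the sum of the unscaled variable is the number of ones. The names `pct` and `n` are only illustrative, and the sketch assumes percentages are meant to be based on non-missing cases.

```r
# Same data as in the question
df <- data.frame(BrCa  = c(1, 1, 1, 0, 0, 0, NA, NA, 0, 0),
                 group = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))

# Percentage of patients with BrCa == 1 per group (NA excluded from the base)
pct <- tapply(df$BrCa * 100, df$group, mean, na.rm = TRUE)

# Number of patients with BrCa == 1 per group; note the sum is taken
# over the unscaled variable, not over BrCa * 100
n <- tapply(df$BrCa, df$group, sum, na.rm = TRUE)

pct  # group 1: 50, group 2: 25
n    # group 1: 2,  group 2: 1
```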

how to make auto-separated years in a calendar with echarts4r

I'm trying to make a calendar with the echarts4r package.
library(tidyverse)
library(echarts4r)
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord.system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")
But this doesn't plot the year 2018.
How can I get a calendar that is automatically separated by year?
Is there a solution similar to fill in ggplot?
Expected output : this
The API is admittedly clunky and unintuitive, but it is doable. You need to add the two calendars as you already do, and reference their index in your e_heatmap call (so that each heatmap is plotted against the correct calendar). I also use e_data to pass the values (x) for the second calendar. Make sure to adjust the position of the calendars so that they do not overlap (e.g. top = 300).
dates18 <- seq.Date(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
dates17 <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
values <- rnorm(length(dates18), 20, 6)
df <- data.frame(date18 = dates18, date17 = dates17, values = values)
df %>%
e_charts(date18) %>%
e_calendar(range = "2018") %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 0, name = "2018") %>%
e_data(df, date17) %>%
e_calendar(range = "2017", top = 300) %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 1, name = "2017") %>%
e_visual_map(max = 30)
Update
Since version 0.2.0 the above can be done by grouping the data by year which is much clearer and easier:
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
dplyr::mutate(year = format(date, "%Y")) %>% # get year from date
dplyr::group_by(year) %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord_system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")

Pretty tables with cumulative count / percentage and group totals using R "tables" package

I am trying to produce a formatted HTML table with columns for frequency, cumulative frequency, column percentage, and cumulative column percentage. The table should also split the data by a grouping variable and include a group total.
I can almost achieve this using a combination of dplyr and tidyr, but the output is a data frame, which doesn't look so pretty. I wonder if there is an easier way using the tables::tabular function?
# Sample data
dat <- data.frame(
id = 1:100,
group = factor(sample(c("A", "B"), 100, replace = TRUE)),
sessions = factor(sample(1:10, 100, replace = TRUE))
)
# dplyr/tidyr solution
library(dplyr)
library(tidyr)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide")
# As far as I get using tables::tabular
library(tables)
tabular(
Factor(sessions, "Sessions") ~
(Heading()*group + 1) *
(
(n = 1) +
# (cum_n = ??) +
Heading("%")*Percent(denom = "col")*Format(digits = 2)
# + Heading("cum_%")*??*Format(digits = 2)
),
data = dat
)
I would recommend knitr::kable together with kableExtra, both excellent packages for producing tables. The same code can also target multiple output formats, for example HTML and LaTeX (for PDF).
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide") %>%
kable("html") %>%
kable_styling(bootstrap_options = c("striped", "hover"))

Moving mean as a function in dplyr

I'd like to create a function that can calculate the moving mean for a variable number of last observations and different variables. Take this as mock data:
df = expand.grid(site = factor(seq(10)),
year = 2000:2004,
day = 1:50)
df$temp = rpois(dim(df)[1], 5)
Calculating for 1 variable and a fixed number of last observations works. E.g. this calculates the average of the temperature of the last 5 days:
library(dplyr)
library(zoo)
df <- df %>%
group_by(site, year) %>%
arrange(site, year, day) %>%
mutate(almost_avg = rollmean(x = temp, 5, align = "right", fill = NA)) %>%
mutate(avg = lag(almost_avg, 1))
So far so good. But turning this into a function fails.
avg_last_x <- function(dataframe, column, last_x) {
dataframe <- dataframe %>%
group_by(site, year) %>%
arrange(site, year, day) %>%
mutate(almost_avg = rollmean(x = column, k = last_x, align = "right", fill = NA)) %>%
mutate(avg = lag(almost_avg, 1))
return(dataframe) }
avg_last_x(dataframe = df, column = "temp", last_x = 10)
I get this error:
Error in mutate_impl(.data, dots) : k <= n is not TRUE
I understand this is probably related to the evaluation mechanism in dplyr, but I can't get it fixed.
Thanks in advance for your help.
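This should fix it. The column name is passed as a string, so it has to be converted to a symbol and interpolated into the expression (here with lazyeval):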
library(lazyeval)
avg_last_x <- function(dataframe, column, last_x) {
  dataframe %>%
    group_by(site, year) %>%
    arrange(site, year, day) %>%
    mutate_(almost_avg = interp(~rollmean(x = c, k = last_x, align = "right",
                                          fill = NA), c = as.name(column)),
            avg = ~lag(almost_avg, 1))
}
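Note that mutate_() and lazyeval are deprecated in current dplyr releases. Below is a minimal sketch of the same function using the .data pronoun instead, assuming dplyr 0.7 or later and zoo are installed. The error in the question most likely arises because the string "temp" is itself a vector of length one, so k = 10 exceeds its length. The small `test_df` is made up for illustration only.

```r
library(dplyr)
library(zoo)

avg_last_x <- function(dataframe, column, last_x) {
  dataframe %>%
    group_by(site, year) %>%
    arrange(site, year, day) %>%
    mutate(
      # .data[[column]] looks up the column whose name is stored in `column`
      almost_avg = zoo::rollmean(.data[[column]], k = last_x,
                                 align = "right", fill = NA),
      # shift by one row so the current day is not part of its own average
      avg = dplyr::lag(almost_avg, 1)
    ) %>%
    ungroup()
}

# Hypothetical toy data: one site, one year, six days
test_df <- data.frame(site = factor(1), year = 2000, day = 1:6, temp = 1:6)
res <- avg_last_x(test_df, column = "temp", last_x = 3)
res
```

With the mock data from the question, the call stays the same as before: avg_last_x(dataframe = df, column = "temp", last_x = 10).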
