Remove comma in thousands in echarts4r - r

Here's a plot made with echarts4r:
library(tibble)
library(echarts4r)
data_test <- tibble(
year = seq(1900, 1920, 1),
variable = seq(200, 400, 10)
)
data_test %>%
e_charts(year) %>%
e_x_axis(year) %>%
e_y_axis(variable) %>%
e_line(variable)
How can I format the values on the x-axis so that I have 1900 instead of 1,900 for example?
I checked here but couldn't find a solution.

Probably not the most elegant solution, but this should work :
data_test %>%
e_charts(year) %>%
e_x_axis(type='category') %>%
e_y_axis(variable) %>%
e_line(variable)
Adding options for a cleaner output :
data_test %>%
e_charts(year) %>%
e_x_axis(type='category',axisLabel = list(interval = 4),axisTick = list(inside=TRUE,alignWithLabel=TRUE,interval=4)) %>%
e_y_axis(variable) %>%
e_line(variable)
Output :

I opened a GitHub issue for this question and this is the answer of the developer of echarts4r (I shortened it a little, see the link for the original answer).
There are two solutions.
1) Transform the variable year as a factor. This works only if there are no missing years.
library(tibble)
library(echarts4r)
data_test <- tibble(
year = seq(1900, 1920, 1),
variable = seq(200, 400, 10)
)
data_test %>%
dplyr::mutate(year = as.factor(year)) %>%
e_charts(year) %>%
e_line(variable)
2) Modify the JavaScript function. This is more robust, and it works when there are missing years.
library(echarts4r)
data_test <- tibble(
year = c(1900, 1901, 1905),
variable = 1:3
)
label <- list(
formatter = htmlwidgets::JS(
'function(value, index){
return value;
}'
)
)
data_test %>%
e_charts(year) %>%
e_y_axis(variable) %>%
e_line(variable) %>%
e_x_axis(serie = year, axisLabel = label)

Related

Presentation of calculated data within a table omitting the value of the respective column

Apologies for the awkward title; I hope it becomes clear soon.
I have data like this: People are attributed to specific Locations, and it has been recorded whether an event has been successful or the data is missing.
df <- data.frame(PersonID = c(1:20),
Location = c("B","A","D","C","A","D","C","D","A","D","B","A","D","C","A","D","C","D","A","D"),
Success = c("yes","no","yes",NA,"yes","no","no","yes",NA,"yes","no","yes",NA,"yes","no","no","yes",NA,"yes","no"))
I would like to know how each location "performs" relative to the other locations, i. e. how many valid attempts have been successful and how the location's rate of success compares to the other locations.
So in my example, location "A" has seen 5 valid attempts (1 "NA"), of which 3 were successful (60%). The other locations have a success rate of 50%, 66.7%, and 50%, an average of 55.6%. Location A is thus 4.4 percentage points higher than the other locations' average. I want to display all that information in a table just like this:
I don't have a specific preference regarding packages, but I like and know some gt and flextable. Alas, not enough to implement this …
Thank you all in advance!
This was not trivial in any way:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(janitor)
library(gt)
df1 <- df %>%
group_by(Location) %>%
mutate(attempts = sum(!is.na(Success)),
yeses = sum(Success == "yes", na.rm = TRUE),
Success_rate = (yeses/attempts)*100)
df2 <- df1 %>%
summarise(avgother = mean(Success_rate)) %>%
mutate(avgother = map_dbl(row_number(), ~mean(avgother[-.x])))
)
df %>%
group_by(Location) %>%
summarise(attempts = sum(!is.na(Success)),
yeses = sum(Success == "yes", na.rm = TRUE),
Success_rate = (yeses/attempts)*100) %>%
bind_cols(avgother= round(df2$avgother, 1)) %>%
mutate(comp.avg = Success_rate - avgother) %>%
mutate(`attempts` = paste0("(N=", attempts, ")"),
Success_rate = paste0(round(Success_rate, 1), "%")) %>%
select(-yeses) %>%
mutate(comp.avg = ifelse(comp.avg >0, paste0("(+",round(comp.avg, 1),")"), paste0("(",round(comp.avg,1),")"))) %>%
t() %>%
as.data.frame() %>%
rownames_to_column("Location") %>%
row_to_names(row_number = 1) %>%
gt()
Nice answer from #TarJae, similar approach below with flextable.
library(dplyr)
library(flextable)
temp <- df |>
group_by(Location) |>
summarise(Attempts = sum(!is.na(Success)),
Successful = sum(ifelse(Success=="yes", 1,0), na.rm = T)) |>
ungroup() |>
mutate(success_rate = round(Successful / Attempts, 3)*100) |>
mutate(excl.mean = (sum(success_rate) - success_rate)/(n()-1)) |>
mutate(comp.avg = round(success_rate - excl.mean,1)) |>
mutate(success_rate = paste0(formatC(signif(success_rate,digits=3),
digits=3,format="fg", flag="#"), "%"),
comp.avg = ifelse(comp.avg >=0,
paste0("(+", comp.avg, ")"),
paste0("(", comp.avg, ")"))
) |>
select(-c(Successful, excl.mean)) |>
t() |>
as.data.frame()
temp <- cbind(Col1 = c("Location", "# of attempts", "Success rate", "comp. avg."),
temp)
ft <- flextable(temp)
ft <- theme_vanilla(ft)
ft <- delete_part(ft, part = "header")
ft <- hline_top(ft)
ft <- border_inner_v(ft)
ft <- border_outer(ft)
ft

Group data by year and filter by month in R

I have a list of data frames with daily streamflow data.
I want to estimate the maximum daily flow from June to November every year for each data frame in the list that corresponds each of them to data in a station.
This is how the list of data frames looks:
and this is the code I am using:
#Peak mean daily flow summer and fall (June to November)
PeakflowSummerFall <- lapply(listDF,function(x){x %>% group_by(x %>% mutate(year = year(Date)))
%>% filter((x %>% mutate(month = month(Date)) >= 6) & (x %>% mutate(month = month(Date)) <= 11))
%>% summarise(max=max(DailyStreamflow, na.rm =TRUE))})
but I am having this error:
<error/dplyr_error>
Problem with `filter()` input `..1`.
x Input `..1` must be of size 1, not size 24601.
i Input `..1` is `&...`.
i The error occurred in group 1: Date = 1953-06-01, DailyStreamflow = 32, year = 1953.
Backtrace:
Run `rlang::last_trace()` to see the full context
Any solution to this problem?
#### This should give provide you with enough
#### sample data for answerers to work with
install.packages('purrr')
library(purrr)
sample_dat <- listDF %>%
head %>%
map( ~ head(.x))
dput(sample_dat)
#### With that being said...
#### You should flatten the data frame...
#### It's easier to work with...
install.packages('lubridate')
library(lubridate)
listDF %>%
plyr::ldply(rbind) %>%
mutate(month = floor_date(Date, unit = 'month')) %>%
filter(month(Date) > 5, month(Date) < 12) %>%
group_by(.id, month) %>%
dplyr::summarise(max_flow = max(DailyStreamflow)) %>%
split(.$.id)
Given the posted image of the data structure, the following might work.
library(lubridate)
library(dplyr)
listDF %>%
purrr::map(function(x){
x %>%
filter(month(Date) >= 6 & month(Date) <= 11) %>%
group_by(year(Date)) %>%
summarise(Max = max(DailyStreamflow, na.rm = TRUE), .groups = "keep")
})
Test data creation code.
fun <- function(year, n){
d1 <- as.Date(paste(year, 1, 1, sep = "-"))
d2 <- as.Date(paste(year + 10, 12, 31, sep = "-"))
d <- seq(d1, d2, by = "day")
d <- sort(rep(sample(d, n, TRUE), length.out = n))
flow <- sample(10*n, n, TRUE)
data.frame(Date = d, DailyStreamflow = flow)
}
set.seed(2020)
listDF <- lapply(1:3, function(i) fun(c(1953, 1965, 1980)[i], c(24601, 13270, 17761)[i]))
str(listDF)
rm(fun)

Skip "zero" level of dichotomous variables in expss tables

I want to create a summary table for some dichotomous variables using the expss package. Since the variables are dichotomous, one of the two levels would the sufficient to "show the picture".
I tried to use the function tab_net_cell, but was not able to get the right results. Here is some example code with BrCa (Breast cancer) with 1 or 0. I only want to show the number of patients with but not without breast cancer.
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_net_cells("BrCa" = eq(1)) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
The simplest way is to filter resulted table:
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows") %>%
expss::where(grepl(1, row_labels))
Another way is to use mean and sum instead of cpct and cases:
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa*100) %>%
expss::tab_stat_mean(label = "%") %>%
expss::tab_stat_sum(label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")

how to make auto-separated years in a calendar with echarts4r

I'm trying to make calendar with echarts4r package.
library(tidyverse)
library(echarts4r)
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord.system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")
But this one didn't plot 2018 year.
How to make auto-separated years in a calendar?
Is any solution like fill from ggplot?
Expected output : this
The API is admittedly clunky and unintuitive but it is doable. You need to add the two calendars as you do already, reference their index in your e_heatmap function (so that the heatmaps is plotted against the correct calendar). Also, I use e_data in order to pass the values (x) for the second calendar. Make sure to adjust to position of the calendars so that they do not overlap (i.e.: top = 300).
dates18 <- seq.Date(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
dates17 <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
values <- rnorm(length(dates18), 20, 6)
df <- data.frame(date18 = dates18, date17 = dates17, values = values)
df %>%
e_charts(date18) %>%
e_calendar(range = "2018") %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 0, name = "2018") %>%
e_data(df, date17) %>%
e_calendar(range = "2017", top = 300) %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 1, name = "2017") %>%
e_visual_map(max = 30)
Update
Since version 0.2.0 the above can be done by grouping the data by year which is much clearer and easier:
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
dplyr::mutate(year = format(date, "%Y")) %>% # get year from date
group_by(year) %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord_system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")

Pretty tables with cumulative count / percentage and group totals using R "tables" package

I am trying to produce a formatted html table which has columns for frequency, cumulative frequency, column percentage, and cumulative column percentage. The table should also have the data subsetted by a grouping variable, and including a group total.
I can almost achieve this using a combination of dplyr and tidyr, but the output is a dataframe which doesn't look so pretty. I wonder if there is an easier way using the tables::tabulate command?
# Sample data
dat <- data.frame(
id = 1:100,
group = factor(sample(c("A", "B"), 100, replace = TRUE)),
sessions = factor(sample(1:10, 100, replace = TRUE))
)
# dplyr/tidyr solution
library(dplyr)
library(tidyr)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide")
# As far as I get using tables::tabulate
library(tables)
tabular(
Factor(sessions, "Sessions") ~
(Heading()*group + 1) *
(
(n = 1) +
# (cum_n = ??) +
Heading("%")*Percent(denom = "col")*Format(digits = 2)
# + Heading("cum_%")*??*Format(digits = 2)
),
data = dat
)
I would recommend using knitr::kable and kableExtra, amazing packages for producing tables. You can also set it up for multiple format outputs, for example using the same code to produce html and latex for pdf.
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide") %>%
kable("html") %>%
kable_styling(bootstrap_options = c("striped", "hover"))

Resources