How to dynamically pass a column name to the e_bar (echarts4r) function? - r

I have a dataframe which contains counts for each continent by year. Below is the dataframe.
# A tibble: 4 x 4
  continent year_2020 year_2021 year_2022
  <chr>         <dbl>     <dbl>     <dbl>
1 Asia             35       177       350
2 Europe           45        47        84
3 Australia        26        46        58
4 Africa           15        20        25
And this is the R script I used to create the graph
stack %>%
  e_charts(continent) %>%
  e_bar(year_2020) %>%
  e_bar(year_2021) %>%
  e_bar(year_2022)
[Screenshot of the resulting bar graph omitted]
My question is how to pass these column names dynamically. The above dataframe is a sample dataset and the number of year columns keeps increasing; my idea is to show a maximum of 3 bars per continent.
What I tried was to take a start year and an end year, so the bar graph can be drawn from that input instead of hardcoding the column names in the e_bar function.
start_year <- "2020"
end_year <- "2022"
year_val <- paste0("year_", start_year:end_year)
year_val1 <- year_val[1]
year_val2 <- year_val[2]
year_val3 <- year_val[3]
stack %>%
  e_charts(continent) %>%
  e_bar(sym(year_val1)) %>%
  e_bar(sym(year_val2)) %>%
  e_bar(sym(year_val3))
But was getting the below error
Error in `chr_as_locations()`:
! Can't subset columns that don't exist.
x Column `sym(year_val1)` doesn't exist.
I need help on how to pass the year columns dynamically.
Thanks

One option would be to switch to the "underscored" version of e_bar, i.e. e_bar_, which allows passing the name of the series as a character string:
library(echarts4r)
stack |>
  e_charts(continent) |>
  e_bar_(year_val1) |>
  e_bar_(year_val2) |>
  e_bar_(year_val3)
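Since e_bar_ takes a plain character string, you also don't need the intermediate year_val1/year_val2/year_val3 variables when the number of year columns varies: you can loop over the vector of names and add one series per column. A minimal sketch, assuming year_val is built as in the question:
library(echarts4r)
start_year <- 2020
end_year <- 2022
year_val <- paste0("year_", start_year:end_year)  # "year_2020" "year_2021" "year_2022"
p <- stack |> e_charts(continent)
for (col in year_val) {
  p <- p |> e_bar_(col)  # add one bar series per year column
}
p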
DATA
stack <- structure(list(
  continent = c("Asia", "Europe", "Australia", "Africa"),
  year_2020 = c(35L, 45L, 26L, 15L),
  year_2021 = c(177L, 47L, 46L, 20L),
  year_2022 = c(350L, 84L, 58L, 25L)
), class = "data.frame", row.names = c("1", "2", "3", "4"))

Related

Calculate or Filter the wrong date entries of two Date columns in R

I am trying to figure out how to filter the wrong entries, or calculate the difference between two Date columns of the same data frame in R. The scenario: I have a Patient table with two columns, Patient_admit and Patient_discharge. How do I find the rows where the date entered for Patient_discharge is before Patient_admit? In the dataframe example below, the entries for patients 2 and 6 are incorrect.
Executing dput(head(patient)) gives:
structure(list(id = c(1003L, 1005L, 1006L, 1007L, 1010L, 1010L),
  date_admit = structure(c(115L, 18L, 138L, 91L, 34L, 278L),
    .Label = c("01/01/2020", "01/02/2020", "01/03/2020", ............),
  date_discharge = structure(c(143L, 130L, 181L, 156L, 198L, 86L),
    .Label = c("01/01/2020", "01/01/2021", "01/02/2020", .............,
    class = "factor")), row.names = c(NA, 6L), class = "data.frame")
The list of date is very long so I just put "..........." for ease of understanding. Thanks
Another possible solution, based on lubridate::dmy:
library(dplyr)
library(lubridate)
df %>%
  filter(dmy(Patient_admit) <= dmy(Patient_discharge))
#>   Patient_ID Patient_admit Patient_discharge
#> 1          1    20/10/2020        21/10/2020
#> 2          3    21/10/2021        22/10/2021
#> 3          4    25/11/2022        25/11/2022
#> 4          5    25/11/2022        26/11/2022
First convert your dates to the right format using strptime. Then calculate the difference in days using difftime and filter out the rows where the difference is negative. You can use the following code:
library(dplyr)
df %>%
  mutate(Patient_admit = strptime(Patient_admit, "%d/%m/%Y"),
         Patient_discharge = strptime(Patient_discharge, "%d/%m/%Y")) %>%
  mutate(diff_days = difftime(Patient_discharge, Patient_admit, units = "days")) %>%
  filter(diff_days >= 0) %>%
  select(-diff_days)
Output:
  Patient_ID Patient_admit Patient_discharge
1          1    2020-10-20        2020-10-21
2          3    2021-10-21        2021-10-22
3          4    2022-11-25        2022-11-25
4          5    2022-11-25        2022-11-26
Data
df <- data.frame(Patient_ID = c(1, 2, 3, 4, 5, 6),
                 Patient_admit = c("20/10/2020", "22/10/2021", "21/10/2021", "25/11/2022", "25/11/2022", "05/10/2020"),
                 Patient_discharge = c("21/10/2020", "20/10/2021", "22/10/2021", "25/11/2022", "26/11/2022", "20/09/2020"))

R: Compute monthly averages for daily values

I have the following data which is stored as a data.frame in R:
Daily value of product A, B and C from 2018-08-01 until 2019-12-31
Now I would like to compute the monthly average of the value for each product. Additionally, only data for the weekdays but not the weekends should be used to calculate the monthly average for each product. What would be the approach in R to get to the required data?
Here is a solution, using dplyr and tidyr:
# Note: data.frame() converts the "-" in these column names to "." (check.names = TRUE),
# which is why the dates are parsed with format = "%Y.%m.%d" below.
df <- data.frame(Product = c("A", "B", "C"),
                 "Value_2018-08-01" = c(120L, 100L, 90L),
                 "Value_2018-08-02" = c(80L, 140L, 20L),
                 "Value_2018-08-03" = c(50L, 70L, 200L),
                 "Value_2018-12-31" = c(50L, 24L, 24L),
                 "Value_2019-01-01" = c(44L, 60L, 29L),
                 "Value_2019-12-31" = c(99L, 49L, 49L))
df %>%
  tidyr::pivot_longer(c(starts_with("Value"))) %>%
  mutate(Date = name,
         Date = sub(".*_", "", Date),
         Date = as.Date(Date, format = "%Y.%m.%d"),
         weekday = weekdays(Date)) %>%
  filter(!weekday %in% c("Samstag", "Sonntag")) %>%
  group_by(Product, format(Date, "%m")) %>%
  summarize(mean(value)) %>%
  as.data.frame()
  Product format(Date, "%m") mean(value)
1       A                 01    44.00000
2       A                 08    83.33333
3       A                 12    74.50000
4       B                 01    60.00000
5       B                 08   103.33333
6       B                 12    36.50000
7       C                 01    29.00000
8       C                 08   103.33333
9       C                 12    36.50000
Note that Samstag and Sonntag should be changed to the names of the weekend days in the language of your system locale.
Also, I've calculated the monthly averages across years, as asked. If you want monthly averages per year instead, change group_by(Product, format(Date, "%m")) to group_by(Product, format(Date, "%m"), format(Date, "%Y")).
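A locale-independent sketch of that per-year variant, assuming the same df as above: format(Date, "%u") returns the ISO weekday number (Monday = 1), so Saturday and Sunday can be dropped as "6" and "7" whatever the system language:
library(dplyr)
df %>%
  tidyr::pivot_longer(starts_with("Value")) %>%
  mutate(Date = as.Date(sub(".*_", "", name), format = "%Y.%m.%d")) %>%
  filter(!format(Date, "%u") %in% c("6", "7")) %>%  # drop Sat/Sun in any locale
  group_by(Product, Year = format(Date, "%Y"), Month = format(Date, "%m")) %>%
  summarize(mean_value = mean(value), .groups = "drop")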

How to replace values in specific rows of some columns in R tibble with transformed values conditional on row values?

I have a tibble in R where I want to change values in some columns based on a condition on another column. In the tibble df below, I want to multiply all values in the columns agr, man and ser by 1000 where the value in the variable column equals va, and by 100 where it equals emp, replacing the values in the respective columns with these calculated values. There must be a simple solution to it but I am at a loss.
df
country variable year agr man ser
chn     va       1980 345 124  62
chn     emp      1980  34  65  58
chn     va       1981 345 243 670
ind     emp      1980  54  34  40
ind     va       1980 456 345 760
I have tried using ifelse, mutate_at and sweep functions but it does not work out.
Assuming that there could also be other values in the 'variable' column, an option is to use case_when with mutate_at:
library(dplyr)
df %>%
  mutate_at(vars(agr:ser), ~ case_when(variable == 'va' ~ . * 1000,
                                       variable == 'emp' ~ . * 100,
                                       TRUE ~ as.numeric(.)))
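mutate_at is superseded in current dplyr; as a sketch, assuming dplyr >= 1.0, the same logic written with across:
library(dplyr)
df %>%
  mutate(across(agr:ser, ~ case_when(variable == 'va' ~ .x * 1000,
                                     variable == 'emp' ~ .x * 100,
                                     TRUE ~ as.numeric(.x))))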
data
df <- structure(list(
  country = c("chn", "chn", "chn", "ind", "ind"),
  variable = c("va", "emp", "va", "emp", "va"),
  year = c(1980L, 1980L, 1981L, 1980L, 1980L),
  agr = c(345L, 34L, 345L, 54L, 456L),
  man = c(124L, 65L, 243L, 34L, 345L),
  ser = c(62L, 58L, 670L, 40L, 760L)
), class = "data.frame", row.names = c(NA, -5L))

How to sum a variable by group with NA?

I have a large data set like this :
ID   Number
153      31
         28
         31
         30
104      31
         30
254      31
266      31
and I want to compute the sum by ID, treating the blank rows as belonging to the ID above them. I mean, get this:
ID   Number
153     120
104      61
254      31
266      31
I tried aggregate but I don't get the expected result. Some help would be appreciated.
One option is to convert the blanks to NA, then use fill to replace the NA elements with the non-NA element above, group by 'ID', and get the sum of 'Number':
library(tidyverse)
df1 %>%
  mutate(ID = na_if(ID, "")) %>%
  fill(ID) %>%
  group_by(ID) %>%
  summarise(Number = sum(Number))
# A tibble: 4 x 2
#   ID    Number
#   <chr>  <int>
# 1 104       61
# 2 153      120
# 3 254       31
# 4 266       31
Or, without using fill, create a grouping variable with a logical expression and cumsum, and then do the sum:
df1 %>%
  group_by(grp = cumsum(ID != "")) %>%
  summarise(ID = first(ID), Number = sum(Number)) %>%
  select(-grp)
data
df1 <- structure(list(
  ID = c("153", "", "", "", "104", "", "254", "266"),
  Number = c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L)
), row.names = c(NA, -8L), class = "data.frame")
Or do it "straightforwardly" :) in base R, using reversed cumulative (suffix) sums:
# rev(cumsum(rev(Number))) gives the sum from each row to the end; taking it
# at the non-blank IDs and differencing recovers each group's total
cbind(df1[df1$ID != "", "ID", drop = FALSE],
      Number = rev(diff(c(0, rev((rev(cumsum(rev(df1$Number)))[df1$ID != ""]))))))
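For completeness, the same cumsum grouping idea also works with base R's tapply, which may read more easily than the suffix-sum one-liner (a sketch, using the df1 above):
grp <- cumsum(df1$ID != "")  # start a new group at every non-blank ID
data.frame(ID = df1$ID[df1$ID != ""],
           Number = as.vector(tapply(df1$Number, grp, sum)))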

Combine data depending on the value of one column

I have a data frame in R
year group sales
1 2000 1 20
2 2001 1 25
3 2002 1 23
4 2003 1 30
5 2001 2 50
6 2002 2 55
And I want to group the data by group, or create some kind of object: one array per group that stores the year and the sales. Then I will try to save it as a json file with this structure:
[{"group": 1, "sales":[[2000,20],[2001, 25], [2002,23], [2003, 30]]},
{"group": 2, "sales":[[2001, 50], [2002,55]]}]
Is it possible to do it automatically?
Thanks a lot
We can use data.table to paste the 'year' and 'sales' columns grouped by 'group'. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'group', we use sprintf to paste 'year' and 'sales' together with the enclosing brackets ([]), then collapse the output to a single string with toString (a wrapper for paste(..., collapse=', ')), paste on the outer [], and use toJSON.
library(jsonlite)
library(data.table)
toJSON(setDT(df1)[, list(sales = paste0('[', toString(sprintf('[%d,%d]',
  year, sales)), ']')), by = group])
#[{"group":1,"sales":"[[2000,20], [2001,25], [2002,23], [2003,30]]"},
# {"group":2,"sales":"[[2001,50], [2002,55]]"}]
The paste by group can also be done in base R. We split the dataset by the 'group' column to create a list, loop through the list with lapply, and paste the 'year' and 'sales' columns as above. Create a data.frame with the first element of 'group' and the pasted string, rbind the list elements into a single data.frame, and then use toJSON.
toJSON(
  do.call(rbind,
          lapply(split(df1, df1$group),
                 function(x) data.frame(group = x$group[1L],
                                        sales = paste0('[',
                                          toString(sprintf('[%d,%d]', x$year, x$sales)),
                                          ']')))))
data
df1 <- structure(list(
  year = c(2000L, 2001L, 2002L, 2003L, 2001L, 2002L),
  group = c(1L, 1L, 1L, 1L, 2L, 2L),
  sales = c(20L, 25L, 23L, 30L, 50L, 55L)
), .Names = c("year", "group", "sales"), class = "data.frame", row.names = c(NA, -6L))
Since the other answer uses data.table, I thought it would be an interesting exercise to try to do this in dplyr. This is not the optimal way, but it illustrates do, which I'm not convinced is well enough documented. I have also shown the more appropriate summarise solution.
df <- read.table(textConnection('
year group sales expenses
2000 1 20 19
2001 1 25 19
2002 1 23 20
2003 1 30 15
2001 2 50 27
2002 2 55 30
'), header = TRUE)
library(dplyr)
library(jsonlite)
df %>%
  group_by(group) %>%
  do(
    sales = group_by(., year) %>% select(sales) %>% apply(MARGIN = 2, identity),
    expenses = group_by(., year) %>% select(expenses) %>% apply(MARGIN = 2, identity)
  )
df %>%
  group_by(group) %>%
  summarise(
    sales = list(apply(data.frame(year, sales), MARGIN = 2, identity)),
    expenses = list(apply(data.frame(year, expenses), MARGIN = 2, identity))
  ) %>%
  jsonlite::toJSON()
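Note that the answers above build the inner arrays as JSON strings, so the "sales" values come out quoted. A sketch of one way to get real nested arrays as in the desired output, relying on jsonlite's default row-wise serialisation of matrices: store an unnamed year/sales matrix per group in a list-column.
library(dplyr)
library(jsonlite)
df1 %>%
  group_by(group) %>%
  summarise(sales = list(unname(cbind(year, sales)))) %>%  # one 2-column matrix per group
  toJSON()
# expected: [{"group":1,"sales":[[2000,20],[2001,25],[2002,23],[2003,30]]},
#            {"group":2,"sales":[[2001,50],[2002,55]]}]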
