Create function in R to apply to multiple datasets

I have this code, recommended by a Stack Overflow user, that works very well. I have
several datasets that I wish to apply this code to.
Do I have to apply the code to each dataset one by one, or is there something else I can do (for example, store it in some sort of function)?
I have datasets
df1, df2, df3, df4. I do not wish to rbind these datasets.
Sample data (dput output; each dataset has this structure):
structure(list(Date = structure(1:6, .Label = c("1/2/2020 5:00:00 PM",
"1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM", "1/5/2020 7:00:01 AM",
"1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM"), class = "factor"),
Duration = c(20L, 30L, 10L, 5L, 2L, 8L)), class = "data.frame", row.names = c(NA,
-6L))
CODE:
df %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
This is what I have been doing for each dataset (and so on):
df1 %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
df2 %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
df3 %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
Is there a way to store the code, something like:
Store_code<-
df %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
and then easily apply this code to each dataset, e.g.:
df1(Store_code)
df2(Store_code)
Any suggestion is appreciated.

We can wrap the code in a function, use mget to collect all the objects into a named list, and then use map to loop over the list and apply the function to each element:
library(dplyr)
library(lubridate)
library(purrr)
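# Wrap the pipeline in a function that accepts any data frame
f1 <- function(dat) {
dat %>%
group_by(Date = as.Date(dmy_hms(Date))) %>%
summarise(Total_Duration = sum(Duration), Count = n())
}
# Collect df1, df2, ... into a named list and apply f1 to each element
lst1 <- map(mget(ls(pattern = "^df\\d+$")), f1)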
Here, we assume the column names ('Date' and 'Duration') are the same in all the datasets. If they differ, we can pass the column names as additional arguments to the function:
f2 <- function(dat, datecol, durationcol) {
dat %>%
group_by(Date = as.Date(dmy_hms({{datecol}}))) %>%
summarise(Total_Duration = sum({{durationcol}}), Count = n())
}
and apply the function as:
f2(df1, Date, Duration)
or in the loop:
lst1 <- map(mget(ls(pattern = "^df\\d+$")), f2,
datecol = Date, durationcol = Duration)
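A dependency-free alternative, sketched in base R rather than dplyr/lubridate: the helper below (a hypothetical name, `summarise_by_day`) takes the column names as strings, keeps only the date part of each timestamp, and is applied to a named list of data frames with lapply(), the base counterpart of map(). It assumes the date part is day/month/year, as dmy_hms() in the code above implies, so "1/2/2020" becomes 2020-02-01; the time and AM/PM part is ignored because only the day is needed.

```r
# Hypothetical base-R helper: daily total and count for one data frame.
# Assumes the timestamp strings start with a day/month/year part.
summarise_by_day <- function(dat, date_col = "Date", duration_col = "Duration") {
  # Keep only the text before the first space, e.g. "1/2/2020"
  day <- as.Date(sub(" .*", "", as.character(dat[[date_col]])), format = "%d/%m/%Y")
  # One vector of durations per day, in chronological order
  groups <- split(dat[[duration_col]], day)
  data.frame(
    Date = as.Date(names(groups)),
    Total_Duration = vapply(groups, sum, numeric(1)),
    Count = vapply(groups, length, integer(1)),
    row.names = NULL
  )
}

# The question's sample data
df1 <- data.frame(
  Date = factor(c("1/2/2020 5:00:00 PM", "1/2/2020 5:30:01 PM", "1/2/2020 6:00:00 PM",
                  "1/5/2020 7:00:01 AM", "1/6/2020 8:00:00 AM", "1/6/2020 9:00:00 AM")),
  Duration = c(20L, 30L, 10L, 5L, 2L, 8L)
)
df2 <- df1[4:6, ]  # hypothetical second data set with the same columns

# Apply the helper to every data set in one call
results <- lapply(list(df1 = df1, df2 = df2), summarise_by_day)
results$df1

# A data set whose columns are named differently (hypothetical names)
df3 <- data.frame(Start = df1$Date, Secs = df1$Duration)
summarise_by_day(df3, date_col = "Start", duration_col = "Secs")
```

With objects named df1, df2, ... already in the workspace, `lapply(mget(ls(pattern = "^df\\d+$")), summarise_by_day)` builds the same list without typing the names.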

Related

How to fill in missing value of a data.frame in R?

I have multiple columns that have missing values. I want to fill the missing data in each column with the mean of the same day across all years. For example, DF is my fake data, with missing values in the two columns (A and X):
library(lubridate)
library(tidyverse)
library(naniar)
set.seed(123)
DF <- data.frame(Date = seq(as.Date("1985-01-01"), to = as.Date("1987-12-31"), by = "day"),
A = sample(1:10, 1095, replace = TRUE), X = sample(5:15, 1095, replace = TRUE)) %>%
replace_with_na(replace = list(A = 2, X = 5))
To fill in column A, I use the following code:
Fill_DF_A <- DF %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
group_by(Year, Day) %>%
mutate(A = ifelse(is.na(A), mean(A, na.rm=TRUE), A))
I have many columns in my data.frame. How can I generalize this to fill in the missing values in all of them?
We can use na.aggregate from zoo, which by default replaces each NA with the mean of its group:
library(dplyr)
library(lubridate) # year(), month() and day() come from lubridate
library(zoo)
DF %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
group_by(Year, Day) %>%
mutate(across(A:X, na.aggregate))
Or, if we prefer a conditional statement:
DF %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
group_by(Year, Day) %>%
mutate(across(A:X, ~ case_when(is.na(.)
~ mean(., na.rm = TRUE), TRUE ~ as.numeric(.))))
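Note that group_by(Year, Day) groups rows sharing the same day of the month within one year, which is not quite "the same day across all years". A base-R sketch that instead groups by month and day (ignoring the year), which is an assumption about the intended grouping, uses ave() to replace each NA with its group mean. The helper name `fill_na_by_calendar_day` and the toy data are hypothetical.

```r
# Hypothetical helper: replace NAs in each listed column with the mean of the
# rows that share the same month and day, across all years
fill_na_by_calendar_day <- function(df, cols, date_col = "Date") {
  key <- format(df[[date_col]], "%m-%d")  # e.g. "01-31"; the year is ignored
  for (col in cols) {
    df[[col]] <- ave(as.numeric(df[[col]]), key, FUN = function(v) {
      v[is.na(v)] <- mean(v, na.rm = TRUE)  # NaN if a whole group is NA
      v
    })
  }
  df
}

# Small toy data set: two calendar days observed in several years
toy <- data.frame(
  Date = as.Date(c("1985-01-01", "1986-01-01", "1987-01-01", "1985-01-02", "1986-01-02")),
  A = c(1, NA, 3, 10, NA),
  X = c(NA, 4, 6, 7, 9)
)
filled <- fill_na_by_calendar_day(toy, c("A", "X"))
filled
```

For the real data this would be called as `fill_na_by_calendar_day(DF, c("A", "X"))`, or with `setdiff(names(DF), "Date")` to cover every value column.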

how to make auto-separated years in a calendar with echarts4r

I'm trying to make a calendar with the echarts4r package.
library(tidyverse)
library(echarts4r)
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord.system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")
But this does not plot the 2018 data.
How can I automatically separate the years into different calendars?
Is there a solution like fill in ggplot?
The API is admittedly clunky and unintuitive, but it is doable. You need to add the two calendars as you already do, and reference each calendar's index in your e_heatmap call (so that each heatmap is plotted against the correct calendar). I also use e_data to pass the values (x) for the second calendar. Make sure to adjust the position of the calendars so that they do not overlap (e.g. top = 300).
dates18 <- seq.Date(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
dates17 <- seq.Date(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
values <- rnorm(length(dates18), 20, 6)
df <- data.frame(date18 = dates18, date17 = dates17, values = values)
df %>%
e_charts(date18) %>%
e_calendar(range = "2018") %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 0, name = "2018") %>%
e_data(df, date17) %>%
e_calendar(range = "2017", top = 300) %>%
e_heatmap(values, coord.system = "calendar", calendarIndex = 1, name = "2017") %>%
e_visual_map(max = 30)
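The side-by-side data frame above works only because 2017 and 2018 both have 365 days; if one year were a leap year, data.frame() would reject columns of different lengths (366 vs 365). A base-R sketch (not part of echarts4r) that instead splits one long date series into a data frame per year and assigns each year a zero-based calendar index and a vertical offset. The 40 px start and 220 px step are assumptions taken from the question's top values of 40 and 260.

```r
# Two years of daily data, one of them a leap year (2020 has 366 days)
dates <- seq.Date(as.Date("2019-01-01"), as.Date("2020-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)

# One data frame per year, named "2019" and "2020"
by_year <- split(data.frame(date = dates, values = values), format(dates, "%Y"))

# Layout for stacked calendars: zero-based index and a top offset per year
layout <- data.frame(
  year = names(by_year),
  calendar_index = seq_along(by_year) - 1L,
  top = 40 + 220 * (seq_along(by_year) - 1L),
  n_days = vapply(by_year, nrow, integer(1)),
  row.names = NULL
)
layout
```

Each element of by_year could then be paired with its own calendar, using the matching calendar_index and top value.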
Update
Since version 0.2.0, the above can be done by grouping the data by year, which is much clearer and easier:
dates <- seq.Date(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
values <- rnorm(length(dates), 20, 6)
year <- data.frame(date = dates, values = values)
year %>%
dplyr::mutate(year = format(date, "%Y")) %>% # get year from date
dplyr::group_by(year) %>%
e_charts(date) %>%
e_calendar(range = "2017",top="40") %>%
e_calendar(range = "2018",top="260") %>%
e_heatmap(values, coord_system = "calendar") %>%
e_visual_map(max = 30) %>%
e_title("Calendar", "Heatmap")%>%
e_tooltip("item")

filtering intraday data R

I'm trying to filter intraday data to include only a certain period of each day. Is there a trick in some package to achieve this? Here is some example data:
library(tibble)
library(lubridate)
library(tibbletime)
example <- as_tibble(data.frame(
date = ymd_hms(seq(as.POSIXct("2017-01-01 09:00:00"), as.POSIXct("2017-01-02 20:00:00"), by="min")),
value = rep(1, 2101)))
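I would like to include only 10:00:00 to 18:35:00 for each day, but can't achieve this nicely. My solution so far has been to create extra indicator columns and then filter by them, but it hasn't worked well either.
You can use the function between() from data.table on the time-of-day strings (zero-padded "HH:MM:SS" strings sort correctly as text, and both bounds are inclusive by default):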
example[data.table::between(format(example$date, "%H:%M:%S"),
lower = "10:00:00",
upper = "18:35:00"), ]
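Alternatively, with lubridate, convert each timestamp to minutes since midnight and filter on that (10:00 is 600 and 18:35 is 1115):
library(tibbletime)
library(tidyverse)
library(lubridate)
example <- as_tibble(data.frame(
date = ymd_hms(seq(as.POSIXct("2017-01-01 09:00:00"), as.POSIXct("2017-01-02 20:00:00"), by="min")),
value = rep(1, 2101)))
example %>%
mutate(time = hour(date) * 60 + minute(date)) %>% # minutes since midnight
filter(time >= 10 * 60 & time <= 18 * 60 + 35) %>% # 10:00 to 18:35 inclusive
select(-time)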
This is pretty hacky, but if you really want to stay in the tidyverse:
rng <- range((hms("10:00:00") %>% as_datetime()), (hms("18:35:00") %>% as_datetime()))
example %>%
separate(., date, into = c("date", "time"), sep = " ") %>%
mutate(
time = hms(time) %>% as_datetime(),
date = as_date(date)
) %>%
filter(time >= rng[1] & time <= rng[2]) %>%
separate(., time, into = c("useless", "time"), sep = " ") %>%
select(-useless)
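A base-R sketch with no extra packages: convert each timestamp to minutes since midnight via POSIXlt and keep the rows in the inclusive range. The example data is rebuilt in UTC, an assumption chosen so that daylight-saving changes cannot add or drop minutes; the variable names are hypothetical.

```r
# Rebuild the question's minute-by-minute example in base R (UTC assumed)
stamps <- seq(as.POSIXct("2017-01-01 09:00:00", tz = "UTC"),
              as.POSIXct("2017-01-02 20:00:00", tz = "UTC"), by = "min")
example <- data.frame(date = stamps, value = rep(1, length(stamps)))

# Minutes since midnight for every timestamp
lt <- as.POSIXlt(example$date)
mins <- lt$hour * 60 + lt$min

# Keep 10:00 (600) through 18:35 (1115), both ends included
kept <- example[mins >= 10 * 60 & mins <= 18 * 60 + 35, ]
range(format(kept$date, "%H:%M"))
```

Working in whole minutes avoids both string comparison and decimal encodings of hh.mm, and the same comparison works per day because the date part is never consulted.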

Expanding date to include all dates in range [duplicate]

I have a dataset that looks like this:
ID created_at
MUM-0001 2014-04-16
MUM-0002 2014-01-14
MUM-0003 2014-04-17
MUM-0004 2014-04-12
MUM-0005 2014-04-18
MUM-0006 2014-04-17
I am trying to add a new column containing all dates between each start date and a fixed last day (say, 12 July 2015). I used seq inside a dplyr pipeline, but I am getting an error.
data1 <- data1 %>%
arrange(ID) %>%
group_by(ID) %>%
mutate(date = seq(as.Date(created_at), as.Date('2015-07-12'), by= 1))
The error I am getting is:
Error: incompatible size (453), expecting 1 (the group size) or 1
Can you suggest a better way to do this in R?
You could use data.table to get the sequence of dates from 'created_at' to '2015-07-12', grouped by the 'ID' column:
library(data.table)
setDT(df1)[, list(date=seq(created_at, as.Date('2015-07-12'), by='1 day')) , ID]
If you need a dplyr option, use do():
library(dplyr)
df1 %>%
group_by(ID) %>%
do( data.frame(., Date= seq(.$created_at,
as.Date('2015-07-12'), by = '1 day')))
If there are duplicate IDs, group by row_number() instead:
df1 %>%
group_by(rn=row_number()) %>%
do(data.frame(ID= .$ID, Date= seq(.$created_at,
as.Date('2015-07-12'), by = '1 day'), stringsAsFactors=FALSE))
Update
Based on @Frank's comment, the newer tidyverse idiom is:
library(tidyverse)
df1 %>%
group_by(ID) %>%
mutate(d = list(seq(created_at, as.Date('2015-07-12'), by='1 day')), created_at = NULL) %>%
unnest(cols = d)
The data.table equivalent for duplicate IDs groups by row (and keeps the ID column):
setDT(df1)[, list(ID = ID, date = seq(created_at,
as.Date('2015-07-12'), by = '1 day')), by = 1:nrow(df1)]
data
df1 <- structure(list(ID = c("MUM-0001", "MUM-0002", "MUM-0003",
"MUM-0004",
"MUM-0005", "MUM-0006"), created_at = structure(c(16176, 16084,
16177, 16172, 16178, 16177), class = "Date")), .Names = c("ID",
"created_at"), row.names = c(NA, -6L), class = "data.frame")
