Find missing date range in R

I have a dataframe of start and end dates, where each row represents a specific trip.
Those date ranges make up a continuous timeline, except around April, where there is a gap in the data (because no trips were taken).
I would like to find the start and end date of that gap, preferably using a tidy approach.
library(tidyverse)
df<- data.frame(start = as.Date(c("2022-01-03", "2022-01-18", "2022-01-31", "2022-03-01" ,"2022-03-08", "2022-03-09", "2022-04-15",
"2022-04-20", "2022-04-20","2022-05-03", "2022-05-17", "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22" ,"2022-06-28", "2022-07-11")),
end = as.Date(c("2022-01-18","2022-01-31", "2022-03-01" ,"2022-03-08" ,"2022-03-09", "2022-03-25", "2022-04-20" ,"2022-04-20", "2022-05-03",
"2022-05-17" ,"2022-05-17", "2022-05-31", "2022-06-05" ,"2022-06-22" ,"2022-06-28" ,"2022-07-11", "2022-07-17"))) %>%
mutate(trip_number = as.character(row_number()))
df %>%
ggplot()+
geom_segment(aes(x = start, xend = end, y =0, yend= 0, col = trip_number))+
theme(legend.position = "none")
Created on 2022-07-17 by the reprex package (v2.0.1)

A possible solution:
library(tidyverse)
library(lubridate)
df %>%
mutate(date1 = if_else(start == lag(end), NA_Date_, lag(end)),
date2 = if_else(start == lag(end), NA_Date_, start)) %>%
bind_rows(tibble(start = .$date1, end = .$date2)) %>%
filter(!if_all(everything(), is.na)) %>%
arrange(start) %>%
select(!starts_with("date"))
#> start end trip_number
#> 1 2022-01-03 2022-01-18 1
#> 2 2022-01-18 2022-01-31 2
#> 3 2022-01-31 2022-03-01 3
#> 4 2022-03-01 2022-03-08 4
#> 5 2022-03-08 2022-03-09 5
#> 6 2022-03-09 2022-03-25 6
#> 7 2022-03-25 2022-04-15 <NA>
#> 8 2022-04-15 2022-04-20 7
#> 9 2022-04-20 2022-04-20 8
#> 10 2022-04-20 2022-05-03 9
#> 11 2022-05-03 2022-05-17 10
#> 12 2022-05-17 2022-05-17 11
#> 13 2022-05-17 2022-05-31 12
#> 14 2022-05-31 2022-06-05 13
#> 15 2022-06-05 2022-06-22 14
#> 16 2022-06-22 2022-06-28 15
#> 17 2022-06-28 2022-07-11 16
#> 18 2022-07-11 2022-07-17 17
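The answer above inserts the gap as an extra row. If only the gap itself is wanted, a minimal sketch along the same lines is to compare each start date with the latest end date seen so far. The names `trips`, `prev_end`, `gap_start` and `gap_end` are made up for this illustration, and the data is just a few rows taken from the question.

```r
library(dplyr)

# A few trips taken from the question's data, enough to contain the gap
trips <- data.frame(
  start = as.Date(c("2022-03-08", "2022-03-09", "2022-04-15", "2022-04-20")),
  end   = as.Date(c("2022-03-09", "2022-03-25", "2022-04-20", "2022-05-03"))
)

gaps <- trips %>%
  arrange(start) %>%
  mutate(
    # Latest end date seen so far, shifted down one row. cummax() is not
    # defined for Date objects, so go through the numeric representation.
    prev_end = as.Date(lag(cummax(as.numeric(end))), origin = "1970-01-01")
  ) %>%
  # The first row has prev_end = NA and is dropped by filter()
  filter(prev_end < start) %>%
  transmute(gap_start = prev_end, gap_end = start)

gaps
```

Using the running maximum rather than plain `lag(end)` should also cope with trips that overlap or are nested inside a longer trip, although that case does not occur in the question's data.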

Related

Map returns lists instead of dates

I have a tibble column containing dates, and I want to match them to the nearest higher date in a list of dates. Therefore I wrote a function matchDate, which I want to call with map:
library(tidyverse)
d1 = as.Date("2022-01-01")
d2 = as.Date("2022-12-31")
matchDate = function(date,matchDates,...){
matchDates[matchDates >= date] %>% min %>% return
}
df = tibble(date = seq.Date(from=d1,to=d2,by='months'))
dates = seq.Date(from=d2,to=d1,by='-3 months')
df$match = map(df$date,~matchDate(.x,matchDates=dates))
typeof(matchDate(d1,dates))
#> [1] "double"
df
#> # A tibble: 12 x 2
#> date match
#> <date> <list>
#> 1 2022-01-01 <date [1]>
#> 2 2022-02-01 <date [1]>
#> 3 2022-03-01 <date [1]>
#> 4 2022-04-01 <date [1]>
#> 5 2022-05-01 <date [1]>
#> 6 2022-06-01 <date [1]>
#> ...
Created on 2022-05-30 by the reprex package (v2.0.1)
I'm not sure why typeof returns double here, but the function works fine and map also returns the right dates. The only problem is that it wraps each of them in a list (of length one). I tried adding unlist at different places in my code, but it didn't change anything. Can someone explain what is going on, or how to unlist correctly? Many thanks!
We can flatten the result. map() always returns a list; the suffixed variants (map_int(), map_dbl(), map_chr(), etc.) return an atomic vector instead. The Date class is a bit awkward here because its storage mode is integer/double, so those variants could coerce the result to its underlying numeric values.
library(purrr)
library(dplyr)
df$match <- map(df$date,~matchDate(.x,matchDates=dates)) %>%
invoke(c,.)
Output:
> df
# A tibble: 12 × 2
date match
<date> <date>
1 2022-01-01 2022-03-31
2 2022-02-01 2022-03-31
3 2022-03-01 2022-03-31
4 2022-04-01 2022-07-01
5 2022-05-01 2022-07-01
6 2022-06-01 2022-07-01
7 2022-07-01 2022-07-01
8 2022-08-01 2022-10-01
9 2022-09-01 2022-10-01
10 2022-10-01 2022-10-01
11 2022-11-01 2022-12-31
12 2022-12-01 2022-12-31
With base R, we can use do.call with c
do.call("c", map(df$date,~matchDate(.x,matchDates=dates)))
[1] "2022-03-31" "2022-03-31" "2022-03-31" "2022-07-01" "2022-07-01" "2022-07-01" "2022-07-01" "2022-10-01" "2022-10-01" "2022-10-01"
[11] "2022-12-31" "2022-12-31"
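On the typeof() question: a Date is stored as a double (days since 1970-01-01) with a class attribute, so typeof() reports "double" while class() reports "Date". As a further option, and assuming purrr 1.0.0 or later is installed, map_vec() combines the results while keeping the Date class. The sketch below renames the helper to `match_date` and introduces `month_starts` purely for illustration.

```r
library(purrr)

d1 <- as.Date("2022-01-01")
d2 <- as.Date("2022-12-31")

# Nearest date in match_dates that is on or after `date`
match_date <- function(date, match_dates) {
  min(match_dates[match_dates >= date])
}

dates <- seq(d2, d1, by = "-3 months")
month_starts <- seq(d1, d2, by = "month")

# map_vec() (purrr >= 1.0.0) returns a vector of the common type, here Date
matched <- map_vec(month_starts, function(d) match_date(d, dates))

typeof(matched)  # storage type of a Date
class(matched)   # the class that makes it print as a date
```

If an older purrr is installed, the do.call("c", ...) approach above does the same job without the version requirement.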

Is there a way to group data according to time in R?

I'm working with trip ticket data that includes a column with dates and times. I want to group trips into Morning (05:00-10:59), Lunch (11:00-12:59), Afternoon (13:00-17:59), Evening (18:00-23:59), and Dawn/Graveyard (00:00-04:59), and then count the number of trips (by counting the unique values in the trip_id column) for each of those categories.
Only I don't know how to group/summarize according to time values. Is this possible in R?
trip_id start_time end_time day_of_week
1 CFA86D4455AA1030 2021-03-16 08:32:30 2021-03-16 08:36:34 Tuesday
2 30D9DC61227D1AF3 2021-03-28 01:26:28 2021-03-28 01:36:55 Sunday
3 846D87A15682A284 2021-03-11 21:17:29 2021-03-11 21:33:53 Thursday
4 994D05AA75A168F2 2021-03-11 13:26:42 2021-03-11 13:55:41 Thursday
5 DF7464FBE92D8308 2021-03-21 09:09:37 2021-03-21 09:27:33 Sunday
Here's a solution with hour() and case_when().
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
trip <- tibble(start_time = mdy_hm("1/1/2022 1:00") + minutes(seq(0, 700, 15)))
trip <- trip %>%
mutate(
hr = hour(start_time),
time_of_day = case_when(
hr >= 5 & hr < 11 ~ "morning",
hr >= 11 & hr < 13 ~ "afternoon",
TRUE ~ "fill in the rest yourself :)"
)
)
print(trip)
#> # A tibble: 47 x 3
#> start_time hr time_of_day
#> <dttm> <int> <chr>
#> 1 2022-01-01 01:00:00 1 fill in the rest yourself :)
#> 2 2022-01-01 01:15:00 1 fill in the rest yourself :)
#> 3 2022-01-01 01:30:00 1 fill in the rest yourself :)
#> 4 2022-01-01 01:45:00 1 fill in the rest yourself :)
#> 5 2022-01-01 02:00:00 2 fill in the rest yourself :)
#> 6 2022-01-01 02:15:00 2 fill in the rest yourself :)
#> 7 2022-01-01 02:30:00 2 fill in the rest yourself :)
#> 8 2022-01-01 02:45:00 2 fill in the rest yourself :)
#> 9 2022-01-01 03:00:00 3 fill in the rest yourself :)
#> 10 2022-01-01 03:15:00 3 fill in the rest yourself :)
#> # ... with 37 more rows
trips <- trip %>%
count(time_of_day)
print(trips)
#> # A tibble: 3 x 2
#> time_of_day n
#> <chr> <int>
#> 1 afternoon 7
#> 2 fill in the rest yourself :) 16
#> 3 morning 24
Created on 2022-03-21 by the reprex package (v2.0.1)
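For completeness, here is a sketch of what the filled-in version might look like, using the five categories and the five sample rows from the question. The column name `n_trips` is made up here, and the timestamps are parsed as UTC, which is an assumption about the data.

```r
library(dplyr)
library(lubridate)

# Sample rows copied from the question (start_time parsed as UTC)
trips <- tibble(
  trip_id = c("CFA86D4455AA1030", "30D9DC61227D1AF3", "846D87A15682A284",
              "994D05AA75A168F2", "DF7464FBE92D8308"),
  start_time = ymd_hms(c("2021-03-16 08:32:30", "2021-03-28 01:26:28",
                         "2021-03-11 21:17:29", "2021-03-11 13:26:42",
                         "2021-03-21 09:09:37"))
)

trip_counts <- trips %>%
  mutate(
    hr = hour(start_time),
    # case_when() takes the first matching condition, so each upper
    # bound implicitly excludes the hours already matched above it
    time_of_day = case_when(
      hr < 5  ~ "Dawn/Graveyard",
      hr < 11 ~ "Morning",
      hr < 13 ~ "Lunch",
      hr < 18 ~ "Afternoon",
      TRUE    ~ "Evening"
    )
  ) %>%
  group_by(time_of_day) %>%
  summarise(n_trips = n_distinct(trip_id))

trip_counts
```

Categories with no trips (Lunch, in this small sample) simply do not appear in the result.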

Expand dataset by count column in Dplyr

I have a dataset as follows:
library(tidyverse)
df <- data.frame(
report_date = c("2020-03-14", "2020-03-14", "2020-03-19", "2020-03-20"),
start_date = c("2020-03-06", "2020-03-10", "2020-03-11", "2020-03-11"),
count = c(1, 2, 1, 3)
)
Looking like:
report_date start_date count
1 2020-03-14 2020-03-06 1
2 2020-03-14 2020-03-10 2
3 2020-03-19 2020-03-11 1
4 2020-03-20 2020-03-11 3
I want to transform the data using the count column, i.e. repeat each row n times, where n is that row's count.
I think it's clear if I show the desired result as follows:
df_final <- data.frame(
report_date = c("2020-03-14", "2020-03-14", "2020-03-14", "2020-03-19",
"2020-03-20", "2020-03-20", "2020-03-20"),
start_date = c("2020-03-06", "2020-03-10", "2020-03-10", "2020-03-11",
"2020-03-11", "2020-03-11", "2020-03-11"),
count = c(1, 1, 1, 1, 1, 1, 1)
)
report_date start_date count
1 2020-03-14 2020-03-06 1
2 2020-03-14 2020-03-10 1
3 2020-03-14 2020-03-10 1
4 2020-03-19 2020-03-11 1
5 2020-03-20 2020-03-11 1
6 2020-03-20 2020-03-11 1
7 2020-03-20 2020-03-11 1
Thanks!
We may use uncount to replicate and then create the 'count'
library(dplyr)
library(tidyr)
df %>%
uncount(count) %>%
mutate(count = 1)
Output:
report_date start_date count
1 2020-03-14 2020-03-06 1
2 2020-03-14 2020-03-10 1
3 2020-03-14 2020-03-10 1
4 2020-03-19 2020-03-11 1
5 2020-03-20 2020-03-11 1
6 2020-03-20 2020-03-11 1
7 2020-03-20 2020-03-11 1
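If tidyr is not available, the same expansion can be sketched in base R by repeating row indices. The name `expanded` is chosen here for illustration only.

```r
df <- data.frame(
  report_date = c("2020-03-14", "2020-03-14", "2020-03-19", "2020-03-20"),
  start_date  = c("2020-03-06", "2020-03-10", "2020-03-11", "2020-03-11"),
  count       = c(1, 2, 1, 3)
)

# Repeat each row index `count` times, then subset by those indices
expanded <- df[rep(seq_len(nrow(df)), times = df$count), ]
expanded$count <- 1
rownames(expanded) <- NULL  # reset row names such as "2.1" left by the subsetting

expanded
```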

Getting months as numerical value in R

I created this for loop to iterate through a list of student records (SU_students) and get the difference between the enrollment begin and enrollment end dates in a new column called "enroll_months".
I'm using the interval() function from lubridate library and when I use it outside of the loop on a single value of two dates it returns numerical value which is what I'm looking for; to have the months as a numerical value in column in the data frame.
for (row in 1:nrow(SU_students)){
SU_students$enroll_months[row] <- interval(Enrollment_Begin[row], Enrollment_End[row]) %/% months(1)
}
Assuming your SU_students is the same length as Enrollment_Begin and Enrollment_End, you can do this all within a data.frame. I have found it easier to use lubridate::time_length(), as it feels more intuitive and is easier to parameterize if I start changing things.
These functions are vectorized so there's no need for the for loop to iterate over the elements.
set.seed(42)
df <- data.frame(
SU_students = letters[1:10],
Enrollment_Begin = as.Date("2021-10-04") + runif(10, -1, 1) * 100,
Enrollment_End = as.Date("2021-10-04") + runif(10, -1, 1) * 100
)
df$enroll_months <- lubridate::time_length(lubridate::interval(df$Enrollment_Begin, df$Enrollment_End), "months")
df
#> SU_students Enrollment_Begin Enrollment_End enroll_months
#> 1 a 2021-12-25 2021-09-25 -3.0133179
#> 2 b 2021-12-30 2021-11-16 -1.4384720
#> 3 c 2021-08-22 2021-12-29 4.2485981
#> 4 d 2021-12-09 2021-08-16 -3.7743148
#> 5 e 2021-11-01 2021-09-26 -1.1630180
#> 6 f 2021-10-07 2021-12-31 2.7478618
#> 7 g 2021-11-20 2022-01-07 1.5912136
#> 8 h 2021-07-22 2021-07-19 -0.1145282
#> 9 i 2021-11-04 2021-09-28 -1.1799681
#> 10 j 2021-11-14 2021-10-16 -0.9337551
Created on 2021-10-04 by the reprex package (v2.0.1)
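Note that time_length() returns fractional months, whereas the `%/% months(1)` used in the question returns whole months. Both are vectorized, so either can replace the loop. A small sketch with made-up dates (the data frame `enroll` and its values are hypothetical):

```r
library(lubridate)

# Hypothetical enrollment dates, for illustration only
enroll <- data.frame(
  Enrollment_Begin = as.Date(c("2021-01-15", "2020-09-01")),
  Enrollment_End   = as.Date(c("2021-04-20", "2021-08-31"))
)

span <- interval(enroll$Enrollment_Begin, enroll$Enrollment_End)

# Whole months completed (integer division by a one-month period)
enroll$whole_months <- span %/% months(1)

# Fractional months, as in the answer above
enroll$frac_months <- time_length(span, "months")

enroll
```

Which one is appropriate depends on whether partial months should count towards the enrollment length.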

Creating sequence of dates in a column in a dataframe in R

I am having a data frame in R as follows:
df <- data.frame("Type" = c("Item A","Item B"), "Frequency" = c("Quarterly","Other"), "Date" = as.Date(c("2021-02-05","2021-05-05")),"endDate" = as.Date("2021-12-12"), stringsAsFactors = F)
I am trying to generate the sequence of dates between Date and endDate as each row. I am using the code below to generate the sequence
df <- df %>%
dplyr::mutate(id = 1:nrow(df),deliveryDate = ifelse(
df$Frequency == "Quarterly", list(seq(as.Date(df$Date), as.Date(df$endDate), by = "3 month")),
ifelse(df$Frequency == "Monthly", list(seq(as.Date(df$Date), as.Date(df$endDate), by = "month")),
ifelse(df$Frequency %in% c("Other"),list(seq(as.Date(df$Date), as.Date(df$Date), by = "month")),df$Date)))) %>%
tidyr::unnest(deliveryDate) %>%
dplyr::group_by(Type) %>%
dplyr::mutate(deliveryNumber = row_number()) %>%
dplyr::select(deliveryNumber,Type, Frequency, deliveryDate) %>%
To be more descriptive, the sequence of dates is generated based on the frequency of each type. To handle that, I used dplyr::mutate().
But I am getting an error as follows:
Error: Problem with `mutate()` input `deliveryDate`.
x 'from' must be of length 1
ℹ Input `deliveryDate` is `ifelse(...)`.
Can anyone help me solve this issue in R? Thanks in advance!!!
You should consider a named vector:
library(tidyverse)
vec<-c(Quarterly = "3 months", Other = "month")
df %>%
rowwise() %>%
mutate(deliveryDate = list(seq(Date,endDate, by = vec[Frequency]))) %>%
unnest(deliveryDate)
# A tibble: 12 x 5
Type Frequency Date endDate deliveryDate
<chr> <chr> <date> <date> <date>
1 Item A Quarterly 2021-02-05 2021-12-12 2021-02-05
2 Item A Quarterly 2021-02-05 2021-12-12 2021-05-05
3 Item A Quarterly 2021-02-05 2021-12-12 2021-08-05
4 Item A Quarterly 2021-02-05 2021-12-12 2021-11-05
5 Item B Other 2021-05-05 2021-12-12 2021-05-05
6 Item B Other 2021-05-05 2021-12-12 2021-06-05
7 Item B Other 2021-05-05 2021-12-12 2021-07-05
8 Item B Other 2021-05-05 2021-12-12 2021-08-05
9 Item B Other 2021-05-05 2021-12-12 2021-09-05
10 Item B Other 2021-05-05 2021-12-12 2021-10-05
11 Item B Other 2021-05-05 2021-12-12 2021-11-05
12 Item B Other 2021-05-05 2021-12-12 2021-12-05
Use complete
df %>% group_by(Type) %>% mutate(DeliveryDate = Date,
Frequency = case_when(Frequency %in% "Quarterly"~ "quarter",
Frequency %in% "Monthly" ~ "month",
Frequency %in% "Weekly" ~ "week",
TRUE ~ "month")) %>%
complete(DeliveryDate = seq.Date(Date, endDate, by = Frequency)) %>%
fill(Frequency, Date, endDate)
# A tibble: 12 x 5
# Groups: Type [2]
Type DeliveryDate Frequency Date endDate
<chr> <date> <chr> <date> <date>
1 Item A 2021-02-05 quarter 2021-02-05 2021-12-12
2 Item A 2021-05-05 quarter 2021-02-05 2021-12-12
3 Item A 2021-08-05 quarter 2021-02-05 2021-12-12
4 Item A 2021-11-05 quarter 2021-02-05 2021-12-12
5 Item B 2021-05-05 month 2021-05-05 2021-12-12
6 Item B 2021-06-05 month 2021-05-05 2021-12-12
7 Item B 2021-07-05 month 2021-05-05 2021-12-12
8 Item B 2021-08-05 month 2021-05-05 2021-12-12
9 Item B 2021-09-05 month 2021-05-05 2021-12-12
10 Item B 2021-10-05 month 2021-05-05 2021-12-12
11 Item B 2021-11-05 month 2021-05-05 2021-12-12
12 Item B 2021-12-05 month 2021-05-05 2021-12-12
Here's one way. It's not clear what you want for "Other" as opposed to "month", so I set it here to "week".
Note that you don't need to prefix columns with the data frame name (df$...) inside mutate(), because mutate() evaluates its arguments with the data frame's columns in scope. Also, look into using case_when() instead of nested ifelse() calls.
library(tidyverse)
df %>%
mutate(Frequency2 = case_when(Frequency == "Quarterly" ~ "3 month",
Frequency == "Monthly" ~ "month",
TRUE ~ "week")) %>%
group_by(Type, Frequency2) %>%
nest() %>%
mutate(middates = map2(data, Frequency2, ~ seq.Date(min(.x$Date), max(.x$endDate), by = .y))) %>%
unnest(c(data, middates)) %>%
ungroup()
# A tibble: 36 x 6
Type Frequency Frequency2 Date endDate middates
<chr> <chr> <chr> <date> <date> <date>
1 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-02-05
2 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-05-05
3 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-08-05
4 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-11-05
5 Item B Other week 2021-05-05 2021-12-12 2021-05-05
6 Item B Other week 2021-05-05 2021-12-12 2021-05-12
7 Item B Other week 2021-05-05 2021-12-12 2021-05-19
8 Item B Other week 2021-05-05 2021-12-12 2021-05-26
9 Item B Other week 2021-05-05 2021-12-12 2021-06-02
10 Item B Other week 2021-05-05 2021-12-12 2021-06-09
# ... with 26 more rows
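As for the original error: ifelse() evaluates seq() on the whole Date column at once, so `from` has length 2, which seq() rejects. Any approach that calls seq() once per row avoids this. The sketch below is one more variant that also produces the deliveryNumber column the question asked for. The lookup vector `step_lookup` and the choice of "month" for "Other" are assumptions, not something the question specifies.

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- data.frame(
  Type = c("Item A", "Item B"),
  Frequency = c("Quarterly", "Other"),
  Date = as.Date(c("2021-02-05", "2021-05-05")),
  endDate = as.Date("2021-12-12"),
  stringsAsFactors = FALSE
)

# Assumed mapping from Frequency labels to seq() step sizes
step_lookup <- c(Quarterly = "3 months", Monthly = "month", Other = "month")

deliveries <- df %>%
  mutate(
    step_by = unname(step_lookup[Frequency]),
    # Iterate over row indices so that seq() always receives length-1
    # Date values; indexing with [i] keeps the Date class
    deliveryDate = map(
      seq_along(Date),
      function(i) seq(Date[i], endDate[i], by = step_by[i])
    )
  ) %>%
  unnest(deliveryDate) %>%
  group_by(Type) %>%
  mutate(deliveryNumber = row_number()) %>%
  ungroup() %>%
  select(deliveryNumber, Type, Frequency, deliveryDate)

deliveries
```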
