15 minutes to hours in R

I have a time series with 15-minute intervals.
I would like to convert it to 1-hour intervals in R, summing the measurements within each hour.
Could you please help me with this?
And is it then possible to go from hours to months?
The data frame is as below:
timestamp (UTC) value
2020-06-11 22:15:00 5,841
2020-06-11 22:30:00 5,719
2020-06-11 22:45:00 5,841
2020-06-11 23:00:00 5,841
2020-06-11 23:15:00 5,597
2020-06-11 23:30:00 5,232
2020-06-11 23:45:00 5,476
2020-06-12 0:00:00 4,259
2020-06-12 0:15:00 0,243
2020-06-12 0:30:00 0,243
2020-06-12 0:45:00 0,365
2020-06-12 1:00:00 0,243

It depends on how you count. If every 15-minute reading after the full hour belongs to the next hour, use lubridate::ceiling_date (22:15 => 23:00); if it belongs to the same hour, use lubridate::floor_date (22:15 => 22:00).
library(dplyr)
library(lubridate)
# option 1
df1 %>%
  mutate(timestamp = ceiling_date(timestamp, unit = "hour")) %>%
  group_by(timestamp) %>%
  summarise(value = sum(value))
# A tibble: 3 × 2
timestamp value
<dttm> <dbl>
1 2020-06-11 23:00:00 23.2
2 2020-06-12 00:00:00 20.6
3 2020-06-12 01:00:00 1.09
#option 2
df1 %>%
  mutate(timestamp = floor_date(timestamp, unit = "hour")) %>%
  group_by(timestamp) %>%
  summarise(value = sum(value))
# A tibble: 4 × 2
timestamp value
<dttm> <dbl>
1 2020-06-11 22:00:00 17.4
2 2020-06-11 23:00:00 22.1
3 2020-06-12 00:00:00 5.11
4 2020-06-12 01:00:00 0.243
data:
df1 <- structure(list(timestamp = structure(c(1591906500, 1591907400,
1591908300, 1591909200, 1591910100, 1591911000, 1591911900, 1591912800,
1591913700, 1591914600, 1591915500, 1591916400), class = c("POSIXct",
"POSIXt"), tzone = ""), value = c(5.841, 5.719, 5.841, 5.841,
5.597, 5.232, 5.476, 4.259, 0.243, 0.243, 0.365, 0.243)), row.names = c(NA,
-12L), class = "data.frame")
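The question also asks about going from hours to months. The same pattern should carry over by changing the rounding unit. Below is a minimal sketch, assuming a data frame shaped like df1 (a POSIXct timestamp and a numeric value); the `readings` data frame and its numbers are made up for illustration so that the snippet runs on its own:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample spanning two months (not the data from the question)
readings <- data.frame(
  timestamp = as.POSIXct(c("2020-06-30 22:15:00", "2020-06-30 23:45:00",
                           "2020-07-01 00:15:00", "2020-07-01 00:30:00"),
                         tz = "UTC"),
  value = c(1.5, 2.5, 0.25, 0.75)
)

monthly <- readings %>%
  # floor_date() maps every timestamp to the first instant of its month
  mutate(month = floor_date(timestamp, unit = "month")) %>%
  group_by(month) %>%
  summarise(value = sum(value), .groups = "drop")

monthly
```

Summing the raw 15-minute rows or the hourly sums gives the same monthly totals, as long as each reading ends up in the same month either way. That may not hold for readings around midnight on the last day of a month if the hourly step used ceiling_date.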

Related

Averaging weekly data into monthly data on 2 variables

I have two columns with the start and end dates of every week. I need to aggregate another column on a monthly basis, taking the mean over the weeks of a particular month (the dataset covers 3 years), and create another column that contains the weight for the whole month. That value will therefore repeat across the 5-6 weeks of a month, depending on how many weeks the month has for a particular ID (there are thousands of IDs in the dataset). The tricky part is that some weeks span two months, so one row must sometimes be included in the calculation for both months, e.g. a row with start_date = 2020-07-27 and end_date = 2020-08-09 has to count towards both July and August.
This is my data:
ID  weight  start_date  end_date
60  1,2     2019-12-30  2020-01-05
60  1,4     2020-01-06  2020-01-12
60  1,3     2020-01-13  2020-01-19
60  1,0     2020-01-20  2020-01-26
60  3,8     2020-01-27  2020-02-02
61  1,7     2019-12-30  2020-01-05
61  12,9    2020-01-06  2020-01-12
I want to obtain:
ID  weight  start_date  end_date    Monthly_weight  Month
60  1,2     2019-12-30  2020-01-05  1,74            01.2020
60  1,4     2020-01-06  2020-01-12  1,74            01.2020
60  1,3     2020-01-13  2020-01-19  1,74            01.2020
60  1,0     2020-01-20  2020-01-26  1,74            01.2020
60  3,8     2020-01-27  2020-02-02  1,74            01.2020
61  1,7     2019-12-30  2020-01-05  7,3             01.2020
61  12,9    2020-01-06  2020-01-12  7,3             01.2020
At first I wanted to write a loop that detects every month in both columns and, whenever the month appears, takes the mean of the other column. Then I found a similar problem on Stack Overflow (How to convert weekly data into monthly data?) and decided to do it with zoo.
I tried to implement the solution from that post:
library(zoo)
z.st <- read.zoo(long_weights[c("start_date", "weight")])
z.en <- read.zoo(long_weights[c("end_date", "weight")])
z <- c(z.st, z.en)
g <- zoo(, seq(start(z), end(z), "day"))
m <- na.locf(merge(z, g))
aggregate(m, as.yearmon, mean)
but after this line:
z <- c(z.st, z.en)
I get an error: Error in bind.zoo(...) : indexes overlap
I also tried the following, but it does not take overlapping weeks into account:
df <- df %>%
  group_by(HHKEY,
           month = floor_date((as.Date(end_date) - as.Date(start_date)) / 2 + as.Date(start_date), "month")) %>%
  mutate(monthly_weight = mean(weight), .after = end_date,
         month = format(month, "%Y.%m")) %>%
  ungroup()
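A possible solution is to assign each week to a month based on its end_date, except for the last week of each ID, which is assigned based on its start_date; month then serves as the grouping variable. Note that each week is still assigned to a single month. I extended the data to include a year change within an ID.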
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(start_date = as.Date(start_date), end_date = as.Date(end_date),
         month = lead(format(start_date, "%m.%Y")),
         month = if_else(is.na(month),
                         format(start_date, "%m.%Y"), format(end_date, "%m.%Y"))) %>%
  group_by(ID, month) %>%
  mutate(monthly_weight = mean(weight), .before = month) %>%
  ungroup()
# A tibble: 14 × 6
ID weight start_date end_date monthly_weight month
<dbl> <dbl> <date> <date> <dbl> <chr>
1 60 1.2 2019-12-30 2020-01-05 1.74 01.2020
2 60 1.4 2020-01-06 2020-01-12 1.74 01.2020
3 60 1.3 2020-01-13 2020-01-19 1.74 01.2020
4 60 1 2020-01-20 2020-01-26 1.74 01.2020
5 60 3.8 2020-01-27 2020-02-02 1.74 01.2020
6 61 1.7 2019-12-30 2020-01-05 7.3 01.2020
7 61 12.9 2020-01-06 2020-01-12 7.3 01.2020
8 61 1.2 2020-12-29 2021-01-04 1.74 01.2021
9 61 1.4 2021-01-05 2021-01-11 1.74 01.2021
10 61 1.3 2021-01-12 2021-01-18 1.74 01.2021
11 61 1 2021-01-19 2021-01-25 1.74 01.2021
12 61 3.8 2021-01-26 2021-02-01 1.74 01.2021
13 63 1.7 2020-12-29 2021-01-04 7.3 01.2021
14 63 12.9 2021-01-05 2021-01-11 7.3 01.2021
extended data
df <- structure(list(ID = c(60, 60, 60, 60, 60, 61, 61, 61, 61, 61,
61, 61, 63, 63), weight = c(1.2, 1.4, 1.3, 1, 3.8, 1.7, 12.9,
1.2, 1.4, 1.3, 1, 3.8, 1.7, 12.9), start_date = structure(c(18260,
18267, 18274, 18281, 18288, 18260, 18267, 18625, 18632, 18639,
18646, 18653, 18625, 18632), class = "Date"), end_date = structure(c(18266,
18273, 18280, 18287, 18294, 18266, 18273, 18631, 18638, 18645,
18652, 18659, 18631, 18638), class = "Date")), row.names = c(NA,
-14L), class = c("tbl_df", "tbl", "data.frame"))
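If a week that spans two months really has to count towards both of them, as the question describes, one option is to expand each week into one row per month it touches before averaging. This is only a sketch under that assumption, not the method used above; `weeks` and `by_month` are hypothetical names, and the sample is modelled on the ID 60 rows from the question:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample modelled on the ID 60 rows of the question
weeks <- tibble(
  ID = 60,
  weight = c(1.2, 1.4, 1.3, 1.0, 3.8),
  start_date = as.Date(c("2019-12-30", "2020-01-06", "2020-01-13",
                         "2020-01-20", "2020-01-27")),
  end_date = as.Date(c("2020-01-05", "2020-01-12", "2020-01-19",
                       "2020-01-26", "2020-02-02"))
)

by_month <- weeks %>%
  rowwise() %>%
  # every month touched by at least one day of the week
  mutate(month = list(unique(format(seq(start_date, end_date, by = "day"),
                                    "%m.%Y")))) %>%
  ungroup() %>%
  unnest(month) %>%            # one row per (week, month) pair
  group_by(ID, month) %>%
  mutate(monthly_weight = mean(weight)) %>%
  ungroup()

by_month
```

With this sample, the first week appears under both 12.2019 and 01.2020 and the last week under both 01.2020 and 02.2020, so the result has more rows than the input. Building a daily sequence per row is simple but may be slow with thousands of IDs.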

How to add rows in a dataset based on conditions in R

I have a dataset in which the length of stay of some bookings spans two or three months. For every such booking I want to create one row per month, with the revenue divided across the months and the remaining booking information unchanged. A booking that falls within a single month should be left as it is.
structure(list(channel = c("109", "109", "Agent"), room_stay_status = c("ENQUIRY",
"ENQUIRY", "CHECKED_OUT"), start_date = structure(c(1637971200,
1640995200, 1640995200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), end_date = structure(c(1643155200, 1642636800, 1641168000
), tzone = "UTC", class = c("POSIXct", "POSIXt")), los = c(60,
19, 2), booker = c("Anuj", "Anuj", "Anuj"), area = c("Goa", "Goa",
"Goa"), property_sku = c("Amna-3b", "Amna-3b", "Amna-3b"), Revenue = c(90223.666,
5979, 7015.9), Booking_ref = c("aed97", "b497h9", "bde65")), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
The output should look like this:
structure(list(channel = c("109", "109", "109", "109", "Agent"
), room_stay_status = c("ENQUIRY", "ENQUIRY", "ENQUIRY", "ENQUIRY",
"CHECKED_OUT"), start_date = structure(c(1637971200, 1638316800,
1640995200, 1640995200, 1640995200), tzone = "UTC", class = c("POSIXct",
"POSIXt")), end_date = structure(c(1638230400, 1640908800, 1643155200,
1642636800, 1641168000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), los = c(4, 31, 25, 19, 2), booker = c("Anuj", "Anuj",
"Anuj", "Anuj", "Anuj"), area = c("Goa", "Goa", "Goa", "Goa",
"Goa"), property_sku = c("Amna-3b", "Amna-3b", "Amna-3b", "Amna-3b",
"Amna-3b"), Revenue = c(6014.91106666667, 46615.5607666667, 37593.1941666667,
5979, 7015.9), Booking_ref = c("aed97", "aed97", "aed97", "b497h9",
"bde65")), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
Many thanks in advance.
A quick attempt (assuming your data frames are named df_in and df_out) that seems to do the trick:
library("dplyr")
library("tidyr")
library("lubridate")
# Function for creating a vector from start (st) to end (nd) with intermediate
# months inside
cut_months <- function(st, nd) {
  repeat {
    # Grow vector, keep adding next month
    next_month <- ceiling_date(tail(st, 1) + seconds(1), "month")
    if (next_month > nd) {
      st <- append(st, nd)
      break
    } else {
      st <- append(st, next_month)
    }
  }
  return(st)
}
# Let's try it
print(cut_months(df_in$start_date[1], df_in$end_date[2]))
# [1] "2021-11-27 01:00:00 CET" "2021-12-01 01:00:00 CET" "2022-01-01 00:00:00 CET" "2022-01-20 01:00:00 CET"
# Function for expanding months:
expand_months <- function(df) {
  expand_rows <-
    df %>%
    # Expand months and unnest list-column; SIMPLIFY = FALSE keeps the result
    # a list even when every booking yields the same number of dates
    mutate(key_dates = mapply(cut_months, start_date, end_date, SIMPLIFY = FALSE)) %>%
    select(-start_date, -end_date) %>%
    unnest(key_dates) %>%
    # Compute needed values
    group_by(Booking_ref) %>%
    arrange(Booking_ref, key_dates) %>%
    mutate(
      start_date = key_dates,
      end_date = lead(key_dates),
      los = as.numeric(as.duration(start_date %--% end_date), "days"), # Ceiling this?
      Revenue = Revenue * los / sum(los, na.rm = TRUE)
    ) %>%
    arrange(Booking_ref, start_date) %>%
    # Clean-up
    filter(!is.na(end_date)) %>%
    select(-key_dates)
  expand_rows
}
# Print results and compare:
expand_months(df_in)
## A tibble: 5 x 10
## Groups: Booking_ref [3]
#channel room_stay_status los booker area property_~1 Revenue Booki~2 start_date end_date
#<chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dttm> <dttm>
#1 109 ENQUIRY 4 Anuj Goa Amna-3b 6015. aed97 2021-11-27 01:00:00 2021-12-01 01:00:00
#2 109 ENQUIRY 31.0 Anuj Goa Amna-3b 46553. aed97 2021-12-01 01:00:00 2022-01-01 00:00:00
#3 109 ENQUIRY 25.0 Anuj Goa Amna-3b 37656. aed97 2022-01-01 00:00:00 2022-01-26 01:00:00
#4 109 ENQUIRY 19 Anuj Goa Amna-3b 5979 b497h9 2022-01-01 01:00:00 2022-01-20 01:00:00
#5 Agent CHECKED_OUT 2 Anuj Goa Amna-3b 7016. bde65 2022-01-01 01:00:00 2022-01-03 01:00:00
## ... with abbreviated variable names 1: property_sku, 2: Booking_ref
df_out
## A tibble: 5 x 10
#channel room_stay_status start_date end_date los booker area property_~1 Revenue Booki~2
#<chr> <chr> <dttm> <dttm> <dbl> <chr> <chr> <chr> <dbl> <chr>
#1 109 ENQUIRY 2021-11-27 00:00:00 2021-11-30 00:00:00 4 Anuj Goa Amna-3b 6015. aed97
#2 109 ENQUIRY 2021-12-01 00:00:00 2021-12-31 00:00:00 31 Anuj Goa Amna-3b 46616. aed97
#3 109 ENQUIRY 2022-01-01 00:00:00 2022-01-26 00:00:00 25 Anuj Goa Amna-3b 37593. aed97
#4 109 ENQUIRY 2022-01-01 00:00:00 2022-01-20 00:00:00 19 Anuj Goa Amna-3b 5979 b497h9
#5 Agent CHECKED_OUT 2022-01-01 00:00:00 2022-01-03 00:00:00 2 Anuj Goa Amna-3b 7016. bde65
## ... with abbreviated variable names 1: property_sku, 2: Booking_ref
I do not entirely understand how you distribute the Revenue, so consider that left as an exercise :).
Hint: the new los is computed in decimal days, so you need a ceiling() around that computation.
Using the solution from this post to split the dates by month:
library(dplyr)
library(tidyr)      # unnest()
library(purrr)      # map()
library(lubridate)  # month()
df2 <- df %>%
  group_by(id = row_number()) %>%                       # for each row
  mutate(seq = list(seq(start_date, end_date, "day")),  # create a sequence of dates with 1 day step
         month = map(seq, month)) %>%                   # get the month for each one of those dates in sequence
  unnest(cols = c(seq, month)) %>%                      # unnest data
  group_by(Booking_ref, id, month) %>%                  # for each group, row and month
  summarise(start_date = min(seq),                      # get minimum date as start
            end_date = max(seq)) %>%                    # get maximum date as end
  ungroup() %>%                                         # ungroup
  select(-id, -month) %>%
  group_by(Booking_ref) %>%
  mutate(last_date = max(end_date))                     # get last_date to determine los
df3 <- merge(df2, df %>% select(-start_date, -end_date), by = "Booking_ref", all.x = TRUE) %>%
  mutate(timespam = end_date - start_date) %>%
  mutate(los2 = as.numeric(case_when(last_date == end_date ~ timespam,
                                     TRUE ~ timespam + 1)),
         Revenue2 = Revenue * los2 / los)
out_df <- df3 %>%
  select(-Revenue, -los, -timespam, -last_date) %>%
  rename(Revenue = Revenue2,
         los = los2)
> out_df
Booking_ref start_date end_date channel room_stay_status booker area property_sku los Revenue
1 aed97 2022-01-01 2022-01-26 109 ENQUIRY Anuj Goa Amna-3b 25 37593.194
2 aed97 2021-11-27 2021-11-30 109 ENQUIRY Anuj Goa Amna-3b 4 6014.911
3 aed97 2021-12-01 2021-12-31 109 ENQUIRY Anuj Goa Amna-3b 31 46615.561
4 b497h9 2022-01-01 2022-01-20 109 ENQUIRY Anuj Goa Amna-3b 19 5979.000
5 bde65 2022-01-01 2022-01-03 Agent CHECKED_OUT Anuj Goa Amna-3b 2 7015.900

adding a column to specify duration of event based on dates

I have a dataframe with two columns that represent the start of an event and its planned end.
What is the best way to add a column showing the duration of the event in days?
An alternative would be to create a new dataset directly with group_by, showing the average campaign duration for each day, but that seems too complicated.
structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021",
"02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021",
"04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
)), row.names = c(NA, 6L), class = "data.frame")
We could use the mdy function from the lubridate package:
library(lubridate)
library(dplyr)
df %>%
  mutate(across(everything(), mdy), # this line only if your dates are not in date format
         duration_days = as.integer(deadline - launched_at))
launched_at deadline duration_days
1 2021-03-26 2021-04-25 30
2 2021-03-24 2021-04-08 15
3 2021-01-05 2021-01-17 12
4 2021-02-17 2021-03-03 14
5 2021-02-15 2021-03-01 14
6 2021-02-25 2021-04-26 60
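One option (note the upper-case %Y: the years have four digits, and %y would read "2021" as 2020):
as.POSIXct(df$deadline, tz = "UTC", format = "%m/%d/%Y") -
  as.POSIXct(df$launched_at, tz = "UTC", format = "%m/%d/%Y")
Time differences in days
[1] 30 15 12 14 14 60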
If you're looking for duration between 'launched_at' and 'deadline',
library(dplyr)
df %>%
mutate(launched_at = as.Date(launched_at, "%m/%d/%Y"),
deadline = as.Date(deadline, "%m/%d/%Y"),
duration = deadline - launched_at)
launched_at deadline duration
1 2021-03-26 2021-04-25 30 days
2 2021-03-24 2021-04-08 15 days
3 2021-01-05 2021-01-17 12 days
4 2021-02-17 2021-03-03 14 days
5 2021-02-15 2021-03-01 14 days
6 2021-02-25 2021-04-26 60 days
A more concise way (suggested by Darren Tsai):
df %>%
  mutate(across(c(launched_at, deadline), ~ as.Date(.x, "%m/%d/%Y")),
         duration = deadline - launched_at)
You can use the built-in functions within and as.Date (with %Y, since the years have four digits):
df = within(df, {
  launched_at = as.Date(launched_at, "%m/%d/%Y")
  deadline = as.Date(deadline, "%m/%d/%Y")
  duration = deadline - launched_at})
launched_at deadline duration
1 2021-03-26 2021-04-25 30 days
2 2021-03-24 2021-04-08 15 days
3 2021-01-05 2021-01-17 12 days
4 2021-02-17 2021-03-03 14 days
5 2021-02-15 2021-03-01 14 days
6 2021-02-25 2021-04-26 60 days
Another option using difftime:
df <- structure(list(launched_at = c("03/26/2021", "03/24/2021", "01/05/2021",
"02/17/2021", "02/15/2021", "02/25/2021"), deadline = c("04/25/2021",
"04/08/2021", "01/17/2021", "03/03/2021", "03/01/2021", "04/26/2021"
)), row.names = c(NA, 6L), class = "data.frame")
df$duration <- with(df, difftime(as.Date(deadline, "%m/%d/%Y"), as.Date(launched_at, "%m/%d/%Y"), units = c("days")))
df
#> launched_at deadline duration
#> 1 03/26/2021 04/25/2021 30 days
#> 2 03/24/2021 04/08/2021 15 days
#> 3 01/05/2021 01/17/2021 12 days
#> 4 02/17/2021 03/03/2021 14 days
#> 5 02/15/2021 03/01/2021 14 days
#> 6 02/25/2021 04/26/2021 60 days
Created on 2022-07-22 by the reprex package (v2.0.1)
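The question also mentions the alternative of summarising the average campaign duration per day with group_by. Once the duration column exists, that is a short extra step. A minimal sketch, assuming "per day" means per launch date; the `campaigns` data frame below is made up so that one day has two campaigns:

```r
library(dplyr)

# Hypothetical sample: two campaigns launched on the same day, one on another
campaigns <- data.frame(
  launched_at = c("03/26/2021", "03/26/2021", "02/17/2021"),
  deadline    = c("04/25/2021", "04/05/2021", "03/03/2021")
)

avg_by_day <- campaigns %>%
  mutate(launched_at = as.Date(launched_at, "%m/%d/%Y"),
         deadline = as.Date(deadline, "%m/%d/%Y"),
         # difference of two Dates is a difftime in days
         duration_days = as.numeric(deadline - launched_at)) %>%
  group_by(launched_at) %>%
  summarise(mean_duration_days = mean(duration_days), .groups = "drop")

avg_by_day
```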

Adding dates and times to event durations

As a follow-up to this question, is it possible to add when an event started and when it finished in additional columns?
Here is a reproducible example pulled from the OP.
df <- structure(list(Time = structure(c(1463911500, 1463911800, 1463912100,
1463912400, 1463912700, 1463913000), class = c("POSIXct", "POSIXt"
), tzone = ""), Temp = c(20.043, 20.234, 6.329, 20.424, 20.615,
20.805)), row.names = c(NA, -6L), class = "data.frame")
> df
Time Temp
1 2016-05-22 12:05:00 20.043
2 2016-05-22 12:10:00 20.234
3 2016-05-22 12:15:00 6.329
4 2016-05-22 12:20:00 20.424
5 2016-05-22 12:25:00 20.615
6 2016-05-22 12:30:00 20.805
library(dplyr)
df %>%
  # add id for different periods/events (rleid() comes from data.table)
  mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>%
  # keep only periods with high temperature
  filter(tmp_Temp) %>%
  # for each period/event, get its duration
  group_by(id) %>%
  summarise(event_duration = difftime(last(Time), first(Time)))
id event_duration
<int> <time>
1 1 5 mins
2 3 10 mins
i.e. the result would have two more columns: "start_DateTime" and "end_DateTime"
Thanks!
Sure. Modify the final summarise() like this:
df %>%
  # add id for different periods/events (rleid() comes from data.table)
  mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>%
  # keep only periods with high temperature
  filter(tmp_Temp) %>%
  # for each period/event, get its duration, start and end
  group_by(id) %>%
  summarise(event_duration = difftime(last(Time), first(Time)),
            start_DateTime = min(Time),
            end_DateTime = max(Time))
#> # A tibble: 2 × 4
#> id event_duration start_DateTime end_DateTime
#> <int> <drtn> <dttm> <dttm>
#> 1 1 5 mins 2016-05-22 12:05:00 2016-05-22 12:10:00
#> 2 3 10 mins 2016-05-22 12:20:00 2016-05-22 12:30:00
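A side note on rleid(): it is a data.table function, so it is not available with only dplyr loaded. With dplyr 1.1.0 or later, consecutive_id() should do the same job without the extra package. A minimal sketch on the same six readings; the names `temps`, `hot` and `events` are just for illustration, and the time zone is fixed to UTC here as an assumption:

```r
library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

# Same readings as in the example, rebuilt with an explicit time zone
temps <- data.frame(
  Time = as.POSIXct("2016-05-22 12:05:00", tz = "UTC") + 300 * 0:5,
  Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 20.805)
)

events <- temps %>%
  # a new id starts every time the above/below-20 flag changes
  mutate(hot = Temp > 20, id = consecutive_id(hot)) %>%
  filter(hot) %>%
  group_by(id) %>%
  summarise(start_DateTime = min(Time),
            end_DateTime = max(Time),
            event_duration = difftime(end_DateTime, start_DateTime,
                                      units = "mins"))

events
```

Setting units explicitly in difftime() keeps the duration column comparable across events; otherwise the unit is chosen automatically.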

do() superseded! Alternative is to use across(), nest_by(), and summarise, how?

I'm doing something quite simple. Given a dataframe of start dates and end dates for specific periods I want to expand/create a full sequence for each period binned by week (with the factor for each row), then output this in a single large dataframe.
For instance:
library(tidyverse)
library(lubridate)
# Dataset
start_dates = ymd_hms(c("2019-05-08 00:00:00",
"2020-01-17 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00"), tz = "UTC")
end_dates = ymd_hms(c( "2019-10-24 00:00:00",
"2020-03-03 00:00:00",
"2020-05-28 00:00:00",
"2020-12-10 00:00:00",
"2021-05-07 00:00:00",
"2022-01-04 00:00:00",
"2022-01-19 00:00:00"), tz = "UTC")
df1 = data.frame(studying = paste0("period", 1:7), start_dates, end_dates)
It was suggested to me to use do(), which currently works fine, but I dislike relying on superseded functions. I also have a way of doing it with map2. However, the documentation (https://dplyr.tidyverse.org/reference/do.html) suggests that nest_by(), across() and summarise() can do the same job as do(). How would I go about getting the same result? I've tried a lot of things but I just can't get it to work.
# do() way to do it
df1 %>%
  group_by(studying) %>%
  do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# transmute() way to do it
df1 %>%
  transmute(weeks = map2(start_dates, end_dates, seq, by = "1 week"), studying) %>%
  unnest(cols = c(weeks))
As the documentation of ?do suggests, we can now use summarise and replace the . with across():
library(tidyverse)
library(lubridate)
df1 %>%
  group_by(studying) %>%
  summarise(week = seq(across()$start_dates,
                       across()$end_dates,
                       by = "1 week"))
#> `summarise()` has grouped output by 'studying'. You can override using the
#> `.groups` argument.
#> # A tibble: 134 x 2
#> # Groups: studying [7]
#> studying week
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-19 by the reprex package (v0.3.0)
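In dplyr 1.1.0 and later, returning several rows per group from summarise() is deprecated, and reframe() is the function intended for this case. A minimal sketch, assuming a data frame shaped like df1; the two-row `periods` sample is hypothetical and only there to make the snippet self-contained:

```r
library(dplyr)  # reframe() and .by need dplyr >= 1.1.0

# Hypothetical two-period sample in the same shape as df1
periods <- data.frame(
  studying    = c("period1", "period2"),
  start_dates = as.POSIXct(c("2022-01-04", "2022-02-01"), tz = "UTC"),
  end_dates   = as.POSIXct(c("2022-01-19", "2022-02-15"), tz = "UTC")
)

weekly <- periods %>%
  # reframe() may return any number of rows per group and always ungroups
  reframe(week = seq(start_dates, end_dates, by = "1 week"), .by = studying)

weekly
```

The per-call .by argument replaces the group_by() step, so there is no grouped output to clean up afterwards.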
You can also use tidyr::complete:
df1 %>%
  group_by(studying) %>%
  complete(start_dates = seq(from = start_dates, to = end_dates, by = "1 week")) %>%
  select(-end_dates, weeks = start_dates)
# A tibble: 134 x 2
# Groups: studying [7]
studying weeks
<chr> <dttm>
1 period1 2019-05-08 00:00:00
2 period1 2019-05-15 00:00:00
3 period1 2019-05-22 00:00:00
4 period1 2019-05-29 00:00:00
5 period1 2019-06-05 00:00:00
6 period1 2019-06-12 00:00:00
7 period1 2019-06-19 00:00:00
8 period1 2019-06-26 00:00:00
9 period1 2019-07-03 00:00:00
10 period1 2019-07-10 00:00:00
# ... with 124 more rows
Although marked Experimental the help file for group_modify does say that
‘group_modify()’ is an evolution of ‘do()’
and, in fact, the code for the example in the question using group_modify is nearly the same as with do.
# with group_modify
df2 <- df1 %>%
  group_by(studying) %>%
  group_modify(~ data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
# with do
df0 <- df1 %>%
  group_by(studying) %>%
  do(data.frame(week = seq(.$start_dates, .$end_dates, by = "1 week")))
identical(df2, df0)
## [1] TRUE
Not sure if this is exactly what you are looking for, but here is my attempt with rowwise and unnest:
df1 %>%
  rowwise() %>%
  mutate(week = list(seq(start_dates, end_dates, by = "1 week"))) %>%
  select(studying, week) %>%
  unnest(cols = c(week))
Another approach:
library(tidyverse)
df1 %>%
  group_by(studying) %>%
  summarise(df = tibble(weeks = seq(start_dates, end_dates, by = 'week'))) %>%
  unnest(df)
#> `summarise()` has grouped output by 'studying'. You can override using the `.groups` argument.
#> # A tibble: 134 × 2
#> # Groups: studying [7]
#> studying weeks
#> <chr> <dttm>
#> 1 period1 2019-05-08 00:00:00
#> 2 period1 2019-05-15 00:00:00
#> 3 period1 2019-05-22 00:00:00
#> 4 period1 2019-05-29 00:00:00
#> 5 period1 2019-06-05 00:00:00
#> 6 period1 2019-06-12 00:00:00
#> 7 period1 2019-06-19 00:00:00
#> 8 period1 2019-06-26 00:00:00
#> 9 period1 2019-07-03 00:00:00
#> 10 period1 2019-07-10 00:00:00
#> # … with 124 more rows
Created on 2022-01-20 by the reprex package (v2.0.1)
