I have a data frame in R as follows:
df <- data.frame("Type" = c("Item A","Item B"), "Frequency" = c("Quarterly","Other"), "Date" = as.Date(c("2021-02-05","2021-05-05")), "endDate" = as.Date("2021-12-12"), stringsAsFactors = FALSE)
I am trying to generate, for each row, the sequence of dates between Date and endDate. I am using the code below:
df <- df %>%
  dplyr::mutate(id = 1:nrow(df), deliveryDate = ifelse(
    df$Frequency == "Quarterly", list(seq(as.Date(df$Date), as.Date(df$endDate), by = "3 month")),
    ifelse(df$Frequency == "Monthly", list(seq(as.Date(df$Date), as.Date(df$endDate), by = "month")),
      ifelse(df$Frequency %in% c("Other"), list(seq(as.Date(df$Date), as.Date(df$Date), by = "month")), df$Date)))) %>%
  tidyr::unnest(deliveryDate) %>%
  dplyr::group_by(Type) %>%
  dplyr::mutate(deliveryNumber = row_number()) %>%
  dplyr::select(deliveryNumber, Type, Frequency, deliveryDate)
To be more specific, the sequence of dates is generated based on each type's frequency. To handle that, I used dplyr::mutate().
But I am getting the following error:
Error: Problem with `mutate()` input `deliveryDate`.
x 'from' must be of length 1
ℹ Input `deliveryDate` is `ifelse(...)`.
Can anyone help me solve this issue in R? Thanks in advance!
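The error is raised by seq() itself: its `from` argument must be a single date, but `df$Date` inside mutate() is the whole column. The base R sketch below reproduces the error and then calls seq() once per row with Map(). The `step` lookup vector (including its "Monthly" entry) is an illustrative assumption, not part of the original code.

```r
df <- data.frame(
  Type = c("Item A", "Item B"),
  Frequency = c("Quarterly", "Other"),
  Date = as.Date(c("2021-02-05", "2021-05-05")),
  endDate = as.Date("2021-12-12"),
  stringsAsFactors = FALSE
)

# Passing the whole Date column as 'from' reproduces the error from the question
err <- tryCatch(seq(df$Date, df$endDate[1], by = "3 month"),
                error = function(e) conditionMessage(e))
print(err)

# Hypothetical lookup: Frequency label -> 'by' string understood by seq()
step <- c(Quarterly = "3 months", Monthly = "month", Other = "month")

# Map() calls seq() once per row, so each 'from' is a single Date
dates_per_row <- Map(function(from, to, by) seq(from, to, by = by),
                     df$Date, df$endDate, step[df$Frequency])

# Stack the per-row sequences into one long data frame
out <- data.frame(
  Type = rep(df$Type, lengths(dates_per_row)),
  deliveryDate = do.call(c, dates_per_row)
)
print(out)
```

This is the same idea the rowwise() answers rely on: each seq() call only ever sees scalar inputs.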
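You should consider a named vector that maps each Frequency to a seq() step, combined with rowwise() so that seq() runs on one row at a time: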
library(tidyverse)
vec <- c(Quarterly = "3 months", Other = "month")
df %>%
  rowwise() %>%
  mutate(deliveryDate = list(seq(Date, endDate, by = vec[Frequency]))) %>%
  unnest(deliveryDate)
# A tibble: 12 x 5
Type Frequency Date endDate deliveryDate
<chr> <chr> <date> <date> <date>
1 Item A Quarterly 2021-02-05 2021-12-12 2021-02-05
2 Item A Quarterly 2021-02-05 2021-12-12 2021-05-05
3 Item A Quarterly 2021-02-05 2021-12-12 2021-08-05
4 Item A Quarterly 2021-02-05 2021-12-12 2021-11-05
5 Item B Other 2021-05-05 2021-12-12 2021-05-05
6 Item B Other 2021-05-05 2021-12-12 2021-06-05
7 Item B Other 2021-05-05 2021-12-12 2021-07-05
8 Item B Other 2021-05-05 2021-12-12 2021-08-05
9 Item B Other 2021-05-05 2021-12-12 2021-09-05
10 Item B Other 2021-05-05 2021-12-12 2021-10-05
11 Item B Other 2021-05-05 2021-12-12 2021-11-05
12 Item B Other 2021-05-05 2021-12-12 2021-12-05
Another option is tidyr::complete() within each Type group:
library(tidyverse)
df %>%
  group_by(Type) %>%
  mutate(DeliveryDate = Date,
         Frequency = case_when(Frequency %in% "Quarterly" ~ "quarter",
                               Frequency %in% "Monthly" ~ "month",
                               Frequency %in% "Weekly" ~ "week",
                               TRUE ~ "month")) %>%
  complete(DeliveryDate = seq.Date(Date, endDate, by = Frequency)) %>%
  fill(Frequency, Date, endDate)
# A tibble: 12 x 5
# Groups: Type [2]
Type DeliveryDate Frequency Date endDate
<chr> <date> <chr> <date> <date>
1 Item A 2021-02-05 quarter 2021-02-05 2021-12-12
2 Item A 2021-05-05 quarter 2021-02-05 2021-12-12
3 Item A 2021-08-05 quarter 2021-02-05 2021-12-12
4 Item A 2021-11-05 quarter 2021-02-05 2021-12-12
5 Item B 2021-05-05 month 2021-05-05 2021-12-12
6 Item B 2021-06-05 month 2021-05-05 2021-12-12
7 Item B 2021-07-05 month 2021-05-05 2021-12-12
8 Item B 2021-08-05 month 2021-05-05 2021-12-12
9 Item B 2021-09-05 month 2021-05-05 2021-12-12
10 Item B 2021-10-05 month 2021-05-05 2021-12-12
11 Item B 2021-11-05 month 2021-05-05 2021-12-12
12 Item B 2021-12-05 month 2021-05-05 2021-12-12
Here's one way. It's not clear what you want for "Other" as opposed to "month", so I set it here to "week".
Note that you don't need to reference the data frame (df$...) inside mutate(), because mutate() evaluates its arguments with the data frame's columns in scope. Also, consider using case_when() instead of nested ifelse() calls.
library(tidyverse)
df %>%
  mutate(Frequency2 = case_when(Frequency == "Quarterly" ~ "3 month",
                                Frequency == "Monthly" ~ "month",
                                TRUE ~ "week")) %>%
  group_by(Type, Frequency2) %>%
  nest() %>%
  mutate(middates = map2(data, Frequency2, ~ seq.Date(min(.x$Date), max(.x$endDate), by = .y))) %>%
  unnest(c(data, middates)) %>%
  ungroup()
# A tibble: 36 x 6
Type Frequency Frequency2 Date endDate middates
<chr> <chr> <chr> <date> <date> <date>
1 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-02-05
2 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-05-05
3 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-08-05
4 Item A Quarterly 3 month 2021-02-05 2021-12-12 2021-11-05
5 Item B Other week 2021-05-05 2021-12-12 2021-05-05
6 Item B Other week 2021-05-05 2021-12-12 2021-05-12
7 Item B Other week 2021-05-05 2021-12-12 2021-05-19
8 Item B Other week 2021-05-05 2021-12-12 2021-05-26
9 Item B Other week 2021-05-05 2021-12-12 2021-06-02
10 Item B Other week 2021-05-05 2021-12-12 2021-06-09
# ... with 26 more rows
I have a dataframe of start and end dates, where each row represents a specific trip.
Those date ranges make up a continuous timeline, except around April, where there is a gap in the data (because no trips were taken).
I would like to find the start and end dates of that gap, preferably using a tidy approach.
library(tidyverse)
df<- data.frame(start = as.Date(c("2022-01-03", "2022-01-18", "2022-01-31", "2022-03-01" ,"2022-03-08", "2022-03-09", "2022-04-15",
"2022-04-20", "2022-04-20","2022-05-03", "2022-05-17", "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22" ,"2022-06-28", "2022-07-11")),
end = as.Date(c("2022-01-18","2022-01-31", "2022-03-01" ,"2022-03-08" ,"2022-03-09", "2022-03-25", "2022-04-20" ,"2022-04-20", "2022-05-03",
"2022-05-17" ,"2022-05-17", "2022-05-31", "2022-06-05" ,"2022-06-22" ,"2022-06-28" ,"2022-07-11", "2022-07-17"))) %>%
mutate(trip_number = as.character(row_number()))
df %>%
ggplot()+
geom_segment(aes(x = start, xend = end, y =0, yend= 0, col = trip_number))+
theme(legend.position = "none")
Created on 2022-07-17 by the reprex package (v2.0.1)
A possible solution:
library(tidyverse)
library(lubridate)
df %>%
  mutate(date1 = if_else(start == lag(end), NA_Date_, lag(end)),
         date2 = if_else(start == lag(end), NA_Date_, start)) %>%
  bind_rows(tibble(start = .$date1, end = .$date2)) %>%
  filter(!if_all(everything(), is.na)) %>%
  arrange(start) %>%
  select(!starts_with("date"))
#> start end trip_number
#> 1 2022-01-03 2022-01-18 1
#> 2 2022-01-18 2022-01-31 2
#> 3 2022-01-31 2022-03-01 3
#> 4 2022-03-01 2022-03-08 4
#> 5 2022-03-08 2022-03-09 5
#> 6 2022-03-09 2022-03-25 6
#> 7 2022-03-25 2022-04-15 <NA>
#> 8 2022-04-15 2022-04-20 7
#> 9 2022-04-20 2022-04-20 8
#> 10 2022-04-20 2022-05-03 9
#> 11 2022-05-03 2022-05-17 10
#> 12 2022-05-17 2022-05-17 11
#> 13 2022-05-17 2022-05-31 12
#> 14 2022-05-31 2022-06-05 13
#> 15 2022-06-05 2022-06-22 14
#> 16 2022-06-22 2022-06-28 15
#> 17 2022-06-28 2022-07-11 16
#> 18 2022-07-11 2022-07-17 17
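If only the gap itself is needed, a base R sketch (a swapped-in technique, not the answer's code) compares each trip's start with the latest end date covered by all earlier trips. Using a running maximum via cummax() instead of only the previous row's end is an assumption meant to cope with overlapping trips; it expects the rows to be sorted by start, as in the question's data.

```r
trips <- data.frame(
  start = as.Date(c("2022-01-03", "2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08",
                    "2022-03-09", "2022-04-15", "2022-04-20", "2022-04-20", "2022-05-03",
                    "2022-05-17", "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22",
                    "2022-06-28", "2022-07-11")),
  end = as.Date(c("2022-01-18", "2022-01-31", "2022-03-01", "2022-03-08", "2022-03-09",
                  "2022-03-25", "2022-04-20", "2022-04-20", "2022-05-03", "2022-05-17",
                  "2022-05-17", "2022-05-31", "2022-06-05", "2022-06-22", "2022-06-28",
                  "2022-07-11", "2022-07-17"))
)

# Latest end covered by all earlier trips; work on numeric day counts so cummax() applies directly
prev_end <- c(NA, head(cummax(as.numeric(trips$end)), -1))

# A gap exists where a trip starts after every earlier trip has ended
gap_rows <- which(as.numeric(trips$start) > prev_end)

gaps <- data.frame(
  gap_start = as.Date(prev_end[gap_rows], origin = "1970-01-01"),
  gap_end = trips$start[gap_rows]
)
print(gaps)
```

Trips that merely touch (one starts on the day the previous one ends) are not counted as gaps, because the comparison is strict.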
I'm working with trip ticket data that includes a column of dates and times. I want to group trips into Morning (05:00-10:59), Lunch (11:00-12:59), Afternoon (13:00-17:59), Evening (18:00-23:59), and Dawn/Graveyard (00:00-04:59), and then count the number of trips (the number of unique values in the trip_id column) in each category.
However, I don't know how to group or summarize by time values. Is this possible in R?
trip_id start_time end_time day_of_week
1 CFA86D4455AA1030 2021-03-16 08:32:30 2021-03-16 08:36:34 Tuesday
2 30D9DC61227D1AF3 2021-03-28 01:26:28 2021-03-28 01:36:55 Sunday
3 846D87A15682A284 2021-03-11 21:17:29 2021-03-11 21:33:53 Thursday
4 994D05AA75A168F2 2021-03-11 13:26:42 2021-03-11 13:55:41 Thursday
5 DF7464FBE92D8308 2021-03-21 09:09:37 2021-03-21 09:27:33 Sunday
Here's a solution with hour() and case_when().
library(tidyverse)
library(lubridate)
trip <- tibble(start_time = mdy_hm("1/1/2022 1:00") + minutes(seq(0, 700, 15)))
trip <- trip %>%
  mutate(
    hr = hour(start_time),
    time_of_day = case_when(
      hr >= 5 & hr < 11 ~ "morning",
      hr >= 11 & hr < 13 ~ "lunch",
      TRUE ~ "fill in the rest yourself :)"
    )
  )
print(trip)
#> # A tibble: 47 x 3
#> start_time hr time_of_day
#> <dttm> <int> <chr>
#> 1 2022-01-01 01:00:00 1 fill in the rest yourself :)
#> 2 2022-01-01 01:15:00 1 fill in the rest yourself :)
#> 3 2022-01-01 01:30:00 1 fill in the rest yourself :)
#> 4 2022-01-01 01:45:00 1 fill in the rest yourself :)
#> 5 2022-01-01 02:00:00 2 fill in the rest yourself :)
#> 6 2022-01-01 02:15:00 2 fill in the rest yourself :)
#> 7 2022-01-01 02:30:00 2 fill in the rest yourself :)
#> 8 2022-01-01 02:45:00 2 fill in the rest yourself :)
#> 9 2022-01-01 03:00:00 3 fill in the rest yourself :)
#> 10 2022-01-01 03:15:00 3 fill in the rest yourself :)
#> # ... with 37 more rows
trips <- trip %>%
  count(time_of_day)
print(trips)
#> # A tibble: 3 x 2
#> time_of_day n
#> <chr> <int>
#> 1 fill in the rest yourself :) 16
#> 2 lunch 7
#> 3 morning 24
Created on 2022-03-21 by the reprex package (v2.0.1)
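A sketch completing the case_when() for all five periods named in the question and counting distinct trip_id values with n_distinct(). The five rows are the sample shown in the question; since every period boundary falls on a whole hour, comparing hour() values is enough.

```r
library(dplyr)
library(lubridate)

# Sample rows from the question (trip_id and start_time only)
trips <- tibble(
  trip_id = c("CFA86D4455AA1030", "30D9DC61227D1AF3", "846D87A15682A284",
              "994D05AA75A168F2", "DF7464FBE92D8308"),
  start_time = ymd_hms(c("2021-03-16 08:32:30", "2021-03-28 01:26:28",
                         "2021-03-11 21:17:29", "2021-03-11 13:26:42",
                         "2021-03-21 09:09:37"))
)

trip_counts <- trips %>%
  mutate(
    hr = hour(start_time),
    time_of_day = case_when(
      hr >= 5 & hr < 11  ~ "Morning",
      hr >= 11 & hr < 13 ~ "Lunch",
      hr >= 13 & hr < 18 ~ "Afternoon",
      hr >= 18           ~ "Evening",
      TRUE               ~ "Dawn/Graveyard"  # remaining hours 0-4
    )
  ) %>%
  group_by(time_of_day) %>%
  summarise(n_trips = n_distinct(trip_id))

print(trip_counts)
```

Periods with no trips in the data (Lunch, in this sample) do not appear in the result.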
I have the following column of dates in R. I would like to convert it from character to Date class, but display the dates only as year-month (%Y-%m) instead of year-month-day (%Y-%m-%d).
library(tidyverse)
dates <- structure(list(Date = c("2022-03-24", "2022-04-21", "2022-05-24",
"2022-07-22", "2022-09-01")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
dates %>%
mutate(Date = format(as.Date(Date), '%Y-%m'))
When using format() to convert the dates to %Y-%m, is there a way to keep the column's class as <date> instead of character <chr>?
# A tibble: 5 x 1
Date
<chr>
1 2022-03-24
2 2022-04-21
3 2022-05-24
4 2022-07-22
5 2022-09-01
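We may use floor_date() to normalize the day of every date to a single constant value (the first of the month). The result stays of class Date; a Date always stores a day, so the day can be fixed but not dropped.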
library(lubridate)
library(dplyr)
dates %>%
  mutate(Date2 = floor_date(ymd(Date), "month"))
-output
# A tibble: 5 × 2
Date Date2
<chr> <date>
1 2022-03-24 2022-03-01
2 2022-04-21 2022-04-01
3 2022-05-24 2022-05-01
4 2022-07-22 2022-07-01
5 2022-09-01 2022-09-01
Another option is to assign the day directly:
dates$Date2 <- ymd(dates$Date)
day(dates$Date2) <- 1
> dates
# A tibble: 5 × 2
Date Date2
<chr> <date>
1 2022-03-24 2022-03-01
2 2022-04-21 2022-04-01
3 2022-05-24 2022-05-01
4 2022-07-22 2022-07-01
5 2022-09-01 2022-09-01
A base R option: replace the last two digits of each date (the day) with 01.
as.Date(sub('\\d{2}$', '01', dates$Date))
#[1] "2022-03-01" "2022-04-01" "2022-05-01" "2022-07-01" "2022-09-01"
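On the display part of the question: since a Date always carries a day, one approach (a sketch, not taken from the answers above) keeps a first-of-month Date column for sorting and arithmetic and calls format() only when showing it. The column name `Month` is illustrative.

```r
dates <- data.frame(Date = c("2022-03-24", "2022-04-21", "2022-05-24",
                             "2022-07-22", "2022-09-01"))

# Pin every date to the first of its month; the column stays class Date
dates$Month <- as.Date(format(as.Date(dates$Date), "%Y-%m-01"))

# Year-month text is produced only for display
month_labels <- format(dates$Month, "%Y-%m")
print(month_labels)
```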
I am trying to calculate the unemployment rate from the data below and add it as new rows to the data table. For each date, I want to divide unemployed by labourforce and add the result as a row.
Essentially, I am trying to go from this
date        series_1     value
2021-01-01  labourforce  13793
2021-02-01  labourforce  13812
2021-03-01  labourforce  13856
2021-01-01  unemployed     875
2021-02-01  unemployed     805
2021-03-01  unemployed     778
to this
date        series_1          value
2021-01-01  labourforce       13793
2021-02-01  labourforce       13812
2021-03-01  labourforce       13856
2021-01-01  unemployed          875
2021-02-01  unemployed          805
2021-03-01  unemployed          778
2021-01-01  unemploymentrate    6.3
2021-02-01  unemploymentrate    5.8
2021-03-01  unemploymentrate    5.6
Here is my code so far. I know the last line is wrong. Any suggestions or ideas are welcome!
longdata %>%
  group_by(date) %>%
  summarise(series_1 = 'unemploymentrate',
            value = series_1$unemployed/series_1$labourforce)
For each date, you can compute the ratio of 'unemployed' to 'labourforce' (multiplied by 100 to get a percentage) and add it as new rows to your original dataset.
library(dplyr)
df %>%
  group_by(date) %>%
  summarise(value = value[series_1 == 'unemployed']/value[series_1 == 'labourforce'] * 100,
            series_1 = 'unemploymentrate') %>%
  bind_rows(df) %>%
  arrange(series_1)
# date value series_1
# <chr> <dbl> <chr>
#1 2021-01-01 13793 labourforce
#2 2021-02-01 13812 labourforce
#3 2021-03-01 13856 labourforce
#4 2021-01-01 875 unemployed
#5 2021-02-01 805 unemployed
#6 2021-03-01 778 unemployed
#7 2021-01-01 6.34 unemploymentrate
#8 2021-02-01 5.83 unemploymentrate
#9 2021-03-01 5.61 unemploymentrate
Try:
library(dplyr)
library(tidyr)
df %>%
  pivot_wider(names_from = series_1, values_from = value) %>%
  mutate(unemploymentrate = round(unemployed*100/labourforce, 2)) %>%
  pivot_longer(-1, names_to = "series_1", values_to = "value") %>%
  mutate(series_1 = factor(series_1, levels = c("labourforce", "unemployed", "unemploymentrate"))) %>%
  arrange(series_1, date)
#> # A tibble: 9 x 3
#> date series_1 value
#> <chr> <fct> <dbl>
#> 1 2021-01-01 labourforce 13793
#> 2 2021-02-01 labourforce 13812
#> 3 2021-03-01 labourforce 13856
#> 4 2021-01-01 unemployed 875
#> 5 2021-02-01 unemployed 805
#> 6 2021-03-01 unemployed 778
#> 7 2021-01-01 unemploymentrate 6.34
#> 8 2021-02-01 unemploymentrate 5.83
#> 9 2021-03-01 unemploymentrate 5.61
Created on 2021-04-23 by the reprex package (v2.0.0)
data
df <- structure(list(date = c("2021-01-01", "2021-02-01", "2021-03-01",
"2021-01-01", "2021-02-01", "2021-03-01"), series_1 = c("labourforce",
"labourforce", "labourforce", "unemployed", "unemployed", "unemployed"
), value = c(13793L, 13812L, 13856L, 875L, 805L, 778L)), class = "data.frame", row.names = c(NA,
-6L))
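For comparison, a base R sketch (not from either answer) that aligns the two series by date with merge() before dividing, then appends the rates with rbind(). Rounding to one decimal is chosen only to match the expected output in the question.

```r
df <- data.frame(
  date = c("2021-01-01", "2021-02-01", "2021-03-01",
           "2021-01-01", "2021-02-01", "2021-03-01"),
  series_1 = c("labourforce", "labourforce", "labourforce",
               "unemployed", "unemployed", "unemployed"),
  value = c(13793L, 13812L, 13856L, 875L, 805L, 778L),
  stringsAsFactors = FALSE
)

lf <- df[df$series_1 == "labourforce", c("date", "value")]
un <- df[df$series_1 == "unemployed", c("date", "value")]

# Pair each month's unemployed count with that month's labour force
m <- merge(un, lf, by = "date", suffixes = c("_unemp", "_lf"))

rate <- data.frame(
  date = m$date,
  series_1 = "unemploymentrate",
  value = round(100 * m$value_unemp / m$value_lf, 1),
  stringsAsFactors = FALSE
)

# Append the new rows; the value column becomes double
result <- rbind(df, rate)
print(result)
```

Merging by date, rather than relying on row order, keeps the division correct even if the two series are stored in different orders.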
I am trying to figure out how to add a row when a date range spans more than one calendar year. Below is a minimal reprex:
I have a data frame like this:
have <- data.frame(
from = c(as.Date('2018-12-15'), as.Date('2019-12-20'), as.Date('2019-05-13')),
to = c(as.Date('2019-06-20'), as.Date('2020-01-25'), as.Date('2019-09-10'))
)
have
#> from to
#> 1 2018-12-15 2019-06-20
#> 2 2019-12-20 2020-01-25
#> 3 2019-05-13 2019-09-10
I want a data.frame in which a row is split into two rows when from and to fall in different calendar years.
want <- data.frame(
from = c(as.Date('2018-12-15'), as.Date('2019-01-01'), as.Date('2019-12-20'), as.Date('2020-01-01'), as.Date('2019-05-13')),
to = c(as.Date('2018-12-31'), as.Date('2019-06-20'), as.Date('2019-12-31'), as.Date('2020-01-25'), as.Date('2019-09-10'))
)
want
#> from to
#> 1 2018-12-15 2018-12-31
#> 2 2019-01-01 2019-06-20
#> 3 2019-12-20 2019-12-31
#> 4 2020-01-01 2020-01-25
#> 5 2019-05-13 2019-09-10
I want to do this because, for each row, I want to know how many days fall in each year.
want$time_diff_by_year <- difftime(want$to, want$from)
Created on 2020-05-15 by the reprex package (v0.3.0)
Any base R or tidyverse solutions would be much appreciated.
You can determine the years each date interval covers with map2(), then unnest() to create one row per year.
Then, intersect each interval with the full calendar year on its row. This clips partial years so that they start on Jan 1 or end on Dec 31 of that year where needed.
library(tidyverse)
library(lubridate)
have %>%
  mutate(date_int = interval(from, to),
         year = map2(year(from), year(to), seq)) %>%
  unnest(year) %>%
  mutate(year_int = interval(as.Date(paste0(year, '-01-01')), as.Date(paste0(year, '-12-31'))),
         year_sect = intersect(date_int, year_int),
         from_new = as.Date(int_start(year_sect)),
         to_new = as.Date(int_end(year_sect))) %>%
  select(from_new, to_new)
Output
# A tibble: 5 x 2
from_new to_new
<date> <date>
1 2018-12-15 2018-12-31
2 2019-01-01 2019-06-20
3 2019-12-20 2019-12-31
4 2020-01-01 2020-01-25
5 2019-05-13 2019-09-10
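A base R sketch of the same idea without lubridate: for each row, clip every calendar year the range touches to [from, to] with pmax()/pmin(). The helper name `split_by_year` is hypothetical, and the inclusive day count (`to - from + 1`) is an assumption; the question's difftime() does not add that extra day.

```r
have <- data.frame(
  from = as.Date(c("2018-12-15", "2019-12-20", "2019-05-13")),
  to   = as.Date(c("2019-06-20", "2020-01-25", "2019-09-10"))
)

# Hypothetical helper: one output row per calendar year touched by [from, to]
split_by_year <- function(from, to) {
  years <- as.integer(format(from, "%Y")):as.integer(format(to, "%Y"))
  # Clip each [Jan 1, Dec 31] window to the original range
  starts <- pmax(as.Date(paste0(years, "-01-01")), from)
  ends   <- pmin(as.Date(paste0(years, "-12-31")), to)
  data.frame(from = starts, to = ends)
}

# Apply row by row, then stack the pieces
split_rows <- do.call(rbind, Map(split_by_year, have$from, have$to))
split_rows$days <- as.numeric(split_rows$to - split_rows$from) + 1  # inclusive count
print(split_rows)
```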