Group dates by certain range in R

Let's say I have a data frame of dates from March 1st to July 15th:
daterange <- data.frame(date = seq(as.Date("2020-03-01"), as.Date("2020-07-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can group by week, but I don't know how to set a custom range.
Thanks

We can create a logical vector on the day of the month (day > 15) as well as a grouping variable on the year-month (zoo::as.yearmon):
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
setNames('Date') %>% # name the single column "Date"
group_by(yearmon = as.yearmon(Date), # year-month, e.g. "Mar 2020"
Daygroup = (day(Date) > 15) + 1) %>% # 1 = days 1-15, 2 = days 16 to month end
mutate(Label = str_c(format(Date, '%b'),
str_c(min(day(Date)), max(day(Date)), sep = '-'), sep = ' '))

Using base R, you can create two groups in each month by pasting the month abbreviation of each date with 1 or 2, depending on whether the day of the month is after the 15th.
names(daterange) <- "date" # make sure the column is called `date`
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))
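Both answers above label each half-month. If you also need the groups themselves (for example, to summarise each half-month separately), a base R sketch along the same lines could build a year-month-half key and pass it to split() and ave(). The column name `date` and the new `key` and `label` columns are assumptions for illustration, and the `%b` month abbreviations depend on your locale.

```r
# Base R sketch: assumes the dates sit in a column named `date`
daterange <- data.frame(date = seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day"))

dom  <- as.integer(format(daterange$date, "%d"))   # day of month
half <- ifelse(dom > 15, 2L, 1L)                   # 1 = days 1-15, 2 = days 16 to month end
# Key such as "2020-03 1"; including the year keeps e.g. March 2020 and March 2021 apart
daterange$key <- paste(format(daterange$date, "%Y-%m"), half)

# Label each half-month with the first and last day present in it, e.g. "Mar 16-31"
daterange$label <- paste(format(daterange$date, "%b"),
                         paste(ave(dom, daterange$key, FUN = min),
                               ave(dom, daterange$key, FUN = max), sep = "-"))

# One Date vector per half-month (March to mid-July 2020 gives 9 groups)
groups <- split(daterange$date, daterange$key)
sapply(groups, length)
```

Keeping the year in the key costs nothing here but avoids silently merging the same month from different years once the data spans more than one year.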

Related

Extract year from date with weird date format

I have dates formatted as yyyymmdd, so 10 March 2022 is formatted as 20220310, with no separator between the year, month and day. Now I want to replace the column containing those dates with a column that only contains the year. Normally I would use the following code:
df <- df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd')))))
And then separate the column into three columns for year, month and day, and then delete the month and day columns. But somehow the code above doesn't work. I hope someone can help me out.
Not as fancy, but you could simply get the year from a substring of the whole date:
df$Year <- as.numeric(substr(as.character(df$Date),1,4))
You can try this:
df$column_with_date <- as.integer(x = substr(x = df$column_with_date, start = 1, stop = 4))
The as.integer call is optional: without it the year stays a character string, while converting it stores the year as a number, which is easier to compare and can save memory.
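If the column holds yyyymmdd values as numbers, a base R sketch could take the year with integer division instead of string slicing; parsing with an explicit format is the alternative when you also want invalid dates to show up as NA. The example data below is made up for illustration.

```r
# Made-up example: yyyymmdd stored as numbers
df <- data.frame(Date = c(20220310, 19991231))

# Option 1: integer division by 10000 drops the trailing mmdd digits
df$Year <- as.integer(df$Date %/% 10000)

# Option 2: parse to a real Date first (invalid dates become NA), then extract the year
parsed <- as.Date(as.character(df$Date), format = "%Y%m%d")
df$Year2 <- as.integer(format(parsed, "%Y"))

df
```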
Your code works if the dates are in the format below. You can then use mutate_at with a list of the year, month, and day functions to create the three columns like this:
df <- data.frame(Date = c("20220310"))
library(lubridate)
library(dplyr)
df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd'))))) %>%
mutate_at(vars(Date), list(year = year, month = month, day = day))
#> Date year month day
#> 1 2022-03-10 2022 3 10
Created on 2022-07-25 by the reprex package (v2.0.1)

Sequence a group of dates in R

I wish to generate some Tidy data.
26 companies are observed every day for 10 days.
Each day a value is recorded.
The first day is 2020/1/1.
How do I create a vector of dates so that the first 26 rows of the date column of the data frame are "2020/1/1" (year/month/day), the next 26 rows are "2020/1/2", and so on?
Here is the data frame without the date column:
library(tidyverse)
set.seed(33)
date_chunk <- rep(as.Date("2020/1/1"), 26)
# Tidy data. 10 sequential days starting 2020/1/1/
df <- tibble(
company = rep(letters, 10),
value = sample(0:5, 260, replace = TRUE),
color = "grey"
)
You can try this:
rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 26)
This returns a vector of dates from 2020-01-01 to 2020-01-10 in which each date is repeated 26 times.
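To attach those dates to the data frame from the question, a base R sketch could assign them as a new column. This assumes the rows are ordered day by day (all 26 companies for 2020-01-01 first, then 2020-01-02, and so on), which is what `rep(letters, 10)` produces; `length.out = 10` replaces the explicit end date.

```r
set.seed(33)
df <- data.frame(company = rep(letters, 10),
                 value   = sample(0:5, 260, replace = TRUE),
                 color   = "grey")

# 10 consecutive days, each repeated once per company (26 rows per day)
df$date <- rep(seq(as.Date("2020-01-01"), by = "day", length.out = 10), each = 26)

head(df[df$company == "a", c("company", "date")], 3)
```

If the rows were shuffled, this positional assignment would no longer line up with the days, and a per-company approach such as the row_number() answer would need the rows sorted first.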
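For each company, we can add row_number() - 1 to the first element of date_chunk to get an incremental sequence of dates. This relies on each company's rows already being in day order, as they are in the data above.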
library(dplyr)
df %>%
group_by(company) %>%
mutate(date = first(date_chunk) + row_number() - 1)

Conversion of daily to standard meteorological week in R

I have seen many questions on SO about converting daily data to weekly using the xts, zoo or lubridate packages, but none of the answers fit my problem. I have tried the following code:
library(zoo)
library(lubridate)
library(xts)
library(tidyverse)
#Calculation for multistation
set.seed(123)
df <- data.frame("date"= seq(from = as.Date("1970-1-1"), to = as.Date("2000-12-31"), by = "day"),
"Station1" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 10, 30),
"Station2" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 11, 29),
"Station3" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 9, 28))
head(df)
# Aggregate over week
df %>%
mutate(Week = week(ymd(date)),
Year = year(ymd(date))) %>%
pivot_longer(-c(Week, date, Year), values_to = "value", names_to = "Station") %>%
group_by(Year, Week, Station) %>%
summarise(Weekly = mean(value)) %>%
arrange(Station) %>%
print(n = 55)
From the output you can see that 1970 contains 53 weeks, which I don't want. I want each year's weeks to start from January 1, with every year containing exactly 52 weeks: in a non-leap year week 52 should have 8 days, and in a leap year both week 9 and week 52 should have 8 days. How can I do that in R?
Why not just write a function that gives the meteorological week from the definition you gave? Package lubridate will give you the day of the year with yday, which can act as the index for a vector of the correct week labels. These are straightforward to construct with integer division and concatenation.
You then only need to figure out if you are in a leap year, which again is possible using lubridate::leap_year. Combine these in an ifelse and you have an easy-to-use function:
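met_week <- function(dates)
{
# Labels for days 1-365: seven days per week, with day 365 added to week 52
normal_year <- c((0:363 %/% 7 + 1), 52)
# Leap years: Feb 29 (day 60) becomes an extra day of week 9
leap_year <- c(normal_year[1:59], 9, normal_year[60:365])
year_day <- lubridate::yday(dates)
return(ifelse(lubridate::leap_year(dates), leap_year[year_day], normal_year[year_day]))
}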
and you can do
df %>% mutate(week = met_week(date))
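You could just do it manually on the day of the year; I'm not sure there is a built-in function for that. Note that in a leap year this gives week 52 nine days (days 358 to 366) and leaves week 9 at seven days, which differs slightly from the rule in the question.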
df %>%
mutate(Week = pmin(52, ceiling(yday(date) / 7)),
Year = year(ymd(date)))
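If you would rather not depend on lubridate, a base R sketch of the lookup-table idea from the met_week answer could use `as.POSIXlt()`, whose `yday` field is 0-based, together with a hand-written leap-year test. `met_week_base` is a hypothetical name; the counts at the end check the week lengths the question asks for.

```r
met_week_base <- function(dates) {
  lt <- as.POSIXlt(dates)
  year <- lt$year + 1900
  yday <- lt$yday + 1                                          # 1-based day of year
  is_leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
  normal_year <- c(0:363 %/% 7 + 1, 52)                        # 365 labels, day 365 -> week 52
  leap_days   <- c(normal_year[1:59], 9, normal_year[60:365])  # Feb 29 (day 60) -> week 9
  ifelse(is_leap, leap_days[yday], normal_year[yday])
}

d1970 <- seq(as.Date("1970-01-01"), as.Date("1970-12-31"), by = "day")
d1972 <- seq(as.Date("1972-01-01"), as.Date("1972-12-31"), by = "day")
table(met_week_base(d1970))[c("9", "52")]   # 7 and 8 days
table(met_week_base(d1972))[c("9", "52")]   # 8 and 8 days
```

With the df from the question, `aggregate(cbind(Station1, Station2, Station3) ~ Year + Week, data = transform(df, Year = as.POSIXlt(date)$year + 1900, Week = met_week_base(date)), FUN = mean)` would then give the weekly means per station.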

How to randomize a date in R

I'm trying to back into a fake birthdate based on the age of a consumer. I'm using lubridate package. Here is my code:
today() - years(df$age) - months(sample(1:12, 1)) - days(sample(1:31, 1))
I want to use this to generate a different date of birth for each row that matches the age. When I run this, every row gets the same month and day but a different year. I want the month and day to vary as well.
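sample(1:12, 1) and sample(1:31, 1) each draw a single value, which is recycled to every row; that is why all rows share the same month and day. Instead, you can make a date on 1 January of the birth year and then add a random number of days, drawing one value per row with n():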
library(lubridate)
library(dplyr)
set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
df %>%
mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) +
duration(sample(0:364, n()), unit = "days"))
In base R, we can subtract each age from the current year, pick a random month and day for each row, paste the values together and convert the result to a Date.
set.seed(123)
df <- data.frame(age = sample(100, 5))
as.Date(paste(as.integer(format(Sys.Date(), "%Y")) - df$age,
sprintf("%02d", sample(12, nrow(df))),
sprintf("%02d", sample(30, nrow(df))), sep = "-"))
#[1] "1990-01-29" "1940-06-14" "1978-09-19" "1933-05-16" "1928-04-03"
However, in this case you might need an extra check for February, since sample(30, ...) can produce day 30, which never exists in February, or day 29, which exists only in leap years. To be safe, you could sample days from 1 to 28 instead.
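Both answers place the birth date somewhere in calendar year (current year - age), so anyone whose sampled birthday falls later in the year than today comes out one year younger than `age`. If the age must match exactly, a base R sketch could instead sample from the window of birth dates that give that age on a reference date. `random_dob`, `minus_years` and `age_on` are hypothetical helpers written for this example.

```r
# Shift a single Date back by n whole years using POSIXlt fields
minus_years <- function(d, n) {
  lt <- as.POSIXlt(d)
  lt$year <- lt$year - n
  as.Date(lt)
}

# For each age, draw a birth date in (ref - (age + 1) years, ref - age years],
# i.e. the range of birthdays that make someone exactly `age` on ref_date
random_dob <- function(age, ref_date = Sys.Date()) {
  days <- vapply(age, function(a) {
    latest   <- minus_years(ref_date, a)           # birthday falls on ref_date
    earliest <- minus_years(ref_date, a + 1) + 1   # turns a + 1 the day after ref_date
    span <- as.integer(latest - earliest)
    as.numeric(earliest) + sample.int(span + 1, 1) - 1
  }, numeric(1))
  as.Date(days, origin = "1970-01-01")
}

# Age in completed years on ref_date, used here to check the result
age_on <- function(dob, ref_date) {
  years <- as.POSIXlt(ref_date)$year - as.POSIXlt(dob)$year
  years - (format(ref_date, "%m%d") < format(dob, "%m%d"))
}

set.seed(5)
ref <- as.Date("2024-07-15")
df <- data.frame(age = c(18, 33, 58, 63))
df$dob <- random_dob(df$age, ref)
df$check <- age_on(df$dob, ref)   # equals df$age
```

A fixed reference date is used here only to make the example reproducible; in practice the default Sys.Date() plays the role of "today".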

Get Lent from timeDate in R

I would like to get all Lent Fridays from 2010 to 2020. I am currently using timeDate to get holidays such as Easter, Good Friday, and Ash Wednesday, as follows:
library(timeDate)
aw <- as.Date(AshWednesday(year = 2010:2020))
gf <- as.Date(GoodFriday(year = 2010:2020))
I can also derive holidays that don't come with the package, either from ones that do or from a fixed date. For example:
library(lubridate)
mg <- as.Date(AshWednesday(year = 2010:2020)) - 1 #Mardi Gras: the day before Ash Wednesday
cm <- as.Date(seq(ymd('2010-05-05'), ymd('2020-05-05'), by = '1 year')) #Cinco de Mayo
But I am struggling to get all Lent Fridays per year.
Note: Lent begins on the Sunday that follows Ash Wednesday and lasts for 40 days.
Update: as posted below, the following code worked:
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
tibble(aw) %>%
mutate(aw_sunday = aw + lubridate::days(0)) %>% #Start counting from Ash Wednesday itself
mutate(extra_days = map(aw_sunday, function(x) x + lubridate::days(1:40))) %>% #Find the 40 days that follow
unnest(extra_days) %>%
mutate(week_day = lubridate::wday(extra_days, label = TRUE)) %>% #Find all the day names
filter(week_day == 'Fri') %>% # Keep only the Fridays
pull(extra_days)
I changed days(4) to days(0) because I think Lent Fridays actually start right after Ash Wednesday, without skipping a week. I took the note above from Wikipedia, where it describes the Ambrosian Rite, so I guess the rites differ.
Here's one way to do it:
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
tibble(aw) %>%
mutate(aw_sunday = aw + lubridate::days(4)) %>% #Find the first Sunday after each Ash Wednesday
mutate(extra_days = map(aw_sunday, function(x) x + lubridate::days(1:40))) %>% #Find the 40 days after each Sunday
unnest(extra_days) %>%
mutate(week_day = lubridate::wday(extra_days, label = TRUE)) %>% #Find all the day names
filter(week_day == 'Fri') %>% # Keep only the Fridays
pull(extra_days)
If you don't want to use lubridate, you can use the base seq.Date function:
gf <- as.Date(GoodFriday(year = 2010:2020))
lentFridays <- lapply(gf, function(x)seq.Date(x, length.out = 6, by = "-7 days"))
and to pretty it up:
lentFridays <- data.frame(lentFridays)
names(lentFridays) <- paste0("Year", 2010:2020)
If Lent begins on the Sunday after Ash Wednesday, then all subsequent Fridays are those that are 9, 16, ..., 44 days after Ash Wednesday. You can use the days or ddays functions in lubridate to add time durations to individual dates. Since you have a vector of dates, we can store the Lent Fridays in a list.
library(lubridate)
lent_fridays <- lapply(aw, function(x) x + days(seq(9, 44, by =7)))
For example, for 2018 (the ninth element, since the years start at 2010), the Lent Fridays are the following dates:
[[9]]
[1] "2018-02-23" "2018-03-02" "2018-03-09" "2018-03-16" "2018-03-23"
[6] "2018-03-30"
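Because adding an integer to a Date in base R adds days, the same 9, 16, ..., 44 offsets also work without lubridate or timeDate. The sketch below hard-codes the 2018 Ash Wednesday (14 February) so that it runs on its own; with the `aw` vector from the question you would pass `aw` instead. `lent_fridays_base` is a hypothetical name.

```r
# Lent Fridays as fixed offsets from Ash Wednesday (assuming Lent starts the following Sunday)
lent_fridays_base <- function(ash_wednesday) {
  lapply(ash_wednesday, function(x) x + seq(9, 44, by = 7))
}

aw_2018 <- as.Date("2018-02-14")
fr <- lent_fridays_base(aw_2018)[[1]]
fr
as.POSIXlt(fr)$wday   # 5 = Friday for every element, independent of locale
```

Checking `$wday` rather than a weekday name avoids the locale dependence of `wday(label = TRUE)` or `weekdays()`.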
