Get Lent from timeDate in R

I would like to get all Lent Fridays from 2010 to 2020. I am currently using timeDate to get holidays such as Easter, Good Friday, and Ash Wednesday, as follows:
library(timeDate)
aw <- as.Date(AshWednesday(year = 2010:2020))
gf <- as.Date(GoodFriday(year = 2010:2020))
I can also get other holidays that don't come with the package. For example:
library(lubridate)
mg <- as.Date(AshWednesday(year = 2010:2020)) - 1 # Mardi Gras: the day before Ash Wednesday
cm <- as.Date(seq(ymd('2010-05-05'), ymd('2020-05-05'), by = '1 year')) # Cinco de Mayo
But I am struggling to get all Lent Fridays per year.
Note: Lent begins on the Sunday that follows Ash Wednesday and lasts for 40 days.
As posted below, the following code worked:
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
tibble(aw) %>%
  mutate(aw_sunday = aw + lubridate::days(0)) %>% # Offset 0: start from Ash Wednesday itself (column name kept from the answer below)
  mutate(extra_days = map(aw_sunday, function(x) x + lubridate::days(1:40))) %>% # The 40 days after each start date
  unnest(extra_days) %>%
  mutate(week_day = lubridate::wday(extra_days, label = TRUE)) %>% # Find the day names
  filter(week_day == 'Fri') %>% # Keep only the Fridays
  pull(extra_days)
I changed days(4) to days(0) because I think Lent Fridays actually start immediately after Ash Wednesday, without skipping a week. I took the note above from the Wikipedia text on the Ambrosian Rite, so I guess there is a difference between rites.
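The difference between the two starting points can be checked with a small base R sketch. It assumes only that Ash Wednesday 2018 fell on 2018-02-14; the helper is_friday and the variable names are illustrative and are not part of timeDate or lubridate.

```r
# Ash Wednesday 2018, hard-coded so the sketch needs no packages
aw_2018 <- as.Date("2018-02-14")

# Hypothetical helper: "%u" is the ISO weekday number (1 = Monday, 5 = Friday)
is_friday <- function(d) format(d, "%u") == "5"

# 40-day window counted from Ash Wednesday itself (the days(0) variant)
from_ash_wednesday <- aw_2018 + 1:40
# 40-day window counted from the following Sunday (the days(4) variant)
from_first_sunday <- (aw_2018 + 4) + 1:40

fridays_aw <- from_ash_wednesday[is_friday(from_ash_wednesday)]
fridays_sunday <- from_first_sunday[is_friday(from_first_sunday)]

print(fridays_aw)      # first Friday is two days after Ash Wednesday
print(fridays_sunday)  # first Friday is nine days after Ash Wednesday
```

Both windows contain six Fridays. The first runs from 2018-02-16 to 2018-03-23, while the second is shifted one week later and ends on Good Friday, 2018-03-30.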

Here's one way to do it:
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
tibble(aw) %>%
  mutate(aw_sunday = aw + lubridate::days(4)) %>% # Find the first Sunday after each Ash Wednesday
  mutate(extra_days = map(aw_sunday, function(x) x + lubridate::days(1:40))) %>% # The 40 days after each Sunday
  unnest(extra_days) %>%
  mutate(week_day = lubridate::wday(extra_days, label = TRUE)) %>% # Find the day names
  filter(week_day == 'Fri') %>% # Keep only the Fridays
  pull(extra_days)

If you don't want to use lubridate, you can use the base seq.Date function:
gf <- as.Date(GoodFriday(year = 2010:2020))
lentFridays <- lapply(gf, function(x) seq.Date(x, length.out = 6, by = "-7 days")) # Good Friday and the five Fridays before it
And to tidy it up:
lentFridays <- data.frame(lentFridays)
names(lentFridays) <- paste0("Year", 2010:2020)

If Lent begins on the Sunday after Ash Wednesday, then the Lent Fridays are those that fall 9, 16, ..., 44 days after Ash Wednesday (the last one being Good Friday). You can use the days (periods) or ddays (durations) functions in lubridate to add time spans to individual dates. Since you have a vector of dates, we can store the Lent Fridays in a list.
library(lubridate)
lent_fridays <- lapply(aw, function(x) x + days(seq(9, 44, by = 7)))
For example, for 2018 (the ninth element of the list), the Lent Fridays are the following dates.
[[9]]
[1] "2018-02-23" "2018-03-02" "2018-03-09" "2018-03-16" "2018-03-23"
[6] "2018-03-30"
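As a cross-check, counting forward from Ash Wednesday and counting backward from Good Friday should give the same six dates. The sketch below uses base R only and hard-codes the 2018 dates (Ash Wednesday on 2018-02-14, Good Friday on 2018-03-30) instead of calling timeDate; the variable names are illustrative.

```r
aw_2018 <- as.Date("2018-02-14")  # Ash Wednesday 2018
gf_2018 <- as.Date("2018-03-30")  # Good Friday 2018

# Forward: 9, 16, ..., 44 days after Ash Wednesday
forward <- aw_2018 + seq(9, 44, by = 7)

# Backward: Good Friday and the five Fridays before it, put in ascending order
backward <- rev(seq(gf_2018, length.out = 6, by = "-7 days"))

same_dates <- all(forward == backward)
all_fridays <- all(format(forward, "%u") == "5")  # ISO weekday 5 = Friday

print(same_dates)
print(all_fridays)
```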

Related

Extract year from date with weird date format

I have a date format as follows: yyyymmdd. So, 10 March 2022 is formatted as 20220310, with no separator between the day, month, and year. Now I want to replace the column holding those dates with a column that only contains the year. Normally I would use the following code:
df <- df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd')))))
And then separate the column into three different columns with year, month, and day, and then delete the month and day columns. But somehow the code above doesn't work. I hope someone can help me out.
Not as fancy, but you could simply get the year from a substring of the whole date:
df$Year <- as.numeric(substr(as.character(df$Date),1,4))
You can try this:
df$column_with_date <- as.integer(x = substr(x = df$column_with_date, start = 1, stop = 4))
The as.integer call is optional, but you could use it to save space in memory.
Your code works if the data is in the format below. You can use mutate_at with a list of year, month, and day to create the three columns like this:
df <- data.frame(Date = c("20220310"))
library(lubridate)
library(dplyr)
df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd'))))) %>%
mutate_at(vars(Date), list(year = year, month = month, day = day))
#>         Date year month day
#> 1 2022-03-10 2022     3  10
Created on 2022-07-25 by the reprex package (v2.0.1)
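For comparison, here is a minimal base R sketch of two ways to pull the year out of yyyymmdd values. It assumes the dates are stored as numbers; date_raw and the other names are made up for illustration.

```r
# Hypothetical input: dates stored as numbers in yyyymmdd form
date_raw <- c(20220310, 20191231, 20000101)

# Option 1: integer division drops the last four digits (mmdd)
year_arith <- as.integer(date_raw %/% 10000)

# Option 2: parse to Date first, which also validates month and day.
# sprintf("%.0f", ...) avoids scientific notation when converting to text.
parsed <- as.Date(sprintf("%.0f", date_raw), format = "%Y%m%d")
year_parsed <- as.integer(format(parsed, "%Y"))

print(year_arith)
print(year_parsed)
```

The arithmetic route is shorter, while the parsing route returns NA for impossible dates such as 20220231, which can be useful as a sanity check.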

Group dates by certain range in R

Let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can sort by week, but I don't know how to set a custom range.
Thanks
We can group by year-month (yearmon) and by a half-month indicator derived from the day of the month:
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
  setNames('Date') %>% # base R; set_names() is not exported by the packages loaded above
  group_by(yearmon = as.yearmon(Date),
           Daygroup = (day(Date) > 15) + 1) %>% # 1 = days 1-15, 2 = days 16 onward
  mutate(Label = str_c(format(Date, '%b'),
                       str_c(min(day(Date)), max(day(Date)), sep = '-'), sep = ' '))
Using base R, you can create two groups in each month by pasting the month abbreviation from each date together with a value of 1 or 2 based on the day of the month. The column is renamed to date first so that transform can refer to it.
names(daterange) <- "date"
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
                          ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))
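To check that the half-month split behaves as intended, the following base R sketch builds a group key and lists the first and last date in each group. The names dates, group_key, and summary_table are illustrative; the key uses a numeric year-month so that it does not depend on the locale the way "%b" does.

```r
dates <- seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day")

day_of_month <- as.integer(format(dates, "%d"))
half <- ifelse(day_of_month > 15, 2L, 1L)  # 1 = days 1-15, 2 = days 16 to month end

# Key such as "2020-03-1" (first half of March) or "2020-03-2" (second half)
group_key <- paste0(format(dates, "%Y-%m"), "-", half)

# First and last date in each group (ISO date strings sort chronologically)
first_day <- tapply(as.character(dates), group_key, min)
last_day <- tapply(as.character(dates), group_key, max)

summary_table <- data.frame(group = names(first_day),
                            first = as.vector(first_day),
                            last = as.vector(last_day),
                            stringsAsFactors = FALSE)
print(summary_table)
```

With the March 1 to July 15 range this gives nine groups, since July only contributes its first half.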

Conversion of daily to standard meteorological week in R

I have seen many questions on SO about converting daily data to weekly using the xts, zoo, or lubridate packages, but none of the answers fit my problem. I have tried the following code:
library(zoo)
library(lubridate)
library(xts)
library(tidyverse)
#Calculation for multistation
set.seed(123)
df <- data.frame("date"= seq(from = as.Date("1970-1-1"), to = as.Date("2000-12-31"), by = "day"),
"Station1" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 10, 30),
"Station2" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 11, 29),
"Station3" = runif(length(seq.Date(as.Date("1970-1-1"), as.Date("2000-12-31"), "days")), 9, 28))
head(df)
# Aggregate over week
df %>%
mutate(Week = week(ymd(date)),
Year = year(ymd(date))) %>%
pivot_longer(-c(Week, date, Year), values_to = "value", names_to = "Station") %>%
group_by(Year, Week, Station) %>%
summarise(Weekly = mean(value)) %>%
arrange(Station) %>%
print(n = 55)
From the output you can see that 1970 contains 53 weeks, which I don't want. I want each year's first week to start on January 1 and every year to contain exactly 52 weeks: in a non-leap year the 52nd week should have 8 days, and in a leap year both the 9th and the 52nd week should have 8 days. How can I do that in R?
Why not just write a function that gives the meteorological week from the definition you gave? Package lubridate will give you the day of the year with yday, which can act as the index for a vector of the correct week labels. These are straightforward to construct with simple modular math and concatenation.
You then only need to figure out if you are in a leap year, which again is possible using lubridate::leap_year. Combine these in an ifelse and you have an easy-to-use function:
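met_week <- function(dates)
{
  # Week label for each day of a 365-day year: 51 weeks of 7 days, then an 8-day week 52
  normal_year <- c((0:363 %/% 7 + 1), 52)
  # Leap year: insert one extra day (29 February, day 60) into week 9
  leap_year <- c(normal_year[1:59], 9, normal_year[60:365])
  year_day <- lubridate::yday(dates)
  return(ifelse(lubridate::leap_year(dates), leap_year[year_day], normal_year[year_day]))
}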
and you can do
df %>% mutate(week = met_week(date))
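You could just compute it manually from the day of the year; I'm not sure there is a function already built for that. Note that capping at 52 puts every day after day 357 in week 52, so in a leap year week 52 gets 9 days and week 9 keeps 7.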
df %>%
mutate(Week = pmin(52, ceiling(yday(date) / 7)),
Year = year(ymd(date)))
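The two answers agree in ordinary years but differ in leap years, which a quick check makes visible. The sketch below re-implements both schemes in base R purely for comparison; week_lookup, week_capped, and is_leap are hypothetical names, and lubridate is replaced by format(..., "%j") and a hand-written leap-year rule.

```r
# Gregorian leap-year rule
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0

# Lookup-table scheme: 8-day week 52, plus an 8-day week 9 in leap years
week_lookup <- function(dates) {
  yd <- as.integer(format(dates, "%j"))  # day of the year
  yr <- as.integer(format(dates, "%Y"))
  normal <- c(0:363 %/% 7 + 1, 52)
  leap <- c(normal[1:59], 9, normal[60:365])
  ifelse(is_leap(yr), leap[yd], normal[yd])
}

# Capped scheme: plain 7-day weeks, everything past week 52 folded into it
week_capped <- function(dates) {
  pmin(52, ceiling(as.integer(format(dates, "%j")) / 7))
}

d1971 <- seq(as.Date("1971-01-01"), as.Date("1971-12-31"), by = "day")  # non-leap
d1972 <- seq(as.Date("1972-01-01"), as.Date("1972-12-31"), by = "day")  # leap

same_in_1971 <- all(week_lookup(d1971) == week_capped(d1971))
lookup_1972 <- as.vector(table(week_lookup(d1972))[c("9", "52")])
capped_1972 <- as.vector(table(week_capped(d1972))[c("9", "52")])

print(same_in_1971)
print(lookup_1972)  # days in weeks 9 and 52, lookup scheme
print(capped_1972)  # days in weeks 9 and 52, capped scheme
```

For 1972 the lookup scheme gives 8 days in both week 9 and week 52, as the question asks, whereas the capped scheme gives 7 and 9.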

How to randomize a date in R

I'm trying to back into a fake birthdate based on the age of a consumer. I'm using lubridate package. Here is my code:
ymd(today) - years(df$age) - months(sample(1:12, 1)) - days(sample(1:31, 1)).
I want to use this to generate a different dob that equals the age. When I run this inline it gives every row the same month and day and different year. I want the month and day to vary as well.
You can build a date on 1 January of the birth year and then add a random number of days to it.
library(lubridate)
library(dplyr)
set.seed(5)
df <- data.frame(age = c(18, 33, 58, 63))
df %>%
  mutate(dob = make_date(year(Sys.Date()) - age, 1, 1) +
           duration(sample(0:364, n(), replace = TRUE), units = "days")) # replace = TRUE also works for more than 365 rows
In base R, we can subtract the age column from the current year to get the birth year, select a random month and day, paste the values together, and create a Date object.
set.seed(123)
df <- data.frame(age = sample(100, 5))
as.Date(paste(as.integer(format(Sys.Date(), "%Y")) - df$age,
sprintf("%02d", sample(12, nrow(df))),
sprintf("%02d", sample(30, nrow(df))), sep = "-"))
#[1] "1990-01-29" "1940-06-14" "1978-09-19" "1933-05-16" "1928-04-03"
However, this can produce invalid dates such as 30 February, so you might need an extra check for February, or to be safe you might want to sample the day from 1 to 28 instead of 1 to 30.
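Both answers derive the birth year as the current year minus the age, which can be off by one when the sampled birthday has not yet occurred this year. A possible alternative, sketched below in base R, samples each date of birth uniformly from the exact window that is consistent with the stated age today. The helper years_back and all variable names are hypothetical, and the sketch assumes age is a whole number of years.

```r
set.seed(42)
today <- Sys.Date()
age <- c(18, 33, 58, 63)

# Hypothetical helper: shift a date back by whole years
# (29 February rolls over to 1 March in non-leap years)
years_back <- function(d, n) {
  lt <- as.POSIXlt(rep(d, length(n)))
  lt$year <- lt$year - as.integer(n)
  as.Date(lt)
}

# Latest possible birth date: the person turns `age` today
latest <- years_back(today, age)
# Earliest possible birth date: one day short of having turned `age + 1`
earliest <- years_back(today, age + 1) + 1

# Pick a uniformly random day inside [earliest, latest] for each person
span <- as.integer(latest - earliest)
dob <- earliest + floor(runif(length(age)) * (span + 1))

print(dob)
```

Because every row draws its own random offset, the month and day vary across rows as well as the year.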

How to filter a dataset by the time stamp

I'm working with some bird GPS tracking data, and I would like to exclude points based on the time stamp.
Some background information- the GPS loggers track each bird for just over 24 hours, starting in the evening, and continuing through the night and the following day. What I would like to do is exclude points taken after 9:30pm on the day AFTER deployment (so removing points from the very end of the track).
As an R novice, I'm struggling because the deployment dates differ for each bird, so I can't simply use subset() for a specific date and time.
An example of my dataframe (df):
BirdID x y Datetime
15K12 492719.9 5634805 2015-06-23 18:25:00
15K12 492491.5 5635018 2015-06-23 18:27:00
15K70 455979.1 5653581 2015-06-24 19:54:00
15K70 456040.9 5653668 2015-06-24 19:59:00
So, pretending these points represent the start of the GPS track for each animal, I would like to remove points after 9:30 pm on June 24 for bird 15K12, and after 9:30 pm on June 25 for bird 15K70.
Any ideas?
First, check if df$Datetime is a date-time variable:
class(df$Datetime)
If it's not, you can convert it with this:
df$Datetime <- ymd_hms(df$Datetime)
Use mutate to create a new variable called newdate: for each bird, it takes the date of the earliest observation and sets the cutoff to 21:30:00 on the following day.
Then filter the Datetime column against the newdate column to keep only the observations recorded before the cutoff.
library(dplyr); library(lubridate)
df %>%
group_by(BirdID) %>%
mutate(newdate = as.POSIXct(date(min(Datetime)) + days(1) + hours(21) + minutes(30))) %>%
filter(Datetime < newdate)
Here is a reproducible example:
library(dplyr); library(lubridate)
set.seed(1)
# Create a data frame (1000 observations)
BirdID <- paste(rep(floor(runif(250, 1, 20)),4),
rep("k", 1000), rep(floor(runif(250, 1, 40)),4), sep = "")
x <- rnorm(1000, mean = 47000, sd = 2000)
y <- rnorm(1000, mean = 5650000, sd = 300000)
Datetime <- as.POSIXct(rnorm(1000, mean = as.numeric(as.POSIXct("2015-06-23 18:25:00")), sd = 99999), tz = "GMT", origin = "1970-01-01")
df <- data.frame(BirdID, x, y, Datetime, stringsAsFactors = FALSE)
# Filter the data frame by the specified date
df_filtered <- df %>%
group_by(BirdID) %>%
mutate(newdate = as.POSIXct(date(min(Datetime)) + days(1) + hours(21) + minutes(30))) %>%
filter(Datetime < newdate)
This should solve the problem.
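The same per-bird cutoff can also be sketched in base R without dplyr or lubridate. The example below extends the question's sample with a few made-up later points so that the filter has something to remove, and it assumes all timestamps are in UTC; with another time zone the tz arguments would need to change accordingly.

```r
# Sample tracks: the later rows are hypothetical, added to exercise the cutoff
df <- data.frame(
  BirdID = c("15K12", "15K12", "15K12", "15K70", "15K70"),
  Datetime = as.POSIXct(c("2015-06-23 18:25:00", "2015-06-24 21:00:00",
                          "2015-06-24 21:45:00", "2015-06-24 19:54:00",
                          "2015-06-25 22:10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Earliest timestamp per bird, repeated on every row of that bird
first_seen <- ave(as.numeric(df$Datetime), df$BirdID, FUN = min)
first_day <- as.Date(as.POSIXct(first_seen, origin = "1970-01-01", tz = "UTC"),
                     tz = "UTC")

# Cutoff: 21:30 on the day after deployment
cutoff <- as.POSIXct(paste(first_day + 1, "21:30:00"), tz = "UTC")

# Keep points up to and including the cutoff
kept <- df[df$Datetime <= cutoff, ]
print(kept)
```

This version uses <= so that a fix recorded at exactly 21:30 is kept, since the question asks to remove points after that time.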
