Dates %within% Intervals - r

Running into a real head-scratcher and not sure how to resolve it. Really hoping some of you may be able to help. Also, this is the first time I've ever contributed to StackOverflow... yay!
library(tidyverse)
library(lubridate)
start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')
PayPeriod_Interval <- int_diff(PayPeriod_EndDate)
This creates a vector of intervals, with each interval representing a pay period two weeks in length. This is part one, and part one is relatively easy (though it still took a while to figure out, ha).
Part two contains a vector of dates.
Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")
I want to identify Dates %within% Intervals, with the output being the interval that each date is within. So Date "2014-07-08" will be assigned 2014-06-28 UTC--2014-07-12 UTC, since this date is within this interval.
A very similar problem seems to have been explored here: https://github.com/tidyverse/lubridate/issues/658
I have attempted the following
ymd(Dates) %within% PayPeriod_Interval
However, the result is only calculated for the first element in the Dates vector. I have since tried various combinations of for loops, mutating into factors, etc., with little progress. This is work related, so I am really short on time and will be monitoring this post throughout the day and into the weekend.
Best and thank you!
James

The tidyverse is very useful, but sometimes base R is all you need. In this case, the cut function does the job.
library(lubridate)
start_date <- ymd("2014-06-28")
end_date <- ymd("2019-06-30")
PayPeriod_EndDate <- seq(start_date, end_date, by = '2 week')
Dates <- c("2014-07-08", "2018-10-20", "2018-12-13", "2018-12-13", "2018-12-06", "2018-11-30", "2019-01-16", "2019-01-23", "2019-03-15", "2018-10-02")
startperiod<-cut(as.Date(Dates), breaks=PayPeriod_EndDate)
endperiod<-as.Date(startperiod)+13
The output from the cut function is the start date of the pay period in which each element of "Dates" falls.
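As a rough sketch of the same base R idea, findInterval() can return the position of the pay period directly, which makes it easy to read off both the start and the end boundary. The names breaks, dates, idx and periods are illustrative and not from the original post, and only three of the dates are used. Periods are treated as closed on the left and open on the right, so a date that falls exactly on a boundary belongs to the period that starts on it.

```r
# Pay-period boundaries, as in the question (base R only)
breaks <- seq(as.Date("2014-06-28"), as.Date("2019-06-30"), by = "2 weeks")
dates  <- as.Date(c("2014-07-08", "2018-10-20", "2019-03-15"))

# Index of the last boundary that is <= each date (left-closed periods)
idx <- findInterval(as.numeric(dates), as.numeric(breaks))

periods <- data.frame(
  date         = dates,
  period_start = breaks[idx],
  period_end   = breaks[idx + 1]  # the boundary where the next period starts
)
print(periods)
```

A date before the first boundary gives an index of 0, and a date on or after the last boundary has no following boundary, so period_end becomes NA. Both cases would need explicit handling in real data.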

This is what a map solution could look like:
map(ymd(Dates), ~ PayPeriod_Interval[.x %within% PayPeriod_Interval])
# [[1]]
# [1] 2014-06-28 UTC--2014-07-12 UTC
#
# [[2]]
# [1] 2018-10-13 UTC--2018-10-27 UTC
#
# ...
To have the result as an interval vector (and not a list) you can use:
PayPeriod_Interval[map_int(ymd(Dates), ~ which(.x %within% PayPeriod_Interval))]
# [1] 2014-06-28 UTC--2014-07-12 UTC 2018-10-13 UTC--2018-10-27 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-12-08 UTC--2018-12-22 UTC 2018-11-24 UTC--2018-12-08 UTC
# [6] 2018-11-24 UTC--2018-12-08 UTC 2019-01-05 UTC--2019-01-19 UTC 2019-01-19 UTC--2019-02-02 UTC 2019-03-02 UTC--2019-03-16 UTC 2018-09-29 UTC--2018-10-13 UTC
If you are just interested in the end date of the interval, an option is:
PayPeriod_EndDate[map_int(ymd(Dates), ~ which.min(.x > PayPeriod_EndDate))]
# [1] "2014-07-12" "2018-10-27" "2018-12-22" "2018-12-22" "2018-12-08" "2018-12-08" "2019-01-19" "2019-02-02" "2019-03-16" "2018-10-13"
which.min returns the index of the first entry of PayPeriod_EndDate that is not smaller than the specific date in the Dates vector, i.e. the date at the end of that payment period.
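To see why which.min works here, it may help to look at a tiny case. This is only an illustration with made-up values (ends and d are hypothetical names): the comparison yields a logical vector, which.min treats FALSE as 0 and TRUE as 1, and it returns the position of the first FALSE.

```r
# Three hypothetical period end dates and one date to classify
ends <- as.Date(c("2014-07-12", "2014-07-26", "2014-08-09"))
d    <- as.Date("2014-07-20")

later_than_end <- d > ends             # TRUE for every end date already passed
first_not_passed <- which.min(later_than_end)  # position of the first FALSE

print(later_than_end)
print(ends[first_not_passed])
```

One caveat of this trick: if the date is later than every end date, the logical vector is all TRUE and which.min silently returns 1, so out-of-range dates should be checked separately.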

Related

How can I alter the date of a datetime vector depending on the time with lubridate?

I am trying to manipulate a date inside a datetime vector depending on time of day.
Each item in the vector newmagic looks something like this "2020-03-05 02:03:54 UTC"
For all the items that have a time between 19:00 and 23:59 I want to go back one day.
I tried writing an if statement:
if(hour(newmagic)>=19&hour(newmagic)<=23){
date(newmagic)<-date(newmagic)-1
}
This gives me no output, only a warning:
Warning message: In if (hour(newmagic) >= 19 & hour(newmagic) <= 23) { : the condition has length > 1 and only the first element will be used
When I limit the data to the condition and simply execute date()-1
newmagic[hour(newmagic)>=19&hour(newmagic)<=23&!is.na(newmagic)] <- date(newmagic[hour(newmagic)>=19&hour(newmagic)<=23&!is.na(newmagic)])-1
The output does remove 1 day but also sets the time to 0
Original:
"2020-03-07 20:58:00 UTC"
After date()-1
"2020-03-06 00:00:00 UTC"
I don't really know how to go on.
How can I adapt the if statement so that it will actually do what I intend to?
How can I rewrite the limitation in the second approach so that the time itself will stay intact?
Thank you for the help
You can try this on your original data set. I have used the lubridate and tidyverse packages. First I split the date-time column into date and time, then converted those variables to date and time formats, and finally applied an ifelse condition.
The code and the output are as follows:
library(tidyverse)
library(lubridate)
ab <- data.frame(ymd_hms(c("2000-11-01 2:23:15", "2028-03-25 20:47:51",
"1990-05-14 22:45:30")))
colnames(ab) <- paste(c("Date_time"))
ab <- ab %>% separate(Date_time, into = c("Date", "Time"),
sep = " ", remove = FALSE)
ab$Date <- as.Date(ab$Date)
ab$Time <- hms(ab$Time)
# ifelse() drops the Date class, so assign its result once and convert it back below
ab$date_condition <- ifelse(hour(ab$Time) %in% 19:23,
                            ab$Date - 1,
                            ab$Date)
ab$date_condition <- as.Date(ab$date_condition, origin = "1970-01-01")
ab
#             Date_time       Date        Time date_condition
# 1 2000-11-01 02:23:15 2000-11-01  2H 23M 15S     2000-11-01
# 2 2028-03-25 20:47:51 2028-03-25 20H 47M 51S     2028-03-24
# 3 1990-05-14 22:45:30 1990-05-14 22H 45M 30S     1990-05-13
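If the goal is to keep a single date-time vector with the time of day intact, a logical index can replace the if statement. The following is a minimal base R sketch, assuming newmagic is a POSIXct vector in UTC; the sample values and the name late are made up for illustration. Subtracting 86400 seconds moves the timestamp back exactly one day without touching the clock time.

```r
# Sample data standing in for the asker's vector (assumed POSIXct, UTC)
newmagic <- as.POSIXct(c("2020-03-05 02:03:54", "2020-03-07 20:58:00",
                         "2020-03-01 23:59:59", NA), tz = "UTC")

# Hour of day as an integer; NA timestamps give NA hours
h <- as.integer(format(newmagic, "%H"))

# TRUE only for non-missing timestamps between 19:00 and 23:59
late <- !is.na(h) & h >= 19

# Go back one day (86400 seconds) for the late entries only
newmagic[late] <- newmagic[late] - 86400
print(newmagic)
```

With lubridate loaded, hour(newmagic) and newmagic[late] - days(1) would play the same roles. In a time zone with daylight saving changes, the period-based days(1) keeps the clock time, whereas a fixed 86400 seconds can shift it by an hour.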

Why subtracting months from a lubridate date gives inconsistent results? [duplicate]

I'm trying to subtract n months from a date as follows:
maturity <- as.Date("2012/12/31")
m <- as.POSIXlt(maturity)
m$mon <- m$mon - 6
but the resulting date is 01-Jul-2012, and not 30-Jun-2012, as I would expect.
Is there any short way to get such result?
1) seq.Date. Note that June has only 30 days, so it cannot give June 31st and instead gives July 1st.
seq(as.Date("2012/12/31"), length = 2, by = "-6 months")[2]
## [1] "2012-07-01"
If we knew it was at month end we could do this:
seq(as.Date(cut(as.Date("2012/12/31"), "month")), length=2, by="-5 month")[2]-1
## "2012-06-30"
2) yearmon. Also if we knew it was month end then we could use the "yearmon" class of the zoo package like this:
library(zoo)
as.Date(as.yearmon(as.Date("2012/12/31")) -.5, frac = 1)
## [1] "2012-06-30"
This converts the date to "yearmon" subtracts 6 months (.5 of a year) and then converts it back to "Date" using frac=1 which means the end of the month (frac=0 would mean the beginning of the month). This also has the advantage over the previous solution that it is vectorized automatically, i.e. as.Date(...) could have been a vector of dates.
Note that if "Date" class is only being used as a way of representing months then we can get rid of it altogether and directly use "yearmon" since that models what we want in the first place:
as.yearmon("2012-12") - .5
## [1] "Jun 2012"
3) mondate. A third solution is the mondate package, which has the advantage here that it returns the end of the month 6 months ago without having to know that we are at month end:
library(mondate)
mondate("2011/12/31") - 6
## mondate: timeunits="months"
## [1] 2011/06/30
This is also vectorized.
4) lubridate. This lubridate answer has been changed in line with changes in the package:
library(lubridate)
as.Date("2012/12/31") %m-% months(6)
## [1] "2012-06-30"
lubridate is also vectorized.
5) sqldf/SQLite
library(sqldf)
sqldf("select date('2012-12-31', '-6 months') as date")
## date
## 1 2012-07-01
or if we knew we were at month end:
sqldf("select date('2012-12-31', '+1 day', '-6 months', '-1 day') as date")
## date
## 1 2012-06-30
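For readers who want to see the rule behind the "roll back to the last valid day" results above, here is a minimal base R sketch. The helper add_months_clamped is hypothetical (it is not part of any of the packages mentioned) and assumes a Date input and a whole number of months. It shifts the month arithmetically and then caps the day at the length of the target month.

```r
# Hypothetical helper: shift a Date by n months, clamping the day to the
# last valid day of the target month (e.g. 31 Dec minus 6 months -> 30 Jun)
add_months_clamped <- function(date, n) {
  n <- as.integer(n)
  y <- as.integer(format(date, "%Y"))
  m <- as.integer(format(date, "%m"))
  d <- as.integer(format(date, "%d"))

  # Count months from year 0 so that year rollover is plain integer arithmetic
  total <- y * 12L + (m - 1L) + n
  first <- as.Date(sprintf("%04d-%02d-01", total %/% 12L, total %% 12L + 1L))

  # First day of the following month minus one day = last day of target month
  nxt <- total + 1L
  next_first <- as.Date(sprintf("%04d-%02d-01", nxt %/% 12L, nxt %% 12L + 1L))
  last_day <- as.integer(format(next_first - 1, "%d"))

  first + (pmin(d, last_day) - 1L)
}

print(add_months_clamped(as.Date("2012-12-31"), -6))
print(add_months_clamped(as.Date(c("2000-03-31", "2012-01-31")), -1))
```

The function is vectorized over date, and it always clamps, so 30 Jun plus 6 months gives 30 Dec rather than 31 Dec. Whether that is the desired month-end behaviour depends on the use case.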
You can use the lubridate package for this:
library(lubridate)
maturity <- maturity %m-% months(6)
There is no reason to change the day field.
You can set the day field back to the last day of that month with:
day(maturity) <- days_in_month(maturity)
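lubridate can handle such calculations:
library(lubridate)
as.Date("2000-01-01") - days(1) # 1999-12-31
as.Date("2000-03-31") %m-% months(1) # 2000-02-29
but plain subtraction of a period returns NA when the result is not a valid date, which is what %m-% is for:
as.Date("2000-03-31") - months(1) # NA, because 2000-02-31 does not exist
as.Date("2000-02-29") - years(1) # NA, because 1999-02-29 does not exist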
tidyverse has added the clock package in addition to the lubridate package that has nice functionality for this:
library(clock)
# sequence of dates
date_build(2018, 1:5, 31, invalid = "previous")
[1] "2018-01-31" "2018-02-28" "2018-03-31" "2018-04-30" "2018-05-31"
Building this sequence produces 2018-02-31, which is not a valid date. The invalid argument makes explicit what to do in this case: "previous" rolls back to the previous valid date, here the last day of the month.
There is also a series of add_*() functions; in your case you would use add_months(). Again, it has the invalid argument that you can specify:
x <- as.Date("2022-03-31")
# The previous valid moment in time
add_months(x, -1, invalid = "previous")
[1] "2022-02-28"
# The next valid moment in time, 2022-02-31 is not a valid date
add_months(x, -1, invalid = "next")
[1] "2022-03-01"
# Overflow the days. There were 28 days in February 2022, but we
# specified 31. So this overflows 3 days past day 28.
add_months(x, -1, invalid = "overflow")
[1] "2022-03-03"
You can also set invalid to "NA"; if you leave this argument off, an invalid date raises an error.
Technically you cannot add/subtract 1 month to all dates (although you can add/subtract 30 days to all dates, but I suppose, that's not something you want). I think this is what you are looking for
> lubridate::ceiling_date(as.Date("2020-01-31"), unit = "month")
[1] "2020-02-01"
> lubridate::floor_date(as.Date("2020-01-31"), unit = "month")
[1] "2020-01-01"
UPDATE, I just realised that Tung-nguyen also wrote the same method and has a two line version here https://stackoverflow.com/a/44690219/19563460
Keeping this answer here so newbies can see different ways of doing it
With the R updates, you can now do this easily in base R using seq.Date(). Here are some examples of implementing this that should work without additional packages:
ANSWER 1: typing directly
maturity <- as.Date("2012/12/31")
seq(maturity, length.out=2, by="-3 months")[2]
# see here for more help
?seq.Date
ANSWER 2: Adding in some flexibility, e.g. 'n' months
maturity <- as.Date("2012/12/31")
n <- 3
bytime <- paste("-",n," months",sep="")
seq(maturity,length.out=2,by=bytime)[2]
ANSWER 3: Make a function
# Here's a little function that will let you add X days/months/weeks
# to any base R date. Commented for new users
#---------------------------------------------------------
# MyFunction
# DateIn, either a date or a string that as.Date can convert into one
# TimeBack, number of units back/forward
# TimeUnit, unit of time e.g. "weeks"/"month"/"days"
# Direction can be "back" or "forward", not case sensitive
#---------------------------------------------------------
MyFunction <- function(DateIn,TimeBack,TimeUnit,Direction="back"){
  #--- Set up the by string
  if(tolower(Direction)=="back"){
    bystring <- paste("-",TimeBack," ",tolower(TimeUnit),sep="")
  }else{
    bystring <- paste(TimeBack," ",tolower(TimeUnit),sep="")
  }
  #--- Return the new date using seq in the base package
  output <- seq(as.Date(DateIn),length.out=2,by=bystring)[2]
  return(output)
}
# EXAMPLES
MyFunction("2000-02-29",3,"months","forward")
Answer <- MyFunction(DateIn="2002-01-01",TimeBack=14,
TimeUnit="weeks",Direction="back")
print(Answer)
maturity <- as.Date("2012/12/31")
n <- 3
MyFunction(DateIn=maturity,TimeBack=n,TimeUnit="months",Direction="back")
ANSWER 4: I quite like my little function, so I just uploaded it to my mini personal R package.
This is freely available, so now technically the answer is use the JumpDate function from the Greatrex.Functions package
Can't guarantee it'll work forever and no support available, but you're welcome to use it.
# Install/load my package
install.packages("remotes")
remotes::install_github('hgreatrex/Greatrex.Functions',force=TRUE)
library(Greatrex.Functions)
# run it
maturity <- as.Date("2012/12/31")
n <- 3
Answer <- JumpDate(DateIn=maturity,TimeBack=n,TimeUnit="months",
Direction="back",verbose=TRUE)
print(Answer)
JumpDate("2000-02-29",3,"months","forward")
# Help file here
?Greatrex.Functions::JumpDate
You can see how I made the function/package here:
https://github.com/hgreatrex/Greatrex.Functions/blob/master/R/JumpDate.r
With nice instructions here on making your own mini compilation of functions.
http://web.mit.edu/insong/www/pdf/rpackage_instructions.pdf
and here
How do I insert a new function into my R package?
Hope that helps! I hope it's also useful to see the different levels of designing an answer to a coding problem, depending on how often you need it and the level of flexibility you need.

How to format a Date as "YYYY-Mon" with Lubridate?

I would like to create a vector of dates between two specified moments in time with a step of 1 month, as described in this thread (Create a Vector of All Days Between Two Dates), to be then converted into factors for data visualization.
However, I'd like to have the dates in the YYYY-Mon format, i.e. 2010-Feb. But so far I have only managed to get the dates in the standard format 2010-02-01, using code like this:
require(lubridate)
first <- ymd_hms("2010-02-07 15:00:00 UTC")
start <- ymd(floor_date(first, unit="month"))
last <- ymd_hms("2017-10-29 20:00:00 UTC")
end <- ymd(ceiling_date(last, unit="month"))
> start
[1] "2010-02-01"
> end
[1] "2017-11-01"
How can I change the format to YYYY-Mon?
You can use format():
start %>% format('%Y-%b')
To create the vector, use seq():
seq(start, end, by = 'month') %>% format('%Y-%b')
Note: Use capital 'B' for the full month name: '%Y-%B'.
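Since the labels are meant to become factors for plotting, one thing to watch is that a factor built from strings sorts its levels alphabetically by default, which would scramble the months. A small base R sketch of one way around that is shown here, using a shortened date range; the names month_starts, labels, obs_dates and obs_month are illustrative only. Note that %b follows the current locale, so the abbreviations may not be English on every machine.

```r
# Month starts for a short illustrative range
month_starts <- seq(as.Date("2010-02-01"), as.Date("2010-06-01"), by = "month")

# Labels in chronological order (abbreviations depend on the locale)
labels <- format(month_starts, "%Y-%b")

# Some observations, deliberately out of order
obs_dates <- as.Date(c("2010-05-17", "2010-02-03", "2010-04-28"))

# Passing levels explicitly keeps the chronological order for plotting
obs_month <- factor(format(obs_dates, "%Y-%b"), levels = labels)

print(levels(obs_month))
print(as.integer(obs_month))
```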

Create end of the month date from a date variable

I have a large data frame with date variables, which reflect first day of the month. Is there an easy way to create a new data frame date variable that represents the last day of the month?
Below is some sample data:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
"2012-01-01" "2012-02-01" "2012-03-01" "2012-04-01"
I would like to return a new variable with:
"2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
I've tried the following but it was unsuccessful:
df$date.end.month=seq(df$date.start.month,length=1,by="+1 months")
To get the end of months you could just create a Date vector containing the 1st of all the subsequent months and subtract 1 day.
date.end.month <- seq(as.Date("2012-02-01"),length=4,by="months")-1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
Here is another solution using the lubridate package:
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
library(lubridate)
df$date.end.month <- ceiling_date(df$date.start.month, "month") - days(1)
df$date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
This uses the same concept given by James above, in that it gets the first day of the next month and subtracts one day.
By the way, this will work even when the input date is not necessarily the first day of the month. So for example, today is the 27th of the month and it still returns the correct last day of the month:
ceiling_date(Sys.Date(), "month") - days(1)
[1] "2017-07-31"
Use timeLastDayInMonth from the timeDate package:
df$eom <- timeLastDayInMonth(df$somedate)
library(lubridate)
as.Date("2019-09-01") - days(1)
[1] "2019-08-31"
or
library(lubridate)
as.Date("2019-09-01") + months(1) - days(1)
[1] "2019-09-30"
A straightforward solution is to use the as.yearmon function from the zoo package (which xts loads) together with the frac=1 argument of as.Date. frac is a number between 0 and 1 that indicates the fraction of the way through the period that the result represents.
as.Date(as.yearmon(seq.Date(as.Date('2017-02-01'),by='month',length.out = 6)),frac=1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
Or if you prefer “piping” using magrittr:
seq.Date(as.Date('2017-02-01'),by='month',length.out = 6) %>%
as.yearmon() %>% as.Date(frac=1)
[1] "2017-02-28" "2017-03-31" "2017-04-30" "2017-05-31" "2017-06-30" "2017-07-31"
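A function like the one below would do the job (assuming dt is a scalar Date):
month_end <- function(dt) {
  d <- seq(dt, dt+31, by="days")
  max(d[format(d,"%m")==format(dt,"%m")])
}
If you have a vector of Dates, apply it element-wise. sapply() would drop the Date class and return bare numbers, so combine the results with c() instead:
do.call(c, lapply(dates, month_end))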
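The month-end idea can also be written so that it takes a whole vector of dates at once. This is a minimal base R sketch rather than a replacement for the packages mentioned in the other answers; month_end_vec is a hypothetical name. It relies on the fact that the 1st of a month plus 31 days always lands in the following month.

```r
# Hypothetical vectorized helper: last day of the month for each Date in x
month_end_vec <- function(x) {
  # First day of each date's own month
  first <- as.Date(format(x, "%Y-%m-01"))
  # 31 days after the 1st is always somewhere in the next month;
  # snap that back to the 1st of the next month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  # One day earlier is the last day of the original month
  next_first - 1
}

dates <- as.Date(c("2012-01-01", "2012-02-15", "2012-04-30", "2012-12-31"))
print(month_end_vec(dates))
```

The result keeps the Date class, so it can be assigned straight into a data frame column.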
You can use timeperiodsR:
# install.packages("timeperiodsR")
library(timeperiodsR)
date.start.month=seq(as.Date("2012-01-01"),length=4,by="months")
df=data.frame(date.start.month)
df$date.start.month
pm <- previous_month(df$date.start.month[1]) # get previous month
start(pm) # first day of previous month
end(pm) # last day of previous month
seq(pm) # vector with all days of previous month
We can also use bsts::LastDayInMonth:
transform(df, date.end.month = bsts::LastDayInMonth(df$date.start.month))
# date.start.month date.end.month
# 1 2012-01-01 2012-01-31
# 2 2012-02-01 2012-02-29
# 3 2012-03-01 2012-03-31
# 4 2012-04-01 2012-04-30
tidyverse has added the clock package in addition to the lubridate package that has nice functionality for this:
library(clock)
date_build(2012, 1:12, 31, invalid = "previous")
# [1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30" "2012-05-31" "2012-06-30"
# [7] "2012-07-31" "2012-08-31" "2012-09-30" "2012-10-31" "2012-11-30" "2012-12-31"
The invalid argument specifies what to do with an invalid date (e.g. 2012-02-31). From the documentation:
"previous": The previous valid instant in time.
"previous-day": The previous valid day in time, keeping the time of day.
"next": The next valid instant in time.
"next-day": The next valid day in time, keeping the time of day.
"overflow": Overflow by the number of days that the input is invalid by. Time of day is dropped.
"overflow-day": Overflow by the number of days that the input is invalid by. Time of day is kept.
"NA": Replace invalid dates with NA.
"error": Error on invalid dates.

Length of lubridate interval

What's the best way to get the length of time represented by an interval in lubridate, in specified units? All I can figure out is something like the following messy thing:
> ival
[1] 2011-01-01 03:00:46 -- 2011-10-21 18:33:44
> difftime(attr(ival, "start") + as.numeric(ival), attr(ival, "start"), 'days')
Time difference of 293.6479 days
(I also added this as a feature request at https://github.com/hadley/lubridate/issues/105, under the assumption that there's no better way available - but maybe someone here knows of one.)
Update - apparently the difftime function doesn't handle this either. Here's an example.
> (d1 <- as.POSIXct("2011-03-12 12:00:00", 'America/Chicago'))
[1] "2011-03-12 12:00:00 CST"
> (d2 <- d1 + days(1)) # Gives desired result
[1] "2011-03-13 12:00:00 CDT"
> (i2 <- d2 - d1)
[1] 2011-03-12 12:00:00 -- 2011-03-13 12:00:00
> difftime(attr(i2, "start") + as.numeric(i2), attr(i2, "start"), 'days')
Time difference of 23 hours
As I mention below, I think one nice way to handle this would be to implement a /.interval function that doesn't first cast its input to a period.
The as.duration function is what lubridate provides. The interval class is represented internally as the number of seconds from the start, so if you wanted the number of hours you could simply divide as.numeric(ival) by 3600, or by (3600*24) for days.
If you want worked examples of functions applied to your object, you should provide the output of dput(ival). I did my testing on the objects created on the help(duration) page which is where ?interval sent me.
date <- as.POSIXct("2009-03-08 01:59:59") # DST boundary
date2 <- as.POSIXct("2000-02-29 12:00:00")
span <- date2 - date #creates interval
span
#[1] 2000-02-29 12:00:00 -- 2009-03-08 01:59:59
str(span)
#Classes 'interval', 'numeric' atomic [1:1] 2.85e+08
# ..- attr(*, "start")= POSIXct[1:1], format: "2000-02-29 12:00:00"
as.duration(span)
#[1] 284651999s (9.02y)
as.numeric(span)/(3600*24)
#[1] 3294.583
# A check against the messy method:
difftime(attr(span, "start") + as.numeric(span), attr(span, "start"), 'days')
# Time difference of 3294.583 days
This question is really old, but I'm adding an update because this question has been viewed many times and when I needed to do something like this today, I found this page. In lubridate you can now do the following:
d1 <- ymd_hms("2011-03-12 12:00:00", tz = 'America/Chicago')
d2 <- ymd_hms("2011-03-13 12:00:00", tz = 'America/Chicago')
(d1 %--% d2)/dminutes(1)
(d1 %--% d2)/dhours(1)
(d1 %--% d2)/ddays(1)
(d1 %--% d2)/dweeks(1)
Ken, dividing by days(1) will give you what you want. Lubridate doesn't coerce periods to durations when you divide intervals by periods. (Although the algorithm for finding the exact number of whole periods in the interval does begin with an estimate that uses the interval divided by the analogous number of durations, which might be what you are noticing.)
The end result is the number of whole periods that fit in the interval. The warning message alerts the user that it is an estimate because there will be some fraction of a period that is dropped from the answer. It's not sensible to do math with a fraction of a period since we can't modify a clock time with it unless we convert it to multiples of a shorter period, but there won't be a consistent way to make the conversion. For example, the day you mention would be equal to 23 hours, but other days would be equal to 24 hours. You are thinking the right way: periods are an attempt to respect the variations caused by DST, leap years, etc., but they only do this as whole units.
I can't reproduce the error in subtraction that you mention above. It seems to work for me.
three <- force_tz(ymd_hms("2011-03-12 12:00:00"), "")
# note: here in TX, "" *is* CST
(four <- three + days(1))
> [1] "2011-03-13 12:00:00 CDT"
four - days(1)
> [1] "2011-03-12 12:00:00 CST"
Be careful when dividing a time in seconds to obtain days, because you are then no longer working with abstract representations of time but with bare numbers, which can lead to the following:
> date_f <- now()
> date_i <- now() - days(23)
> as.duration(date_f - date_i)/ddays(1)
[1] 22.95833
> interval(date_i,date_f)/ddays(1)
[1] 22.95833
> int_length(interval(date_i,date_f))/as.numeric(ddays(1))
[1] 22.95833
This suggests that days or months are events in a calendar, not amounts of time that can be measured in seconds, milliseconds, etc.
The best way to calculate differences in days is to avoid the conversion to seconds and work with days as the unit:
> e <- now()
> s <- now() - days(23)
> as.numeric(as.Date(s))
[1] 18709
> as.numeric(as.Date(e) - as.Date(s))
[1] 23
However, if you consider a day to be a pure 86400-second time span, as ddays() does, the previous approach can lead to the following:
> e <- ymd_hms("2021-03-13 00:00:10", tz = 'UTC')
> s <- ymd_hms("2021-03-12 23:59:50", tz = 'UTC')
> as.duration(e - s)
[1] "20s"
> as.duration(e - s)/ddays(1)
[1] 0.0002314815
> as.numeric(as.Date(e) - as.Date(s))
[1] 1
Hence, it depends on what you are looking for: time difference or calendar difference.
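The two notions can be put side by side in base R. This is only a sketch, reusing the Chicago timestamps from the question; local_date is a hypothetical helper, and the result assumes the system time zone database contains the 2011 US daylight saving rules.

```r
# Two noon timestamps that straddle the 2011 US spring-forward change
d1 <- as.POSIXct("2011-03-12 12:00:00", tz = "America/Chicago")
d2 <- as.POSIXct("2011-03-13 12:00:00", tz = "America/Chicago")

# Elapsed time: counts actual seconds, so the skipped hour shows up
elapsed_hours <- as.numeric(difftime(d2, d1, units = "hours"))
elapsed_days  <- as.numeric(difftime(d2, d1, units = "days"))

# Calendar difference: compare local calendar dates only
# (format() uses each object's own time zone)
local_date <- function(x) as.Date(format(x, "%Y-%m-%d"))
calendar_days <- as.numeric(local_date(d2) - local_date(d1))

print(c(elapsed_hours = elapsed_hours,
        elapsed_days  = elapsed_days,
        calendar_days = calendar_days))
```

The elapsed figures match the 23 hours reported in the question, while the calendar difference is one day. Which of the two is "the length" depends on whether the interval is being treated as a stretch of time or as a span on the calendar.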

Resources