Interval of two dates not output correctly in R

I want to get the interval between two dates, displayed in days, using the lubridate package.
Method 1 using interval function.
library(lubridate)
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
x <- interval(date1, date2)
print(x)
days(x)
Output as follows
[1] 2022-08-08 UTC--2022-09-08 UTC
[1] "2678400d 0H 0M 0S"
Question: Why is the answer not correct? It shows 2678400 days!
Method 2 using difftime function.
y <- difftime(date2, date1, units="days")
print(y)
Output as follows
Time difference of 31 days
The thing is, I want it to display only 31 instead of the whole sentence "Time difference of 31 days"
Need some guidance here.

lubridate::days() works with numerics. You've given it an interval. as.numeric(x) gives 2678400, the number of seconds between 2022-08-08 and 2022-09-08 (31 days of 86400 seconds each), and days() then treats that number as a count of days. You're a victim of implicit coercion.
@jay.sf has given you the solution for difftime. To get the correct answer using lubridate:
time_length(x, "days")
[1] 31
@JustJames gave the full explanation of what went wrong in their now-deleted answer:
"According to the docs
as.interval changes difftime, Duration, Period and numeric class objects to intervals that begin at the specified date-time. Numeric objects are first coerced to timespans equal to the numeric value in seconds."
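As a quick check of where 2678400 comes from, here is a base R sketch (the names n_days and n_seconds are just for illustration, they are not from the posts above). Subtracting two Date objects gives the gap in days, and multiplying by 86400 gives the same gap in seconds, which is the number days() was handed.

```r
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")

# Subtracting Dates yields a difftime measured in days
n_days <- as.numeric(date2 - date1)

# The same span in seconds: 24 * 60 * 60 = 86400 seconds per day
n_seconds <- n_days * 86400

print(n_days)     # 31
print(n_seconds)  # 2678400
```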

Try this,
date1 <- as.Date("2022-08-08")
date2 <- as.Date("2022-09-08")
dateDiff <- as.numeric(difftime(date2, date1, units = "days"))
print(dateDiff)
Output
> dateDiff <- as.numeric(difftime(date2, date1, units = "days"))
> print(dateDiff)
[1] 31
Hope this helps!

Related

How to use cut function on dates

I have the following two dates:
dates <- c("2019-02-01", "2019-06-30")
I want to create the following bins from above two dates:
2019-05-31, 2019-04-30, 2019-03-31, 2019-02-28
I used cut function along with seq,
dt <- as.Date(dates)
cut(seq(dt[1], dt[2], by = "month"), "month")
but this does not produce correct results.
Could you please shed some light on the use of cut function on dates?
We assume that what is wanted is every end-of-month date strictly between the two dates in dates. In the question, dates[1] is the beginning of a month and dates[2] is the end of a month, but we do not rely on that, although doing so might simplify the code. The series below are in descending order to match the question, although ascending order is more usual in R.
The first approach below uses a monthly sequence and cut and the second approach below uses a daily sequence.
No packages are used.
1) We define a first of the month function, fom, which given a Date or character date gives the Date of the first of the month using cut. Then we calculate monthly dates between the first of the months of the two dates, convert those to end of the month and then remove any dates that are not strictly between the dates in dates.
fom <- function(x) as.Date(cut(as.Date(x), "month"))
s <- seq(fom(dates[2]), fom(dates[1]), "-1 month")
ss <- fom(fom(s) + 32) - 1
ss[ss > dates[1] & ss < dates[2]]
## [1] "2019-05-31" "2019-04-30" "2019-03-31" "2019-02-28"
2) Another approach is to compute a daily sequence between the two elements of dates after converting to Date class and then only keep those for which the next day has a different month and is between the dates in dates. This does not use cut.
dt <- as.Date(dates)
s <- seq(dt[2], dt[1], "-1 day")
s[as.POSIXlt(s)$mon != as.POSIXlt(s+1)$mon & s > dt[1] & s < dt[2]]
## [1] "2019-05-31" "2019-04-30" "2019-03-31" "2019-02-28"
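The core trick in approach 1 (jump to the first of the next month, then step back one day) can also be sketched without cut. The helper name eom is hypothetical, and the sketch relies on the fact that the first of a month plus 31 days always falls in the following month.

```r
# eom: last day of the month containing each date in x (hypothetical helper)
eom <- function(x) {
  first <- as.Date(format(as.Date(x), "%Y-%m-01"))       # first of this month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))  # first of next month
  next_first - 1                                         # step back one day
}

eom(c("2019-02-01", "2020-02-10", "2019-12-31"))
## [1] "2019-02-28" "2020-02-29" "2019-12-31"
```

Because everything is done with format and Date arithmetic, the helper is vectorised and handles leap years and the December to January rollover.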
There is no need for cut here:
library(lubridate)
dates <- c("2019-02-01", "2019-06-30")
seq(min(ymd(dates)), max(ymd(dates)), by = "months") - 1
#> [1] "2019-01-31" "2019-02-28" "2019-03-31" "2019-04-30" "2019-05-31"
Created on 2021-11-25 by the reprex package (v2.0.1)

sequence of monthly dates making sure it's the same day, or the last day of month in case of invalid

Given an initial date, I want to generate a sequence of dates with monthly intervals, ensuring every element has the same day as the initial date or the last day of the month in case the same day would yield an invalid date.
Sounds pretty standard, right?
Using difftime is not possible. Here's what the help file of difftime says:
Units such as "months" are not possible as they are not of constant
length. To create intervals of months, quarters or years use seq.Date
or seq.POSIXt.
But then looking at the help file of seq.POSIXt I find that:
Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
This is the example in the help file.
seq(ISOdate(2000,1,31), by = "month", length.out = 4)
> seq(ISOdate(2000,1,31), by = "month", length.out = 4)
[1] "2000-01-31 12:00:00 GMT" "2000-03-02 12:00:00 GMT"
"2000-03-31 12:00:00 GMT" "2000-05-01 12:00:00 GMT"
So, given that the initial date is on day 31, this would yield invalid dates in February, April, etc. The sequence actually ends up skipping those months because it "counts forward", giving March 2 instead of February 29.
If I start on 2000-01-31, I would like the sequence as follows:
2000-01-31
2000-02-29
2000-03-31
2000-04-30
...
And it should properly handle leap-years, so if the initial date is 2015-01-31 the sequence should be:
2015-01-31
2015-02-28
2015-03-31
2015-04-30
...
These are just examples to illustrate the problem and I do not know the initial date in advance, nor can I assume anything about it. The initial date may well be in the middle of the month (2015-01-15) in which case seq works fine. But it can also be, as in the examples, towards the end of the month on dates that using seq alone would be problematic (days 29, 30 and 31). I cannot assume either that the initial date is the last day of the month.
I have looked around trying to find a solution. In some questions here on SO there is a "trick" to get the last day of a month: take the first day of the next month and subtract 1. Finding the first day is "easy" because it is just day 1.
So my solution so far is:
# Given an initial date for my sequence
initial_date <- as.Date("2015-01-31")
# Find the first day of the month
library(magrittr) # to use pipes and make the code more readable
first_day_of_month <- initial_date %>%
  format("%Y-%m") %>%
  paste0("-01") %>%
  as.Date()
# Generate a sequence from the initial date, using seq
# This is the sequence that will have incorrect values in months that would
# have invalid dates
given_date_seq <- seq(initial_date, by = "month", length.out = 4)
# Then generate an auxiliary sequence for the last day of each month:
# start on the first day of the same month as the initial date, go one
# month further (length 5 instead of 4) and subtract 1 from all the elements
last_day_seq <- seq(first_day_of_month, by = "month", length.out = 5) - 1
# And finally, for each pair of elements, I take the min date of both
pmin(given_date_seq, last_day_seq[2:5])
It works, but it is, at the same time, kinda dumb, hacky and convoluted. So I do not like it. And most importantly, I cannot believe there is no easier way to do this in R.
Can someone please point me to a simpler solution? (I guess it should have been as simple as seq(initial_date, "month", 4), but apparently it is not). I've googled it and looked here in SO and R mailing lists, but apart from the tricks I mentioned above, I couldn't find a solution.
The simplest solution is %m+% from lubridate, which solves this exact problem. So:
library(lubridate)
seq_monthly <- function(from, length.out) {
  # %m+% rolls back to the last valid day of the month when needed
  from %m+% months(seq_len(length.out) - 1)
}
Output:
> seq_monthly(as.Date("2015-01-31"),length.out=4)
[1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
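If adding a lubridate dependency is not an option, the same roll-back behaviour can be sketched in base R. add_months_clamped is a hypothetical helper, not part of any package; it assumes from is a single date and n is a vector of whole months (negative values go backwards).

```r
# Hypothetical helper: add n months to `from`, clamping to the month's last day
add_months_clamped <- function(from, n) {
  lt <- as.POSIXlt(as.Date(from))
  total <- (lt$year + 1900) * 12 + lt$mon + n            # months counted from year 0
  y <- total %/% 12
  m <- total %% 12 + 1
  first <- as.Date(sprintf("%04d-%02d-01", y, m))        # first of each target month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))  # first of the following month
  last_day <- as.integer(format(next_first - 1, "%d"))   # days in each target month
  first + pmin(lt$mday, last_day) - 1                    # same day, or last day if invalid
}

add_months_clamped(as.Date("2015-01-31"), 0:3)
## [1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
```

Each element is computed from the initial date rather than from the previous element, so a short month does not drag the later dates down to day 28.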
Similar to the lubridate answer, here is one using RcppBDT (which wraps the Boost Date.Time library from C++)
R> dt <- new(bdtDt, 2010, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2010-02-28"
[1] "2010-04-30"
[1] "2010-07-31"
[1] "2010-11-30"
[1] "2011-04-30"
R> dt <- new(bdtDt, 2000, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2000-02-29"
[1] "2000-04-30"
[1] "2000-07-31"
[1] "2000-11-30"
[1] "2001-04-30"
R>

Calendar date arithmetic in R

Is there a way in R to do calendar arithmetic, e.g.
> as.Date('2014-03-30') - months(1)
[1] 2014-02-28
except in reality there's no such months function. This can be done with awareness of leap years and daylight savings time in SQL and Java, but I can't find a way to do it in R. I thought I'd get clever and use seq but no:
> seq(as.POSIXct('2014-03-30', tz='UTC'), by = '-1 months', length=2)[2]
[1] "2014-03-02 UTC"
Here is one way using RcppBDT, which wraps (parts of) Boost Date_Time for use by R:
R> library(RcppBDT)
R> dt <- new(bdtDt, 2014, 3, 30)
R> dt
[1] "2014-03-30"
R> dt$addMonths(-1)
R> dt
[1] "2014-02-28"
R>
This is too long for an organized comment, but months is a function, and using as.POSIXlt (as opposed to ct) can allow for easy extraction of date attributes.
test <- as.POSIXlt('2014-03-30', tz='UTC')
attributes(test)$names
test$mon
months(test)
Given that test$mon returns a numeric value, it would be easy to perform arithmetic on the months. However, subtracting 1 month from January just gives you -1 (Jan is 0), and redefining test$mon <- test$mon - 1 doesn't seem to be of much help.
Nonetheless, depending on your application, the above information may still be useful.
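For comparison, a minimal sketch with lubridate, which is an assumption here since none of the answers to this question use it. With lubridate loaded, months(1) creates a one-month period, and the %m-% operator subtracts it while rolling back to the last valid day of the target month.

```r
library(lubridate)

d <- as.Date("2014-03-30")

# Plain subtraction lands on the nonexistent 2014-02-30 and gives NA
d - months(1)

# %m-% rolls back to the last valid day of the target month
d %m-% months(1)
## [1] "2014-02-28"
```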

Strip the date and keep the time

Lots of people ask how to strip the time and keep the date, but what about the other way around? Given:
myDateTime <- "11/02/2014 14:22:45"
I would like to see:
myTime
[1] "14:22:45"
Time zone not necessary.
I've already tried (from other answers)
as.POSIXct(substr(myDateTime, 12,19),format="%H:%M:%S")
[1] "2013-04-13 14:22:45 NZST"
The purpose is to analyse events recorded over several days by time of day only.
Thanks
Edit:
It turns out there's no pure "time" object, so every time must also have a date.
In the end I used
as.POSIXct(as.numeric(as.POSIXct(myDateTime, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")) %% 86400, origin = "2000-01-01", tz = "UTC")
rather than the character solution, because I need to do arithmetic on the results. This solution is similar to my original one, except that the date can be controlled consistently - "2000-01-01" in this case, whereas my attempt just used the current date at runtime.
I think you're looking for the format function.
(x <- strptime(myDateTime, format="%d/%m/%Y %H:%M:%S"))
#[1] "2014-02-11 14:22:45"
format(x, "%H:%M:%S")
#[1] "14:22:45"
That's character, not "time", but would work with something like aggregate if that's what you mean by "analyse events recorded over several days by time of day only."
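If the time within a GMT day is useful for your problem, you can get this with %%, the remainder operator, taking the remainder modulo 86400 (the number of seconds in a day). The result depends on the time zone in which as.POSIXct parses the strings; the output below comes from a session 5 hours behind GMT.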
stamps <- c("2013-04-12 19:00:00", "2010-04-01 19:00:01", "2018-06-18 19:00:02")
as.numeric(as.POSIXct(stamps)) %% 86400
## [1] 0 1 2
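Putting the two approaches side by side for the timestamp in the question, here is a sketch that fixes the time zone to UTC so the result does not depend on the session (the variable names x, secs and time_of_day are illustrative only):

```r
myDateTime <- "11/02/2014 14:22:45"

# Parse day/month/year explicitly and pin the time zone
x <- as.POSIXct(myDateTime, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Numeric time of day: seconds since midnight, suitable for arithmetic
secs <- as.numeric(x) %% 86400

# Character time of day, suitable for grouping or display
time_of_day <- format(x, "%H:%M:%S")

print(secs)         # 51765
print(time_of_day)  # "14:22:45"
```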

How do you convert POSIX date to day of year?

The title has it: how do you convert a POSIX date to day-of-year?
An alternative is to format the "POSIXt" object using strftime():
R> today <- Sys.time()
R> today
[1] "2012-10-19 19:12:04 BST"
R> doy <- strftime(today, format = "%j")
R> doy
[1] "293"
R> as.numeric(doy)
[1] 293
which is preferable to remembering that the day of the year is zero-based in the POSIX standard.
As ?POSIXlt reveals, a $yday suffix to a POSIXlt date (or even a vector of such) will convert to day of year. Beware that POSIX counts Jan 1 as day 0, so you might want to add 1 to the result.
It took me embarrassingly long to find this, so I thought I'd ask and answer my own question.
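A small sketch of the off-by-one between the two base R methods, using the date from the strftime() example with the time zone pinned to UTC (an assumption, so the result is reproducible):

```r
d <- as.POSIXlt("2012-10-19", tz = "UTC")

# $yday counts from 0, so January 1 is day 0
zero_based <- d$yday

# "%j" counts from 1, so January 1 is day "001"
one_based <- as.integer(format(d, "%j"))

print(zero_based)  # 292
print(one_based)   # 293
```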
Alternatively, the excellent lubridate package provides the yday function, which is just a wrapper for the above method. It conveniently defines similar functions for other units (month, year, hour, ...).
library(lubridate)
today <- Sys.time()
yday(today)
I realize it isn't quite what the poster was looking for, but I needed to convert POSIX date-times into a fractional day of the year for time series analysis and ended up doing this:
today <- Sys.time()
doy2015f <- difftime(today, as.POSIXct("2015-01-01 00:00", tz = "GMT"), units = "days")
The data.table package also provides a yday() function.
require(data.table)
today <- Sys.time()
yday(today)
This is how I do it:
as.POSIXlt(c("15.4", "10.5", "15.5", "10.6"), format = "%d.%m")$yday
# [1] 104 129 134 160
