R - weird result from difftime function

I was using the difftime function from base R and, with my data, found a couple of weird return values:
> difftime("2014-10-29", "2014-10-21", units = "days")
Time difference of 8.041667 days
> difftime("2020-4-04", "2020-3-28", units = "days")
Time difference of 6.958333 days
Any idea why those values are not integers? Thanks!
All I see in the docs that is relevant is:
"Note that units = "days" means a period of 24 hours, hence takes no account of Daylight Savings Time. Differences in objects of class "Date" are computed as if in the UTC time zone."

I think you should use as.Date to wrap your date strings, e.g.,
> difftime(as.Date("2014-10-29"), as.Date("2014-10-21"), units = "days")
Time difference of 8 days
> difftime(as.Date("2020-4-04"), as.Date("2020-3-28"), units = "days")
Time difference of 7 days
You can observe the difference with or without as.Date
> (a1 <- as.POSIXct("2014-10-29"))
[1] "2014-10-29 CET"
> (a2 <- as.POSIXct("2014-10-21"))
[1] "2014-10-21 CEST"
> (b1 <- as.POSIXct(as.Date("2014-10-29")))
[1] "2014-10-29 01:00:00 CET"
> (b2 <- as.POSIXct(as.Date("2014-10-21")))
[1] "2014-10-21 02:00:00 CEST"
> c(a1, b1)
[1] "2014-10-29 00:00:00 CET" "2014-10-29 01:00:00 CET"
> c(a2, b2)
[1] "2014-10-21 00:00:00 CEST" "2014-10-21 02:00:00 CEST"

The difftime function uses as.POSIXct(), not as.Date(), to convert strings to date-times, and this brings in the system time zone (unless one is provided). Both pairs of dates span a daylight saving time change in many time zones (in most of Europe, clocks went back on 2014-10-26 and forward on 2020-03-29), so the intervals are one hour longer and one hour shorter than a whole number of days: 8 + 1/24 = 8.041667 and 7 - 1/24 = 6.958333.
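To see the daylight saving effect in isolation, here is a minimal sketch. The zone "Europe/Berlin" is my assumption (picked because the output above shows CET/CEST); any zone whose clocks changed between the two dates should behave similarly.

```r
# Assumed zone: Europe/Berlin, where clocks went back one hour on 2014-10-26
zone <- "Europe/Berlin"

# Local midnights: the interval contains the extra hour from the clock change
d_local <- difftime(as.POSIXct("2014-10-29", tz = zone),
                    as.POSIXct("2014-10-21", tz = zone),
                    units = "days")

# The same strings read as UTC, which has no clock changes
d_utc <- difftime("2014-10-29", "2014-10-21", tz = "UTC", units = "days")

print(d_local)  # 8 days plus 1 hour, i.e. 8 + 1/24 days
print(d_utc)    # a whole number of days
```

Passing tz = "UTC" to difftime is an alternative to wrapping the strings in as.Date when you only care about calendar days.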

Related

R: How can I convert non-intuitive strings that describe hours, minutes, and seconds to a workable POSIXct format to perform standard arithmetic on?

I have a data set in R that has values in hours, minutes, and seconds format. However, some values only have hours and minutes, some only have minutes and seconds, some only have minutes, and some only have seconds. It's also not formatted very favorably. Sample data can be found below:
example <- as.data.frame(c("22h28m", "17m7s", "15m", "14s"))
I'd like to convert it to a POSIXct format like the below, with the goal of adding/subtracting time:
Column Title
22:28:00
00:17:07
00:15:00
00:00:14
I've tried the as.POSIXct() and strptime() functions, but to no avail. Any help would be greatly appreciated - thanks!
Maybe parse_date_time from lubridate?
library("lubridate")
x <- c("22h28m", "17m7s", "15m", "14s")
y <- parse_date_time(x, orders = c("%Hh%Mm%Ss", "%Hh%Mm", "%Hh%Ss", "%Mm%Ss", "%Hh", "%Mm", "%Ss"), exact = TRUE)
y
## [1] "0000-01-01 22:28:00 UTC"
## [2] "0000-01-01 00:17:07 UTC"
## [3] "0000-01-01 00:15:00 UTC"
## [4] "0000-01-01 00:00:14 UTC"
To get numbers of seconds since midnight, you could do:
y0 <- floor_date(y, unit = "day")
dy <- y - y0
dy
## Time differences in secs
## [1] 80880 1027 900 14
Then you could add dy to any reference time specified as a length-1 POSIXct object. For example, the most recent midnight in the current time zone:
y0 <- floor_date(now(), unit = "day")
y0 + dy
## [1] "2022-02-03 22:28:00 EST"
## [2] "2022-02-03 00:17:07 EST"
## [3] "2022-02-03 00:15:00 EST"
## [4] "2022-02-03 00:00:14 EST"
Update
After reading the documentation a bit more carefully, I am realizing that lubridate implements a way to obtain dy directly.
dy <- duration(toupper(x))
dy
## [1] "80880s (~22.47 hours)" "1027s (~17.12 minutes)" "900s (~15 minutes)" "14s"
Then you can do y0 + dy as above to obtain a POSIXct object, and, if you like,
strftime(y0 + dy, "%T")
## [1] "22:28:00" "00:17:07" "00:15:00" "00:00:14"
to obtain a character vector listing the times without dates or time zones.
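If you would rather avoid a package, the same seconds-since-midnight values can be computed in base R with a regular expression. This is only a sketch: get_part is a hypothetical helper (not part of any package), and it assumes each unit appears at most once, written as digits followed by h, m or s.

```r
x <- c("22h28m", "17m7s", "15m", "14s")

# Hypothetical helper: the number in front of a unit letter, or 0 if the unit is absent
get_part <- function(s, unit) {
  out <- numeric(length(s))
  hit <- grepl(paste0("[0-9]+", unit), s)
  out[hit] <- as.numeric(sub(paste0("^.*?([0-9]+)", unit, ".*$"), "\\1",
                             s[hit], perl = TRUE))
  out
}

# Total seconds since midnight for each string
secs <- 3600 * get_part(x, "h") + 60 * get_part(x, "m") + get_part(x, "s")
secs

# Format as clock times by counting from a UTC midnight
hms <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
hms
```

As with dy above, secs can be added to any POSIXct reference time.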

lubridate [R]: difference between a date and now() to be added to a date

I am struggling a bit with lubridate
I have a date series in the past in df$mydate variable as POSIXct. I want to take max(df$mydate) subtract it from now(), then subtract 2 more days from that time interval - i.e. make the interval 2 days shorter than the difference between the latest date of the series and today. The obtained time interval then should be added to all dates in df$mydate so that the dates block is brought forward to end 2 days in the past from today.
How can I do this with lubridate?
when I try to convert now() - max(df$mydate) to interval I get an empty interval. So I do not even get to step 2 - shortening the interval by 2 days and to step 3 - trying to then add this time length to dates I have.
The idea with lubridate is that it takes care of all the transformations between intervals and dates for you, so you don't need to think about them. This simple code does exactly what you want.
library(lubridate)
my_date <- as.POSIXlt(paste0("2009-08-", 1:10))
# Fix the units to days so that subtracting 2 means 2 days
time_diff <- difftime(now(), max(my_date), units = "days")
time_diff_short <- time_diff - 2
my_date + time_diff_short
What I found was that my_date needed to be of class POSIXlt.
You can use difftime from base R to get the time difference between max(x) and the current time, Sys.time(), and then subtract 2 days.
x <- x + (difftime(Sys.time(), max(x), units = "days") - 2)
x
#[1] "2020-09-11 10:32:20 CEST" "2020-09-12 10:32:20 CEST"
#[3] "2020-09-13 10:32:20 CEST" "2020-09-14 10:32:20 CEST"
Sys.time()
#[1] "2020-09-16 10:32:20 CEST"
Data:
(x <- seq(as.POSIXct("2000-01-01 12:00:00"), length.out = 4, by = "days"))
#[1] "2000-01-01 12:00:00 CET" "2000-01-02 12:00:00 CET"
#[3] "2000-01-03 12:00:00 CET" "2000-01-04 12:00:00 CET"
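Putting the three steps from the question together in base R, with the units made explicit. This is a sketch: ref_now is a fixed stand-in for Sys.time() so the result is reproducible, and I use UTC to keep daylight saving out of the picture.

```r
# Example series (assumed data, same shape as above but in UTC)
x <- seq(as.POSIXct("2000-01-01 12:00:00", tz = "UTC"),
         length.out = 4, by = "days")

# Fixed stand-in for Sys.time(); replace with Sys.time() in real use
ref_now <- as.POSIXct("2020-09-16 10:32:20", tz = "UTC")

# Step 1: gap between "now" and the latest date, in seconds
gap <- difftime(ref_now, max(x), units = "secs")

# Step 2: shorten the gap by two days (2 * 86400 seconds)
shift <- gap - 2 * 86400

# Step 3: move the whole block forward by that amount
shifted <- x + shift
shifted
```

The spacing between the dates is unchanged; only the block as a whole moves, so the last element ends exactly two days before ref_now.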

How to get the beginning of the day in POSIXct

My day starts at 2016-03-02 00:00:00. Not 2016-03-02 00:00:01.
How do I get the beginning of the day in POSIXct in local time?
My confusion probably comes from the fact that R sees this as the end of 2016-03-01, given that R uses ISO 8601?
For example if I try to find the beginning of the day using Sys.Date():
as.POSIXct(Sys.Date(), tz = "CET")
"2016-03-01 01:00:00 CET"
Which is not correct - but are there other ways?
I know I can hack my way out using a simple
as.POSIXct(paste(Sys.Date(), "00:00:00", sep = " "), tz = "CET")
But there has to be a more correct way to do this? Base R preferred.
It's a single command---but you want as.POSIXlt():
R> as.POSIXlt(Sys.Date())
[1] "2016-03-02 UTC"
R> format(as.POSIXlt(Sys.Date()), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-02 00:00:00"
R>
It is only when the conversion to POSIXct happens that the timezone offset to UTC (six hours for me) enters:
R> as.POSIXct(Sys.Date())
[1] "2016-03-01 18:00:00 CST"
R>
Needless to say, by wrapping both you get the desired type and value:
R> as.POSIXct(as.POSIXlt(Sys.Date()))
[1] "2016-03-02 UTC"
R>
Filed under once again no need for lubridate or other non-Base R packages.
Notwithstanding that you understandably prefer base R, a "smart way," for a certain meaning of "smart," would be:
library(lubridate)
x <- floor_date(Sys.Date(),"day")
> format(x,"%Y-%m-%d-%H-%M-%S")
[1] "2016-03-02-00-00-00"
From ?floor_date:
floor_date takes a date-time object and rounds it down to the nearest
integer value of the specified time unit.
Pretty handy.
Your example is a bit unclear.
You are talking about a 1 second difference for the day start, but your example shows a 1 hour difference due to the timezone.
You can try
?POSIXct
to get the functionality explained.
Using Sys.Date() within as.POSIXct() somehow overrides your timezone setting.
as.POSIXct(Sys.Date(), tz="EET")
"2016-03-01 01:00:00 CET"
While entering a string gives you
as.POSIXct("2016-03-01 00:00:00", tz="EET")
"2016-03-01 EET"
It looks like 00:00:00 is actually the beginning of the day. You can conclude it from the results of the following 2 inequalities
as.POSIXct("2016-03-02 00:00:02 CET")>as.POSIXct("2016-03-02 00:00:01 CET")
TRUE
as.POSIXct("2016-03-02 00:00:01 CET")>as.POSIXct("2016-03-02 00:00:00 CET")
TRUE
So somehow this is a timezone issue. Notice that 00:00:00 is automatically omitted when the as.POSIXct result is printed.
as.POSIXct("2016-03-02 00:00:00 CET")
"2016-03-02 CET"
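Another base R route to local midnight is to truncate a date-time rather than convert a Date. A minimal sketch, assuming a fixed example timestamp in place of Sys.time() and "CET" as the zone from the question:

```r
# Fixed example instant (assumption); use Sys.time() for the current day
t <- as.POSIXct("2016-03-02 14:35:10", tz = "CET")

# trunc() drops the time-of-day part in the object's own time zone
# and returns POSIXlt, so convert back to POSIXct afterwards
start_of_day <- as.POSIXct(trunc(t, units = "days"))

# Print with an explicit format, since midnight is otherwise shown as a bare date
format(start_of_day, "%Y-%m-%d %H:%M:%S")
```

Because the truncation happens in the time zone attached to t, no UTC offset sneaks in the way it does when starting from a Date.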

Inconsistent results from difftime

> time1 = strptime("2010-03-01 00:15:00", format = "%Y-%m-%d %H:%M:%S")
> time2a = strptime("2010-03-01", format = "%Y-%m-%d")
> time2b = as.Date(time2a)
> difftime(time1, time2a)
Time difference of 15 mins
> difftime(time1, time2b)
Time difference of 5.25 hours
According to the help page of difftime, a Date object (time2b) is accepted. Why is the result wrong (5.25 hours)?
Thank you.
The first thing difftime does is check for the tz argument. If missing it uses:
if (missing(tz)) {
    time1 <- as.POSIXct(time1)
    time2 <- as.POSIXct(time2)
}
testing that:
> as.POSIXct(time2b)
[1] "2010-02-28 16:00:00 PST"
> as.POSIXct(time2a)
[1] "2010-03-01 PST"
So the Date is converted as midnight UTC, which displays as 16:00 on the previous day in my time zone. Supplying tz removes the mismatch:
> difftime(time1,time2a)
Time difference of 15 mins
> difftime(time1,time2b,tz='GMT')
Time difference of 15 mins
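If you want the Date to mean local midnight instead, one option is to go through its character form. A sketch under assumptions: "America/Los_Angeles" is just an example zone (matching the PST output above), and the variable names are mine.

```r
# Example zone (assumption); substitute your own
zone <- "America/Los_Angeles"

time1 <- as.POSIXct("2010-03-01 00:15:00", tz = zone)
d     <- as.Date("2010-03-01")

# A Date converts to midnight UTC, so the zone offset shows up in the result
via_date <- difftime(time1, d, units = "hours")

# format(d) gives "2010-03-01", which is then parsed as local midnight
local_midnight <- as.POSIXct(format(d), tz = zone)
via_local <- difftime(time1, local_midnight, units = "mins")

print(via_date)   # 8.25 hours in this zone (8 hour offset plus 15 minutes)
print(via_local)  # 15 minutes
```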

Create a regular sequence of date-times (POSIXct) using seq()

My goal is to create a vector of POSIXct time stamps given a start, an end and a delta (15min, 1hour, 1day). I hoped I could use seq for this, but I have a problem converting between the numeric and POSIXct representation:
now <- Sys.time()
now
# [1] "2012-01-19 10:30:39 CET"
as.POSIXct(as.double(now), origin="1970-01-01", tz="CET")
# [1] "2012-01-19 09:30:39 CET"
as.POSIXct(as.double(now), origin=as.POSIXct("1970-01-01", tz="CET"), tz="CET")
# [1] "2012-01-19 09:30:39 CET"
One hour gets lost during this conversion. What am I doing wrong?
There is a seq() method for objects of class "POSIXt" which is the super class of the "POSIXlt" and "POSIXct" classes. As such you don't need to do any conversion.
> now <- Sys.time()
> tseq <- seq(from = now, length.out = 100, by = "mins")
> length(tseq)
[1] 100
> head(tseq)
[1] "2012-01-19 10:52:38 GMT" "2012-01-19 10:53:38 GMT"
[3] "2012-01-19 10:54:38 GMT" "2012-01-19 10:55:38 GMT"
[5] "2012-01-19 10:56:38 GMT" "2012-01-19 10:57:38 GMT"
You have to be aware that when converting from POSIXct to numeric, R takes the timezone into account but always starts counting from a GMT origin:
> xgmt <- as.POSIXct('2011-01-01 14:00:00',tz='GMT')
> xest <- as.POSIXct('2011-01-01 14:00:00',tz='EST')
> (as.numeric(xgmt) - as.numeric(xest)) / 3600
[1] -5
As you see, 14:00 in EST is stored as an instant five hours later than 14:00 in GMT, which is the offset between the two timezones. It's that value that is saved internally.
The as.POSIXct() function just adds an attribute containing the timezone. It doesn't alter the value, so you get the time presented in GMT, but with an attribute saying it is EST. This also means that once you go from POSIXct to numeric, you should treat your data as if it were GMT time. (It's a whole lot more complex than that, but that's the general idea.) So you have to calculate the offset as follows:
> nest <- as.numeric(xest)
> origin <- as.POSIXct('1970-01-01 00:00:00',tz='EST')
> offset <- as.numeric(origin)
> as.POSIXct(nest-offset,origin=origin)
[1] "2011-01-01 14:00:00 EST"
This works whatever the timezone is in your locale (in my case, that's actually CET). Also note that behaviour of timezone data can differ between systems.
These time zone issues are always fiddly, but I think the problem is that your origin is being calculated in the wrong time zone (since the string only specifies the date).
Try using origin <- now - as.numeric(now).
Alternatively, use lubridate::origin, which is the string "1970-01-01 UTC".
A full solution, again using lubridate.
start <- now()
seq(start, start + days(3), by = "15 min")
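On the origin point: the number stored in a POSIXct is seconds since 1970-01-01 00:00:00 UTC, so one way to round-trip without losing an hour is to pin the origin to UTC explicitly. A small sketch, with a fixed timestamp standing in for Sys.time():

```r
# Fixed example instant (assumption) in the asker's zone
t <- as.POSIXct("2012-01-19 10:30:39", tz = "CET")

# Seconds since the epoch; the zone is not part of this number
secs <- as.numeric(t)

# Origin given as a POSIXct in UTC, display zone set separately via tz
utc_origin <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
back <- as.POSIXct(secs, origin = utc_origin, tz = "CET")

format(back, "%Y-%m-%d %H:%M:%S")
```

The tz argument here only controls how the result is displayed; the origin decides where counting starts.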
I do not have an answer to your problem, but I do have an alternative way of creating vectors of POSIXct objects. If, for example, you want to create a vector of 1000 timestamps from now with a delta_t of 15 minutes:
now = Sys.time()
dt = 15 * 60 # in seconds
timestamps = now + seq(0, 1000) * dt
> head(timestamps)
[1] "2012-01-19 11:17:46 CET" "2012-01-19 11:32:46 CET"
[3] "2012-01-19 11:47:46 CET" "2012-01-19 12:02:46 CET"
[5] "2012-01-19 12:17:46 CET" "2012-01-19 12:32:46 CET"
The trick is you can add a vector of seconds to a POSIXct object.
An alternative to using seq.POSIXt is xts::timeBasedSeq, which allows you to specify the sequence as a string:
library(xts)
now <- Sys.time()
timeBasedSeq(paste("2012-01-01/",format(now),"/H",sep="")) # Hourly steps
timeBasedSeq(paste("2012-01-01/",format(now),"/d",sep="")) # Daily steps
You need to use seq(from=start,to=end, by=step). Note that in step you can either use "days" or an integer defining how many seconds elapse from item to item.
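For instance, a minimal sketch of both forms of by, using made-up start and end times in UTC so the lengths are predictable:

```r
# Example bounds (assumptions); two hours apart
start <- as.POSIXct("2012-01-19 10:00:00", tz = "UTC")
end   <- as.POSIXct("2012-01-19 12:00:00", tz = "UTC")

# by as a string: a number followed by a unit
every_15_min <- seq(from = start, to = end, by = "15 min")

# by as a number: step size in seconds
hourly <- seq(from = start, to = end, by = 3600)

length(every_15_min)  # both endpoints included: 10:00, 10:15, ..., 12:00
format(hourly, "%H:%M")
```

Both calls dispatch to the POSIXt method of seq, so the result stays POSIXct and no numeric conversion is needed.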
