Inconsistent results from difftime - r

> time1 = strptime("2010-03-01 00:15:00", format = "%Y-%m-%d %H:%M:%S")
> time2a = strptime("2010-03-01", format = "%Y-%m-%d")
> time2b = as.Date(time2a)
> difftime(time1, time2a)
Time difference of 15 mins
> difftime(time1, time2b)
Time difference of 5.25 hours
According to the help page, difftime accepts a Date object (time2b). Why is the result wrong (5.25 hours)?
Thank you.

The first thing difftime does is check for the tz argument. If it is missing, it uses:
if (missing(tz)) {
    time1 <- as.POSIXct(time1)
    time2 <- as.POSIXct(time2)
}
Testing that:
> as.POSIXct(time2b)
[1] "2010-02-28 16:00:00 PST"
> as.POSIXct(time2a)
[1] "2010-03-01 PST"
So the Date object is treated as midnight UTC and then shown in my local time zone, which shifts it by my UTC offset.
> difftime(time1,time2a)
Time difference of 15 mins
> difftime(time1,time2b,tz='GMT')
Time difference of 15 mins
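Here is a reproducible sketch of the same effect with the time zones pinned, so it does not depend on the machine's locale. The zone "America/New_York" is an assumption (it happens to reproduce the 5.25 hours from the question), and the variable names are only illustrative.

```r
# A local time at 00:15 in a zone that is 5 hours behind UTC on this date
t_local <- as.POSIXct("2010-03-01 00:15:00", tz = "America/New_York")
d <- as.Date("2010-03-01")

# The Date is converted to midnight UTC, not to local midnight
gap_utc_midnight <- difftime(t_local, d, units = "hours")

# Converting the Date to local midnight first gives the expected gap
d_local <- as.POSIXct(format(d), tz = "America/New_York")
gap_local_midnight <- difftime(t_local, d_local, units = "mins")

print(gap_utc_midnight)    # Time difference of 5.25 hours
print(gap_local_midnight)  # Time difference of 15 mins
```

So whenever a Date and a date-time are mixed, it seems safest to decide explicitly which midnight the Date stands for.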

Related

R - weird result from difftime function

I was using the difftime function from the base package in R and based on my data I found a couple of weird return values of this function:
> difftime("2014-10-29", "2014-10-21", units = "days")
Time difference of 8.041667 days
> difftime("2020-4-04", "2020-3-28", units = "days")
Time difference of 6.958333 days
Any idea why those values are not integers? Thanks!
All I see in the doc, relevant to it is:
"Note that units = "days" means a period of 24 hours, hence takes no account of Daylight Savings Time. Differences in objects of class "Date" are computed as if in the UTC time zone."
I think you should use as.Date to wrap your date strings, e.g.,
> difftime(as.Date("2014-10-29"), as.Date("2014-10-21"), units = "days")
Time difference of 8 days
> difftime(as.Date("2020-4-04"), as.Date("2020-3-28"), units = "days")
Time difference of 7 days
You can observe the difference with or without as.Date
> (a1 <- as.POSIXct("2014-10-29"))
[1] "2014-10-29 CET"
> (a2 <- as.POSIXct("2014-10-21"))
[1] "2014-10-21 CEST"
> (b1 <- as.POSIXct(as.Date("2014-10-29")))
[1] "2014-10-29 01:00:00 CET"
> (b2 <- as.POSIXct(as.Date("2014-10-21")))
[1] "2014-10-21 02:00:00 CEST"
> c(a1, b1)
[1] "2014-10-29 00:00:00 CET" "2014-10-29 01:00:00 CET"
> c(a2, b2)
[1] "2014-10-21 00:00:00 CEST" "2014-10-21 02:00:00 CEST"
The difftime function converts strings with as.POSIXct(), not as.Date(), so the system time zone is applied unless tz is given. In many time zones each of those date pairs spans a daylight saving transition (the end of summer time in October 2014, the start in March 2020), so the interval is one hour longer or shorter than a whole number of days.
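To see the effect without relying on the system time zone, here is a small sketch that pins the zone. "Europe/Berlin" is an assumption, picked only because it matches the CET/CEST output shown above.

```r
# Same calendar dates, interpreted in a zone with DST and in UTC
with_dst <- difftime(as.POSIXct("2014-10-29", tz = "Europe/Berlin"),
                     as.POSIXct("2014-10-21", tz = "Europe/Berlin"),
                     units = "days")

# Passing tz = "UTC" makes difftime parse both strings in UTC
in_utc <- difftime("2014-10-29", "2014-10-21", tz = "UTC", units = "days")

print(with_dst)  # 8 days plus the extra hour from the end of summer time
print(in_utc)    # exactly 8 days
```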

Why doesn't as.Date show the hour, minute and second in R

I wrote the code below in R and want to see the hour, minute and second.
However, when I run it, it shows only the year, month and day, even though I specified the format.
> val <- 12016539307200
> valD <- as.Date(as.POSIXct(val, origin="1970-01-01"),format="%Y%m%d %H%M%S")
> valD
[1] "382758-12-22"
Could you give me a way to solve this issue?
Because it is a Date object, representing a calendar date. To have an object representing time, keep it in POSIXct:
> val <- 12016539307200
> valD <- as.POSIXct(val, origin="1970-01-01", tz = "UTC")
> valD
[1] "382758-12-22 01:20:00 UTC"
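A small sketch of the difference, using a hypothetical epoch value (not the one from the question) so the result is easy to check by hand. Note that the format argument of as.Date is for parsing character input; for output, format() is the function to use.

```r
# Hypothetical epoch value in seconds: 365 days + 1 hour + 1 minute + 1 second
secs <- 365 * 86400 + 3661
x <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

as_date <- as.Date(x)                  # the time of day is dropped
as_text <- format(x, "%Y%m%d %H%M%S")  # the time of day is kept

print(as_date)  # "1971-01-01"
print(as_text)  # "19710101 010101"
```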
If the value is in milliseconds, divide it by 1000 first:
as.POSIXct(val/1000, origin="1970-01-01")
"2350-10-16 09:35:07 CEST"
or
library(anytime)
anytime(12016539307200/1000)
"2350-10-16 09:35:07 CEST"

R POSIXct returns NA with "03/12/2017 02:17:13"

I have a data set containing the following date, along with several others
03/12/2017 02:17:13
I want to put the whole data set into a data table, so I used read_csv and as.data.table to create DT which contained the date/time information in date.
Next I used
DT[, date := as.POSIXct(date, format = "%m/%d/%Y %H:%M:%S")]
Everything looked fine except I had some NA values where the original data had dates. The following expression returns an NA
as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S")
The question is why, and how to fix it.
Just use functions anytime() or utctime() from package anytime
R> library(anytime)
R> anytime("03/12/2017 02:17:13")
[1] "2017-03-12 01:17:13 CST"
R>
or
R> utctime("03/12/2017 02:17:13")
[1] "2017-03-11 20:17:13 CST"
R>
The real crux is that this time did not exist in North America because of DST. You could parse it as UTC, since UTC does not observe daylight saving time:
R> utctime("03/12/2017 02:17:13", tz="UTC")
[1] "2017-03-12 02:17:13 UTC"
R>
You can express that UTC time as Mountain time, but it gets you the previous day:
R> utctime("03/12/2017 02:17:13", tz="America/Denver")
[1] "2017-03-11 19:17:13 MST"
R>
Ultimately, you (as the analyst) have to decide what was actually measured. UTC would make sense; the others may need adjustment.
My solution is below but ways to improve appreciated.
The explanation for the NA is that in the US Mountain time zone, that date and time falls in the window of the switch to daylight saving time, where the time doesn't exist, hence NA. The time zone is not specified explicitly, so I guess R picks it up from the computer's setting, which is "America/Denver".
The solution is to explicitly state the date/time string is in UTC and then convert back as follows:
time.utc <- as.POSIXct("03/12/2017 02:17:13", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
> time.utc
[1] "2017-03-12 02:17:13 UTC"
>
Next, add 6 hours to the UTC time, which is the offset between UTC and MDT (MST is 7 hours behind UTC)
time.utc2 <- time.utc + 6 * 60 * 60
> time.utc2
[1] "2017-03-12 08:17:13 UTC"
>
Now convert to America/Denver time using daylight savings.
time.mdt <- format(time.utc2, usetz = TRUE, tz = "America/Denver")
> time.mdt
[1] "2017-03-12 01:17:13 MST"
>
Note that this is in standard time, because daylight savings doesn't start until 2 am.
If you change the original string from 2 am to 3 am, you get the following
> time.mdt
[1] "2017-03-12 03:17:13 MDT"
>
The hour between 2 and 3 is lost in the change from standard to daylight savings but the data are now correct.
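If the goal is only to find which rows fall into the missing hour, a round-trip check might be enough. This is a sketch: is_valid_local is a hypothetical helper (not part of base R), and it assumes the strings are zero-padded exactly like the format, since it compares text. What as.POSIXct returns for a nonexistent local time can differ between platforms, which is why the check does not rely on NA alone.

```r
fmt <- "%m/%d/%Y %H:%M:%S"

# Hypothetical helper: TRUE when the wall-clock string exists in the zone,
# i.e. parsing it and formatting it back gives the same text
is_valid_local <- function(x, tz) {
  parsed <- as.POSIXct(x, format = fmt, tz = tz)
  !is.na(parsed) & format(parsed, fmt, tz = tz) == x
}

stamps <- c("03/12/2017 02:17:13", "03/12/2017 03:17:13")
valid_denver <- is_valid_local(stamps, "America/Denver")
valid_utc <- is_valid_local(stamps, "UTC")

print(valid_denver)  # FALSE TRUE
print(valid_utc)     # TRUE TRUE
```

Rows flagged FALSE can then be inspected by hand, or parsed as UTC if that is what the instrument actually recorded.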

R round a date with timezone

timestamp = 1491800340000
I'm having trouble with some date manipulation in R. The timestamp above is:
2017-04-10T04:59:00.000 GMT
2017-04-09T23:59:00.000 America/Bogota (Local time)
I want to round it to 2017-04-09T00:00:00.000 GMT because my daily aggregations are set to 00:00 GMT.
How can I do that?
Here's what I tried:
> Sys.timezone()
[1] "America/Bogota"
> timestamp = 1491800340000
> date = strptime(timestamp / 1000, "%s");
[1] "2017-04-09 23:59:00 COT"
> midnightLocal = trunc(date, "day");
[1] "2017-04-09 COT"
> midnightUTC = strptime(format(midnightLocal, "%Y-%m-%d"), "%Y-%m-%d", tz = "UTC");
[1] "2017-04-09 UTC"
> truncatedtimestamp = as.integer(format(midnightUTC, "%s"));
[1] 1491714000
which is 2017-04-09T05:00:00.000 GMT (not midnight as I expected). Looks like I failed to specify the timezone somewhere?
I tried many things like POSIXct but did not succeed.
Any hint is appreciated!
Cheers
ps: I'd prefer not to install any package
A little trickery:
timestamp = 1491800340000
ts <- as.POSIXct(timestamp / 1000, origin = "1970-01-01 00:00:00 GMT")
ts2 <- as.Date(trunc(ts, "day"))
attr(ts2, "tzone") <- "GMT"
format(ts2, "%Y-%m-%d %H:%M:%S %Z") # to prove it's midnight
# [1] "2017-04-09 00:00:00 UTC"
class(ts2)
# [1] "Date"
From here you have a couple of options: a little brute-force (numeric conversion) or perhaps the more time-friendly/safe way.
Brute-force numeric:
ts3a <- as.numeric(ts2) * 60*60*24
ts3a
# [1] 1491696000
as.POSIXct(ts3a, origin = "1970-01-01 00:00:00 GMT", tz = "GMT")
# [1] "2017-04-09 GMT"
Time-friendly/safe:
ts3b <- as.POSIXct(ts2)
attr(ts3b, "tzone") <- "GMT"
ts3b
# [1] "2017-04-09 GMT"
(Since they are POSIXct, it's showing the date only because it is midnight; you can easily prove it's correct.)
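A variant that avoids setting attributes by hand, as a sketch. It names the local zone explicitly instead of relying on Sys.timezone(), which is an assumption about what is wanted; the variable names are illustrative.

```r
timestamp_ms <- 1491800340000
ts <- as.POSIXct(timestamp_ms / 1000, origin = "1970-01-01", tz = "UTC")

# Calendar date as seen in the local zone
local_day <- as.Date(ts, tz = "America/Bogota")

# Midnight UTC of that calendar date, and its epoch seconds
midnight_utc <- as.POSIXct(format(local_day), tz = "UTC")
epoch_secs <- as.numeric(midnight_utc)

print(format(midnight_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC"))  # "2017-04-09 00:00:00"
print(epoch_secs)                                             # 1491696000
```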

Create a regular sequence of date-times (POSIXct) using seq()

My goal is to create a vector of POSIXct time stamps given a start, an end and a delta (15min, 1hour, 1day). I hoped I could use seq for this, but I have a problem converting between the numeric and POSIXct representation:
now <- Sys.time()
now
# [1] "2012-01-19 10:30:39 CET"
as.POSIXct(as.double(now), origin="1970-01-01", tz="CET")
# [1] "2012-01-19 09:30:39 CET"
as.POSIXct(as.double(now), origin=as.POSIXct("1970-01-01", tz="CET"), tz="CET")
# [1] "2012-01-19 09:30:39 CET"
One hour gets lost during this conversion. What am I doing wrong?
There is a seq() method for objects of class "POSIXt" which is the super class of the "POSIXlt" and "POSIXct" classes. As such you don't need to do any conversion.
> now <- Sys.time()
> tseq <- seq(from = now, length.out = 100, by = "mins")
> length(tseq)
[1] 100
> head(tseq)
[1] "2012-01-19 10:52:38 GMT" "2012-01-19 10:53:38 GMT"
[3] "2012-01-19 10:54:38 GMT" "2012-01-19 10:55:38 GMT"
[5] "2012-01-19 10:56:38 GMT" "2012-01-19 10:57:38 GMT"
You have to be aware that when converting from POSIXct to numeric, R takes the time zone into account but always counts from a GMT origin:
> xgmt <- as.POSIXct('2011-01-01 14:00:00',tz='GMT')
> xest <- as.POSIXct('2011-01-01 14:00:00',tz='EST')
> (as.numeric(xgmt) - as.numeric(xest)) / 3600
[1] -5
As you see, 14:00 in EST is an instant five hours later than 14:00 in GMT, which is the offset between the two time zones. It's that value, counted from the GMT origin, that is saved internally.
The as.POSIXct() function just adds an attribute containing the time zone. It doesn't alter the value, so you get the time presented in GMT time, but with an attribute telling it is EST. This also means that once you go from POSIXct to numeric, you should treat your data as if it's GMT time. (It's a whole lot more complex than that, but it's the general idea.) So you have to calculate the offset as follows:
> nest <- as.numeric(xest)
> origin <- as.POSIXct('1970-01-01 00:00:00',tz='EST')
> offset <- as.numeric(origin)
> as.POSIXct(nest-offset,origin=origin)
[1] "2011-01-01 14:00:00 EST"
This works whatever the timezone is in your locale (in my case, that's actually CET). Also note that behaviour of timezone data can differ between systems.
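A short sketch of the point about the stored value: changing the "tzone" attribute changes how the time is printed, not the number underneath. "America/New_York" is used here instead of "EST" as an assumption, to have a zone name that should be available everywhere.

```r
x_utc <- as.POSIXct("2011-01-01 14:00:00", tz = "UTC")

# Copy the same instant and change only the display zone
x_ny <- x_utc
attr(x_ny, "tzone") <- "America/New_York"

same_instant <- as.numeric(x_utc) == as.numeric(x_ny)
shown_utc <- format(x_utc, "%H:%M")
shown_ny <- format(x_ny, "%H:%M")

# Going to numeric and back with a UTC origin recovers the instant
round_trip <- as.POSIXct(as.numeric(x_utc), origin = "1970-01-01", tz = "UTC")

print(same_instant)           # TRUE
print(c(shown_utc, shown_ny)) # "14:00" "09:00"
```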
These time zone issues are always fiddly, but I think the problem is that your origin is being calculated in the wrong time zone (since the string only specifies the date).
Try using origin <- now - as.numeric(now).
Alternatively, use lubridate::origin, which is the string "1970-01-01 UTC".
A full solution, again using lubridate.
start <- now()
seq(start, start + days(3), by = "15 min")
I do not have an answer to your problem, but I do have an alternative way of creating vectors of POSIXct objects. If, for example, you want to create a vector of 1000 timestamps from now with a delta_t of 15 minutes:
now = Sys.time()
dt = 15 * 60 # in seconds
timestamps = now + seq(0, 1000) * dt
> head(timestamps)
[1] "2012-01-19 11:17:46 CET" "2012-01-19 11:32:46 CET"
[3] "2012-01-19 11:47:46 CET" "2012-01-19 12:02:46 CET"
[5] "2012-01-19 12:17:46 CET" "2012-01-19 12:32:46 CET"
The trick is you can add a vector of seconds to a POSIXct object.
An alternative to using seq.POSIXt is xts::timeBasedSeq, which allows you to specify the sequence as a string:
library(xts)
now <- Sys.time()
timeBasedSeq(paste("2012-01-01/",format(now),"/H",sep="")) # Hourly steps
timeBasedSeq(paste("2012-01-01/",format(now),"/d",sep="")) # Daily steps
You need to use seq(from=start,to=end, by=step). Note that in step you can either use "days" or an integer defining how many seconds elapse from item to item.
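For completeness, a minimal sketch of that call with a pinned time zone; the start and end values are made up for illustration.

```r
start <- as.POSIXct("2012-01-19 10:00:00", tz = "UTC")
end <- as.POSIXct("2012-01-19 11:00:00", tz = "UTC")

# Step given as a string ...
steps_chr <- seq(from = start, to = end, by = "15 min")

# ... or as a number of seconds
steps_num <- seq(from = start, to = end, by = 15 * 60)

print(format(steps_chr, "%H:%M", tz = "UTC"))
# "10:00" "10:15" "10:30" "10:45" "11:00"
```

Both forms should give the same five time stamps here. For day steps across a DST change, by = "DSTday" keeps the clock time fixed, whereas by = "day" steps exactly 24 hours.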
