Set units of difference between datetime objects - r

The diff function returns the differences between consecutive elements of a vector of R date-times. I'd like to control the units of the result, but they seem to be determined automatically, with no argument to set them. Here's an example:
> t = Sys.time()
> diff(c(t, t + 1))
Time difference of 1 secs
And yet:
> diff(c(t, t+10000))
Time difference of 2.777778 hours
The "time delta" object has a units attribute, but it seems silly to write a bunch of conditionals to coerce everything into days, seconds etc.

I'm not sure what you mean by "a bunch of conditionals". Just change the units manually.
> t = Sys.time()
> a <- diff(c(t,t+1))
> b <- diff(c(t, t+10000))
> units(a) <- "mins"
> units(b) <- "mins"
> a
Time difference of 0.01666667 mins
> b
Time difference of 166.6667 mins
See ?difftime. If you only need to use diff to get the difference between two times (rather than a longer vector), then, as Dirk suggests, use the difftime function with the units parameter.

A POSIXct object (which you created by calling Sys.time()) always stores the time as (fractional) seconds since the epoch.
The difftime() function merely formats the difference differently for your reading pleasure. If you actually specify the units, you get what you specified:
R> difftime(t+ 10000,t,unit="secs")
Time difference of 10000 secs
R> difftime(t+ 10000,t,unit="days")
Time difference of 0.115741 days
R>

I think you need difftime in which you can specify the desired units. See:
> difftime(Sys.time(), Sys.time()+10000)
Time difference of -2.777778 hours
> difftime(Sys.time(), Sys.time()+10000, units="secs")
Time difference of -10000 secs

Not sure how precise you care to be, but you can get very specific about date-times with the lubridate package. A wonky thing about time units is that their length depends on when they occur because of leap seconds, leap days, and other conventions.
Subtracting date-times gives a time difference (a difftime object in base R), which lubridate can convert into its own duration and period classes.
library(lubridate)
int <- Sys.time() - (Sys.time() + 10000)
You can then change it to a duration, which measures the exact length of time. Durations display in seconds because seconds are the only unit that has a consistent length. If you want your answer in a specific unit, just divide by a duration object that has the length of one of those units.
as.duration(int)
int / dseconds(1)
int / ddays(1)
int / dminutes(5) #to use "5 minutes" as a unit
Or you could just change the int to a period. Unlike durations, periods don't have an exact and consistent length. But they faithfully map clock times. You can do math by adding and subtracting both periods and durations to date-times.
as.period(int)
Sys.time() + dseconds(5) + dhours(2) - ddays(1)
Sys.time() + hours(2) + months(5) - weeks(1) #these are periods

If you need to use the diff() function (e.g. as I do within a ddply function), you can also convert the input data to numeric so that the differences always come back in seconds, like this:
> t = Sys.time()
> diff(as.numeric(c(t, t+1)))
[1] 1
> diff(as.numeric(c(t, t+10000)))
[1] 10000
From that point on you can use the diff seconds to calculate differences in other units.
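As a rough sketch of that last step, the seconds returned by diff() can be divided by the length of the target unit. The fixed start time and the variable names below are made up for illustration.

```r
# Illustrative, fixed timestamps (assumed values, not from the question)
t0 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
times <- t0 + c(0, 1, 10001)

# Differences of the underlying numbers are always in seconds
secs <- diff(as.numeric(times))
mins <- secs / 60
hours <- secs / 3600

# Alternative that keeps the difftime class: set the units after diff()
d <- diff(times)
units(d) <- "mins"
```

The numeric route is convenient inside grouped operations, where mixing automatically chosen units across groups would otherwise be easy to miss.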

Related

Why when subtracting datetime in R with grouped data, the results are in minutes rather than seconds?

I had thought that subtracting two POSIXct variables would give a result in seconds. However, I have found that when my data is grouped, the result comes back in minutes rather than seconds. Why would that be?
The reason that it chose minutes instead of seconds is that the default is to use a "suitable" set; from ?difftime (which is used for time1 - time2):
If 'units = "auto"', a suitable set of units is chosen, the
largest possible (excluding '"weeks"') in which all the absolute
differences are greater than one.
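A small sketch of how the automatic choice behaves in practice may help here. The reference time t0 and the offsets are arbitrary values chosen for illustration; note that for a vector of differences, the smallest absolute difference decides the unit for the whole vector.

```r
# Arbitrary reference time, chosen only for illustration
t0 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")

u_30s  <- units(difftime(t0 + 30, t0))          # under a minute
u_90s  <- units(difftime(t0 + 90, t0))          # over a minute, under an hour
u_2h   <- units(difftime(t0 + 7200, t0))        # over an hour, under a day
u_2d   <- units(difftime(t0 + 2 * 86400, t0))   # over a day
u_mix  <- units(difftime(t0 + c(30, 7200), t0)) # mixed: smallest difference wins

print(c(u_30s, u_90s, u_2h, u_2d, u_mix))
```

This is why the same subtraction can come back in seconds for one group and in minutes for another: each call picks its units from its own data.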
If you already have a vector vec of time differences, then you can force the units in a couple of ways.
vec <- Sys.time() - (Sys.time() - 8888)
vec
# Time difference of 2.468889 hours
If you want to preserve the difftime attribute (and therefore how it prints/formats), you can do
units(vec)
# [1] "hours"
units(vec) <- "secs"
vec
# Time difference of 8888 secs
or inline (functional, does not change vec):
`units<-`(vec, "secs")
# Time difference of 8888 secs
vec
# Time difference of 2.468889 hours
If you don't need to preserve the difftime attribute, then you can just do
as.numeric(vec, units = "secs")
# [1] 8888
This should work regardless of being in a mutate or a grouped mutate or whatever.

R: Turn timestamps into (as short as possible) integers

Edit 1: I think a possible solution would be to count the number of 15-minute intervals elapsed since a starting date. If anyone has thoughts on this, please come forward. Thanks
As the title says, I am looking for a way to turn timestamps into as small as possible integers.
Explanation of the situation:
I am working with "panelAR". I have T>N panel-data containing different timestamps that look like this (300,000 rows in total):
df$timestamp[1]
[1] "2013-08-01 00:15:00 UTC"
class(df$timestamp)
[1] "POSIXct" "POSIXt"
I am using panelAR and thus need the timestamp as an integer. I can't simply use "as.integer" because I would hit the max length for integers resulting in only NA's. This was my first try to work around this problem:
df$timestamp <- as.numeric(gsub("[: -]", "" , df$timestamp, perl=TRUE))
Then extract the digits starting at the 3rd position (because the leading "20" is irrelevant) and stop before the 2nd-to-last position (because all time stamps end at 00 seconds).
(I need shorter integers in order not to hit the maximum integer value in R.)
df$timestamp <- substr(df$timestamp, 3, nchar(df$timestamp)-2)
#Save as integer
df$timestamp <- as.integer(df$timestamp)
#Result
df$timestamp[1]
1308010015
This allows panelAR to work with it, but the numbers seem to be way too large. When I try to run a regression with it, I get the following error message:
"cannot allocate vector of size 1052.2 GB"
I am looking for a way to turn these timestamps into (as small as possible) integers in order to work with panelAR.
Any help is greatly appreciated.
Converting a POSIXct value with as.numeric gives a big number: the number of seconds elapsed since 1970-01-01 00:00:00 UTC. Do your time stamps have regular intervals? If they occur, say, every 15 minutes, you could divide all the integers by 900, which might help.
Another option is to pick your earliest date and subtract it from the others
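#generate some dates (the time zone is passed via tz; a "UTC" suffix inside the string is ignored):
a <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
b <- as.POSIXct("2013-08-01 00:15:00", tz = "UTC")
series <- seq(a, b, by = "min")
#calculate the difference from the earliest time stamp, explicitly in seconds
secs_since_start <- as.numeric(difftime(series, min(series), units = "secs"))
If you still get memory problems, I would combine both approaches.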
I managed to solve the main question. Since this still results in a memory error, I think it stems from the number of observations and the way panelAR computes things. I will open a separate question for that matter.
I used
df$timestampnew <- as.integer(difftime(df$timestamp, "2013-01-01 00:00:00", units = "min")/15)
to get integers that count the number of 15-min intervals elapsed since a certain date.
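A minimal, self-contained sketch of that interval count is shown below. The sample time stamps and the origin are assumptions picked for illustration; rounding before as.integer guards against floating-point values such as 3.9999999 being truncated.

```r
# Sample time stamps on a 15-minute grid (illustrative values)
ts <- as.POSIXct(
  c("2013-08-01 00:15:00", "2013-08-01 00:30:00", "2013-08-01 01:00:00"),
  tz = "UTC"
)

# Assumed origin; min(ts) would work as well and starts the index at 0
origin <- as.POSIXct("2013-08-01 00:00:00", tz = "UTC")

# Minutes since the origin, divided into 15-minute steps
mins_since_origin <- as.numeric(difftime(ts, origin, units = "mins"))
interval_index <- as.integer(round(mins_since_origin / 15))

print(interval_index)
```

Passing the origin as a POSIXct with an explicit tz avoids a silent offset when the data is in UTC but the session time zone is not.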

Get current time in milliseconds

I am trying to make an API call which requires a time in milliseconds. I am pretty new to R and have been Googling for hours to achieve something like what in Java would be:
System.currentTimeMillis();
The only thing I see is stuff like
Sys.Date() and Sys.time()
which return a formatted date instead of the time in millis.
I hope someone can give me a oneliner which solves my problem.
Sys.time does not return a "formatted time". It returns a POSIXct classed object, which is the number of seconds since the Unix epoch. Of course, when you print that object, it returns a formatted time. But how something prints is not what it is.
To get the current time in milliseconds, you just need to convert the output of Sys.time to numeric, and multiply by 1000.
R> print(as.numeric(Sys.time())*1000, digits=15)
[1] 1476538955719.77
Depending on the API call you want to make, you might need to remove the fractional milliseconds.
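One possible way to drop the fractional part is sketched below; the variable names are illustrative. The value is kept as a double because as.integer() would return NA here: the number exceeds R's 32-bit integer range.

```r
# Whole milliseconds since the epoch, kept as a double
now_ms <- floor(as.numeric(Sys.time()) * 1000)

# Fixed-notation string without decimals, e.g. for a URL parameter
now_ms_string <- sprintf("%.0f", now_ms)

print(now_ms_string)
```

Formatting with sprintf avoids scientific notation, which paste() or as.character() can produce for numbers of this size.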
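There is no need to set the global option digits.secs.
See ?strptime for details. Note that this gives the milliseconds elapsed within the current minute, not the milliseconds since the epoch.
# Seconds field of the current time (including the fraction), in milliseconds
# See ?strptime for details, specifically
# the formatting option %OSn, where 0 <= n <= 6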
as.numeric(format(Sys.time(), "%OS3")) * 1000
To get the current epoch time (in seconds):
as.numeric(Sys.time())
If you want to get the time difference (for computing a duration, for example), just subtract the Sys.time() values directly and you will get a difftime object that prints nicely:
currentTs <- Sys.time()
# about five seconds later
elapsed <- Sys.time() - currentTs
print(elapsed) # Time difference of 4.926194 secs
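Because the units of such a difference are chosen automatically, a long-running task may report minutes instead of seconds. A small sketch that pins the unit follows; the workload line is only a stand-in and the names are illustrative.

```r
start_time <- Sys.time()

# Stand-in workload (assumption: replace with the code being timed)
invisible(sum(sqrt(seq_len(1e5))))

# Force seconds regardless of how long the work took
elapsed <- difftime(Sys.time(), start_time, units = "secs")
elapsed_secs <- as.numeric(elapsed)
elapsed_ms <- elapsed_secs * 1000

print(elapsed)
```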

R difftime subtracts 2 days

I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
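# anchor the day count at the start so a greedy '.*' cannot swallow leading digits
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
# escape the dot so it matches the literal decimal point (fractional seconds are dropped)
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))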
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
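The same idea can be sketched with a single regular expression. The helper name parse_timedelta is hypothetical, and the pattern assumes non-negative strings of the form "N days HH:MM:SS[.fraction]" as in the question; anything else yields NA.

```r
# Hypothetical helper: parse "N days HH:MM:SS[.fraction]" into a difftime
parse_timedelta <- function(x) {
  pattern <- "^([0-9]+) days? ([0-9]+):([0-9]+):([0-9]+(\\.[0-9]+)?)$"
  parts <- regmatches(x, regexec(pattern, x))
  secs <- vapply(parts, function(p) {
    if (length(p) == 0) return(NA_real_)  # string did not match the pattern
    as.numeric(p[2]) * 86400 +            # days
      as.numeric(p[3]) * 3600 +           # hours
      as.numeric(p[4]) * 60 +             # minutes
      as.numeric(p[5])                    # seconds, including any fraction
  }, numeric(1))
  as.difftime(secs, units = "secs")
}

tdelta <- parse_timedelta(c("26 days 04:53:36.000000000",
                            "36 days 04:53:36.000000000"))
print(tdelta)
```

Returning a difftime keeps the units attached, so units(tdelta) <- "days" converts the result without further arithmetic.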

Convert a date string "yyyy-mm-dd" to milliseconds since epoch

I have some numbers that represent dates in milliseconds since epoch, 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970
1365368400000,
1365973200000,
1366578000000
I'm converting them to date format:
as.Date(as.POSIXct(my_dates/1000, origin="1970-01-01", tz="GMT"))
answer:
[1] "2013-04-07" "2013-04-14" "2013-04-21"
How to convert these strings back to milliseconds since epoch?
Here are your javascript dates
x <- c(1365368400000, 1365973200000, 1366578000000)
You can convert them to R dates more easily by dividing by the number of milliseconds in one day.
y <- as.Date(x / 86400000, origin = "1970-01-01")
To convert back, just convert to numeric and multiply by this number.
z <- as.numeric(y) * 86400000
Finally, check that the answer is what you started with.
stopifnot(identical(x, z))
As per the comment, you may sometimes get numerical rounding errors leading to x and z not being identical. For numerical comparisons like this, use:
library(testthat)
expect_equal(x, z)
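If the starting point is the character strings themselves, a minimal sketch is to parse them as midnight UTC and scale to milliseconds. The time of day in the original numbers is not part of a "yyyy-mm-dd" string, so the result refers to midnight rather than to the original instants.

```r
date_strings <- c("2013-04-07", "2013-04-14", "2013-04-21")

# Route 1: parse as POSIXct at midnight UTC, seconds -> milliseconds
ms_posix <- as.numeric(as.POSIXct(date_strings, tz = "UTC")) * 1000

# Route 2: parse as Date, days -> milliseconds
ms_date <- as.numeric(as.Date(date_strings)) * 86400000

print(format(ms_posix, scientific = FALSE))
```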
I will provide a simple framework for handling various kinds of date encodings and for going back and forth between them. The R package ‘lubridate’ makes this very easy with its period and interval classes.
When dealing with days, it is easy: as.numeric(Date) gives the number of days since the epoch. To get any unit of time smaller than a day, multiply by the appropriate factor (24 for hours, 24 * 60 for minutes, etc.). For months, however, the math gets trickier, so in many instances I prefer this method.
library(lubridate)
as.period(interval(start = epoch, end = Date), unit = 'month')#month
This can be used for year, month, day, hour, minute, and smaller units by applying the factors.
Going the other way, such as being given months since the epoch:
library(lubridate)
epoch %m+% as.period(Date, unit = 'months')
I presented this approach with months as it might be the more complicated one. An advantage to using period and intervals is that it can be adjusted to any epoch and unit very easily.
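For comparison, here is a base-R sketch of the same month arithmetic that does not rely on lubridate. Both helper names are hypothetical, and the sketch counts whole calendar months from 1970-01-01, ignoring the day of the month.

```r
# Hypothetical helper: whole calendar months between 1970-01 and the date's month
months_since_epoch <- function(d) {
  lt <- as.POSIXlt(d, tz = "UTC")
  (lt$year - 70) * 12 + lt$mon  # $year counts from 1900, $mon from 0
}

# Hypothetical helper: first day of the month that lies n months after the epoch
date_from_months <- function(n) {
  seq(as.Date("1970-01-01"), by = paste(n, "months"), length.out = 2)[2]
}

n_months <- months_since_epoch(as.Date("2013-04-07"))
month_start <- date_from_months(n_months)
print(n_months)
print(month_start)
```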
