NA difference in as.difftime in R

It might look like a duplicate of Find time difference in days with R, but I don't think it is.
The problem is simple. I have two time stamps (format='%H:%M:%S'):
times <- c('02:51:43', '02:45:52')
and I want to calculate the time difference, however my attempt results with an unwanted behaviour:
as.difftime(times[1], times[2])
# Time difference of NA secs
I tried to specify the format along with units = 'secs', but I get an error that the argument time2 is unused.
Can someone give me a hint as to where I'm making a mistake?
(Sorry in advance, I'm not even sure whether this is reproducible.)

We can convert the times into POSIXct format and then subtract:
x1 <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")
x1[1] - x1[2]
#Time difference of 5.85 mins
which is also equivalent to
difftime(x1[1], x1[2])
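The NA arises because as.difftime()'s second positional argument is format, not a second time, so times[2] is silently consumed as a format string that matches nothing. With a single time string and an explicit format it behaves as expected; a quick sketch:
as.difftime(times[1], format = "%H:%M:%S", units = "secs")
# Time difference of 10303 secs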

I also encountered this problem, and re-assigning the date class to the objects made it work. Suppose I have two Date objects:
day1 <- as.Date('2018-12-31')
day2 <- as.Date('2019-12-31')
When the 'Time difference of NA secs' problem occurred, I simply did this:
day1 <- as.Date(day1)
day2 <- as.Date(day2)
Then it worked fine:
difftime(day2,day1,units="days")
#Time difference of 365 days
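Equivalently, subtracting the Date objects directly returns a difftime in days:
day2 - day1
# Time difference of 365 days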
Hope this helps.

R: Turn timestamps into (as short as possible) integers

As the title says, I am looking for a way to turn timestamps into as small as possible integers.
Explanation of the situation:
I am working with "panelAR". I have T>N panel-data containing different timestamps that look like this (300,000 rows in total):
df$timestamp[1]
[1] "2013-08-01 00:15:00 UTC"
class(df$timestamp)
[1] "POSIXct" "POSIXt"
I am using panelAR and thus need the timestamp as an integer. I can't simply use as.integer() because I would exceed the maximum integer value, resulting in only NAs. This was my first attempt to work around the problem:
df$timestamp <- as.numeric(gsub("[: -]", "" , df$timestamp, perl=TRUE))
Take the substring starting at the 3rd position (because the leading "20" is redundant) and stopping before the 2nd-to-last position (because all timestamps end at 00 seconds); I need shorter numbers in order not to exceed R's integer maximum:
df$timestamp <- substr(df$timestamp, 3, nchar(df$timestamp)-2)
#Save as integer
df$timestamp <- as.integer(df$timestamp)
#Result
df$timestamp[1]
[1] 1308010015
This allows panelAR to work with it, but the numbers seem far too large. When I try to run a regression with it, I get the following error message:
"cannot allocate vector of size 1052.2 GB"
I am looking for a way to turn these timestamps into (as small as possible) integers in order to work with panelAR.
Any help is greatly appreciated.
Edit 1: I think a possible solution would be to count the number of 15-minute intervals elapsed since a starting date. If anyone has thoughts on this, please come forward. Thanks
The big number you get from converting a POSIXct with as.numeric() corresponds to the number of seconds elapsed since 1970-01-01 00:00:00. Do your timestamps have regular intervals? If they are, say, every 15 minutes, you could divide all the integers by 900, which might help.
Another option is to pick your earliest date and subtract it from the others
#generate some dates:
a <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
b <- as.POSIXct("2013-08-01 00:15:00", tz = "UTC")
series <- seq(a, b, by = 'min')
#calculate the difference from the earliest timestamp (explicit units, in seconds)
secs_elapsed <- as.numeric(difftime(series, min(series), units = "secs"))
If you still get memory problems, you could combine both approaches.
I managed to solve the main question. Since this still results in a memory error, I think it stems from the number of observations and the way panelAR computes things; I will open a separate question for that.
I used
df$timestampnew <- as.integer(difftime(df$timestamp, "2013-01-01 00:00:00", units = "min")/15)
to get integers that count the number of 15-min intervals elapsed since a certain date.
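As a quick sanity check of that formula (assuming the timestamps are parsed as UTC): 2013-08-01 00:15:00 lies 212 full days, i.e. 212 * 96 fifteen-minute intervals, plus one more interval after 2013-01-01 00:00:00.
ts <- as.POSIXct("2013-08-01 00:15:00", tz = "UTC")
origin <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
as.integer(difftime(ts, origin, units = "min") / 15)
# [1] 20353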

Calculating Time Differences using xts in R

I was wondering if there was a way to calculate time differences using the xts package without having to convert time values etc. if possible. I have an xts object with a time format given as 2010-02-15 13:35:59.123 (where the .123 is the milliseconds).
Now, I would like to find the number of milliseconds until the end of the day (i.e. 17:00:00). The problem is that I first have to do a few conversions of the data (such as with as.POSIXct), and this gets more complicated since I have to do it for several different days and possibly different times. For this reason, I would prefer not to convert the "end of day" time and leave it as 17:00:00, so that finding the number of milliseconds between the present time and the end of day is a fairly simple operation such as 17:00:00.000 - 13:35:59.123 = ...
Is there a simple way to do this with minimal conversions? I'm certain xts has a function which I don't know of but I couldn't find anything in the documentation :/
EDIT: I forgot to mention that I first tried the more 'straightforward' route with as.POSIXct(16:00:00, format = "%H:%M:%S"), but this gives an error, and I'm honestly not sure why...
You should be able to do this using a combination of ave(), .indexDate(), and a custom function. You didn't provide a reproducible example, so here's one using the daily data that comes with xts.
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
secsRemaining <- function(x) { end(x) - index(x) }
tdiff <- ave(x[,1], as.yearmon(index(x)), FUN = secsRemaining)
tdiff[86:92,]
# Open
# 2007-03-28 259200
# 2007-03-29 172800
# 2007-03-30 86400
# 2007-03-31 0
# 2007-04-01 2505600
# 2007-04-02 2419200
# 2007-04-03 2332800
In your case, the call would use .indexDate(x) instead of as.yearmon(index(x)).
tdiff <- ave(x[,1], .indexDate(x), FUN = secsRemaining)
Also note that this call to ave() only works on a 1-column xts object (it seems like a bug that it doesn't work on multi-column objects). And you have to use FUN = with ave(), since the FUN argument comes after ... in its signature.
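For the intraday case in the question, a minimal sketch without any xts helpers (assuming UTC for concreteness): build each observation's 17:00:00 cutoff from its date part and subtract.
idx <- as.POSIXct("2010-02-15 13:35:59.123", tz = "UTC")
eod <- as.POSIXct(paste(format(idx, "%Y-%m-%d"), "17:00:00"), tz = "UTC")
as.numeric(difftime(eod, idx, units = "secs")) * 1000  # milliseconds until 17:00:00
# [1] 12240877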

R difftime subtracts 2 days

I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read some timedelta objects processed with Python into R and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero-padded decimal number, so it won't work with numbers greater than 31, and there doesn't seem to be an option for cumulative days (probably because strptime is meant for date-time objects rather than timedelta objects).
My solution was to convert the objects to strings, then extract the numerical components as Gregor suggested, using gsub:
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
# anchor the day count at the start of the string (a leading greedy .* would capture only the last digit)
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
seconds <- as.numeric(gsub('^.*:([0-9]+)..*$','\\1',data$tdelta))
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
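A more compact sketch under the same assumptions (a day count followed by ' days ' and an %H:%M:%S clock part): split the string and let as.difftime() handle the clock portion, adding the day count on top; strptime ignores the trailing fractional seconds.
tdelta <- '26 days 04:53:36.000000000'
parts <- strsplit(tdelta, ' days ')[[1]]
clock <- as.difftime(parts[2], format = '%H:%M:%S', units = 'secs')
as.numeric(parts[1]) * 86400 + as.numeric(clock)
# [1] 2264016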
That should allow you to do computations with the time differences. Hope that helps.

Converting time format to numeric with R

In most cases we convert numeric time to POSIXct format using R. However, if we want to compare two time points, we would prefer the numeric format. For example, I have a date-time like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric (e.g., the Julian time), perhaps something like the seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a date-time class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) + 1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
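The conversion also round-trips: supplying the epoch origin turns the number back into the same timestamp (the value shown assumes begin is parsed as UTC; in another timezone it differs by the UTC offset):
secs <- as.numeric(as.POSIXct(begin, tz = "UTC"))
secs
# [1] 984479460
as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
# [1] "2001-03-13 10:31:00 UTC"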
Maybe this could also work:
library(lubridate)
...
x <- '24:00:00'
as.numeric(hms(x))
# [1] 86400
hms() parses a 'hours:minutes:seconds' string into a period object, and as.numeric() then gives its length in seconds. See the full documentation.
I tried this because I had trouble with data in this format that ran over 24 hours.
The example from the ?as.POSIXct help page gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))

R: Aggregating by dates with POSIXct?

I have some zoo series that use POSIXct index.
In order to aggregate by days I've tried these two ways:
aggregate(myzoo, format(index(myzoo), "%Y-%m-%d"), sum)
aggregate(myzoo, as.Date(index(myzoo)), sum)
I don't know why they don't give the same output.
The myzoo series had the weekends removed. The "as.Date way" seems to be OK, but the "format way" aggregation gives me data on the weekends.
Why? Which one is right?
I've even tried as.POSIXct(format(...)).
As I mentioned in my comment, you need to be careful when changing the format of a timestamp that includes time with a time zone, because it can get shifted between days. Without any data, it's hard to say exactly what your problem is, but you might also try apply.daily from xts:
apply.daily(myzoo, sum)
Here's a working example:
> x <- zoo(2:20, as.POSIXct("2003-02-01") + (2:20) * 7200)
> apply.daily(x, sum)
2003-02-01 22:00:00 2003-02-02 16:00:00
                 65                 144
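To see how a single observation can land on different days under the two groupings, here is a small sketch (the timestamp and timezone are made up for illustration): a Friday-evening time in New York is already Saturday in UTC, so grouping by format() of the local index and grouping by the UTC date disagree.
idx <- as.POSIXct("2010-02-19 20:00:00", tz = "America/New_York")  # a Friday evening
format(idx, "%Y-%m-%d")   # "2010-02-19" -- still Friday in local time
as.Date(idx, tz = "UTC")  # "2010-02-20" -- already Saturday in UTC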
