R: Turn timestamps into (as short as possible) integers

Edit 1: I think a possible solution would be to count the number of 15-minute intervals elapsed since a starting date. If anyone has thoughts on this, please come forward. Thanks
As the title says, I am looking for a way to turn timestamps into as small as possible integers.
Explanation of the situation:
I am working with "panelAR". I have T>N panel-data containing different timestamps that look like this (300,000 rows in total):
df$timestamp[1]
[1] "2013-08-01 00:15:00 UTC"
class(df$timestamp)
[1] "POSIXct" "POSIXt"
I am using panelAR and thus need the timestamp as an integer. I can't simply use "as.integer" because I would hit the max length for integers resulting in only NA's. This was my first try to work around this problem:
df$timestamp <- as.numeric(gsub("[: -]", "" , df$timestamp, perl=TRUE))
Extract the characters starting at the 3rd position (because the leading "20" is irrelevant) and stopping before the last two (because every timestamp ends at 00 seconds)
(I need shorter integers so as not to exceed the maximum integer value in R)
df$timestamp <- substr(df$timestamp, 3, nchar(df$timestamp)-2)
#Save as integer
df$timestamp <- as.integer(df$timestamp)
#Result
df$timestamp[1]
1308010015
This allows panelAR to work with it, but the numbers seem to be way too large. When I try to run a regression with it, I get the following error message:
"cannot allocate vector of size 1052.2 GB"
I am looking for a way to turn these timestamps into (as small as possible) integers in order to work with panelAR.
Any help is greatly appreciated.

The big number that you get from as.integer corresponds to the number of seconds elapsed since 1970-01-01 00:00:00. Do your timestamps have regular intervals? If they occur, say, every 15 minutes, you could divide all integers by 900, and it might help.
Another option is to pick your earliest date and subtract it from the others
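#generate some dates (the time zone is passed via tz; a "UTC" suffix inside the string is ignored):
a <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
b <- as.POSIXct("2013-08-01 00:15:00", tz = "UTC")
series <- seq(a, b, by = 'min')
#calculate the difference in seconds (units set explicitly rather than chosen automatically)
elapsed <- as.numeric(difftime(series, min(series), units = "secs"))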
If you still get memory problems, I might combine both.
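Combining the two suggestions might look like the sketch below. It assumes the timestamps fall exactly on a 15-minute grid; the sample vector `ts` is made up for illustration and is not the asker's data.

```r
# Hypothetical sample data on a 15-minute grid (not from the question)
ts <- as.POSIXct(c("2013-08-01 00:15:00", "2013-08-01 00:30:00",
                   "2013-08-01 01:00:00"), tz = "UTC")

secs <- as.numeric(ts)                        # seconds since 1970-01-01 00:00:00 UTC
idx  <- as.integer((secs - min(secs)) / 900)  # 900 s = one 15-minute interval
idx
```

The earliest timestamp maps to 0 and gaps in the series are preserved as gaps in the index.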

I managed to solve the main question. Since this still results in a memory error, I think it stems from the number of observations and the way panelAR computes things. I will open a separate question for that matter.
I used
df$timestampnew <- as.integer(difftime(df$timestamp, "2013-01-01 00:00:00", units = "min")/15)
to get integers that count the number of 15-min intervals elapsed since a certain date.
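One caveat, offered as a sketch rather than a fix: when the second argument of difftime() is a character string, it is parsed in the session's local time zone, which may not match UTC data. Passing the origin as a POSIXct with an explicit tz avoids that. The data frame below is a made-up stand-in for the real one, and the check that every value lies on a 15-minute boundary is my own addition.

```r
# Hypothetical stand-in for the real data frame
df <- data.frame(
  timestamp = as.POSIXct(c("2013-08-01 00:15:00", "2013-08-01 00:45:00"), tz = "UTC")
)

# Origin with an explicit time zone, so it is not parsed in local time
origin <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")

# Number of 15-minute intervals elapsed since the origin
quarters <- as.numeric(difftime(df$timestamp, origin, units = "mins")) / 15

# Assumption: every timestamp lies exactly on a 15-minute boundary
stopifnot(all(quarters == round(quarters)))

df$timestampnew <- as.integer(round(quarters))
```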

Related

NA difference in as.difftime R

It might seem as if it is duplicate of Find time difference in days with R but I guess it is not.
The problem is simple. I have two time stamps (format='%H:%M:%S'):
times <- c('02:51:43', '02:45:52')
and I want to calculate the time difference, however my attempt results with an unwanted behaviour:
as.difftime(times[1], times[2])
# Time difference of NA secs
I tried to specify format along with the units='secs', but I get the error that the argument time2 is not used.
Can someone give me a hint where I make a mistake?
(Sorry in advance, but I ain't even sure if it is reproducible.)
We can convert the times into POSIXct format and then subtract
x1 <- as.POSIXct(times, format = "%H:%M:%S", tz = "UTC")
x1[1] - x1[2]
#Time difference of 5.85 mins
which is also equivalent to
difftime(x1[1], x1[2])
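For completeness, as.difftime() can also parse the strings itself. Its second argument is `format`, not a second time, which is why the call in the question returned NA. A minimal sketch:

```r
times <- c('02:51:43', '02:45:52')

# Parse each string as an offset from midnight, expressed in seconds
d <- as.difftime(times, format = "%H:%M:%S", units = "secs")

# Subtracting two difftime values yields another difftime
d[1] - d[2]
```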
I also encountered this problem, and I assigned the date-time class to the object again then it worked.
Suppose I have 2 date-time objects:
day1<-as.Date('2018-12-31')
day2<-as.Date('2019-12-31')
But this 'Time difference of NA secs' occurred, so I simply do this:
day1<-as.Date(day1)
day2<-as.Date(day2)
Then it worked fine:
difftime(day2,day1,units="days")
#Time difference of 365 days
Hope this helps.

How to convert times over 24:00:00 in R

In R I have this data.frame
24:43:30 23:16:02 14:05:44 11:44:30 ...
Note that some of the times are over 24:00:00 ! In fact all my times are within 02:00:00 to 25:59:59.
I want to subtract all entries in my dataset data with 2 hours. This way I get a regular data-set. How can I do this?
I tried this
strptime(data, format="%H:%M:%S") - 2*60*60
and this works for all entries up to 23:59:59. For all entries above that I simply get NA, since strptime produces NA for any time above 23:59:59.
Using lubridate package can make the job easier!
> library(lubridate)
> t <- '24:43:30'
> hms(t) - hms('2:0:0')
[1] "22H 43M 30S"
Update:
Converting the date back to text!
> substr(strptime(hms(t) - hms('2:0:0'),format='%HH %MM %SS'),12,20)
[1] "22:43:30"
Adding @RHertel's update:
format(strptime(hms(t) - hms('2:0:0'),format='%HH %MM %SS'),format='%H:%M:%S')
Better way of formatting the lubridate object:
s <- hms('02:23:58') - hms('2:0:0')
paste(hour(s),minute(s),second(s),sep=":")
"0:23:58"
Although the answer by @amrrs solves the main problem, the formatting could remain an issue because hms() does not provide a uniform output. This is best shown with an example:
library(lubridate)
hms("01:23:45")
#[1] "1H 23M 45S"
hms("00:23:45")
#[1] "23M 45S"
hms("00:00:45")
#[1] "45S"
Depending on the time passed to hms() the output may or may not contain an entry for the hours and for the minutes. Moreover leading zeros are omitted in single-digit values of hours, minutes and seconds. This can result pretty much in a formatting nightmare if one tries to put that data into a common form.
To resolve this difficulty one could first convert the time into a duration with lubridate's as.duration() function. Then, the duration in seconds can be transformed into a POSIXct object from which the hours, minutes, and seconds can be extracted easily with format():
times <- c("24:43:30", "23:16:02", "14:05:44", "11:44:30", "02:00:12")
shifted_times <- hms(times) - hms("02:00:00")
format(.POSIXct(as.duration(shifted_times),tz="GMT"), "%H:%M:%S")
#[1] "22:43:30" "21:16:02" "12:05:44" "09:44:30" "00:00:12"
The last entry "02:00:12" would have caused difficulties if shifted_times had been passed to strptime().
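If pulling in lubridate is not desirable, the same shift can be sketched in base R by converting each string to seconds by hand. This is an alternative to the approach above, not part of it; the names `parts`, `secs` and `shifted` are mine.

```r
times <- c("24:43:30", "23:16:02", "14:05:44", "11:44:30", "02:00:12")

# Split "HH:MM:SS" into a numeric matrix with one row per time
parts <- do.call(rbind, lapply(strsplit(times, ":", fixed = TRUE), as.numeric))

# Total seconds, shifted back by two hours
secs <- as.vector(parts %*% c(3600, 60, 1)) - 2 * 3600

# Rebuild zero-padded HH:MM:SS strings
shifted <- sprintf("%02d:%02d:%02d",
                   secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
shifted
```

Because nothing is parsed as a clock time, hours of 24 or more cause no trouble, and the zero padding is uniform.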

R difftime subtracts 2 days

I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
# note: a leading greedy '.*' would swallow all but the last digit of the day count
days <- as.numeric(gsub('^[^0-9]*([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
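The same idea can be sketched with a single pattern and turned back into a difftime object. The sketch assumes every string has the form "N days HH:MM:SS.fffffffff" shown in the question; strings in any other form would not match and would need separate handling. The vector `td` is sample input.

```r
# Sample input in the format shown in the question
td <- c("26 days 04:53:36.000000000", "36 days 04:53:36.000000000")

# Capture days, hours, minutes and (fractional) seconds in one pass
pattern <- "^([0-9]+) days? ([0-9]+):([0-9]+):([0-9.]+)$"
m <- regmatches(td, regexec(pattern, td))

# Drop the full match, keep the four captured groups as numbers
parts <- do.call(rbind, lapply(m, function(x) as.numeric(x[-1])))

# Weight each component by its length in seconds
secs <- as.vector(parts %*% c(86400, 3600, 60, 1))

tdelta <- as.difftime(secs, units = "secs")
units(tdelta) <- "days"
tdelta
```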

Creating a specific sequence of date/times in R

I want to create a single column with a sequence of date/time increasing every hour for one year or one month (for example). I was using a code like this to generate this sequence:
start.date<-"2012-01-15"
start.time<-"00:00:00"
interval<-60 # 60 minutes
increment.mins<-interval*60
x<-paste(start.date,start.time)
for(i in 1:365){
print(strptime(x, "%Y-%m-%d %H:%M:%S")+i*increment.mins)
}
However, I am not sure how to specify the range of the sequence of dates and hours. Also, I have been having problems dealing with the first hour "00:00:00"? Not sure what is the best way to specify the length of the date/time sequence for a month, year, etc? Any suggestion will be appreciated.
I would strongly recommend you to use the POSIXct datatype. This way you can use seq without any problems and use those data however you want.
start <- as.POSIXct("2012-01-15")
interval <- 60
end <- start + as.difftime(1, units="days")
seq(from=start, by=interval*60, to=end)
Now you can do whatever you want with your vector of timestamps.
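To cover a month or a year rather than a single day, one option is to let seq() compute the end point too. The sketch below assumes UTC to sidestep daylight-saving jumps. It also formats the result explicitly, since a POSIXct value that falls exactly on midnight may print without its time part, which could be the "00:00:00" problem mentioned in the question.

```r
start <- as.POSIXct("2012-01-15 00:00:00", tz = "UTC")

# One calendar month later; use by = "1 year" for a full year
end <- seq(start, by = "1 month", length.out = 2)[2]

# Hourly timestamps from start to end, both inclusive
hourly <- seq(from = start, to = end, by = "hour")

# Explicit formatting keeps "00:00:00" visible at midnight
stamps <- format(hourly, "%Y-%m-%d %H:%M:%S")
head(stamps, 3)
```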
Try this. mondate is very clever about advancing by a month. For example, it will advance the last day of Jan to the last day of Feb, whereas other date/time classes tend to overshoot into Mar. chron does not use time zones, so you can't get the time zone bugs that can occur in code using POSIXct. Here x is from the question.
library(chron)
library(mondate)
start.time.num <- as.numeric(as.chron(x))
# +1 means one month. Use +12 if you want one year.
end.time.num <- as.numeric(as.chron(paste(mondate(x)+1, start.time)))
# 1/24 means one hour. Change as needed.
hours <- as.chron(seq(start.time.num, end.time.num, 1/24))

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric (e.g., the Julian time), perhaps something like the seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit then the answer is clearly what @joran wrote, except that you would need first to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
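Because the result depends on the time zone used for parsing, it can help to state the zone explicitly. The sketch below assumes the string is meant as UTC, which is an assumption of mine and not something the question specifies; it also shows the round trip back to a date-time.

```r
begin <- "2001-03-13 10:31:00"

# Seconds since 1970-01-01 00:00:00 UTC, with the string read as UTC
secs_utc <- as.numeric(as.POSIXct(begin, tz = "UTC"))
secs_utc

# Convert back to a date-time to confirm the round trip
back <- as.POSIXct(secs_utc, origin = "1970-01-01", tz = "UTC")
format(back, "%Y-%m-%d %H:%M:%S")
```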
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() will convert your data from one time format into another, this will let you convert it into seconds. See full documentation.
I tried this because I had trouble with data that was in that format but went over 24 hours.
The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))
