Binning time series in R?

I'm new to R. My data has 600k records described by three attributes: Id, Date and TimeOfCall.
TimeOfCall is in HH:MM:SS format and ranges from 00:00:00 to 23:59:59.
I want to bin the TimeOfCall attribute into 24 bins, each representing a one-hour slot (the first bin from 00:00:00 to 00:59:59, and so on).
Can someone talk me through how to do this? I tried using cut(), but apparently my data is not numeric. Thanks in advance!

While you could convert to a formal time representation, in this case it might be easier to just use substr:
test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1] 0 2 22
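Since the extracted hour is already an integer from 0 to 23, cut() can turn it into the 24 hourly bins the question asks for. This is a minimal base-R sketch. The bin labels are an assumption chosen to mirror the question's "00:00:00 to 00:59:59" slots.

```r
test <- c("00:00:01", "02:07:01", "22:30:15")
hour <- as.integer(substr(test, 1, 2))

# 25 break points (0, 1, ..., 24) define 24 half-open intervals [h, h + 1),
# so hour 0 lands in the first bin and hour 23 in the last
bins <- cut(hour, breaks = 0:24, right = FALSE,
            labels = sprintf("%02d:00:00-%02d:59:59", 0:23, 0:23))

table(bins)  # counts per hourly slot, empty slots included
```

Because cut() is vectorized, the same two lines work on a whole 600k-element TimeOfCall column without a loop.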
Using a POSIXct time would also work, and might be handy if you plan on further calculations (e.g., time differences):
testtime <- as.POSIXct(test,format="%H:%M:%S")
#[1] "2013-12-09 00:00:01 EST" "2013-12-09 02:07:01 EST" "2013-12-09 22:30:15 EST"
as.numeric(format(testtime,"%H"))
#[1] 0 2 22
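Applied to a data frame shaped like the one in the question, the POSIXct route can parse Date and TimeOfCall together and then count calls per hour. This is a sketch: the data frame name `calls` and its values are hypothetical, and the Date column is assumed to be in YYYY-MM-DD format. Parsing in UTC is a deliberate choice, because in a local time zone some clock times do not exist on daylight-saving days and would come back as NA.

```r
# Hypothetical sample with the question's columns (Id, Date, TimeOfCall)
calls <- data.frame(
  Id = 1:5,
  Date = c("2013-12-01", "2013-12-01", "2013-12-02", "2013-12-02", "2013-12-03"),
  TimeOfCall = c("00:15:00", "00:59:59", "01:00:00", "13:45:10", "23:59:59"),
  stringsAsFactors = FALSE
)

# Parse date and time together; malformed strings become NA instead of a wrong hour
stamp <- as.POSIXct(paste(calls$Date, calls$TimeOfCall),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXlt exposes the hour (0-23) directly; fixing the levels keeps all 24 bins
calls$Hour <- factor(as.POSIXlt(stamp)$hour, levels = 0:23)
counts <- table(calls$Hour)
```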

You can use the cut.POSIXt method, but you have to coerce your data to a valid time object first. Here I use lubridate's handy hms() for that, and strftime() to format the result as a time of day.
library(lubridate)
x <- c("09:10:01", "08:10:02", "08:20:02","06:10:03 ", "Collided at 9:20:04 pm")
x.h <- strftime(cut(as.POSIXct(hms(x),origin=Sys.Date()),'hours'),
format='%H:%M:%S')
data.frame(x,x.h)
x x.h
1 09:10:01 10:00:00
2 08:10:02 09:00:00
3 08:20:02 09:00:00
4 06:10:03 07:00:00
5 Collided at 9:20:04 pm 22:00:00
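cut() with hourly breaks labels each interval by its start, so 09:10:01 would be expected to fall into 09:00:00; the consistent one-hour shift in the output above most likely comes from a time-zone mismatch between the Sys.Date() origin (midnight UTC) and the local zone used for printing. A base-R sketch without lubridate, assuming all inputs are clean HH:MM:SS strings (free text such as "Collided at 9:20:04 pm" is not handled). Explicit breaks also guarantee all 24 bins, whereas breaks = "hour" only spans the range present in the data.

```r
x <- c("09:10:01", "08:10:02", "08:20:02", "06:10:03")

# Attach an arbitrary fixed date and work in UTC, so no offset or DST shift applies
t <- as.POSIXct(paste("1970-01-01", x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# 25 break points from 00:00 to 24:00 give 24 one-hour bins [h, h + 1)
brks <- seq(as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 25)
x.h <- cut(t, breaks = brks, labels = sprintf("%02d:00:00", 0:23))

data.frame(x, x.h)
```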

Related

Convert HH:MM:SS column to ZT time

I am hoping somebody can help me with a logic question in RStudio. I have a rather large data set with "Time" as one of its columns. This column has values from 00:00:00 to 23:59:00 in HH:MM:SS format.
Because I have had trouble analysing times in this format, I am trying to create a new column called "ZT" that converts this time column to ZT (zeitgeber time). Lights turn on at 7 am, so 07:00:00 should correspond to ZT = 0, 07:01:00 to ZT = 0.016..., and so on.
Can anybody help me with this? It would be much appreciated!
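One reading of "07:01:00 corresponds to ZT = 0.016" is ZT in decimal hours: one minute after lights-on is 1/60 ≈ 0.0167 h. Under that assumption, plus lights-on at 07:00:00 and times before 07:00 belonging to the end of the previous cycle, a vectorized base-R sketch is:

```r
time <- c("07:00:00", "07:01:00", "12:30:00", "06:59:00")

# Split "HH:MM:SS" into a numeric matrix with one row per time
parts <- matrix(as.numeric(unlist(strsplit(time, ":"))), ncol = 3, byrow = TRUE)
secs <- parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]

# Shift so 07:00:00 is zero, wrap around midnight with %%, convert to hours
ZT <- ((secs - 7 * 3600) %% 86400) / 3600
round(ZT, 3)
```

With these inputs, 07:01:00 maps to about 0.017 and 06:59:00 to about 23.983.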
I'm not sure if this is what you are going for, but this seems to work for converting a character vector of times in HH:MM:SS format to ZT time, also in HH:MM:SS format, with 7 am as 00:00:00.
I'm not sure exactly what you mean by 07:01:00 corresponding to ZT=0.016, but maybe this can be a start.
Fair warning: this is a little slow (it took about 1 minute on my machine), but maybe someone else can help vectorize it and speed it up:
#Make Some Fake Data
df<-data.frame(Time=format(seq(ISOdate(2020,1,1), ISOdate(2020,2,1), by = "min"), '%H:%M:00'), Variable1=runif(n=44641))
#We need the help of an external package to handle time in HH:MM:SS format
library(lubridate)
time_store<- hms(df$Time) #Convert your times to HMS format
ZT_vec<-vector() #Create an empty vector that we will fill in
for (i in 1:length(time_store)){ #iterate over each observation
if (hour(time_store[i])<7){ #Make sure the conversions are going the right direction
ZT<-time_store[i]+hours(17)
ZT_vec<-c(ZT_vec,sprintf("%02d:%02d:%02d", hour(ZT), minute(ZT), second(ZT))) #format the times in HH:MM:SS
} else {
ZT<-time_store[i]-hours(7)
ZT_vec<-c(ZT_vec,sprintf("%02d:%02d:%02d", hour(ZT), minute(ZT), second(ZT)))
}
}
df<-cbind(df,ZT_vec) #Bind on our new column
head(df)
Time Variable1 ZT_vec
12:00:00 0.6560604 05:00:00
12:01:00 0.3485023 05:01:00
12:02:00 0.8396784 05:02:00
12:03:00 0.4773929 05:03:00
12:04:00 0.6969242 05:04:00
12:05:00 0.5371502 05:05:00
head(df[4020:4025,])
Time Variable1 ZT_vec
06:59:00 0.6758364 23:59:00
07:00:00 0.1255861 00:00:00
07:01:00 0.2789485 00:01:00
07:02:00 0.2175933 00:02:00
07:03:00 0.1855100 00:03:00
07:04:00 0.1632865 00:04:00
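A vectorized alternative to the loop above, sketched in base R: parse each time as a clock time (R attaches today's date), subtract seven hours, and format again. Times before 07:00 roll back into the previous day, which supplies the wrap-around with no if/else. The four sample times are made up; UTC is used so daylight-saving transitions cannot shift any value.

```r
df <- data.frame(Time = c("12:00:00", "06:59:00", "07:00:00", "07:01:00"),
                 stringsAsFactors = FALSE)

# Clock time on an arbitrary date, in UTC
clock <- as.POSIXct(df$Time, format = "%H:%M:%S", tz = "UTC")

# Subtracting 7 h (in seconds) moves 07:00:00 to 00:00:00 and 06:59:00 to 23:59:00
df$ZT <- format(clock - 7 * 3600, "%H:%M:%S")
df
```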

Recognize time in R

I am working on a dataset in R with a time variable like this:
Time = data.frame("X1"=c(930,1130,914,1615))
The first one or two digits of X1 refer to the hour and the last two refer to the minute. I want R to recognize it as a time variable.
I tried lubridate's hm() function, but it didn't work, probably because there is no ":" between the hour and the minute in my data.
I also thought about using str_sub() to separate the hour and minute, pasting them back together with a ":" in between, and then applying the lubridate function, but I don't know how to extract the hour, since it sometimes has one digit and sometimes two.
How do I make R recognize this as a time variable?
Thanks very much!
You could 0-pad to 4 digits and then format using standard R date tools:
as.POSIXct(sprintf("%04d",Time$X1), format="%H%M")
#[1] "2018-04-22 09:30:00 AEST" "2018-04-22 11:30:00 AEST"
#[3] "2018-04-22 09:14:00 AEST" "2018-04-22 16:15:00 AEST"
This converts them to the chron "times" class. Internally such values are stored as a fraction of a day, and they print as shown below. The sub() call inserts a : before the last two characters and :00 after them, so that the values are in the HH:MM:SS format that times() understands.
library(chron)
times(sub("(..)$", ":\\1:00", Time$X1))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
It could also be done like this where we transform each to a fraction of a day:
with(Time, times( (X1 %/% 100) / 24 + (X1 %% 100) / (24 * 60) ))
## [1] 09:30:00 11:30:00 09:14:00 16:15:00
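The same integer arithmetic also works in base R without chron or a dummy date. This sketch keeps minutes since midnight as a difftime and builds a zero-padded "HH:MM" label; the new column names are illustrative only.

```r
Time <- data.frame(X1 = c(930, 1130, 914, 1615))

# %/% and %% split HHMM into hours and minutes, whether the hour
# has one digit (930) or two (1615)
hh <- Time$X1 %/% 100
mm <- Time$X1 %% 100

# Minutes since midnight as a difftime, plus a zero-padded label
Time$since_midnight <- as.difftime(hh * 60 + mm, units = "mins")
Time$label <- sprintf("%02d:%02d", hh, mm)
Time
```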

R date origin for time in seconds

Every day I get a time series in seconds since the job start (00:00 UTC). Because I want to plot time series of other data against it, I want to convert the seconds to dates. All the data comes from an HDF5 file read with the rhdf5 package:
> tiempo=h5read("rams2.h5","t_coords")
> class(tiempo)
[1] "array"
> head(tiempo)
[1] 0 1800 3600 5400 7200 9000
Then I tried as.POSIXct to build a data frame (to use later with ggplot2):
> temps<-as.data.frame(as.POSIXct(tiempo, origin = "1960-01-01 00:00"))
> class(temps)
[1] "data.frame"
> head(temps)
as.POSIXct(tiempo, origin = "1960-01-01 00:00")
1 1960-01-01 01:00:00
2 1960-01-01 01:30:00
3 1960-01-01 02:00:00
4 1960-01-01 02:30:00
5 1960-01-01 03:00:00
6 1960-01-01 03:30:00
The problem is the first value of "temps":
1 1960-01-01 01:00:00
It should be 1960-01-01 00:00:00, since that is the supplied origin and the first value of "tiempo" is 0. It seems that an hour has been added to every time.
Maybe I'm missing something when setting the origin? Or while reading the h5 file? Any ideas?
Thanks in advance
P.S. I can't supply example data, as the h5 file is huge.
CORRECT CODE
Thanks to ottlngr's answer and Richard Telford's comment: the problem was a time zone issue, solved by adding tz = "UTC":
temps<-as.data.frame(as.POSIXct(tiempo, origin = "1960-01-01 00:00:00",tz = "UTC"))
Possibly you should use the tz argument in as.POSIXct(), but hard to tell without sample data. Take a look at this question, it might help you: Change timezone in a POSIXct object
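The extra hour has a concrete cause: as.POSIXct() interprets a character origin as UTC (GMT), but without tz it prints the result in the session's local time zone, so a session one hour ahead of UTC shows 01:00 for the origin. A small sketch using the first values from the question; "Etc/GMT-1" (a fixed UTC+1 zone, with the sign inverted by POSIX convention) stands in for the asker's unknown local zone.

```r
tiempo <- c(0, 1800, 3600, 5400, 7200, 9000)  # first values shown in the question

# The origin string is read as UTC; tz = "UTC" makes printing use UTC as well
temps <- data.frame(time = as.POSIXct(tiempo, origin = "1960-01-01 00:00:00", tz = "UTC"))

# The same instants rendered in a hypothetical UTC+1 zone reproduce the shift
local_view <- format(temps$time, "%H:%M", tz = "Etc/GMT-1")
```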

Add hour if missing in timestamp using lubridate

I have a list of timestamps stored as a factor that I want to convert using lubridate.
However, some of the timestamps lack the time part (00:00:00):
2013-12-24 23:00:00
2013-12-24
2013-12-24 01:00:00
How do I extend df$timestamp <- ymd_hms(df$Timestamp_factor) so that it inserts 00:00:00 when the time is missing?
You can use lubridate's truncated argument to parse timestamps whose trailing components are missing. In this case three components are missing: hour, minute and second.
ymd_hms(c("2013-12-24 23:00:00", "2013-12-24", "2013-12-24 01:00:00"), truncated = 3)
Entries that lack a time are then assigned 00:00:00.
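Without lubridate, the missing time can be appended explicitly before parsing. This is a base-R sketch that assumes a date-only entry is exactly 10 characters long (YYYY-MM-DD), as in the example; the factor is converted to character first, because the levels, not the integer codes, hold the text.

```r
ts <- factor(c("2013-12-24 23:00:00", "2013-12-24", "2013-12-24 01:00:00"))
s <- as.character(ts)

# Append midnight to date-only entries (assumed to be exactly "YYYY-MM-DD")
s <- ifelse(nchar(s) == 10, paste(s, "00:00:00"), s)

timestamp <- as.POSIXct(s, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```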

Convert data frame with epoch timestamps to time-series with milliseconds in R

I have the following data.frame:
df <- data.frame(timestamp=c(1428319770511, 1428319797218, 1428319798182, 1428319803327, 1428319808478),
session=c("A","A","B","A","A"))
I'd like to convert this data frame to a time series and work on time windows shorter than one second. I already tried zoo and xts, but I found it difficult to represent the epoch times as dates.
Here's what I already tried:
df$date<-strptime(as.POSIXct(df$timestamp, origin="1970-01-01"),format="%Y-%m-%d %H:%M:%OS")
This returns NAs. Calling this instead:
df$date<-strptime(as.POSIXct(df$timestamp/1000, origin="1970-01-01"),format="%Y-%m-%d %H:%M:%OS")
works, but the result doesn't show the milliseconds.
I also tried playing with options(digits.secs=3), but with no luck.
I guess I'm hitting a small wall here with R's handling of milliseconds but any ideas would be greatly appreciated.
---EDIT---
OK, thanks to Joshua's answer and a comment by Dirk Eddelbuettel on the question "Convert UNIX epoch to Date object in R", I learned that dividing by 1000 doesn't truncate the data. So this works:
options(digits.secs = 3)
df$date<-as.POSIXct(df$timestamp/1000, origin="1970-01-01", tz="UTC")
Which returns:
timestamp session date
1428319770511 A 2015-04-06 14:29:30.510
1428319797218 A 2015-04-06 14:29:57.217
1428319798182 B 2015-04-06 14:29:58.181
1428319803327 A 2015-04-06 14:30:03.326
1428319808478 A 2015-04-06 14:30:08.477
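Each printed value above is one millisecond lower than its input (…770511 prints as .510). R truncates, rather than rounds, fractional seconds when formatting, and 1428319770.511 is stored as a double slightly below .511; the stored time is fine and only the display is off. A sketch that rounds to whole milliseconds before printing, assuming the inputs are whole milliseconds as in this data:

```r
options(digits.secs = 3)
df <- data.frame(timestamp = c(1428319770511, 1428319797218, 1428319798182,
                               1428319803327, 1428319808478),
                 session = c("A", "A", "B", "A", "A"))
df$datetime <- as.POSIXct(df$timestamp / 1000, origin = "1970-01-01", tz = "UTC")

# Recover whole milliseconds from the fractional seconds, rounding instead of truncating
ms <- round((as.numeric(df$datetime) %% 1) * 1000)

# Whole seconds via %S (no fraction) plus an exactly three-digit millisecond part
label <- paste0(format(df$datetime, "%Y-%m-%d %H:%M:%S"), ".", sprintf("%03d", ms))
```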
Your timestamps are in milliseconds. You need to convert them to seconds to be able to use them with as.POSIXct. And there's no point in calling strptime on a POSIXct vector.
Also, it's good practice to set the time zone explicitly rather than leaving it as the default "".
df$datetime <- as.POSIXct(df$timestamp/1000, origin="1970-01-01", tz="UTC")
options(digits.secs=6)
df
# timestamp session datetime
# 1 1.42832e+12 A 2015-04-06 11:29:30.510
# 2 1.42832e+12 A 2015-04-06 11:29:57.217
# 3 1.42832e+12 B 2015-04-06 11:29:58.181
# 4 1.42832e+12 A 2015-04-06 11:30:03.326
# 5 1.42832e+12 A 2015-04-06 11:30:08.477
I'm not sure why you aren't seeing millisecond resolution...
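For the question's goal of working on windows shorter than one second, it can be simplest to bin on the raw millisecond values before converting. This sketch uses a hypothetical 500 ms window width and base R only (no zoo or xts).

```r
df <- data.frame(timestamp = c(1428319770511, 1428319797218, 1428319798182,
                               1428319803327, 1428319808478),
                 session = c("A", "A", "B", "A", "A"),
                 stringsAsFactors = FALSE)

width_ms <- 500  # hypothetical window width in milliseconds

# Integer division maps each event to the start of its window
# (exact here, since these values are far below 2^53)
df$window_start <- (df$timestamp %/% width_ms) * width_ms
df$window <- as.POSIXct(df$window_start / 1000, origin = "1970-01-01", tz = "UTC")

# Events per window (labelled to 0.1 s) and session
counts <- table(format(df$window, "%H:%M:%OS1"), df$session)
```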
