Creating time index that creates same intervals every day - r

I want to make a time index that starts at 09:15:00 and progresses in 50-minute intervals, ending at or before 15:30:00, e.g. 09:15:00, 10:05:00, 10:55:00 and so on. The code below creates these time indices correctly for the first day. However, it goes wrong on the next day: the sequence begins at 09:25:00 instead of 09:15:00, so every interval is off. The start time keeps changing every day.
Intervals <- seq(as.POSIXct("2016-04-01 09:15:00", format="%Y-%m-%d %H:%M:%S"), as.POSIXct("2016-04-29 15:30:00", format="%Y-%m-%d %H:%M:%S"),
by="50 min")
I want to try various interval lengths by changing only the by argument, for example from by="50 min" to by="55 min", and I need flexibility in fixing the end time, which is why I set it as "before 15:30:00". Could you please help me fix it?

You might be better off generating one day's sequence and reusing it for each day, like so:
start <- "2016-04-01"
stop <- "2016-04-29"
daylength <- difftime(as.POSIXct(stop), as.POSIXct(start), units="days")
Intervals <- seq(
as.POSIXct(paste(start, "09:15")),
as.POSIXct(paste(start, "15:30")),
by="50 min"
)
out <- Intervals + as.difftime(rep(0:daylength, each=length(Intervals)), units="days")
range(out)
#[1] "2016-04-01 09:15:00 AEST" "2016-04-29 15:05:00 AEST"
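The same idea can also be wrapped in a small helper that builds each day's sequence separately, so every day restarts at the opening time and only by needs to change. This is a minimal sketch rather than part of the answer above: the function name make_intervals, its arguments, and the fixed tz = "UTC" (chosen to sidestep daylight-saving shifts) are assumptions for illustration.

```r
# Hypothetical helper: one sequence per day, each restarting at `open`.
make_intervals <- function(first_day, last_day, open = "09:15", close = "15:30",
                           by = "50 min", tz = "UTC") {
  days <- as.character(seq(as.Date(first_day), as.Date(last_day), by = "day"))
  per_day <- lapply(days, function(d) {
    # seq() stops at the last step that does not go past `close`
    seq(as.POSIXct(paste(d, open), tz = tz),
        as.POSIXct(paste(d, close), tz = tz),
        by = by)
  })
  do.call(c, per_day)
}

out <- make_intervals("2016-04-01", "2016-04-03")
# Time of day of the first slot on each of the three days
format(out, "%H:%M", tz = "UTC")[c(1, 9, 17)]
```

Calling make_intervals("2016-04-01", "2016-04-29", by = "55 min") should then give the 55-minute variant without touching anything else.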

It may be worth exploring making use of the cut function. For example for the set of days:
myDays <-
seq(
from = as.Date("2016-04-01"),
to = as.Date("2016-04-29"),
by = "day"
)
one could then arrive at 50-minute intervals for each day via:
myIntervals <-
data.frame(table(cut(x = as.POSIXct(myDays), breaks = "50 min")))
Preview
>> head(myIntervals, 10)
Var1 Freq
1 2016-04-01 01:00:00 1
2 2016-04-01 01:50:00 0
3 2016-04-01 02:40:00 0
4 2016-04-01 03:30:00 0
5 2016-04-01 04:20:00 0
6 2016-04-01 05:10:00 0
7 2016-04-01 06:00:00 0
8 2016-04-01 06:50:00 0
9 2016-04-01 07:40:00 0
10 2016-04-01 08:30:00 0

Related

Sum a part of a time series between set end points

I have a time series (xts) of rain gage data, and I would like to sum all the rain amounts between a beginning and an end time point taken from a list. I then want to make a new data frame with StormNumber and the TotalRain over that time.
> head(RainGage)
Rain_mm
2019-07-01 00:00:00 0
2019-07-01 00:15:00 0
2019-07-01 00:30:00 0
2019-07-01 00:45:00 0
2019-07-01 01:00:00 0
2019-07-01 01:15:00 0
head(StormTimes)
StormNumber RainStartTime RainEndTime
1 1 2019-07-21 20:00:00 2019-07-22 04:45:00
2 2 2019-07-22 11:30:00 2019-07-22 23:45:00
3 3 2019-07-11 09:15:00 2019-07-11 19:00:00
4 4 2019-05-29 17:00:00 2019-05-29 20:45:00
5 5 2019-06-27 14:30:00 2019-06-27 17:15:00
6 6 2019-07-11 06:15:00 2019-07-11 09:00:00
I have this code that I got from the SO community when I was trying to do something similar in the past (but extract data rather than sum it). However, I have no idea how it works so I am struggling to adapt it to this situation.
do.call(rbind, Map(function(x, y) RainGage[paste(x, y, sep="/")],
StormTimes$RainStartTime, StormTimes$RainEndTime))
In this case I would suggest just to write your own function and then use apply to achieve what you want, for example:
dates <- c('2019-07-01 00:00:00', '2019-07-01 00:15:00',
'2019-07-01 00:30:00', '2019-07-01 00:45:00',
'2019-07-01 01:00:00', '2019-07-01 01:15:00')
dates <- as.POSIXct(strptime(dates, '%Y-%m-%d %H:%M:%S'))
mm <- c(0, 10, 10, 20, 0, 0)
rain <- data.frame(dates, mm)
number <- c(1,2)
start <- c('2019-07-01 00:00:00','2019-07-01 00:18:00')
start <- as.POSIXct(strptime(start, '%Y-%m-%d %H:%M:%S'))
end <- c('2019-07-01 00:17:00','2019-07-01 01:20:00')
end <- as.POSIXct(strptime(end, '%Y-%m-%d %H:%M:%S'))
storms <- data.frame(number, start, end)
# Sum of rain for one storm; apply() passes each row of storms as a named character vector
f <- function(x) {
  # Get starting moment
  start <- as.POSIXct(x[["start"]])
  # Get ending moment
  end <- as.POSIXct(x[["end"]])
  # Calculate sum over start <= dates < end
  sum(rain[rain$dates >= start & rain$dates < end, 'mm'])
}
# Apply function to each row of the dataframe
storms$rain <- apply(storms, 1, f)
print(storms)
This yields:
number start end rain
1 1 2019-07-01 00:00:00 2019-07-01 00:17:00 10
2 2 2019-07-01 00:18:00 2019-07-01 01:20:00 30
So a column rain in storms now holds the sum of rain$mm, which is what you're after.
Hope that helps you out!
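Because apply() turns each row into a character vector, a variant that indexes the rows directly keeps the date-times as POSIXct throughout. The following is a minimal sketch of that alternative using the same toy data; the fixed tz = "UTC" and the name in_storm are assumptions for illustration, not part of the answer above.

```r
# Toy data equivalent to the example above, with an explicit time zone
rain <- data.frame(
  dates = as.POSIXct("2019-07-01 00:00:00", tz = "UTC") + 15 * 60 * (0:5),
  mm    = c(0, 10, 10, 20, 0, 0)
)
storms <- data.frame(
  number = c(1, 2),
  start  = as.POSIXct(c("2019-07-01 00:00:00", "2019-07-01 00:18:00"), tz = "UTC"),
  end    = as.POSIXct(c("2019-07-01 00:17:00", "2019-07-01 01:20:00"), tz = "UTC")
)

# One sum per storm: rows of rain with start <= dates < end
storms$rain <- vapply(seq_len(nrow(storms)), function(i) {
  in_storm <- rain$dates >= storms$start[i] & rain$dates < storms$end[i]
  sum(rain$mm[in_storm])
}, numeric(1))

storms[, c("number", "rain")]
```

vapply with numeric(1) makes the result type explicit, so an unexpected non-numeric value fails loudly instead of silently changing the column type.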

calculate duration of time interval while removing certain time spans

Assume we have an interval spanning several days (interval "A" in Figure below).
library(lubridate)
int <- interval("2018-01-01 22:00:00", "2018-01-04 10:00:00")
In hours, I get
as.period(int, unit = "hours")
"60H 0M 0S"
Now, I want to subtract all non-working-hours, here 16:00-08:00 (greyed out) in that interval, i.e. only keep the blue parts (08:00-16:00) and, again, calculate the remaining hours (see "B" in Figure below), which would be 8 + 8 + 2 = 18 hours.
One approach would be to create a list of intervals I want to keep which span the entire interval and then calculate intersections. (The code below could, of course, be setup programmatically using floor/ceiling/seq functions etc.)
int_keep <- list(
interval("2018-01-01 08:00:00", "2018-01-01 16:00:00"),
interval("2018-01-02 08:00:00", "2018-01-02 16:00:00"),
interval("2018-01-03 08:00:00", "2018-01-03 16:00:00"),
interval("2018-01-04 08:00:00", "2018-01-04 16:00:00"),
interval("2018-01-05 08:00:00", "2018-01-05 16:00:00")
)
l <- lapply(int_keep, function(x) intersect(x, int))
mns <- sapply(l, as.numeric) # returns seconds
sum(mns, na.rm = T) / 60 / 60 # sum of intersections in hours
[1] 18
While this works, it appears utterly clumsy to me. What would be a less tedious way to do this?
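A simpler option is to expand the interval into hourly timestamps and count the ones that fall within working hours:
df <- data.frame(DateTime=seq.POSIXt(as.POSIXct("2018-01-01 22:00:00"), as.POSIXct("2018-01-04 10:00:00"), by = "1 hour"))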
head(df)
#DateTime
#1 2018-01-01 22:00:00
#2 2018-01-01 23:00:00
#3 2018-01-02 00:00:00
#4 2018-01-02 01:00:00
#5 2018-01-02 02:00:00
#6 2018-01-02 03:00:00
#you want the hours worked between A and B
A <-format(strptime("8:00:00", "%H:%M:%S"),"%H:%M:%S")
B <-format(strptime("16:00:00", "%H:%M:%S"),"%H:%M:%S")
#a simple ifelse statement assigns 1 to column "value" if the time of day is after 8 and up to 16, or 0 if it's not
#(each timestamp stands for the hour that ends at that time):
df$value<-ifelse(format(df$DateTime,"%H:%M:%S")>A & format(df$DateTime,"%H:%M:%S")<=B,1,0)
tail(df)
#DateTime value
#56 2018-01-04 05:00:00 0
#57 2018-01-04 06:00:00 0
#58 2018-01-04 07:00:00 0
#59 2018-01-04 08:00:00 0
#60 2018-01-04 09:00:00 1
#61 2018-01-04 10:00:00 1
#now summing the value column will give you the total hours worked:
TotalHoursWorked<-sum(df$value)
TotalHoursWorked
#[1] 18
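The hourly grid only works when the interval starts and ends on whole hours. A sketch that instead computes the overlap between the interval and each day's working window handles arbitrary start and end times. The function name working_hours, its arguments, and the use of "UTC" (so that every day has 24 hours) are assumptions made for illustration.

```r
# Hypothetical helper: hours of [start, end] that fall inside open-close each day
working_hours <- function(start, end, open = 8, close = 16, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end   <- as.POSIXct(end, tz = tz)
  # Midnight of every calendar day touched by the interval
  days      <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  midnight  <- as.numeric(as.POSIXct(format(days), tz = tz))
  win_open  <- midnight + open * 3600
  win_close <- midnight + close * 3600
  # Overlap of the interval with each daily window, in seconds (negative = none)
  overlap <- pmin(as.numeric(end), win_close) - pmax(as.numeric(start), win_open)
  sum(pmax(overlap, 0)) / 3600
}

working_hours("2018-01-01 22:00:00", "2018-01-04 10:00:00")
```

For the interval from the question this gives 8 + 8 + 2 = 18 hours, matching the intersection-based result.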

R convert hourly to daily data up to 0:00 instead of 23:00

How do you set 0:00 as the end of the day instead of 23:00 in hourly data? I run into this when using period.apply or to.period, as both return days ending at 23:00. Here is an example:
x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by="hour"), x = rnorm(120))
The following calls show that periods end at 23:00:
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
So when I am aggregating the hourly data to daily, does someone have an idea how to set the end of day at 0:00?
As another answer here already points out, to.period on days works on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. So 23:00:00 is seen as the last timestamp of the day in your data, and 00:00:00 belongs to the next day's "bin".
What you can do is shift all the timestamps back by 1 hour, use to.period to get the daily data points from the hourly points, and then use align.time to align the timestamps correctly.
(More generally, to.period is useful for generating OHLCV-type data, so if you are generating, say, hourly bars from ticks, it makes sense to include all the ticks between 23:00:00 and 23:59:59.99999 in one bar. Then 00:00:00 to 00:59:59.9999... would form the next hourly bar, and so on.)
Here is an example:
> tail(x1["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656
> head(x1["2018-02-02"])
# [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00 0.8312698
# 2018-02-02 02:00:00 0.9321747
# 2018-02-02 03:00:00 0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128
> head(x1["2018-02-03"])
# [,1]
# 2018-02-03 00:00:00 2.3746424
# 2018-02-03 01:00:00 0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00 0.3079848
# 2018-02-03 05:00:00 0.2445634
x2 <- x1
.index(x2) <- .index(x1) - 3600
> tail(x2["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271
x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")
> x.d2
# [,1]
# 2018-01-31 23:00:00 0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00 2.37464235
# 2018-02-03 23:00:00 0.51797747
# 2018-02-04 23:00:00 0.08955208
# 2018-02-05 22:00:00 0.33067734
x.d2 <- align.time(x.d2, n = 86400)
> x.d2
# [,1]
# 2018-02-01 0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03 2.37464235
# 2018-02-04 0.51797747
# 2018-02-05 0.08955208
# 2018-02-06 0.33067734
Want to convince yourself? Try something like this:
x3 <- rbind(x1, xts(x = matrix(c(1,2), nrow = 2), order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))
x3["2018-02-01 23/2018-02-02 01"]
# [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999 1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000 2.0000000
# 2018-02-02 01:00:00.000 0.8312698
x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")
> x3.d <- align.time(x3.d, 86400)
> x3.d
[,1]
2018-02-02 1.00000000
2018-02-03 -0.09832625
2018-02-04 -0.65075506
2018-02-05 -0.09423664
2018-02-06 0.33067734
Note that the value of 2 at 00:00:00 did not become the last observation of the bar stamped 2018-02-02 (00:00:00), which covers 2018-02-01 00:00:00 to 2018-02-01 23:59:59.9999.
Of course, if you want the daily timestamp to mark the start of the day rather than the end (that is, 2018-02-01 as the start of the bar for the first row of x3.d above), you could shift the index back by one day. You can do this relatively safely for most time zones when your data doesn't involve weekend dates:
index(x3.d) = index(x3.d) - 86400
I say relatively safely because there are corner cases when the clocks change in a time zone, so be careful with daylight saving time. Simply subtracting 86400 can be a problem when going from Sunday to Saturday in time zones where a daylight saving change occurs:
# e.g. bad: daylight saving time starts on this weekend for US Eastern time
z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
> index(z) - 86400
[1] "2018-03-10 23:00:00 EST"
i.e. the timestamp is off by one hour, when you really want the midnight timestamp (00:00:00).
You could get around this problem using something much safer like this:
library(lubridate)
# right
> index(z) - days(1)
[1] "2018-03-11 EST"
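The shift-then-bin idea does not depend on xts. The sketch below shows it in base R on made-up hourly data whose stamps mark the end of each hour; the names idx, vals and day_bin, the one-second shift, and tz = "UTC" are all assumptions for illustration rather than the method used above.

```r
# Hypothetical hourly data: stamps mark the END of each hour (01:00 ... 00:00)
idx  <- seq(as.POSIXct("2018-02-01 01:00:00", tz = "UTC"),
            as.POSIXct("2018-02-03 00:00:00", tz = "UTC"), by = "hour")
vals <- seq_along(idx)

# Shift back by one second so 00:00:00 falls into the previous day's bin
day_bin <- format(as.Date(idx - 1, tz = "UTC"))

# Last observation per day; swap in sum or mean for other aggregates
daily_last <- tapply(vals, day_bin, function(v) v[length(v)])
daily_last
```

Each day then ends with its 00:00:00 observation (values 24 and 48 here), which is the behaviour the question asks for.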
I don't think this is possible because 00:00 is the start of the day. From the manual:
These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day
I think the solution here is to use minutes instead of hours. Using your example:
x1 = xts(seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:00"), by="min"), x = rnorm(7200))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "day")
x1[endpoints(x1, 'day')]

period.apply over an hour with deciding start time

I have an xts time series covering a year, in time zone "UTC". The time interval between rows is 15 minutes.
x1 x2
2014-12-31 23:15:00 153.0 0.0
2014-12-31 23:30:00 167.1 5.4
2014-12-31 23:45:00 190.3 4.1
2015-01-01 00:00:00 167.1 9.7
As I want data over one hour to allow for comparison with other data sets, I tried to use period.apply:
dat <- period.apply(dat, endpoints(dat,on="hours",k=1), colSums)
The problem is that the first row in my new data set is 2014-12-31 23:45:00 and not 2015-01-01 00:00:00. I tried changing the endpoint vector but somehow it keeps saying that it is out of bounds. I also thought this was my answer: https://stats.stackexchange.com/questions/5305/how-to-re-sample-an-xts-time-series-in-r/19003#19003 but it was not. I don't want to change the names of my columns, I want to sum over a different interval.
Here is a reproducible example:
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
period.apply(xts, endpoints(xts,on="hours",k=1), colSums)
And the result looks like this:
2014-12-31 23:45:00 3
2015-01-01 00:45:00 4
2015-01-01 01:45:00 4
2015-01-01 02:45:00 4
and ends up like this:
2015-01-01 21:45:00 4
2015-01-01 22:45:00 4
2015-01-01 23:45:00 4
2015-01-02 00:00:00 1
Whereas I would like it to always sum over the same interval, meaning I would like only 4s.
(I am using RStudio 0.99.903 with R x64 3.3.2)
The problem is that you're using endpoints, but you want to align by the start of the interval, not the end. I thought you might be able to use this startpoints function, but that produced weird results.
The basic idea of the work-around below is to subtract a small amount from all index values, then use endpoints and period.apply to aggregate. Then call align.time on the result. I'm not sure if this is a general solution, but it seems to work for your example.
library(xts)
seq<-seq(from=ISOdate(2014,12,31,23,15),length.out = 100, by="15 min", tz="UTC")
xts<-xts(rep(1,100),order.by = seq)
# create a temporary object
tmp <- xts
# subtract a small amount of time from each index value
.index(tmp) <- .index(tmp)-0.001
# aggregate to hourly
agg <- period.apply(tmp, endpoints(tmp, "hours"), colSums)
# round index up to next hour
agg_aligned <- align.time(agg, 3600)
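To see why this yields only 4s, the same grouping can be reproduced in base R by rounding every timestamp up to the end of its hour and summing per group. This is a sketch under assumptions (the names idx, x and hour_end are illustrative, and it uses tapply rather than period.apply), not a replacement for the xts code above.

```r
# Same shape as the example: 100 quarter-hourly values starting at 23:15 UTC
idx <- seq(ISOdate(2014, 12, 31, 23, 15, tz = "UTC"), by = "15 min", length.out = 100)
x   <- rep(1, 100)

# Round each stamp up to the end of its hour; stamps already on the hour stay put
hour_end <- as.POSIXct(ceiling(as.numeric(idx) / 3600) * 3600,
                       origin = "1970-01-01", tz = "UTC")

# Sum per hour, labelled by the hour's end time
hourly <- tapply(x, format(hour_end, "%Y-%m-%d %H:%M", tz = "UTC"), sum)
head(hourly, 3)
```

Every group covers the stamps :15, :30, :45 and :00, so each sum is 4 and the first label is 2015-01-01 00:00.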

Create a time interval of 15 minutes from minutely data in R?

I have some data which is formatted in the following way:
time count
00:00 17
00:01 62
00:02 41
The data runs from 00:00 to 23:59, with one count per minute. I'd like to group it into 15-minute intervals, like this:
time count
00:00-00:15 148
00:16-00:30 284
I have tried to do it manually, but this is exhausting. I am sure there must be a function or something similar that does it easily, but I haven't figured out how yet.
I'd really appreciate some help!!
Thank you very much!
For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.
First, create some fake data:
set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
count=sample(1:50, 100, replace=TRUE))
Base R
cut the data into 15 minute groups:
dat$by15 = cut(dat$time, breaks="15 min")
time count by15
1 2016-05-01 00:00:00 22 2016-05-01 00:00:00
2 2016-05-01 00:01:00 11 2016-05-01 00:00:00
3 2016-05-01 00:02:00 31 2016-05-01 00:00:00
...
98 2016-05-01 01:37:00 20 2016-05-01 01:30:00
99 2016-05-01 01:38:00 29 2016-05-01 01:30:00
100 2016-05-01 01:39:00 37 2016-05-01 01:30:00
Now aggregate by the new grouping column, using sum as the aggregation function:
dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
by15 count
1 2016-05-01 00:00:00 312
2 2016-05-01 00:15:00 395
3 2016-05-01 00:30:00 341
4 2016-05-01 00:45:00 318
5 2016-05-01 01:00:00 349
6 2016-05-01 01:15:00 397
7 2016-05-01 01:30:00 341
dplyr
library(dplyr)
dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
summarise(count=sum(count))
data.table
library(data.table)
dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.
If you want the end point to be the nearest minute before the next interval (instead of the nearest second), you could use as.POSIXct(as.character(dat$by15)) + 60*14.
If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
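If the times are plain "HH:MM" strings as in the question, rather than POSIXct, the grouping can be done with integer arithmetic on the minute of the day. The sketch below is an assumption-laden illustration: the toy data (a count of 1 per minute), the column name interval, and the choice of labels running from :00 to :14 (so all bins hold exactly 15 minutes) are not from the original posts.

```r
# Toy data: one row per minute of the day, each with count 1
mins_of_day <- 0:1439
dat <- data.frame(
  time  = sprintf("%02d:%02d", mins_of_day %/% 60L, mins_of_day %% 60L),
  count = 1L,
  stringsAsFactors = FALSE
)

# Minute of the day from the "HH:MM" string, then the start of its 15-minute bin
minute    <- as.integer(substr(dat$time, 1, 2)) * 60L + as.integer(substr(dat$time, 4, 5))
bin_start <- (minute %/% 15L) * 15L
bin_end   <- bin_start + 14L

# Label such as "00:00-00:14", which also sorts chronologically as text
dat$interval <- sprintf("%02d:%02d-%02d:%02d",
                        bin_start %/% 60L, bin_start %% 60L,
                        bin_end %/% 60L, bin_end %% 60L)

by15 <- aggregate(count ~ interval, data = dat, FUN = sum)
head(by15, 2)
```

With real data, only the construction of dat changes; the binning and aggregate steps stay the same.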
The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records.)
# Function: Truncate (floor) POSIXct to time interval (specified in seconds)
# Author: Stephen McDaniel # PowerTrip Analytics
# Date : 2017MAY
# Copyright: (C) 2017 by Freakalytics, LLC
# License: MIT
floor_datetime <- function(date_var, floor_seconds = 60,
                           origin = "1970-01-01") { # defaults to minute rounding
  if (!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
  # floor() is vectorised and propagates NA, so no separate NA branch is needed
  as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
             origin = origin)
}
Sample output:
test <- data.frame(good = as.POSIXct(Sys.time()),
bad1 = as.Date(Sys.time()),
bad2 = as.POSIXct(NA))
test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad1, 15 * 60) :
Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
test
good bad1 bad2 good_15 bad2_15
1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00 <NA>
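The flooring arithmetic can be combined with a grouped sum to replace the cut step entirely. The sketch below applies it to 100 made-up minutely records; the names times, counts and bin_start, the constant counts, and tz = "UTC" are assumptions for illustration, and it inlines the arithmetic rather than calling the function above.

```r
# 100 minutely records starting at midnight UTC, each with count 2
times  <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC") + 60 * (0:99)
counts <- rep(2, 100)

# Floor each timestamp to the start of its 15-minute (900-second) bin
bin_start <- as.POSIXct(floor(as.numeric(times) / 900) * 900,
                        origin = "1970-01-01", tz = "UTC")

# Sum per bin, labelled by the bin's start time
sums <- tapply(counts, format(bin_start, "%H:%M", tz = "UTC"), sum)
sums
```

The 100 minutes fall into six full bins and one partial bin starting at 01:30, the same seven groups that the cut-based summary produces.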
You can do it in one line by using the trs function from the foqat package, like this:
df_15mins=trs(df, "15 mins")
Below is a reproducible example:
library(foqat)
head(aqi[,c(1,2)])
# Time NO
#1 2017-05-01 01:00:00 0.0376578
#2 2017-05-01 01:01:00 0.0341483
#3 2017-05-01 01:02:00 0.0310285
#4 2017-05-01 01:03:00 0.0357016
#5 2017-05-01 01:04:00 0.0337507
#6 2017-05-01 01:05:00 0.0238120
#mean
aqi_15mins=trs(aqi[,c(1,2)], "15 mins")
head(aqi_15mins)
# Time NO
#1 2017-05-01 01:00:00 0.02736549
#2 2017-05-01 01:15:00 0.03244958
#3 2017-05-01 01:30:00 0.03743626
#4 2017-05-01 01:45:00 0.02769419
#5 2017-05-01 02:00:00 0.02901817
#6 2017-05-01 02:15:00 0.03439455
