Splitting date/time data into constant time intervals - r

I already posted similar question here:
time split to constant daily intervals and summarise the results
Now I'm trying it in a simple version:
I have data containing a date/time variable (call it x) of class POSIXct, in the format yyyy-mm-dd HH:MM:SS.
The date is not really of my interest. What I'm trying to do is to split my time data into constant time intervals.
To make it clear, let's start with some reproducible example. Using dput, my x variable looks like:
structure(c(1495608914, 1495642528, 1495642529, 1495607831, 1495641488, 1495643715), class = c("POSIXct", "POSIXt"), tzone="")
I've been able to split it into time intervals using: split(x, cut((x), "30 mins"))
However, this method starts the splitting from the minimum time value I have in x, whereas I'm interested in splitting the data into constant time intervals that start at midnight.
So, with the splitting method above, I get 20 groups starting at 06:37:00 in 30-minute intervals (and x is split across 3 of those 20 groups, with 2, 1 and 3 observations). What I'm looking for instead is an indicator of the fixed time interval each data point falls in:
x v1 v2 . . . x.ind
06:37:11 14
06:55:14 14
15:58:08 32
.
.
.
where 1 is for 00:00:00-00:30:00, 2 is for 00:30:00-01:00:00,..., 14 is for 06:30:00-07:00:00,..., 48 is for 23:30:00-00:00:00

A solution using dplyr for the join and grouping, and lubridate's floor_date for rounding the timestamps down:
library(dplyr)
library(lubridate)
observations <- data.frame(period = floor_date(x, unit = "30 minutes"), n=rep(1, length(x)))
# min()/max() must be taken on the period column, not on the whole data frame
intervals <- data.frame(period = seq.POSIXt(min(observations$period), max(observations$period), by = 30*60))
result <- intervals %>%
full_join(observations, by = "period") %>%
group_by(period) %>%
summarize(n=sum(n,na.rm= TRUE))
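Note that the join above counts observations per 30-minute bin; it does not produce the 1 to 48 index described in the question. A minimal base R sketch of that index follows, assuming the times should be read in UTC (the tz = "UTC" setting is an assumption made for reproducibility; with tzone = "" the result depends on the local time zone):

```r
# x from the question; tz = "UTC" is an assumption, not part of the original data
x <- as.POSIXct(c(1495608914, 1495642528, 1495642529,
                  1495607831, 1495641488, 1495643715),
                origin = "1970-01-01", tz = "UTC")

# seconds elapsed since midnight, ignoring the date part
secs <- as.numeric(format(x, "%H")) * 3600 +
  as.numeric(format(x, "%M")) * 60 +
  as.numeric(format(x, "%S"))

# 1 = 00:00:00-00:30:00, 2 = 00:30:00-01:00:00, ..., 48 = 23:30:00-00:00:00
x.ind <- secs %/% 1800 + 1

data.frame(time = format(x, "%H:%M:%S"), x.ind)
```

From here, table(factor(x.ind, levels = 1:48)) would give counts for all 48 intervals, including the empty ones.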

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date Time Demand
01-Jan-05 6:30 6
01-Jan-05 6:45 3
...
23-Jan-05 21:45 0
23-Jan-05 22:00 1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$Demand, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be observations per a unit of time, and I am trying to observe just (Demand) by the 15 minute time slots (Time) in each day (Date), maybe my Frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that so read it in as a zoo series z with chron time index and then convert that to ts giving ts_ser or possibly leave it as a zoo series depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")
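If the chron route is more than needed, the two things asked in the edits (how to combine Date and Time, and what to pass as start) can be sketched in base R. This is only a sketch under several assumptions: an English locale for parsing month abbreviations with %b, exactly 63 readings per day (6:30 to 22:00), and days simply counted from 1. The names DateTime and demand_ts are illustrative, and tz = "UTC" is assumed.

```r
# tp as given in the Note
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"),
                 Time = c("6:30", "6:45"),
                 Demand = c(6L, 3L))

# combine the character Date and Time columns into one POSIXct column
# (%b needs an English locale; tz = "UTC" is an assumption)
tp$DateTime <- as.POSIXct(paste(tp$Date, tp$Time),
                          format = "%d-%b-%y %H:%M", tz = "UTC")

# one cycle = one day of 63 observed slots;
# start = c(1, 1) means "day 1, first slot of the day"
demand_ts <- ts(tp$Demand, start = c(1, 1), frequency = 63)

start(demand_ts)
frequency(demand_ts)
```

With this layout the integer part of the time index is the day number and the position within the cycle is the 15-minute slot. A day-of-week effect would need a longer cycle (7 * 63 observations), which a single frequency like this does not capture.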

How do I subtract Date column given as a character in R?

I want to add a column which is a subtraction of Store_Entry_Time from Store_Exit_Time.
For example, the result for row 1 should be (2014-12-02 18:49:05.402863 - 2014-12-02 16:56:32.394052) = approximately 1 hour 53 minutes (I want this result in hours only).
I entered class(Store_Entry_Time) and it says "character".
How do I obtain the subtracting and put it into new column as "Time Spent"?
You can use ymd_hms from lubridate to convert the columns into POSIXct format and then use difftime to calculate the difference in time.
library(dplyr)
df <- df %>%
mutate(across(c(Store_Entry_Time, Store_Exit_Time), lubridate::ymd_hms),
Time_Spent = as.numeric(difftime(Store_Exit_Time,
Store_Entry_Time, units = 'hours')))
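For a base R option here, we can try using as.POSIXct together with difftime:
df$Time_Spent <- as.numeric(difftime(as.POSIXct(df$Store_Exit_Time),
as.POSIXct(df$Store_Entry_Time), units = "hours"))
With units = "hours" the column holds the difference in time measured in hours. Plain subtraction of POSIXct values picks the unit automatically (seconds, minutes, hours or days), so it is not guaranteed to be hours.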
Example:
Store_Exit_Time <- "2014-12-02 18:49:05.402863"
Store_Entry_Time <- "2014-12-02 16:56:32.394052"
Time_Spent <- as.numeric(difftime(as.POSIXct(Store_Exit_Time), as.POSIXct(Store_Entry_Time), units = "hours"))
Time_Spent
[1] 1.875836
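One caveat with subtracting POSIXct values directly: the unit of the result is chosen automatically from the size of the difference, so a visit shorter than one hour comes back in minutes. A small sketch of the pitfall, using made-up timestamps (entry and exit are hypothetical values, and tz = "UTC" is an assumption):

```r
# hypothetical 30-minute visit
entry <- as.POSIXct("2014-12-02 16:56:32", tz = "UTC")
exit  <- as.POSIXct("2014-12-02 17:26:32", tz = "UTC")

auto_diff <- exit - entry
units(auto_diff)       # unit picked automatically: minutes here, not hours
as.numeric(auto_diff)  # 30, which would be misread as 30 hours

# asking for hours explicitly avoids the ambiguity
hours_spent <- as.numeric(difftime(exit, entry, units = "hours"))
hours_spent            # 0.5
```

This is why passing units = "hours" to difftime is the safer choice when the column mixes short and long visits.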

Converting df into ts object and decompose in 15 minute intervals in R

I know there has been a lot on this topic already but I can't seem to get what I want working.
I've read:
how to convert data frame into time series in R
Convert data frame with date column to timeseries
As well as several others but can't get it to work.
I have the following df
df <- data.frame(CloseTime = c("2017-09-13 19:15:00","2017-09-13 19:30:00","2017-09-13 19:45:00","2017-09-13 20:00:00","2017-09-13 20:15:00"),
OpenPice = c(271.23,269.50,269.82,269.10,269.50),
HightPrice = c(271.23,269.50,269.82,269.10,269.50),
LowPrice = c(271.23,269.50,269.82,269.10,269.50),
ClosePrice = c(271.23,269.50,269.82,269.10,269.50))
I'd like to convert it into a ts object with 15-minute intervals and decompose the time series.
I also read that the zoo package allows you to decompose specific multiple intervals i.e. 15 mins, 1h, 1 day?
Can someone please help. How can I convert this into a ts object and decompose my ts object?
For reproducibility, here is another toy example covering a longer period of time.
df <-
data.frame(
CloseTime = seq(as.POSIXct("2017-09-13 19:15:00"),as.POSIXct("2018-10-20 21:45:00"),by="15 mins"),
ClosePrice1 = cumsum(rnorm(38603)),
ClosePrice2 = cumsum(rnorm(38603)),
ClosePrice3 = cumsum(rnorm(38603))
)
I found it much better to aggregate time series into different intervals using dplyr and lubridate::floor_date. Instead of mean, one can summarise using min, max, first, or last. I would recommend staying within the tidyverse to keep the code readable. The example below converts the data to 30-minute intervals.
library(lubridate); library(dplyr); library(magrittr)
df30m <-
df %>%
group_by( CloseTime = floor_date( CloseTime, "30 mins")) %>%
summarize_all(mean)
The data frame can be converted to a time series object such as zoo and then to ts for decomposition.
library(zoo)
df30m_zoo <- zoo( df30m[-1], order.by = df30m$CloseTime )
df30m_ts <- ts(df30m_zoo, start=1, frequency = 2 * 24) # 48 half-hour observations per day
df30m_decomposed <- decompose(df30m_ts)
The points are already 15 minutes apart, so assuming that you want a period of 1 day, this will convert it. There are 24 * 60 * 60 seconds in a day (which is the period), but you can change the denominator to the number of seconds in another period to get a different period. You will need at least two periods of data to decompose it.
library(zoo)
z <- read.zoo(df)
time(z) <- (as.numeric(time(z)) - as.numeric(start(z))) / (24 * 60 * 60)
as.ts(z)
giving:
Time Series:
Start = c(0, 1)
End = c(0, 5)
Frequency = 96
OpenPice HightPrice LowPrice ClosePrice
0.00000000 271.23 271.23 271.23 271.23
0.01041667 269.50 269.50 269.50 269.50
0.02083333 269.82 269.82 269.82 269.82
0.03125000 269.10 269.10 269.10 269.10
0.04166667 269.50 269.50 269.50 269.50
Although not asked for in the question, in another answer the data was converted to 30 minutes. That could readily be done like this:
library(xts) # also loads zoo
z <- read.zoo(df)
to.minutes30(z)
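The five rows in the question cover far less than two daily periods, so decompose cannot run on them. A minimal base R sketch of the decomposition step itself, on simulated 15-minute data with a daily cycle (the series price and all numbers in it are made up for illustration):

```r
set.seed(42)

per_day <- 24 * 4   # 96 fifteen-minute observations per day
n_days  <- 7        # decompose() needs at least two full periods
n       <- per_day * n_days

# simulated prices: a daily sine-shaped pattern plus a slow random walk
daily_shape <- sin(2 * pi * seq_len(n) / per_day)
price <- 270 + daily_shape + cumsum(rnorm(n, sd = 0.05))

# one unit of time = one day, so frequency is the number of points per day
price_ts <- ts(price, frequency = per_day)

dec <- decompose(price_ts)
length(dec$figure)  # one seasonal value per 15-minute slot of the day
```

plot(dec) then shows the observed, trend, seasonal and random components. decompose is used here on a single series; each price column would be decomposed separately in this sketch.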

Time series analysis applicability?

I have a sample data frame like this (date column format is mm-dd-YYYY):
date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6
I want to convert this data frame into time series using ts(), but the problem is: the current data frame has multiple values for the same date. Can we apply time series in this case?
Can I convert data frame into time series, and build a model (ARIMA) which can forecast count value on a daily basis?
OR should I forecast count value based on grp, but in that case, I have to select only grp and count column of a data frame. So in that case, I have to skip date column, and daily forecast for count value is not possible?
Suppose if I want to aggregate count value on per day basis. I tried with aggregate function, but there we have to specify date value, but I have a very large data set? Any other option available in r?
Can somebody please suggest whether there is a better approach to follow? My assumption is that time series forecasting works only for bivariate data. Is this assumption right?
It seems like there are two aspects of your problem:
i want to convert this data frame into time series using ts(), but the
problem is- current data frame having multiple values for the same
date. can we apply time series in this case?
If you are happy making use of the xts package you could attempt:
dta2$date <- as.Date(dta2$date, "%m-%d-%Y")
dtaXTS <- xts::as.xts(dta2[,2:3], dta2$date)
which would result in:
>> head(dtaXTS)
count grp
2009-01-09 54 1
2009-01-09 100 2
2009-01-09 546 3
2009-01-10 67 4
2009-01-11 80 5
2009-01-11 45 6
of the following classes:
>> class(dtaXTS)
[1] "xts" "zoo"
You could then use your time series object as a univariate time series by referring to the selected variable, or as a multivariate time series. An example using the PerformanceAnalytics package:
PerformanceAnalytics::chart.TimeSeries(dtaXTS)
Side points
Concerning your second question:
can somebody plz suggest me what is the better approach to follow, my
assumption is time series forcast is works only for bivariate data? is
this assumption also right?
IMHO, this is rather broad. I would suggest that you use the created xts object and elaborate on the model you want to utilise and why. If it's a conceptual question about the nature of time series analysis, you may prefer to post your follow-up question on CrossValidated.
Data sourced via: dta2 <- read.delim(pipe("pbpaste"), sep = "") using the provided example.
Since daily forecasts are wanted we need to aggregate to daily. Using DF from the Note at the end, read the first two columns of data into a zoo series z using read.zoo and argument aggregate=sum. We could optionally convert that to a "ts" series (tser <- as.ts(z)) although this is unnecessary for many forecasting functions. In particular, checking out the source code of auto.arima we see that it runs x <- as.ts(x) on its input before further processing. Finally run auto.arima, forecast or other forecasting function.
library(forecast)
library(zoo)
z <- read.zoo(DF[1:2], format = "%m-%d-%Y", aggregate = sum)
auto.arima(z)
forecast(z)
Note: DF is given reproducibly here:
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)
Updated: Revised after re-reading question.
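The question also asks how to aggregate count per day without naming each date by hand. A minimal base R sketch using the formula interface of aggregate, with the same sample data and the mm-dd-YYYY format stated in the question (daily is an illustrative name):

```r
Lines <- "date count grp
01-09-2009 54 1
01-09-2009 100 2
01-09-2009 546 3
01-10-2009 67 4
01-11-2009 80 5
01-11-2009 45 6"
DF <- read.table(text = Lines, header = TRUE)

# parse the mm-dd-YYYY strings into Date objects
DF$date <- as.Date(DF$date, format = "%m-%d-%Y")

# sum count within each distinct date; no date values need to be listed
daily <- aggregate(count ~ date, data = DF, FUN = sum)
daily
```

The result has one row per day (three rows for this sample), which is the shape a daily forecasting model would need.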

Finding a more elegant way to aggregate hourly data to mean hourly data using zoo

I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
# one timestamp per row of x (720 hourly steps starting at 01:00)
x.DateTime <- as.POSIXct("2014-01-01 01:00", format = "%Y-%m-%d %H:%M") +
seq(0, by = 3600, length.out = nrow(x))
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
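The row-order difference can be seen without zoo. The following base R sketch computes the same hour-of-day means with rowsum and then rotates the rows so that 1:00am comes first, as in the loop. The names tt, by_hour and loop_order are illustrative, and tz = "UTC" is an assumption.

```r
set.seed(1)

# 30 days of hourly data for five series, starting at 01:00
x  <- matrix(rnorm(24 * 30 * 5), ncol = 5)
tt <- as.POSIXct("2014-01-01 01:00", tz = "UTC") +
  seq(0, by = 3600, length.out = nrow(x))

# hour of day (0-23) for every row
hr <- as.integer(format(tt, "%H"))

# sum rows within each hour, then divide by the number of rows per hour;
# rows of the result are ordered 0, 1, ..., 23 (midnight first)
by_hour <- rowsum(x, hr) / as.vector(table(hr))

# rotate so that 1am is first and midnight is last, matching the loop
loop_order <- by_hour[c(2:24, 1), ]

dim(by_hour)
```

Here by_hour corresponds to the ordering of the aggregate results (midnight first) and loop_order to the ordering of the loop in the question.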

Resources