Converting df into ts object and decompose in 15 minute intervals in R - r

I know there has been a lot on this topic already but I can't seem to get what I want working.
I've read:
how to convert data frame into time series in R
Convert data frame with date column to timeseries
As well as several others but can't get it to work.
I have the following df
df <- data.frame(CloseTime = c("2017-09-13 19:15:00","2017-09-13 19:30:00","2017-09-13 19:45:00","2017-09-13 20:00:00","2017-09-13 20:15:00"),
OpenPice = c(271.23,269.50,269.82,269.10,269.50),
HightPrice = c(271.23,269.50,269.82,269.10,269.50),
LowPrice = c(271.23,269.50,269.82,269.10,269.50),
ClosePrice = c(271.23,269.50,269.82,269.10,269.50))
I'd like to convert it into a ts object with 15-minute intervals and decompose the time series.
I also read that the zoo package allows you to decompose specific multiple intervals i.e. 15 mins, 1h, 1 day?
Can someone please help. How can I convert this into a ts object and decompose my ts object?

Just for the reproducibility purpose, another toy-example with longer period of time.
df <-
data.frame(
CloseTime = seq(as.POSIXct("2017-09-13 19:15:00"),as.POSIXct("2018-10-20 21:45:00"),by="15 mins"),
ClosePrice1 = cumsum(rnorm(38603)),
ClosePrice2 = cumsum(rnorm(38603)),
ClosePrice3 = cumsum(rnorm(38603))
)
I found it much better to aggregate time series into different intervals using dplyr and lubridate::floor_date. Instead of mean, one can summarise using min, max, first or last. I would recommend staying within the tidyverse to keep the code readable. The example below converts the data into 30-minute intervals.
library(lubridate); library(dplyr); library(magrittr)
df30m <-
df %>%
group_by( CloseTime = floor_date( CloseTime, "30 mins")) %>%
summarize_all(mean)
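A data frame can be converted to a time series object such as zoo and then to ts for decomposition. The frequency below assumes a daily cycle, i.e. 48 half-hour observations per day, and decompose() is applied to a single column because it expects a univariate series.
library(zoo)
df30m_zoo <- zoo( df30m[-1], order.by = df30m$CloseTime )
# 48 half-hour intervals per day (assumed seasonal period); one column at a time
df30m_ts <- ts(coredata(df30m_zoo)[, "ClosePrice1"], start = 1, frequency = 48)
df30m_decomposed <- decompose(df30m_ts)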

The points are already 15 minutes apart, so assuming that you want a period of 1 day, this will convert it. There are 24 * 60 * 60 seconds in a day (which is the period), but you can change the denominator to the number of seconds in a different period to get that period instead. You will need at least two periods of data to decompose it.
library(zoo)
z <- read.zoo(df)
time(z) <- (as.numeric(time(z)) - as.numeric(start(z))) / (24 * 60 * 60)
as.ts(z)
giving:
Time Series:
Start = c(0, 1)
End = c(0, 5)
Frequency = 96
OpenPice HightPrice LowPrice ClosePrice
0.00000000 271.23 271.23 271.23 271.23
0.01041667 269.50 269.50 269.50 269.50
0.02083333 269.82 269.82 269.82 269.82
0.03125000 269.10 269.10 269.10 269.10
0.04166667 269.50 269.50 269.50 269.50
Although not asked for in the question, in another answer the data was converted to 30 minutes. That could readily be done like this:
library(xts) # also loads zoo
z <- read.zoo(df)
to.minutes30(z)
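To illustrate the remark that at least two periods of data are needed, here is a minimal base R sketch that decomposes a simulated 15-minute series with a daily period of 96 observations. The simulated data and the names n_per_day, price and price_ts are assumptions made for this example only; they are not taken from the question.

```r
# Simulated 15-minute series covering 5 days (hypothetical data)
set.seed(1)
n_per_day <- 24 * 4                      # 96 observations per daily period
n <- 5 * n_per_day                       # 5 full periods, more than the 2 required
times <- seq(as.POSIXct("2017-09-13 00:00:00", tz = "UTC"),
             by = "15 min", length.out = n)

# Daily sine pattern plus noise around a level of 270
price <- 270 + sin(2 * pi * seq_len(n) / n_per_day) + rnorm(n, sd = 0.1)

# One time unit = one day, so frequency = observations per day
price_ts <- ts(price, frequency = n_per_day)

# Classical additive decomposition into trend, seasonal and random parts
dec <- decompose(price_ts)
length(dec$figure)                       # one seasonal value per slot of the day
```

With fewer than 2 * 96 observations, decompose() stops with an error, which is why the five-row data frame in the question cannot be decomposed as it stands.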

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date       Time   Demand
01-Jan-05  6:30   6
01-Jan-05  6:45   3
...
23-Jan-05  21:45  0
23-Jan-05  22:00  1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$DEMAND, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be observations per a unit of time, and I am trying to observe just (Demand) by the 15 minute time slots (Time) in each day (Date), maybe my Frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp, shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that, so read the data in as a zoo series z with a chron time index and then convert it to ts, giving ts_ser. Alternatively, leave it as a zoo series, depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")
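As an alternative to the chron approach, the following base R sketch combines the Date and Time columns into one timestamp and builds a ts whose period is one observed day. It assumes that only the 63 slots from 6:30 to 22:00 are recorded each day, as the question suggests, and that month abbreviations are English. The names stamp and demand_ts are made up for the example.

```r
# Two-row sample in the same layout as the question's CSV
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"),
                 Time = c("6:30", "6:45"),
                 Demand = c(6L, 3L))

# %b parses month abbreviations in the current locale; force English names
Sys.setlocale("LC_TIME", "C")

# Combine the two character columns into one date-time
stamp <- as.POSIXct(paste(tp$Date, tp$Time),
                    format = "%d-%b-%y %H:%M", tz = "UTC")

# start = c(1, 1) means day 1, slot 1 (6:30); frequency = slots per observed day
demand_ts <- ts(tp$Demand, start = c(1, 1), frequency = 63)
```

With this layout, start = c(day, slot) counts observed days and 15-minute slots within the day rather than calendar dates, so stamp is kept alongside the series if the actual dates (for example, the day of the week) are needed later.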

Create date index and add to data frame in R

Currently transitioning from Python to R. In Python, you can create a date range with pandas and add it to a data frame like so;
data = pd.read_csv('Data')
dates = pd.date_range('2006-01-01 00:00', periods=2920, freq='3H')
df = pd.DataFrame({'data' : data}, index = dates)
How can I do this in R?
Further, if I want to compare 2 datasets with different lengths but same time span, you can resample the dataset with lower frequency so it can be the same length as the higher frequency by placing 'NaNs' in the holes like so:
df2 = pd.read_csv('data2') #3 hour resolution = 2920 points of data
data2 = df2.resample('30Min').asfreq() #30 Min resolution = 17520 points
I guess I'm basically looking for a Pandas package equivalent for R. How can I code these in R?
The following is a way of converting your time-series data from one time interval (3 hours) to another (30 minutes):
Get the data:
starter_df <- data.frame(dates=seq(from=(as.POSIXct(strftime("2006-01-01 00:00"))),
length.out = 2920,
by="3 hours"),
data = rnorm(2920))
Get the full sequence in 30 minute intervals and replace the NA's with the values from the starter_df data.frame:
full_data <- data.frame(dates=seq(from=min(starter_df$dates),
to=max(starter_df$dates), by="30 min"),
data=rep(NA,NROW(seq(from=min(starter_df$dates),
to=max(starter_df$dates), by="30 min"))))
full_data[full_data$dates %in% starter_df$dates,] <- starter_df[starter_df$dates %in% full_data$dates,]
I hope it helps.
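The same upsampling can also be sketched with merge(), which performs a left join of the 30-minute grid onto the 3-hourly data and leaves NA wherever no observation exists, similar in spirit to pandas' resample(...).asfreq(). The name grid is an assumption for this example, and the data is simulated.

```r
set.seed(42)

# 3-hourly data: 2920 points starting at 2006-01-01 00:00 (simulated values)
starter_df <- data.frame(
  dates = seq(as.POSIXct("2006-01-01 00:00", tz = "UTC"),
              by = "3 hours", length.out = 2920),
  data = rnorm(2920)
)

# Full 30-minute grid spanning the same time range
grid <- data.frame(
  dates = seq(min(starter_df$dates), max(starter_df$dates), by = "30 min")
)

# Left join: every grid time is kept, unmatched times get NA in 'data'
full_data <- merge(grid, starter_df, by = "dates", all.x = TRUE)

nrow(full_data)                  # 17515 rows: (2920 - 1) * 6 + 1
sum(!is.na(full_data$data))      # 2920 original observations retained
```

The grid ends at the last 3-hourly timestamp, which is why it has 17515 rows rather than the 17520 mentioned in the question.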

Convert a time series from minutes to Day period

I've got an R time series object that is measured in 1 hour intervals.
library(lubridate)
library(timeSeries)
set.seed(100)
c <- Sys.time()
d <- c + hours(1:200)
e <- rnorm(200)
f <- data.frame(d,e)
g <- as.timeSeries(f)
I would like to convert this to a daily time series; I am fine with using the average value of the data column for this conversion.
The outcome would be a time series object with one entry per day whose value is the average of all the hourly values of that particular day.
How can this be done?
First, take advantage of lubridate package to calculate date:
library(lubridate)
f$date <- floor_date(ymd_hms(f$d), "day")
Then, calculate average for given day with
library(dplyr)
dplyr::group_by(f, date) %>%
dplyr::summarise(avg = mean(e))
And use this for time series.
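As a sketch of that last step, the daily averages can be computed with base R's aggregate() and then wrapped in a time series object. The start time below is fixed (instead of Sys.time()) so the example is reproducible, and the names daily and daily_ts are assumptions for the illustration.

```r
set.seed(100)

# 200 hourly observations starting at midnight UTC (hypothetical start time)
d <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * (0:199)
e <- rnorm(200)

# Average all hourly values that fall on the same calendar day
daily <- aggregate(e, by = list(date = as.Date(d)), FUN = mean)
names(daily)[2] <- "avg"

# One value per day; the dates themselves stay in daily$date
daily_ts <- ts(daily$avg, start = 1, frequency = 1)
nrow(daily)                      # 9 days: 8 full days plus 8 hours of a ninth
```

If the calendar dates need to travel with the values, a date-indexed object such as zoo::zoo(daily$avg, daily$date) is one option.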

Finding a more elegant way to aggregate hourly data to mean hourly data using zoo

I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 0100",format = "%Y-%m-%d %H") +
seq(0,24 * 30 * 60 * 60, by=3600)
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function, which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix in which the first row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects in which the first row corresponds to midnight.

Calculating a daily mean in R

Say I have the following matrix:
x1 = 1:288
x2 = matrix(x1,nrow=96,ncol=3)
Is there an easy way to get the mean of rows 1:24,25:48,49:72,73:96 for column 2?
Basically I have a one year time series and I have to average some data every 24 hours.
There is.
Suppose we have the days :
Days <- rep(1:4,each=24)
you could do easily
tapply(x2[,2],Days,mean)
If you have a dataframe with a Date variable, you can use that one. You can do that for all variables at once, using aggregate :
x2 <- as.data.frame(cbind(x2,Days))
aggregate(x2[,1:3],by=list(Days),mean)
Take a look at the help files of these functions to start with. Also do a search here, there are quite some other interesting answers on this problem :
Aggregating daily content
Compute means of a group by factor
PS : If you're going to do a lot of timeseries, you should take a look at the zoo package (on CRAN : http://cran.r-project.org/web/packages/zoo/index.html )
1) ts. Since this is a regularly spaced time series, convert it to a ts series and then aggregate it from frequency 24 to frequency 1:
aggregate(ts(x2[, 2], freq = 24), 1, mean)
giving:
Time Series:
Start = 1
End = 4
Frequency = 1
[1] 108.5 132.5 156.5 180.5
2) zoo. Here it is using zoo. The zoo package can also handle irregularly spaced series (if we needed to extend this). Below day.hour is the day number (1, 2, 3, 4) plus the hour as a fraction of the day so that floor(day.hour) is just the day number:
library(zoo)
day.hour <- seq(1, length = length(x2[, 2]), by = 1/24)
z <- zoo(x2[, 2], day.hour)
aggregate(z, floor, mean)
## 1 2 3 4
## 108.5 132.5 156.5 180.5
If zz is the output from aggregate then coredata(zz) and time(zz) are the values and times, respectively, as ordinary vectors.
A quite compact and computationally fast way of doing this is to reshape the vector into a suitable matrix and calculate the column means.
colMeans(matrix(x2[,2],nrow=24))
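As a quick check, the base R approaches from the answers above can be compared on the question's matrix. This sketch only restates those calls side by side; the result names by_tapply, by_ts and by_colmeans are made up for the comparison.

```r
# Data from the question: 96 rows, 3 columns
x2 <- matrix(1:288, nrow = 96, ncol = 3)
Days <- rep(1:4, each = 24)

# 1) Group column 2 by day label
by_tapply <- as.numeric(tapply(x2[, 2], Days, mean))

# 2) Aggregate a frequency-24 ts down to frequency 1
by_ts <- as.numeric(aggregate(ts(x2[, 2], frequency = 24),
                              nfrequency = 1, FUN = mean))

# 3) Reshape into a 24 x 4 matrix and take column means
by_colmeans <- colMeans(matrix(x2[, 2], nrow = 24))

by_colmeans                      # 108.5 132.5 156.5 180.5
```

All three return the same four daily means, so the choice mostly comes down to whether the data already carries a day label (tapply), is a regular series (ts), or is a plain vector whose length is a multiple of 24 (colMeans).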

Resources