Weekly time series in R

I need to process five years of weekly data. I used the following command to create a time series from that:
my.ts <- ts(x[,3], start = c(2009,12), freq=52)
When plotting the series it looks good. However, the time points of the observations are stored as:
time(my.ts)
# Time Series:
# Start = c(2009, 12)
# End = c(2014, 26)
# Frequency = 52
# [1] 2009.212 2009.231 2009.250 2009.269 2009.288 2009.308 2009.327 ...
I expected to see proper dates instead, aligned with the calendar. What should I do?

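That is how the "ts" class works: it stores times as decimal numbers (the year plus the fraction of the cycle that has elapsed), not as calendar dates.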
The zoo package can represent time series with dates (and other indexes):
library(zoo)
z <- zooreg(1:3, start = as.Date("2009-12-01"), deltat = 7)
giving:
> z
2009-12-01 2009-12-08 2009-12-15
1 2 3
> time(z)
[1] "2009-12-01" "2009-12-08" "2009-12-15"
The xts package and a number of other packages can also represent time series with dates, although they do it by converting to POSIXct internally, whereas zoo maintains the original class.
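To see where the decimal values come from, the base R sketch below rebuilds them by hand and pairs the observations with real dates. The six data values and the first date (2009-03-16, assumed here to be the Monday of week 12 of 2009) are illustrative assumptions, not taken from the question's data.

```r
# Hypothetical stand-in for x[, 3]: six weekly observations.
my.ts <- ts(seq_len(6), start = c(2009, 12), frequency = 52)

# "ts" stores time as year + (period - 1) / frequency.
first_time <- 2009 + (12 - 1) / 52
round(first_time, 3)
round(as.numeric(time(my.ts))[1:2], 3)

# Assumed date of the first observation; adjust to the real data.
first_day <- as.Date("2009-03-16")

# One calendar date per weekly observation.
week_dates <- seq(first_day, by = "week", length.out = length(my.ts))
dated <- data.frame(date = week_dates, value = as.numeric(my.ts))
dated
```

A Date vector such as week_dates can also serve as the index when building a zoo series.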

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date
Time
Demand
01-Jan-05
6:30
6
01-Jan-05
6:45
3
...
23-Jan-05
21:45
0
23-Jan-05
22:00
1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$DEMAND, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be the number of observations per unit of time, and I am trying to observe just (Demand) by the 15 minute time slots (Time) in each day (Date), maybe my Frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time date into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp, shown reproducibly in the Note at the end. A complete daily cycle of 24 * 4 = 96 points should be represented by one unit of time internally, which is what the chron class does. Read the data in as a zoo series z with a chron time index, then either convert it to ts (giving ts_ser) or leave it as a zoo series, depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")
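As a rough base R cross-check, the Date and Time columns can be pasted together and parsed into a single timestamp. This sketch assumes English month abbreviations in the current locale and uses UTC as the time zone; neither is stated in the question. It also counts the 15 minute slots between 6:30 and 22:00 to confirm the figure of 63 per day.

```r
# Same two rows as in the Note.
tp <- data.frame(
  Date = c("01-Jan-05", "01-Jan-05"),
  Time = c("6:30", "6:45"),
  Demand = c(6L, 3L)
)

# Combine the two character columns and parse them in one step.
# tz = "UTC" is an assumption made for reproducibility.
tp$stamp <- as.POSIXct(paste(tp$Date, tp$Time),
                       format = "%d-%b-%y %H:%M", tz = "UTC")

# Slots per day: 6:30 to 22:00 inclusive, in quarter-hour steps.
slots_per_day <- length(seq(6.5, 22, by = 0.25))
slots_per_day
```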

R- Conversion to TS

I am new to R and I am currently struggling to convert a set of data into TS format.
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 578, start = c(2019, 1,1), end=c(2020, 7, 31))
However, the code does not properly convert data into a daily time series.
Is it a problem with my code?
As pointed out by @AlexB, the value for frequency is certainly oddly defined. However, the error you're seeing is also related to how you defined start and end.
You cannot define start and end with a numeric vector of length 3. It must be a vector of length 2: the first number should be the year (or, generically speaking, the number of seasons that have passed) and the second number should be a number between 1 and the value of the frequency.
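The two-element convention is easiest to see on a small quarterly series. The sketch below uses made-up data (the integers 1 to 5) purely for illustration.

```r
# Five quarterly observations starting in the 3rd quarter of 2019.
qs <- ts(1:5, start = c(2019, 3), frequency = 4)

start(qs)     # year and period of the first observation
end(qs)       # year and period of the last observation
time(qs)[1]   # internal decimal time: 2019 + (3 - 1) / 4
```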
To properly write your ts, you should use this code:
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 365.25, start = c(2019, 1))
Start, end and frequency will be defined as follows:
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 365.25)
tsp(your_ts)
#> [1] 2019.00 2020.58 365.25
## respectively: start end frequency
However, I suppose you want to define a ts in order to forecast it.
The problem is that a frequency of 365.25 is rarely handled correctly by forecasting methods (for example forecast::auto.arima or forecast::ets).
You will probably need to use frequency = 7 instead. Of course, in that case, the value for end in the time series definition will make no sense.
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 7)
tsp(your_ts)
#> [1] 2019.000 2101.429 7.000
## respectively: start end frequency
Obviously, it has no meaning. So it would just make more sense to define it this way:
your_data <- rnorm(578)
your_ts <- ts(your_data, frequency = 7)
tsp(your_ts)
#> [1] 1.00000 83.42857 7.00000
## respectively: start end frequency
In this case, you can interpret the difference between 1.000 and 83.428 as meaning that 82 weeks have passed since the beginning of the time series (plus a couple of days).
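The arithmetic behind that reading can be checked directly. This is only a small sketch; the zero-filled series is a placeholder for the real data.

```r
n_obs <- 578
steps <- n_obs - 1            # intervals between first and last observation

whole_weeks <- steps %/% 7    # complete weeks elapsed
extra_days  <- steps %% 7     # leftover days
end_time    <- 1 + steps / 7  # matches the end value reported by tsp()

# Placeholder series of the same length, frequency = 7.
placeholder <- ts(numeric(n_obs), frequency = 7)
tsp(placeholder)

c(whole_weeks = whole_weeks, extra_days = extra_days)
```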
Alternatively, you can use the msts function from the forecast package that allows you to define multiple frequencies.
library(forecast)
your_msts <- msts(your_data, start = c(2019, 1), seasonal.periods = c(7, 365.25))
your_msts
#> Multi-Seasonal Time Series:
#> Start: 2019 1
#> Seasonal Periods: 7 365.25
#> Data:
#> ...
That msts object is well-integrated with forecast::fourier and forecast::tbats.
I suggest you have a look at this for some ideas about it.
About how to write start properly:
"2019-01-01" is a pretty convenient day, because the right value for start will be c(2019, 1). However, if you need a different start date, I suggest you use this code to define the start time:
start = c(lubridate::year(date), as.numeric(format(date, "%j")))
Here date is a Date object, for example as.Date("2019-03-15"); a plain character string will not work with format(date, "%j").
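A worked example may help. The sketch below uses format(date, "%Y") from base R as a stand-in for lubridate::year, so it needs no extra packages; the date itself is an arbitrary choice for illustration.

```r
# Arbitrary example date; must be a Date object, not a string.
date <- as.Date("2019-03-15")

# Year and day-of-year (1 to 366) of that date.
start_vec <- c(as.numeric(format(date, "%Y")),
               as.numeric(format(date, "%j")))
start_vec
```

The resulting vector can be passed as the start argument of ts().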

R: What does the frequency argument to xts do?

I'm creating an xts object with a weekly (7 day) frequency to use in forecasting. However, even when using the frequency=7 argument in the xts call, the resulting xts object has a frequency of 1.
Here's an example with random data:
> values <- rnorm(364, 10)
> days <- seq.Date(from=as.Date("2014-01-01"), to=as.Date("2014-12-30"), by='days')
> x <- xts(values, order.by=days, frequency=7)
> frequency(x)
[1] 1
I have also tried, after using the above code, frequency(x) <- 7. However, this changes the class of x to only zooreg and zoo, losing the xts class and messing with the time stamp formats.
Does xts automatically choose a frequency based on analyzing the data in some way? If so, how can you override this to set a specific frequency for forecasting purposes (in this case, passing a seasonal time series to ets from the forecast package)?
I understand that xts may not allow frequencies that don't make sense, but a frequency of 7 with daily time stamps seems pretty logical.
Consecutive Date class dates always have a frequency of 1 since consecutive dates are 1 apart. Use ts or zooreg to get a frequency of 7:
tt <- ts(values, frequency = 7)
library(zoo)
zr <- as.zooreg(tt)
# or
zr <- zooreg(values, frequency = 7)
These will create a series whose times are 1, 1+1/7, 1+2/7, ...
If we have some index values of zr
zrdates <- index(zr)[5:12]
we can recover the dates from zrdates like this:
days[match(zrdates, index(zr))]
As pointed out in the comments, xts does not support this type of series.
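Because the i-th observation of the ts corresponds to the i-th element of days, positions can be used to map back to calendar dates. The following base R sketch illustrates this with the same kind of random data as in the question; the seed is an arbitrary choice.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
values <- rnorm(364, 10)
days <- seq(as.Date("2014-01-01"), as.Date("2014-12-30"), by = "day")

tt <- ts(values, frequency = 7)
frequency(tt)

# Position within each 7-day cycle (1, 2, ..., 7, 1, 2, ...).
pos <- as.integer(cycle(tt))

# Dates of the observations that open each cycle.
cycle_starts <- days[which(pos == 1)]
head(cycle_starts, 3)
```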

How do I add periods to time series in R after aggregation

I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
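The same idea can be sketched in base R without lubridate or plyr. decompose() and stl() need a "ts" object whose frequency is greater than 1, which is why the quarterly sums are wrapped in ts(..., frequency = 4). The data frame below is simulated (one random amount per day) and only stands in for the real sales data.

```r
set.seed(42)  # arbitrary seed for the simulated amounts
df <- data.frame(
  date = seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
)
df$amount <- runif(nrow(df), 0, 100)

# Calendar year and quarter (1 to 4) of every observation.
df$yr  <- as.integer(format(df$date, "%Y"))
df$qtr <- (as.integer(format(df$date, "%m")) - 1) %/% 3 + 1

# Sum per year and quarter, then sort chronologically.
sums <- aggregate(amount ~ yr + qtr, data = df, FUN = sum)
sums <- sums[order(sums$yr, sums$qtr), ]

# Quarterly ts starting at the first year/quarter present in the data.
nd.ts <- ts(sums$amount, start = c(sums$yr[1], sums$qtr[1]), frequency = 4)

# With frequency = 4 and more than two years of data, this now runs.
parts <- decompose(nd.ts)
```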

Finding a more elegant way to aggregate hourly data to mean hourly data using zoo

I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 01:00", format = "%Y-%m-%d %H:%M") +
seq(0, by = 3600, length.out = 24 * 30)
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function, which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
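The row-order difference can be checked without zoo. The sketch below is a base R approximation of the same grouping, using rowsum() on the raw matrix; the seed and the UTC time zone are assumptions made for reproducibility.

```r
set.seed(1)  # arbitrary seed
x <- matrix(rnorm(24 * 30 * 5), ncol = 5)

# One timestamp per row, starting at 01:00 (UTC assumed here).
stamps <- as.POSIXct("2014-01-01 01:00", tz = "UTC") +
  3600 * (seq_len(nrow(x)) - 1)
hr <- as.numeric(format(stamps, "%H"))

# Group sums divided by group sizes give the hourly means.
# Rows come out sorted by hour, so the first row is hour 0 (midnight).
by_hour <- rowsum(x, hr) / as.vector(table(hr))
dim(by_hour)
rownames(by_hour)[1]

# The loop's first row (1am) corresponds to the row named "1" here.
loop_row1 <- colMeans(x[seq(1, nrow(x), by = 24), ])
all.equal(unname(by_hour["1", ]), loop_row1)
```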
