R - Conversion to TS

I am new to R and I am currently struggling to convert a set of data into TS format.
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 578, start = c(2019, 1,1), end=c(2020, 7, 31))
However, the code does not properly convert data into a daily time series.
Is it a problem with my code?

As pointed out by @AlexB, the value for frequency is certainly oddly defined. However, the problem you're having is also related to how you defined start and end.
You cannot define start and end with a numeric vector of length 3. It must be a vector of length 2: the first number should be the year (or, generically speaking, the number of seasons that have passed) and the second number should be a number between 1 and the value of the frequency.
To properly write your ts, you should use this code:
Call_VolumeTS10 <- ts(Forecast_Data_Test$`Call Volume`, frequency = 365.25, start = c(2019, 1))
Start, end and frequency will be defined as follows:
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 365.25)
tsp(your_ts)
#> [1] 2019.00 2020.58 365.25
## respectively: start end frequency
However, I suppose you want to define a ts to forecast it.
The problem is that a frequency of 365.25 is rarely handled correctly by forecasting methods (for example forecast::auto.arima or forecast::ets).
You will probably need to use frequency = 7 instead. Of course, if you keep start = c(2019, 1) in that case, the value for end in the time series definition will make no sense:
your_data <- rnorm(578)
your_ts <- ts(your_data, start = c(2019, 1), frequency = 7)
tsp(your_ts)
#> [1] 2019.000 2101.429 7.000
## respectively: start end frequency
Obviously, it has no meaning. So it would just make more sense to define it this way:
your_data <- rnorm(578)
your_ts <- ts(your_data, frequency = 7)
tsp(your_ts)
#> [1] 1.00000 83.42857 7.00000
## respectively: start end frequency
In this case, you can interpret the difference between 1.000 and 83.428 as meaning that 82 weeks (plus three days) have passed since the beginning of the time series.
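The arithmetic behind that reading can be checked directly. This is only a minimal sketch on the same kind of simulated data as above; the names span, full_weeks and extra_days are made up for illustration.

```r
# Simulated stand-in for the real call volumes (578 daily observations)
your_ts <- ts(rnorm(578), frequency = 7)

# One time unit is one week, so end - start is the elapsed time in weeks
span <- tsp(your_ts)[2] - tsp(your_ts)[1]

full_weeks <- floor(span)                      # whole weeks elapsed
extra_days <- round((span - full_weeks) * 7)   # leftover days

# 577 daily steps separate the first and the last of 578 observations
full_weeks * 7 + extra_days
```

The last expression should give back 577, that is 82 weeks and 3 days, which is a quick sanity check that frequency = 7 is counting what you expect.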
Alternatively, you can use the msts function from the forecast package that allows you to define multiple frequencies.
library(forecast)
your_ts <- msts(your_data, start = c(2019, 1), seasonal.periods = c(7, 365.25))
your_ts
#> Multi-Seasonal Time Series:
#> Start: 2019 1
#> Seasonal Periods: 7 365.25
#> Data:
#> ...
That msts object is well-integrated with forecast::fourier and forecast::tbats.
I suggest you have a look at this for some ideas about it.
About how to write start in a proper way...
"2019-01-01" is a pretty convenient day, because the right value for start will be c(2019, 1). However, if you need a different start date, I suggest you use this code to define the start time:
start = c(lubridate::year(date), as.numeric(format(date, "%j")))
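Here date is a Date object, for example as.Date("2019-03-15"); a plain character string in the format yyyy-mm-dd has to be converted with as.Date() first.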
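As a rough illustration of the same idea, the start vector can be wrapped in a small helper. The function name ts_start is hypothetical, and this sketch uses base R's format() for the year instead of lubridate::year() so that it runs without extra packages.

```r
# Hypothetical helper: turn a calendar date into the c(year, day-of-year)
# pair used as `start` when one cycle is a year of daily observations
ts_start <- function(date) {
  date <- as.Date(date)  # accepts a Date or a "yyyy-mm-dd" string
  c(as.numeric(format(date, "%Y")), as.numeric(format(date, "%j")))
}

ts_start("2019-01-01")
ts_start("2019-03-15")

# Simulated data starting on 15 March 2019
your_ts <- ts(rnorm(100), start = ts_start("2019-03-15"), frequency = 365.25)
tsp(your_ts)
```

The first call should reproduce the c(2019, 1) used earlier, which is an easy way to check the helper before using it on another date.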

Related

Date Formatting in Time Series Codes

I have a .csv file that looks like this:
Date       Time   Demand
01-Jan-05  6:30   6
01-Jan-05  6:45   3
...
23-Jan-05  21:45  0
23-Jan-05  22:00  1
The days are broken into 15 minute increments from 6:30 - 22:00.
Now, I am trying to do a time series on this, but I am a little lost on the notation of this.
I have the following so far:
library(tidyverse)
library(forecast)
library(zoo)
tp <- read.csv(".csv")
tp.ts <- ts(tp$DEMAND, start = c(), end = c(), frequency = 63)
The frequency I am after is an entire day, which I believe makes the number 63.***
However, I am unsure as to how to notate the dates in c().
***Edit
If the frequency is meant to be observations per a unit of time, and I am trying to observe just (Demand) by the 15 minute time slots (Time) in each day (Date), maybe my Frequency is 1?
***Edit 2
So I think I am struggling with doing the time series because I have a Date column (which is characters) and a Time column.
Since I need the data for Demand at the given hours on the dates, maybe I need to convert the dates to be used in ts() and combine the Date and Time data into a new column?
If I do this, I am assuming this should give me the times I need (6:30 to 22:00) but with the addition of having the date?
However, the data is to be used to predict the Demand for the rest of the month. So maybe the Date is an important variable if the day of the week impacts Demand?
We assume you are starting with tp shown reproducibly in the Note at the end. A complete cycle of 24 * 4 = 96 points should be represented by one unit of time internally. The chron class does that, so read the data in as a zoo series z with a chron time index, and then either convert it to ts (giving ts_ser) or leave it as a zoo series, depending on what you are going to do next.
library(zoo)
library(chron)
to_chron <- function(date, time) as.chron(paste(date, time), "%d-%b-%y %H:%M")
z <- read.zoo(tp, index = 1:2, FUN = to_chron, frequency = 4 * 24)
ts_ser <- as.ts(z)
Note
tp <- structure(list(Date = c("01-Jan-05", "01-Jan-05"), Time = c("6:30",
"6:45"), Demand = c(6L, 3L)), row.names = 1:2, class = "data.frame")
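For the question raised in Edit 2 (combining the Date and Time columns), here is a minimal base R sketch. It is an alternative to the chron approach above, not the answerer's method; the column name DateTime and the variable slots are made up, and the "%b" format assumes an English locale so that "Jan" is recognised.

```r
# Same two-row sample as in the Note
tp <- data.frame(Date = c("01-Jan-05", "01-Jan-05"),
                 Time = c("6:30", "6:45"),
                 Demand = c(6L, 3L))

# Combine the two character columns into a single date-time column
# (assumption: English month abbreviations, fixed UTC time zone)
tp$DateTime <- as.POSIXct(paste(tp$Date, tp$Time),
                          format = "%d-%b-%y %H:%M", tz = "UTC")

# Count the 15-minute slots from 6:30 to 22:00 within one day
slots <- seq(as.POSIXct("2005-01-01 06:30", tz = "UTC"),
             as.POSIXct("2005-01-01 22:00", tz = "UTC"),
             by = "15 min")
length(slots)
```

The slot count should match the 63 recorded observations per day mentioned in the question, whereas the answer above uses 96 because it treats a full 24-hour day as one cycle.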

How to find decimal representation of years in R?

Since I need reasonably accurate representations of years in decimal format (~ 4-5 digits of accuracy would work) I turned to the lubridate package. This is what I have tried:
library(lubridate)
refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")
daysInLeapYear <- 366
daysInRegYear <- 365
leapYearFractStart <- 0
leapYearRegStart <- 0
daysInterval <- as.interval(difftime(endDate, refDate, unit = "d"), start = refDate)
periodObject <- as.period(daysInterval)
if(leap_year(refDate)) {
leapYearFractStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInLeapYear
}
if(!leap_year(refDate)) {
leapYearRegStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInRegYear
}
returnData <- periodObject@year+(periodObject@month/12)+leapYearFractStart+leapYearRegStart
It is safe to assume that the end date is always at the end of a month, hence no leap year check at the end. Relying on lubridate for proper year/month counting I am adjusting for leap-years only for the start date.
I reckon this gets me to within 3 digits of accuracy only! In addition, it looks a bit crude.
Is there a more complete and accurate procedure to determine decimal representation of years in an interval?
It's very unclear what you're trying to do exactly here, which makes accuracy difficult to talk about.
lubridate has a function decimal_date which turns dates into decimals. But since 3 decimal places gives you 1000 possible positions within a year, when we only have 365/366 days, there are between 2 and 3 viable values that fall within a day. Accuracy depends on when in the day you want the result to fall.
> decimal_date(as.POSIXlt("2016-01-10 00:00:01"))
[1] 2016.025
> decimal_date(as.POSIXlt("2016-01-10 12:00:00"))
[1] 2016.026
> decimal_date(as.POSIXlt("2016-01-10 23:59:59"))
[1] 2016.027
In other words, going beyond 3 decimal places is only really important if you're interested in the time of day.
This solution uses only base R. We get the beginning of the year using cut(..., "year") and the number of days in the year by differencing it with the beginning of the next year obtained using cut(..., "year") on an arbitrary date in the following year. Finally use those quantities to get the fraction and add it to the year.
d <- as.Date(c("2015-01-31", "2016-01-01", "2016-01-10", "2016-12-31")) # sample input
year_begin <- as.Date(cut(d, "year"))
days_in_year <- as.numeric( as.Date(cut(year_begin + 366, "year")) - year_begin )
as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
## [1] 2015.082 2016.000 2016.025 2016.997
Alternately, using as.POSIXlt this variation crams it into one line:
with(unclass(as.POSIXlt(d)),1900+year+yday/as.numeric(as.Date(cut(d-yday+366,"y"))-d+yday))
## [1] 2015.082 2016.000 2016.025 2016.997
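To get the length of an interval in decimal years, as the question asks, the same idea can be wrapped in a function and applied to both end points. This is a sketch under the assumption that "decimal year" means year plus elapsed days divided by the number of days in that year; the names decimal_year and years_between are hypothetical, and the year boundaries are built with paste0() rather than cut().

```r
# Hypothetical helper: year + (days elapsed in the year) / (days in the year)
decimal_year <- function(d) {
  d <- as.Date(d)
  yr <- as.numeric(format(d, "%Y"))
  year_begin <- as.Date(paste0(yr, "-01-01"))
  next_begin <- as.Date(paste0(yr + 1, "-01-01"))
  yr + as.numeric(d - year_begin) / as.numeric(next_begin - year_begin)
}

refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")

# Interval length in decimal years, leap years handled per end point
years_between <- decimal_year(endDate) - decimal_year(refDate)
years_between
```

Because each end point is divided by the length of its own year, no separate leap-year branches are needed.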

What does the ts function do in R

I have downloaded the historical prices between Jan-1-2010 and Dec-31-2014 for Twitter, Inc. (TWTR) -NYSE from YAHOO! FINANCE in a twitter.csv file.
I then loaded it into RStudio using:
x = read.csv("Z:/path/to/file/twitter.csv", header=T,stringsAsFactors=F)
Here is what table x looks like:
View(x)
Then I used ts function to get the time series of Adj.Close:
x.ts = ts(x$Adj.Close, frequency = 12, start=c(2010,1), end=c(2014,12))
x.ts
How were the previous results obtained? They are really different from the data in table x. Do they need any adjustments?
Your problem is the scale in which the data are read. With frequency = 12, start=c(2010,1), end=c(2014,12) you are telling the function that you have one number per month. If you have one number per day, as is your case, you should try:
x.ts = ts(x$Adj.Close, frequency = 365, start=c(2010,1), end=c(2014,365))
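A small experiment shows what ts() does when start, end and frequency describe fewer slots than there are data points. This is a sketch with made-up stand-in data (the vector prices is hypothetical), numbered so it is easy to see which values survive.

```r
# Stand-in for x$Adj.Close: 1000 values numbered 1..1000
prices <- 1:1000

# Same call shape as in the question: monthly slots from 2010 to 2014
monthly <- ts(prices, frequency = 12, start = c(2010, 1), end = c(2014, 12))

length(monthly)            # how many values ts() actually kept
range(as.numeric(monthly)) # which of the original values they are
```

Five years of monthly slots is 60 positions, so only the first 60 values are used and labelled as consecutive months, while the rest are dropped without a warning. That is why the printed series looks so different from the original table.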
Firstly, frequency should be set to 365 if you deal with daily data, 12 if monthly etc.
Secondly, I think you need to arrange the data ascending chronologically before using the ts() function.
The function blindly follows exactly what you are telling it, e.g. the data from the chart starts with the first value 35.87 in 2014-12-31 but the start date in the code is 2010, January, meaning it will attribute that value to being associated with Jan-2010.
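library(dplyr)
x <- x %>%
  arrange(date)
first <- as.Date(min(x$date))  # earliest date; start must be c(year, day of year), not a date
ts.x <- ts(x$Adj.Close, frequency = 365,
           start = c(as.numeric(format(first, "%Y")), as.numeric(format(first, "%j"))))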

Weekly time series in R

I need to process five years of weekly data. I used the following command to create a time series from that:
my.ts <- ts(x[,3], start = c(2009,12), freq=52)
When plotting the series it looks good. However, the time points of the observations are stored as:
time(my.ts)
# Time Series:
# Start = c(2009, 12)
# End = c(2014, 26)
# Frequency = 52
# [1] 2009.212 2009.231 2009.250 2009.269 2009.288 2009.308 2009.327 ...
I expected to see proper dates instead (which should be aligned with a calendar). What should I do?
That is how the "ts" class works.
The zoo package can represent time series with dates (and other indexes):
library(zoo)
z <- zooreg(1:3, start = as.Date("2009-12-01"), deltat = 7)
giving:
> z
2009-12-01 2009-12-08 2009-12-15
1 2 3
> time(z)
[1] "2009-12-01" "2009-12-08" "2009-12-15"
The xts package and a number of other packages can also represent time series with dates although they do it by converting to POSIXct internally whereas zoo maintains the original class.
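If you would rather stay with a plain "ts" object, one workaround is to keep a parallel vector of calendar dates next to it. This is only a sketch in base R: the stand-in data, the name obs_dates and in particular the date of the first observation (first_obs) are assumptions to replace with your own values.

```r
# Stand-in for x[, 3]: three weekly observations
my.ts <- ts(1:3, start = c(2009, 12), frequency = 52)

# Hypothetical date of the first observation; replace with the real one
first_obs <- as.Date("2009-03-16")

# One calendar date per observation, seven days apart
obs_dates <- seq(first_obs, by = "week", length.out = length(my.ts))

# Dates and values side by side
data.frame(date = obs_dates, value = as.numeric(my.ts))
```

The "ts" object still stores its time index as decimal years; the dates live only in the separate vector, so the two have to be kept in sync by hand.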

How do I add periods to time series in R after aggregation

I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
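For comparison, the same pipeline can be sketched in base R on simulated data, ending with the decomposition calls that failed in the question. The data frame below is a made-up stand-in for df, and the names yq, q_sum and q_ts are hypothetical. The key ingredient appears to be a "ts" object with frequency = 4, which decompose() and stl() need in order to see a seasonal period.

```r
set.seed(42)

# Simulated stand-in for df: weekday sales from 2004-07-09 to 2014-12-31
dates <- seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
dates <- dates[as.POSIXlt(dates)$wday %in% 1:5]
df <- data.frame(date = dates, amount = runif(length(dates), 0, 100))

# Label each row with its calendar quarter, e.g. "2004 Q3"
lt <- as.POSIXlt(df$date)
yq <- sprintf("%d Q%d", lt$year + 1900L, lt$mon %/% 3L + 1L)

# Sum by quarter; labels sort chronologically because the year comes first
q_sum <- tapply(df$amount, yq, sum)

# frequency = 4 marks the series as quarterly
q_ts <- ts(as.numeric(q_sum), start = c(2004, 3), frequency = 4)

parts <- decompose(q_ts)
fit <- stl(q_ts, s.window = "periodic")
```

With 42 quarters the series covers more than two full periods, so both calls should run without the "less than 2 periods" error.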
