Daily Time Series Analysis - r

I have a daily time series of the sales of a product. The series runs from 01/01/2016 to 31/08/2017. My problem is that I do not know what frequency value I should use, given that it is a six-day week (my week starts on Monday and ends on Saturday) and there is no data for Sundays.
Should it be like this?
myts <- ts(sales, start=c(2016, 1), frequency=6)
Thanks for your help !!

ts expects you to have values for each element of the time-series, i.e., it would expect you to have the seventh day values in the data.
One option is to expand the date index to include your missing observations. You could leave those missing observations as NA or interpolate them with na.approx (from zoo), but you can't give ts a six-day week and expect it to treat it as a seven-day cycle.
A good way to do this is to look at zoo, which has specific functions for dealing with these sorts of situations.
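
As a rough sketch of that idea (the dates and sales values here are made-up placeholders, and linear interpolation of Sundays is only one possible choice), the index can be expanded with zoo and the gaps filled before handing the values to ts:

```r
library(zoo)

# Hypothetical six-day-week data: every day except Sundays
dates <- seq(as.Date("2016-01-01"), as.Date("2016-03-31"), by = "day")
dates <- dates[as.POSIXlt(dates)$wday != 0]   # wday 0 is Sunday
sales <- seq_along(dates)                     # placeholder values
z <- zoo(sales, dates)

# Expand to a full daily index; the Sundays appear as NA
full_index <- seq(start(z), end(z), by = "day")
z_full <- merge(z, zoo(, full_index))

# Fill the Sundays by linear interpolation between Saturday and Monday
z_filled <- na.approx(z_full)

# With one value per calendar day, a seven-day cycle is meaningful
myts <- ts(coredata(z_filled), frequency = 7)
```

Whether interpolated Sundays are acceptable depends on the analysis; for sales totals, filling with 0 might be closer to the truth than interpolating.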

It really depends on what you want to do with the data.
1) plot If your objective is simply to plot the series, then the "ts" class is not a good fit, since it is not good at representing dates. Try this instead, where the test vectors sales and tt are defined in the Note at the end.
library(zoo)
z <- zoo(sales, tt)
plot(z)
2) acf If you want to compute the autocorrelation function then using the plain vector sales or ts(sales) would be fine:
acf(sales)
3) StructTS If you want to fit a structural time series using StructTS, then you will need to decide on the length of a cycle, i.e. does it repeat every week, quarter or year? Typically an annual cycle is appropriate for sales but, in general, you need at least two complete cycles to do anything, so you don't really have enough data for that.
4) monthly/quarterly If you are willing to reduce it to monthly or quarterly data then you could use ts but you only have 20 points for monthly or 7 for quarterly. Here we have used the last point in each month:
library(zoo)
z <- zoo(sales, tt)
zm <- aggregate(z, as.yearmon, tail, 1)
tsm <- as.ts(zm)
tsm
giving:
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
2016 3.258097 3.931826 4.356709 4.644391 4.867534 5.049856 5.204007 5.342334
2017 5.828946 5.897154 5.968708 6.030685 6.093570 6.150603 6.204558 6.257668
          Sep      Oct      Nov      Dec
2016 5.459586 5.564520 5.659482 5.749393
2017
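
The quarterly variant mentioned above could be sketched the same way, swapping as.yearmon for as.yearqtr. This rebuilds the dummy data from the Note so that it runs on its own; taking the last point in each quarter is just the same arbitrary choice as in the monthly case.

```r
library(zoo)

# Same dummy data as in the Note: daily dates without Sundays
tt <- seq(as.Date("2016-01-01"), as.Date("2017-08-31"), "day")
tt <- tt[as.POSIXlt(tt)$wday != 0]   # wday 0 is Sunday
sales <- log(seq_along(tt))
z <- zoo(sales, tt)

# Keep the last observation in each calendar quarter
zq <- aggregate(z, as.yearqtr, tail, 1)

# A yearqtr index converts to a ts with frequency 4
tsq <- as.ts(zq)
```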
5) weekly Another thing you could consider would be to use weekly series by just using Saturday, for example:
library(zoo)
z <- zoo(sales, tt)
zw <- z[weekdays(time(z)) == "Saturday"]
Note: We used this dummy data:
set.seed(123)
tt <- seq(as.Date("2016-01-01"), as.Date("2017-08-31"), "day")
tt <- tt[! weekdays(tt) == "Sunday"]
n <- length(tt)
sales <- log(1:n)

Related

R How to use a complex function at seasonal period under hydroTSM and xts packages?

I want to calculate the seasonal mean of my parameter values (when x > 0.002). To do this, I use xts::period.apply() to split the values by season. I use the "quarter" period in endpoints(), but it divides the year into four seasons as follows:
"January+February+March",
"April+May+June",
"July+August+September",
"October+November+December"
For example:
library(xts)
library(PerformanceAnalytics)
data(edhec)
head(edhec)
edhec_4yr <- edhec["1997/2001"]
ep <- endpoints(edhec_4yr, "quarter")
# mean
period.apply(edhec_4yr, INDEX = ep,
function(x) apply(x,2, function(y) mean(y[y>0.002])))
But for my study, I want my seasonal periods divided as follows:
"December+January+February",
"March+April+May",
"June+July+August",
"September+October+November"
Can you help me change which months make up each "quarter" period?
I can apply simple functions (mean, max, min) with the hydroTSM package using the following call:
dm2seasonal(edhec_4yr, FUN=mean, season="DJF")
Where:
DJF : December, January, February
MAM : March, April, May
JJA : June, July, August
SON : September, October, November
But I cannot apply a more complex function (a mean with a condition), as in the following call:
dm2seasonal(edhec_4yr, season="DJF",
function(x) apply(x,2, function(y) mean(y[y>0.002])))
Can you help me adapt this call to calculate the mean value (when x > 0.002) for DJF, for example?
The xts::endpoints() function always returns the last observation in a "standard" period, starting from the origin (midnight, 1970-01-01). So it can't easily do what you want.
You can calculate your own period end points by finding the observation on the last day of the last month in each 3-month window. Here's one way to do that with monthly data:
# .indexmon() returns a zero-based month
ep <- which((.indexmon(edhec_4yr) + 1) %in% c(2, 5, 8, 11))
aggfn <- function(x, bound = 0.002, ...) {
apply(x,2, function(y) mean(y[y > bound], ...))
}
period.apply(edhec_4yr, ep, aggfn)
If you have daily data, you need to find the last day of each month your periods end in. You can do that by using .indexmon() to find all months that end each season, then construct an xts object with the locations of all those observations in the original daily data object. Then you can use apply.monthly() and last() to extract the location of the last day of each season-ending month. The resulting object contains the end points you need to pass to period.apply().
data(prices)
prices <- as.xts(prices) # 'prices' is zoo; convert to xts
season_months <- (.indexmon(prices)+1) %in% c(2, 5, 8, 11)
ep_months <- xts(which(season_months), index(prices)[season_months])
ep_seasons <- as.numeric(apply.monthly(ep_months, last))
period.apply(prices, ep_seasons, aggfn)
And I should note that I'm thinking about how to specify end points in a more flexible manner, and I'll make sure to include a way to specify seasons.
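
For comparison, here is a minimal base R sketch of the same DJF/MAM/JJA/SON grouping that does not rely on end points at all. It is not the xts or hydroTSM API, just a plain tapply() over a season label; the monthly dates and values are made up, and assigning December to the following year's DJF is an assumption about how the seasons should be labelled.

```r
# Hypothetical monthly data: 24 months of small increasing values
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 24)
x <- seq_len(24) / 1000

mon <- as.POSIXlt(dates)$mon + 1      # 1..12
yr  <- as.POSIXlt(dates)$year + 1900

# Map each month to its meteorological season
season_names <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
                  "JJA", "JJA", "SON", "SON", "SON", "DJF")
season <- factor(season_names[mon], levels = c("DJF", "MAM", "JJA", "SON"))

# December belongs to the DJF season of the following year
season_year <- yr + (mon == 12)

# Conditional mean: only values above the bound are averaged
cond_mean <- function(y, bound = 0.002) mean(y[y > bound])

res <- tapply(x, list(season_year, season), cond_mean)
```

A season in which no value exceeds the bound comes back as NaN, and a season with no data at all as NA, so incomplete seasons at either end of the series are easy to spot.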

What does the ts function do in R

I have downloaded the historical prices between Jan-1-2010 and Dec-31-2014 for Twitter, Inc. (TWTR) -NYSE from YAHOO! FINANCE in a twitter.csv file.
I then loaded it into RStudio using:
x = read.csv("Z:/path/to/file/twitter.csv", header=T,stringsAsFactors=F)
Here is what table x looks like:
View(x)
Then I used ts function to get the time series of Adj.Close:
x.ts = ts(x$Adj.Close, frequency = 12, start=c(2010,1), end=c(2014,12))
x.ts
How were the previous results obtained? They are really different from the data in table x. Do they need any adjustments?
Your problem is the scale in which the data are read. With frequency = 12, start=c(2010,1), end=c(2014,12) you are telling the function that you have one number per month. If you have one number per day, as in your case, you should try:
x.ts = ts(x$Adj.Close, frequency = 365, start=c(2010,1), end=c(2014,365))
Firstly, frequency should be set to 365 if you are dealing with daily data, 12 if monthly, etc.
Secondly, I think you need to sort the data in ascending chronological order before using the ts() function.
The function blindly follows exactly what you tell it. For example, the data in the table start with the value 35.87 on 2014-12-31, but the start date in the code is January 2010, so that value will be attributed to Jan-2010.
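library(dplyr)  # for %>% and arrange()
x <- x %>%
dplyr::arrange(date)
first <- as.Date(min(x$date))  # ts() needs a numeric start, not a date
ts.x <- ts(x$Adj.Close, frequency = 365,
start = c(as.numeric(format(first, "%Y")), as.numeric(format(first, "%j"))))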
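
To see why the original output looked so different from the table, it may help to try ts() on a made-up vector. The sketch below uses 1000 placeholder numbers standing in for daily prices: ts() never looks at any dates, it only attaches time labels to the values in the order given, and when start and end imply fewer periods than there are values it keeps just the first ones.

```r
# 1000 hypothetical daily values (placeholders, not real prices)
daily <- seq_len(1000)

# Declaring them monthly from Jan 2010 to Dec 2014 leaves room for 60 periods
monthly_labelled <- ts(daily, frequency = 12,
                       start = c(2010, 1), end = c(2014, 12))

length(monthly_labelled)         # 60: the remaining 940 values are dropped
head(time(monthly_labelled), 3)  # time labels run 2010.000, 2010.083, ...
```

So with the original call, the first 60 rows of the CSV (whatever dates they really belong to) were shown as the monthly values for 2010 to 2014.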

R - Daily data and Time Series by year and week

Hi, I am new to R and to time series forecasting.
I have sample data of daily sales for the past 3 years, and I would like to use this data set to produce a plot that shows seasonality and patterns.
My daily data format looks like this, e.g.:
Date, Sales
2010-01-01, 5
2010-01-03, 3
2010-01-04, 2
..
2011-12-01, 4
..
2014-11-01, 1
What I want is a plot by week and year using the ts function. Also, depending on the year, some years have 53 weeks and some have 52. Any idea how this can be taken into account when plotting?
Working with the ts function is not easy for me, so it would be great if someone could help with this.
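You should start by creating a ts object. Check ?ts for the syntax, but assuming your data above were stored in `data`, it's basically
tsData <- ts(data$Sales, start=c(2010,1), frequency=365)
where start gives the year and the position within that year (here, observation 1 of 2010; with frequency=365 that is a day, not a month) and frequency is the number of samples per year. Note that this assumes one value for every day, so missing dates would have to be filled in first. Then you can use plot.ts() to plot the entire time series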
plot.ts(tsData)
To extract seasonal patterns or trends, you can use the decompose() function.
decompose(tsData)
Here is a sample:
x <- 1:10
y <- 11:20
plot(x, y)
lines(x,y)
With your own data, use the dates as x and the sales as y. If you still have an issue, please post it.
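
On the 52 versus 53 weeks point, one option is to label each day with its ISO 8601 year and week and sum within those labels. This is only a sketch on the three sample rows from the question (the data frame name daily is made up), and it assumes the "%G" and "%V" format codes are supported on your platform. Under ISO rules 2009 has 53 weeks, so 1 and 3 January 2010 fall in week 53 of 2009.

```r
# The first three sample rows from the question
daily <- data.frame(
  Date  = as.Date(c("2010-01-01", "2010-01-03", "2010-01-04")),
  Sales = c(5, 3, 2)
)

# ISO 8601 week-based year and week number (weeks start on Monday)
daily$iso_year <- as.integer(format(daily$Date, "%G"))
daily$iso_week <- as.integer(format(daily$Date, "%V"))

# Weekly totals, sorted chronologically
weekly <- aggregate(Sales ~ iso_year + iso_week, data = daily, FUN = sum)
weekly <- weekly[order(weekly$iso_year, weekly$iso_week), ]
```

The weekly table can then be plotted per year (week on the x axis, one line per iso_year) without forcing every year to have the same number of weeks.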

How do I add periods to time series in R after aggregation

I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
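
The structure looks reasonable: stl() and decompose() need a ts object whose frequency is greater than 1, and building the series with ts(..., frequency = 4) provides that. A base R sketch of the same steps is below, using randomly generated placeholder amounts in place of the real df. It assumes every quarter in the range has at least one observation; a quarter with no rows would be silently skipped and shift all later values.

```r
set.seed(1)

# Hypothetical stand-in for df: one row per day with a random amount
df <- data.frame(date = seq(as.Date("2004-07-09"), as.Date("2014-12-31"),
                            by = "day"))
df$amount <- runif(nrow(df), 0, 100)

# Calendar year and quarter of each observation
lt <- as.POSIXlt(df$date)
df$year <- lt$year + 1900
df$qtr  <- lt$mon %/% 3 + 1

# Quarterly totals in chronological order
q_sums <- aggregate(amount ~ year + qtr, data = df, FUN = sum)
q_sums <- q_sums[order(q_sums$year, q_sums$qtr), ]

# Quarterly ts starting at the first observed quarter (2004 Q3 here)
nd.ts <- ts(q_sums$amount, start = c(q_sums$year[1], q_sums$qtr[1]),
            frequency = 4)

# With frequency = 4 and more than two years of data, stl() now runs
fit <- stl(nd.ts, s.window = "periodic")
```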

Split dataframe and calculate averages for data subsets in R

I have this data frame in R:
steps day month
4758 Tuesday December
9822 Wednesday December
10773 Thursday December
I want to iterate over the data frame and apply a function to the steps column based on the value in the month column. I'm trying to work out the average number of steps per weekday for each month.
I want to output to a new data frame like so where the week days repeat but I only have the average values per day:
average.steps day month
4500 Tuesday December
9000 Wednesday December
1000 Thursday December
I can work out how to work out the averages for the data frame as a whole, but want to use a for loop to apply it just for step values from the same month.
avgsteps <- ddply(DATA, "day", summarise, msteps = mean(steps))
My basic idea for the for function was:
f <- function(m in month) {ddply(DATA, "day", summarise, msteps = mean(steps))}
But it won't process it and throws the error:
Error: unexpected 'in' in "f <- function(m in"
Any help would be greatly appreciated!
EDIT:
So I've tried @agstudy's suggested fix (below) and it gets the right data structure (a single value for each weekday for each month), but the value assigned to each day is identical. I'm a bit confused about what could be going wrong.
steps.month.day.avg <- ddply(steps.month.day, .(fitbit.day,fitbit.month), summarise, msteps = mean(steps))
No need to loop here; you should just change the variables the data frame is split by:
ddply(DATA, .(day,month), summarise, msteps = mean(steps))
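
If plyr is not available, the same split by two variables can be sketched in base R with aggregate(). The small DATA frame below is made up for illustration, with two readings for most day/month combinations so that the averaging is visible.

```r
# Hypothetical step counts
DATA <- data.frame(
  steps = c(4000, 5000, 9000, 9000, 1000, 3000),
  day   = c("Tuesday", "Tuesday", "Wednesday", "Wednesday",
            "Tuesday", "Tuesday"),
  month = c("December", "December", "December", "December",
            "January", "January")
)

# Mean of steps for every day/month combination
avg <- aggregate(steps ~ day + month, data = DATA, FUN = mean)
names(avg)[names(avg) == "steps"] <- "average.steps"
```

Each row of avg then holds one weekday of one month, which matches the layout asked for in the question.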

Resources