Understanding the time() and cycle() functions in R

I have a time series of monthly international airline passengers spanning many years. What does the time function applied to my data set tell me? What does the cycle function do to my data set? What are these functions useful for?

The syntax of the time function is as follows:
library(tseries)   # not strictly needed here; AirPassengers ships with base R
library(forecast)
data(AirPassengers)
AP <- AirPassengers
time(AP, offset = 0)
An offset of 0 indicates that sampling took place at the start of each time unit; time then creates a vector of times beginning at the first unit, the first month in this case.
The cycle function, on the other hand, shows the position of each observation within the cycle. The syntax is:
cycle(AP)
If you run this in R, you will see that Jan is assigned position 1, Feb position 2, Mar position 3, and so on through all 12 months.
These functions are useful for getting an overview of the data before starting to model it.
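As a concrete illustration, cycle() pairs naturally with tapply() to summarise the seasonal pattern, and time() gives the fractional-year axis you might use as a trend regressor. A minimal sketch using the built-in AirPassengers data:

```r
# cycle(AP) labels each observation 1..12 by month, so tapply() can
# average over all Januaries, all Februaries, and so on.
AP <- AirPassengers
monthly.means <- tapply(AP, cycle(AP), mean)   # 12 per-month means

# time(AP) returns fractional years (1949.000, 1949.083, ...), which
# makes a convenient regressor for a simple linear trend:
trend.fit <- lm(AP ~ time(AP))
```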

An old question, but perhaps someone is googling this, so an illuminating example could still help. I came across it while trying to make sense of Cowpertwait's R book.
Try this to see a visual example of what is going on:
AP
cycle(AP)
layout(matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE), widths = c(1, 1), heights = c(1, 1))
plot(AP)
plot(aggregate(AP))
boxplot(AP)
boxplot(AP ~ cycle(AP))


Analyzing disparate time series in R

Are there tools in R that simplify the analysis of lagged and disparate time series? For example:
Daily values that only occur on weekdays (no entry on weekends or holidays)
vs
Bi-annual values
What I'm seeking are ways to:
Complete the missing daily values (with interpolation, last value carried forward, etc.)
Look for correlation between the daily values and the bi-annual value (using only the daily values that precede each bi-annual event)
As an example:
10-year treasury note interest rate (daily on non-holiday weekdays) as "X" and i-bond fixed rate as "Y" (set May 1/Nov 1)
Any suggestions appreciated.
I've built a test dataset manually for "x" and used functions in zoo to fill in the missing values (interpolated), but I'm hoping for a less brute-force method of analyzing the disparate time series. I've used lag functions in the past, but only on time series with matching intervals.
What Jon commented is what I had in mind:
expand a weekday time series to full week using missing value function(s) in zoo
Sample the daily value - say April 15 for the May 1 rate, Oct 15 for Nov 1
Ideally, be able to automate this - say, loop through April 1-30 and Oct 1-30 looking for the highest RSqr for the model of choice (linear, polynomial, etc.)
Not have to build discrete datasets for each of the above - but if that is required, I can do it programmatically; I've done that with stock data in the past. I was looking for a more efficient means of selecting the datasets ad hoc during the analysis.
I don't have code to post, because I'm clueless as to the feature/function that would make the date selection I'm after possible (at least in R).
Thanks for the input so far. It has already been useful in helping me look at alternative methods to achieve what I'm after.
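For what it's worth, the expand-then-sample step described above can be sketched in base R alone. The dates, the value column, and the 15-day look-back below are all invented for illustration; in practice zoo's na.locf or na.approx would replace the hand-rolled carry-forward:

```r
# Hypothetical weekday-only series: dates plus a value column.
wd <- seq(as.Date("2023-01-02"), as.Date("2023-12-29"), by = "day")
wd <- wd[as.integer(format(wd, "%u")) <= 5]        # keep Mon-Fri only
x  <- data.frame(date = wd, rate = seq_along(wd) / 100)

# Expand to every calendar day; weekends come back as NA.
full <- merge(data.frame(date = seq(min(wd), max(wd), by = "day")),
              x, all.x = TRUE)

# Last observation carried forward (what zoo::na.locf does).
obs <- full$rate[!is.na(full$rate)]
full$rate <- obs[cumsum(!is.na(full$rate))]

# Sample the daily value a fixed lead before each bi-annual setting date.
setting  <- as.Date(c("2023-05-01", "2023-11-01"))
x.before <- full$rate[match(setting - 15, full$date)]
```

Looping the lead from 1 to 30 days and refitting a model of "y" on x.before would give the automated R-squared search described above.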

Dealing with NA in seasonal cycle analysis in R

I have a time series of monthly data with many missing data points, set to NA. I want to simply subtract the annual cycle from the data, ignoring the missing entries. It seems the decompose function can't handle missing data points, and although I have seen the seasonal package suggested elsewhere, I am running into problems with the NAs there too.
Here is a minimal reproducible example of the problem, using a built-in dataset:
library(seasonal)
# set a range of values to NA in the co2 dataset
c2 <- co2
c2[c2 > 330 & c2 < 350] <- NA
seas(c2, na.action = na.omit)
Error in na.omit.ts(x) : time series contains internal NAs
Yes, I know! That's why I asked you to omit them! Let's try this:
seas(c2, na.action = na.x13)
Error: X-13 run failed
Errors:
- Adding MV1981.Apr exceeds the number of regression effects
allowed in the model (80).
Hmmm, interesting; no idea what that means. Okay, please just exclude the NAs:
seas(c2, na.action = na.exclude)
Error in na.omit.ts(x) : time series contains internal NAs
That didn't help much! And, for good measure:
decompose(c2)
Error in na.omit.ts(x) : time series contains internal NAs
I'm on the following:
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Why is leaving out NA such a problem? I'm obviously being completely stupid, but I can't see what I'm doing wrong with the seas function. Happy to consider an alternative solution using xts.
My first solution: manually calculate the seasonal cycle, convert to a data frame to subtract the vector, and then transform back.
# seasonal cycle
scycle <- tapply(c2, cycle(c2), mean, na.rm = TRUE)
# convert to a data frame (years x months)
df <- tapply(c2, list(year = floor(time(c2)), month = cycle(c2)), c)
# subtract the seasonal cycle row by row
for (i in 1:nrow(df)) df[i, ] <- df[i, ] - scycle
# convert back to a time series
anomco2 <- ts(c(t(df)), start = start(c2), freq = 12)
Not very pretty, and not very efficient either.
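For the record, the loop and the reshape can be avoided entirely: since cycle() returns the month index of every observation, the vector of seasonal means can be used as a lookup table. A sketch, starting from the same masked c2 as above:

```r
c2 <- co2
c2[c2 > 330 & c2 < 350] <- NA
scycle <- tapply(c2, cycle(c2), mean, na.rm = TRUE)  # 12 monthly means
# Index the 12 means by each observation's month (1..12); subtracting
# a plain vector from a ts keeps the ts attributes, so no reshaping.
anom <- c2 - scycle[cycle(c2)]
```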
The comment from missuse led me to a near-duplicate question I had missed, Seasonal decompose of monthly data including NA in r, which suggested the zoo package; it seems to work really well for additive series:
library(zoo)
c2 <- co2
c2[c2 > 330 & c2 < 350] <- NA
d <- decompose(na.StructTS(c2))
plot(co2)
lines(d$x, col = "red")
This shows that the series is reconstructed very well through the missing period.
The output of decompose has the trend and seasonal cycle available. I wish I could transfer my bounty to user https://stackoverflow.com/users/516548/g-grothendieck for this helpful response. Thanks to user missuse too.
However, if the missing portion is at the end of the series, the software has to extrapolate the trend and has more difficulty. The original series (in black) maintains the trend, while the trend is smaller in the reconstructed series (red):
c2 <- co2
c2[c2 > 350] <- NA
d <- decompose(na.StructTS(c2))
plot(co2)
lines(d$x, col = "red")
Lastly, if instead the missing portion is at the start of the series, the software is unable to extrapolate backwards in time and throws an error... I feel another SO question coming on...
c2 <- co2
c2[c2 < 330] <- NA
d <- decompose(na.StructTS(c2))
Error in StructTS(y) :
the first value of the time series must not be missing
You could just use an algorithm that fills in the missing data beforehand (e.g. from the imputeTS or zoo packages).
imputeTS, for example, has dedicated imputation algorithms for seasonal time series, e.g.:
x <- na_seadec(co2)
Another good option for seasonal data:
x <- na_kalman(co2)
And now just go on without the missing data.
An important hint from Adrian Tompkins (see also the comment below):
This works best when the missing data is somewhere in the middle. For a long run of leading NAs these methods are not a good choice: they fill in the NAs, but they cannot extrapolate the trend backwards:
c2 <- co2
c2[c2 < 330] <- NA
c3 <- na_kalman(c2)
c4 <- na_seadec(c2)
plot(co2)
lines(c3, col = "blue")
lines(c4, col = "red")

How do I change a data frame into a time series?

I have daily rainfall data for 36 years. I want to analyze the time series, but my data is still in the form of a data frame. How do I change the data frame into a time series? The year, month, and day are separate variables; how do I combine them so the date is in a single column?
You could use a time series package for that, such as fpp (install.packages('fpp')). Since you don't give example code I can't help with the specifics, but it's quite easy:
ts(your_data, start = , frequency = )
At start = you put the year (or year and period) where the series begins; frequency = is the number of observations per seasonal cycle, so for daily data with a yearly cycle it would be about 365 (note it is not the number of years in the record).
You might want to check out https://robjhyndman.com/. He has an online (free) book available that walks you through the use of his package as well as providing useful information with respect to time series analysis.
Hope this helps.
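To make that concrete, here is a minimal sketch (column names invented) that glues separate year/month/day columns into a single Date column and then builds the ts object:

```r
# Hypothetical data frame with separate year/month/day columns.
df <- data.frame(year = 1980, month = 1, day = 1:31,
                 rain = round(runif(31), 1))
# Combine the three columns into one Date column.
df$date <- as.Date(sprintf("%04d-%02d-%02d", df$year, df$month, df$day))
# Daily series: frequency is observations per yearly cycle (~365),
# not the number of years in the record.
rain.ts <- ts(df$rain, start = c(1980, 1), frequency = 365)
```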

Cutting Time/Date data at various intervals

I'm using R to analyze 365 days of data collected on over 40,000 events. The events occur at various times of the day. I wish to aggregate the events and calculate means at various intervals, such as 2, 8, or 12 hours, or daily. I've seen cut and aggregate used, but they do not appear to provide the intervals as required.
Any suggestions would be greatly appreciated.
To use the cut function one must first define the break points; to do that, use the seq function.
mydateseq <- seq(as.POSIXct("2016-01-01"), by = "2 hour", length.out = 20)
There are options to set the start/stop points or the number of elements. In this example the breaks are set every 2 hours, but this is adjustable; see ?seq.POSIXt for more help. Be sure to set the start/stop points to completely capture the date range of interest.
Once the date sequence is defined, it can be passed to the cut function for aggregation, or used with the group_by function in the dplyr package.
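Putting the two steps together on made-up event data (the timestamps and values below are invented for the sketch), cut() bins each event into its 2-hour interval and aggregate() takes the mean per bin:

```r
set.seed(1)
# Invented events: 100 random timestamps in one day, each with a value.
events <- data.frame(
  when  = as.POSIXct("2016-01-01", tz = "UTC") + runif(100, 0, 86399),
  value = rnorm(100)
)
# 13 break points give 12 two-hour bins covering the day.
breaks <- seq(as.POSIXct("2016-01-01", tz = "UTC"),
              by = "2 hours", length.out = 13)
events$bin <- cut(events$when, breaks = breaks)
bin.means  <- aggregate(value ~ bin, data = events, FUN = mean)
```

Changing by = "2 hours" to "8 hours", "12 hours", or "1 day" (and adjusting length.out) gives the other intervals.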

Getting Date to Add Correctly

I have a 3000 x 1000 matrix time-series database going back 14 years that is updated every three months. I am forecasting out 9 months using this data, still keeping roughly a 3200 x 1100 matrix (mind you, these are rough numbers).
During the forecasting process I need the variables Year and Month to be calculated appropriately. I am trying to automate the process so I don't have to touch the code any more; I can just run it every three months and upload the projections into our database.
Below is the code I am using right now. As I said above, I do not want to have to look at the data or the code, just run the code every three months. Right now everything else is working as planned, but I still have to ensure the dates are appropriately annotated. The foo variables are renamed for privacy purposes due to the nature of their names.
projection <- rbind(projection, data.frame(foo = forbar, bar = barfoo,
                                           Year = 2012, Month = 1:9,
                                           Foo = as.vector(fc$mean)))
I'm not sure exactly where the year/months are coming from, but if you want them to follow the current date, here is an option (using the wonderful lubridate package):
library(lubridate)
today = Sys.Date()
projection <- rbind(projection, data.frame(foo = foobar, bar = barfoo,
    # %m+% avoids NA when month arithmetic lands on an invalid date
    # (e.g. Jan 31 + 1 month); computing the year per month lets the
    # projections roll over a year boundary correctly
    year  = sapply(1:9, function(x) year(today %m+% months(x))),
    month = sapply(1:9, function(x) month(today %m+% months(x))),
    Foo   = as.vector(fc$mean)))
I hope this is what you're looking for.
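If pulling in lubridate is not an option, the same year/month pairs can be derived in base R. The sketch below (variable names invented) anchors on the first day of the following month, so the month arithmetic never produces invalid dates:

```r
# First day of the month after the current one.
first.next <- seq(as.Date(format(Sys.Date(), "%Y-%m-01")),
                  by = "month", length.out = 2)[2]
# The 9 projected month starts; format() extracts year and month,
# rolling over year boundaries automatically.
proj <- seq(first.next, by = "month", length.out = 9)
proj.year  <- as.integer(format(proj, "%Y"))
proj.month <- as.integer(format(proj, "%m"))
```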
