I have a time series data set which consists of 374 days of data points (1 data point for each day). I am struggling to understand the frequency parameter in the ts function, so I left it empty:
ts_0615391206 <- ts(demand_rev_0615391206$estimated_demand,
start=as.Date(min(demand_rev_0615391206$date),format = "%d/%m/%Y"),
end=as.Date(max(demand_rev_0615391206$date),format = "%d/%m/%Y"),
#frequency = 1
)
plot.ts(ts_0615391206)
However, when I try to decompose it using:
ts_0615391206_components <- decompose(ts_0615391206)
I am getting the error:
Error in decompose(ts_0615391206) :
time series has no or less than 2 periods
How do I decide how many periods there are in my data, and consequently what the value of the "frequency" parameter should be?
> dput(head(ts_0615391206))
c(2.71, 2.47, 3.86, 3.61, 5.78, 5.59)
>
> str(ts_0615391206)
Time-Series [1:194] from 16125 to 16318: 2.71 2.47 3.86 3.61 5.78 5.59 3.28 3.4 3.34 3.68 ...
ts_06153912061 <- ts(ts_0615391206, start = c(1999,1), frequency = 365) # for daily
ts_06153912061 <- ts(ts_0615391206, start = c(1999,1), frequency = 52) # for weekly
ts_06153912061 <- ts(ts_0615391206, start = c(1999,1), frequency = 12) # for monthly
ts_06153912061 <- ts(ts_0615391206, start = c(1999,1), frequency = 4) # for quarterly
ts_06153912061 <- ts(ts_0615391206, start = c(1999,1), frequency = 1) # for yearly (annual)
Per the documentation ?ts:
...one could use a value of 7 for frequency when the data are sampled
daily, and the natural time period is a week, or 12 when the data are
sampled monthly and the natural time period is a year. Values of 4 and
12 are assumed in (e.g.) print methods to imply a quarterly and
monthly series respectively.
Try setting frequency = 7.
The decompose() function from the stats R package decomposes a given time series into a trend, a seasonal component and a remainder. The seasonal component is a strictly periodic time series, and the length of its period equals the frequency of the time series. For example, if you set frequency = m, decompose() builds a decomposition whose seasonal component has period m.
The decompose() function works if m is an integer, m > 1, and the length of the time series is greater than or equal to 2m.
The help page ?decompose notes that the time series should cover "an integer number of complete periods" for the function to work well. So it may be better if the series length is a multiple of m.
There is no clear periodicity in your data. Maybe this discussion will be useful for you, because it contains Rob Hyndman's R script to reveal periodicity in a series.
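As a minimal sketch of the length conditions described in this answer, the following uses simulated data (not the asker's series) with an assumed weekly pattern. It shows decompose() running with frequency = 7 and captures the error message raised when the frequency is 1 or when fewer than two periods are available.

```r
# Simulated stand-in for 374 daily observations (assumption: weekly pattern)
set.seed(1)
n <- 374
x <- 5 + sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 0.3)

m <- 7                                  # candidate period: one week
x_weekly <- ts(x, frequency = m)

# decompose() needs m > 1 and at least 2 * m observations
stopifnot(m > 1, length(x_weekly) >= 2 * m)
parts <- decompose(x_weekly)
print(round(parts$figure, 2))           # one seasonal effect per weekday

# frequency = 1 (the ts() default) triggers the error from the question
err_freq1 <- tryCatch(decompose(ts(x)),
                      error = function(e) conditionMessage(e))

# frequency = 365 also fails here: 374 observations are fewer than 2 * 365
err_freq365 <- tryCatch(decompose(ts(x, frequency = 365)),
                        error = function(e) conditionMessage(e))
print(err_freq1)
print(err_freq365)
```

Whether 7 is the right period depends on the data; the sketch only demonstrates which combinations of frequency and length decompose() accepts.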
If you have daily data and the natural period is a year, then the frequency is 365, because by the definition of frequency each yearly period contains 365 observations.
When you try to decompose using:
ts_0615391206_components <- decompose(ts_0615391206)
You are getting the error:
Error in decompose(ts_0615391206) :
time series has no or less than 2 periods
This is because decompose() needs at least two full periods of data (with frequency = 365, that means two years of data);
otherwise it throws this error.
I am looking into ambient air pollution within regions of NSW and conducting a daily time series decomposition analysis using Rbeast to investigate if there is a change point signature around the time of Covid-19 lockdowns.
I have written a loop to analyse the data for each pollutant within each region. However, the BEAST x-axis is plotting strangely: instead of dates (e.g. 01-01-2021, or ideally the years 2012-2022) it shows Time = 16000, 17000, 18000, etc.
Anyone know how to fix this?
beast_output = list()
target_pollutants = c("PM10", "OZONE", "NO", "NO2")
target_sites = c("WOLLONGONG", "MUSWELLBROOK", "SINGLETON", "CAMBERWELL", "WAGGAWAGGANORTH", "RICHMOND", "CAMDEN", "CHULLORA", "EARLWOOD", "WALLSEND", "BERESFIELD", "BARGO", "BRINGELLY", "PROSPECT", "STMARYS", "OAKDALE", "RANDWICK", "ROZELLE", "NEWCASTLE", "KEMBLAGRANGE", "ALBIONPARKSOUTH")
for (poll in target_pollutants) {
beast_output[[poll]] = list()
df = time_by_poll[[poll]] # grab the target df
sites = colnames(df)
sites = sites[sites != "Date"] # drop the Date column from the list of site names
for (site in sites) {
ts = ts(df[[site]], start=min(df$Date), end=max(df$Date))
beast_results = beast(ts)
# plot(beastie_resulty)
beast_output[[poll]][[site]] = beast_results
}
}
plot(beast_output[["OZONE"]][["RANDWICK"]])
Thanks for asking and sorry about the issue. Indeed, the API interface in Rbeast is kinda confusing because it was originally coded to handle satellite time series.
Regardless, the BEAST model in the package was formulated only for regular time series. (By regular, I mean equally spaced time series with the same number of data points per period.) Because leap years have 366 days but other years have 365 days, daily time series are treated in BEAST as irregular time series if the periodicity is one year. However, if the periodic variation is weekly (7 days), daily time series are considered regular. To handle irregular time series, I implemented the beast.irreg function, which accepts irregular inputs and aggregates them into regular time series before doing the decomposition and changepoint detection.
To illustrate, I got a sample PM10 dataset for several regions (e.g., "WOLLONGONG" and "MUSWELLBROOK") from this site https://www.dpie.nsw.gov.au/air-quality/air-quality-data-services/data-download-facility, and I posted the CSV file (as well as another dataset on ozone) under https://github.com/zhaokg/Rbeast/tree/master/R/SampleData. You can directly read the files from R as shown below:
df = read.csv('https://github.com/zhaokg/Rbeast/raw/master/R/SampleData/pm10_1658112168.csv',header=FALSE,skip=3)
dates = as.Date(df[,1], "%d/%m/%Y") # the 1st col is dates
Y = df[,2:ncol(df)] # the rest are PM10 data for the several sample sites
# e.g., Y[,1] for the first region (WOLLONGONG)
library(Rbeast)
o = beast.irreg(log(Y[,1]),time=dates,deltat=1/12, freq=12, tseg.min=3, sseg.min=6)
# log(Y[,1]) : Log-transformation may help if data is skewed because the BEAST model
#              assumes Gaussian errors;
# time=dates : Use the 'time' arg to supply the times of individual data points.
#              Alternatively, the `beast123' function also handles date strings of different formats
# deltat=1/12: Aggregate the daily time series into a regular one at the interval
#              of 1 month=1/12 year
# freq=12    : The period is 1 year, which is 12 data points (1.0/deltat=12)
# tseg.min   : The minimum trend segment length allowed in the changepoint detection is 3 data points (3 months)
#              -- the results MAY be sensitive to this parameter
# sseg.min   : The minimum seasonal segment length allowed in the changepoint detection is 6 data points (6 months)
#              -- the results MAY be sensitive to this parameter
plot(o)
# For sure, the daily time series can be re-sampled/aggregated to a different time interval
# Below, it is aggregated into a half monthly time series (dT=1/24 year), and the number
# of data points per period is freq=1.0 year/dT=24
o = beast.irreg( log(Y[,1]),time=dates, deltat = 1/24, freq=24, tseg.min=12, sseg.min=12)
plot(o)
# Aggregated to a weekly time series (e.g., dT=7 / 365 year: the unit again is year),
# and freq=1 year/ dT = 365/7.
# tcp.minmax=c(0,15) : the min and max numbers of changepoints allowed in the trend component
o = beast.irreg( log(Y[,1]),time=dates,deltat=7/365,freq=365/7, tcp.minmax=c(0,15),tseg.min=5, sseg.min=20,ocp=10)
plot(o)
# Finally if you want to run on the daily interval. Specify the dT=deltat=1/365 year, and
# freq = period/dT= 1.0 year/(1/365)year =365. Bcz the raw data is daily,
# the majority of the raw data is kept intact during the aggregation except when
# there is a leap year (the last two days of the leap year are merged into a single day)
o = beast.irreg( log(Y[,1]),time=dates, deltat = 1/365, freq=365/1, tseg.min=30, sseg.min=180)
plot(o)
By default, a time series is decomposed as Y = season + trend + error, but for your dataset in the original scale (i.e., not log-transformed), there could be some spikes. One way to model this is to add an extra spike/outlier component: Y = season + trend + outlier/spike + error
# Use 'ocp' to specify the maximum number of spikes (either upward or downward) to be allowed in the outlier/spike component
o = beast.irreg(Y[,1],time=dates, deltat = 1/365, freq=365/1, tseg.min=30, sseg.min=180, ocp=10)
plot(o)
Below is an example for one time series analyzed at the weekly interval (Again, the exact results vary, depending on the choices of tseg.min or sseg.min).
More importantly, another issue I noticed from your figure is that your data seem to have lots of missing values, which should be assigned NA but appear as zeros in your figure. If that is the case, the analysis results will certainly be wrong. BEAST can handle missing data, and these missing values should be given as NA or NaN (e.g., Y[Y==0]=NA).
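To make the two preprocessing ideas in this answer concrete (marking placeholder zeros as NA, and aggregating a daily series to a regular monthly one), here is a small base R sketch on simulated data. It is not Rbeast's internal implementation; the variable names and the choice of a monthly mean are assumptions made only for illustration.

```r
# Simulated daily series covering three years, one of them a leap year
set.seed(42)
dates <- seq(as.Date("2019-01-01"), as.Date("2021-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
y     <- 20 + 5 * sin(2 * pi * doy / 365.25) + rnorm(length(dates))

# Pretend that 50 missing readings were stored as zeros
y[sample(length(y), 50)] <- 0

# Mark the placeholder zeros as missing before any analysis
y[y == 0] <- NA

# Aggregate to monthly means: every year then has exactly 12 points,
# so the result is a regular series even though 2020 has 366 days
month_id   <- format(dates, "%Y-%m")
monthly    <- tapply(y, month_id, mean, na.rm = TRUE)
monthly_ts <- ts(as.numeric(monthly), start = c(2019, 1), frequency = 12)

print(monthly_ts)
```

The time axis of monthly_ts is in years (2019, 2020, ...) rather than in days since 1970-01-01, which is where values such as 16000 come from when Date objects are passed to ts() as start and end.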
I am looking to forecast my time series. I have daily data for the period 2021-Jan-1 to 2022-Jul-1,
so I have a column with one observation for each day.
What I have tried so far:
d1=zoo(data, seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1))
tsdata <- ts(d1, frequency = 365)
ddata <- decompose(tsdata, "multiplicative")
I get following error here:
Error in decompose(tsdata, "multiplicative") :
time series has no or less than 2 periods
From what I have read, it seems this is because I do not have two full years. Is that correct? I have tried doing it weekly as well:
series <- ts(data, frequency = 52, start = c(2021, 1))
getting the same issue.
How do I go about it without having to extend my dataset to two years since I do not have that, and still being able to decompose it?
Plus, when I actually try to forecast it, the forecast isn't good enough:
[Image: plot with forecast]
My data somewhat resembles a bell curve during that period, so is there a better-fitting time series model I can apply instead?
A weekly frequency for daily data should have frequency = 7, not 52. It's possible that this fix to your code will produce a model with a seasonal term.
I don't think you'll be able to produce a time series model with annual seasonality with less than 2 years of data.
You can either produce a model with only weekly seasonality (I expect this is what most folks would recommend), or if you truly believe in the annual seasonal pattern exhibited in your data, your "forecast" can be a seasonal naive forecast that is simply last year's value for that particular day. I wouldn't recommend this, because it just seems risky, and I don't really see the same trajectory in your screenshot over 2022 that's apparent in 2021.
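The seasonal naive idea mentioned in this answer can be sketched in a few lines of base R. The helper name snaive_forecast and the simulated data are hypothetical; the function simply repeats the last observed cycle of length m, so m = 7 gives "same weekday last week" and m = 365 would give "same day last year" (which requires at least one full year of history).

```r
# Simulated daily data over the same date range as in the question
set.seed(7)
s <- seq(as.Date("2021-01-01"), as.Date("2022-07-01"), by = 1)
y <- 10 + rep(c(0, 1, 2, 3, 2, 1, 0), length.out = length(s)) +
  rnorm(length(s), sd = 0.2)

# Hypothetical helper: forecast h steps ahead by repeating the last m values
snaive_forecast <- function(y, m, h) {
  stopifnot(length(y) >= m)
  last_cycle <- tail(y, m)
  rep(last_cycle, length.out = h)
}

fc <- snaive_forecast(y, m = 7, h = 14)   # two weeks ahead, weekly cycle
print(round(fc, 2))
```

This is a baseline rather than a model: it has no trend and no uncertainty estimate, which is part of why the answer treats the annual version as risky.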
decompose requires two full cycles, and a full cycle must represent 1 time unit. The ts class can't use the Date class anyway. To use frequency 7 we must use times 1/7th apart, such as 1, 1+1/7, 1+2/7, etc., so that 1 cycle (7 days) covers 1 unit. Then just label the plot appropriately rather than using those times on the X axis. In the code below, use %Y in place of %y if the years start in 19?? and end in 20??, so that tapply maintains the order.
# test data
set.seed(123)
s <- seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1)
data <- rnorm(length(s))
tsdata <- ts(data, freq = 7)
ddata <- decompose(tsdata, "multiplicative")
plot(ddata, xaxt = "n")
m <- tapply(time(tsdata), format(s, "%y/%m"), head, 1)
axis(1, m, names(m))
I am studying Time Series with R.
Using the stl() command I get the error
Error in stl(avatar_ts, s.window = "periodic") : series is not
periodic or has less than two periods
Below is my code.
avatar_ts = ts(avatar_data$Gross, start = c(2009,12), frequency=365)
st_decompose_method = stl(avatar_ts, s.window="periodic")
If I change the frequency to 12 or 50, the stl() function runs.
However, my dataset has daily observations from 2009-12-18 until 2010-11-18, a total of 318 observations.
(It is basically daily data, but a few weeks are missing, since it is a daily movie profit dataset.)
How can I use the stl() function with frequency of 365 for the time series data?
The unhelpful answer here is that you can use stl() on daily data with frequency 365 only if you have at least two years of observations.
You set s.window to "periodic", but you have less than one year of data, which is insufficient for a period of 365.
See the example with simulated data below.
First, with your number of observations:
x <-rnorm(318)
d <- ts(x, start = c(2009,12), frequency = 365)
st_decompose_method = stl(d, s.window="periodic")
This will give the error:
Error in stl(d, s.window = "periodic") :
series is not periodic or has less than two periods
Now with at least two years of data, so minimum of two periods:
x <-rnorm(731)
d <- ts(x, start = c(2009,12), frequency = 365)
st_decompose_method = stl(d, s.window="periodic")
Which will work.
Note by the way that your ts specification is incorrect as you don't provide the start day.
To do this you could do the following:
days <- seq(as.Date("2009-12-18"), as.Date("2010-11-18"), by = "day")
avatar_ts = ts(avatar_data$Gross,
start = c(2009, as.numeric(format(days[1], "%j"))),
frequency=365)
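As a complementary sketch, and assuming a weekly cycle is plausible for the data (the question does not confirm this), 318 daily observations are enough for stl() when the frequency is 7. The data below are simulated, as in the example above. The last lines show what the "%j" format used for the start day returns.

```r
# 318 simulated observations, same length as the series in the question
set.seed(2009)
x <- rnorm(318)

# With a weekly period there are about 45 full cycles, so stl() can run
d7  <- ts(x, frequency = 7)
fit <- stl(d7, s.window = "periodic")
print(colnames(fit$time.series))

# "%j" gives the day of the year, used as the second element of 'start'
start_day <- as.numeric(format(as.Date("2009-12-18"), "%j"))
print(start_day)
```

With frequency = 7 the time axis counts weeks rather than calendar years, so dates would have to be added to the plot as labels.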
What does this mean? My timeSeries has a frequency of 365, doesn't it? What I'm trying to do is make 3 years of daily forecasts, one day at a time. To put it differently, I'd like to get a forecast for the next day, 365*3 times.
library(forecast)
df = read.csv("./files/all_var_df.csv")
ts = as.timeSeries(df[, c(1, 2)])
train = as.timeSeries(ts[0:3285, ])
validation = ts[3285:4380]
fit_hw <- hw(train)
fit2_hw <- hw(validation, model=fit_hw)
onestep_hw <- fitted(fit2_hw)
Error in hw(train): The time series should
have frequency greater than 1.
Here is some info that might help you answer:
class(train)
> [1] "timeSeries"
head(train, 3)
> 2005-01-01 101634.4
> 2005-01-02 106812.5
> 2005-01-03 119502.8
length(train)
> [1] 3285
Without actually seeing your data I can only speculate. I can, however, recreate this problem using available datasets in R. In the library(fpp2) in R, the dataset ausair contains "Total annual air passengers (in millions) including domestic and international aircraft passengers of air carriers registered in Australia. 1970-2016."
Reading in this dataset as a ts (air <- window(ausair, start = 1990)), we get the following:
Time Series:
Start = 1990
End = 2016
Frequency = 1
[1] 17.55340 21.86010 23.88660 26.92930 26.88850 28.83140 30.07510 30.95350
[9] 30.18570 31.57970 32.57757 33.47740 39.02158 41.38643 41.59655 44.65732
[17] 46.95177 48.72884 51.48843 50.02697 60.64091 63.36031 66.35527 68.19795
[25] 68.12324 69.77935 72.59770
I will now use the hw() function to train with:
fc <- hw(air, seasonal = "additive")
This gives the following error:
Error in hw(air, seasonal = "additive") :
The time series should have frequency greater than 1.
What has happened here is that each data point corresponds to a whole year, so the Holt-Winters method is unable to find seasonality. In the additive Holt-Winters method the seasonal component is updated by the following equation:
$$s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}$$
where the term $s_t$ represents the seasonality, $\ell_t$ the level, $b_t$ the trend, and $m$ the frequency. It doesn't make sense to talk about a repetitive seasonal pattern if there is only 1 data point in a time period.
The fix to your problem is in the way you define your time series object with ts(). One argument of the time series function is frequency. Without seeing your data I can't say what that value should be set to. Here is a site explaining the term frequency. It will be dependent on what seasonality your data exhibits. Does it repeat a seasonal pattern every week? Every quarter? If there is no seasonal pattern, you can switch to the function holt() which only uses an exponential decay term and trend term to find a pattern and will not give your error.
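The effect of the frequency argument can be sketched with base R alone. Note the substitutions: stats::HoltWinters() is used here instead of forecast::hw() so that the example needs no extra packages, the data are simulated with an assumed weekly pattern, and the smoothing parameters are fixed at arbitrary values rather than estimated.

```r
# Simulated daily series with a linear trend and a weekly pattern
set.seed(3)
n      <- 140
weekly <- rep(c(0, 1, 3, 2, 1, 4, 6), length.out = n)
x      <- 100 + 0.2 * seq_len(n) + weekly + rnorm(n)

x1 <- ts(x)                  # default frequency = 1: no seasonal period defined
x7 <- ts(x, frequency = 7)   # 7 observations per period (one week)
print(frequency(x1))
print(frequency(x7))

# Additive Holt-Winters on the weekly-period series;
# alpha, beta and gamma are arbitrary illustrative values
fit <- HoltWinters(x7, alpha = 0.3, beta = 0.1, gamma = 0.2,
                   seasonal = "additive")

# Forecast one full period (7 days) ahead
fc <- predict(fit, n.ahead = 7)
print(fc)
```

For a series held in another class, such as the timeSeries object in the question, the numeric values would first have to be passed to ts() with a suitable frequency.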
I'm trying to decompose my data to see what the trend and seasonality effects are. I have 4 months of data, recorded daily. Data looks like:
date amount
11/1/2000 1700
11/2/2000 11087
11/3/2000 11248
11/4/2000 13336
11/5/2000 18815
11/6/2000 8820
11/7/2000 7687
11/8/2000 5514
11/9/2000 9591
11/10/2000 9676
11/11/2000 14782
11/12/2000 18554
And so forth to the end of Feb 2001. I read in the data like so and generate a timeseries object:
myvector <- read.table("clipboard", sep="\t", header=T)
myts <- ts(myvector$amount, start=c(2000,11), frequency=52)
I'm very confused as to how to read this data in as a time series object. The data is recorded daily, but if I use frequency=365, then try
fit <- stl(myts2, s.window="periodic")
I get:
Error in stl(myts2, s.window = "periodic") :
series is not periodic or has less than two periods
Every example I find does the object casting with multiple years worth of data. Is this not possible in my case?
I know the next steps for plotting the trend and decomposition are:
fit <- stl(myts, s.window="periodic")
plot(fit)
Try seasonal differencing, which is similar to regular differencing except that it is applied at the seasonal lag (the number of observations per period).
An example:
data(austres)
plot(austres)
seasonal <- diff(austres, lag = frequency(austres), differences = 1) # austres is quarterly, so the seasonal lag is 4
plot(seasonal)
d.seasonal <- diff(seasonal, differences = 2)
plot(d.seasonal)
Now you've removed the seasonal component by differencing, and the further differencing removes the remaining trend, which should leave the series closer to stationary.
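Applied to the twelve rows shown in the question, the same idea looks like the sketch below. It assumes a weekly cycle (frequency = 7), which the post does not confirm, and it reads the dates as month/day/year because the rows are consecutive days in November 2000.

```r
# The twelve 'amount' values listed in the question
amount <- c(1700, 11087, 11248, 13336, 18815, 8820,
            7687, 5514, 9591, 9676, 14782, 18554)

# Dates are month/day/year: 11/1/2000 is 1 November 2000
first_day <- as.Date("11/1/2000", format = "%m/%d/%Y")
dates     <- seq(first_day, by = "day", length.out = length(amount))

# Assumed weekly period: 7 observations per cycle
myts7 <- ts(amount, frequency = 7)

# Seasonal difference: each day minus the same weekday one week earlier
weekly_diff <- diff(myts7, lag = 7)
print(data.frame(date = tail(dates, length(weekly_diff)),
                 change = as.numeric(weekly_diff)))
```

With the full four months of data (roughly 17 weeks), a series built with frequency = 7 would also contain enough periods for stl(myts7, s.window = "periodic") to run, which frequency = 365 or 52 cannot provide.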