Forecasting time series with R forecast package


I'm relatively new to R programming, but I've been reading your blogs and posts in order to get up-to-date with the forecast package. However, I have been struggling with the effect of seasonality.
Take for example the simplest signal possible:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)))
If I just try to forecast this signal with brute force, I get irrelevant results:
plot(forecast(train,h=20))
However, if I manually detect the seasonality as 100, and do the following:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)),frequency=100)
plot(forecast(train))
I get excellent forecasting results.
I'm honestly very puzzled by these results, which obviously happen for more complex signals.

If I remember correctly, you have to specify the frequency when you create the time series object; that is how the forecast method knows which seasonal period to look for. Apart from visual exploration, there are other ways to detect seasonality, such as running auto.arima() and checking whether it selects a seasonal model.
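As a minimal sketch of the point above, using base R only (the stats package) rather than the forecast package: the snippet shows that a ts object built without a frequency reports a period of 1, estimates the dominant period from the periodogram, and rebuilds the series with that period. The variable names are illustrative, and the periodogram is just one possible way to find the period, not necessarily what any forecast function does internally.

```r
# Same sine wave as in the question: 10 cycles, 100 observations per cycle.
x <- sin(2 * pi * seq(from = 0, to = 10, by = 0.01))

# Without a frequency, the ts object claims one observation per cycle,
# so seasonal methods have nothing to work with.
plain <- ts(x)
frequency(plain)

# Estimate the dominant period from the raw periodogram.
# fast = FALSE avoids zero-padding, which would shift the frequency grid.
sp <- spec.pgram(plain, taper = 0, fast = FALSE, plot = FALSE)
period <- round(1 / sp$freq[which.max(sp$spec)])

# Rebuild the series with that period as its frequency.
seasonal <- ts(x, frequency = period)
frequency(seasonal)
```

Passing `seasonal` instead of `plain` to a forecasting function is then the same fix as the manual `frequency=100` in the question.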

Related

Multivariate ARIMA (MARIMA) modelling in R

I am currently using the marima package for R, written by Henrik Spliid, to forecast multivariate time series with ARIMA.
Overview can be found here:
https://cran.r-project.org/web/packages/marima/marima.pdf
http://orbit.dtu.dk/files/123996117/marima.anv.talk.pdf
When using the Marima function, it is required to define both the order of AR(p) and MA(q) first.
My question is, how can I determine appropriate values for p and q?
I know when it comes to univariate ARIMA analysis, that auto.arima gives a good suggestion for p and q. However, when I use auto.arima for every single univariate time series I want to analyze, there are (slightly) different suggestions for each time series. (For example (2,2,1) for the first, (1,1,1) for the second and so on)
Since I want to analyze all of the time series combined in the multivariate ARIMA model and I only can choose one value for each p and q (if I understood it correctly), I wonder how I can choose those values the most accurate way.
Could I just try to run the model a couple times and see what values for p and q work best (e.g. by testing the residuals of the forecast)?
What are your suggestions?
I would appreciate any help!

Arima.sim issues in R

I am working on making a prediction in R using time-series models.
I used the auto.arima function to find a model for my dataset (which is a ts object).
fit<-auto.arima(data)
I can then plot the results of the prediction for the 20 following dates using the forecast function:
plot(forecast(fit,h=20))
However I would like to add external variables and I cannot do it using forecast because it is kind of a black box to me as I am new to R.
So I tried to mimic it by using the arima.sim function and a problem arose:
How do I initialize this function?
I got the model by setting model=as.list(coef(fit)) but the other parameters are still obscure to me.
I went through hundreds of pages, including on Stack Overflow, but nobody seems to really know what is going on.
How is it calculated? For example, why must n.start (the burn-in period) have length ma+ar rather than just max(ar,ma)? And what exactly is start.innov?
I thought I understood the case with only an AR part, but I cannot reproduce the results with an AR+MA filter.
My understanding, as far as the AR part is concerned, is that start.innov represents the errors between a filtered zero signal and the true signal. Is that right?
Like if you want to have an ar of order 2 with initial conditions (a1,a2) you need to set
start.innov[1]=a1-ar1*0-ar2*0=a1
start.innov[2]=a2-ar1*start.innov[1]
and innov to rep(0,20). But what should I do with a full ARIMA model? How do I set innov to get exactly the same curves as forecast does?
Thanks for your help!

You seem to be confused between modelling and simulation. You are also wrong about auto.arima().
auto.arima() does allow exogenous variables via the xreg argument. Read the help file. You can include the exogenous variables for future periods using forecast.Arima(). Again, read the help file.
It is not clear at all why you are referring to arima.sim() here. It is for simulating ARIMA processes, not for modelling or forecasting.
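To make the distinction concrete, here is a minimal sketch using base R's arima() and predict(), which take external regressors through xreg and newxreg in much the same spirit as the xreg argument mentioned above. The data, the variable names, and the AR(1) order are all made up for illustration; arima.sim() appears only where it belongs, in generating the toy data.

```r
set.seed(1)
n <- 120  # length of the observed history
h <- 20   # forecast horizon

# Hypothetical external variable, known for the past and the next h periods.
x_all <- rnorm(n + h)
x_past <- x_all[1:n]
x_future <- x_all[(n + 1):(n + h)]

# Simulated response: regression on x plus AR(1) errors.
# This is simulation, so arima.sim() is the right tool here.
y <- ts(2 * x_past + arima.sim(model = list(ar = 0.6), n = n))

# Modelling: fit an AR(1) with the external regressor.
fit <- arima(y, order = c(1, 0, 0), xreg = x_past)

# Forecasting: supply the regressor's future values.
fc <- predict(fit, n.ahead = h, newxreg = x_future)
fc$pred  # point forecasts
fc$se    # their standard errors
```

The forecast needs future values of the external variable, so in practice those have to be known in advance or forecast separately.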

fourier() vs fourierf() function in R

I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. Looking to verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data,3, 410) I see the below data. It appears far more smooth than the terms provided by the original fourier function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
trainFourier<-fourier(data,3)
trainFourier<-as.data.frame(trainFourier)
trainFourier$exogenous<-exogenousData
arima.object<-auto.arima(data, xreg=trainFourier)
futureFourier<-fourierf(data,3, 410)
fourierForecast<-forecast(arima.object, xreg=futureFourier, h=410)
and I want to be completely sure that auto.arima is fitted (using the terms from fourier()) consistently with what I'll pass as xreg to forecast (which has terms from a different function, i.e. fourierf()).

Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function. If I detach fda, my S1 term from fourier(data,3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1,futureFourier$S1))
Moral of the story: watch which functions your packages mask, folks!
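For anyone who wants to check the continuity without relying on either package, here is a rough sketch that builds the sine and cosine terms by hand. fourier_terms() is a hypothetical helper written for this illustration, not the forecast package's implementation, and the in-sample length of 728 is an assumption (two seasons at the period of 364 from the question).

```r
# Hypothetical helper: sine/cosine pairs for time indices `t`,
# seasonal period `period`, and K harmonics.
fourier_terms <- function(t, period, K) {
  terms <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  out <- do.call(cbind, terms)
  colnames(out) <- paste0(c("S", "C"), rep(seq_len(K), each = 2))
  out
}

n <- 728       # assumed in-sample length: two seasons
h <- 410       # forecast horizon from the question
period <- 364

# In-sample terms use time indices 1..n; future terms simply continue at n+1.
in_sample <- fourier_terms(seq_len(n), period, K = 3)
future <- fourier_terms(n + seq_len(h), period, K = 3)

# Plotted end to end, S1 is one unbroken sine wave across the boundary.
ts.plot(c(in_sample[, "S1"], future[, "S1"]))
abline(v = n, lty = 2)
```

When two attached packages export the same function name, writing the call as forecast::fourier(...) makes it explicit which one is meant.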

"auto.arima" in SAS?

I used to run ARIMA models in R, using "auto.arima" to identify the model that best fits the data. Even without it, it's easy in R to write a function to perform a similar task. However, I have googled for the past few days, and I can't find a similar procedure in SAS. Does anyone know if there is an "auto.arima" in SAS? Or do I have to write one myself? Thank you!
Edit:
After days of searching online, the closest one that I can find is Automatic Model Selection in time series forecasting package. However, that function is the one using GUI, and still one has to manually select all the different models to test. Does anyone know a command line procedure or package to do this? Thank you.

SAS has proc arima, which is part of the SAS/ETS module (licensed separately). You can use either the Enterprise Guide proc arima node for a GUI interface to it, or Solutions->Analysis->Time Series Analysis for a base SAS interface. The base SAS interface is what I usually use; it has the advantage of comparing many models other than just ARIMA for a fit.
To check to see if you have the correct license run the following code:
proc setinit;
run;
You should see something like this in the results if you have it licensed:
---SAS/ETS (01 JAN 2020)

SAS HPF (High-Performance Forecasting) is the best on the market for time series forecasting. Nothing can beat its accuracy when you are trying to generate forecasts for multiple products.
Run proc hpfdiagnose followed by proc hpfengine; you will hate auto.arima after using this.

You might want to give PROC FORECAST a try.
I'm working on a similar problem where I have about 6,000 separate time series to forecast so modeling each one individually is out of the question. You can specify a BY variable in PROC FORECAST that lets you forecast many series at once pretty quickly (it ran my moderately large dataset in less than 3 seconds). And if you choose the STEPAR method, it will fit the best autoregressive model it can find to your data.
Here's a good overview of the FORECAST procedure: http://www.okstate.edu/sas/v8/saspdf/ets/chap12.pdf
Still not as awesome as auto.arima in R, but gets the job done.
Good luck!

SAS has high-performance forecasting procedures (PROC HPFDIAGNOSE + PROC HPFENGINE), which not only select the best ARIMA model but can also select the best among ARIMA, ESM, UCM, IDM, combination models, external models, etc. You can either let it automatically pick the best based on the default selection criterion, or customize the selection criterion. There is a procedure family to customize everything: PROC HPFDIAGNOSE, PROC HPFENGINE, PROC ARIMASPEC, etc. If you want to do more flexible time series analysis plus coding, you can also use PROC TIMEDATA with all the built-in time series packages, which allows you to program whatever you want and also do all the automatic modeling.
As mentioned above, it is the best on the market for time series forecasting, and nothing can beat its accuracy when you are trying to generate forecasts for multiple series. However, it is usually licensed with SAS Forecast Server or SAS Forecast Studio, which are enterprise forecasting solutions with a GUI. That is understandable, since other forecasting solutions built on R and Python that can handle automatic parallelization and automatic forecasting also charge money.
For the cloud computing version, there is also PROC TSMODEL and the Visual Forecasting version, which has both forecast accuracy and computation performance advantages. However, it is also for enterprise use and pricey. After all, it is targeted at markets that require forecasting for thousands or millions of time series.
For free versions, maybe the closest one would be PROC FORECAST.

On the issue of automatic time series fitting using R

We have to fit 2000-odd time series every month.
They have very idiosyncratic behavior: some are ARMA/ARIMA, some are EWMA, and some are ARCH/GARCH, with or without seasonality and/or trend (the only thing they have in common is the time series aspect).
One can in theory build an ensemble model and use the AIC or BIC to choose the best-fitting model, but is the community aware of any library that attempts to solve this problem?
Google made me aware of the one below by Rob J Hyndman:
link
But are there any other alternatives?

There are two automatic methods in the forecast package: auto.arima() which will handle automatic modelling using ARIMA models, and ets() which will automatically select the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
Further to your question:
"When will it be possible to use forecast package functions, especially the ets function, with high dimensional data (weekly data, for example)?"
Probably early next year. The paper is written (see robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
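To illustrate the general idea of AIC-based automatic selection applied to many series, here is a rough sketch in base R. pick_arima() is a hypothetical helper that brute-forces a small grid of orders; it is not the search that auto.arima() performs, and the three simulated series merely stand in for the real ones.

```r
# Hypothetical helper: fit a small grid of ARIMA(p, d, q) candidates
# and keep the one with the lowest AIC.
pick_arima <- function(y, max_p = 2, max_q = 2, d = 0) {
  best <- NULL
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      # Skip candidates whose fit fails instead of aborting the whole search.
      fit <- tryCatch(arima(y, order = c(p, d, q), method = "ML"),
                      error = function(e) NULL)
      if (!is.null(fit) && (is.null(best) || AIC(fit) < AIC(best))) {
        best <- fit
      }
    }
  }
  best
}

set.seed(42)
# Three simulated series standing in for the real collection.
series <- list(
  a = arima.sim(model = list(ar = 0.7), n = 200),
  b = arima.sim(model = list(ma = 0.5), n = 200),
  c = arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)
)

# Select a model for every series in one pass.
fits <- lapply(series, pick_arima)
sapply(fits, AIC)
```

The same lapply() pattern should carry over to any per-series fitting function, which is what makes a fully automatic selector practical for thousands of series.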

Thanks, useRs. I have tried the forecast package, including a composite of arima and ets, but without much success as measured by AIC or BIC (SBC). So I am now tempted to give each time series its own SVM (support vector machine), because of its better generalization and adaptability, and because it can take other variables besides lags as well as nonlinear kernel functions.
Any premonitions?
