"auto.arima" in SAS? - r

I used to run ARIMA models in R using auto.arima() to identify the best ARIMA model that fits the data. Even without it, it's easy in R to write a function that performs a similar task. However, I have googled for the past few days and I can't find a similar procedure in SAS. Does anyone know if there is an "auto.arima" in SAS? Or do I have to write one myself? Thank you!
Edit:
After days of searching online, the closest thing I can find is Automatic Model Selection in the time series forecasting package. However, that feature is GUI-based, and one still has to manually select all the different models to test. Does anyone know a command-line procedure or package to do this? Thank you.
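For reference, a minimal sketch of the R workflow I am trying to reproduce (the series name is a placeholder):
library(forecast)
y <- ts(myseries, frequency = 12)   # myseries is a placeholder for your data
fit <- auto.arima(y)                # searches (p,d,q)(P,D,Q) orders by AICc
summary(fit)
plot(forecast(fit, h = 12))         # forecast the next 12 periods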

SAS has PROC ARIMA, which is part of the SAS/ETS module (licensed separately). You can use either the Enterprise Guide PROC ARIMA node for a GUI interface to it, or Solutions->Analysis->Time Series Analysis for a Base SAS interface. The Base SAS interface is what I usually use; it has the advantage of comparing many models beyond just ARIMA for a fit.
To check to see if you have the correct license run the following code:
proc setinit;
run;
You should see something like this in the results if you have it licensed:
---SAS/ETS (01 JAN 2020)

SAS HPF (High-Performance Forecasting) is the best on the market for time series forecasting; nothing can beat its accuracy when you are trying to generate forecasts for multiple products.
Run PROC HPFDIAGNOSE followed by PROC HPFENGINE and you will hate auto.arima after using this.

You might want to give PROC FORECAST a try.
I'm working on a similar problem where I have about 6,000 separate time series to forecast, so modeling each one individually is out of the question. You can specify a BY variable in PROC FORECAST that lets you forecast many series at once pretty quickly (it ran my moderately large dataset in less than 3 seconds). And if you choose the STEPAR method, it will fit the best autoregressive model it can find to your data.
Here's a good overview of the FORECAST procedure: http://www.okstate.edu/sas/v8/saspdf/ets/chap12.pdf
Still not as awesome as auto.arima in R, but gets the job done.
Good luck!

SAS has high-performance forecasting procedures (PROC HPFDIAGNOSE + PROC HPFENGINE), which not only select the best ARIMA model, but can also select the best among ARIMA, ESM, UCM, IDM, combination models, external models, etc. You can either let it automatically pick the best model based on the default selection criterion, or customize the criterion. There is a procedure family to customize everything: PROC HPFDIAGNOSE, PROC HPFENGINE, PROC HPFARIMASPEC, etc. If you want to do more flexible time series analysis plus coding, you can also use PROC TIMEDATA with all the built-in time series packages, which lets you program whatever you want while still doing all the automatic modeling.
As mentioned above, it is the best on the market for time series forecasting, and nothing can beat its accuracy when you are trying to generate forecasts for multiple series. However, it is usually licensed with SAS Forecast Server or SAS Forecast Studio, which are enterprise forecasting solutions with a GUI. That's understandable, since other forecasting solutions built on R and Python that can handle automatic parallelization and automatic forecasting also charge money.
For the cloud computing version, there are also PROC TSMODEL and the Visual Forecasting product, which have both forecast-accuracy and computational-performance advantages. However, they are also for enterprise use and pricey. After all, they are targeted at markets that require forecasting for thousands or millions of time series.
For free versions, maybe the closest one would be PROC FORECAST.

Related

How do I deal with monthly time series data of 3900+ regions at once

I am working on a time series model and I am new to this; I have just started learning time series analysis and forecasting.
I know how to deal with monthly data, but now I have a much larger dataset to handle: monthly time series data for 3900+ regions, and I want to predict the values for the next 12 months using R.
My data looks something like this: https://drive.google.com/file/d/10QvtS55NQ1kIXxeccWxXl0SqqyqYXyoh/view?usp=sharing
I know how to do this for one region using an ARIMA model, but I don't know how to handle data this big.
Thanks in advance!
As you are new to the topic, I recommend taking a look at the approach of using global models like xgboost or glmnet instead.
You will struggle to produce scalable results with the "forecast" package or similar local time series approaches using ARIMA, ETS, Prophet and so on.
When local models are complex enough to produce accurate forecasts, they take a lot of time to compute. For example, a prediction with fully tuned local models took about 5 hours for 100 time series (5 years of training data, one year of test), whereas with global models it is a matter of just 3 minutes.
As I am using it myself, I can recommend the modeltime framework, which makes use of the tidymodels stack. A rough sketch of the pooled global-model idea is below.
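To make the global-model idea concrete, here is a minimal sketch (independent of modeltime) that pools every region into one table of lagged values and fits a single xgboost model; the column names, lag depth, and parameters are assumptions, not prescriptions:
library(xgboost)
# df is assumed to be in long format with columns: region, date, value
make_lags <- function(d, lags = 1:12) {
  for (k in lags) d[[paste0("lag", k)]] <- c(rep(NA, k), head(d$value, -k))
  d
}
pooled <- do.call(rbind, lapply(split(df, df$region), make_lags))
pooled <- na.omit(pooled)
X   <- as.matrix(pooled[, grep("^lag", names(pooled))])
fit <- xgboost(data = X, label = pooled$value, nrounds = 200,
               objective = "reg:squarederror", verbose = 0)
To forecast 12 months ahead, predict one step at a time and feed each prediction back in as the most recent lag.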

Can MXNET fit a regression LSTM model in R?

I would like to fit an LSTM model using MXNET in R for the purpose of predicting a continuous response (i.e., regression) given several continuous predictors. However, the mx.lstm() function seems to be geared toward NLP as it requires arguments which don't seem applicable to a regression problem (such as those related to embedding).
Is MXNET capable of this sort of modeling and, if not, what is an example of an appropriate tool (preferably in R)? Are there any tutorials relevant to the problem I've described?
LSTMs are used for working with temporal data: text, speech, time series. If you want to predict a continuous response, then I assume you want to do something similar to time series analysis.
If my assumption is correct, then please take a look here. It gives quite a good example of how to use MXNet with R for time series on a CPU. The GPU version is also available here.
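Beyond the tutorials, it may help to see that the mxnet R package exposes a generic regression output layer. Below is a minimal sketch of a plain feedforward regressor (not an LSTM), just to show the regression plumbing; layer sizes, hyperparameters, and the train.x/train.y names are assumptions:
library(mxnet)
data <- mx.symbol.Variable("data")
fc1  <- mx.symbol.FullyConnected(data, num_hidden = 32)
act1 <- mx.symbol.Activation(fc1, act_type = "relu")
fc2  <- mx.symbol.FullyConnected(act1, num_hidden = 1)
out  <- mx.symbol.LinearRegressionOutput(fc2)    # squared-error regression loss
model <- mx.model.FeedForward.create(
  out, X = train.x, y = train.y,                 # train.x / train.y are placeholders
  ctx = mx.cpu(), num.round = 50,
  array.batch.size = 32, learning.rate = 0.01,
  eval.metric = mx.metric.rmse)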

fourier() vs fourierf() function in R

I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. Looking to verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data, 3, 410), I see the data below. It appears far smoother than the terms produced by the original fourier() function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
trainFourier <- fourier(data, 3)                  # in-sample Fourier terms, K = 3
trainFourier <- as.data.frame(trainFourier)
trainFourier$exogenous <- exogenousData           # other external regressors
arima.object <- auto.arima(data, xreg = as.matrix(trainFourier))  # xreg must be a numeric matrix
futureFourier <- as.data.frame(fourierf(data, 3, 410))  # out-of-sample terms, h = 410
futureFourier$exogenous <- futureExogenousData    # placeholder: future regressor values,
                                                  # since forecast() needs the same xreg columns
fourierForecast <- forecast(arima.object, xreg = as.matrix(futureFourier), h = 410)
and I want to be completely sure that auto.arima's fit (using the terms from fourier()) lines up with what I'll supply as xreg to forecast() (which has terms from a different function, i.e. fourierf()).
Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function that masks the one from forecast. If I detach fda, my S1 term from fourier(data, 3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1, futureFourier$S1)).
Moral of the story: watch what your packages mask, folks!
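A quick way to catch this kind of masking is to check which attached packages define a function of the same name, or to namespace-qualify the call:
find("fourier")                            # lists every attached package defining fourier()
trainFourier <- forecast::fourier(data, 3) # unambiguous, regardless of load order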

Best practices for efficient multiple time series analysis

I have a large number of time series (>100) which differ in sampling frequency and in the time period for which they are available. Each time series has to be tested for unit roots, seasonally adjusted, and put through other preliminary data transformations and checks.
Since a large number of series have to be routinely checked, what is an efficient way to do it? The concern is to save time on the routine aspects and to keep track of the series and the analysis results. Unit root testing, for example, involves some subjectivity. How much of this type of analysis can be automated, and how?
I have already read the questions regarding statistical workflow, which suggest having a common script to run on each series.
I am asking something more specific, based on experience of handling multiple time series datasets. The focus is on minimizing errors while dealing with so many series, and on automating repetitive tasks.
I assume the series will be examined independently, as you've not mentioned any inter-relationships in the models. I'm not sure what kind of object you're looking to use or which tests, but the basic goal of "best practices" is independent of the actual package to be used.
The simplest approaches involve loading the objects into a list and analyzing each series via simple iterators such as lapply, or via multicore methods such as mclapply or foreach, in R. In Matlab, you can operate over cell arrays; the Parallel Computing Toolbox has a function called parfor, for "parallel for", which is similar to the foreach function in R. For my money, I'd recommend using R, as it's cheaper (free) and has much richer functionality for statistical analyses. Matlab has better documentation and help tools, but these tend to matter less over time as you become more familiar with the tools and methods of your research (and as your bookshelf of references grows).
It's good to become accustomed to using multicore tools in general, as this can substantially decrease the time it takes to do analyses on a bunch of independent small objects.
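As a concrete illustration of the list-plus-iterator pattern (series_list is assumed to be a named list of ts objects, and the ADF test from tseries is just one example of a per-series check):
library(parallel)
library(tseries)
check_series <- function(x) {
  x <- na.omit(x)
  list(n     = length(x),
       adf.p = adf.test(x)$p.value,   # unit-root check
       mean  = mean(x))
}
results <- mclapply(series_list, check_series, mc.cores = 4)
Because mclapply preserves the list names, keeping track of which diagnostics belong to which series comes for free.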

On the issue of automatic time series fitting using R

We have to fit about 2,000-odd time series every month.
They have very idiosyncratic behavior; in particular, some are ARMA/ARIMA, some are EWMA, some are ARCH/GARCH, with or without seasonality and/or trend (the only thing they have in common is the time series aspect).
One can in theory build an ensemble and use the AIC or BIC criterion to choose the best-fitting model, but is the community aware of any library which attempts to solve this problem?
Google made me aware of the one below by Rob J Hyndman:
link
But are there any other alternatives?
There are two automatic methods in the forecast package: auto.arima(), which handles automatic modelling using ARIMA models, and ets(), which automatically selects the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models, though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
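A minimal sketch of both automatic fits side by side (y is a placeholder ts object):
library(forecast)
fit.arima <- auto.arima(y)        # automatic ARIMA order selection
fit.ets   <- ets(y)               # automatic exponential smoothing selection
forecast(fit.arima, h = 12)
forecast(fit.ets, h = 12)
Note that AIC values are not directly comparable between the ARIMA and ETS model classes, so to choose between the two fits, compare out-of-sample accuracy on a holdout period instead.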
Further to your question:
When will it be possible to use forecast package functions, especially the ets function, with high dimensional data (weekly data, for example)?
Probably early next year. The paper is written (see robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
Thanks, useRs. I have tried the forecast package, including as a composite of auto.arima and ets, but without much acclaim from AIC or BIC (SBC). So I am now tempted to fit each time series with its own SVM (support vector machine), because of its better generalization adaptability, and because it can take other variables apart from lags as well as nonlinear kernel functions.
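A rough sketch of the SVM-on-lags idea using e1071 (the lag depth of 12 and the radial kernel are arbitrary choices, and y is a placeholder series):
library(e1071)
y.num <- as.numeric(y)                         # y is a placeholder time series
lags  <- embed(y.num, 13)                      # col 1 = y_t, cols 2:13 = lags 1..12
train <- data.frame(y = lags[, 1], lags[, -1])
fit   <- svm(y ~ ., data = train, kernel = "radial")
# one-step-ahead: the feature row is the 12 most recent observations
newx <- as.data.frame(matrix(rev(tail(y.num, 12)), nrow = 1))
names(newx) <- names(train)[-1]
predict(fit, newx)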
Any premonitions?