Rolling forecast origin cross-validation in R?

Probably a dumb post but here goes:
As someone who has done some econometrics and ML such as random forests and XGBoost, I always make sure to use k-fold cross-validation and/or a train/test split (using caret), but I have a question about implementing rolling forecast origin cross-validation for forecasting models built with the ets() function (and arima).
My textbook used the tsCV() function when showing some basic forecasts such as the seasonal naive method, but when moving on to ETS models fitted with ets(), the author just called the function with model parameters such as "AAdM" and then used forecast() to produce a forecast for 8 periods. I did not see any splitting of the data.
Does the ets() function do this automatically, or was the example probably just constructed to show the elements of the ETS model?
If it's the latter, then I should split the data, for instance with rsample. But then the question is: how do I implement rolling CV when forecasting, say, the next 12 periods? A simple train/test split is easy, but I am not familiar with rolling forecast origins, and the book only briefly mentions cross-validation.
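
To make the rolling origin concrete, here is a minimal sketch using tsCV() from the forecast package, assuming y is your ts object. Note that ets() takes a three-letter model string with damping requested separately, so the book's ETS(A,Ad,M) becomes model = "AAM" plus damped = TRUE (and that combination needs restrict = FALSE, since ets() disallows it by default):

library(forecast)

# Wrapper so tsCV() can refit the ETS(A,Ad,M) model at each forecast origin.
fets <- function(x, h) {
  forecast(ets(x, model = "AAM", damped = TRUE, restrict = FALSE), h = h)
}

# e[i, j] is the j-step-ahead forecast error made from origin i;
# initial = 36 skips origins with fewer than 36 observations.
e <- tsCV(y, fets, h = 12, initial = 36)

sqrt(colMeans(e^2, na.rm = TRUE))  # RMSE at each horizon 1..12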

Related

When predicting using R ARIMA object, how to declare the time series' history?

Suppose I fit an AR(p) model using the R arima() function from the stats package, on a sample x_1, ..., x_n. In theory, when predicting x_{n+1}, this model needs access to x_n, ..., x_{n-p+1}.
How does the model know which observation I want to predict? What if I actually wanted to predict x_n based on x_{n-1}, ..., x_{n-p}, and how would my code differ in that case? Can I make in-sample forecasts, similar to Python's functionality?
If my questions imply that I am thinking about forecasting in the wrong way, please kindly correct my understanding of the subject.
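
A sketch of how this plays out with stats::arima(), using a simulated AR(2) as a stand-in example:

set.seed(1)
x   <- arima.sim(model = list(ar = c(0.6, -0.2)), n = 200)
fit <- arima(x, order = c(2, 0, 0))

# predict() always starts from the end of the fitted sample, so this is
# the forecast of x_{201} based on x_{200} and x_{199}:
predict(fit, n.ahead = 1)

# In-sample one-step-ahead predictions: arima() residuals are the
# one-step innovations, so the implied predictions are
insample <- x - residuals(fit)

# To predict x_{200} from x_{199}, ... explicitly, refit on the shorter
# sample (or reuse the coefficients via the fixed= argument) and
# forecast one step ahead:
fit2 <- arima(x[1:199], order = c(2, 0, 0))
predict(fit2, n.ahead = 1)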

Ensemble machine learning model with NNETAR and BRNN

I used the forecast package to forecast a daily time series of a variable Y using its lagged values and a time series of an external parameter X. I found the nnetar model (a NARX model) was the best in terms of overall performance. However, I was not able to predict the peaks of the time series well, despite various attempts at parameter tuning.
I then extracted the peak values of Y above a threshold (of course this is not a regular time series anymore) and the corresponding X values, and tried to fit a regression model (note: not an autoregression model) using various models in the caret package. I found that the prediction of peak values using the brnn (Bayesian regularized neural network) model, using only the X values, is better than that of nnetar, which uses both lagged values and X values.
Now my question is: how do I go from here to create an ensemble of these two models (i.e. whenever the prediction from the brnn regression model, or any other regression model, is better, I want it to replace the nnetar prediction and move forward; I am mostly concerned about the peaks)? Is this a commonly used approach?
Instead of trying to pick the one model that will be superior at any given time, it is typically better to average the models, in order to include as many individual views as possible.
In the experiments I have been involved in, where we tried to pick one model to outperform based on historical performance, a simple average typically proved as good or better. This is in line with the typical results on this problem: https://otexts.com/fpp2/combinations.html
So, before you go more advanced by trying to pick a specific model based on previous performance, or by using a weighted average, consider a simple average of the two models.
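
As a rough sketch of that suggestion (the names y, xreg_hist, xreg_future and reg_pred are hypothetical placeholders for your series, the historical and future X values, and the brnn regression predictions over the same horizon):

library(forecast)

fit_nn <- nnetar(y, xreg = xreg_hist)           # the NARX model from the question
fc_nn  <- forecast(fit_nn, xreg = xreg_future)  # forecasts over the future X values

# Equal-weight combination of the two models' predictions:
ens <- (as.numeric(fc_nn$mean) + reg_pred) / 2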
If you want to continue with some sort of selection/weighted averaging, have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I have not tried that specific package (yet), but I have seen promising results in my tests using the original m4metalearning package.

Which function should I use to estimate a specific ARIMA model in R?

I have mydata.ts, a time series of around 200 observations. I ran stationarity tests, took differences and examined the ACF and PACF, so I decided to try, for instance, ARIMA(1,1,1)(0,1,1).
Which R function should I use to find fitted values and forecasts: Arima, arima or auto.arima?
And can I trust the MAPE, MAD and other error measures in summary(model)? I read an answer saying those results are not the real errors but approximations, or something along those lines.
auto.arima() will find the whole model specification that is 'best' according to the AIC or BIC.
If you already know the order, e.g. (1,1,1)(0,1,1), then use Arima() from the forecast package (the same as arima(), but a little more general).
Arima(your_data, order = c(1,1,1)) will give the basic answer.
The actual out-of-sample forecasts can then be produced with the forecast() function; see its documentation.
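
A minimal sketch of that workflow, assuming mydata.ts is a monthly series (frequency 12), so the seasonal (0,1,1) part applies at lag 12:

library(forecast)

fit <- Arima(mydata.ts, order = c(1, 1, 1), seasonal = c(0, 1, 1))
summary(fit)   # coefficients plus in-sample accuracy measures
fitted(fit)    # in-sample fitted values

fc <- forecast(fit, h = 12)  # genuine out-of-sample forecasts, 12 periods ahead
plot(fc)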

Simulating a basic SARIMA model in R

I'm looking for something like
arima.sim()
but for SARIMA models. I've looked at simulate.Arima() in the forecast package, but it seems to require a model object fitted with Arima(), which I don't want to do. I also looked at the gsarima library, but it seems to be able to simulate only seasonal AR models. Is there any way to simulate a SARIMA model if you only want to provide the following information:
The values of all non-seasonal and seasonal ARMA coefficients at their respective lags.
The number of differences for both the seasonal and non-seasonal integrated terms.
The length of a season.
The number of observations I want to simulate.
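
I am not aware of a single built-in for this, but here is a sketch of one workaround: expand the seasonal and non-seasonal polynomials into one long ARMA, simulate that with arima.sim(), then undo the differencing with diffinv(). The coefficient values below are arbitrary examples for a SARIMA(1,1,1)(1,1,1)[12]:

s   <- 12                      # season length
phi <- 0.5;  theta <- 0.4      # non-seasonal AR / MA coefficients
Phi <- -0.3; Theta <- 0.2      # seasonal AR / MA coefficients

# Multiply (1 - phi*B)(1 - Phi*B^s) and (1 + theta*B)(1 + Theta*B^s) into
# single polynomials, written in arima.sim()'s coefficient convention:
ar <- numeric(s + 1); ar[1] <- phi;   ar[s] <- Phi;   ar[s + 1] <- -phi * Phi
ma <- numeric(s + 1); ma[1] <- theta; ma[s] <- Theta; ma[s + 1] <- theta * Theta

# Simulate the stationary ARMA part, then integrate: one seasonal
# difference (D = 1) followed by one regular difference (d = 1).
w <- arima.sim(model = list(ar = ar, ma = ma), n = 200)
x <- ts(diffinv(diffinv(w, lag = s), lag = 1), frequency = s)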

On the issue of automatic time series fitting using R

We have to fit about 2000-odd time series every month.
They have very idiosyncratic behaviour; in particular, some are ARMA/ARIMA, some are EWMA, some are ARCH/GARCH, with or without seasonality and/or trend (the only thing they have in common is the time-series aspect).
One can in theory build an ensemble with the AIC or BIC criterion to choose the best-fitting model, but is the community aware of any library that attempts to solve this problem?
Google made me aware of the one below by Rob J Hyndman
link
but are there any other alternatives?
There are two automatic methods in the forecast package: auto.arima() which will handle automatic modelling using ARIMA models, and ets() which will automatically select the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
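
For the batch setting in the question, a minimal sketch (series_list is a hypothetical list of ts objects; a holdout comparison is used here because AIC values from ARIMA and ETS models are not directly comparable):

library(forecast)

pick_and_forecast <- function(y, h = 12) {
  # Hold out the last h observations to compare the two automatic methods.
  train <- window(y, end = time(y)[length(y) - h])
  test  <- window(y, start = time(y)[length(y) - h + 1])
  fc_aa  <- forecast(auto.arima(train), h = h)
  fc_ets <- forecast(ets(train), h = h)
  winner <- if (accuracy(fc_aa, test)["Test set", "RMSE"] <
                accuracy(fc_ets, test)["Test set", "RMSE"]) auto.arima else ets
  forecast(winner(y), h = h)  # refit the winner on the full series
}

forecasts <- lapply(series_list, pick_and_forecast)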
Further to your question:
"When will it be possible to use forecast package functions, especially the ets function, with high-dimensional data (weekly data, for example)?"
Probably early next year. The paper is written (see robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
Thanks, useRs. I have tried the forecast package, including as a composite of ARIMA and ETS, but not to much acclaim from the AIC or BIC (SBC), so I am now tempted to treat each of the time series to its own SVM (support vector machine), because of its better generalization adaptability and its ability to include other variables besides lags, as well as non-linear kernel functions.
Any premonitions?
