Multivariate ARIMA (MARIMA) modelling in R - r

I am currently using the marima package for R, written by Henrik Spliid, to forecast multivariate time series with ARIMA.
Overview can be found here:
https://cran.r-project.org/web/packages/marima/marima.pdf
http://orbit.dtu.dk/files/123996117/marima.anv.talk.pdf
The marima() function requires the AR order (p) and the MA order (q) to be specified up front.
My question is, how can I determine appropriate values for p and q?
I know when it comes to univariate ARIMA analysis, that auto.arima gives a good suggestion for p and q. However, when I use auto.arima for every single univariate time series I want to analyze, there are (slightly) different suggestions for each time series. (For example (2,2,1) for the first, (1,1,1) for the second and so on)
Since I want to analyze all of the time series together in one multivariate ARIMA model, and I can only choose a single value each for p and q (if I understood it correctly), I wonder how to choose those values in the most reliable way.
Could I just try to run the model a couple times and see what values for p and q work best (e.g. by testing the residuals of the forecast)?
What are your suggestions?
I would appreciate any help!
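The "try several orders and check the residuals" idea from the question can be sketched with base R alone. This is only a rough stand-in under stated assumptions: it does not call marima at all, the data are simulated, and the names candidate_p, aic_by_order and lb_pvalues are made up for illustration. ar() fits a pure vector AR model, so it can at most suggest a starting value for p; the MA order q would still have to be tried separately in marima.

```r
# Simulated stand-in for the real data: a stationary bivariate VAR(1).
set.seed(1)
n <- 300
A <- matrix(c(0.5, 0.2, -0.3, 0.4), nrow = 2)
x <- matrix(0, nrow = n, ncol = 2)
for (t in 2:n) {
  x[t, ] <- A %*% x[t - 1, ] + rnorm(2)
}
colnames(x) <- c("y1", "y2")

# Fit vector AR models of order 0..5 and let AIC pick one of them.
fit <- ar(x, order.max = 5, aic = TRUE)
candidate_p  <- fit$order  # order with the smallest AIC
aic_by_order <- fit$aic    # AIC differences relative to the best order

# Residual check: Ljung-Box test on each residual series.
# Small p-values would hint at autocorrelation the model has not captured.
res <- na.omit(fit$resid)
lb_pvalues <- apply(res, 2, function(r) {
  Box.test(r, lag = 10, type = "Ljung-Box")$p.value
})

print(candidate_p)
print(round(aic_by_order, 2))
print(round(lb_pvalues, 3))
```

The same loop-and-compare pattern should carry over to marima fits: fit each candidate (p, q), then compare residual diagnostics or out-of-sample forecast errors.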

Related

Classifying M multivariate time-series based on known K classes in R

I have M multivariate time series; by multivariate I mean that each time series is represented by more than one variable that varies in time (see example image for simulated data). All have the same length. I want to build a classifier trained on K known classes (e.g. every time series belongs to class A, B or C).
Is there a straightforward implementation of this in R? I ask because the regular classification approaches (e.g. random forest, SVM) will ignore the dependence in the data and give different predictions within the same time series. I have an intuition for how this could be solved, e.g. using some ensemble classification, or concatenating each time series into a univariate vector, but I have a feeling there is a better approach that doesn't require me to reinvent the wheel. I also know that a KNN and DTW approach could in theory work, but I am not sure how it gets around the issues above (e.g. the multivariate problem).
Appreciate any pointers and references

Ensemble machine learning model with NNETAR and BRNN

I used the forecast package to forecast the daily time-series of variable Y using its lag values and a time series of an external parameter X. I found nnetar model (a NARX model) was the best in terms of overall performance. However, I was not able to get the prediction of peaks of the time series well despite my various attempts with parameter tuning.
I then extracted the peak values of Y (those above a threshold; of course this is no longer a regular time series) together with the corresponding X values, and tried to fit a regression model (note: not an autoregressive model) using various models in the caret package. I found that the prediction of peak values using the brnn (Bayesian regularized neural network) model with X values alone is better than that of nnetar, which uses both lag values and X values.
Now my question is: how do I go from here to create an ensemble of these two models (i.e. whenever the prediction from the brnn regression model (or any other regression model) is better, I want to use it in place of the nnetar prediction and move forward; I am mostly concerned about the peaks)? Is this a commonly used approach?
Instead of trying to pick one model that is superior at all times, it's typically better to average the models, in order to include as many individual views as possible.
In the experiments I've been involved in, where we tried to pick the one model that would outperform based on historical performance, a simple average typically turned out to be as good or better. This is in line with the usual results on this problem: https://otexts.com/fpp2/combinations.html
So, before you go more advanced by trying to pick a specific model based on previous performance, or by using a weighted average, consider a simple average of the two models.
If you want to continue with a sort of selection/weighted averaging, try to have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I've not tried the specific package (yet), but have seen promising results in my test using the original m4metalearning package.
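A minimal sketch of the simple average, using base R only. The two models here are stand-ins and not the ones from the question: an AR(1) with external regressor plays the role of nnetar, and a plain lm() on X plays the role of the brnn regression. The data are simulated, and all object names are made up for illustration.

```r
set.seed(42)
n <- 220
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))            # external regressor X
y <- 1.5 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))  # target Y
train <- 1:200
test  <- 201:220

# Model A (stand-in for nnetar): AR(1) errors with X as external regressor.
fit_a  <- arima(y[train], order = c(1, 0, 0), xreg = cbind(x = x[train]))
pred_a <- as.numeric(predict(fit_a, n.ahead = length(test),
                             newxreg = cbind(x = x[test]))$pred)

# Model B (stand-in for the brnn regression): regression on X only.
fit_b  <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
pred_b <- as.numeric(predict(fit_b, newdata = data.frame(x = x[test])))

# Simple, equally weighted combination of the two forecasts.
pred_avg <- (pred_a + pred_b) / 2

rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
scores <- c(model_a = rmse(y[test], pred_a),
            model_b = rmse(y[test], pred_b),
            average = rmse(y[test], pred_avg))
print(round(scores, 3))
```

By construction, the RMSE of the equally weighted average can never be worse than that of the worse of the two models, which is part of why it makes a safe baseline before trying selection or weighting.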

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2) and I am trying to get quantile regression estimates of my data.
So far, I have found "quantreg" package to perform quantile regression, but I have no idea how to put ARIMA-ARCH models as the model formula in function rq.
rq function seems to work for regressions with dependent and independent variables but not for time-series.
Is there some other package that I can put time-series models and do quantile regression in R? Any advice is welcome. Thanks.
I just put an answer on the Data Science forum.
It basically says that most of the ready-made packages use so-called exact tests, which rest on an assumption about the distribution (independent, identically distributed Gaussian, or a wider family).
You also have a family of resampling methods, in which you simulate a sample with a distribution similar to that of your observed sample, fit your ARIMA(1,1,2)-ARCH(2) and repeat the process a great number of times. Then you analyze this great number of forecasts and measure (as opposed to compute) your confidence intervals.
The resampling methods differ in the way they generate the simulated samples. The most used are:
The Jackknife: in which you "forget" one point, that is, you simulate n samples of size n-1 (if n is the size of the observed sample).
The Bootstrap: in which you simulate a sample by drawing n values from the original sample with replacement: some will be taken once, some twice or more, some never.
It is a (non-trivial) theorem that the expectation of the confidence intervals, like that of most of the usual statistical estimators, is the same on the simulated samples as on the original sample. The difference is that you can measure them with a great number of simulations.
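The two resampling schemes can be sketched in a few lines of base R. This only shows the mechanics on a simple statistic (the mean) of a made-up i.i.d. sample; it is not an ARIMA-ARCH forecast. For a time series the observations are dependent, so in practice one would usually resample model residuals or blocks of observations rather than single points.

```r
set.seed(123)
obs <- rexp(100, rate = 1)  # hypothetical observed sample
n <- length(obs)

# Jackknife: n leave-one-out samples, each of size n - 1.
jack_stats <- vapply(seq_len(n), function(i) mean(obs[-i]), numeric(1))
jack_se <- sqrt((n - 1) / n * sum((jack_stats - mean(jack_stats))^2))

# Bootstrap: B samples of size n, drawn with replacement.
B <- 2000
boot_stats <- replicate(B, mean(sample(obs, size = n, replace = TRUE)))

# "Measure" the interval from the simulated statistics (percentile method).
boot_ci <- quantile(boot_stats, probs = c(0.025, 0.975))

print(jack_se)
print(boot_ci)
```

For a forecast, the statistic inside the loop would be replaced by "refit the model on the simulated sample and forecast", and the interval is read off the collected forecasts in the same way.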
Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named "What topics can I ask about here?" and "What types of questions should I avoid asking?". And more importantly, please read the Stack Overflow question checklist. You might also want to learn about Minimal, Complete, and Verifiable Examples.
I can try to address your question, although this is hard since you don't provide any code or data. Also, I guess that by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) plus an ARCH(2) filter.
For an overview of the R time-series capabilities you can refer to the CRAN Task View on time series.
You can easily apply these filters in R with an appropriate function.
For instance, you could use the Arima() function from the forecast package, then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other approaches are of course possible. Finally, you can apply quantile regression to this filtered series. For instance, you can check out the dynrq() function from the quantreg package, which accepts time-series objects in the data argument.
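The first two steps can be sketched with base R only, using stats::arima() in place of forecast::Arima(). The garch() and dynrq() steps are left out here because they need the tseries and quantreg packages. As a crude stand-in for them (an assumption of this sketch, not the method described above), the point forecast is simply shifted by empirical quantiles of the residuals, which treats the residuals as having constant variance, i.e. exactly what the ARCH step would relax. The series is simulated.

```r
set.seed(7)
# Hypothetical integrated series, simulated as ARIMA(1,1,2).
y <- arima.sim(list(order = c(1, 1, 2), ar = 0.5, ma = c(0.3, 0.2)), n = 300)

# Step 1: apply the ARIMA(1,1,2) filter.
fit <- arima(y, order = c(1, 1, 2))

# Step 2: the filtered series is the residual series of that fit.
res <- residuals(fit)

# Stand-in for the later steps: one-step-ahead point forecast plus
# empirical residual quantiles (ignores conditional heteroskedasticity).
taus <- c(0.05, 0.50, 0.95)
point_fc <- as.numeric(predict(fit, n.ahead = 1)$pred)
quantile_fc <- point_fc + quantile(res, probs = taus)

print(quantile_fc)
```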

ARIMA Parameter selection from ACF/PACF plots

So I have a time series which I cannot share with you all, but I have a few questions about the proper proceedings to fit the correct ARIMA model for my data.
I have successfully written a loop to determine what degree of differencing needs to be done (parameter d in I(d))
Question:
To determine p and q, I am looking at ACF and PACF plots of my data. However, I am wondering if I should be using a deseasonalized transformation of my time series (trend plus random error, but no seasonality component which could be added back later) or my original time series. I obtained the deseasonal data using the decompose function in R (is stl() significantly better?).
With the original time series, my ACF plot looks like:
There is some definite seasonality at play here from the ACF plot. Does that mean I need to identify nonzero seasonal parameters in my final model if I need to use this data? How do I choose seasonal P and Q in this case?
With the deseasonalized data, here are what the plots look like:
Not sure how to interpret the deseasonal PACF/ACF plots other than the fact that the spike at lag 6 on the ACF plot indicates p might be 6?
I just learned ARIMA this summer and would appreciate help from anyone who knows the subject well on how to choose the optimal parameters based on what I've shown. Looking forward to a good discourse :)
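A sketch of the workflow described above, using base R and the built-in nottem series as a stand-in for the data that could not be shared. As a rule of thumb, a cut-off in the ACF points to the MA order q and a cut-off in the PACF points to the AR order p, so an ACF spike alone does not pin down p. The small AIC grid is an added cross-check, not something taken from the question, and d = 0 is assumed because nottem needs no differencing.

```r
# Built-in monthly series with clear seasonality (stand-in data).
y <- nottem

# Remove the seasonal component with stl(); decompose() works along the same lines.
parts <- stl(y, s.window = "periodic")
deseason <- y - parts$time.series[, "seasonal"]

# ACF hints at the MA order q, PACF hints at the AR order p.
acf_vals  <- acf(deseason, lag.max = 24, plot = FALSE)
pacf_vals <- pacf(deseason, lag.max = 24, plot = FALSE)

# Cross-check the visual reading with a small AIC grid over p and q.
grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- mapply(function(p, q) {
  tryCatch(AIC(arima(deseason, order = c(p, 0, q), method = "ML")),
           error = function(e) NA_real_)
}, grid$p, grid$q)
best <- grid[which.min(grid$aic), ]
print(best)
```

If the seasonality is kept in the data instead, arima() also takes a seasonal argument, e.g. seasonal = list(order = c(P, D, Q), period = 12), and P and Q can be compared with the same kind of grid.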

fourier() vs fourierf() function in R

I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. Looking to verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data,3, 410) I see the below data. It appears far more smooth than the terms provided by the original fourier function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
trainFourier<-fourier(data,3)
trainFourier<-as.data.frame(trainFourier)
trainFourier$exogenous<-exogenousData
arima.object<-auto.arima(data, xreg=trainFourier)
futureFourier<-fourierf(data,3, 410)
fourierForecast<-forecast(arima.object, xreg=futureFourier, h=410)
and want to be completely sure that auto.arima is fitted properly (using the terms from fourier()) with respect to what I'll put in under xreg for forecast (which has terms from a different function, i.e. fourierf()).
Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function. If I detach fda, my S1 term from fourier(data,3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1,futureFourier$S1))
Moral of the story -- watch what your packages mask, folks!
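To see one consolidated term running through the sample and into the forecast period, the harmonics can also be built by hand. The helper fourier_terms() below is hypothetical and only assumes that the terms are plain sin(2*pi*k*t/period) and cos(2*pi*k*t/period) pairs with t continuing past the end of the sample; it is not the forecast package's own code. The last line uses find() to list every attached package that defines fourier(), which is a quick way to spot masking.

```r
# Hypothetical helper: K sin/cos pairs evaluated at time indices t.
fourier_terms <- function(t, K, period) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  }))
  colnames(out) <- paste0(rep(c("S", "C"), times = K),
                          rep(seq_len(K), each = 2))
  out
}

n <- 728       # assumed sample length (two full periods)
h <- 410       # forecast horizon, as in the question
period <- 364
K <- 3

train_terms  <- fourier_terms(seq_len(n), K, period)      # in-sample terms
future_terms <- fourier_terms(n + seq_len(h), K, period)  # forecast-period terms

# One consolidated S1 series: the future terms continue the in-sample sinusoid.
s1_all <- c(train_terms[, "S1"], future_terms[, "S1"])

# Which attached packages define fourier()? More than one entry means masking.
print(find("fourier"))
```

Plotting s1_all with plot(s1_all, type = "l") should show a single smooth sine wave. Calling the function with an explicit namespace, forecast::fourier(data, 3), avoids the masking problem regardless of which other packages are attached.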
