How to read a particular PACF plot? - r

I did SARIMA model for the first time. While reading articles and examples I didn't get how to examine ACF/PACF plots
After differencing my ts.data two times to get it stationary I saw that PACF plot and how no idea how to interpret it
Will be glad if someone could explain me that plot or/and give a good source to read about it Differenced twice time series, lag is 12

You have to see both the ACF and the PACF at the same time to see if autocorrelation process of the series. Just seeing the PACF i think it might have a MA(1) because the first lag passes through the confidence interval, and then the other are just a pattern. But to be sure have to see the ACF also

Related

R - Forecast multiple time-series (15K Products)

Hi Stack Overflow community.
I have 5 years of weekly price data for more than 15K Products (5*15K**52 records). Each product is a univariate time series. The objective is to forecast the price of each product.
I am familiar with the univariate time series analysis in which we can visualize each ts series, plot its ACF, PACF, and forecast the series. But, Univariate time series analysis is not possible in this case when I have 15K different time-series, can not visualize each time series, its ACF, PACF, and forecast separately of each product, and make a tweak/decision on it.
I am looking for some recommendations and directions to solve this multi-series forecasting problem using R (preferable). Any help and support will be appreciated.
Thanks in advance.
I would suggest you use auto.arima from the forecast package.
This way you don't have to search for the right ARIMA model.
auto.arima: Returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible models within the order constraints provided.
fit <- auto.arima(WWWusage)
plot(forecast(fit,h=20))
Instead of WWWusage you could put one of your time series, to fit an ARIMA model.
With forecast you then perform the forecast - in this case 20 time steps ahead (h=20).
auto.arima basically chooses the ARIMA parameters for you (according to AIC - Akaike information criterion).
You would have to try, if it is too computational expensive for you. But in general it is not that uncommon to forecast that many time series.
Another thing to keep in mind could be, that it might after all not be that unlikely, that there is some cross-correlation in the time series. So from a forecasting precision standpoint it could make sense to not treat this as a univariate forecasting problem.
The setting it sounds quite similar to the m5 forecasting competition that was recently held on Kaggle. Goal was to point forecasts the unit sales of various products sold in the USA by Walmart.
So a lot of time series of sales data to forecast. In this case the winner did not do a univariate forecast. Here a link to a description of the winning solution. Since the setting seems so similar to yours, it probably makes sense to read a little bit in the kaggle forum of this challenge - there might be even useful notebooks (code examples) available.

Multivariate ARIMA (MARIMA) modelling in R

I am currently using the Marima package for R invented by Henrik Spliid in order to forecast multivariate time series with ARIMA.
Overview can be found here:
https://cran.r-project.org/web/packages/marima/marima.pdf
http://orbit.dtu.dk/files/123996117/marima.anv.talk.pdf
When using the Marima function, it is required to define both the order of AR(p) and MA(q) first.
My question is, how can I determine appropriate values for p and q?
I know when it comes to univariate ARIMA analysis, that auto.arima gives a good suggestion for p and q. However, when I use auto.arima for every single univariate time series I want to analyze, there are (slightly) different suggestions for each time series. (For example (2,2,1) for the first, (1,1,1) for the second and so on)
Since I want to analyze all of the time series combined in the multivariate ARIMA model and I only can choose one value for each p and q (if I understood it correctly), I wonder how I can choose those values the most accurate way.
Could I just try to run the model a couple times and see what values for p and q work best (e.g. by testing the residuals of the forecast)?
What are your suggestions?
I would appreciate any help!

How to interpret a VAR model without sigificant coefficients?

I am trying to investigate the relationship between some Google Trends Data and Stock Prices.
I performed the augmented ADF Test and KPSS test to make sure that both time series are integrated of the same order (I(1)).
However, after I took the first differences, the ACF plot was completely insigificant (except for 1 of course), which told me that the differenced series are behaving like white noise.
Nevertheless I tried to estimate a VAR model which you can see attached.
As you can see, only one constant is significant. I have already read that because Stocks.ts.l1 is not significant in the equation for GoogleTrends and GoogleTrends.ts.l1 is not significant in the equation for Stocks, there is no dynamic between the two time series and both can also be models independently from each other with a AR(p) model.
I checked the residuals of the model. They fulfill the assumptions (normally distributed residuals are not totally given but ok, there is homoscedasticity, its stable and there is no autocorrelation).
But what does it mean if no coefficient is significant as in the case of the Stocks.ts equation? Is the model just inappropriate to fit the data, because the data doesn't follow an AR process. Or is the model just so bad, that a constant would describe the data better than the model? Or a combination of the previous questions? Any suggestions how I could proceed my analysis?
Thanks in advance

Interpreting ACF and PACF plots for SARIMA model

I'm new to time series and used the monthly ozone concentration data from Rob Hyndman's website to do some forecasting.
After doing a log transformation and differencing by lags 1 and 12 to get rid of the trend and seasonality respectively, I plotted the ACF and PACF shown [in this image][2]. Am I on the right track and how would I interpret this as a SARIMA?
There seems to be a pattern every 11 lags in the PACF plot, which makes me think I should do more differencing (at 11 lags), but doing so gives me a worse plot.
I'd really appreciate any of your help!
EDIT:
I got rid of the differencing at lag 1 and just used lag 12 instead, and this is what I got for the ACF and PACF.
From there, I deduced that: SARIMA(1,0,1)x(1,1,1) (AIC: 520.098)
or SARIMA(1,0,1)x(2,1,1) (AIC: 521.250)
would be a good fit, but auto.arima gave me (3,1,1)x(2,0,0) (AIC: 560.7) normally and (1,1,1)x(2,0,0) (AIC: 558.09) without stepwise and approximation.
I am confused on which model to use, but based on the lowest AIC, SAR(1,0,1)x(1,1,1) would be the best? Also, the thing that concerns me is that none of the models pass the Ljung-Box test. Is there any way I can fix this?
It is quite difficult to manually select a model order that will perform well at forecasting a dataset. This is why Rob has built the 'auto.arima' function in his R forecast package, to figure out the model that may perform best based on certain metrics.
When you see a pacf plot with significantly negative lags that usually means you have over differenced your data. Try removing the 1st order difference and keeping the 12 order difference. Then carry on making your best guess.
I'd recommend trying his auto.arima function and passing it a time series object with frequency = 12. He has a good writeup of seasonal arima models here:
https://www.otexts.org/fpp/8/9
If you would like more insight into manually selecting a SARIMA model order, this is a good read:
https://onlinecourses.science.psu.edu/stat510/node/67
In response to your Edit:
I think it would be beneficial to this post if you clarify your objective. Which of the following are you trying to achieve?
Find a model where residuals satisfy Ljung Box Test
Produce the most accurate out of sample forecast
Manually select lag orders such that ACF and PACF plots show no significant lags remaining.
In my opinion, #2 is the most sought after objective so I'll assume that is your goal. From my experience, #3 produces poor results out of sample. In regards to #1, I am usually not concerned about correlations remaining in the residuals. We know we do not have the true model for this time-series, so I do not feel there's any reason to expect an approximate model that performs well out of sample to not have left something behind in the residuals that is more complex perhaps, or nonlinear etc.
To provide you another SARIMA result, I ran this data through some code I've developed and found the following equation produced the minimal error on a cross-validation period.
Final model is:
SARIMA [0,1,1] [1,1,1]12 with a constant using the log normal of the time-series.
The errors in the cross validation period are:
MAPE = 16%
MAE = 0.46
RSQR = 74%
Here is the Partial Autocorrelation plot of the residuals for your information.
This is roughly similar in methodology to selecting an equation based on AICc to my understanding, but is ultimately a different approach. Regardless, if your objective is out of sample accuracy, I'd recommend evaluating equations in terms of their out of sample accuracy versus in-sample fit, tests, or plots.

ARIMA Parameter selection from ACF/PACF plots

So I have a time series which I cannot share with you all, but I have a few questions about the proper proceedings to fit the correct ARIMA model for my data.
I have successfully written a loop to determine what degree of differencing needs to be done (parameter d in I(d))
Question:
To determine p and q, I am looking at ACF and PACF plots of my data. However, I am wondering if I should be using a deseasonalized transformation of my time series (trend plus random error, but no seasonality component which could be added back later) or my original time series. I obtained the deseasonal data using the decompose function in R (is stl() significantly better?).
With the original time seriees, my acf plot looks like:
There is some definite seasonality at play here from the ACF plot. Does that mean I need to identify nonzero seasonal parameters in my final model if I need to use this data? How do I choose seasonal P and Q in this case?
With the deseasonalized data, here are what the plots look like:
Not sure how to interpret the deseasonal PACF/ACF plots other than the fact that the spike at lag 6 on the ACF plot indicates p might be 6?
Just learned ARIMA this summer and would appreciate the help from anyone who knows the subject well how to choose the optimal parameters based on what I've shown. Looking forward to a good discourse :)

Resources