Strange behavior of auto.arima in R-package forecast - r
I am trying to use the R-package forecast to fit arima models (with the function Arima) and automatically select an appropriate model (with the function auto.arima). I first estimated two possible models with the function Arima:
library(forecast)
# Two candidate models that differ only in the seasonal MA order
tt.1 <- Arima(x, order=c(1,0,1), seasonal=list(order=c(0,1,1)),
              include.drift=FALSE)
tt.2 <- Arima(x, order=c(1,0,1), seasonal=list(order=c(0,1,0)),
              include.drift=FALSE)
Then, I used the function auto.arima to automatically select an appropriate model for the same data. I fixed d=0 and D=1, just as in the two models above. Furthermore, I set the maximum to 1 for all other orders, did not use approximation of the selection criterion, and did not use stepwise selection (note that the settings I use here are only for demonstrating the strange behavior, not what I really intend to use). I used BIC as the criterion for selecting the model. Here is the function call:
tt.auto <- auto.arima(x, ic="bic", approximation=FALSE, seasonal=TRUE, stepwise=FALSE,
                      max.p=1, max.q=1, max.P=1, max.Q=1, d=0, D=1, start.p=1,
                      start.q=1, start.P=1, start.Q=1, trace=TRUE,
                      allowdrift=FALSE)
Now, I would have expected that auto.arima selects the model with the lower BIC from the two models above or a model not estimated above by Arima. Furthermore, I would have expected that the output generated by auto.arima when trace=T is exactly the same as the BIC calculated by Arima for the two models above. This is indeed true for the second model but not for the first one. For the first model, the BIC calculated by Arima is 10405.81 but the screen output of auto.arima for the model (1,0,1)(0,1,1) is Inf. Consequently, the second model is selected by auto.arima although the first model has a lower BIC when comparing the two models estimated by Arima. Does anyone have an idea why the BIC calculated by Arima does not correspond to the BIC calculated by auto.arima in case of the first model?
Here is the screen output of auto.arima:
ARIMA(0,0,0)(0,1,0)[96] : 11744.63
ARIMA(0,0,0)(0,1,1)[96] : Inf
ARIMA(0,0,0)(1,1,0)[96] : Inf
ARIMA(0,0,0)(1,1,1)[96] : Inf
ARIMA(0,0,1)(0,1,0)[96] : 11404.67
ARIMA(0,0,1)(0,1,1)[96] : Inf
ARIMA(0,0,1)(1,1,0)[96] : Inf
ARIMA(0,0,1)(1,1,1)[96] : Inf
ARIMA(1,0,0)(0,1,0)[96] : 11120.72
ARIMA(1,0,0)(0,1,1)[96] : Inf
ARIMA(1,0,0)(1,1,0)[96] : Inf
ARIMA(1,0,0)(1,1,1)[96] : Inf
ARIMA(1,0,1)(0,1,0)[96] : 10984.75
ARIMA(1,0,1)(0,1,1)[96] : Inf
ARIMA(1,0,1)(1,1,0)[96] : Inf
ARIMA(1,0,1)(1,1,1)[96] : Inf
And here are summaries of the models calculated by Arima:
> summary(tt.1)
Series: x
ARIMA(1,0,1)(0,1,1)[96]
Coefficients:
         ar1      ma1     sma1
      0.9273  -0.5620  -1.0000
s.e.  0.0146   0.0309   0.0349
sigma^2 estimated as 867.7: log likelihood=-5188.98
AIC=10385.96 AICc=10386 BIC=10405.81
Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 0.205128 28.16286 11.14871 -7.171098 18.42883 0.3612059 -0.03466711
> summary(tt.2)
Series: x
ARIMA(1,0,1)(0,1,0)[96]
Coefficients:
         ar1      ma1
      0.9148  -0.4967
s.e.  0.0155   0.0320
sigma^2 estimated as 1892: log likelihood=-5481.93
AIC=10969.86 AICc=10969.89 BIC=10984.75
Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE    MASE        ACF1
Training set 0.1942746 41.61086 15.38138 -8.836059 24.55919 0.49834 -0.02253845
Note: I am not allowed to make the data available. But I would be happy to provide more output or run modified calls of the functions if necessary.
EDIT: I now looked at the source code of auto.arima and found out that the behavior is caused by a check on the roots which sets the information criterion used for selecting a model to Inf if the model fails the check. The paper cited in the help for auto.arima confirms that (Hyndman, R.J. and Khandakar, Y. (2008) "Automatic time series forecasting: The forecast package for R", Journal of Statistical Software, 26(3), page 11). Sorry for the question, I should have read the paper before asking a question here!
auto.arima tries to find the best model subject to some constraints, avoiding models with parameters that are close to the non-stationarity and non-invertibility boundaries.
Your tt.1 model has a seasonal MA(1) parameter of -1 which lies on the non-invertibility boundary. So you don't want to use that model as it will lead to numerical instabilities. The seasonal difference operator is confounded with the seasonal MA operator.
Internally, auto.arima gives an AIC/AICc/BIC value of Inf to any model that doesn't satisfy the constraints to avoid it being selected.
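As a rough illustration of the invertibility check described above, the following base-R sketch computes the smallest root modulus of the combined MA polynomial for coefficients like those of tt.1. The helper name min_ma_root, the period of 12 (used instead of 96 to keep the polynomial small), and the 1.01 threshold are assumptions made for illustration; the exact test inside auto.arima may differ.

```r
# Hypothetical helper: smallest modulus among the roots of
# 1 + theta_1 z + ... + theta_q z^q (the MA polynomial).
min_ma_root <- function(theta) {
  min(Mod(polyroot(c(1, theta))))
}

ma1  <- -0.5620   # non-seasonal MA coefficient from summary(tt.1)
sma1 <- -1.0000   # seasonal MA coefficient from summary(tt.1)
m    <- 12        # assumed period for this sketch (the question uses 96)

# Expand (1 + ma1 z)(1 + sma1 z^m) into one coefficient vector.
theta <- numeric(m + 1)
theta[1]     <- ma1
theta[m]     <- sma1
theta[m + 1] <- ma1 * sma1

root        <- min_ma_root(theta)
threshold   <- 1.01              # assumed tolerance, for illustration only
fails_check <- root < threshold  # TRUE means the model would be rejected
```

With a seasonal MA coefficient of -1 the seasonal factor has all its roots on the unit circle, so a check of this kind rejects the model no matter how good its likelihood is.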
Related
Some interesting forecasting accuracy results in R
I have a monthly time series that is stationary (ADF test, p-value < 0.05), with very little seasonality and no trend. I have more than 100 monthly observations. I use an nnetar model with regressors (default parameters) to forecast the next 10 months, so my test set is the last 10 months and the rest is the training set. With the accuracy() function, I get some interesting results like this:
                ME     RMSE      MAE       MPE     MAPE
Test set -724.8016 5764.572 4505.981 -273.4382 294.9932
The plot of the forecast looks reasonable, and so do the predicted values, but why do we sometimes get results like these? I have seen this on a couple of datasets. Could results like these be related to the scale of the values in the data (larger or smaller values)? Or is it just a problem with the model?
Edit: I solved the Inf problem with MAPE, and here are the new results. I also used the CVar() function, so here are its results as well:
10-fold cross-validation
                   Mean          SD
ME         4.728654e+01 940.0063749
RMSE       2.673913e+03 825.4895270
MAE        2.057133e+03 586.7923645
MPE       -8.074099e-01  10.2851975
MAPE       2.450919e+01   6.9997702
ACF1      -6.968456e-03   0.2815606
Theil's U  2.015842e+00   0.7869573
p-value of Ljung-Box test of residuals is 0.3974
if this value is significant (<0.05),
the result of the cross-validation should not be used
as the model is underfitting the data.
Thank you!
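One possible contributor to very large MPE/MAPE values, offered as an illustration rather than a diagnosis of this dataset, is that percentage errors divide by the actual value. The sketch below uses made-up numbers and a hypothetical mape helper to show that the same absolute errors give a small MAPE when the actuals are far from zero and a huge one when they are close to zero.

```r
# Hypothetical helper: mean absolute percentage error.
mape <- function(actual, forecast) {
  mean(abs(100 * (actual - forecast) / actual))
}

errors      <- c(5, -5, 5, -5)            # identical absolute errors in both cases
actual_far  <- c(1000, 1010, 990, 1005)   # made-up actuals far from zero
actual_near <- c(2, -3, 1, 4)             # made-up actuals close to zero

mape_far  <- mape(actual_far,  actual_far  - errors)
mape_near <- mape(actual_near, actual_near - errors)
mae       <- mean(abs(errors))            # the same for both series
```

Scale-free measures that do not divide by the actual values, such as MASE, are less sensitive to this effect.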
interpreting results from auto.arima R
This is my first time doing any forecasting. I was looking into using auto.arima, but I'm not really sure what the results mean, basically from 'Coefficients' onwards. Could someone please give me an explanation?
Series: total[, "total"]
ARIMA(2,0,0) with non-zero mean
Coefficients:
         ar1      ar2        mean
      1.1055  -0.4207  138805.107
s.e.  0.2020   0.2002    4664.756
sigma^2 = 53468931: log likelihood = -205.36
AIC=418.72 AICc=421.39 BIC=422.7
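It means that the model selected by auto.arima() is an AR(2) with a non-zero mean. In the forecast package, the "mean" in the output is the mean of the series, not the intercept, so the fitted model is: (Y_t - 138805.107) = 1.1055·(Y_(t-1) - 138805.107) - 0.4207·(Y_(t-2) - 138805.107) + e_t. Written with an intercept c, this is Y_t = c + 1.1055·Y_(t-1) - 0.4207·Y_(t-2) + e_t, where c = mean·(1 - ar1 - ar2). For the rest of the output, the standard error (s.e.) of each coefficient is shown (needed to judge whether the coefficients are significant). Finally, sigma^2 is the estimated variance of the residuals; the log-likelihood is a "quality measure" of the fit (higher is better, and it is needed for comparing this fit against others on the same data); and the AIC, AICc and BIC are other "quality measures" based on the log-likelihood, the sample size and the number of coefficients estimated (lower is better).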
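Because the "mean" reported by the forecast package is the series mean rather than the intercept, the intercept of the equivalent regression-style equation has to be computed from it. A minimal arithmetic sketch using the numbers from the output above (the variable names are illustrative):

```r
mu  <- 138805.107           # "mean" from the output
phi <- c(1.1055, -0.4207)   # ar1 and ar2 from the output

# Intercept c in Y_t = c + ar1 * Y_(t-1) + ar2 * Y_(t-2) + e_t
intercept <- mu * (1 - sum(phi))
```

This gives an intercept of roughly 43751.37, so the mean and the intercept are far apart here and should not be used interchangeably.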
How to copy data displayed in summary of any analysis to a dataframe?
I am trying to carry out time series modelling and would like to copy the output we get from fitting a model into a data frame. For example, when I run the following code:
a<-auto.arima(tt1[1:33,1])
summary(a)
I get the following output, which I want to copy to a data frame to analyse in future, to see how many instances across different models have similar values.
Series: tt1[1:33, 1]
ARIMA(0,0,1) with non-zero mean
Coefficients:
          ma1      mean
      -0.4421  219.4943
s.e.   0.2079   27.2563
sigma^2 estimated as 79580: log likelihood=-232.1
AIC=470.19 AICc=471.02 BIC=474.68
Training set error measures:
                   ME     RMSE      MAE MPE MAPE      MASE       ACF1
Training set 3.103363 273.4178 233.6786 NaN  Inf 0.6599926 0.07472074
You can use the "broom" package to solve this issue. Check the package vignette for more information. Here is a reproducible example:
library(broom)
library(forecast)
library(fpp2) # To get a data set to work on
# Fit a linear regression with AR errors
fit <- Arima(uschange[,"Consumption"], xreg = uschange[,"Income"], order = c(1,0,0))
# Use broom to convert summary output into a data.frame
tidy(fit) # coefficients and standard error
# Get main statistics
glance(fit)
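If adding a package is not an option, a similar result can be assembled by hand. The sketch below uses only base R (stats::arima on the built-in lynx data, chosen purely for illustration) and the standard coef(), vcov(), logLik(), AIC() and BIC() extractors; the data frame and column names are my own choice, not a fixed convention.

```r
# Fit a small ARMA(1,1) model on a built-in data set (illustration only).
fit <- arima(lynx, order = c(1, 0, 1))

# One row per coefficient: estimate and standard error.
coef_df <- data.frame(
  term      = names(coef(fit)),
  estimate  = unname(coef(fit)),
  std.error = unname(sqrt(diag(vcov(fit)))),
  row.names = NULL
)

# One row of model-level statistics.
stats_df <- data.frame(
  sigma2 = fit$sigma2,
  logLik = as.numeric(logLik(fit)),
  AIC    = AIC(fit),
  BIC    = BIC(fit)
)
```

Rows from several fitted models can then be stacked with rbind() to compare them later.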
In the auto.arima() function in R, how do I find the p,d,q values for that arima
I used R code with the auto.arima function on a time series data set to forecast. From here, I'd like to know how to find the p, d, q values of the ARIMA model. Is there a quick way to determine that? Thank you.
The forecast::auto.arima() function was written to pick the optimal p, d, and q with respect to some optimization criterion (e.g. AIC). If you want to see which model was picked, use the summary() function. For example:
library(forecast)
fit <- auto.arima(lynx)
summary(fit)
Series: lynx
ARIMA(2,0,2) with non-zero mean
Coefficients:
         ar1      ar2      ma1      ma2       mean
      1.3421  -0.6738  -0.2027  -0.2564  1544.4039
s.e.  0.0984   0.0801   0.1261   0.1097   131.9242
sigma^2 estimated as 761965: log likelihood=-932.08
AIC=1876.17 AICc=1876.95 BIC=1892.58
Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set -1.608903 853.5488 610.1112 -63.90926 140.7693 0.7343143 -0.01267127
You can see the particular specification in the second row of the output. In this example, auto.arima picks an ARIMA(2,0,2). Note that I did this naively here for demonstration purposes. I didn't check whether this is an accurate representation of the dependency structure in the lynx data set.
Other than summary(), you could also use arimaorder(fit) to get the vector c(p,d,q) or as.character(fit) to get "ARIMA(p,d,q)".
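For a model fitted with base R's arima(), the same information is stored in the arma component of the fitted object, which holds the counts in the order (p, q, P, Q, period, d, D). A minimal sketch, using an arbitrary ARIMA(2,1,0) fit on lynx purely for illustration:

```r
# Arbitrary model chosen only to show where the orders are stored.
fit <- arima(lynx, order = c(2, 1, 0))

# fit$arma is c(p, q, P, Q, period, d, D); pick out p, d, q.
pdq <- fit$arma[c(1, 6, 2)]
names(pdq) <- c("p", "d", "q")
```

Objects returned by the forecast package's Arima() and auto.arima() are built on the same structure, which is presumably what arimaorder() reads, though that is an assumption about its internals.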
arima estimate validation through arima.sim
This is out of curiosity: I am trying to compare a time series used as input to an ARMA model with a series reconstructed after the ARMA estimate is obtained. These are the steps I am thinking of:
Construct the simulated time series:
arma.sim <- arima.sim(model=list(ar=c(0.9),ma=c(0.2)),n = 100)
Estimate the model from arma.sim, assuming we know it is a (1,0,1) model:
arma.est1 <- arima(arma.sim, order=c(1,0,1))
Say we get arma.est1 in this form, which is close to the original (0.9,0,0.2):
Coefficients:
         ar1     ma1  intercept
      0.9115  0.0104    -0.4486
s.e.  0.0456  0.1270     1.1396
sigma^2 estimated as 1.15: log likelihood = -149.79, aic = 307.57
If I try to reconstruct another time series from arma.est1, how do I incorporate the intercept or s.e. in arima.sim? Something like this doesn't seem to work well, because arma.sim and arma.rec are far off:
arma.rec <- arima.sim(n=100, list(ar=c(0.9115),ma=c(0.0104)))
Normally we use predict() to check the estimate. But is this a legitimate way to look at the estimate?
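One way to bring the estimated intercept and innovation variance into the simulation is sketched below; treat it as a suggestion under stated assumptions rather than a definitive recipe. It relies on two facts about base R: arima.sim() generates a zero-mean series, and the "intercept" reported by arima() is the mean of the series, so it can simply be added afterwards. The sd argument is passed through to rnorm(), the default innovation generator. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulate and estimate, as in the question.
arma.sim  <- arima.sim(model = list(ar = 0.9, ma = 0.2), n = 100)
arma.est1 <- arima(arma.sim, order = c(1, 0, 1))
est       <- coef(arma.est1)

# Re-simulate from the estimates: the estimated innovation standard
# deviation goes in via sd, and the estimated mean is added on top.
arma.rec <- est[["intercept"]] +
  arima.sim(model = list(ar = est[["ar1"]], ma = est[["ma1"]]),
            n = 100, sd = sqrt(arma.est1$sigma2))
```

Even with correct parameters, arma.rec is driven by a fresh set of random innovations, so it will not track arma.sim point by point. Comparing properties such as the autocorrelation function, or the estimates obtained by refitting, is more informative than comparing the two paths directly.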