Some interesting forecasting accuracy results in R

I have a stationary monthly time series (ADF test, p-value < 0.05) with more than 100 observations. There is very little seasonality and no trend.
An nnetar model with external regressors (default parameters) is used to forecast the next 10 months, so my test set is the last 10 months and the rest is training.
With the accuracy() function, I get some interesting results like these:
                ME     RMSE      MAE       MPE     MAPE
Test set -724.8016 5764.572 4505.981 -273.4382 294.9932
The forecast plot looks reasonable, and so do the predicted values, but why do we sometimes get results like these? I have seen them on a couple of datasets. Could such results be related to the scale of the values in our data (larger or smaller values), or do they indicate a problem with the model?
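For what it's worth, percentage errors such as the MPE and MAPE divide by the actual values, so actuals near zero can inflate them enormously even when the absolute errors are moderate; a minimal sketch with made-up numbers:
actual <- c(0.5, 100, 200)                  # one actual value near zero
pred   <- actual + 5                        # every forecast off by the same 5
mean(abs((actual - pred) / actual)) * 100   # MAPE ~ 336%, driven by the 0.5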
Edit: I solved the Inf problem of the MAPE, and here are the new results. I also used the CVar() function, so here are its results as well:
10-fold cross-validation
                   Mean          SD
ME         4.728654e+01 940.0063749
RMSE       2.673913e+03 825.4895270
MAE        2.057133e+03 586.7923645
MPE       -8.074099e-01  10.2851975
MAPE       2.450919e+01   6.9997702
ACF1      -6.968456e-03   0.2815606
Theil's U  2.015842e+00   0.7869573
p-value of Ljung-Box test of residuals is 0.3974
if this value is significant (<0.05),
the result of the cross-validation should not be used
as the model is underfitting the data.
Thank you!

Related

Backtransform MAE loss values in R

I built a neural network using the AMES dataset and took the log of the target variable 'SalePrice' so that it has an approximately normal distribution. My loss function is MAE, and the MAE values of my 'new' NN are significantly lower than those from my 'original' model. Is there any way I can translate these loss values back to the original scale? Is it really as simple as taking the exponential?
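A quick numerical check suggests it is not that simple: exponentiating the log-scale MAE gives a typical multiplicative error factor, not an error in the target's units. A minimal sketch with made-up numbers (not the AMES data):
actual <- c(150000, 200000, 350000)
pred   <- c(140000, 230000, 320000)
mae_log  <- mean(abs(log(actual) - log(pred)))  # MAE on the log scale
mae_orig <- mean(abs(actual - pred))            # MAE on the original scale
exp(mae_log)  # ~1.10, an average multiplicative factor, not dollars
mae_orig      # 23333.33, the error in dollars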

Why do the "arima" and "Arima" functions give different MASE values with accuracy()?

I'm trying to calculate all the accuracy measures "manually" and only have problems with the MASE. Using fit1 from the arima() function there are no problems; the problem is with the Arima() function. I would like to know how to calculate the MASE manually from the fit2 Arima object.
fit1 <- arima(EuStockMarkets[,1], order = c(1,1,0))
fit2 <- Arima(EuStockMarkets[,1], order = c(1,1,0))
accuracy(fit1)
accuracy(fit2)
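In case it helps, here is a hedged sketch of computing the in-sample MASE by hand for fit2, assuming accuracy() scales the residuals by the in-sample MAE of the (seasonal) naive method, as in Hyndman & Koehler (2006):
library(forecast)
x <- EuStockMarkets[,1]
fit2 <- Arima(x, order = c(1,1,0))
m <- frequency(x)                        # seasonal period of the series
scaling <- mean(abs(diff(x, lag = m)))   # in-sample (seasonal) naive MAE
mean(abs(residuals(fit2))) / scaling     # manual in-sample MASE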

In the auto.arima() function in R, how do I find the p, d, q values of the fitted ARIMA?

I used the auto.arima() function on a time series data set to forecast. From there, I'd like to know how to find the p, d, q values of the fitted ARIMA. Is there a quick way to determine that? Thank you.
The forecast::auto.arima() function was written to pick the optimal p, d, and q with respect to some optimization criterion (e.g. AIC). If you want to see which model was picked, use the summary() function.
For example:
fit <- auto.arima(lynx)
summary(fit)
Series: lynx
ARIMA(2,0,2) with non-zero mean
Coefficients:
         ar1      ar2      ma1      ma2       mean
      1.3421  -0.6738  -0.2027  -0.2564  1544.4039
s.e.  0.0984   0.0801   0.1261   0.1097   131.9242
sigma^2 estimated as 761965:  log likelihood=-932.08
AIC=1876.17   AICc=1876.95   BIC=1892.58
Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set -1.608903 853.5488 610.1112 -63.90926 140.7693 0.7343143 -0.01267127
You can see the particular specification in the second line of the output; in this example, auto.arima picks an ARIMA(2,0,2).
Note that I did this naively here for demonstration purposes. I didn't check whether this is an accurate representation of the dependency structure in the lynx data set.
Other than summary(), you could also use arimaorder(fit) to get the vector c(p,d,q) or as.character(fit) to get "ARIMA(p,d,q)".
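For instance, with the lynx fit from above (the output, shown as comments, is what I'd expect from the current forecast package):
arimaorder(fit)    # p d q
                   # 2 0 2
as.character(fit)  # "ARIMA(2,0,2) with non-zero mean"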

Strange behavior of auto.arima in the R package forecast

I am trying to use the R package forecast to fit ARIMA models (with the function Arima) and to automatically select an appropriate model (with the function auto.arima). I first estimated two possible models with the function Arima:
tt.1 <- Arima(x, order = c(1,0,1), seasonal = list(order = c(0,1,1)),
              include.drift = FALSE)
tt.2 <- Arima(x, order = c(1,0,1), seasonal = list(order = c(0,1,0)),
              include.drift = FALSE)
Then, I used the function auto.arima to automatically select an appropriate model for the same data. I fixed d=0 and D=1, just as in the two models above. Furthermore, I set the maximum to 1 for all other order parameters, did not use approximation of the selection criterion, and did not use stepwise selection (note that these settings are only for demonstrating the strange behavior, not what I really intend to use). I used BIC as the criterion for selecting the model. Here is the function call:
tt.auto <- auto.arima(x, ic = "bic", approximation = FALSE, seasonal = TRUE,
                      stepwise = FALSE, max.p = 1, max.q = 1, max.P = 1,
                      max.Q = 1, d = 0, D = 1, start.p = 1, start.q = 1,
                      start.P = 1, start.Q = 1, trace = TRUE, allowdrift = FALSE)
Now, I would have expected auto.arima to select whichever of the two models above has the lower BIC, or a model not estimated above by Arima. Furthermore, I would have expected the trace output of auto.arima to show exactly the same BIC values as calculated by Arima for the two models above. This is indeed true for the second model but not for the first. For the first model, the BIC calculated by Arima is 10405.81, but the auto.arima screen output for the model (1,0,1)(0,1,1) is Inf. Consequently, auto.arima selects the second model, although the first model has the lower BIC when the two models estimated by Arima are compared. Does anyone have an idea why the BIC calculated by Arima does not correspond to the BIC calculated by auto.arima in the case of the first model?
Here is the screen output of auto.arima:
ARIMA(0,0,0)(0,1,0)[96] : 11744.63
ARIMA(0,0,0)(0,1,1)[96] : Inf
ARIMA(0,0,0)(1,1,0)[96] : Inf
ARIMA(0,0,0)(1,1,1)[96] : Inf
ARIMA(0,0,1)(0,1,0)[96] : 11404.67
ARIMA(0,0,1)(0,1,1)[96] : Inf
ARIMA(0,0,1)(1,1,0)[96] : Inf
ARIMA(0,0,1)(1,1,1)[96] : Inf
ARIMA(1,0,0)(0,1,0)[96] : 11120.72
ARIMA(1,0,0)(0,1,1)[96] : Inf
ARIMA(1,0,0)(1,1,0)[96] : Inf
ARIMA(1,0,0)(1,1,1)[96] : Inf
ARIMA(1,0,1)(0,1,0)[96] : 10984.75
ARIMA(1,0,1)(0,1,1)[96] : Inf
ARIMA(1,0,1)(1,1,0)[96] : Inf
ARIMA(1,0,1)(1,1,1)[96] : Inf
And here are summaries of the models calculated by Arima:
> summary(tt.1)
Series: x
ARIMA(1,0,1)(0,1,1)[96]
Coefficients:
         ar1      ma1     sma1
      0.9273  -0.5620  -1.0000
s.e.  0.0146   0.0309   0.0349
sigma^2 estimated as 867.7:  log likelihood=-5188.98
AIC=10385.96   AICc=10386   BIC=10405.81
Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set 0.205128 28.16286 11.14871 -7.171098 18.42883 0.3612059 -0.03466711
> summary(tt.2)
Series: x
ARIMA(1,0,1)(0,1,0)[96]
Coefficients:
         ar1      ma1
      0.9148  -0.4967
s.e.  0.0155   0.0320
sigma^2 estimated as 1892:  log likelihood=-5481.93
AIC=10969.86   AICc=10969.89   BIC=10984.75
Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE    MASE        ACF1
Training set 0.1942746 41.61086 15.38138 -8.836059 24.55919 0.49834 -0.02253845
Note: I am not allowed to make the data available. But I would be happy to provide more output or run modified calls of the functions if necessary.
EDIT: I have now looked at the source code of auto.arima and found that the behavior is caused by a check on the roots, which sets the information criterion used for selecting a model to Inf if the model fails the check. The paper cited in the help for auto.arima confirms this (Hyndman, R.J. and Khandakar, Y. (2008), "Automatic time series forecasting: The forecast package for R", Journal of Statistical Software, 26(3), page 11). Sorry for the question; I should have read the paper before asking here!
auto.arima tries to find the best model subject to some constraints, avoiding models with parameters that are close to the non-stationarity and non-invertibility boundaries.
Your tt.1 model has a seasonal MA(1) parameter of -1 which lies on the non-invertibility boundary. So you don't want to use that model as it will lead to numerical instabilities. The seasonal difference operator is confounded with the seasonal MA operator.
Internally, auto.arima gives an AIC/AICc/BIC value of Inf to any model that doesn't satisfy the constraints to avoid it being selected.
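As a concrete check, the root of the seasonal MA polynomial implied by the tt.1 estimate can be inspected directly (a small sketch using the sma1 value from summary(tt.1) above):
sma1 <- -1.0                # seasonal MA(1) estimate from summary(tt.1)
abs(polyroot(c(1, sma1)))   # modulus of the root of 1 + sma1*z: exactly 1,
                            # i.e. on the non-invertibility boundary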

Forecast accuracy: no MASE with two vectors as arguments

I'm using the accuracy function from the forecast package to calculate accuracy measures. I'm using it to calculate measures for fitted time series models, such as ARIMA or exponential smoothing.
As I'm testing different model types on different dimensions and aggregation levels, I'm using the MASE (mean absolute scaled error), introduced by Hyndman & Koehler (2006, "Another look at measures of forecast accuracy"), to compare different models on different levels.
Now I'm also comparing models with historical forecasts. As I only have the forecast values and not the models, I tried to use the accuracy function. The function description mentions that you can also provide two vector arguments, one with forecast values and one with actuals, to calculate the measures (instead of a fitted model):
f: An object of class "forecast", or a numerical vector containing forecasts. It
   will also work with Arima, ets and lm objects if x is omitted – in which case
   in-sample accuracy measures are returned.
x: An optional numerical vector containing actual values of the same length as
   object.
But I was surprised to find that all measures are returned except the MASE. Does somebody know the reason for that? Why is the MASE not returned when two vectors are used as arguments in the accuracy function?
The MASE requires the historical data to compute the scaling factor. It is not computed from the future data, as in the answer by @FBE. So if you don't pass the historical data to accuracy(), the MASE cannot be computed. For example,
> library(forecast)
> fcast <- snaive(window(USAccDeaths, end = 1977.99))
> accuracy(fcast$mean, USAccDeaths)
         ME        RMSE         MAE       MPE      MAPE      ACF1 Theil's U
225.1666667 341.1639391 259.5000000 2.4692164 2.8505546 0.3086626 0.4474491
But if you pass the whole fcast object (which includes the historical data), you get
> accuracy(fcast, USAccDeaths)
         ME        RMSE         MAE       MPE      MAPE      MASE      ACF1 Theil's U
225.1666667 341.1639391 259.5000000 2.4692164 2.8505546 0.5387310 0.3086626 0.4474491
The paper on the MASE clearly explains how to compute it (even for non-time-series data):
computeMASE <- function(forecast, train, test, period) {
  # forecast - forecast values
  # train    - data used to fit the model; used to find the scaling factor
  # test     - actual values; same length as forecast
  # period   - seasonal period; use 1 for non-seasonal data
  forecast <- as.vector(forecast)
  train <- as.vector(train)
  test <- as.vector(test)
  n <- length(train)
  # scaling factor: in-sample MAE of the (seasonal) naive method
  scalingFactor <- sum(abs(train[(period + 1):n] - train[1:(n - period)])) / (n - period)
  et <- abs(test - forecast)  # absolute forecast errors
  qt <- et / scalingFactor    # scaled errors
  mean(qt)
}
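A quick usage sketch, reusing the USAccDeaths example from above; with seasonal period 12 the result should come out close to the MASE of 0.5387310 reported by accuracy(fcast, USAccDeaths):
train <- window(USAccDeaths, end = 1977.99)
test  <- window(USAccDeaths, start = 1978)
fcast <- snaive(train, h = length(test))
computeMASE(fcast$mean, train, test, period = 12)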
To help myself a little bit, I created a function to calculate the MASE as described by Hyndman & Koehler in "Another look at measures of forecast accuracy" (2006).
calculateMASE <- function(f, y) {
  # f = vector with forecasts, y = vector with actuals
  if (length(f) != length(y)) stop("Vector length is not equal")
  n <- length(f)
  # scale by the mean absolute one-step difference of the actuals
  mean(abs((y - f) / ((1 / (n - 1)) * sum(abs(y[2:n] - y[1:(n - 1)])))))
}
For reference, see:
http://robjhyndman.com/papers/foresight.pdf
http://en.wikipedia.org/wiki/Mean_absolute_scaled_error
