Forecast accuracy: no MASE with two vectors as arguments - r

I'm using the accuracy function from the forecast package to calculate accuracy measures for fitted time series models, such as ARIMA or exponential smoothing.
As I'm testing different model types on different dimensions and aggregation levels, I'm using the MASE (mean absolute scaled error), introduced by Hyndman & Koehler (2006, "Another look at measures of forecast accuracy"), to compare different models across levels.
Now I'm also comparing models with forecast history. As I only have the forecast values and not the models, I tried to use the accuracy function. The function description mentions that it is also allowed to provide two vector arguments, one with forecast values and one with actuals, to calculate the measures (instead of a fitted model):
f: An object of class "forecast", or a numerical vector containing forecasts. It
will also work with Arima, ets and lm objects if x is omitted – in which case
in-sample accuracy measures are returned.
x: An optional numerical vector containing actual values of the same length as
object.
But I was surprised by the fact that all measures are returned except the MASE. So I was wondering if somebody knows the reason for that? Why is the MASE not returned when using two vectors as arguments in the accuracy function?

The MASE requires the historical data to compute the scaling factor. It is not computed from the future data, as in the answer by @FBE below. So if you don't pass the historical data to accuracy(), the MASE cannot be computed. For example,
> library(forecast)
> fcast <- snaive(window(USAccDeaths,end=1977.99))
> accuracy(fcast$mean,USAccDeaths)
          ME        RMSE         MAE       MPE      MAPE      ACF1 Theil's U
 225.1666667 341.1639391 259.5000000 2.4692164 2.8505546 0.3086626 0.4474491
But if you pass the whole fcast object (which includes the historical data), you get
> accuracy(fcast,USAccDeaths)
          ME        RMSE         MAE       MPE      MAPE      MASE      ACF1 Theil's U
 225.1666667 341.1639391 259.5000000 2.4692164 2.8505546 0.5387310 0.3086626 0.4474491

The paper on MASE clearly explains how to compute it (even for non-time-series data):
computeMASE <- function(forecast, train, test, period) {
  # forecast - forecasted values
  # train    - data used for fitting; used to find the scaling factor
  # test     - actual values; same length as forecast
  # period   - seasonal period; use 1 for non-seasonal data
  forecast <- as.vector(forecast)
  train <- as.vector(train)
  test <- as.vector(test)
  n <- length(train)
  scalingFactor <- sum(abs(train[(period+1):n] - train[1:(n-period)])) / (n - period)
  et <- abs(test - forecast)
  qt <- et / scalingFactor
  meanMASE <- mean(qt)
  return(meanMASE)
}
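For example, using the snaive forecast from the first answer (monthly data, so period = 12), this function should reproduce the MASE that accuracy() reports for the full forecast object. A quick sketch, assuming the same train/test split:
library(forecast)
train <- window(USAccDeaths, end = c(1977, 12))   # history used for the scaling factor
test  <- window(USAccDeaths, start = c(1978, 1))  # hold-out actuals
fc    <- snaive(train, h = length(test))
computeMASE(fc$mean, train, test, period = 12)    # ~0.539, as in accuracy(fcast, USAccDeaths)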

To help myself a little bit, I created a function to calculate the MASE, as described by Hyndman & Koehler in "Another look at measures of forecast accuracy" (2006).
calculateMASE <- function(f, y) {
  # f = vector with forecasts, y = vector with actuals
  if (length(f) != length(y)) { stop("Vector length is not equal") }
  n <- length(f)
  mean(abs((y - f) / ((1 / (n - 1)) * sum(abs(y[2:n] - y[1:(n - 1)])))))
}
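A quick usage sketch with made-up vectors (note that, unlike computeMASE above, this version derives the scaling factor from the actuals y themselves rather than from a separate training set):
f <- c(102, 98, 105, 110)   # hypothetical forecasts
y <- c(100, 101, 103, 108)  # hypothetical actuals
calculateMASE(f, y)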
For reference, see:
http://robjhyndman.com/papers/foresight.pdf
http://en.wikipedia.org/wiki/Mean_absolute_scaled_error

Related

Backtransform MAE loss values in R

I built a neural network using the AMES dataset and took the log of the target variable 'SalePrice' so that it has an approximately normal distribution. My loss function is MAE, and the MAE values of my 'new' NN are significantly lower than those from my 'original' model. Is there any way to translate these loss values back? Is it really as simple as just taking the exponential?
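For what it's worth, a minimal sketch with made-up numbers (not the AMES data) suggests it is not that simple: exponentiating a log-scale MAE gives a multiplicative error factor, not an MAE in the original units:
set.seed(1)
actual <- exp(rnorm(100, mean = 12, sd = 0.4))  # hypothetical sale prices
pred   <- actual * exp(rnorm(100, sd = 0.1))    # hypothetical predictions
mae_log <- mean(abs(log(actual) - log(pred)))   # loss on the log scale
mae_raw <- mean(abs(actual - pred))             # loss in the original units
exp(mae_log)  # a multiplicative factor (roughly 1.08), not a price error
mae_raw       # in price units; not equal to exp(mae_log)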

Some interesting forecasting accuracy results in R

I have a stationary (ADF test, p-value < 0.05) monthly time-series dataset. There is a very low seasonality effect and no trend. I have more than 100 monthly observations.
An nnetar model with regressors (default parameters) is used to forecast the next 10 months, so my test set is the last 10 months and the rest is training.
With the accuracy() function, I get some interesting results like this:
ME RMSE MAE MPE MAPE
Test set -724.8016 5764.572 4505.981 -273.4382 294.9932
The visualization of the forecast looks reasonable, and so do the predicted values, but may I ask why we sometimes get these kinds of results? I have experienced them on a couple of datasets. Could such results be related to the scale of the values (bigger or smaller values) in the data? Or is it a problem with the model?
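To illustrate the scale question: percentage errors blow up whenever the actual values get close to zero, regardless of model quality. A minimal sketch with made-up numbers:
library(forecast)
actual <- c(0.5, 100, 200)  # one near-zero actual value
pred   <- c(5, 105, 205)    # comparable absolute errors throughout
accuracy(pred, actual)      # MPE/MAPE are dominated by the near-zero point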
Edit: I solved the Inf problem of the MAPE, and here are the new results. I also used the CVar() function, so here are its results as well:
10-fold cross-validation
Mean SD
ME 4.728654e+01 940.0063749
RMSE 2.673913e+03 825.4895270
MAE 2.057133e+03 586.7923645
MPE -8.074099e-01 10.2851975
MAPE 2.450919e+01 6.9997702
ACF1 -6.968456e-03 0.2815606
Theil's U 2.015842e+00 0.7869573
p-value of Ljung-Box test of residuals is 0.3974
if this value is significant (<0.05),
the result of the cross-validation should not be used
as the model is underfitting the data.
Thank you!

How to find the RMSE value? And what is a good RMSE value?

I am forecasting electrical power output. I have different sets of data that vary from 200 to 4000 observations. I have computed the forecasts, but I do not know how to calculate the RMSE value and R (the correlation coefficient) in R. I tried to calculate them in Excel, and the result for the RMSE was 0.0078. So I basically have two questions here.
How do I calculate the RMSE and R value in R?
What is a good RMSE value? Is 0.0078 a good one?
Here are two functions: one computes the MSE, and the second calls the first and takes the square root to give the RMSE.
These functions accept a fitted model, not a data set; for instance, the output of lm, glm, and many others.
mse <- function(x, na.rm = TRUE, ...) {
  e <- resid(x)             # residuals of the fitted model
  mean(e^2, na.rm = na.rm)  # pass na.rm through rather than hard-coding it
}
rmse <- function(x, ...) sqrt(mse(x, ...))
Like I said in a comment to the question, a value is not good on its own; it's good when compared to values obtained from other fitted models.
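For instance, a minimal comparison sketch on a built-in dataset (any fitted model with a resid() method will do):
fit1 <- lm(mpg ~ wt, data = mtcars)       # simpler model
fit2 <- lm(mpg ~ wt + hp, data = mtcars)  # richer model
rmse(fit1)
rmse(fit2)  # the lower of the two indicates the better in-sample fit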
Root Mean Square Error (RMSE) is the standard deviation of the prediction errors. Prediction errors (residuals) measure how far the data points are from the regression line; RMSE measures how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results.
The formula is:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(f_i - o_i)^2}

where f_i are the forecasts (expected values or unknown results), o_i are the observed values (known results), and N is the sample size. The same formula is sometimes written with a slightly different notation, with z_{f_i} and z_{o_i} in place of f_i and o_i.
You can use whichever notation you want, as both express the same formula. The "R" you are referring to is the Pearson correlation coefficient, which quantifies the strength of the linear relationship between forecasts and observations.
Coming to question 2: whether an RMSE value is good always depends on the scale of your data (its upper and lower bounds). Smaller is better, since it means a smaller typical error, but it can only be judged relative to the range of the observed values or to the RMSE of competing models.
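To answer the first question directly for plain forecast/actual vectors, a minimal sketch (the function names here are just illustrative):
rmse_vec <- function(f, o) sqrt(mean((f - o)^2))  # RMSE from forecasts f and observations o
r_value  <- function(f, o) cor(f, o)              # Pearson correlation coefficient R
f <- c(1.2, 2.1, 2.9)
o <- c(1, 2, 3)
rmse_vec(f, o)
r_value(f, o)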

ARIMAX exogenous variables reverse causality

I am trying to fit an ARIMAX model to figure out whether the containment measures (using the Government Response Stringency Index, numbers from 0 to 100) have a significant effect on the daily new cases rate. I also want to add test rates.
I programmed everything in R (every ts is stationary, ...) and did the Granger causality test. Result: Pr(>F) is below 0.05, so the null hypothesis of NO Granger causality can be rejected: the new cases rate and the containment measures exhibit reverse causality.
Is there any possibility to transform the variable "stringency index" and continue with an ARIMAX model? If so, how to do this in R?
In R you have the forecast package to build ARIMA models. Recall that there is a difference between true ARIMAX models and linear regressions with ARIMA errors. Check this post by Rob Hyndman (the forecast package author) for more detailed information:
The ARIMAX model muddle
Here are Rob Hyndman's examples of fitting a linear regression with ARIMA errors:
library(forecast)
library(fpp2)  # to get a data set to work on

# Fit a linear regression with AR(1) errors
fit <- Arima(uschange[, "Consumption"], xreg = uschange[, "Income"],
             order = c(1, 0, 0))

# Forecast and plot predictions
fcast <- forecast(fit, xreg = rep(mean(uschange[, 2]), 8))
autoplot(fcast) + xlab("Year") + ylab("Percentage change")

# Use the auto.arima function to find the optimal parameters
fit <- auto.arima(uschange[, "Consumption"], xreg = uschange[, "Income"])

# Plot predictions
fcast <- forecast(fit, xreg = rep(mean(uschange[, 2]), 8))
autoplot(fcast) + xlab("Year") + ylab("Percentage change")
Regarding your question about how to solve the reverse causality matter: it is clear that you have endogeneity bias. The response stringency index affects the daily new cases rate, and vice versa. If it is a prediction problem and not an estimation one, I wouldn't worry too much about that as long as I get good predictions. For an estimation/causation problem, I would try to find different exogenous variables or use instrumental/control variables.

Reproduce ARIMA Forecast (Coefficients from R Arima())

I am quite new to R and ARIMA models, and I have a question about the ARIMA model that I obtained in R.
I will use the US unemployment rate as an example; the data range is from Jan 1948 to Feb 2015, a total of 806 observations. After looking at the AICc, I decided to use an ARIMA(2,1,2) model. (BTW, I am using the Arima() function from the forecast package in R.)
The output is the following:
Series: log.unemp
ARIMA(2,1,2)

Coefficients:
         ar1      ar2      ma1     ma2
      1.6406  -0.7499  -1.5943  0.7893

sigma^2 estimated as 0.001307:  log likelihood=1530.14
AIC=-3050.27   AICc=-3050.2   BIC=-3026.82
The code is
fit.best <- Arima(log.unemp, c(2, 1, 2), include.constant=FALSE)
print(fit.best)
Then I want to measure the forecast performance of this model, that is, to calculate things like RMSE, Theil's U, etc. But I do not know how to do that, because I do not know how to derive the forecast equation from this output to calculate the fitted values.
So could anyone help me with this? How should I derive the forecast equation from this output? Also, after obtaining the equation, how can I do the forecast in Excel to calculate the fitted values starting from the first data point (some values are not available when calculating the fitted value for t = 1)?
Thanks!
You can use summary(fit.best) to view the RMSE.
Or, if you want to calculate it yourself, you can derive the residuals and fitted values like this:
fitted <- log.unemp - fit.best$residuals
Regarding the forecast equation, see the sketch below.
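Here is a sketch of how the printed coefficients translate into a one-step forecast equation (this assumes fit.best from the question; R's Arima uses '+' signs on the MA terms). With d = 1 and no constant, the model describes the differenced series w_t = y_t - y_{t-1}:
# w_t = 1.6406*w_{t-1} - 0.7499*w_{t-2} - 1.5943*e_{t-1} + 0.7893*e_{t-2} + e_t
# so the one-step forecast of the level is
#   yhat_t = y_{t-1} + 1.6406*w_{t-1} - 0.7499*w_{t-2} - 1.5943*e_{t-1} + 0.7893*e_{t-2}
y <- as.numeric(log.unemp)
e <- as.numeric(residuals(fit.best))
w <- c(NA, diff(y))                       # w[t] = y[t] - y[t-1]
yhat <- rep(NA, length(y))
for (t in 4:length(y)) {
  yhat[t] <- y[t-1] + 1.6406*w[t-1] - 0.7499*w[t-2] -
             1.5943*e[t-1] + 0.7893*e[t-2]
}
head(cbind(yhat, fitted(fit.best)), 10)   # should agree once the filter settles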
You can use the forecast package:
fit.best <- Arima(log.unemp, c(2, 1, 2), include.constant = FALSE)
my_forecast <- forecast(fit.best, h = 10)
my_forecast  # will show the next 10 periods
# or look at more detailed output, e.g.
plot(my_forecast$residuals)
Fit the ARIMA model with the code below:
arimafit <- arima(log.unemp, order = c(2, 1, 2))
Then forecast with the code below, where forecast() predicts however many months ahead you want; h = 3 means it will forecast the next 3 months:
arima_future <- forecast(arimafit, h = 3)
If you want to check the RMSE on the test data, you can use the DMwR package:
metrics <- as.data.frame(DMwR::regr.eval(<test_data_vector>, arima_future$mean))
<test_data_vector> is the test data vector, which you can create while dividing your main dataset into train and test sets.
arima_future$mean is the vector of point forecasts obtained in the previous step (forecast objects store them in $mean, not $point_forecast).
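Alternatively, the forecast package's own accuracy() computes the usual test-set measures directly, without the extra dependency:
accuracy(arima_future, <test_data_vector>)  # ME, RMSE, MAE, MAPE, ... on the test set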
