I estimate an ARIMA model on a training dataset using the auto.arima function in R. Afterwards I use the forecast function to make, say, 50 predictions and calculate accuracy measures such as RMSE and MAE.
The forecast function uses only the observations in the training set, and makes the prediction at each time t using the values predicted at time t-1. What I am trying to do is make one prediction at a time, adding the observed value at each time t to the training set, without re-estimating the ARIMA model. So instead of using the values predicted at time t-1, I would use the real values. If the ARIMA model has been estimated on a training dataset of 100 observations, the first forecast would be based on a training set of length 100, the second on a training set of length 101, the third on a training set of length 102, and so on.
The auto.arima output contains the element "x", which is the training set used to estimate the model, and the element "fitted", which contains the fitted values. It also has the element "nobs", which is the length of "x". I tried to replace auto.arima$x with a new training set whose last observations are the true values, added one at a time, and to update "nobs" to the length of the new "x". But I noticed that the one-step-ahead forecast always reflects the old training set. For instance, I added one observed value at a time and made the one-step-ahead prediction 50 times, but all the predictions are equal to the first one, as if the forecast function ignores the fact that I replaced the "x" series inside the auto.arima output. Replacing the "fitted" values gives the same result.
Does someone know exactly how the forecast function determines the training set on which the predictions are based? What should I modify inside the auto.arima output at each time t to get one-step-ahead predictions based on the real values at the previous times instead of the predicted ones? Or is there a way to tell the forecast function to use a different training dataset?
I don't want to refit the ARIMA model on the new training dataset (using the Arima function) and re-estimate the residual variance; it takes literally forever...
Any suggestion would be helpful
Thank you in advance
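For what it's worth, the forecast package's Arima() function has a model argument that re-applies an already-estimated model to a new series without re-estimating any coefficients; the fitted values of the re-applied model are then one-step-ahead forecasts conditioned on the actual past observations. A minimal sketch along those lines (the simulated series y and the split at 100 are made up for illustration):

```r
library(forecast)

# Hypothetical data: 150 observations, model estimated on the first 100.
set.seed(42)
y     <- ts(arima.sim(list(ar = 0.6), n = 150))
train <- window(y, end = 100)
fit   <- auto.arima(train)

# Re-apply the SAME model to the full series: model=fit reuses the
# estimated coefficients, so nothing is re-estimated.
refit <- Arima(y, model = fit)

# fitted() now holds one-step-ahead forecasts, each conditioned on the
# actual observed values up to the previous time point.
onestep <- window(fitted(refit), start = 101)
accuracy(onestep, window(y, start = 101))
```

This gives the rolling one-step evaluation described above in a single call, without looping or touching the internals of the auto.arima object.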
I am trying to calculate the Gini coefficient for my regression models. Since there is no ready-made Gini index for regression models, I am getting all the scores and calculating it with Gini functions in R, using this code:
preds <- h2o.predict(model, test)
pred_vs_actual <- as.data.frame(h2o.cbind(test$target, preds))
Does this code return correctly paired values for actuals and predictions? I know that there is no row order in a Spark table, but I am not sure whether this is also the case for an H2O object.
Yes, what you have (pred_vs_actual) will cbind your model's predictions with the corresponding rows (records). As a quick check, look at the first few rows of pred_vs_actual and verify that the cbind does what you expect.
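A self-contained version of that check might look like the following sketch (the data and GLM model are made up stand-ins for the question's `test` frame and `model`; it assumes an H2O cluster can be started locally):

```r
library(h2o)
h2o.init()

# Toy frame standing in for the question's `test` data.
df <- as.h2o(data.frame(x = rnorm(100), target = rnorm(100)))
model <- h2o.glm(x = "x", y = "target", training_frame = df)

preds <- h2o.predict(model, df)

# h2o.cbind is positional: row i of the result pairs actual value i
# with prediction i, so the pairing is preserved.
pred_vs_actual <- as.data.frame(h2o.cbind(df$target, preds))
head(pred_vs_actual)   # eyeball that actuals and predictions line up
```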
I am doing a TS Analysis. What is the difference between these two accuracies:
fit<-auto.arima(tsdata)
fcast<-forecast(fit,6)
accuracy(fcast) #### First Accuracy
fit<-auto.arima(tsdata)
fcast<-forecast(fit,6)
accuracy(fcast,actual values) #### Second Accuracy
How does the accuracy function work when I don't specify the actual values in the accuracy function, as in the first case?
Secondly, what is the right approach to calculating accuracy?
In this answer I'm assuming you are using the function from the forecast package.
The answer lies within accuracy's description:
Returns range of summary measures of the forecast accuracy. If x is provided, the function measures out-of-sample (test set) forecast accuracy based on x-f. If x is not provided, the function only produces in-sample (training set) accuracy measures of the forecasts based on f["x"]-fitted(f). All measures are defined and discussed in Hyndman and Koehler (2006).
In your case x is the second argument of the function. So, in short, accuracy(fcst) provides an estimate of the prediction error based on the training set.
For example, let's assume you have 12 months of data and are predicting 6 months ahead. If you use accuracy(fcst), you get the error of the model over those 12 months (only).
Now let's assume x is the real demand for the 6 months you are forecasting, and that you didn't use this data to build the Arima model. In this case accuracy(fcst, x) gives you the test-set error, which is a better measure of what you will get in the future using this model (compared to the training-set error).
The best practice is to use the test-set error, because this measure is less prone to bias (you will most likely get "better" prediction results on the training set than on a held-out test set, but those results are a form of overfitting). If you have a test set, you should use it as the second argument.
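The two calls can be sketched end to end on made-up data (12 months of "history" plus 6 held-out months, purely for illustration):

```r
library(forecast)

# Toy monthly series: 18 observations, last 6 held out as a test set.
set.seed(1)
y     <- ts(50 + rnorm(18, sd = 5), frequency = 12, start = c(2020, 1))
train <- window(y, end = c(2020, 12))
test  <- window(y, start = c(2021, 1))

fit   <- auto.arima(train)
fcast <- forecast(fit, h = 6)

accuracy(fcast)        # training-set (in-sample) measures only
accuracy(fcast, test)  # adds test-set (out-of-sample) measures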
I am working on building a time series model.
However, I am having trouble understanding what the difference is between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far as 10 years ahead. The data is hourly and we have a year's worth of data.
When using forecast to predict the next 1000 steps ahead, I got the following plot.
Using forecast method
Then I used the simulate function to simulate the next 1000 simulated values and got the following plot.
Using simulate method
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values.
simulate(arima1, nsim=1000, future=TRUE, bootstrap=TRUE)
where arima1 is my trained ARIMA model; bootstrapped residuals are used because the model residuals are not very normal.
Per the definition in the forecast package, future=TRUE means that we are simulating future values conditional on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give me much more realistic results, while the forecasts from forecast() just converge to a constant after several steps (with far less fluctuation than the results from simulate())?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
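The contrast is easy to see on a toy AR(1) series (everything below is made up for illustration; the forecast package's simulate() method accepts nsim and future as used in the question):

```r
library(forecast)

set.seed(1)
y   <- ts(arima.sim(list(ar = 0.7), n = 200))
fit <- Arima(y, order = c(1, 0, 0))

# Point forecasts: the mean over all future sample paths, so they
# decay smoothly toward the series mean.
fc <- forecast(fit, h = 50)

# Individual sample paths: roughly as variable as the data itself.
paths <- replicate(5, simulate(fit, nsim = 50, future = TRUE))

plot(fc)
matlines(seq(length(y) + 1, length.out = 50), paths, lty = 2)
```

The dashed simulated paths wander around inside the forecast intervals, while the solid point-forecast line converges, which is exactly the behavior the question describes.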
I am using support vector regression in R to forecast future values of a univariate time series. After splitting the historical data into train and test sets, I fit a model with the svm function in R on the train data and then use the predict() command on the test data to predict values for the test set. We can then compute prediction errors. I wonder what happens next: we have a model, and by checking it on the test data we see the model performs well. How can I use this model to predict future values beyond the historical data? Generally speaking, we use the predict function in R and give it a forecast horizon (h=12) to predict 12 future values. From what I saw, the predict() command for SVM has no such argument and needs a new dataset. How should I build a dataset for predicting future values that are not in our historical dataset?
Thanks
Just a stab in the dark... SVM is most commonly used for supervised classification rather than forecasting. I am guessing you are trying to predict stock values, no? How about classifying your existing data, using a window size of your choice, say 100 values at a time, into noise (N), up (U), big up (UU), down (D), and big down (DD)? That way, as new data comes in, you slide your classification window and have it tell you whether the upcoming trend is N, U, UU, D, or DD.
What you can do is build a data frame whose columns are the actual stock price and its n lagged values, and use it as the train/test set (the actual value is the output and the previous values are the explanatory variables). With this method you can forecast one day (or whatever the granularity is) into the future, then feed that prediction back in to make the next one, and so on.
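A minimal sketch of that lagged-features-plus-recursion idea, assuming the e1071 package for svm() (the series, lag count, and horizon are all arbitrary choices for illustration):

```r
library(e1071)

# Hypothetical univariate series.
set.seed(7)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 200))
n.lags <- 3

# Data frame of the value and its lags. embed() puts y_t in column 1,
# y_{t-1} in column 2, and so on.
dat <- as.data.frame(embed(y, n.lags + 1))
names(dat) <- c("y", paste0("lag", 1:n.lags))

fit <- svm(y ~ ., data = dat)

# Recursive multi-step forecast: feed each prediction back in as a lag.
h <- 12
last  <- tail(y, n.lags)        # most recent values, oldest first
preds <- numeric(h)
for (i in 1:h) {
  newx <- as.data.frame(t(rev(tail(last, n.lags))))
  names(newx) <- paste0("lag", 1:n.lags)
  preds[i] <- predict(fit, newx)
  last <- c(last, preds[i])
}
preds
```

The usual caveat applies: errors compound as each predicted value feeds the next step, so recursive forecasts degrade with the horizon.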
I'm fitting an Arima model with an external regressor. Let's suppose I have n observations. The predict.Arima function from the forecast package only makes predictions from observation n+1 onwards.
I need to make a prediction for observation n (the last value of the series) with a changed value of the external regressor, i.e., I need to predict the value of observation n given a specific value for the external regressor.
library(forecast)
set.seed(123)
aux <- 1:24
covari <- aux + rnorm(24,0,2)
vari <- ts(aux * runif(24,0,3), start=c(2010,1), freq=12)
mod <- auto.arima(vari, xreg=covari)
predict(mod, newxreg=20)
This code generates a model and shows how to generate a prediction. I can control the number of periods ahead by setting the parameter n.ahead.
predict(mod, newxreg=runif(4,15,25), n.ahead=4)
This code will generate predictions for the next 4 values of the series.
What I need is effectively n.ahead=-1, i.e., a prediction for a value inside the series, but with a different external regressor.
If I'm using just one external regressor the task is not complicated: since it is an additive model, I can just add the difference between the observed xreg value and the value I want, multiplied by the xreg coefficient. However, it gets more complicated as the number of external regressors increases.
Is there any way to predict values that are not beyond the end of the series of an Arima model?
What do you mean by "predict"? With time series, that is an estimate of a future value conditional on the observed past values. So a "prediction" of an observed value is simply the observed value.
But perhaps you mean fitted value. That is, the one-step forecast of an observation conditional on all previous observations. In that case, you can get what you want using fitted(mod).
By the way, predict.Arima() is not part of the forecast package. The forecast package provides the forecast.Arima() function as a replacement. For example:
forecast(mod, xreg=20)
forecast(mod, xreg=runif(4,15,25), h=4)
Update:
As explained in the comment, the OP wants a "prediction" of a past observation assuming a different value of the regressor had been observed. There are several ways of interpreting that.
First, where the coefficients are updated to reflect the new information, and only past data are used. In that case, just re-fit the model and get the fitted values.
Second, where the coefficients are not updated, and only past data are used. There is no function for that, and I'm not sure why anyone would need to do it. But it can be done as follows:
fitted(mod) + mod$coef["covari"] * (newx - oldx)
Third, where the coefficients are not updated, and all data are used. Then we get
observed + mod$coef["covari"] * (newx - oldx)
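Pulling the pieces together on the question's own simulated data, the third case might be sketched as below. One assumption to flag: when xreg is passed as an unnamed vector, auto.arima typically stores the regression coefficient under the name "xreg" rather than "covari", so it is worth checking names(coef(mod)) first.

```r
library(forecast)

set.seed(123)
aux    <- 1:24
covari <- aux + rnorm(24, 0, 2)
vari   <- ts(aux * runif(24, 0, 3), start = c(2010, 1), freq = 12)
mod    <- auto.arima(vari, xreg = covari)

# Coefficient name may be "xreg" for an unnamed regressor vector;
# inspect names(coef(mod)) to confirm before indexing.
beta <- coef(mod)["xreg"]

oldx <- covari[24]   # observed regressor at the last time point
newx <- 20           # counterfactual regressor value

# Coefficients not updated, all data used:
vari[24] + beta * (newx - oldx)
```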