Forecasting using ARIMAX in R - r

I used to forecast weekly computer sales in SAS, based on broadly two groups of predictors: pricing and marketing spend (at the vehicle level, hence several variables). This was easy in SAS because I could use PROC ARIMA.
Could you help me transition to R? I have imported the dataset, run auto.arima, and examined the p-values for some variables. However, I don't know how to proceed with forecasting the next 26 weeks. Any help would be greatly appreciated!

R has a built-in ARIMAX procedure called arima. To get the X part, use the xreg= argument. If you don't have exogenous variables and don't use xreg=, note that the "intercept" result may not indicate what you think it indicates.
So if you're using an ARIMAX(1, 2, 3)(1, 0, 0) model with dependent variable sales (monthly data) and an exogenous variable nasdaq (and you have a prediction for nasdaq called nasdaq.pred), you'd do:
model <- arima(sales, order = c(1, 2, 3),
               seasonal = list(order = c(1, 0, 0), period = 12),
               xreg = nasdaq)
# n.ahead must match the number of rows in newxreg
pred <- predict(model, n.ahead = length(nasdaq.pred), newxreg = nasdaq.pred)
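For the 26-week case in the question, a minimal self-contained sketch with base R's arima() and predict() might look like the following. The data are simulated, and the names price, marketing and the planned future values are assumptions made for illustration, not taken from the original dataset.

```r
# Simulated weekly data; all names and numbers here are illustrative assumptions.
set.seed(1)
n <- 104
price <- 500 + rnorm(n, sd = 20)
marketing <- 50 + rnorm(n, sd = 5)
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
sales <- 1000 - 1.2 * price + 3 * marketing + noise

# Regression on price and marketing with AR(1) errors
xreg <- cbind(price = price, marketing = marketing)
fit <- arima(sales, order = c(1, 0, 0), xreg = xreg)

# Planned regressor values for the next 26 weeks (assumed to be known,
# for example from a pricing and media plan)
h <- 26
newxreg <- cbind(price = rep(490, h), marketing = rep(55, h))

# n.ahead and the number of rows in newxreg must agree
pred <- predict(fit, n.ahead = h, newxreg = newxreg)
pred$pred  # point forecasts for weeks 105 to 130
pred$se    # forecast standard errors
```

The column order of newxreg has to match the xreg used when fitting, since predict() multiplies the columns by the estimated coefficients by position.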

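Suppose your fitted ARIMA model is stored in Testing. Then the forecast for the next 26 weeks is:
# In recent versions of the forecast package, forecast.Arima() is not exported; call the generic forecast() instead
Forecastedvalue <- forecast(Testing, h = 26)
Hope this helps.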

Related

R libraries forecast::auto.arima vs fable::ARIMA: what are the differences?

The online documentation indicates that both functions use the same underlying algorithm to estimate (seasonal) ARIMA models. In some tests with a Kaggle dataset, however, I got different models: ARIMA() selected a seasonal ARIMA, while auto.arima() selected only a non-seasonal ARIMA.
auto.arima(tsbble_item1_store1$sales)
gives
Best model: ARIMA(5,1,2)
and
tsbble_item1_store1 %>%
  model(arima = ARIMA(sales))
gives
# A mable: 1 x 2
# Key:     store [1]
  store arima
  <dbl> <model>
1     1 <ARIMA(1,1,3)(0,0,2)[7]>
The two models are very different. fable's ARIMA() gives the better model here, because it accounts for seasonality, which auto.arima() does not, and the data show evident seasonality.
Does anyone know the main differences in the defaults the two functions use when estimating the model? I couldn't work it out from the docs.
Sorry for any mistakes.
Thanks in advance.
Have a nice day,
MC
forecast::auto.arima() requires a ts object. That is a vector with some time series attributes including the seasonal frequency. When you just pass a numeric vector, as you have here, it assumes the seasonal frequency is 1 (as for annual data) and will not fit a seasonal ARIMA model.
On the other hand, the tsibble object contains a time index (in this case it looks like it is a date variable) and ARIMA() will use that index to determine what type of seasonality (if any) is present. With a date variable, it will select seasonal frequency of 7 to correspond to a time of week seasonality.
To get the same thing with forecast::auto.arima(), use
auto.arima(ts(tsbble_item1_store1$sales, frequency=7))
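The point about seasonal frequency can be checked with base R alone. The short sketch below uses a simulated vector as a stand-in for the sales column (an assumption, since the Kaggle data are not shown here):

```r
set.seed(42)
x <- rnorm(70)                     # plain numeric vector, no time series attributes
frequency(x)                       # 1: no seasonal period is attached

x_weekly <- ts(x, frequency = 7)   # same values, declared as having period 7
frequency(x_weekly)                # 7
```

A function that reads the seasonal period from the object, as auto.arima() does, has nothing to go on in the first case, which is why wrapping the vector in ts(..., frequency = 7) changes the models it considers.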

How do I decide between different forecasting model families to automate forecasting for 150 time series?

I have weekly time series data for multiple departments (retail domain) and based on some research, I am automating the process of finding model parameters for each time series. So far, I have implemented the following models for each time series in a for loop:
1) ARIMA (auto.arima in R)
2) stlf (cannot use R's ets function since I have weekly data)
3) TBATS
4) Regression on ARIMA errors (using fourier terms)
5) Baseline models: naive & mean
I want to understand how to choose models for each time series. I have multiple approaches to this:
1) Choose model with lowest RMSE on test data (risk: overfitting on test data)
2) Choose model with lowest RMSE based on cross-validation of time series (tsCV)
3) Choose one family of models for all the time series based on which family gives lowest average RMSE score across all the time series.
Are there any ways I can improve my approach? Any disadvantages to any of the above approaches? Any better approach?
Thanks a lot!
Forecast your data with all of the methods mentioned above, then calculate the MAPE for each, check which model gives the best results, and use that model to forecast your data.
Also try different transformations of your input data, such as log or inverse.
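Option 2 from the question (time series cross-validation) can be sketched for a single series as follows. This assumes the forecast package is installed; the series is simulated, and the three candidates are stand-ins chosen for speed rather than the full list of model families in the question.

```r
library(forecast)

set.seed(2024)
# Simulated series standing in for one department (an assumption)
y <- ts(100 + as.numeric(arima.sim(list(ar = 0.7), n = 120)))

# Candidate forecast functions: each takes a series and a horizon h
candidates <- list(
  naive = function(x, h) naive(x, h = h),
  mean  = function(x, h) meanf(x, h = h),
  ar1   = function(x, h) forecast(Arima(x, order = c(1, 0, 0)), h = h)
)

# One-step-ahead rolling-origin errors, skipping the first 30 origins
cv_rmse <- sapply(candidates, function(f) {
  e <- tsCV(y, f, h = 1, initial = 30)
  sqrt(mean(e^2, na.rm = TRUE))
})

cv_rmse
names(which.min(cv_rmse))  # candidate with the lowest cross-validated RMSE
```

Running the same loop over all 150 series gives a per-series winner (option 2) and, by averaging cv_rmse across series, a per-family comparison (option 3). A scaled measure may be preferable to RMSE when averaging across series of very different magnitude.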

Arimax forecasting

I need to forecast daily sales using an ARIMA model with the days of the week as independent variables.
So I built the model as follows:
d = data,
Total = sales; Monday, Tuesday, ..., Sunday are my independent variables.
I am using library(forecast):
fit <- arima(d$Total, xreg = cbind(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday), order = c(1, 1, 1))
Please help me proceed further and predict future values.
How do I decide p, d and q, and how do I plot forecasted vs. actual values? Please help.
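One possible way to approach this with the forecast package is sketched below. The data are simulated, the dummy-variable construction via model.matrix() is an assumption about how the weekday columns were built, and auto.arima() is used as one common way to choose p, d and q rather than the only one.

```r
library(forecast)

set.seed(123)
# Simulated daily sales with a day-of-week effect (illustrative only)
dates <- seq(as.Date("2023-01-02"), by = "day", length.out = 140)
dow <- factor(format(dates, "%u"), levels = as.character(1:7))  # 1 = Monday
effect <- c(0, 5, 3, 8, 12, 30, 25)
noise <- as.numeric(arima.sim(list(ar = 0.5), n = 140, sd = 4))
total <- 100 + effect[as.integer(dow)] + noise

# Six dummy columns; one day is left out as the baseline
X <- model.matrix(~ dow)[, -1]

train <- 1:126
test <- 127:140

# By default auto.arima() picks d with unit-root tests and p, q by minimising AICc
fit <- auto.arima(total[train], xreg = X[train, ])
arimaorder(fit)

# Future dummies are known in advance because the calendar is known
fc <- forecast(fit, xreg = X[test, ])

# Forecast vs. actual on the hold-out period
plot(fc)
lines(test, total[test], col = "red")
rmse <- sqrt(mean((total[test] - as.numeric(fc$mean))^2))
rmse
```

Holding out the last two weeks gives actual values to draw over the forecast; for a real forecast, the model would be refitted on all the data and the dummies built for the upcoming dates.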

Forecast in R - auto.arima with external regressors

This is a follow-up to the discussion on fitting an ARIMA model with external regressors:
From Auto.arima to forecast in R
I was able to forecast the next 5 months without any problem, given that I had future values for the predictors explaining my response variable (churn_rate).
arima_model_churn_rate <- auto.arima(tsm_churn_rate, stepwise = FALSE,
                                     approximation = FALSE,
                                     xreg = xreg_in_out_p_month_1)
number_of_future_month <- 5
forecast_churn_rate <- forecast(arima_model_churn_rate,
                                xreg = xreg_fut_in_out_p_month_churn_rate,
                                h = number_of_future_month)
plot(forecast_churn_rate)
My question: since I need to predict the future, I cannot wait for the predictors to be measured before making predictions for future months, can I?
If I have to wait until the end of the month, I can simply calculate the churn rate directly.
My goal is to predict the next 3 months. In that case, how should I get future values for my predictors?
I am confused by the whole scenario as discussed there. For an ARIMA model with external regressors we need future values of the regressors. It worked perfectly in my example, where I trained the model on 2 years of data and used the next 5 months of predictor measurements as the future values.
But what if I want to predict 3 months, 6 months, or even a year ahead? If I have to wait for the future values, I am already at that point in time, and the prediction no longer makes sense.
Can someone please explain this concept to me? Sorry if I have not explained the scenario well; I tried my best.
Thanks in advance!
If you don't have values for your future predictors, then you need to either forecast them first, or use a different model.
You could try a model without those predictors, or you could include lagged values of the predictors where the lag is at least as long as the forecast horizon.
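The lagged-predictor idea can be sketched with base R as follows. Everything here is simulated and hypothetical: x stands for one predictor, y for the response, and the horizon is 3 periods. Because the regressor is x lagged by 3, the values needed for the next 3 periods are simply the last 3 observed values of x.

```r
set.seed(7)
n <- 60
h <- 3  # forecast horizon

# Hypothetical predictor and a response that depends on it with a lag of h
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))
x_lag <- c(rep(NA, h), x[1:(n - h)])   # x at time t - h
y <- 2 + 0.8 * x_lag + rnorm(n, sd = 0.3)

# Drop the first h rows, where the lagged predictor is not available
keep <- (h + 1):n
y_train <- y[keep]
xreg_train <- cbind(x_lag = x_lag[keep])
fit <- arima(y_train, order = c(1, 0, 0), xreg = xreg_train)

# For times n + 1, ..., n + h the lagged predictor is x at n - h + 1, ..., n,
# all of which have already been observed
xreg_future <- cbind(x_lag = tail(x, h))
pred <- predict(fit, n.ahead = h, newxreg = xreg_future)
pred$pred
```

The price of this approach is that any effect of the predictor at lags shorter than the horizon is not used by the model.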

From Auto.arima to forecast in R

I don't quite understand the syntax of how forecast() applies external regressors in the library(forecast) in R.
My fit looks like this:
fit <- auto.arima(Y,xreg=factors)
where Y is a timeSeries object 100 x 1 and factors is a timeSeries object 100 x 5.
When I go to forecast, I apply...
forecast(fit, h=horizon)
And I get an error:
Error in forecast.Arima(fit, h = horizon) : No regressors provided
Does it want me to add back the xregressors from the fit? I thought these were included in the fit object as fit$xreg. Does that mean it's asking for future values of the xregressors, or that I should repeat the same values I used in the fit set? The documentation doesn't cover the meaning of xreg in the forecast step.
I believe all this means I should use
forecast(fit, h=horizon,xreg=factors)
or
forecast(fit, h=horizon,xreg=fit$xreg)
Which gives the same results. But I'm not sure whether the forecast step is interpreting the factors as future values, or appropriately as previous ones. So,
Is this doing a forecast out of purely past values, as I expect?
Why do I have to specify the xreg values twice? It doesn't run if I exclude them, so it doesn't behave like an option.
Correct me if I am wrong, but I think you may not completely understand how the ARIMA model with regressors works.
When you forecast with a simple ARIMA model (without regressors), it simply uses past values of your time series to predict future values. In such a model, you could simply specify your horizon, and it would give you a forecast until that horizon.
When you use regressors to build an ARIMA model, you need to include future values of the regressors to forecast. For example, if you used temperature as a regressor, and you were predicting disease incidence, then you would need future values of temperature to predict disease incidence.
In fact, the documentation does discuss xreg specifically. Look up ?forecast.Arima and read the descriptions of the arguments h and xreg. You will see that if xreg is used, then h is ignored. Why? Because a model fitted with xreg needs future regressor values to forecast, so the number of rows in xreg determines the horizon.
So, in your code, h was simply ignored when you included xreg. Since you just used the values that you used to fit the model, it just gave you all the predictions for the same set of regressors as if they were in the future.
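A small sketch of this behaviour, assuming the forecast package is installed. The data and the names future_factors, f1 and f2 are made up for illustration; the key point is that the matrix passed to forecast() holds future regressor values, and its row count sets the horizon.

```r
library(forecast)

set.seed(99)
n <- 100
# Two simulated regressors and a response that depends on them
factors <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("f1", "f2")))
signal <- drop(factors %*% c(0.5, -0.3))
Y <- ts(1 + signal + as.numeric(arima.sim(list(ar = 0.4), n = n)))

fit <- auto.arima(Y, xreg = factors)

# Hypothetical future values of the regressors for the next 10 periods
future_factors <- matrix(rnorm(10 * 2), ncol = 2,
                         dimnames = list(NULL, c("f1", "f2")))

fc <- forecast(fit, xreg = future_factors)
length(fc$mean)  # 10, one forecast per row of future_factors
```

Passing the training matrix again, as in forecast(fit, xreg = factors), would return 100 forecasts that treat the historical regressor values as if they were the next 100 periods.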
Related:
https://stats.stackexchange.com/questions/110589/arima-with-xreg-rebuilding-the-fitted-values-by-hand
I have read that arima in R has some problems. See Issues 3 and 4 at
https://www.stat.pitt.edu/stoffer/tsa4/Rissues.htm
There, xreg is suggested as a way to obtain the proper intercept.
I'm using Real Statistics for Excel to work out what the actual constant is. A professor told me that you need to have a constant.
The two models below produce the same forecasts. So it appears you can use xreg to get some descriptive information, but you would have to follow the Stack Exchange link above to derive the values by hand.
f <- auto.arima(lacondos[, 1])
f$coef
# Refit with the same orders, using a linear time trend as xreg instead of the built-in constant
g <- Arima(lacondos[, 1], order = c(t(matrix(arimaorder(f)))), include.constant = FALSE, xreg = 1:length(lacondos[, 1]))
g$coef
