Forecast in R - auto.arima with external regressors

This is a follow-up to the discussion "From Auto.arima to forecast in R" about fitting an ARIMA model with external regressors.
I was able to forecast the next 5 months, given that I had future values for the predictors explaining my response variable (churn_rate).
arima_model_churn_rate <- auto.arima(tsm_churn_rate, stepwise = FALSE,
                                     approximation = FALSE,
                                     xreg = xreg_in_out_p_month_1)
number_of_future_month <- 5
forecast_churn_rate <- forecast(arima_model_churn_rate,
                                xreg = xreg_fut_in_out_p_month_churn_rate,
                                h = number_of_future_month)
plot(forecast_churn_rate)
My question: since I need to predict the future, I cannot wait for the predictors to be measured before making predictions for the coming months.
If I have to wait until the end of the month, I can do a simple calculation to see what the churn rate is.
My goal is to predict the next 3 months. In that case, how should I get future values for my predictors?
I am confused by the whole scenario discussed in that post. For an ARIMA model with external regressors we need future values of the regressors. It worked perfectly in my test case, where I trained the model on 2 years of data and used the next 5 months of predictor measurements as the future values.
But what if I want to predict 3 months, 6 months, or even a year ahead? If I have to wait for the future values, then I am already at that point in time, and the prediction no longer makes any sense.
Can someone please explain this whole concept to me? Sorry if I have not explained the scenario very well; I tried my best.
Thanks in advance!

If you don't have values for your future predictors, then you need to either forecast them first, or use a different model.
You could try a model without those predictors, or you could include lagged values of the predictors where the lag is at least as long as the forecast horizon.
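The two options above (forecast the predictors first, or use predictors lagged by at least the forecast horizon) might look roughly like the sketch below. The data is synthetic and the names x, y, x_lag and y_late are made up for illustration; this is not the asker's churn data, only an outline under those assumptions.

```r
library(forecast)

set.seed(1)
n <- 60  # five years of monthly data (synthetic, illustration only)
x <- 10 + as.numeric(arima.sim(list(ar = 0.7), n = n))     # hypothetical predictor
y <- ts(5 + 0.5 * x + rnorm(n, sd = 0.3), frequency = 12)  # hypothetical response

h <- 3  # forecast horizon in months

# Option 1: forecast the predictor first, then pass its forecasts as xreg.
fit_y <- auto.arima(y, xreg = cbind(x = x))
fc_x  <- forecast(auto.arima(ts(x, frequency = 12)), h = h)
fc_y1 <- forecast(fit_y, xreg = cbind(x = as.numeric(fc_x$mean)))

# Option 2: regress y at time t on x at time t - h, so the regressor values
# needed for the next h months have already been observed.
x_lag   <- x[1:(n - h)]                       # x at time t - h
y_late  <- ts(y[(h + 1):n], frequency = 12)   # y at time t
fit_lag <- auto.arima(y_late, xreg = cbind(x_lag = x_lag))
fc_y2   <- forecast(fit_lag, xreg = cbind(x_lag = x[(n - h + 1):n]))
```

Note that with option 1 the prediction intervals for y do not account for the uncertainty in the forecast of x, so they will tend to be too narrow.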

Related

tsCV() function versus train/test split for time-series in R

The data is monthly and stationary, and I have an ARIMA model fitted with auto.arima. A couple of checks were applied to the data before creating the model, such as the ACF and the ADF test.
y is my monthly time-series object created with the ts() function:
myarima <- auto.arima(y, stepwise = FALSE, approximation = FALSE, trace = TRUE)
Then I use the forecast function:
fc <- forecast(myarima, h = 10)  # named fc to avoid masking forecast()
autoplot(fc)
I did not create train and test sets in this case because my data fluctuates at the end. The test set would have to match the forecast horizon (the last 10 months), so the model would never see those fluctuations during training. It would be great to be enlightened about K-fold cross-validation as a way to avoid this kind of situation.
Without train and test split, after creating the model and visualizing the forecast, I went for tsCV():
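myforecast_arima <- function(x, h) {
  # stepwise and approximation are auto.arima() arguments, not forecast() arguments
  forecast(auto.arima(x, stepwise = FALSE, approximation = FALSE), h = h)
}
error_myarima <- tsCV(y, myforecast_arima, h = 10)
mean(error_myarima^2, na.rm = TRUE)  # MSE pooled over all horizons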
This gives a fairly low MSE, around 0.30.
So my question is: is this a trustworthy way to evaluate ARIMA models before deploying them, or should I use a train/test split instead? What would you prefer in general? Should I use some other method? How can I determine the window parameter of the tsCV() function? If my approach is correct, how can I improve it? And what are the biggest differences between K-fold CV and the tsCV() function?
Thank you!
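As a rough illustration of the window question, the sketch below runs tsCV() twice on a synthetic series: once with an expanding training window (using the initial argument to skip the earliest origins) and once with a fixed-length rolling window (the window argument). The series, the fixed AR(1) model and the values 24 and 36 are arbitrary assumptions chosen to keep the example short, not recommendations. Unlike ordinary K-fold CV, every forecast here is made only from observations that precede it.

```r
library(forecast)

set.seed(42)
y <- ts(arima.sim(list(ar = 0.6), n = 72), frequency = 12)  # synthetic stationary series

# The model is refit at every forecast origin; a fixed AR(1) keeps the run short.
fc_fun <- function(x, h) forecast(Arima(x, order = c(1, 0, 0)), h = h)

h <- 3
e_expanding <- tsCV(y, fc_fun, h = h, initial = 24)  # expanding window, first 24 origins skipped
e_rolling   <- tsCV(y, fc_fun, h = h, window = 36)   # rolling window of the latest 36 observations

# One MSE per forecast horizon (columns are h = 1, 2, 3).
mse_expanding <- colMeans(e_expanding^2, na.rm = TRUE)
mse_rolling   <- colMeans(e_rolling^2, na.rm = TRUE)
```

Looking at the error per horizon, rather than one pooled MSE, shows how quickly accuracy degrades as the horizon grows, and comparing a few window lengths this way is one possible basis for choosing between them.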

Forecast using PLS and auto.arima

I have data from multiple countries in different years and around 65 variables.
I've thought of using PLS to find the predictors that best explain a variable (Purchasing.power.parity) and then use auto.arima to forecast the next year's results.
I'm learning about PLS and PCA so I've tried to understand all the math behind it and this is what I came up with so far:
library(pls)
library(forecast)
modelo_pls = plsr(Purchasing.power.parity~. , ncomp=4, data=imputed, validation="CV", scale=T)
summary(modelo_pls)
plot(RMSEP(modelo_pls))
tsdata<-ts(imputed$Purchasing.power.parity)
modelo<-auto.arima(tsdata,xreg =modelo_pls$scores[,1:4])
summary(modelo)
pronostico<- forecast(modelo,xreg =modelo_pls$Yscores[,1:4])
plot(pronostico)
The issue is that auto.arima returns a (0,0,0) model even though the 4 PLS components explain nearly 98% of the variance.

Can one improve coefficients in recursion using some machine learning algorithm?

I've got this prediction problem for daily data across several years. My data has both yearly and weekly seasonality.
I tried using the following recurrence, which I just came up with myself: x_n = 1/4 * (x_{n-738} + x_{n-364} + x_{n-7} + 1/6 * (x_{n-1} + x_{n-2} + x_{n-3} + x_{n-4} + x_{n-5} + x_{n-6}))
Basically, I am taking into consideration some of the previous days in the week before the day I am trying to predict and also the corresponding day a year and two years earlier. I am doing an average over them.
My question is: can one try to improve the prediction by replacing the coefficients 1/4,1/6 etc with coefficients that would make the mean squared residual smaller?

Personally, I see your problem as a regression.
If you have enough data, I would go for a time-series prediction model.
You said that the data has yearly and weekly seasonality. To cope with that, you can build two models, one with a weekly window and one dealing with the yearly pattern, and then combine them somehow (a linear combination or even another model).
However, if you don't have enough data, you could try passing the above x_i as features to a regression model such as linear regression, an SVM, or a feed-forward neural network. In theory it will find the coefficients that produce a small enough loss (error).
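A minimal sketch of the linear-regression variant, assuming a synthetic daily series in place of the real data: the lagged terms from the recurrence become features, and lm() estimates the weights by least squares. The lags 738, 364 and 7 are copied from the question as written; all variable names are made up for illustration.

```r
set.seed(7)
n   <- 1500
day <- seq_len(n)
# Synthetic daily series with weekly and yearly patterns (illustration only).
x <- 10 + 2 * sin(2 * pi * day / 7) + 3 * sin(2 * pi * day / 364) + rnorm(n, sd = 0.5)

max_lag <- 738
idx <- (max_lag + 1):n            # days for which every lag is available
lagged <- function(k) x[idx - k]  # the series shifted back by k days

features <- data.frame(
  y        = x[idx],
  lag738   = lagged(738),
  lag364   = lagged(364),
  lag7     = lagged(7),
  lastweek = rowMeans(sapply(1:6, lagged))  # mean of the previous six days
)

# Baseline: the fixed-weight recurrence from the question.
baseline     <- with(features, (lag738 + lag364 + lag7 + lastweek) / 4)
mse_baseline <- mean((features$y - baseline)^2)

# Least squares chooses the weights (and an intercept) that minimise the
# in-sample mean squared residual.
fit     <- lm(y ~ lag738 + lag364 + lag7 + lastweek, data = features)
mse_fit <- mean(residuals(fit)^2)
coef(fit)
```

Because the fixed weights of 1/4 are one of the candidates the regression can choose, the fitted in-sample MSE cannot be larger than the baseline's. Whether the improvement holds on new data should be checked on a held-out period.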

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding what the difference is between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far ahead as 10 years. The data is hourly and we have a year's worth of it.
When I used forecast to predict the next 1000 steps ahead, I got the following plot.
[Figure: Using forecast method]
Then I used the simulate function to generate the next 1000 values and got the following plot.
[Figure: Using simulate method]
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values.
simulate(arima1, nsim=1000, future=TRUE, bootstrap=TRUE)
where arima1 is my fitted ARIMA model. Bootstrapped residuals are used because the model residuals are not very normal.
According to the forecast package documentation, future=TRUE means that we are simulating future values conditional on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give much more realistic results, while the values from forecast() just converge to a constant after several steps, with much less fluctuation than the results from simulate()?

A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
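The relationship between sample paths and point forecasts described in this answer can be checked numerically. The sketch below uses a synthetic AR(1) series, not the asker's hourly data, and the object names are made up: it simulates many future paths and compares their average with the point forecast from forecast().

```r
library(forecast)

set.seed(123)
y   <- ts(50 + arima.sim(list(ar = 0.8), n = 200))  # synthetic series
fit <- Arima(y, order = c(1, 0, 0))

h  <- 20
fc <- forecast(fit, h = h)

# Each column is one possible future sample path of length h.
paths <- replicate(1000, as.numeric(simulate(fit, nsim = h, future = TRUE)))

# Averaging over paths approximates the point forecast at each horizon.
path_mean <- rowMeans(paths)
max_gap   <- max(abs(path_mean - as.numeric(fc$mean)))

# A single path keeps wiggling at every horizon, which is why it looks
# more "realistic" than the smooth forecast line.
path_sd <- apply(paths, 1, sd)
```

Each individual path varies about as much as the data, while the average across paths is smooth and settles toward the series mean, which is the behaviour seen in the forecast() plot.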

From Auto.arima to forecast in R

I don't quite understand the syntax of how forecast() applies external regressors in the library(forecast) in R.
My fit looks like this:
fit <- auto.arima(Y,xreg=factors)
where Y is a timeSeries object 100 x 1 and factors is a timeSeries object 100 x 5.
When I go to forecast, I apply...
forecast(fit, h=horizon)
And I get an error:
Error in forecast.Arima(fit, h = horizon) : No regressors provided
Does it want me to add back the xregressors from the fit? I thought these were included in the fit object as fit$xreg. Does that mean it's asking for future values of the xregressors, or that I should repeat the same values I used in the fit set? The documentation doesn't cover the meaning of xreg in the forecast step.
I believe all this means I should use
forecast(fit, h=horizon,xreg=factors)
or
forecast(fit, h=horizon,xreg=fit$xreg)
Which gives the same results. But I'm not sure whether the forecast step is interpreting the factors as future values, or appropriately as previous ones. So,
Is this doing a forecast out of purely past values, as I expect?
Why do I have to specify the xreg values twice? It doesn't run if I exclude them, so it doesn't behave like an option.
Correct me if I am wrong, but I think you may not completely understand how the ARIMA model with regressors works.
When you forecast with a simple ARIMA model (without regressors), it simply uses past values of your time series to predict future values. In such a model, you could simply specify your horizon, and it would give you a forecast until that horizon.
When you use regressors to build an ARIMA model, you need to include future values of the regressors to forecast. For example, if you used temperature as a regressor, and you were predicting disease incidence, then you would need future values of temperature to predict disease incidence.
In fact, the documentation does talk about xreg specifically. Look up ?forecast.Arima and read the descriptions of both the h and xreg arguments. You will see that if xreg is used, then h is ignored. Why? Because a model fitted with xreg needs future regressor values for forecasting, and the number of rows supplied determines the horizon.
So, in your code, h was simply ignored when you included xreg. Because you passed the same values used to fit the model, it returned predictions for that same set of regressors as if they were future values.
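To make this concrete, here is a small sketch with synthetic data. The names factors and Y mirror the question, but the values, the column names f1 and f2, and future_factors are made up. It fits a model with regressors, then forecasts once with genuinely new regressor rows and once with the training regressors passed back in.

```r
library(forecast)

set.seed(2024)
n <- 100
factors <- matrix(rnorm(n * 2), ncol = 2,
                  dimnames = list(NULL, c("f1", "f2")))  # hypothetical regressors
Y <- ts(1 + 0.8 * factors[, "f1"] - 0.5 * factors[, "f2"] +
          as.numeric(arima.sim(list(ar = 0.5), n = n)))

fit <- auto.arima(Y, xreg = factors)

# Future regressor values for the next 6 periods (random here; in practice
# they must be known, planned, or forecast separately).
future_factors <- matrix(rnorm(6 * 2), ncol = 2,
                         dimnames = list(NULL, c("f1", "f2")))

fc <- forecast(fit, xreg = future_factors)
n_ahead <- length(fc$mean)  # one forecast per row of future_factors

# Passing the training regressors back in treats those rows as the next
# n future periods, which is rarely what is intended.
fc_reused <- forecast(fit, xreg = factors)
n_reused  <- length(fc_reused$mean)
```

The horizon follows the number of rows in the xreg passed to forecast(), so the regressors appear twice in the workflow for different reasons: once as history for fitting and once as future values for forecasting.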

Related:
https://stats.stackexchange.com/questions/110589/arima-with-xreg-rebuilding-the-fitted-values-by-hand
I read that arima in R is borked; see Issues 3 and 4 at
https://www.stat.pitt.edu/stoffer/tsa4/Rissues.htm
Using xreg was suggested there as a way to derive the proper intercept.
I'm using Real Statistics for Excel to figure out what the actual constant is. A professor told me that you need to have a constant.
The two fits below produce the same forecasts. So it appears you can use xreg to get some descriptive information, but you would have to follow the stats.stackexchange link above to derive the values from them by hand.
f = auto.arima(lacondos[,1])
f$coef
g = Arima(lacondos[,1],c(t(matrix(arimaorder(f)))),include.constant=FALSE,xreg=1:length(lacondos[,1]))
g$coef
