How to fit Holt-Winters predictions to the original data? - r

The original time series isn't stationary: it has trend and seasonality.
I prepared my dataset by removing autocorrelation, trend and seasonality to obtain a stationary series, and then I made predictions with the Holt-Winters method.
How can I apply the trend and seasonality back to the HW predictions?
To clarify: the last demand values are around 3000 (with trend and seasonality), while HW gives me predictions around 0. How can I get back to the original scale to understand the "real" demand prediction?
I'm using RStudio; is there a specific function you would suggest?
Thank you in advance
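A minimal sketch of one way to do this, assuming the trend and seasonality were removed with an stl() decomposition (if you removed them differently, invert that transformation instead). AirPassengers stands in for the original demand series, and extending the trend with a drift forecast is just one simple assumption:

library(forecast)

# stand-in for the original demand series; replace with your own data
y   <- AirPassengers
dec <- stl(y, s.window = "periodic")

remainder <- dec$time.series[, "remainder"]   # stationary part, fluctuates around 0
trend     <- dec$time.series[, "trend"]
seasonal  <- dec$time.series[, "seasonal"]

# exponential smoothing on the stationary remainder (forecasts will be near 0)
fit <- HoltWinters(remainder, beta = FALSE, gamma = FALSE)
fc  <- predict(fit, n.ahead = 12)

# reverse the decomposition: extend the trend (here with a drift forecast, an
# assumption) and reuse the last observed seasonal cycle
trend_fc      <- rwf(trend, h = 12, drift = TRUE)$mean
real_scale_fc <- fc + trend_fc + as.numeric(tail(seasonal, 12))

plot(y, xlim = c(start(y)[1], end(y)[1] + 2))
lines(real_scale_fc, col = "red")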

Related

Random Walk to a Time Series in R

I am trying to answer the following question:
"The time series given below gives the price of a dozen eggs in cents, adjusted for inflation. Fit a random walk to the time series eggs.ts. What is the SS1PE?"
library(tsdl)
eggs.ts <- tsdl[[440]]
I am stuck on how to fit a random walk to the time series. I know how to generate a general random walk using rnorm, but not how to fit one to a specific time series. There is a second part to this question below.
Modify your fit above to fit a random walk with a Holt-Winters slope term as well to the eggs time series. Compute the SS1PE. Is this a better fit than the random walk without the slope term? Use the model to predict egg prices for the next 28 years (the time series ends in 1993, so this will give us a prediction for 2021). What does the model predict will be the price of eggs in 2021? What does this tell us about the accuracy of the model?
I know this means fitting a HoltWinters model with alpha = 1 and beta = gamma = FALSE. However, I do not understand what is meant by the slope term.
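A minimal sketch of one possible reading, assuming SS1PE means the sum of squared one-step prediction errors (the SSE component of the HoltWinters fit), that the "slope term" means letting HoltWinters estimate the trend (beta) component, and that the series is annual so n.ahead = 28 reaches 2021:

library(tsdl)

eggs.ts <- tsdl[[440]]

# random walk: level only, alpha fixed at 1, no trend or seasonal component
rw_fit <- HoltWinters(eggs.ts, alpha = 1, beta = FALSE, gamma = FALSE)
rw_fit$SSE          # sum of squared one-step prediction errors (SS1PE)

# random walk with a Holt-Winters slope (trend) term: beta is estimated
rw_slope_fit <- HoltWinters(eggs.ts, alpha = 1, gamma = FALSE)
rw_slope_fit$SSE

# forecast the next 28 years (assuming an annual series ending in 1993)
predict(rw_slope_fit, n.ahead = 28)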

How do I decide between different forecasting model families to automate forecasting for 150 time series?

I have weekly time series data for multiple departments (retail domain) and based on some research, I am automating the process of finding model parameters for each time series. So far, I have implemented the following models for each time series in a for loop:
1) ARIMA (auto.arima in R)
2) stlf (cannot use R's ets function since I have weekly data)
3) TBATS
4) Regression on ARIMA errors (using fourier terms)
5) Baseline models: naive & mean
I want to understand how to choose models for each time series. I have multiple approaches to this:
1) Choose model with lowest RMSE on test data (risk: overfitting on test data)
2) Choose model with lowest RMSE based on cross-validation of the time series (tsCV; see the sketch below)
3) Choose one family of models for all the time series based on which family gives lowest average RMSE score across all the time series.
Are there any ways I can improve my approach? Any disadvantages to any of the above approaches? Any better approach?
Thanks a lot!
Forecast your data with all of the forecasting methods mentioned above, then calculate the MAPE for each and check which model gives the best results; use that model to forecast your data.
Also try different data transformations, such as log or inverse, on your input data.
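A minimal sketch of comparing candidate models per series with time-series cross-validation (approach 2), assuming a recent version of the forecast package. A simulated weekly series stands in for one department's data, and cross-validated RMSE is used as the error measure (MAPE could be computed from the same errors and actuals):

library(forecast)

# stand-in for one department's weekly series; replace with your own data
set.seed(1)
y <- ts(100 + cumsum(rnorm(156)), frequency = 52)

# one-step-ahead cross-validation errors for a few candidate models;
# initial = 104 skips folds with fewer than two full years of data
# (auto.arima is refit on every fold, so this can be slow)
e_arima <- tsCV(y, function(x, h) forecast(auto.arima(x), h = h), h = 1, initial = 104)
e_stlf  <- tsCV(y, function(x, h) stlf(x, h = h),                 h = 1, initial = 104)
e_naive <- tsCV(y, function(x, h) naive(x, h = h),                h = 1, initial = 104)

# cross-validated RMSE per model; pick the lowest for this series
sapply(list(arima = e_arima, stlf = e_stlf, naive = e_naive),
       function(e) sqrt(mean(e^2, na.rm = TRUE)))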

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding what the difference is between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far as 10 years ahead. The data is hourly and we have a year's worth of it.
When using forecast to predict the next 1000 steps ahead, I got the following plot.
[plot: using the forecast method]
Then I used the simulate function to generate the next 1000 simulated values and got the following plot.
[plot: using the simulate method]
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values.
simulate(arima1, nsim = 1000, future = TRUE, bootstrap = TRUE)
where arima1 is my trained ARIMA model; bootstrapped residuals are used because the model residuals are not very normal.
Per the definition in the forecast package, future = TRUE means that we are simulating future values based on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give me much more realistic-looking results, while the forecasts from forecast() converge to a constant after a few steps, with none of the fluctuation seen in the results from simulate()?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
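A minimal sketch illustrating the difference, using auto.arima() on a built-in series as a stand-in for the fitted arima1 model:

library(forecast)

# stand-in for the trained model; replace lynx with your own series / arima1
fit <- auto.arima(lynx)

# point forecasts: the mean of all possible future sample paths, with intervals
fc <- forecast(fit, h = 50)

# one possible future sample path, bootstrapping the residuals
set.seed(1)
sim <- simulate(fit, nsim = 50, future = TRUE, bootstrap = TRUE)

plot(fc)                    # smooth mean path that settles down, with widening intervals
lines(sim, col = "red")     # a single, much more variable sample path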

Auto-ARIMA function in R giving odd results

I have a daily dataset covering 3 years.
I ran auto.arima() in R on it for simple time series forecasting and it gave me a (2,1,2) model.
When I used this model to predict the variable for the next year, the forecast plot became constant after a few days, which can't be correct.
Since I have daily data for 3 years with a frequency of 364, is ARIMA incapable of handling daily data with large frequencies?
Any help will be appreciated
This sounds like you are trying to forecast too far into the future.
The forecast for tomorrow will be reasonably accurate, but the forecasts for the day after that and beyond are not influenced much by the past data, so they settle around some constant when you try to forecast too far into the future. "Too far into the future" can mean as little as two or more time points ahead.
Let's say you have data up to time point T+100, which you used to estimate your ARIMA(2,1,2) model. You can "forecast" the value for time T+1 by pretending you only have data up to point T and using your ARIMA(2,1,2) model to forecast T+1. Then move ahead by one period, pretend you only have data up to time T+1, and "forecast" T+2. This way you can assess the forecasting accuracy of your ARIMA(2,1,2) model, for example by calculating the mean squared error (MSE) of these one-step "forecasts", as sketched below.
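A minimal sketch of that rolling one-step evaluation, using a simulated daily series as a stand-in for your data and the forecast package's trick of re-applying a fitted model to new data without re-estimating its coefficients:

library(forecast)

# stand-in for the 3 years of daily data; replace with your own series
set.seed(1)
y <- ts(50 + cumsum(rnorm(3 * 364)), frequency = 364)

n_test <- 100                                   # hold out the last 100 days
train  <- subset(y, end = length(y) - n_test)

fit <- Arima(train, order = c(2, 1, 2))         # the ARIMA(2,1,2) from auto.arima

# apply the trained model to the full series without re-estimating coefficients;
# its fitted values over the hold-out period are rolling one-step-ahead forecasts
refit    <- Arima(y, model = fit)
one_step <- tail(fitted(refit), n_test)
actual   <- tail(y, n_test)

mean((actual - one_step)^2)                     # MSE of the one-step "forecasts"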

R: Generate a Seasonal ARIMA time-series model using parameters of existing data

I have count time series data which I'm able to use to determine the parameters of the underlying stochastic process. For example, say I have a seasonal ARIMA model, SARIMA(p,d,q)(P,D,Q)[S].
How do I use this to generate a new count time series data set?
To be even more specific: for a SARIMA(1,0,1)(1,0,0)[12], how can I generate a monthly time series over a 10-year period (i.e., 120 points estimating the number of 'counts')?
Use simulate.Arima() from the forecast package. It handles seasonal ARIMA models whereas arima.sim() does not.
However, ARIMA models are not suitable for count time series as they assume the process is defined on the whole real line.
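A minimal sketch, assuming you fit the SARIMA(1,0,1)(1,0,0)[12] to your observed monthly counts with Arima() and then draw a new 120-month series from the fitted model (note the simulated values will be continuous, not counts, which is the limitation mentioned above):

library(forecast)

# stand-in for the observed monthly count series; replace with your own data
set.seed(1)
y <- ts(rpois(120, lambda = 20), frequency = 12)

# fit the SARIMA(1,0,1)(1,0,0)[12] structure to the observed series
fit <- Arima(y, order = c(1, 0, 1), seasonal = c(1, 0, 0))

# generate a new 10-year (120-month) series from the fitted model;
# future = FALSE gives a fresh series rather than a continuation of the data
new_series <- simulate(fit, nsim = 120, future = FALSE)
plot(new_series)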
