I am working on a GDP time series forecast. I have log-transformed the series, which has a significant stochastic trend, and I have checked that the first differences are stationary. Now (I believe) I have two options:
Fit an ARMA model on the differenced log transformed GDP time series
Fit an ARIMA model (p,1,q) on the log transformed GDP time series
QUESTION:
I have noticed that ARIMA does not have an intercept, while ARMA does. How is the intercept to be interpreted?
How should I decide which one to use?
I have noticed that ARIMA does not have an intercept, while ARMA does. How is the intercept to be interpreted?
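The interpretation of the intercept depends on your model. If the series is stationary, the intercept is related to the mean through the other parameters; e.g., see the AR(1) example on Wikipedia, where an intercept c and AR coefficient φ give a mean of c/(1 − φ). An intercept in an ARIMA model with first-order differencing (d = 1) implies a constant drift, i.e. a linear trend in the undifferenced series, which is likely not what you want.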
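To make the link between the two options concrete, here is a minimal sketch on simulated data, using only base R's arima(). The series `y` and all parameter values are made up for illustration. Note that R's arima() reports the mean of the stationary series under the label "intercept", and that a drift term can be added to an ARIMA(p,1,q) fit by passing a linear time index through xreg.

```r
set.seed(42)
n <- 300
# Simulated "log GDP": a random walk whose increments are 0.5 + AR(1) noise
growth <- 0.5 + arima.sim(model = list(ar = 0.5), n = n)
y <- ts(cumsum(growth))

# Option 1: ARMA(1,0) with a mean, fitted to the differenced series
fit_arma <- arima(diff(y), order = c(1, 0, 0), include.mean = TRUE)

# Option 2: ARIMA(1,1,0) on the levels, with a linear time regressor acting as drift
fit_arima <- arima(y, order = c(1, 1, 0),
                   xreg = cbind(drift = as.numeric(seq_along(y))))

# The mean of the differences and the drift should be nearly identical
mean_diff <- unname(coef(fit_arma)["intercept"])
drift <- unname(coef(fit_arima)["drift"])
c(mean_of_differences = mean_diff, drift = drift)
```

If the two numbers agree, both options describe the same model, and the "intercept" is simply the average growth per period of the log series.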
How should I decide which one to use?
A common choice is to use an information criterion such as AIC or BIC. E.g., see this post.
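As a rough sketch of what that looks like in practice, the loop below fits a small grid of ARMA(p, q) orders to the same series and keeps the one with the lowest AIC. The data are simulated and the grid is arbitrary; because every candidate is fitted to the same (already differenced) data, the AIC values are comparable.

```r
set.seed(1)
# Simulated stationary series standing in for the differenced log GDP
dy <- arima.sim(model = list(ar = 0.6), n = 200)

# Candidate ARMA(p, q) orders (a small, arbitrary grid)
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_
for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(dy, order = c(grid$p[i], 0, grid$q[i]), method = "ML"),
    error = function(e) NULL   # skip orders that fail to fit
  )
  if (!is.null(fit)) grid$aic[i] <- AIC(fit)
}

# Order with the lowest AIC among the fits that succeeded
best <- grid[which.min(grid$aic), ]
best
```

Replacing AIC(fit) with BIC(fit) gives the BIC-based choice, which tends to favour smaller models.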
Related
I have weekly time series data for multiple departments (retail domain) and based on some research, I am automating the process of finding model parameters for each time series. So far, I have implemented the following models for each time series in a for loop:
1) ARIMA (auto.arima in R)
2) stlf (cannot use R's ets function since I have weekly data)
3) TBATS
4) Regression on ARIMA errors (using fourier terms)
5) Baseline models: naive & mean
I want to understand how to choose models for each time series. I have multiple approaches to this:
1) Choose model with lowest RMSE on test data (risk: overfitting on test data)
2) Choose model with lowest RMSE based on time series cross-validation (tsCV)
3) Choose one family of models for all the time series based on which family gives lowest average RMSE score across all the time series.
Are there any ways I can improve my approach? Any disadvantages to any of the above approaches? Any better approach?
Thanks a lot!
Forecast your data with all of the forecasting methods mentioned above, then calculate the MAPE for each and use the model that gives the best results to forecast your data.
Also try different transformations of your input data, such as log, inverse, etc.
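One way to compare methods without tuning to a single test set is a rolling-origin evaluation, which is the idea behind tsCV in the forecast package. The sketch below does it by hand in base R for the two baseline methods only; the data are simulated and `min_train` is an arbitrary choice, so treat it as an illustration rather than a recipe.

```r
set.seed(123)
# Simulated weekly series (hypothetical data): a slowly wandering level
y <- ts(100 + cumsum(rnorm(120)), frequency = 52)

# Two baseline one-step forecasters: each takes the history and returns a forecast
forecasters <- list(
  naive = function(history) tail(history, 1),
  mean  = function(history) mean(history)
)

# Rolling origin: forecast y[origin + 1] from y[1..origin] for every origin
min_train <- 52
origins <- min_train:(length(y) - 1)
errors <- sapply(forecasters, function(f) {
  sapply(origins, function(origin) y[origin + 1] - f(y[1:origin]))
})

# Out-of-sample RMSE per method; the lowest value wins for this series
rmse <- sqrt(colMeans(errors^2))
rmse
```

The same loop can be run per department, and any other method can be added to `forecasters` as long as it returns a one-step forecast from the history it is given.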
I am trying to estimate the effect of an intervention in an interrupted time series analysis. I used an ARIMA model with the TSA package and created a dummy variable for the pre- and post-intervention periods. Here is an example with the airmiles data included in the TSA package.
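library(TSA)     # arimax() and the monthly airmiles data
library(lmtest)  # coeftest()
data(airmiles)
# Step dummy: 0 for the first 68 observations, 1 for the remaining 45
bin.int <- ts(c(rep(0, 68), rep(1, 45)))
air.m2 <- arimax(log(airmiles), order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12),
                 xreg = data.frame(bin.int),
                 method = "ML")
air.m2
coeftest(air.m2)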
When I use the xreg argument, I understand that I fit a coefficient for the effect of X. If X is a dummy variable (0 before and 1 after the intervention), this method estimates the effect as a level change. How can I estimate a level change and/or a slope change with an ARIMA model?
Do I have to use a transfer function with autoregressive and moving average parameters?
Thank you.
François
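One common way to get both effects is to pass two regressors: a step (0 before, 1 after) for the level change and a ramp (0 before, then 1, 2, 3, ...) for the slope change. The sketch below uses simulated data and base R's arima() with no differencing, so it is only an illustration of the regressor design; the intervention point `t0` and all effect sizes are hypothetical, and the same two columns could be supplied through xreg in a differenced model.

```r
set.seed(7)
n <- 120
t0 <- 68                            # last pre-intervention time point (hypothetical)
time <- seq_len(n)
step <- as.numeric(time > t0)       # 0 before, 1 after: level change
ramp <- pmax(0, time - t0)          # 0 before, then 1, 2, 3, ...: slope change

# Simulated outcome: baseline trend + level shift of 2 + slope change of 0.1 + AR(1) noise
noise <- arima.sim(model = list(ar = 0.4), n = n, sd = 0.2)
y <- ts(10 + 0.05 * time + 2 * step + 0.1 * ramp + noise)

# Regression with AR(1) errors; each xreg column gets its own coefficient
X <- cbind(time = time, step = step, ramp = ramp)
fit <- arima(y, order = c(1, 0, 0), xreg = X)
round(coef(fit), 3)
```

The coefficient on `step` estimates the immediate level change and the coefficient on `ramp` estimates the change in slope after the intervention.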
I have a question about this time series analysis, with mean monthly air temperature (Deg. F) Nottingham Castle 1920-1939:
https://datamarket.com/data/set/22li/mean-monthly-air-temperature-deg-f-nottingham-castle-1920-1939#!ds=22li&display=line
When I ran
auto.arima(x.t, trace=TRUE)
it gave me "ARIMA(5,0,1) with non-zero mean" and "AIC=1198.42" as the lowest AIC. However, when I manually specified an ARIMA model, I came across one with an even lower AIC.
arima(x = x.t, order = c(3, 1, 3))
aic = 1136.95.
When I run auto.arima(x.t, trace = TRUE, d = 1), it gives me ARIMA(2,1,2) with an AIC of 1221.413, while ARIMA(3,1,3) with drift gives 1209.947 and ARIMA(3,1,3) gives 1207.859.
I am really confused. I thought auto.arima should automatically suggest the order of differencing. Why is the AIC from auto.arima different from the AIC from arima for the same model?
You're fitting two different ARIMA models. Obviously an ARIMA(5,0,1) model is not the same as an ARIMA(3,1,3) model. In the former, you model p=5 time lags with no differencing, whereas in the latter you consider p=3 time lags with d=1 degree of differencing. Additionally, your model's MA components are also different: q=1 vs. q=3.
Different models will obviously give you different quality metrics (i.e. different AICs).
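A related point is that AIC values are not directly comparable between models with different orders of differencing, because the likelihood is computed on differently differenced data. The sketch below illustrates this with base R's nottem dataset, which appears to be the same Nottingham Castle series; the orders are deliberately simple and are not the ones from the question. The differenced fit is based on one fewer observation.

```r
# nottem ships with base R: monthly Nottingham temperatures, 1920-1939
x.t <- nottem

fit_d0 <- arima(x.t, order = c(1, 0, 0))   # no differencing
fit_d1 <- arima(x.t, order = c(1, 1, 0))   # first differences

# Number of observations actually used in each likelihood
c(d0 = fit_d0$nobs, d1 = fit_d1$nobs)

# These two AIC values come from different data and should not be ranked against each other
c(d0 = AIC(fit_d0), d1 = AIC(fit_d1))
```

Separately, auto.arima has an `approximation` argument; when it is in effect, the values printed by trace may be approximations and can differ from what a direct arima() fit reports. Setting approximation = FALSE (and, if time allows, stepwise = FALSE) is one way to check whether that explains a discrepancy.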
I am using auto.arima from forecast package to create an ARIMAX model.
The dependent variable and the regressors are non-stationary. However, auto.arima() returns a model ARIMA(0,0,0).
Should I worry about this? Should I force auto.arima() to difference my time series by specifying d=1?
If I don't put any regressors in my model, it does detect non-stationarity, ending up with ARIMA(0,1,1).
I know the problem is similar to this topic, but my dataset is bigger (about 90 observations), thus the answer given is not satisfying.
auto.arima did nothing wrong. Note you have an additive model:
response = regression + time_series
When you include regressors / covariates, the non-stationarity is captured by them, so the time series component is simple. For your data, you end up with ARIMA(0,0,0), which is white noise.
When you don't have regressors / covariates, the non-stationarity has to be modelled by the time series component, so differencing is needed. For your data, you end up with ARIMA(0,1,1).
Of course, those two models are not the same, or even equivalent. If you really want to do model selection, compare the AIC values of the two models, bearing in mind that AIC is not strictly comparable between models with different orders of differencing. But remember: all models are wrong; some are useful. As long as a model cannot be rejected at a given significance level, it is useful for prediction purposes.
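The additive structure can be illustrated with a small simulation in base R. Everything here is made up: a random-walk regressor `x`, a response that inherits its trend from `x`, and about 90 observations to match the question. It is a sketch of the idea, not a reconstruction of the original data.

```r
set.seed(2024)
n <- 90
x <- cumsum(rnorm(n))                      # non-stationary regressor (random walk)
y <- ts(1 + 2 * x + rnorm(n, sd = 0.5))    # response inherits the trend from x

# With the regressor: the error term is white noise, so ARIMA(0,0,0) errors suffice
fit_reg <- arima(y, order = c(0, 0, 0), xreg = cbind(x = x))

# Without the regressor: the series itself has to be differenced
fit_diff <- arima(y, order = c(0, 1, 1))

coef(fit_reg)                              # intercept and slope on x
# Check that the regression residuals show no remaining autocorrelation
Box.test(residuals(fit_reg), lag = 10, type = "Ljung-Box")
```

If the Ljung-Box test on the regression residuals does not reject, an ARIMA(0,0,0) error term is a reasonable description once the regressor is included.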
My aim is to forecast the daily number of registrations in two different channels.
Weekly seasonality is quite strong, especially at weekends, and I also observe annual effects. Moreover, I have a few special event days, which differ significantly from the other days.
First, I applied a TBATS model to these two channels.
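library(forecast)
# Daily series with weekly and annual seasonal periods
x.msts <- msts(Channel1_reg, seasonal.periods = c(7, 365.25))
# fit model
fit <- tbats(x.msts)
fit
plot(fit)
# forecast the next 30 days
forecast_channel1 <- forecast(fit, h = 30)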
First channel:
TBATS(0, {2,3}, -, {<7,3>, <365.25,2>})
Call: tbats(y = x.msts)
Parameters
Lambda: 0
Alpha: 0.0001804516
Gamma-1 Values: -1.517954e-05 1.004701e-05
Gamma-2 Values: -3.059654e-06 -2.796211e-05
AR coefficients: 0.249944 0.544593
MA coefficients: 0.215696 -0.361379 -0.21082
Second channel:
BATS(0, {2,2}, 0.929, -)
Call: tbats(y = y.msts)
Parameters
Lambda: 0
Alpha: 0.1652762
Beta: -0.008057904
Damping Parameter: 0.928972
AR coefficients: -0.586163 -0.676921
MA coefficients: 0.924758 0.743675
If I forecast the second channel, I only get blank values instead of any forecasts.
Could you please help me understand why that is?
Do you have any suggestions on how to build the specific event days into this model?
Thank you all!
tbats and bats are occasionally unstable, and your second model is showing infinite forecasts. There are already some bug reports about similar issues.
In any case, as you want to use event information, you would be better off building a harmonic regression model with ARMA errors.
For example, suppose your event information is recorded as a dummy variable event1. Then the model can be fitted as follows:
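# Fourier terms: K = 2 harmonics for each of the weekly and annual periods
harmonics <- fourier(x.msts, K = c(2, 2))
fit1 <- auto.arima(x.msts, lambda = 0,
                   xreg = cbind(harmonics, event1), seasonal = FALSE)
# Future regressors: Fourier terms for the next 200 days, with no events
f1 <- forecast(fit1,
               xreg = cbind(fourier(x.msts, K = c(2, 2), h = 200), event1 = rep(0, 200)))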
This assumes that the event will not occur in the next 200 days (hence the 200 0s). I have used harmonics of order 2 for both weeks and years. Adjust these to minimize the AICc of the model.
This model is actually very similar to the TBATS model you are fitting except that the lambda value has been specified rather than estimated, and the seasonality is fixed over time rather than being allowed to evolve. The advantage is that the harmonic regression model tends to be more stable, and allows covariates to be included.
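To show what the harmonic regressors contain, here is a sketch that builds weekly Fourier terms by hand and fits a regression with AR(1) errors using base R's arima(). The data are simulated, only the weekly period is included, and the event days and effect sizes are hypothetical; in practice fourier() builds these columns and auto.arima() chooses the error structure.

```r
set.seed(99)
n <- 7 * 52                                       # one year of simulated daily data
day <- seq_len(n)
event1 <- as.numeric(day %in% c(100, 200, 300))   # hypothetical special-event days

# Weekly Fourier terms of order K = 2: sin/cos pairs at frequencies k / 7
K <- 2
harmonics <- do.call(cbind, lapply(seq_len(K), function(k) {
  cbind(sin(2 * pi * k * day / 7), cos(2 * pi * k * day / 7))
}))
colnames(harmonics) <- paste0(c("S", "C"), rep(seq_len(K), each = 2), "-7")

# Simulated log-registrations: weekly pattern + event effect of 1.5 + AR(1) noise
y <- ts(5 + 0.8 * harmonics[, 1] + 0.3 * harmonics[, 4] + 1.5 * event1 +
          arima.sim(model = list(ar = 0.3), n = n, sd = 0.2))

# Harmonic regression with AR(1) errors; the event dummy is one more regressor
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(harmonics, event1 = event1))
round(coef(fit), 3)
```

The coefficient on `event1` is the estimated event effect on the log scale, and future events are handled by setting the dummy to 1 on the relevant forecast days.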