auto.arima() vs arima() giving different results with the same model in R

I have a question about a time series analysis of the mean monthly air temperature (Deg. F) at Nottingham Castle, 1920-1939:
https://datamarket.com/data/set/22li/mean-monthly-air-temperature-deg-f-nottingham-castle-1920-1939#!ds=22li&display=line
When I ran
auto.arima(x.t, trace = TRUE)
it gave me "ARIMA(5,0,1) with non-zero mean" with "AIC=1198.42" as the lowest AIC. However, when I specified an arima model manually, I came across a model with an even lower AIC:
arima(x = x.t, order = c(3, 1, 3))
aic = 1136.95.
When I run auto.arima(x.t, trace = TRUE, d = 1), it gives ARIMA(2,1,2) with an AIC of 1221.413, while ARIMA(3,1,3) with drift gives 1209.947 and ARIMA(3,1,3) gives 1207.859.
I am really confused. I thought auto.arima was supposed to choose the amount of differencing automatically. Why is the auto.arima AIC different from the arima AIC when they fit the same model?

You're fitting two different ARIMA models. Obviously an ARIMA(5,0,1) model is not the same as an ARIMA(3,1,3) model: in the former you model p=5 time lags with no differencing, whereas in the latter you consider p=3 time lags with d=1 degree of differencing. Additionally, the MA components differ: q=1 vs. q=3.
Different models will give you different quality metrics (i.e. different AICs). Note, too, that AIC values are only comparable between models with the same amount of differencing, since differencing changes the data on which the likelihood is evaluated. Also bear in mind that auto.arima() performs a stepwise search by default and does not try every candidate model, which is why it can miss a lower-AIC model such as your ARIMA(3,1,3); setting stepwise = FALSE (and approximation = FALSE) makes the search more exhaustive.
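As a minimal sketch of a fairer comparison (assuming your series is x.t), you could widen the search while holding d fixed, so that all candidates are fitted to the same differenced data:
library(forecast)
# Exhaustive search at d = 1 so every candidate shares the same differenced series
fit_auto <- auto.arima(x.t, d = 1, stepwise = FALSE, approximation = FALSE)
# The manual candidate from above
fit_man <- Arima(x.t, order = c(3, 1, 3))
AIC(fit_auto)
AIC(fit_man)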

Related

How can I get predictions with CI from lmerTest models?

We are currently working with plant phenology.
We built a linear mixed model for each species present in the study area.
We set Days From Snowmelt (the sum of days from snowmelt to the visit day over the summer) as the response variable, with Mean Phenology as the predictor. The mean phenology state for each plot (there are 3 plots in each locality) is calculated as the mean of the phenological states of the 12 subplots each plot is divided into; it runs from 1 to 6, the higher the number the more advanced the cycle. Year, and plot nested within locality, are set as random factors.
Once the model is built and revised, we want to predict the days from snowmelt for each species to achieve the phenological phases of interest, which happen to have a mean of 2, 3, 4, and 5. (corresponding to vegetative, flowering, fruit development and dispersion, respectively)
I have tried the function predict(), but I get no heterogeneity between phases for each species; the progression seems to be linear (as shown in the image file).
Could this be just because it is a linear model, so it will only give linear responses? Are there any other ways to get predictions from these kinds of models and show their CI?
How can I get predictions with CI from lmerTest models?
I think you probably mean prediction intervals. You can use the predictInterval function in the merTools package. For example:
library(lmerTest); library(merTools)
# Random intercept and slope for Days, varying by Subject
fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# Simulation-based 95% prediction intervals for the fitted data
head(predictInterval(fm1, level = 0.95, seed = 123, n.sims = 100))
Could this be just because it is a linear model, so it will only give linear responses?
Yes! If you fit a linear model, then the predictions will be linear. Of course, you can model nonlinearity within a linear model in several ways, including transformation(s), nonlinear terms (the model is still linear in the parameters), and splines.
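For instance, a minimal sketch building on the sleepstudy example above (the quadratic term is purely illustrative):
# A polynomial term keeps the model linear in the parameters
# while allowing a curved relationship between Days and Reaction
fm2 <- lmer(Reaction ~ poly(Days, 2) + (Days | Subject), data = sleepstudy)
head(predictInterval(fm2, level = 0.95, seed = 123, n.sims = 100))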

How to report overall results of an nlme mixed effects model

I want to report the results of a one-factor lme from the nlme package. I want to know the overall effect of A on y. To do so, I would compare the model with a null model:
m1 <- lme(y~A,random=~1|B/C,data=data,weights=varIdent(form = ~1|A),method="ML")
m0 <- lme(y~1,random=~1|B/C,data=data,weights=varIdent(form = ~1|A),method="ML")
I am using maximum likelihood because I am comparing models with different main effects.
stats::anova(m0, m1) gives me a significant p-value, meaning that there is a significant effect of A on y. However, in contrast to lmer models fitted with lme4, no Chi2 values are given. First: is this approach valid? And second: what is the best way to report the result?
Thanks for your answers
An anova with lme should give you the same information as with lmer. Both use what's called a deviance test or likelihood ratio test. The L.ratio in the table returned by anova is simply the difference in the log-likelihoods of the two models multiplied by -2. A deviance test compares this value against a Chi2 distribution with degrees of freedom equal to the difference in the number of model parameters (in your case 1). So the value reported under L.ratio for lme models is the same as the Chi2 value reported for lmer models (assuming the models are the same, of course; lmer just rounds the value to one decimal).
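To see what anova is computing, here is a minimal sketch of the same test done by hand, assuming the models m0 and m1 from the question:
# Likelihood ratio statistic: -2 times the difference in log-likelihoods
LR <- as.numeric(2 * (logLik(m1) - logLik(m0)))
# p-value from a Chi2 distribution; df = difference in parameter counts
pchisq(LR, df = 1, lower.tail = FALSE)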
The approach is valid, and you could report the value under L.ratio along with the degrees of freedom and p-value, but I would add more information to your report, such as the fixed and random coefficients of both models and the other parameters you've added (such as the difference in variance across the levels of A specified under weights). If you're only interested in the fixed effect of A, then a Wald test should also be appropriate, though REML estimates are recommended in cases with a small number of groups (Snijders & Bosker, 2012). The test statistic is the t-value with its associated p-value in the model summary output, summary(m1). Chapter 6 in Snijders & Bosker (2012) gives a great explanation of tests for fixed and random parameters, along with reporting examples.

auto.arima() not differencing when it should?

I am using auto.arima from the forecast package to create an ARIMAX model.
The dependent variable and the regressors are non-stationary. However, auto.arima() returns an ARIMA(0,0,0) model.
Should I worry about this? Should I force auto.arima() to difference my time series by specifying d=1?
If I don't put any regressors in my model, it does detect the non-stationarity, ending up with ARIMA(0,1,1).
I know the problem is similar to this topic, but my dataset is bigger (about 90 observations), so the answer given there is not satisfying.
auto.arima did nothing wrong. Note that you have an additive model:
response = regression + time_series
When you include regressors / covariates, the non-stationarity is captured by them, so the time series component stays simple. For your data, you end up with ARIMA(0,0,0) errors, which are white noise.
When you don't include regressors / covariates, the non-stationarity has to be modelled by the time series component, so differencing is needed. For your data, you end up with ARIMA(0,1,1).
Of course, those two models are not the same, or even equivalent. If you really want to do some model selection, compare the AIC values of the two models. But remember: all models are wrong; some are useful. As long as a model cannot be rejected at a certain significance level, it is useful for prediction purposes.
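A minimal sketch of that comparison (assuming your response is y and your regressors are collected in a numeric matrix X):
library(forecast)
# Non-stationarity absorbed by the regressors; errors come out as ARIMA(0,0,0)
fit_xreg <- auto.arima(y, xreg = X)
# Non-stationarity handled by differencing instead
fit_ts <- auto.arima(y)
AIC(fit_xreg)
AIC(fit_ts)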

Daily time series forecasting with weekly and annual cycles

My aim is to forecast the daily number of registrations in two different channels.
The weekly seasonality is quite strong, especially at weekends, and I have also observed annual effects. Moreover, I have a few special event days, which differ significantly from the other days.
First, I applied a TBATS model on these two channels.
x.msts <- msts(Channel1_reg, seasonal.periods = c(7, 365.25))
# fit model
fit <- tbats(x.msts)
fit
plot(fit)
forecast_channel1 <- forecast(fit, h = 30)
First channel:
TBATS(0, {2,3}, -, {<7,3>, <365.25,2>})
Call: tbats(y = x.msts)
Parameters
Lambda: 0
Alpha: 0.0001804516
Gamma-1 Values: -1.517954e-05 1.004701e-05
Gamma-2 Values: -3.059654e-06 -2.796211e-05
AR coefficients: 0.249944 0.544593
MA coefficients: 0.215696 -0.361379 -0.21082
Second channel:
BATS(0, {2,2}, 0.929, -)
Call: tbats(y = y.msts)
Parameters
Lambda: 0
Alpha: 0.1652762
Beta: -0.008057904
Damping Parameter: 0.928972
AR coefficients: -0.586163 -0.676921
MA coefficients: 0.924758 0.743675
If I forecast the second channel, I only get blank values instead of forecasts.
Could you please help me understand why that is?
Do you have any suggestions for how to build the special event days into this model?
Thank you all!
tbats and bats are occasionally unstable, and your second model is showing infinite forecasts. There are already some bug reports about similar issues.
In any case, as you want to use event information, you would be better off building a harmonic regression model with ARMA errors.
For example, suppose your event information is recorded as a dummy variable event1. Then the model can be fitted as follows:
harmonics <- fourier(x.msts, K = c(2, 2))
fit1 <- auto.arima(x.msts, lambda = 0,
                   xreg = cbind(harmonics, event1), seasonal = FALSE)
f1 <- forecast(fit1,
               xreg = cbind(fourierf(x.msts, K = c(2, 2), h = 200), rep(0, 200)))
This assumes that the event will not occur in the next 200 days (hence the 200 zeros). I have used harmonics of order 2 for both the weekly and the annual period. Adjust these orders to minimize the AICc of the model.
This model is actually very similar to the TBATS model you are fitting except that the lambda value has been specified rather than estimated, and the seasonality is fixed over time rather than being allowed to evolve. The advantage is that the harmonic regression model tends to be more stable, and allows covariates to be included.
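A minimal sketch of that AICc search (the loop bounds are assumptions; for the weekly period the order can be at most 3, i.e. half the period):
# Try several harmonic orders and keep the pair with the lowest AICc
best_aicc <- Inf
best_K <- c(NA, NA)
for (k1 in 1:3) {
  for (k2 in 1:10) {
    fit <- auto.arima(x.msts, lambda = 0, seasonal = FALSE,
                      xreg = cbind(fourier(x.msts, K = c(k1, k2)), event1))
    if (fit$aicc < best_aicc) {
      best_aicc <- fit$aicc
      best_K <- c(k1, k2)
    }
  }
}
best_K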

Explaining the forecasts from an ARIMA model

I am trying to explain to myself the forecasting result from applying an ARIMA model to a time series dataset. The data are from the M1 Competition; the series is MNB65. I am trying to fit the data to an ARIMA(1,0,0) model and get the forecasts. I am using R. Here are some output snippets:
> arima(x, order = c(1,0,0))
Series: x
ARIMA(1,0,0) with non-zero mean
Call: arima(x = x, order = c(1, 0, 0))
Coefficients:
         ar1  intercept
      0.9421  12260.298
s.e.  0.0474    202.717
> predict(arima(x, order = c(1,0,0)), n.ahead=12)
$pred
Time Series:
Start = 53
End = 64
Frequency = 1
[1] 11757.39 11786.50 11813.92 11839.75 11864.09 11887.02 11908.62 11928.97 11948.15 11966.21 11983.23 11999.27
I have a few questions:
(1) How do I explain that although the dataset shows a clear downward trend, the forecast from this model trends upward? This also happens for ARIMA(2,0,0), which is the best ARIMA fit for the data using auto.arima (forecast package) and for an ARIMA(1,0,1) model.
(2) The intercept value for the ARIMA(1,0,0) model is 12260.298. Shouldn't the intercept satisfy the equation C = mean * (1 - sum(AR coefficients))? In that case, the value should be 715.52. I must be missing something basic here.
(3) This is clearly a series with non-stationary mean. Why is an AR(2) model still selected as the best model by auto.arima? Could there be an intuitive explanation?
Thanks.
No ARIMA(p,0,q) model will allow for a trend because the model is stationary. If you really want to include a trend, use ARIMA(p,1,q) with a drift term, or ARIMA(p,2,q). The fact that auto.arima() is suggesting 0 differences would usually indicate there is no clear trend.
The help file for arima() shows that the intercept is actually the mean. That is, the AR(1) model is (Y_t - c) = ϕ(Y_{t-1} - c) + e_t rather than Y_t = c + ϕY_{t-1} + e_t as you might expect. So the value labelled intercept is the estimated mean, and the intercept in the second parameterization would be c = mean * (1 - ϕ), which is what your equation in (2) computes.
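A minimal sketch of that relationship, assuming x is the series from the question:
fit <- arima(x, order = c(1, 0, 0))
mu  <- coef(fit)["intercept"]  # despite the label, this is the estimated mean
phi <- coef(fit)["ar1"]
# Intercept in the Y_t = c + phi * Y_{t-1} + e_t parameterization
c_usual <- mu * (1 - phi)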
auto.arima() uses a unit root test to determine the number of differences required. So check the results from the unit root test to see what's going on. You can always specify the required number of differences in auto.arima() if you think the unit root tests are not leading to a sensible model.
Here are the results from two tests for your data:
R> adf.test(x)
Augmented Dickey-Fuller Test
data: x
Dickey-Fuller = -1.031, Lag order = 3, p-value = 0.9249
alternative hypothesis: stationary
R> kpss.test(x)
KPSS Test for Level Stationarity
data: x
KPSS Level = 0.3491, Truncation lag parameter = 1, p-value = 0.09909
So the ADF test fails to reject its null hypothesis of non-stationarity, while the KPSS test doesn't quite reject its null hypothesis of stationarity. auto.arima() uses the latter by default. You could use auto.arima(x, test="adf") if you wanted the first test. In that case, it suggests the model ARIMA(0,2,1), which does have a trend.
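A minimal sketch for checking how the two tests disagree on the number of differences, assuming x is the series from the question:
library(forecast)
ndiffs(x, test = "kpss")  # the test auto.arima() uses by default
ndiffs(x, test = "adf")   # the ADF-based choice
fit_adf <- auto.arima(x, test = "adf")  # selects ARIMA(0,2,1) for this series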
