"non-stationary seasonal AR part from CSS" error in R - r

I am trying to fit an ARIMA model to a seasonally decomposed series. But when I try to execute the following:
fit = arima(diff(series), order = c(1, 0, 0),
            seasonal = list(order = c(1, 0, 0), period = NA))
It gives me the following error:
Error in arima(diff(series), order = c(1, 0, 0), seasonal = list(order
= c(1, :
non-stationary seasonal AR part from CSS
What is wrong, and what does the error mean?

When using CSS (conditional sum of squares), it is possible for the autoregressive coefficients to be non-stationary (i.e., they fall outside the region for stationary processes). In the case of the ARIMA(1,0,0)(1,0,0)s model that you are fitting, both coefficients should be between -1 and 1 for the process to be stationary.
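As a quick way to see whether a set of AR coefficients lies in the stationarity region (a sketch, not part of the original answer), you can check the roots of the AR polynomial:
# Sketch: an AR process is stationary when all roots of
# 1 - phi_1*z - ... - phi_p*z^p lie outside the unit circle.
ar_ok <- function(phi) all(Mod(polyroot(c(1, -phi))) > 1)
ar_ok(0.95)  # TRUE: stationary
ar_ok(1.02)  # FALSE: the kind of estimate CSS can return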
You can force R to use MLE (maximum likelihood estimation) instead by using the argument method="ML". This is slower but gives better estimates and always returns a stationary model.
If you are differencing the series (as you are here), it is usually better to do this via the model rather than explicitly. So your model would be better estimated using
set.seed(1)
series <- ts(rnorm(100), frequency = 6)
fit <- arima(series, order = c(1, 1, 0),
             seasonal = list(order = c(1, 0, 0), period = NA),
             method = "ML")

Related

Recreating ARMA Model from EViews in R

I am trying to reformulate a working ARMA(1, 1) model from EViews in R. I have a quarterly time series of around 45 years and want to perform a rolling ARMA forecast, using 12 years of data to estimate the model in each quarter after the initial 12 years. The data consists of logged annual changes in an index value. The data is not always stationary, but I know that the EViews model works, and I have specific results that I try to match as closely as possible with my R model. Also, the model must be of the AR(1) MA(1) form.
The EViews code simply rolls through the dataset, estimating the following model at each point in time and forecasting with the estimates:
ls(m=1000) data c AR(1) MA(1)
Trying to perform the same thing, my code looks like this:
result <- rep(NA_real_, NROW(data))  # storage for the one-step-ahead forecasts
for (i in 48:NROW(data)) {
  fit <- arima(data[(i - 48 + 1):i],
               order = c(1, 0, 1),
               method = "ML",
               optim.control = list(maxit = 1000),
               optim.method = "BFGS")
  result[i] <- predict(fit, n.ahead = 1)$pred
}
Even though I am using the same data as in EViews, I can't estimate the model in every quarter. For some periods I either get the error "Error in solve.default(res$hessian * n.used, A) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0" or a warning of the form "possible convergence problem: optim gave code = 1". When the error occurs, the code obviously stops running. Whenever I only get the warnings, my predictions differ greatly from those in EViews.
Can anyone help me estimate such an ARMA(1, 1) model the same way it is done in EViews? Thank you in advance!
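One way to keep the rolling loop alive when a window fails to converge (a sketch, not from the original post) is to wrap the fit in tryCatch, leaving NA for windows that error out:
# Sketch: skip windows where arima() throws, instead of stopping the loop.
# result is initialized to NA as above; failed windows simply stay NA.
for (i in 48:NROW(data)) {
  result[i] <- tryCatch({
    fit <- arima(data[(i - 48 + 1):i], order = c(1, 0, 1), method = "ML",
                 optim.control = list(maxit = 1000), optim.method = "BFGS")
    predict(fit, n.ahead = 1)$pred
  }, error = function(e) NA_real_)
}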

rstanarm for adaptive trials

I started exploring the rstanarm package and was curious how it could potentially be used in an adaptive trial scenario. The example scenario given in the vignette produces a posterior estimate of -0.622 with a credible interval from -0.69 to -0.56.
What would my script look like if I wanted to use this posterior as a prior for my next model when I have additional data from the adaptive trial?
# Code from vignette
t_prior <- student_t(df = 7, location = 0, scale = 2.5)
fit1 <- stan_glm(switch ~ dist100, data = wells,
                 family = binomial(link = "logit"),
                 prior = t_prior, prior_intercept = t_prior,
                 chains = 10, cores = 2, seed = 3245, iter = 100)
Your question is not so easily answered within the rstanarm framework because it only offers limited choices for the priors.
It is entirely valid to use your original prior with the total data from phase I and phase II combined to obtain a posterior distribution (essentially ignoring the intermediate posterior distribution you had after phase I). Alternatively, you could do as you suggest in phase I, then call
draws <- as.matrix(fit1)  # posterior draws of the coefficients
mu    <- colMeans(draws)  # posterior mean vector
Sigma <- cov(draws)       # posterior covariance matrix
and use these (estimated) mu and Sigma values as the hyperparameters of a multivariate normal prior on the coefficients in phase II. Unfortunately, such a prior is not supported by rstanarm, so you would need to write your own model, with a Bernoulli likelihood, a logit link, and a multivariate normal prior, in the Stan language. Alternatively, I think you could accomplish all of that with the brm function in the brms package, which generates Stan code from R syntax and draws from the corresponding posterior distribution.
Both approaches conceptually should give you the same posterior distribution after phase II. However, with a finite number of posterior draws they will differ a little bit numerically and the multivariate normal prior might not be a complete description of the posterior distribution you obtained after phase I.
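For concreteness, a minimal sketch of the first (combined-data) approach, assuming a hypothetical data frame wells2 holding the phase II observations with the same columns as wells:
library(rstanarm)
# Sketch: re-fit with the original prior on pooled phase I + phase II data.
# wells2 is a hypothetical phase II data frame, not part of the vignette.
wells_all <- rbind(wells, wells2)
fit2 <- stan_glm(switch ~ dist100, data = wells_all,
                 family = binomial(link = "logit"),
                 prior = t_prior, prior_intercept = t_prior,  # original prior
                 chains = 4, cores = 2, seed = 3245)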

R ARIMA model giving odd results

I'm trying to use an ARIMA model in R to forecast data. A slice of my time series looks like this:
This is just a slice of time so you can get a sense of it. I have daily data from 2010 to 2015.
I want to forecast this into the future. I'm using the forecast library, and my code looks like this:
dt = msts(data$val, seasonal.periods=c(7, 30))
fit = auto.arima(dt)
plot(forecast(fit, 300))
This results in:
This model isn't good or interesting. I defined the seasonal.periods myself because I expect weekly and monthly seasonality, but the result looks the same with no seasonal periods defined.
Am I missing something? The forecast very quickly flattens out, with predictions changing very, very little from point to point.
Edit:
To further show what I'm talking about, here's a concrete example. Let's say I have the following fake dataset:
x = 1:500
y = 0.5*c(NA, head(x, -1)) - 0.4*c(NA, NA, head(x, -2)) + rnorm(500, 0, 5)
This is an AR(2) model with coefficients 0.5 and -0.4. Plotting this time series yields:
So I create an ARIMA model of this and plot the forecast results:
plot(forecast(auto.arima(y), 300))
And the results are:
Why can't the ARIMA function learn this obvious model? I don't get any better results if I use the arima function and force it to try an AR(2) model.
auto.arima does not handle multiple seasonal periods. Use tbats for that.
dt = msts(data$val, seasonal.periods=c(7, 30))
fit = tbats(dt)
plot(forecast(fit, 300))
auto.arima will just use the largest seasonal period and try to do the best it can with that.
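If you do want an ARIMA-type model that accounts for both periods (an alternative not in the original answer), one common workaround is to pass Fourier terms for each period to auto.arima as external regressors; a sketch, where the number of terms K per period is a tuning choice:
# Sketch: multiple seasonalities via Fourier regressors.
dt   <- msts(data$val, seasonal.periods = c(7, 30))
regs <- fourier(dt, K = c(2, 3))
fit  <- auto.arima(dt, xreg = regs, seasonal = FALSE)
plot(forecast(fit, xreg = fourier(dt, K = c(2, 3), h = 300)))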

un-log a time series while using the forecast package

Hello, I use the forecast package to do time-series forecasting. I would like to know how to un-log a series on the final forecast plot. With the forecast package I don't know how to un-log my series. Here is an example:
library(forecast)
data <- AirPassengers
data <- log(data) # with the AirPassengers data it is not necessary to log, but with my private data it is... because of some high peaks...
ARIMA <- arima(data, order = c(1, 0, 1), list(order = c(12, 0, 12), period = 1)) # just a fake ARIMA in this case...
plot(forecast(ARIMA, h = 24)) # but my question is how to get a forecast plot according to the non-logged AirPassengers data
So the plot is on the log scale. I want to have the same ARIMA model but with the non-logged data.
It is not necessary to use the hack proposed by @ndoogan. forecast.Arima has built-in facilities for undoing transformations. The following code will do what is required:
fc <- forecast(ARIMA, h=24, lambda=0)
Better still, build the transformation into the model itself:
ARIMA <- Arima(data, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12), lambda = 0)
fc <- forecast(ARIMA, h=24)
Note that you need to use the Arima function from the forecast package to do this, not the arima function from the stats package.
@Hemmo is correct that this back-transformation will not give the mean of the forecast distribution, and so is not the optimal MSE forecast. However, it will give the median of the forecast distribution, and so will give the optimal MAE forecast.
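If the mean is wanted instead, the usual lognormal adjustment can be applied by hand (a sketch, assuming the question's original ARIMA fitted to the logged series); newer versions of the forecast package also offer a biasadj argument to forecast() that does this for lambda-transformed models:
# Sketch: if the log-scale forecast is N(m, s^2), then E[Y] = exp(m + s^2/2).
pred    <- predict(ARIMA, n.ahead = 24)
mean_fc <- exp(pred$pred + pred$se^2 / 2)  # mean-targeting back-transformation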
Finally, the fake model used by @Swiss12000 makes little sense as the seasonal part has frequency 1, and so is confounded with the non-seasonal part. I think you probably meant the model I've used in the code above.
The problem with @ndoogan's answer is that the logarithm is not a linear transformation, which means that E[exp(y)] != exp(E[y]). Jensen's inequality actually gives E[exp(y)] >= exp(E[y]). Here's a simple demonstration:
set.seed(1)
x<-rnorm(1000)
mean(exp(x))
[1] 1.685356
exp(mean(x))
[1] 0.9884194
Here's a case concerning the prediction:
# Simulate AR(1) process
set.seed(1)
y<-10+arima.sim(model=list(ar=0.9),n=100)
# Fit on logarithmic scale
fit<-arima(log(y),c(1,0,0))
#Simulate one step ahead
set.seed(123)
y_101_log <- fit$coef[2] * (1 - fit$coef[1]) +
  fit$coef[1] * log(y[100]) + rnorm(n = 1000, sd = sqrt(fit$sigma2))
y_101<-exp(y_101_log) #transform to natural scale
exp(mean(y_101_log)) # This is exp(E(log(y_101)))
[1] 5.86717 # same as exp(predict(fit, n.ahead = 1)$pred);
            # differs a bit because of the simulation
mean(y_101) # This is E(exp(log(y_101)))=E(y_101)
[1] 5.904633
# 95% Prediction intervals:
#Naive way:
pred<-predict(fit,n.ahead=1)
c(exp(pred$pred-1.96*pred$se),exp(pred$pred+1.96*pred$se))
pred$pred pred$pred
4.762880 7.268523
# Correct ones:
quantile(y_101,probs=c(0.025,0.975))
2.5% 97.5%
4.772363 7.329826
This also provides a general solution to your problem (see the sketch after this list):
1. Fit your model.
2. Simulate multiple samples from that model (for example, one-step-ahead predictions as above).
3. For each simulated sample, apply the inverse transformation to get values on the original scale.
4. From these simulated samples, compute the expected value as an ordinary mean, or, if you need prediction intervals, compute empirical quantiles.
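A compact sketch of this recipe (an addition, refitting the same log-scale model with the forecast package's Arima so that its simulate() method can extract the series):
library(forecast)
# Draw 1000 future sample paths of length 24 from the log-scale model,
# back-transform each path, then summarise on the original scale.
fit2  <- Arima(log(y), order = c(1, 0, 0))
paths <- replicate(1000, exp(simulate(fit2, nsim = 24, future = TRUE)))
point <- rowMeans(paths)                              # expected values per step
bands <- apply(paths, 1, quantile, c(0.025, 0.975))   # empirical 95% intervals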
This is a bit of a hack, but it seems to do what you want. Based on your fitted model ARIMA:
fc <- forecast(ARIMA, h = 24)
fc$mean  <- exp(fc$mean)   # back-transform the point forecasts
fc$upper <- exp(fc$upper)  # back-transform the interval bounds
fc$lower <- exp(fc$lower)
fc$x     <- exp(fc$x)      # back-transform the observed series
Now plot it:
plot(fc)

Explaining the forecasts from an ARIMA model

I am trying to explain to myself the forecasting result from applying an ARIMA model to a time-series dataset. The data is from the M1-Competition, the series is MNB65. I am trying to fit the data to an ARIMA(1,0,0) model and get the forecasts. I am using R. Here are some output snippets:
> arima(x, order = c(1,0,0))
Series: x
ARIMA(1,0,0) with non-zero mean
Call: arima(x = x, order = c(1, 0, 0))
Coefficients:
         ar1  intercept
      0.9421  12260.298
s.e.  0.0474    202.717
> predict(arima(x, order = c(1,0,0)), n.ahead=12)
$pred
Time Series:
Start = 53
End = 64
Frequency = 1
[1] 11757.39 11786.50 11813.92 11839.75 11864.09 11887.02 11908.62 11928.97 11948.15 11966.21 11983.23 11999.27
I have a few questions:
(1) How do I explain that although the dataset shows a clear downward trend, the forecast from this model trends upward? This also happens for ARIMA(2,0,0), which is the best ARIMA fit for the data using auto.arima (forecast package) and for an ARIMA(1,0,1) model.
(2) The intercept value for the ARIMA(1,0,0) model is 12260.298. Shouldn't the intercept satisfy the equation C = mean * (1 - sum(AR coefficients))? In that case the value should be 715.52. I must be missing something basic here.
(3) This is clearly a series with non-stationary mean. Why is an AR(2) model still selected as the best model by auto.arima? Could there be an intuitive explanation?
Thanks.
No ARIMA(p,0,q) model will allow for a trend because the model is stationary. If you really want to include a trend, use ARIMA(p,1,q) with a drift term, or ARIMA(p,2,q). The fact that auto.arima() is suggesting 0 differences would usually indicate there is no clear trend.
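For instance, a sketch of a trend-capable specification using the forecast package's Arima with a drift term (the order here is illustrative, not fitted to this series):
library(forecast)
fit <- Arima(x, order = c(1, 1, 0), include.drift = TRUE)  # drift allows a trend
forecast(fit, h = 12)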
The help file for arima() shows that the intercept is actually the mean. That is, the AR(1) model is (Y_t-c) = ϕ(Y_{t-1} - c) + e_t rather than Y_t = c + ϕY_{t-1} + e_t as you might expect.
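A quick numeric check of that parameterisation, using the estimates from the output above:
# The "intercept" reported by arima() is the mean mu, so the regression-form
# constant is c = mu * (1 - phi):
mu  <- 12260.298
phi <- 0.9421
mu * (1 - phi)  # approx 709.9; close to the hand-computed 715.52, which
                # used the sample mean rather than the fitted mu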
auto.arima() uses a unit root test to determine the number of differences required. So check the results from the unit root test to see what's going on. You can always specify the required number of differences in auto.arima() if you think the unit root tests are not leading to a sensible model.
Here are the results from two tests for your data:
R> adf.test(x)
Augmented Dickey-Fuller Test
data: x
Dickey-Fuller = -1.031, Lag order = 3, p-value = 0.9249
alternative hypothesis: stationary
R> kpss.test(x)
KPSS Test for Level Stationarity
data: x
KPSS Level = 0.3491, Truncation lag parameter = 1, p-value = 0.09909
So the ADF says strongly non-stationary (the null hypothesis in that case) while the KPSS doesn't quite reject stationarity (the null hypothesis for that test). auto.arima() uses the latter by default. You could use auto.arima(x,test="adf") if you wanted the first test. In that case, it suggests the model ARIMA(0,2,1) which does have a trend.
