Forecasting ARIMA with xreg - r

I'm trying to forecast time in, time out ("TiTo") for someone ordering food at a restaurant using the code below. TiTo is the total time from when a customer walks through the door to when they get their food. TimeTT is the time the customer spends talking to the waiter. I believe TimeTT is a predictor of TiTo and I would like to use it as a covariate in the forecast for TiTo. I've read a bit about ARIMA, and as I understand it you add the predictors to the model via the xreg parameter. I'm thinking of the xreg parameter as something like the independent variable in a regression model, like lm(TiTo ~ TimeTT). Is this the correct way to think of the xreg parameter? Also, what does the error message below mean? Do I need to convert TimeTT into a time series to use it in the xreg parameter? I'm new to forecasting, so any help is much appreciated.
Forecast Attempt:
library(zoo)
library(forecast)
OV <- zoo(SampleData$TiTo, order.by = SampleData$DateTime)
eData <- ts(OV, frequency = 24)
Train <- eData[1:15000]
Test <- eData[15001:20809]
Arima.fit <- auto.arima(Train)
Acast <- forecast(Arima.fit, h = 5808, xreg = SampleData$TimeTT)
Error:
Error in if (ncol(xreg) != ncol(object$call$xreg)) stop("Number of regressors does not match fitted model") :
argument is of length zero
Data:
dput(Train[1:5])
c(1152L, 1680L, 1680L, 968L, 1680L)
dput(SampleData[1,]$TimeTT)
structure(1156L, .Label = c("0.000000", "0.125000", "0.142857",
"96.750000", "97.800000", "99.000000", "99.600000", "NULL"), class = "factor")

You need to supply the xreg when you estimate the model itself, and the regressors need to be available (forecast ahead if necessary) for the forecast period as well. So this will look something like:
Arima.fit <- auto.arima(Train, xreg = SampleData$TimeTT)
forecast(Arima.fit, h = 508, xreg = NewData$TimeTT)
Here is an example using Arima and xreg from Rob Hyndman (here is the link to the example; to read more about using contemporaneous covariates in ARIMA models, go here). This is analogous to auto.arima.
library(forecast)
n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)  # simulated series with a seasonal pattern
# supply the Fourier terms as xreg when fitting...
fit <- Arima(y, order=c(2,0,1), xreg=fourier(y, K=4))
# ...and supply the matching future Fourier terms when forecasting
plot(forecast(fit, h=2*m, xreg=fourierf(y, K=4, h=2*m)))
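Applied to your own objects, here is a minimal sketch (an illustration, not tested on your data; it assumes the rows of SampleData line up one-to-one with eData). Note that your dput output shows TimeTT is a factor, so it must be converted to numeric first:
SampleData$TimeTT <- as.numeric(as.character(SampleData$TimeTT))
TrainX <- SampleData$TimeTT[1:15000]     # regressor split to match Train
TestX <- SampleData$TimeTT[15001:20809]  # regressor values for the forecast period
fit <- auto.arima(Train, xreg = TrainX)
Acast <- forecast(fit, xreg = TestX)     # the horizon is taken from length(TestX)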
Hope this helps.

Related

Error in arima model for time series model of R [duplicate]

I am working on a project to forecast store sales in order to learn forecasting. Till now I have successfully used the simple auto.arima() function for forecasting. But to make these forecasts more accurate, I can make use of covariates. I have defined covariates like holidays and promotions, which affect store sales, using the xreg argument with the help of this post:
How to setup xreg argument in auto.arima() in R?
But my code fails at line:
ARIMAfit <- auto.arima(saledata, xreg=covariates)
and gives error saying:
Error in model.frame.default(formula = x ~ xreg, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'xreg')
In addition: Warning message:
In !is.na(x) & !is.na(rowSums(xreg)) :
  longer object length is not a multiple of shorter object length
Below is link to my Dataset: https://drive.google.com/file/d/0B-KJYBgmb044blZGSWhHNEoxaHM/view?usp=sharing
This is my code:
data = read.csv("xdata.csv")[1:96,]
View(data)
saledata <- ts(data[1:96,4],start=1)
View(saledata)
saledata[saledata == 0] <- 1
View(saledata)
covariates = cbind(DayOfWeek=model.matrix(~as.factor(data$DayOfWeek)),
Customers=data$Customers,
Open=data$Open,
Promo=data$Promo,
SchoolHoliday=data$SchoolHoliday)
View(head(covariates))
# Remove intercept
covariates <- covariates[,-1]
View(covariates)
require(forecast)
ARIMAfit <- auto.arima(saledata, xreg=covariates) # HERE IS THE ERROR LINE
summary(ARIMAfit)
Also, tell me how I can forecast for the next 48 days. I know how to forecast using simple auto.arima() and n.ahead, but I don't know how to do it when xreg is used.
A few points. First, you can just convert the entire matrix to a ts object and then isolate the variables later. Second, if you are using covariates in your ARIMA model then you will need to provide them when you forecast out-of-sample. This may mean forecasting each of the covariates before generating forecasts for your variable of interest (see the sketch after the example). In the example below I split the data into two samples for simplicity.
library(forecast)
dta <- read.csv("xdata.csv")[1:96,]
dta <- ts(dta, start = 1)
# to illustrate out of sample forecasting with covariates lets split the data
train <- window(dta, end = 90)
test <- window(dta, start = 91)
# fit model
covariates <- c("DayOfWeek", "Customers", "Open", "Promo", "SchoolHoliday")
fit <- auto.arima(train[,"Sales"], xreg = train[, covariates])
# forecast
fcast <- forecast(fit, xreg = test[, covariates])
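For a genuine out-of-sample forecast, such as the next 48 days the question asks about, you would first need future values of the covariates. Here is a hedged sketch in which each covariate is itself forecast with auto.arima; that is a strong assumption (DayOfWeek, for instance, could instead be computed deterministically from the calendar):
h <- 48
futureX <- sapply(covariates, function(v)
  as.numeric(forecast(auto.arima(dta[, v]), h = h)$mean))
fit_full <- auto.arima(dta[, "Sales"], xreg = dta[, covariates])
fcast48 <- forecast(fit_full, xreg = futureX)  # 48-day-ahead forecast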

ARIMA modelling, prediction and plotting with CO2 dataset in R

I am working with arima0() and the co2 dataset. I would like to plot the arima0() model over my data. I have tried fitted() and curve() with no success.
Here is my code:
###### Time Series
# format: time series
data(co2)
# format: matrix
dmn <- list(month.abb, unique(floor(time(co2))))
co2.m <- matrix(co2, 12, dimnames = dmn)
co2.dt <- pracma::detrend(co2.m, tt = 'linear')
co2.dt <- ts(as.numeric(co2.dt), start = c(1959,1), frequency=12)
# first diff
co2.dt.dif <- diff(co2.dt,lag = 12)
# Second diff
co2.dt.dif2 <- diff(co2.dt.dif,lag = 1)
With the data prepared, I ran the following arima0:
results <- arima0(co2.dt.dif2, order = c(2,0,0), method = "ML")
resultspredict <- predict(results, n.ahead = 36)
I would like to plot the model and the predictions together. I am hoping there is a way to do this in base R.
Session 1: To begin with...
To be honest, I am somewhat worried about your way of modelling the co2 time series. Something already went wrong when you de-trended co2. Why use tt = "linear"? You fit a linear trend within each period (i.e., year) and take the residuals for further inspection. This is often not recommended, as it tends to introduce artificial effects into the residual series. I would be inclined to use tt = "constant", i.e., simply dropping the yearly average. This would at least preserve the within-season correlation of the original data.
Perhaps you want to see some evidence here. Consider using ACF to help you diagnose.
data(co2)
## de-trend by dropping yearly average (no need to use `pracma::detrend`)
yearlymean <- ave(co2, gl(39, 12), FUN = mean)
co2dt <- co2 - yearlymean
## de-trend by dropping within season linear trend
co2.m <- matrix(co2, 12)
co2.dt <- pracma::detrend(co2.m, tt = "linear")
co2.dt <- ts(as.numeric(co2.dt), start = c(1959, 1), frequency = 12)
## compare time series and ACF
par(mfrow = c(2, 2))
ts.plot(co2dt); acf(co2dt)
ts.plot(co2.dt); acf(co2.dt)
Both de-trended series show a strong seasonal effect, so a further seasonal differencing is required.
## seasonal differencing
co2dt.dif <- diff(co2dt, lag = 12)
co2.dt.dif <- diff(co2.dt, lag = 12)
## compare time series and ACF
par(mfrow = c(2, 2))
ts.plot(co2dt.dif); acf(co2dt.dif)
ts.plot(co2.dt.dif); acf(co2.dt.dif)
The ACF of co2.dt.dif has more significant negative correlations. This is a sign of over-de-trending, so we prefer co2dt. After the seasonal differencing, co2dt.dif is already stationary and needs no further differencing (otherwise you would just over-difference it and introduce more negative autocorrelation).
The big negative spike at lag 1 (one season) in the ACF of co2dt.dif suggests that we want a seasonal MA term. Also, the positive spikes within the season imply a mild AR process in general. So consider:
## we exclude the mean because its estimate is 0 if we include it
fit <- arima0(co2dt.dif, order = c(1,0,0), seasonal = c(0,0,1), include.mean = FALSE)
To judge whether this model is doing well, we need to inspect the ACF of the residuals:
acf(fit$residuals)
Looks like this model is decent (actually pretty great).
For prediction purposes, it is actually a better idea to fold the seasonal differencing of co2dt into the model fitting itself, instead of modelling co2dt.dif separately. Let's do:
fit <- arima0(co2dt, order = c(1,0,0), seasonal = c(0,1,1), include.mean = FALSE)
This will give exactly the same estimates for the AR and MA coefficients as the two-stage work above, but now prediction is easily handled by a single predict call.
## 3 years' ahead prediction (no prediction error; only mean)
predco2dt <- predict(fit, n.ahead = 36, se.fit = FALSE)
Let's plot co2dt, fitted model and prediction together:
fittedco2dt <- co2dt - fit$residuals
ts.plot(co2dt, fittedco2dt, predco2dt, col = 1:3)
The result looks very promising!
Now the final stage, is to actually map this back to the original co2 series. For fitted values, we just add back the yearly mean we have dropped off:
fittedco2 <- fittedco2dt + yearlymean
But for prediction it is more difficult, because we don't know what the yearly mean would be in the future. In this regard, our modelling, though it looks good, is not practically useful. I will talk about a better idea in another answer. To finish this session, we plot co2 with its fitted values only:
ts.plot(co2, fittedco2, col = 1:2)
Session 2: A better idea for time series modelling
In previous session, we have seen the difficulty in prediction if we separate de-trending and modelling of de-trended series. Now, we try to combine those two stages in one go.
The seasonal pattern of co2 is really strong, so we need a seasonal differencing anyway:
data(co2)
co2dt <- diff(co2, lag = 12)
par(mfrow = c(1,2)); ts.plot(co2dt); acf(co2dt)
After this seasonal differencing, co2dt does not look stationary, so we need a further non-seasonal difference.
co2dt.dif <- diff(co2dt)
par(mfrow = c(1,2)); ts.plot(co2dt.dif); acf(co2dt.dif)
The negative spikes within the season and at the seasonal lag suggest that an MA term is needed at both the non-seasonal and the seasonal level. I will not work with co2dt.dif; we can work with co2 directly:
fit <- arima0(co2, order = c(0,1,1), seasonal = c(0,1,1))
acf(fit$residuals)
Now the residuals are perfectly uncorrelated! So we have an ARIMA(0,1,1)(0,1,1)[12] model for co2 series.
As usual, fitted values are obtained by subtracting residuals from data:
co2fitted <- co2 - fit$residuals
Predictions are made by a single call to predict:
co2pred <- predict(fit, n.ahead = 36, se.fit = FALSE)
Let's plot them together:
ts.plot(co2, co2fitted, co2pred, col = 1:3)
Oh, this is just gorgeous!
Session 3: Model selection
The story should have finished by now, but I would like to make a comparison with auto.arima from the forecast package, which can automatically decide on the "best" model.
library(forecast)
autofit <- auto.arima(co2)
#Series: co2
#ARIMA(1,1,1)(1,1,2)[12]
#
#Coefficients:
# ar1 ma1 sar1 sma1 sma2
# 0.2569 -0.5847 -0.5489 -0.2620 -0.5123
#s.e. 0.1406 0.1204 0.5880 0.5701 0.4819
#
#sigma^2 estimated as 0.08576: log likelihood=-84.39
#AIC=180.78 AICc=180.97 BIC=205.5
auto.arima has chosen ARIMA(1,1,1)(1,1,2)[12], a considerably more complicated model with extra non-seasonal AR and seasonal AR/MA terms.
Our model based on step-by-step investigation suggests an ARIMA(0,1,1)(0,1,1)[12]:
fit <- arima0(co2, order = c(0,1,1), seasonal = c(0,1,1))
#Call:
#arima0(x = co2, order = c(0, 1, 1), seasonal = c(0, 1, 1))
#
#Coefficients:
# ma1 sma1
# -0.3495 -0.8515
#s.e. 0.0497 0.0254
#
#sigma^2 estimated as 0.08262: log likelihood = -85.98, aic = 177.96
The AIC values suggest our model is better. So does BIC:
BIC = -2 * loglik + log(n) * p
We have n <- length(co2) data points and p <- length(fit$coef) + 1 parameters (the additional one is for sigma^2), so our model has BIC
-2 * fit$loglik + log(n) * p
# [1] 196.5503
So auto.arima has over-fitted the data.
In fact, as soon as we see ARIMA(1,1,1)(1,1,2)[12], we strongly suspect over-fitting, because different effects "cancel" each other out. This happens with the additional seasonal MA and non-seasonal AR terms introduced by auto.arima: AR introduces positive autocorrelation, while MA introduces negative autocorrelation.
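To see this cancellation in isolation, here is a small illustrative sketch of my own (not part of the co2 analysis): fitting a redundant ARMA(1,1) to pure white noise will typically return AR and MA coefficients of similar magnitude and opposite sign, which largely cancel each other:
set.seed(123)
wn <- rnorm(500)  # white noise needs no ARMA terms at all
arima0(wn, order = c(1,0,1), include.mean = FALSE)
## expect ar1 and ma1 estimates of similar size and opposite sign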

Convergence of Bayesian time series model (BSTS)

I am starting to use the BSTS package developed by Steve Scott at Google for time series forecasting, and in most cases I set the number of MCMC iterations to a fairly large value like 500. The forecasts look reasonable, but I also wanted to understand the convergence of the model. One-step-ahead prediction error variance is often taken as a measure to judge convergence, and the bsts model returns one.step.prediction.error as part of the model object. For each MCMC iteration I estimated the one-step-ahead prediction error variance by the sum of squares and plotted it. Here is a snippet of code using the AirPassengers data, which is used for all the bsts examples in the package:
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddGeneralizedLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 5000)
pred <- predict(model, horizon = 12, burn = 100)
a <- model$one.step.prediction.error
plot(rowMeans(a ^ 2), type = "l")
Clearly the one-step prediction error more or less stays the same. Am I missing something here, or is there another way of showing model convergence? Any help would be much appreciated.

Error in arima of R: too few non-missing observations

I am using arima() and auto.arima() in R to predict sales. The data is at week level for three years.
My code looks like:
x<-c(1571,1501,895,1335,2306,930,2850,1380,975,1080,990,765,615,585,838,555,1449,615,705,465,165,630,330,825,555,720,615,360,765,1080,825,525,885,507,884,1230,342,615,1161,
1585,723,390,690,993,1025,1515,903,990,1510,1638,1461.67,1082,1075,2315,1014,2140,1572,794,1363,1184,1248,1344,1056,816,720,896,608,624,560,512,304,640,640,704,1072,768,
816,640,272,1168,736,1003,864,658.67,768,841,1727,944,848,432,704,850.67,1205,592,1104,976,629,814,1626,933.33,1100.33,1730,2742,1552,1038,826,1888,1440,1372,824,1824,1392,1424,768,464,
960,320,384,512,478,1488,384,338.67,176,624,464,528,592,288,544,418.67,336,752,400,1232,477.67,416,810.67,1256,1040,823,240,1422,704,718,1193,1541,1008,640,752,
1008,864,1507,4123,2176,899,1717,935)
length_data<-length(x)
length_train<-round(length_data*0.80)
forecast_period<-length_data-length_train
train_data<-x[1:length_train]
train_data<-ts(train_data,frequency=52,start=c(1,1))
validation_data<-x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_output<-auto.arima(train_data) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
Error:
Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
too few non-missing observations
What am I doing wrong?
And what does "too few non-missing observations" mean? I have searched the net but did not find a good explanation.
Thanks for any kind of help!
arima_output is a seasonal ARIMA model:
> arima_output
Series: train_data
ARIMA(1,0,1)(0,1,0)[52]
Arima() then attempts to refit this particular model to validation_data. But to fit a seasonal model to a time series, you need at least one full year of observations, since seasonal ARIMA depends on seasonal differencing.
As an illustration, note that Arima() will happily, and without errors, refit a time series that is twice as long as validation_data:
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(rep(validation_data,2),frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_validate <- Arima(x=validation_data,model=arima_output)
One way of dealing with this would be to force auto.arima() to use a nonseasonal model, by specifying D=0:
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_output<-auto.arima(train_data, D=0) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
So this did turn out to be more of a CrossValidated question...
Your chosen model is ARIMA(1,0,1)(0,1,0)[52]; that is, it takes a seasonal difference at lag 52. Your validation data has only 32 observations, so you cannot take seasonal differences of the validation data without knowing what the training data is.
One way around this is to fit the model to the full time series, and then extract what you want (presumably residuals from the validation portion).
You can also improve the readability of your code:
x <- ts(x, frequency=52, start=c(1,1))
length_data <- length(x)
length_train <- round(length_data*0.80)
train_data <- ts(head(x, length_train),
frequency=frequency(x), start=start(x))
validation_data <- ts(tail(x, length_data-length_train),
frequency=frequency(x), end=end(x))
library(forecast)
arima_train <- auto.arima(train_data)
arima_full <- Arima(x, model=arima_train)
res <- window(residuals(arima_full), start=start(validation_data))
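From there, a quick follow-up sketch (my addition) to summarise the out-of-sample fit from those residuals:
sqrt(mean(res^2))  # RMSE over the validation window
mean(abs(res))     # MAE over the validation window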

SARIMAX model in R

I would like to fit a SARIMAX model with temperature as an exogenous variable in R. Can I do that with the xreg argument, as provided in the TSA package?
I thought to fit the model as:
fit1 = arima(x, order=c(p,d,q), seasonal=list(order=c(P,D,Q), period=S), xreg=temp)
Is that correct, or do I have to use some other R function?
If it isn't correct, which steps should I follow?
Thanks.
Check out the forecast package, it's great:
# some random data
x <- ts(rnorm(120,0,3) + 1:120 + 20*sin(2*pi*(1:120)/12), frequency=12)
temp = rnorm(length(x), 20, 30)
require(forecast)
# build the model (check ?auto.arima)
model = auto.arima(x, xreg = data.frame(temp = temp))
# some random predictors
temp.reg = data.frame(temp = rnorm(10, 20, 30))
# forecasting
forec = forecast(model, xreg = temp.reg)
# quick way to visualize things
plot(forec)
# model diagnosis
tsdiag(model)
# model info
summary(forec)
I won't suggest you use auto.arima(). Depending on the model you want to fit, it may return poor results. For example, when working with some complex SARIMA models the difference between models fitted manually and with auto.arima() was noticeable: auto.arima() did not even return white-noise innovations (as is expected), while the manual fits, of course, did.
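If you take the manual route suggested here, a hedged sketch using forecast::Arima with hypothetical orders (you would choose p, d, q, P, D, Q from your own ACF/PACF inspection), reusing x, temp and temp.reg from the first answer:
library(forecast)
fit1 <- Arima(x, order = c(1,0,1),
              seasonal = list(order = c(0,1,1), period = 12),
              xreg = temp)
fc1 <- forecast(fit1, xreg = temp.reg$temp)
tsdiag(fit1)  # the innovations should look like white noise
plot(fc1)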
