Error in arima of R: too few non-missing observations

I am using arima() and auto.arima() in R to forecast sales. The data is weekly and covers three years.
My code looks like this:
x<-c(1571,1501,895,1335,2306,930,2850,1380,975,1080,990,765,615,585,838,555,1449,615,705,465,165,630,330,825,555,720,615,360,765,1080,825,525,885,507,884,1230,342,615,1161,
1585,723,390,690,993,1025,1515,903,990,1510,1638,1461.67,1082,1075,2315,1014,2140,1572,794,1363,1184,1248,1344,1056,816,720,896,608,624,560,512,304,640,640,704,1072,768,
816,640,272,1168,736,1003,864,658.67,768,841,1727,944,848,432,704,850.67,1205,592,1104,976,629,814,1626,933.33,1100.33,1730,2742,1552,1038,826,1888,1440,1372,824,1824,1392,1424,768,464,
960,320,384,512,478,1488,384,338.67,176,624,464,528,592,288,544,418.67,336,752,400,1232,477.67,416,810.67,1256,1040,823,240,1422,704,718,1193,1541,1008,640,752,
1008,864,1507,4123,2176,899,1717,935)
length_data<-length(x)
length_train<-round(length_data*0.80)
forecast_period<-length_data-length_train
train_data<-x[1:length_train]
train_data<-ts(train_data,frequency=52,start=c(1,1))
validation_data<-x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_output<-auto.arima(train_data) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
Error:
Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
too few non-missing observations
What am I doing wrong?
What does "too few non-missing observations" mean? I have searched the net, but did not find a good explanation.
Thanks for any kind of help!

arima_output is a seasonal ARIMA model:
> arima_output
Series: train_data
ARIMA(1,0,1)(0,1,0)[52]
Arima() then attempts to refit this particular model to validation_data. But to fit a seasonal model to a time series, you need at least one full year of observations, since seasonal ARIMA depends on seasonal differencing.
As an illustration, note that Arima() will happily and without errors refit a time series that is twice as long as validation_data:
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(rep(validation_data,2),frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_validate <- Arima(x=validation_data,model=arima_output)
One way of dealing with this would be to stop auto.arima() from using seasonal differencing, by specifying D=0 (seasonal=FALSE would exclude seasonal terms altogether):
validation_data <- x[(length_train+1):length_data]
validation_data<-ts(validation_data,frequency=52,
start=c(ceiling((length_train)/52),((length_train)%%52+1)))
arima_output<-auto.arima(train_data, D=0) # fit the ARIMA Model
arima_validate <- Arima(x=validation_data,model=arima_output)
So this did turn out to be more of a CrossValidated question...

Your chosen model is ARIMA(1,0,1)(0,1,0)[52]. That is, it has a seasonal difference of lag 52. Your validation data has 32 observations. So you cannot take the seasonal differences on the validation data without knowing what the training data is.
One way around this is to fit the model to the full time series, and then extract what you want (presumably residuals from the validation portion).
You can also improve the readability of your code:
x <- ts(x, frequency=52, start=c(1,1))
length_data <- length(x)
length_train <- round(length_data*0.80)
train_data <- ts(head(x, length_train),
frequency=frequency(x), start=start(x))
validation_data <- ts(tail(x, length_data-length_train),
frequency=frequency(x), end=end(x))
library(forecast)
arima_train <- auto.arima(train_data)
arima_full <- Arima(x, model=arima_train)
res <- window(residuals(arima_full), start=start(validation_data))
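The seasonal-differencing argument can be checked in base R without fitting any model. The sketch below uses made-up random values, not the sales data: a lag-52 difference of 32 observations leaves nothing to fit, while doubling the series to 64 observations leaves 12.

```r
# Made-up data: only the lengths matter here, not the values.
set.seed(42)
short_series   <- rnorm(32)        # same length as the validation set
doubled_series <- rnorm(64)        # twice as long, as in the rep(..., 2) illustration

# A seasonal difference at lag 52 needs more than 52 observations.
n_short   <- length(diff(short_series, lag = 52))
n_doubled <- length(diff(doubled_series, lag = 52))

cat("usable observations (32 values):", n_short, "\n")
cat("usable observations (64 values):", n_doubled, "\n")
```

With zero usable observations after differencing, arima() has nothing to compute a likelihood from, which is consistent with the "too few non-missing observations" message.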

Related

Error in arima model for time series model of R [duplicate]

I am working on a project to forecast store sales in order to learn forecasting. Until now I have successfully used the simple auto.arima() function for forecasting. To make these forecasts more accurate, I want to use covariates. I have defined covariates such as holidays and promotions, which affect store sales, using the xreg argument with the help of this post:
How to setup xreg argument in auto.arima() in R?
But my code fails at line:
ARIMAfit <- auto.arima(saledata, xreg=covariates)
and gives error saying:
Error in model.frame.default(formula = x ~ xreg, drop.unused.levels = TRUE) : variable lengths differ (found for 'xreg') In addition: Warning message: In !is.na(x) & !is.na(rowSums(xreg)) : longer object length is not a multiple of shorter object length
Below is link to my Dataset: https://drive.google.com/file/d/0B-KJYBgmb044blZGSWhHNEoxaHM/view?usp=sharing
This is my code:
data = read.csv("xdata.csv")[1:96,]
View(data)
saledata <- ts(data[1:96,4],start=1)
View(saledata)
saledata[saledata == 0] <- 1
View(saledata)
covariates = cbind(DayOfWeek=model.matrix(~as.factor(data$DayOfWeek)),
Customers=data$Customers,
Open=data$Open,
Promo=data$Promo,
SchoolHoliday=data$SchoolHoliday)
View(head(covariates))
# Remove intercept
covariates <- covariates[,-1]
View(covariates)
require(forecast)
ARIMAfit <- auto.arima(saledata, xreg=covariates) # HERE IS THE ERROR LINE
summary(ARIMAfit)
Also, please tell me how I can forecast the next 48 days. I know how to forecast using simple auto.arima() and n.ahead, but I don't know how to do it when xreg is used.
A few points. First, you can just convert the entire matrix to a ts object and then isolate the variables later. Second, if you are using covariates in your ARIMA model, then you will need to provide them when you forecast out-of-sample. This may mean forecasting each of the covariates before generating forecasts for your variable of interest. In the example below I split the data into two samples for simplicity.
dta = read.csv("xdata.csv")[1:96,]
dta <- ts(dta, start = 1)
# to illustrate out of sample forecasting with covariates lets split the data
train <- window(dta, end = 90)
test <- window(dta, start = 91)
# fit model
covariates <- c("DayOfWeek", "Customers", "Open", "Promo", "SchoolHoliday")
fit <- auto.arima(train[,"Sales"], xreg = train[, covariates])
# forecast
fcast <- forecast(fit, xreg = test[, covariates])
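For the n.ahead-style workflow mentioned in the question, base R's stats::arima() and predict() follow the same pattern: the regressors go in xreg at fit time, and their future values go in newxreg at prediction time. This is a minimal sketch on simulated data; promo and sales are made-up series, and the AR(1) order is fixed by hand rather than chosen by auto.arima().

```r
set.seed(1)
n <- 96   # length of the training sample
h <- 48   # forecast horizon in days

# Made-up covariate and response: a 0/1 promotion flag that lifts sales by 20.
promo <- rbinom(n + h, 1, 0.3)
sales <- as.numeric(100 + 20 * promo + arima.sim(list(ar = 0.5), n = n + h))

train_y  <- sales[1:n]
train_x  <- cbind(promo = promo[1:n])
future_x <- cbind(promo = promo[(n + 1):(n + h)])  # must be known or forecast separately

# Fit with the covariate, then predict h steps ahead with its future values.
fit  <- arima(train_y, order = c(1, 0, 0), xreg = train_x)
pred <- predict(fit, n.ahead = h, newxreg = future_x)

print(head(pred$pred))
```

The number of rows in newxreg has to equal n.ahead, and its columns have to match those used in xreg when the model was fitted.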

Convergence of Bayesian time series model (BSTS)

I am starting to use the bsts package developed by Steve Scott at Google for time series forecasting. In most cases I set the number of MCMC iterations to 500. The forecasts look reasonable, but I also want to understand the convergence of the model. The one-step-ahead prediction error variance is often taken as a measure to judge convergence. The bsts model returns one.step.prediction.error as part of the model object. For each MCMC iteration I estimated the one-step-ahead prediction error variance by the mean of squares and plotted that. Here is a snippet of code with the AirPassengers data, which is used for all the bsts examples in the package:
library(bsts)
data(AirPassengers)
y <- log(AirPassengers)
ss <- AddGeneralizedLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)
model <- bsts(y, state.specification = ss, niter = 5000)
pred <- predict(model, horizon = 12, burn = 100)
a <- model$one.step.prediction.error
plot(rowMeans(a ^ 2), type = "l")
Clearly the one-step prediction error more or less stays the same. Am I missing something here, or is there another way of showing model convergence? Any help would be much appreciated.

Generation of ARIMA.sim

I have generated an ARIMA Model for data I have and need to simulate the model generated into the future by 10 years (approximately 3652 days as the data is daily). This was the best fit model for the data generated by auto.arima, my question is how to simulate it into the future?
mydata.arima505 <- arima(d.y, order=c(5,0,5))
The forecast package has the simulate.Arima() function which does what you want. But first, use the Arima() function rather than the arima() function to fit your model:
library(forecast)
mydata.arima505 <- Arima(d.y, order=c(5,0,5))
future_y <- simulate(mydata.arima505, 100)
That will simulate 100 future observations conditional on the past observations using the fitted model.
If your question is how to simulate a specific ARIMA process, you can use the function arima.sim(). But I am not sure if that really is what you want. Usually you would use your model for predictions.
library(forecast)
# True Data Generating Process
y <- arima.sim(model=list(ar=0.4, ma = 0.5, order =c(1,0,1)), n=100)
# Fit an ARIMA model
fit <- auto.arima(y)
# Use the estimates for a simulation
# (assumes auto.arima() selected ar1 and ma1 terms; otherwise these are NA)
arima.sim(list(ar = fit$coef["ar1"], ma = fit$coef["ma1"]), n = 50)
# Use the model to make predictions
predicted_values <- predict(fit, n.ahead = 50)

Different no of tuples for the prediction model and test set data in SVM

I have a dataset with two columns as shown below, where Column 1, timestamp, is a particular point in time and Column.10 gives the total power usage at that instant. There are 81502 instances in total.
I'm doing support vector regression on this data in R using the e1071 package to predict the future usage of power. The code is given below. I first divided the dataset into training and test data. Then using the training data modeled the data using the svm function and then predict the power usage for the testset.
library(e1071)
attach(data.csv)
index <- 1:nrow(data.csv)
testindex <- sample(index,trunc(length(index)/3))
testset <- na.omit(data.csv[testindex, ])
trainingset <- na.omit(data.csv[-testindex, ])
model <- svm(Column.10 ~ timestamp, data=trainingset)
prediction <- predict(model, testset[,-2])
tab <- table(pred = prediction, true = testset[,2])
However, when I try to make a confusion matrix from the prediction, I'm getting the error:
Error in table(pred = prediction, true = testset[, 2]) : all arguments must have the same length
So I tried to find the length of the two arguments and found that
the length(prediction) to be 81502
and the length(testset[,2]) to be 27167
Since I had done the prediction only for the testset, I don't know how a prediction was made for 81502 values. Why does the number of values differ between the prediction and the testset? How is the power value for the entire dataset getting predicted even though I passed only the testset?
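Change
prediction <- predict(model, testset[,-2])
to
prediction <- predict(model, testset)
Because testset has only two columns, testset[,-2] drops to a plain vector, so predict() no longer receives a data frame with a timestamp column; timestamp is then presumably taken from the attached full data set, which would explain the 81502 predictions.
However, you should not use table() when doing regression; use the MSE instead.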
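Both points can be tried on a toy data frame. This is a sketch with made-up numbers, not the power-usage data: it shows that removing one column from a two-column data frame returns a vector unless drop = FALSE is given, and how the MSE (and RMSE) of a set of regression predictions is computed in base R.

```r
# Made-up two-column data frame standing in for the real data set.
toy <- data.frame(timestamp = 1:4, usage = c(3.1, 2.4, 5.0, 4.2))

# Removing one of two columns silently drops to a vector ...
is_df_dropped <- is.data.frame(toy[, -2])
# ... unless drop = FALSE is requested.
is_df_kept    <- is.data.frame(toy[, -2, drop = FALSE])

# Made-up predictions for the four rows.
predicted <- c(2.9, 2.8, 4.5, 4.0)

# Mean squared error and its square root.
mse  <- mean((toy$usage - predicted)^2)
rmse <- sqrt(mse)

cat("MSE:", mse, " RMSE:", rmse, "\n")
```

Unlike a confusion matrix, these measures make sense for a continuous response, where predicted and observed values almost never coincide exactly.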

Forecasting ARIMA with xreg

I'm trying to forecast time in time out ("TiTo") for someone ordering food at a restaurant using the code below. TiTo is the total time it takes someone from the time they walk through the door to the time they get their food. TimeTT is the time the customer spends talking to the waiter. I believe TimeTT is a predictor of TiTo and I would like to use it as a covariate in the forecast for TiTo. I've read some about ARIMA, and as I understand it you add the predictors to the model in the xreg parameter. I'm thinking of the xreg parameter as something like the independent variable for a regression model, like lm(TiTo ~ TimeTT). Is this the correct way to think of the xreg parameter? Also what does the error message below mean? Do I need to convert TimeTT into a time series to use it in the xreg parameter? I'm new to forecasting so all help is very appreciated.
Forecast Attempt:
OV<-zoo(SampleData$TiTo, order.by=SampleData$DateTime)
eData <- ts(OV, frequency = 24)
Train <-eData[1:15000]
Test <- eData[15001:20809]
Arima.fit <- auto.arima(Train)
Acast<-forecast(Arima.fit, h=5808, xreg = SampleData$TimeTT)
Error:
Error in if (ncol(xreg) != ncol(object$call$xreg)) stop("Number of regressors does not match fitted model") :
argument is of length zero
Data:
dput(Train[1:5])
c(1152L, 1680L, 1680L, 968L, 1680L)
dput(SampleData[1,]$TimeTT)
structure(1156L, .Label = c("0.000000", "0.125000", "0.142857",
"96.750000", "97.800000", "99.000000", "99.600000", "NULL"), class = "factor")
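You need to supply xreg when you estimate the model itself, and the regressors need to be available (or forecast) for the forecast period as well. The regressor must be numeric and cover the same observations as Train; your dput() output shows that TimeTT is currently stored as a factor. So this will look something like:
TimeTT <- as.numeric(as.character(SampleData$TimeTT)) # factor to numeric; the "NULL" level becomes NA
Arima.fit <- auto.arima(Train, xreg = TimeTT[1:15000])
forecast(Arima.fit, h = 508, xreg = NewData$TimeTT)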
Here is an example using Arima and xreg from Rob Hyndman; auto.arima works analogously.
n <- 2000
m <- 200
y <- ts(rnorm(n) + (1:n)%%100/30, f=m)
library(forecast)
fit <- Arima(y, order=c(2,0,1), xreg=fourier(y, K=4))
plot(forecast(fit, h=2*m, xreg=fourier(y, K=4, h=2*m)))
Hope this helps.
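On the question of whether xreg can be thought of like the predictor in lm(TiTo ~ TimeTT): as far as the regression part goes, yes. A model with xreg is a regression whose errors follow an ARIMA process. A minimal sketch with base R's stats::arima() on simulated data (x and y are made-up variables, not the restaurant data): with order c(0,0,0), so that the errors are plain white noise, the estimated slope should essentially match lm().

```r
set.seed(123)
n <- 200
x <- rnorm(n)                 # made-up predictor
y <- 5 + 2 * x + rnorm(n)     # made-up response with true slope 2

# Ordinary regression.
fit_lm <- lm(y ~ x)

# Regression with ARIMA(0,0,0) errors, i.e. white-noise errors.
fit_arima <- arima(y, order = c(0, 0, 0), xreg = cbind(x = x))

slope_lm    <- unname(coef(fit_lm)["x"])
slope_arima <- unname(coef(fit_arima)["x"])

cat("lm slope:   ", slope_lm, "\n")
cat("arima slope:", slope_arima, "\n")
```

With a non-trivial ARIMA order the slope is still interpreted the same way, but it is estimated jointly with the autocorrelation structure of the errors.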
