Create Forecast and Check Accuracy - r

I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message below when I try to run accuracy as shown. Does anyone know why?
When I run the plot as shown below, its x axis runs from 0 to 400. Does anyone know how to show it as something like '2015-01-02 23:00:00.000'? I would also like to narrow the plot to, say, the last 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter the forecast to the same time series range as the ts1Test that I'm running accuracy on? Is it something like ts(fcast2, start=2001, end=8567)?
Since I'm using the zoo package, is the as.POSIXct step unnecessary? Could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator

To make your accuracy test run, you should ensure that the length of your test data ts1Test and your forecasting horizon h in fcast2<-forecast(ts1Train,h=8567) are the same. Right now you have 26567 data points versus 8567.
Following your approach, the next toy example works:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
                        ME          RMSE           MAE           MPE          MAPE          MASE          ACF1
Training set -2.821378e-04  9.932745e-01  7.990188e-01  1.003861e+02  1.007542e+02  7.230356e-01  4.638487e-02
Test set        0.02515008    1.02271839    0.86072703   99.79208174  100.14023919            NA            NA
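One more thing worth checking for the "non-numeric argument to binary operator" message, separate from the length mismatch: Value in the question keeps the character SaleDateTime column next to SaleCount. The base R sketch below assumes zoo stores a data frame as a matrix the way as.matrix() does; the sales data frame is made up for illustration.

```r
# Hypothetical two-row sample in the shape described in the question
sales <- data.frame(
  SaleDateTime = c("2015-01-02 22:00:00.000", "2015-01-02 23:00:00.000"),
  SaleCount = c(3, 4),
  stringsAsFactors = FALSE
)

# A matrix holds one type only, so mixing text and numbers gives a character matrix
mixed <- as.matrix(sales)
is.numeric(mixed)        # FALSE

# Keeping only the count column preserves the numeric type
counts_only <- as.matrix(sales["SaleCount"])
is.numeric(counts_only)  # TRUE
```

If the series passed to ts() turns out to be character, building the zoo object from SaleCount alone (with the timestamps only in order.by) is one way to avoid it.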
Concerning your other two questions:
Use of POSIX timestamps and the zoo package. You don't need them to use forecast; ts(Value, frequency) would suffice.
Plotting a time series object with datetimes as labels. The following code snippet should get you started in this direction. See the axis function, which provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), labels=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)
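The question also asks how to narrow the plot to roughly the last three months. Here is a minimal base R sketch of one way to do that, assuming the timestamps are kept in a POSIXct vector alongside the values. The names values, stamps, cutoff and recent are illustrative, the data are random, and "three months" is approximated as 90 days.

```r
set.seed(1)
values <- rnorm(24 * 120)  # 120 days of made-up hourly data
stamps <- seq(as.POSIXct("2015-01-01 00:00:00", tz = "UTC"),
              by = "hour", length.out = length(values))

# Keep only observations from the last 90 days
cutoff <- max(stamps) - 90 * 24 * 3600  # POSIXct arithmetic is in seconds
recent <- stamps >= cutoff

par(mar = c(8, 4, 1, 1))  # extra bottom margin for rotated labels
plot(stamps[recent], values[recent], type = "l",
     xaxt = "n", xlab = "", ylab = "SaleCount")

# Six evenly spaced ticks labelled with full date-times
ticks <- seq(min(stamps[recent]), max(stamps[recent]), length.out = 6)
axis(side = 1, at = as.numeric(ticks),
     labels = format(ticks, "%Y-%m-%d %H:%M:%S"), las = 2, cex.axis = 0.7)
```

Plotting against the POSIXct vector directly, rather than against a ts index, means the x positions are real times, so any subset keeps its correct labels.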

Related

Error in rep(1, n.ahead) : invalid 'times' argument in R

I'm working on a dataset to forecast with ARIMA. I'm close to the last step, but I'm getting an error and couldn't find a reference to figure out what I'm missing.
I keep getting an error message when I run the following command:
ForcastData<-forecast(fitModel,testData)
Error in rep(1, n.ahead) : invalid 'times' argument
Here is a brief overview of the work I did: I converted my dataset from a data frame to a time series and ran all the tests to check volatility and to detect whether the data are stationary.
Then I got DataAsStationary as good clean data to apply ARIMA to, but since I want to train the model on training data and test it on the other part of the data, I split the dataset into 70% training and 30% testing:
ind <-sample(2, nrow(DataAsStationary), replace = TRUE, prob = c(0.7,0.3))
traingData<- DataStationary1[ind==1,]
testData<- DataStationary1[ind==2,]
I used Automatic Selection Algorithm and found that Arima(2,0,3) is the best.
autoARIMAFastTrain1<- auto.arima(traingData, trace= TRUE, ic ="aicc", approximation = FALSE, stepwise = FALSE)
I should mention that I did check whether the residuals are uncorrelated (white noise) and dealt with it.
library(tseries)
library(astsa)
library(forecast)
After that I used the training dataset to fit the model:
fitModel <- Arima(traingData, order=c(2,0,3))
fitted(fitModel)
ForcastData<-forecast(fitModel,testData)
output <- cbind(testData, ForcastData)
accuracy(testData, ForcastData)
plot(outp)
Couldn't find any resource about the error:
Error in rep(1, n.ahead) : invalid 'times' argument
Any suggestions would really help.
I also tried
ForcastData<-forecast.Arima(fitModel,testData)
but I get the error that
forecast.Arima not found
Any idea why I get the error?
You need to specify the arguments to forecast() a little differently; since you didn't post example data, I'll demonstrate with the gold dataset in the forecast package:
library(forecast)
data(gold)
trainingData <- gold[1:554]
testData <- gold[555:1108]
fitModel <- Arima(trainingData, order=c(2, 0, 3))
ForcastData <- forecast(fitModel, testData)
# Error in rep(1, n.ahead) : invalid 'times' argument
ForcastData <- forecast(object=testData, model=fitModel) # no error
accuracy(ForcastData) # you only need to give ForcastData; see help(accuracy)
                    ME     RMSE      MAE        MPE      MAPE     MASE       ACF1
Training set 0.4751156 6.951257 3.286692 0.09488746 0.7316996 1.000819 -0.2386402
You may want to spend some time with the forecast package documentation to see what the arguments for the various functions are named and in what order they appear.
Regarding your forecast.Arima not found error, you can see this answer to a different question regarding the forecast package -- essentially that function isn't meant to be called by the user, but rather called by the forecast function.
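The error message mentions n.ahead because the second positional argument is taken as the forecast horizon, not as new data. A minimal base R sketch of the "fit on training data, predict the test period" idea follows. It swaps in stats::arima() and predict() for the forecast package to stay dependency-free; the simulated series and the AR(1) order are assumptions for illustration.

```r
set.seed(1)
series <- arima.sim(model = list(ar = 0.5), n = 120)  # made-up stationary series

# Split by time, not at random, so the test set is the future of the training set
train <- window(series, end = 100)
test  <- window(series, start = 101)

fit <- arima(train, order = c(1, 0, 0))

# The horizon is a number of steps; here, one step per test observation
pred <- predict(fit, n.ahead = length(test))

# Out-of-sample error on the held-out observations
rmse <- sqrt(mean((test - pred$pred)^2))
rmse
```

Note the split: sampling rows at random, as in the question, breaks the time ordering that an ARIMA model relies on, which is why the answer's examples use contiguous index ranges instead.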
EDIT:
After receiving your comment, it seems the following might help:
library(forecast)
# Read in the data
full_data <- read.csv('~/Downloads/onevalue1.csv')
full_data$UnixHour <- as.Date(full_data$UnixHour)
# Split the sample
training_indices <- 1:floor(0.7 * nrow(full_data))
training_data <- full_data$Lane1Flow[training_indices]
test_data <- full_data$Lane1Flow[-training_indices]
# Use automatic model selection:
autoARIMAFastTrain1 <- auto.arima(training_data, trace=TRUE, ic ="aicc",
                                  approximation=FALSE, stepwise=FALSE)
# Fit the model on the training data:
fit_model <- Arima(training_data, order=c(2, 0, 3))
# Do forecasting
forecast_data <- forecast(object=test_data, model=fit_model)
# And plot the forecasted values vs. the actual test data:
plot(x=test_data, y=forecast_data$fitted, xlab='Actual', ylab='Predicted')
# It could help more to look at the following plot:
plot(test_data, type='l', col=rgb(0, 0, 1, alpha=0.7),
     xlab='Time', ylab='Value', xaxt='n', ylim=c(0, max(forecast_data$fitted)))
ticks <- seq(from=1, to=length(test_data), by=floor(length(test_data)/4))
times <- full_data$UnixHour[-training_indices]
axis(1, lwd=0, lwd.ticks=1, at=ticks, labels=times[ticks])
lines(forecast_data$fitted, col=rgb(1, 0, 0, alpha=0.7))
legend('topright', legend=c('Actual', 'Predicted'), col=c('blue', 'red'),
       lty=1, bty='n')
I was able to run
ForcastData <- forecast(object=testData, model=fitModel)
without any error,
and now I want to plot testData and the forecast data to check whether my model is accurate.
So I did:
output <- cbind(testData, ForcastData)
plot(output)
which gave me the error:
Error in error(x, ...) :
improper length of one or more arguments to merge.xts
So when I checked ForcastData, it gave the output:
> ForcastData
        Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
2293201    -20.2831770 -308.7474 268.1810 -461.4511 420.8847
2296801    -20.1765782 -346.6400 306.2868 -519.4593 479.1061
2300401    -18.3975657 -348.8556 312.0605 -523.7896 486.9945
2304001     -2.2829565 -332.7483 328.1824 -507.6860 503.1201
2307601      2.7023277 -327.8611 333.2658 -502.8509 508.2555
2311201      4.5777316 -328.6756 337.8311 -505.0893 514.2447
2314801      4.3198927 -331.4470 340.0868 -509.1913 517.8310
2318401      3.8277285 -332.7898 340.4453 -510.9844 518.6398
2322001      1.4364973 -335.2403 338.1133 -513.4662 516.3392
2325601     -0.4013561 -337.0807 336.2780 -515.3080 514.5053
I thought I would get a list of results as long as my testData. I need a chart that shows two lines: the actual data (testData) and the expected data (ForcastData).
I have gone through a lot of documentation about forecast, but I can't find anything that explains what I want to do.
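The cbind call fails because a forecast object is a list (point forecasts, intervals, the fitted model and so on), not a plain series. Its point forecasts live in the $mean element. Below is a hedged sketch of pulling them out and drawing the two lines the question asks for. The simulated series, the AR(1) order and the 150/50 split are assumptions for illustration, not the asker's data.

```r
library(forecast)

set.seed(42)
series <- arima.sim(model = list(ar = 0.6), n = 200)  # made-up stand-in data
trainingData <- series[1:150]
testData     <- series[151:200]

fitModel <- Arima(trainingData, order = c(1, 0, 0))

# One forecast step per held-out observation
ForcastData <- forecast(fitModel, h = length(testData))

# $mean holds the point forecasts; strip the ts attributes before combining
output <- data.frame(actual    = as.numeric(testData),
                     predicted = as.numeric(ForcastData$mean))

# Actual vs. predicted on the same axes
plot(output$actual, type = "l", col = "blue", xlab = "Step", ylab = "Value",
     ylim = range(c(output$actual, output$predicted)))
lines(output$predicted, col = "red")
legend("topright", legend = c("Actual", "Predicted"),
       col = c("blue", "red"), lty = 1, bty = "n")
```

Requesting h = length(testData) steps is what makes the forecast as long as the test set; the ten rows printed in the question are just the default horizon.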

ets: Error in ets(timeseries, model = "MAM") : Nonseasonal data

I'm trying to create a forecast using an exponential smoothing method, but get the error "nonseasonal data". This is clearly not true - see code below.
Why am I getting this error? Should I use a different function (it should be able to perform simple, double, damped trend, seasonal, Winters method)?
library(forecast)
timelen<-48 # use 48 months
dates<-seq(from=as.Date("2008/1/1"), by="month", length.out=timelen)
# create seasonal data
time<-seq(1,timelen)
season<-sin(2*pi*time/12)
constant<-40
noise<-rnorm(timelen,mean=0,sd=0.1)
trend<-time*0.01
values<-constant+season+trend+noise
# create time series object
timeseries<-as.ts(x=values,start=min(dates),end=max(dates),frequency=1)
plot(timeseries)
# forecast MAM
ets<-ets(timeseries,model="MAM") # ANN works, why MAM not?
ets.forecast<-forecast(ets,h=24,level=0.9)
plot(ets.forecast)
Thanks&kind regards
You should simply use ts to create a time series from a numeric vector. See the help file for more details.
Your start and end values aren't correctly specified.
And setting the frequency to 1 does not specify a valid seasonality; it's the same as no seasonality at all.
Try:
timeseries <- ts(data=values, frequency=12)
ets <- ets(timeseries, model="MAM")
print(ets)
#### ETS(M,A,M)
#### Call:
#### ets(y = timeseries, model = "MAM")
#### ...
As for the question in your code comment: ANN works because the third N means no seasonality, so the model can be computed even with a non-seasonal time series.
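To make the start and frequency points concrete, here is a small base R sketch that rebuilds the question's series as a monthly ts. The variable names monthly and yearly are illustrative, and the seed is arbitrary.

```r
set.seed(123)
timelen <- 48
time_index <- seq_len(timelen)
values <- 40 + sin(2 * pi * time_index / 12) + 0.01 * time_index +
  rnorm(timelen, mean = 0, sd = 0.1)

# start is given in ts time units: c(year, month) when frequency = 12
monthly <- ts(values, start = c(2008, 1), frequency = 12)
frequency(monthly)  # 12 observations per seasonal cycle
start(monthly)      # 2008, month 1
end(monthly)        # 2011, month 12

# With frequency = 1 every observation is its own "cycle", so there is no season
yearly <- ts(values, frequency = 1)
frequency(yearly)
```

A seasonal ETS model such as "MAM" needs more than one observation per cycle, which the monthly object provides and the yearly one does not.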

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for a time series that has multiple seasonalities. The dataset contains hourly data points, and I wish to include daily as well as weekly seasonality in my ARIMA model. The following is the subset of the dataset I'm referring to:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind command in the above code fails because the numbers of rows in xreg1 and xreg2 are different. I even tried passing 1:length(data) to the fourier function, but that also gave me an error. If someone can correct the mistakes in the above code to produce a forecast of the next 24 hours using an ARIMA model with the minimum AIC value, it would be really helpful. It would also be great if you could include data splitting in your code by creating training and testing data sets. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data, as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because, as is, they do not reflect a Gaussian process.
So, here's how you could fit models with some form of hourly signal.
## the data are not normal, so log transform to meet assumption of Gaussian errors
ln_dat <- log(tsdata)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
## (ln_dat only supplies the time index for fourier(); as written, the models are fit to tsdata)
for (i in 1:max_F) {
  xreg <- fourier(ln_dat, K = i)
  mod_res[[i]] <- auto.arima(tsdata, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which.min(aic_tbl$AIC)
## forecast from best model (fourier(..., h=) replaces the older fourierf())
fore <- forecast(mod_res[[F_best]], xreg=fourier(ln_dat, K = F_best, h = hrs_out))
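For readers who do want both daily and weekly terms, as the question asks, the forecast package's msts class lets fourier() build both sets of regressors in one call, which avoids the mismatched cbind. A hedged sketch on simulated hourly data follows; the series, the choice K = c(2, 1) and the 24-hour holdout are assumptions for illustration, and in practice K would be chosen by comparing AIC as in the loop above.

```r
library(forecast)

set.seed(7)
n <- 24 * 7 * 4  # four weeks of made-up hourly counts
hours <- seq_len(n)
counts <- 50 + 30 * sin(2 * pi * hours / 24) +
  10 * sin(2 * pi * hours / 168) + rnorm(n, sd = 5)

# Hold out the last 24 hours as a test set
h_out <- 24
train <- msts(counts[1:(n - h_out)], seasonal.periods = c(24, 168))
test  <- counts[(n - h_out + 1):n]

# One K per seasonal period: 2 daily harmonics, 1 weekly harmonic
K <- c(2, 1)
xreg_train  <- fourier(train, K = K)             # one row per training observation
xreg_future <- fourier(train, K = K, h = h_out)  # one row per forecast step

# Seasonality is carried by the Fourier terms, so the ARIMA part is non-seasonal
fit <- auto.arima(train, xreg = xreg_train, seasonal = FALSE)
fc  <- forecast(fit, xreg = xreg_future)

# Out-of-sample error on the held-out day
rmse <- sqrt(mean((test - as.numeric(fc$mean))^2))
rmse
```

Because both regressor matrices come from the same msts object, their row counts always match the training length and the horizon respectively.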

Forecast Hourly Partitioning Data and Plotting

I am trying to forecast hourly sales based on past years of data, display the plot of the forecast with SaleDateTime on the x axis, and check the accuracy against a test set of dates. I keep running into errors.
I tried using dput to generate a small sample of data, but for some reason it still tries to output many more dates than I have in the subset sample data.
My data looks like this: SaleDateTime = "2015-01-02 23:00:00.000" and SaleCount = "1".
It looks like my main issue is with how I'm trying to partition the data into training and test sets.
I would also like the x axis on the plot to have the form "2015-03-01 23:00:00". I'm pretty new to forecasting, so all help is very much appreciated.
Code:
library("forecast")
library("zoo")
SampleData <- read.csv("SampleDataAll.csv")
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
timeseries <- ts(eventdata$SaleCount, frequency=24)
##Partitioning data Training/Testing
ts1Train <- window(timeseries,start="2011-08-01 00:00:00", end="2014-08-01 00:00:00")
Error:
Error in window.default(x, ...) : 'start' cannot be after 'end'
In addition: Warning message:
In window.default(x, ...) : 'end' value not changed
ts1Test <- window(timeseries,start="2014-08-01 01:00:00", end="2015-08-01 00:00:00")
Error in window.default(x, ...) : 'start' cannot be after 'end'
In addition: Warning message:
In window.default(x, ...) : 'end' value not changed
fcast2<-forecast(ts1Train,h=8764)
Error:
Error in forecast(ts1Train, h = 8764) : object 'ts1Train' not found
plot(fcast2)
accuracy(fcast2,ts1Test)
Error:
Error in frequency(x) : object 'ts1Test' not found
UPDATE:
I made the changes below to how I partition the training and testing data as per the suggestion. Now I'm getting the error message below when I try to run accuracy on the ts1Test data.
New Code:
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) :
non-numeric argument to binary operator
You can try splitting the data before converting it to a time series object. For example:
train <- SampleData[1:100,] # choose the first 100 rows as the training set
test <- SampleData[101:200,] # choose the following 100 rows as the testing set
There are several issues in your code:
The start parameter of the window function accepts a single number (such as a year) or a vector (such as year and month), not a date string. ?window will give more info.
Because the first window call fails, the following code, especially the forecast part, does not get the input it expects.
forecast, as the name says, is a forecasting function. You need to build a time series model (for example, using the arima function) on your training data.
I would suggest reading some time series tutorials; here is one:
https://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/
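To illustrate the point about window: once the data are in a ts with frequency=24, time is measured in days since the start, and positions are given as c(day, hour). A minimal base R sketch with a placeholder series (the values 1:240 and the 7-day split are arbitrary):

```r
# Ten days of placeholder hourly values; time runs from day 1, hour 1
hourly <- ts(1:240, frequency = 24)

# First seven days for training: from (day 1, hour 1) to (day 7, hour 24)
ts1Train <- window(hourly, start = c(1, 1), end = c(7, 24))

# Remaining three days for testing
ts1Test <- window(hourly, start = c(8, 1))

length(ts1Train)  # 168
length(ts1Test)   # 72
```

Unlike indexing with [ ], window keeps the ts attributes, so the test set still knows it starts on day 8.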

using forecast accuracy function

I'm using the forecast command on my time series.
When using the accuracy function, I get strange errors and results that I don't understand.
For example, when I do the following:
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)), frequency=100)
fcast <- forecast(sinData)
accuracy(fcast, sinData)
I get the error:
Error in window.default(x, ...) : 'start' cannot be after 'end'
My first question is why do I get this error?
When I do the following:
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)), frequency=100)
fcast <- forecast(sinData)
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)))
accuracy(fcast, sinData)
I get:
                        ME         RMSE          MAE        MPE      MAPE         MASE ACF1 Theil's U
Training set -7.570495e-18 1.080575e-15 6.783189e-16 -0.1144996  1.493135 1.065851e-15   NA        NA
Test set      5.669237e-01 5.669761e-01 5.669237e-01 85.9202023 85.920202 7.316031e+14 -0.5  11.86708
My second and main question is why do I get completely different errors between the "training set" and "test set", while clearly I use exactly the same data.
The second argument in accuracy should be future data of the same period as the forecasts. Putting the historical data there will cause an error because it is from before the period of the forecasts.
In the second example you have tried to fool accuracy by changing the frequency attribute. accuracy will try to find the overlapping window of observations between the forecasts and the "future" data you have passed in the second argument. In this case, only a subset of the data are used because of the different frequency attributes.
The example in the help file for accuracy() shows how to use the function properly.
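The pattern described here (hold out the end of the series, forecast it, then compare) might look like the sketch below for the sine data in the question. The 8-cycle/2-cycle split is an assumption chosen for illustration.

```r
library(forecast)

sinData <- ts(sin(2 * pi * seq(from = 0.01, to = 10, by = 0.01)), frequency = 100)

# Train on the first 8 cycles, hold out the last 2 as "future" data
train <- window(sinData, end = c(8, 100))
test  <- window(sinData, start = c(9, 1))

# Forecast exactly as many steps as were held out
fcast <- forecast(train, h = length(test))

# The test set now covers the same period as the forecasts
acc <- accuracy(fcast, test)
acc
```

Here the "Training set" row measures in-sample fit and the "Test set" row measures genuine out-of-sample error, so the two rows are expected to differ.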

Resources