Forecast Hourly Partitioning Data and Plotting - r

I am trying to forecast hourly sales based on past years of data, display the plot of the forecast with SaleDateTime on the x axis, and check the accuracy against a test set of dates. I keep running into errors.
I tried using dput to generate a small sample of the data, but for some reason it still tries to output many more dates than I have in the subset sample data.
My data looks like this: SaleDateTime = "2015-01-02 23:00:00.000" and SaleCount = "1".
It looks like my main issue is with how I'm trying to partition the data into training and test sets.
Also, I would like the x axis on the plot to have the form "2015-03-01 23:00:00". I'm pretty new to forecasting so all help is very much appreciated.
Code:
library("forecast")
library("zoo")
SampleData <- read.csv("SampleDataAll.csv")
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
timeseries <- ts(eventdata$SaleCount, frequency=24)
##Partitioning data Training/Testing
ts1Train <- window(timeseries,start="2011-08-01 00:00:00", end="2014-08-01 00:00:00")
Error:
Error in window.default(x, ...) : 'start' cannot be after 'end'
In addition: Warning message:
In window.default(x, ...) : 'end' value not changed
ts1Test <- window(timeseries,start="2014-08-01 01:00:00", end="2015-08-01 00:00:00")
Error in window.default(x, ...) : 'start' cannot be after 'end'
In addition: Warning message:
In window.default(x, ...) : 'end' value not changed
fcast2<-forecast(ts1Train,h=8764)
Error:
Error in forecast(ts1Train, h = 8764) : object 'ts1Train' not found
plot(fcast2)
accuracy(fcast2,ts1Test)
Error:
Error in frequency(x) : object 'ts1Test' not found
UPDATE:
I made the changes below to how I partition the training and testing data as per the suggestion. Now I'm getting the error message below when I try to run accuracy on the ts1Test data.
New Code:
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) :
non-numeric argument to binary operator

You can try splitting the data before converting it to a time series object. For example:
train <- SampleData[1:100,] # choose the first 100 rows as the training set
test <- SampleData[101:200,] # choose the following 100 rows as the test set
There are several issues in your code:
The start and end arguments of window are expressed in the time units of the series: a single number, or a vector of two numbers (for example, year and month for monthly data). They do not accept date-time strings. See ?window for more information.
Because the first window call fails, ts1Train is never created, so the code that follows, especially the forecast call, does not receive the input it expects.
forecast, as the name suggests, is a forecasting function. You should first build a time series model (for example, using the arima function) on your training data.
I would suggest you read some time series tutorials, and here is the one:
https://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/
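As a rough illustration of the points above, the following sketch splits a series with window using numeric times, then fits a model before forecasting. It assumes made-up Poisson counts in place of the real sales data, uses base R's arima and predict rather than the forecast package, and picks an AR(1) order purely for illustration.

```r
set.seed(1)

# Hypothetical hourly counts: 10 days x 24 hours (stand-in for SaleCount)
sales  <- rpois(240, lambda = 5)
hourly <- ts(sales, frequency = 24)  # time is measured in days, starting at day 1

# window() takes times in the series' own units: c(day, hour-within-day)
train <- window(hourly, end   = c(8, 24))  # days 1 to 8
test  <- window(hourly, start = c(9, 1))   # days 9 and 10

# Fit a model on the training data only (AR(1) is an arbitrary choice here)
fit <- arima(train, order = c(1, 0, 0))

# Forecast exactly as many steps as there are test observations
pred <- predict(fit, n.ahead = length(test))$pred

# Root mean squared error on the held-out data
rmse <- sqrt(mean((as.numeric(test) - as.numeric(pred))^2))
print(c(train = length(train), test = length(test)))
print(rmse)
```

Because the series was created with frequency=24 and no start, its clock begins at 1 and counts days, which is why date strings such as "2011-08-01 00:00:00" cannot be used as window boundaries.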

Related

Forecasting currencies with ARIMA (error occurred: 'ts' object must have one or more observations)

Hi I tried to plot future values with ARIMA but this error happened: 'ts' object must have one or more observations.
Import dataset:
library(priceR)
a <- historical_exchange_rates("THB", to = "USD",start_date = "2010-01-01", end_date = "2021-12-31")
Making a timeseries dataset:
library(forecast)
a.dfts <- ts(a$one_THB_equivalent_to_x_USD, frequency=12, start=c(2010,01,01), end=c(2021,12,31))
Apply ARIMA model:
library(TSA)
loga <- diff(log(a.dfts))
finala.arma <- arima(loga, order = c(0,0,2))
Forecasting future values:
forca.ARIMA <- forecast(finala.arma, h=24, level=c(80,95))
Then this is where I got an error.
Error in ts(x) : 'ts' object must have one or more observations

tsclean gives error message although data is definitely ts

I'm a beginner learning time series analysis.
I want to clean up the data I will be using for ARIMA. However, tsclean gives me an error message:
("Error in stl(xx, s.window = "periodic", robust = TRUE) : only univariate series are allowed")
And I don't understand why.
My code looks like this:
read.csv("FILENAME.csv",header=TRUE)
mydata<-read.csv("FILENAME.csv")
mydata2<-ts(mydata,freq=12,start=c(2011,1))
class(mydata2) # the result here is "ts"
mydataclean<-tsclean(mydata2) # the result is "Error in stl(xx, s.window = "periodic", robust = TRUE) : only univariate series are allowed"
The file FILENAME.csv is composed of just one column with data. The data is for full 7 years monthly.
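One likely cause can be sketched in base R, under assumptions: read.csv returns a data frame, and calling ts on a data frame keeps a matrix shape even when there is only one column, while stl (the function named in the error message) rejects matrices. The example below uses simulated values in place of FILENAME.csv, and the column name value is hypothetical.

```r
set.seed(2)

# Hypothetical stand-in for the CSV: one column, 7 years of monthly data
mydata <- data.frame(value = rnorm(84))

# ts() on a data frame keeps the matrix shape, even with a single column
as_matrix <- ts(mydata, frequency = 12, start = c(2011, 1))
print(is.matrix(as_matrix))  # TRUE

# Passing the column itself gives a plain univariate series
as_vector <- ts(mydata[[1]], frequency = 12, start = c(2011, 1))
print(is.matrix(as_vector))  # FALSE

# stl() rejects the matrix version; capture the message instead of stopping
matrix_error <- tryCatch(
  stl(as_matrix, s.window = "periodic", robust = TRUE),
  error = function(e) conditionMessage(e)
)
print(matrix_error)

# The vector version decomposes without complaint
fit <- stl(as_vector, s.window = "periodic", robust = TRUE)
```

If this is indeed the cause, building the series from mydata[[1]] (or mydata[, 1]) and passing that to tsclean would be the thing to try.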

Create Forecast and Check Accuracy

I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message below, when I try to run the accuracy as shown. Does anyone know why I'm getting the error message below?
When I run the plot as shown below it has an x axis from 0 to 400, does anyone know how to show that as something like '2015-01-02 23:00:00.000'? I would also like to narrow the plot to the last say 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter for the same timeseries range with the forecast as the ts1Test that I'm trying to run accuracy on, is it something like ts(fcast2, start=2001, end = 8567) ?
Since I'm using the zoo package is the as.POSIXct step unnecessary, could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator
To make your accuracy test run, ensure that the length of your test data ts1Test matches the forecasting horizon h in fcast2<-forecast(ts1Train,h=8567). At the moment you have 26567 data points versus a horizon of 8567.
Following your approach, the next toy example will work:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
                        ME          RMSE           MAE           MPE          MAPE          MASE          ACF1
Training set -2.821378e-04  9.932745e-01  7.990188e-01  1.003861e+02  1.007542e+02  7.230356e-01  4.638487e-02
Test set        0.02515008    1.02271839    0.86072703   99.79208174  100.14023919            NA            NA
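To make the length requirement concrete, here is a minimal sketch that computes a few of the same test-set measures by hand. The helper test_metrics is hypothetical (it is not part of the forecast package), and the "forecast" is just the training mean repeated, chosen only to keep the example in base R.

```r
# Hypothetical helper: error measures for a forecast against held-out data.
# It refuses non-numeric input and mismatched lengths up front.
test_metrics <- function(actual, predicted) {
  stopifnot(is.numeric(actual), is.numeric(predicted),
            length(actual) == length(predicted))
  err <- actual - predicted
  c(ME = mean(err), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

set.seed(123)
Value <- rnorm(1100)
train <- Value[1:1000]
test  <- Value[1001:1100]

# Size the horizon from the test set instead of hard-coding it
h <- length(test)

# Naive stand-in forecast: the training mean repeated h times
naive_fc <- rep(mean(train), h)

metrics <- test_metrics(test, naive_fc)
print(metrics)
```

Deriving h from length(test) keeps the two in step when the split point changes, and the is.numeric check surfaces a character column early rather than as an arithmetic error later on.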
Concerning your other two questions:
Use of POSIX timestamps and the zoo package. You don't need them to use forecast; ts(Value, frequency = 24) would suffice.
Plotting a time series object with datetimes as labels. The following code snippet should get you started in this direction. Look at the axis function, which provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), label=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)
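Building on the snippet above, the following sketch narrows the plot to the most recent observations and labels the ticks with full timestamps. The 240-hour window, the 48-hour tick spacing, and the UTC time zone are arbitrary assumptions for illustration; the data are simulated.

```r
set.seed(42)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = length(Value))

# Keep only the last 240 hours (10 days) for plotting
n_last <- 240
idx <- seq(length(Value) - n_last + 1, length(Value))

# One tick every 48 hours, labelled with the full date and time
ticks <- seq(1, n_last, by = 48)
tick_labels <- format(rDateTime[idx][ticks], "%Y-%m-%d %H:%M:%S")

# Draw to a null device when not interactive (avoids writing Rplots.pdf)
if (!interactive()) pdf(NULL)

par(mar = c(10, 4, 1, 1))  # extra bottom margin for the rotated labels
plot(Value[idx], type = "l", xlab = "", ylab = "Value", xaxt = "n")
axis(side = 1, at = ticks, labels = tick_labels, las = 2, cex.axis = 0.7)
```

For roughly three months of hourly data, n_last would be on the order of 24 * 90, with a correspondingly wider tick spacing.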

Problem when doing pre-whitening before ccf analysis

I have the following R code, which does not work when I try to pre-whiten one series using the model fitted to the other series.
-- Libraries;
library(forecast);
library(TSA);
library(xts);
-- Read from csv;
....
-- Do transforms;
Power=xts(data1[2],seq(from=as.Date("2011-01-01"), to=as.Date("2013-09-18"),by="day"),frequency=7);
Temp=xts(data2[1],seq(from=as.Date("2011-01-01"), to=as.Date("2013-09-18"),by="day"),frequency=7);
-- Prewhiten for CCF;
mod1=Arima(Temp,order=c(2,0,1),seasonal=list(order=c(1,1,1)));
Box.test(mod1$residuals,lag=365,type=c("Ljung-Box"));
x_series=mod1$residuals;
y_filtered=residuals(Arima(Power,model=mod1));
The last part does not work; I get this error:
Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
wrong length for 'fixed'
What goes wrong here?
Arima and stats::arima both require ts objects. The error is caused by xts objects being used. Try this instead:
Power <- ts(data1[2], frequency=7)
Temp <- ts(data2[1], frequency=7)
mod1 <- Arima(Temp,order=c(2,0,1),seasonal=c(1,1,1))
Box.test(residuals(mod1),lag=365,type=c("Ljung-Box"))
x_series <- residuals(mod1)
y_filtered <- residuals(Arima(Power,model=mod1))
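For readers who want to see the whole pre-whitening idea end to end, here is a sketch in base R on simulated data. It is not the code from the answer: instead of Arima(Power, model=mod1) it re-applies the fitted coefficients through the fixed argument of stats::arima, the same argument the error message in the question mentions. The AR(1) order, the 3-step lag, and the series themselves are assumptions made for illustration.

```r
set.seed(7)
n <- 300

# Simulated input series x, and an output y that follows x with a 3-step delay
x <- arima.sim(model = list(ar = 0.6), n = n)
y <- ts(0.8 * c(rep(0, 3), x[1:(n - 3)]) + rnorm(n, sd = 0.5))

# Step 1: fit a model to x and keep its residuals (the whitened x)
fit_x   <- arima(x, order = c(1, 0, 0))
x_white <- residuals(fit_x)

# Step 2: filter y with the SAME coefficients by fixing every parameter.
# Both inputs are plain ts objects, so 'fixed' has the expected length.
fit_y <- arima(y, order = c(1, 0, 0),
               fixed = coef(fit_x), transform.pars = FALSE)
y_filtered <- residuals(fit_y)

# Step 3: cross-correlate the two filtered series
cc <- ccf(x_white, y_filtered, lag.max = 10, plot = FALSE)
peak_lag <- cc$lag[which.max(abs(cc$acf))]
print(peak_lag)
```

In ccf(x, y), lag k estimates the correlation between x[t+k] and y[t], so a delayed response of y to x shows up at a negative lag.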

Error when doing linear regression using zoo objects ... Error in `$<-.zoo`(`*tmp*`

I am new to R and slowly getting acquainted. My question refers to the following piece of code.
I am creating a zoo object with the following headers and then filtering by date. On the filtered dates I am subtracting one column from another (Elena from Tom). Everything works fine up to here.
Code below:
b <- read.zoo(b1, header = TRUE, index.column = 1, format = "%d/%m/%Y")
startDate = "2013/11/02"
endDate = "2013/12/20"
dates <- seq(as.Date(startDate), as.Date(endDate), by=1)
TE = b[dates]$Tom - b[dates]$Elena
I then regress the result of the subtraction (TE above) on Elena. However, I get an error message every time I try to run this regression:
TE$model <- lm(TE ~ b[dates]$Elena)
Error in $<-.zoo(*tmp*, "model", value = list(coefficients = c(-0.0597128230859905, :
not possible for univariate zoo series
I have tried creating a data frame and then doing the regression, but to no avail. Any help would be appreciated. Thanks.
You cannot add the outcome of a regression (a list of class lm) to a time series of class zoo.
I recommend saving the model in a separate object, e.g.,
fit <- lm(TE ~ b[dates]$Elena)
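A minimal sketch of that suggestion, using a plain data frame with simulated values in place of the zoo object (the numbers are made up; only the column names Tom and Elena and the date range come from the question):

```r
set.seed(3)

# Hypothetical data covering the same date range as the question
dates  <- seq(as.Date("2013-11-02"), as.Date("2013-12-20"), by = 1)
prices <- data.frame(
  date  = dates,
  Tom   = 100 + cumsum(rnorm(length(dates))),
  Elena = 100 + cumsum(rnorm(length(dates)))
)

# The spread is an ordinary numeric column
prices$TE <- prices$Tom - prices$Elena

# Keep the model in its own object rather than inside the data
fit <- lm(TE ~ Elena, data = prices)
print(coef(fit))

# Fitted values and residuals can still be copied back as plain columns
prices$fitted <- fitted(fit)
prices$resid  <- residuals(fit)
```

The model object stays separate, while anything numeric derived from it (fitted values, residuals) can sit alongside the original series.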
