using forecast accuracy function - r

I'm using the forecast command on my time series.
When using the accuracy function, I get strange errors and results that I don't understand.
For example, when I do the following:
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)), frequency=100)
fcast <- forecast(sinData)
accuracy(fcast, sinData)
I get the error:
Error in window.default(x, ...) : 'start' cannot be after 'end'
My first question is why do I get this error?
However, when I do the following:
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)), frequency=100)
fcast <- forecast(sinData)
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)))
accuracy(fcast, sinData)
I get:
ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
Training set -7.570495e-18 1.080575e-15 6.783189e-16 -0.1144996 1.493135 1.065851e-15 NA NA
Test set 5.669237e-01 5.669761e-01 5.669237e-01 85.9202023 85.920202 7.316031e+14 -0.5 11.86708
My second and main question: why do I get completely different error measures for the "Training set" and the "Test set", when I am clearly using exactly the same data?

The second argument in accuracy should be future data of the same period as the forecasts. Putting the historical data there will cause an error because it is from before the period of the forecasts.
In the second example you have tried to fool accuracy by changing the frequency attribute. accuracy will try to find the overlapping window of observations between the forecasts and the "future" data you have passed in the second argument. In this case, only a subset of the data are used because of the different frequency attributes.
The example in the help file for accuracy() shows how to use the function properly.
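In other words, hold back the end of the series as the "future" data, forecast over that horizon, and then compare. A minimal sketch of that workflow (the split point is arbitrary):
library(forecast)
sinData <- ts(sin(2*pi*seq(from=0.01, to=10, by=0.01)), frequency=100)
sinTrain <- window(sinData, end=c(8, 100))    # first 8 cycles used for fitting
sinTest  <- window(sinData, start=c(9, 1))    # held-out observations after the training period
fcast <- forecast(sinTrain, h=length(sinTest))
accuracy(fcast, sinTest)                      # the test data now lie after the training period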

Related

Fama MacBeth regression pmg function error in R

I've been trying to run a Fama MacBeth regression using the pmg function on my data "Dev_Panel", but I keep getting this error message:
Error in pmg(BooktoMarket ~ Returns + Profitability + BEtoMEpersistence, :
Insufficient number of time periods
I've read in other posts on here that this could be due to NAs in the data. But I've already removed these from the panel.
Additionally, I've used the pmg function on the data frame "Em_Panel" for which I have undertaken the exact same data cleaning measures as for the "Dev_Panel". The regression for this panel worked, but it only produces a coefficient for the intercept. The other coefficients are NA.
Here's the code I used for the Em_Panel:
require(foreign)
require(plm)
require(lmtest)
Em_Panel <- read.csv2("Em_Panel.csv", na="NA")
FMR_Em <- pmg(BooktoMarket~Returns+Profitability+BEtoMEpersistence, Em_Panel, index = c("companyID", "years"))
And here's the code for the Dev_Panel:
Em_Panel <- read.csv2("Dev_Panel.csv", na="NA")
FMR_Dev <- pmg(BooktoMarket~Returns+Profitability+BEtoMEpersistence, Dev_Panel, index = c("companyID", "years"))
Since this seemingly is a problem concerning my data I will gladly provide it:
http://www.filedropper.com/empanel
http://www.filedropper.com/devpanel
Thank you so much for any help!!!
Edit
After switching the arguments as suggested, the error is now produced by the Dev_Panel and not the Em_Panel.
Also the regression for the Em_Panel now only provides a coefficient for the intercept. The other coefficients are NA.
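One diagnostic worth running before anything else (a sketch only, not a fix, using the column names from the question): pmg() fits a separate regression for each group and averages the coefficients, so every group needs more time periods than there are coefficients to estimate (four here, counting the intercept).
obs_per_group <- table(Dev_Panel$companyID)   # or table(Dev_Panel$years) once the index is switched
summary(as.numeric(obs_per_group))
sum(obs_per_group < 4)                        # groups with too few observations to estimate 4 coefficients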

Error in rep(1, n.ahead) : invalid 'times' argument in R

I'm working on a dataset to forecast with ARIMA, and I'm close to the last step, but I'm getting an error and couldn't find a reference to figure out what I'm missing.
I keep getting an error message when I run the following command:
ForcastData<-forecast(fitModel,testData)
Error in rep(1, n.ahead) : invalid 'times' argument
I'll give a brief overview of the work I did: I converted my dataset from a data frame to a time series and ran all the tests to check volatility and to detect whether the data are stationary.
That gave me DataAsStationary as good clean data to apply ARIMA to. Since I want to train the model on one part of the data and test it on the other, I split the dataset into 70% training and 30% testing:
ind <-sample(2, nrow(DataAsStationary), replace = TRUE, prob = c(0.7,0.3))
traingData<- DataStationary1[ind==1,]
testData<- DataStationary1[ind==2,]
I used the automatic selection algorithm and found that ARIMA(2,0,3) is the best:
autoARIMAFastTrain1<- auto.arima(traingData, trace= TRUE, ic ="aicc", approximation = FALSE, stepwise = FALSE)
I should mention that I did check whether the residuals are uncorrelated (white noise) and dealt with that.
library(tseries)
library(astsa)
library(forecast)
After that I used the training dataset to fit the model:
fitModel <- Arima(traingData, order=c(2,0,3))
fitted(fitModel)
ForcastData<-forecast(fitModel,testData)
output <- cbind(testData, ForcastData)
accuracy(testData, ForcastData)
plot(output)
Couldn't find any resource about the error:
Error in rep(1, n.ahead) : invalid 'times' argument
Any suggestions would be really appreciated!
I tried
ForcastData<-forecast.Arima(fitModel,testData)
but I get the error that
forecast.Arima is not found.
Any idea why I get the error?
You need to specify the arguments to forecast() a little differently; since you didn't post example data, I'll demonstrate with the gold dataset in the forecast package:
library(forecast)
data(gold)
trainingData <- gold[1:554]
testData <- gold[555:1108]
fitModel <- Arima(trainingData, order=c(2, 0, 3))
ForcastData <- forecast(fitModel, testData)
# Error in rep(1, n.ahead) : invalid 'times' argument
ForcastData <- forecast(object=testData, model=fitModel) # no error
accuracy(f=ForcastData) # you only need to give ForcastData; see help(accuracy)
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.4751156 6.951257 3.286692 0.09488746 0.7316996 1.000819 -0.2386402
You may want to spend some time with the forecast package documentation to see what the arguments for the various functions are named and in what order they appear.
Regarding your forecast.Arima not found error: that function isn't meant to be called directly by the user; it is called internally by the forecast() generic. This has come up in answers to other questions about the forecast package.
EDIT:
After receiving your comment, it seems the following might help:
library(forecast)
# Read in the data
full_data <- read.csv('~/Downloads/onevalue1.csv')
full_data$UnixHour <- as.Date(full_data$UnixHour)
# Split the sample
training_indices <- 1:floor(0.7 * nrow(full_data))
training_data <- full_data$Lane1Flow[training_indices]
test_data <- full_data$Lane1Flow[-training_indices]
# Use automatic model selection:
autoARIMAFastTrain1 <- auto.arima(training_data, trace=TRUE, ic ="aicc",
approximation=FALSE, stepwise=FALSE)
# Fit the model on the training data:
fit_model <- Arima(training_data, order=c(2, 0, 3))
# Do forecasting
forecast_data <- forecast(object=test_data, model=fit_model)
# And plot the forecasted values vs. the actual test data:
plot(x=test_data, y=forecast_data$fitted, xlab='Actual', ylab='Predicted')
# It could help more to look at the following plot:
plot(test_data, type='l', col=rgb(0, 0, 1, alpha=0.7),
xlab='Time', ylab='Value', xaxt='n', ylim=c(0, max(forecast_data$fitted)))
ticks <- seq(from=1, to=length(test_data), by=floor(length(test_data)/4))
times <- full_data$UnixHour[-training_indices]
axis(1, lwd=0, lwd.ticks=1, at=ticks, labels=times[ticks])
lines(forecast_data$fitted, col=rgb(1, 0, 0, alpha=0.7))
legend('topright', legend=c('Actual', 'Predicted'), col=c('blue', 'red'),
lty=1, bty='n')
I was able to run
ForcastData <- forecast(object=testData, model=fitModel)
without any error.
Now I want to plot the testData and the forecast data to check whether my model is accurate,
so I did:
output <- cbind(testData, ForcastData)
plot(output)
and it gave me this error:
Error in error(x, ...) :
improper length of one or more arguments to merge.xts
So when I checked ForcastData, it gave the output:
> ForcastData
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
2293201 -20.2831770 -308.7474 268.1810 -461.4511 420.8847
2296801 -20.1765782 -346.6400 306.2868 -519.4593 479.1061
2300401 -18.3975657 -348.8556 312.0605 -523.7896 486.9945
2304001 -2.2829565 -332.7483 328.1824 -507.6860 503.1201
2307601 2.7023277 -327.8611 333.2658 -502.8509 508.2555
2311201 4.5777316 -328.6756 337.8311 -505.0893 514.2447
2314801 4.3198927 -331.4470 340.0868 -509.1913 517.8310
2318401 3.8277285 -332.7898 340.4453 -510.9844 518.6398
2322001 1.4364973 -335.2403 338.1133 -513.4662 516.3392
2325601 -0.4013561 -337.0807 336.2780 -515.3080 514.5053
I thought I would get a list of results like the ones in my testData. I need a chart that shows two lines: the actual data (testData) and the expected data (ForcastData).
I have gone through a lot of the forecast documentation, but I can't find anything that explains what I want to do.

How to feed data into ode while doing optimisation

I'm new to R. I found some very useful code, which I've tried to adapt for my purposes. However, I get these errors:
Error in func(time, state, parms, ...) : object 'k4' not found and Error in func(time, state, parms, ...) : object 'E' not found
I don't know where the problem is, as I can see all the parameters, and the data frame is correct as well.
Thank you everyone for taking the time to look at this. I've tried reducing the number of parameters to 3 (k10, k11, k12) and using estimated values for the remaining ones (embedded in the code). However, I still get an error message: the E value from the data frame is not passed into the rxnrate function, and as a result ode can't use it. I've tried to use events and forcing functions, but that doesn't seem to work. Thank you for spotting P4; it was a typo, it should be P, and I've corrected it already.
Editor's note: this was cross-posted to R-help, and that message gave the source of this code as the Stack Overflow question "r-parameter and initial conditions fitting ODE models with nls.lm".
#set working directory
setwd("~/R/wkspace")
#load libraries
library(ggplot2)
library(reshape2)
library(deSolve)
library(minpack.lm)
time=c(22,23,24,46,47,48)
cE=c(15.92,24.01,25.29,15.92,24.01,25.29)
cP=c(0.3,0.14,0.29,0.3,0.14,0.29)
cL=c(6.13,3.91,38.4,6.13,3.91,38.4)
df<-data.frame(time,cE,cP,cL)
df
names(df)=c("time","cE","cP","cL")
#rate function
rxnrate=function(t,c,parms){
#rate constants passed through a list called parms
k1=parms$k1
k2=parms$k2
k3=parms$k3
k4=parms$k4
k5=parms$k5
k6=parms$k6
k7=parms$k7
k8=parms$k8
k9=parms$k9
k10=parms$k10
#c is the concentration of species
#derivatives dc/dt are computed below
r=rep(0,length(c))
r[1]=(k1+(k2*E^k10)/(k3^k10+E^k10))/(1+P/k6)-k4* ((1+k5*P)/(1+k7*E))*c["pLH"]; #dRP_LH/dt
r[2]=(1/k8)*k4*((1+k5*P)/(1+k7*E))*c["p"]-k9*c["L"] #dL/dt
return(list(r))
}
ssq=function(myparms){
#initial concentration
cinit=c(pLH=unname(myparms[11]),LH=unname(myparms[12]))
print(cinit)
#time points for which conc is reported
#include the points where data is available
t=c(seq(0,46,2),df$time)
t=sort(unique(t))
#parameters from the parameters estimation
k1=myparms[1]
k2=myparms[2]
k3=myparms[3]
k4=myparms[4]
k5=myparms[5]
k6=myparms[6]
k7=myparms[7]
k8=myparms[8]
k9=myparms[9]
k10=myparms[10]
#solve ODE for a given set of parameters
out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=k1,k2=k2,k3=k3,k4=k4,k5=k5,k6=k6,k7=k7,k8=k8,k9=k9,k10=k10,E=cE,P=cP))
#Filter data that contains time points
outdf=data.frame(out)
outdf=outdf[outdf$time%in% df$time,]
#Evaluate predicted vs experimental residual
preddf=melt(outdf,id.var="time",variable.name="species",value.name="conc")
expdf=melt(df,id.var="time",variable.name="species",value.name="conc")
ssqres=preddf$conc-expdf$conc
return(ssqres)
}
# parameter fitting using Levenberg-Marquardt
#initial guess for parameters
myparms=c(k1=500, k2=4500, k3=200,k4=2.42, k5=0.26,k6=12.2,k7=0.004,k8=55,k9=24,k10=8,pLH=14.5,LH=3.55)
#fitting
fitval=nls.lm(par=myparms,fn=ssq)
#summary of fit
summary(fitval)
#estimated parameter
parest=as.list(coef(fitval))
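One pattern that often resolves the "object 'E' not found" problem is to turn the measured cE and cP columns into interpolation functions with approxfun() and evaluate them at the solver's current time inside rxnrate(). This is only a sketch of the idea, not a drop-in fix: Efun and Pfun are illustrative names, and the state names pLH and LH are taken from cinit.
#interpolate the measured inputs so the derivative function can evaluate them at any time t
#rule=2 holds the end values constant outside the observed time range
Efun <- approxfun(df$time, df$cE, rule=2)
Pfun <- approxfun(df$time, df$cP, rule=2)
rxnrate <- function(t, c, parms){
  with(as.list(parms), {
    E <- Efun(t)  #interpolated E at time t
    P <- Pfun(t)  #interpolated P at time t
    r <- rep(0, length(c))
    r[1] <- (k1+(k2*E^k10)/(k3^k10+E^k10))/(1+P/k6)-k4*((1+k5*P)/(1+k7*E))*c["pLH"]  #dRP_LH/dt
    r[2] <- (1/k8)*k4*((1+k5*P)/(1+k7*E))*c["pLH"]-k9*c["LH"]  #dLH/dt
    list(r)
  })
}
With this version, the parms list passed to ode() only needs the rate constants k1 to k10; E and P no longer have to be smuggled in through parms.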

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for a time series that has multiple seasonalities. The dataset contains hourly data points, and I wish to include daily as well as weekly seasonalities in my ARIMA model. Following is a subset of the dataset:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is the following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind command in the above code fails because the numbers of rows in xreg1 and xreg2 are different. I even tried using a 1:length(data) argument in the fourier function, but that also gave me an error. If someone can rectify the mistakes in the above code to produce a forecast of the next 24 hours using an ARIMA model with the minimum AIC value, it would be really helpful. Also, if you could include data splitting in your code by creating training and testing data sets, that would be totally awesome. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because they do not reflect a Gaussian process as is.
So, here's how you could fit models with some form of hourly signal.
## the data are not normal, so log transform to meet assumption of Gaussian errors
ln_dat <- log(tsdata + 1) ## +1 because this subset contains zero counts
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
xreg <- fourier(ln_dat,i)
mod_res[[i]] <- auto.arima(ln_dat, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which(aic_tbl$AIC==min(aic_tbl$AIC))
## forecast from best model
fore <- forecast(mod_res[[F_best]], xreg=fourierf(ln_dat,F_best,hrs_out))
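If you do want the daily and weekly terms together, as the question originally asked, one common alternative (a sketch only, reusing the data vector from the question; the K values are arbitrary starting points, not tuned) is to declare both periods with msts() and let fourier() build terms for each:
library(forecast)
## declare both seasonal periods: 24 hours per day, 168 hours per week
msts_dat <- msts(data, seasonal.periods=c(24, 168))
ln_msts <- log(msts_dat + 1)   ## +1 because this subset contains zero counts
## one K value per seasonal period
xreg_fit <- fourier(ln_msts, K=c(3, 3))
fit <- auto.arima(ln_msts, xreg=xreg_fit, seasonal=FALSE)
## forecast the next 24 hours with matching future regressors
fc <- forecast(fit, xreg=fourier(ln_msts, K=c(3, 3), h=24))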

Create Forecast and Check Accuracy

I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message below when I try to run accuracy as shown. Does anyone know why?
When I run the plot as shown below, it has an x axis from 0 to 400. Does anyone know how to show that as something like '2015-01-02 23:00:00.000'? I would also like to narrow the plot to, say, the last 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter the forecast to the same time series range as the ts1Test that I'm trying to run accuracy on? Is it something like ts(fcast2, start=2001, end=8567)?
Since I'm using the zoo package, is the as.POSIXct step unnecessary? Could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator
To make your accuracy test run, you should ensure that the length of your test data ts1Test matches the forecasting horizon h in fcast2<-forecast(ts1Train,h=8567). Right now you have 26567 data points vs 8567.
Following your approach, the toy example below will work:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
ME RMSE MAE MPE MAPE MASE ACF1
Training set -2.821378e-04 9.932745e-01 7.990188e-01 1.003861e+02 1.007542e+02 7.230356e-01 4.638487e-02
Test set 0.02515008 1.02271839 0.86072703 99.79208174 100.14023919 NA NA
Concerning your other two questions:
Use of POSIX timestamps and the zoo package: you don't need them to use forecast; ts(Value, frequency) would suffice.
Plotting a time series object with datetimes as the labels: the following code snippet should get you started in this direction. Look at the axis function, which provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), label=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)
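To narrow the plot to a recent stretch (the "last 3 months" part of the question), one option is simply to window the series before plotting. A sketch against the hourly ts1Train from the question, keeping the most recent 30 days so it stays within that series' length (with a longer series the same pattern works for 90 days):
n_keep <- 30*24   # number of most recent hourly observations to show
recent <- window(ts1Train, start=time(ts1Train)[length(ts1Train)-n_keep+1])
plot(recent)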
