R: forecast::accuracy() vs Metrics::accuracy() Results Not the Same

I am testing the RMSE of a forecast and observed that the results from forecast::accuracy()[2] and Metrics::accuracy() are not the same. In fact, the latter is even 0.
set.seed(289805)
ts1 <- arima.sim(n = 10, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1) # the series I want to forecast for
train_ts1 <- head(ts1, length(ts1) - 2) # the part of the series I want to project into the future
test_ts1 <- tail(ts1, length(ts1) - length(train_ts1)) # the part of series I want to compare my forecast with
set.seed(837530)
ts2 <- arima.sim(n = 10, model = list(ma = 0.8, order = c(0, 0, 1)), sd = 1) # the second series, part of which I want to train
train_ts2 <- head(ts2, length(ts2) - 2) # training part of second series
test_ts2 <- tail(ts2, length(ts2) - length(train_ts2)) # do not seem to need this part of the series
fcast <- forecast::forecast(train_ts1, model = forecast::auto.arima(train_ts2), h = 2)$mean # my forecast using the best model from the training set of the second series
forecast::accuracy(fcast, test_ts1)[2] # RMSE for the forecast
# [1] 0.6412488
Metrics::accuracy(test_ts1, fcast)
# [1] 0
Please what am I doing wrong?
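For a like-for-like comparison, note that Metrics::accuracy() is an exact-match (classification) metric, so it returns 0 whenever the continuous forecasts never equal the test values exactly; the comparable RMSE function in that package is Metrics::rmse(). A minimal sketch, reusing the objects above:
Metrics::rmse(as.numeric(test_ts1), as.numeric(fcast)) # should agree with forecast::accuracy(fcast, test_ts1)[2]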

Related

Forecasting using Dlm in R

I have this dataset where I need to forecast the next 6 quarters using a dynamic linear model (dlm) in R. The problem I am facing is that dlmFc keeps returning the same value for all the quarters. I also need to include at least a trend and a seasonal component. Please, help me.
Here is the dataset.
https://i.stack.imgur.com/DKhf3.png
Here is my code.
library(dlm)  # needed for dlmModPoly, dlmModARMA, dlmFilter, dlmForecast
DLM <- data1.ts
dlmMod <- dlmModPoly(order = 1, dV = 0.8, dW = 0.1) +
  dlmModARMA(ar = 1, ma = 1, sigma2 = 1)
dlmFilt <- dlmFilter(DLM, mod = dlmMod)
dlmFc <- dlmForecast(dlmFilt, nAhead = 6)
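One thing worth noting: a first-order polynomial component (dlmModPoly(order = 1), a local level) produces constant point forecasts by construction, which matches the behaviour described above. A hedged sketch of a trend-plus-seasonal specification for quarterly data, with the variances picked arbitrarily for illustration (in practice they would be estimated, e.g. with dlmMLE()):
dlmMod2 <- dlmModPoly(order = 2, dV = 0.8, dW = c(0, 0.1)) +  # local linear trend
  dlmModSeas(frequency = 4, dV = 0, dW = c(0.1, 0, 0))        # quarterly seasonal component
dlmFilt2 <- dlmFilter(DLM, mod = dlmMod2)
dlmFc2 <- dlmForecast(dlmFilt2, nAhead = 6)
dlmFc2$f  # point forecasts (dlmFc2$Q holds the corresponding variances)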

How can I write a loop for forecasting 100 different series using the forecast package's auto.arima?

My data has 3 columns: an ID tag identifying the outlet, a time tag, and finally the measurement.
I need to create forecasts for 100 different series (outlets). First I need to subset the data by ID for each outlet, then fit an ARIMA model, and finally collect 7-days-ahead forecasts for every outlet. Moreover, I also need hourly, weekly, and daily dummies in my model, so I need to pass xregs to the auto.arima procedure.
However, I am unable to turn the code below into a loop that would run for all 100 different IDs.
# requires dplyr (%>%, select), xts, and forecast (auto.arima)
df11 <- subset(df10, ID == 288) %>% select(Tag, Measure)
sales.xts <- xts(df11[, -1], order.by = df11$Tag)
sales.xts_m <- sales.xts["2020-07-22/2020-10-04"]
dummies_hd <- xts(Seasonaldummies_all[, -1], order.by = Seasonaldummies_all$Tag)
dummies_hd_m <- dummies_hd["2020-07-22/2020-10-04"]
model <- auto.arima(sales.xts_m, xreg = dummies_hd_m, biasadj = TRUE, max.p = 7, max.q = 7,
                    seasonal = FALSE, test = c("kpss"), lambda = "auto", num.cores = 15,
                    stationary = TRUE)
Can you show me a quick way to do that job by apply or loop functions?
If you want to use the forecast package, you need to convert your data into a ts (mts) object. To do that, first transform your data from long format to wide format (from the image you posted above, I assume your data is in long format), then use the ts() function to create a ts object; see the example below.
Let's generate some example ts data
library(forecast)  # for auto.arima(), forecast(), seasonaldummy()
sales.xts_m <- ts(data.frame(AA = arima.sim(list(order = c(1,0,0), ar = .5), n = 100, mean = 12),
                             AB = arima.sim(list(order = c(1,0,0), ar = .5), n = 100, mean = 12),
                             AC = arima.sim(list(order = c(1,0,0), ar = .5), n = 100, mean = 11),
                             BA = arima.sim(list(order = c(1,0,0), ar = .5), n = 100, mean = 10),
                             BB = arima.sim(list(order = c(1,0,0), ar = .5), n = 100, mean = 14)),
                  start = c(2000, 1), frequency = 12)
nts <- ncol(sales.xts_m) # number of time series
h <- 12 # forecast horizon
Example xreg
dummies_hd_m <- forecast::seasonaldummy(sales.xts_m[,1])
dummies_hd_m_future <- forecast::seasonaldummy(sales.xts_m[,1], h = h)
mylist <- list()
fc <- matrix(nrow = h, ncol = nts)
# If you need to keep the models --------------------
# models will be in mylist and point forecasts in fc for each ts
for (i in 1:nts) {
  mylist[[i]] <- auto.arima(sales.xts_m[, i], xreg = dummies_hd_m, biasadj = TRUE,
                            max.p = 7, max.q = 7, seasonal = FALSE, test = c("kpss"),
                            lambda = "auto", num.cores = 15, stationary = TRUE)
  fc[, i] <- forecast(mylist[[i]], h = h, xreg = dummies_hd_m_future)$mean
}
#ts names
colnames(fc) <- colnames(sales.xts_m)
# If you do not need to keep the models --------------------
fc <- matrix(nrow = h, ncol = nts)
for (i in 1:nts) {
  fc[, i] <- forecast(auto.arima(sales.xts_m[, i], xreg = dummies_hd_m, biasadj = TRUE,
                                 max.p = 7, max.q = 7, seasonal = FALSE, test = c("kpss"),
                                 lambda = "auto", num.cores = 15, stationary = TRUE),
                      h = h, xreg = dummies_hd_m_future)$mean
}
#ts names
colnames(fc) <- colnames(sales.xts_m)
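Since the question also asks about apply functions: the same per-series fit can be written with sapply(), which returns the h-by-nts forecast matrix directly. A minimal sketch, assuming the objects defined above:
fc <- sapply(seq_len(nts), function(i) {
  fit <- auto.arima(sales.xts_m[, i], xreg = dummies_hd_m, biasadj = TRUE,
                    max.p = 7, max.q = 7, seasonal = FALSE, test = c("kpss"),
                    lambda = "auto", num.cores = 15, stationary = TRUE)
  forecast(fit, h = h, xreg = dummies_hd_m_future)$mean
})
colnames(fc) <- colnames(sales.xts_m)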
If you want to use ML models for your projects
devtools::install_github("Akai01/caretForecast")
library(caretForecast)
nts <- ncol(sales.xts_m) # number of time series
h <- 12 # forecast horizon
fc <- matrix(nrow = h, ncol = nts)
Example: Support Vector Machines with a linear kernel. You only need to change the caret_method argument to use another model, for example caret_method = "ridge" or caret_method = "rf", etc. Ref: https://github.com/Akai01/caretForecast
for (i in 1:nts) {
  fc[, i] <- forecast(ARml(sales.xts_m[, i], maxlag = 12, xreg = dummies_hd_m,
                           caret_method = "svmLinear", seasonal = FALSE),
                      h = h, xreg = dummies_hd_m_future)$mean
}
colnames(fc) <- colnames(sales.xts_m)
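In all of the variants above, fc is a plain matrix with no time index. A small, hedged sketch to restore it, assuming the forecasts start right after the last observation of sales.xts_m:
fc_ts <- ts(fc,
            start = tsp(sales.xts_m)[2] + 1 / frequency(sales.xts_m),
            frequency = frequency(sales.xts_m))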

How to transform time series code to automated code to use for multiple time series?

I want to transform my code for a single time series into automated code that can be used for multiple time series (my data contains monthly time series).
My general approach for one time series was to remove the seasonal component and take first differences to achieve stationarity. Then I use auto.arima to get the ARIMA parameters. I use these parameters to build my ARIMA model with my original time series data. Then I forecast and compare to the actual data of 4 months (which I have cut out before) and calculate the RMSE.
As I cannot use my actual data, I just generate a random time series and test set as an example - of course the outcome does not make much sense.
library('forecast')
set.seed(123)
# create random time series and 4 months testing data
ts <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
test.data <- runif(4, min = 50, max = 3000)
# Decompose
comp.ts <- decompose(ts)
# subtract seasonal component
ts2 <- ts - comp.ts$seasonal
ts2 <- diff(ts2, differences = 1)
auto.arima(ts2, trace = T, seasonal = TRUE,ic = 'aicc', max.p = 10,max.q = 10,max.P = 10,max.Q = 10,max.d = 10, stepwise = F)
# Use auto.arima outcome as input
my.arima <- Arima(ts2, order=c(0,0,0),seasonal = list(order = c(0,1,0), period = 12),method="ML", include.drift = F)
# Forecast and calculate RMSE
data.forecast <- forecast(my.arima, h=4, level=c(99.5))
my.difference <- test.data - data.forecast$mean
my.rmse <- (sum(sqrt(my.difference^2))) / length(my.difference) # note: this is the mean absolute error; the RMSE would be sqrt(mean(my.difference^2))
As my actual data set contains over 500 time series, I need to automate the whole process. Unfortunately, I have not used R for time series so far, so I have problems coming up with an automated process.
Let's assume 4 random time series with 4 random test sets. How could I generate an automated process for these time series (which I can also use for my actual 500+ time series) which does the exact same thing as above?
ts1 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts2 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts3 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
ts4 <- ts(runif(26, min = 50, max = 3000), start = c(2017,01), end = c(2019,02), frequency = 12)
test.data1 <- runif(4, min = 50, max = 3000)
test.data2 <- runif(4, min = 50, max = 3000)
test.data3 <- runif(4, min = 50, max = 3000)
test.data4 <- runif(4, min = 50, max = 3000)
Thanks for the help!
Just put your workflow into a function.
serialArima <- function(ts, test.data) {
  library(forecast)
  # Decompose
  comp.ts <- decompose(ts)
  # subtract seasonal component
  ts2 <- ts - comp.ts$seasonal
  ts2 <- diff(ts2, differences = 1)
  auto.arima(ts2, trace = T, seasonal = TRUE, ic = 'aicc', max.p = 0, max.q = 0, max.P = 0,
             max.Q = 0, max.d = 0, stepwise = F)
  # Use auto.arima outcome as input
  my.arima <- Arima(ts2, order = c(0, 0, 0),
                    seasonal = list(order = c(0, 1, 0), period = 2),
                    method = "ML", include.drift = F)
  # Forecast and calculate RMSE
  data.forecast <- forecast(my.arima, h = 4, level = c(99.5))
  my.difference <- test.data - data.forecast$mean
  my.rmse <- (sum(sqrt(my.difference^2))) / length(my.difference)
  return(list(data.forecast = data.forecast, my.difference = my.difference, my.rmse = my.rmse))
}
Singular application
serialArima(ts, test.data)
# ARIMA(0,0,0) with zero mean : 82.45803
# ARIMA(0,0,0) with non-zero mean : 88.13593
#
#
#
# Best model: ARIMA(0,0,0) with zero mean
#
# $data.forecast
# Point Forecast Lo 99.5 Hi 99.5
# 2020.00 -349.1424 -2595.762 1897.477
# 2020.50 772.6014 -1474.018 3019.221
# 2021.00 -349.1424 -3526.342 2828.057
# 2021.50 772.6014 -2404.598 3949.801
#
# $my.difference
# Time Series:
# Start = c(2020, 1)
# End = c(2021, 2)
# Frequency = 2
# [1] 1497.2446 840.4139 2979.4553 993.5614
#
# $my.rmse
# [1] 1577.669
Multiple application
Map(serialArima, list(ts1, ts2, ts3, ts4),
list(test.data1, test.data2, test.data3, test.data4))
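Map() returns one serialArima() result per series, so the individual pieces can be pulled out afterwards. A small sketch, assuming the call above is stored in a variable:
results <- Map(serialArima, list(ts1, ts2, ts3, ts4),
               list(test.data1, test.data2, test.data3, test.data4))
sapply(results, function(res) res$my.rmse)            # one error measure per series
lapply(results, function(res) res$data.forecast$mean) # point forecasts per series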

Multivariate state space model (dlm) for Okun's law

I'm trying to estimate an Okun's law equation with a dlm using the dlm package in R. I can estimate the non-time varying model using nls as follows:
const_coef <- nls(formula = dur ~ b1*dur_lag1 + b2*(d2lgdp - b0) + b3*d2lrulc_lag2,
                  start = list(b0 = 0.1, b1 = 0.1, b2 = 0.1, b3 = 0.1),
                  data = mod_data)
The dlm model I want to estimate allows b1 and b0 in the above to follow random walks. I can do this in EViews by declaring the measurement equation and appending the states (below is some code provided by the authors of the original paper, which I can replicate):
'==========================
' SPECIFY THE KALMAN FILTER
'==========================
'Priors on state variables
vector(2) mprior
mprior(1) = 4 'Prior on starting value for trend GDP growth (annual average GDP growth over 1950s)
mprior(2) = 0 'Prior on starting value for lagged dependent variable
sym(2) vprior
vprior(1,1) = 5 'Prior on variance of trend GDP growth (variance of annual GDP growth over 1950s)
vprior(2,2) = 1 'Prior on variance of lagged dependent variable
'Specify coefficient vector
coef(8) ckf
'Declare state space
sspace ss1
ss1.append dur = lag*dur(-1) + ckf(2)*(d2lgdp-trend)+ckf(3)*D2LRULC(-2)+[var=exp(ckf(4))] 'Measurement equation
ss1.append #state trend = 1*trend(-1) + [var = exp(ckf(5))] 'State equation for trend GDP growth (random walk)
ss1.append #state lag = 1*lag(-1) + [var = exp(ckf(6))] 'State equation for lagged dependent variable (random walk)
'Apply priors to state space
ss1.append #mprior mprior
ss1.append #vprior vprior
'Set parameter starting values
param ckf(2) -0.0495 ckf(3) 0.01942 ckf(4) -2.8913 ckf(5) -4.1757 ckf(6) -6.2466 'starting values for parameters
'=====================
' ESTIMATE THE MODEL
'=====================
'Estimate state space
smpl %estsd %ested 'Estimation sample
ss1.ml(m=500,showopts) 'Estimate Kalman filter by maximum likelihood
freeze(mytab) ss1.stats
I'm really not sure how to do this with the dlm package. I've tried the following:
buildSS <- function(v){
  dV <- exp(v[1])       # variance of the measurement equation (ckf4)
  dW <- c(exp(v[2]),    # variance of the lagged dep (ckf6)
          0,            # variance of the coef on d2lgdp ckf(2), set to 0
          0,            # variance of the coef on d2lrulc ckf(3), set to 0
          exp(v[3]))    # variance of the random walk intercept (ckf5)
  beta.vec <- c(1, v[4], v[5], 1)  # params ckf(2), ckf(3)
  okuns <- dlmModReg(mod_data.tvp[, -1], addInt = TRUE, dV = dV, dW = dW, m0 = beta.vec)
}
# Set parameter starting values
ckf4Guess <- -2.8913
ckf2guess <- -0.0495
ckf3guess <- 0.01942
ckf5guess <- -4.1757
ckf6guess <- -6.2466
params <- c(ckf4Guess,
            ckf5guess,
            ckf6guess,
            ckf2guess,
            ckf3guess)
tvp_mod.mle <- dlmMLE(mod_data.tvp[,"dur"] , parm = params, build = buildSS)
tvp_mod <- buildSS(tvp_mod.mle$par)
tvp_filter <- dlmFilter(mod_data$dur,tvp_mod)
The above code runs, but the outputs are not correct. I am not specifying the states properly. Does anyone have any experience in building dlms with multivariate regression in R?
I think I have gotten to a solution: I've managed to recreate the estimates in the paper, which estimates this model using EViews (and I also checked this against EViews).
#--------------------------------------------------------------------------------------------------------------------------
# tvp model full model - dur = alpha*dur(-1)+ beta(dgdp-potential) + gamma*wages
#--------------------------------------------------------------------------------------------------------------------------
# Construct DLM
OkunsDLMfm <- dlm(
  FF = matrix(c(1, 1, 1, 1), ncol = 4, byrow = TRUE),
  V = matrix(1),
  GG = matrix(c(1, 0, 0, 0,
                0, 1, 0, 0,
                0, 0, 1, 0,
                0, 0, 0, 1), ncol = 4, byrow = TRUE),
  W = matrix(c(1, 0, 0, 0,
               0, 1, 0, 0,
               0, 0, 1, 0,
               0, 0, 0, 1), ncol = 4, byrow = TRUE),
  JFF = matrix(c(1, 2, 3, 0), ncol = 4, byrow = TRUE),
  X = cbind(mod_data$dur_lag1, mod_data$d2lgdp, mod_data$d2lrulc_lag2), # lagged dep var, dgdp, wages
  m0 = c(0, 0, 0, 0),
  C0 = matrix(c(1e+07, 0, 0, 0,
                0, 1e+07, 0, 0,
                0, 0, 1e+07, 0,
                0, 0, 0, 1e+07), ncol = 4, byrow = TRUE)
)
buildOkunsFM <- function(p){
  V(OkunsDLMfm) <- exp(p[2])
  GG(OkunsDLMfm)[1,1] <- 1
  GG(OkunsDLMfm)[2,2] <- 1
  GG(OkunsDLMfm)[3,3] <- 1
  GG(OkunsDLMfm)[4,4] <- 1
  W(OkunsDLMfm)[1,1] <- exp(p[3])
  W(OkunsDLMfm)[2,2] <- 0
  W(OkunsDLMfm)[3,3] <- 0
  W(OkunsDLMfm)[4,4] <- exp(p[4])
  m0(OkunsDLMfm) <- c(0, 0, 0, p[1]*4)
  C0(OkunsDLMfm)[1,1] <- 1
  C0(OkunsDLMfm)[4,4] <- 5
  return(OkunsDLMfm)
}
okuns.estfm <- dlmMLE(y = mod_data$dur, parm = c(-0.049,-1.4,-6,-5), build = buildOkunsFM)
OkunsDLM1fm <- buildOkunsFM(okuns.estfm$par)
The time-varying level, the estimate of potential output, is derived by dividing the fourth element of the state vector by the second element and multiplying by negative one.
Not sure if this is the best way to specify the DLM, but the results from the model are very close (within 0.01) to the results from EViews. That being said, I am very open to other specifications.
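A small sketch of that extraction, assuming the filtered state means are used (dlmFilter() stores them in the m component, with a leading row for the time-0 prior; dlmSmooth() would work analogously):
OkunsFilt <- dlmFilter(mod_data$dur, OkunsDLM1fm)
states <- dropFirst(OkunsFilt$m)         # drop the time-0 prior row
potential <- -states[, 4] / states[, 2]  # fourth state divided by the second, times -1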

Fitting ARIMA model to multiple time series and storing forecast into a matrix

As it is large, I can't dput it here. But suppose realmatrix is an "mts" object with non-trivial values:
realmatrix <- matrix(NA, ncol = 100, nrow = 138)
In fact it stores 100 time series with length (rows) = 138 (from Jan 2005 to June 2016).
I want to store the Arima forecasts (12 months ahead: that is, from July 2016 to June 2017) in another matrix farimamatrix (which should have 12 rows and 100 columns), via the following loop:
farimamatrix <- matrix(NA, nrow = 12, ncol = 100)
m <- k <- list()
for (i in 1:100) {
  try(m[[i]] <- Arima(realmatrix[, i], order = c(0,1,0), seasonal = c(1,0,1)))
  k[[i]] <- forecast.Arima(m[[i]], h = 12)
  farimamatrix[, i] <- fitted(k[[i]])
}
But I am getting the following message:
Error in farimamatrix[, i] <- fitted(k[[i]]) :
incorrect number of subscripts on matrix
What's wrong? Thanks in advance.
Edited (24/10): updated/corrected following Zheyuan's answer; the previous problem is gone.
Original data:
tsdata <-
structure(c(28220L, 27699L, 28445L, 29207L, 28482L, 28326L, 28322L,
28611L, 29187L, 29145L, 29288L, 29352L, 28881L, 29383L, 29898L,
29888L, 28925L, 29069L, 29114L, 29886L, 29917L, 30144L, 30531L,
30494L, 30700L, 30325L, 31313L, 32031L, 31383L, 30767L, 30500L,
31181L, 31736L, 32136L, 32654L, 32305L, 31856L, 31731L, 32119L,
31953L, 32300L, 31743L, 32150L, 33014L, 32964L, 33674L, 33410L,
31559L, 30667L, 30495L, 31978L, 32043L, 30945L, 30715L, 31325L,
32262L, 32717L, 33420L, 33617L, 34123L, 33362L, 33731L, 35118L,
35027L, 34298L, 34171L, 33851L, 34715L, 35184L, 35190L, 35079L,
35958L, 35875L, 35446L, 36352L, 36050L, 35567L, 35161L, 35419L,
36337L, 36967L, 36745L, 36370L, 36744L, 36303L, 36899L, 38621L,
37994L, 36809L, 36527L, 35916L, 37178L, 37661L, 37794L, 38642L,
37763L, 38367L, 38006L, 38442L, 38654L, 38345L, 37628L, 37698L,
38613L, 38525L, 39389L, 39920L, 39556L, 40280L, 41653L, 40269L,
39592L, 39100L, 37726L, 37867L, 38551L, 38895L, 40100L, 40950L,
39838L, 40643L, 40611L, 39611L, 39445L, 38059L, 37131L, 36697L,
37746L, 37733L, 39188L, 39127L, 38554L, 38219L, 38497L, 39165L,
40077L, 38370L, 37174L), .Dim = c(138L, 1L), .Dimnames = list(
NULL, "Data"), .Tsp = c(2005, 2016.41666666667, 12), class = "ts")
Code
library("forecast")
z <- stl(tsdata[, "Data"], s.window="periodic")
t <- z$time.series[,"trend"]
s <- z$time.series[,"seasonal"]
e <- z$time.series[,"remainder"]
# error matrix
ematrix <- matrix(rnorm(138 * 100, sd = 100), nrow = 138)
# generating a ts class error matrix
ematrixts <- ts(ematrix, start=c(2005,1), freq=12)
# combining the trend + season + error matrix into a real matrix
realmatrix <- t + s + ematrixts
# creating a (forecast) arima matrix
farimamatrix <- matrix(NA, ncol = 100, nrow = 12)
m <- k <- vector("list", length = 100)
for (i in 1:100) {
  try(m[[i]] <- Arima(realmatrix[, i], order = c(0,1,0), seasonal = c(1,0,1)))
  print(i)
  k[[i]] <- forecast.Arima(m[[i]], h = 12)
  farimamatrix[, i] <- k[[i]]$mean
}
# ts.plot(farimamatrix[,1:100],col = c(rep("gray",100),rep("red",1)))
The loop seems to work, but breaks down after a few iterations due to failure of Arima:
Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :
  non-stationary seasonal AR part from CSS
Yep, the previous problem is gone, and now you have a new problem, regarding the failure of Arima. Strictly speaking you should raise a new question on this. But I will answer it here anyway.
The error message is quite illustrative. When you fit a model ARIMA(0,1,0)(1,0,1), sometimes the seasonal part is non-stationary, so a further seasonal differencing is needed.
By looking at ts.plot(realmatrix), I see that all 100 columns of realmatrix are pretty similar. I will thus take out the first column for some analysis.
x <- realmatrix[,1]
Obviously the non-seasonal differencing is a must, but do we need a seasonal differencing as well? Have a check with the ACF:
acf(diff(x))
The ACF shows strong evidence of a seasonal pattern. So yes, a seasonal differencing is needed.
Now let's check the ACF after both rounds of differencing:
acf(diff(diff(x, lag = 12))) ## first do seasonal diff, then non-seasonal diff
There appears to be a negative spike at the seasonal lag, suggesting a seasonal MA process. So ARIMA(0,1,0)(0,1,1)[12] would be a good bet.
fit <- arima(x, order = c(0,1,0), seasonal = c(0,1,1))
Have a look at the residuals:
acf(fit$residuals)
I would actually be pretty happy with this result, as there is no lag-1 or even lag-2 autocorrelation at all, and there is also no seasonal autocorrelation. You can try adding a seasonal and/or non-seasonal AR(1), but there will be no improvement. So this is our final model.
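For a more formal check than eyeballing the ACF, a hedged sketch using a Ljung-Box test (fitdf = 1 accounts for the single estimated seasonal MA coefficient); checkresiduals() from the forecast package would report much the same thing:
Box.test(fit$residuals, lag = 24, type = "Ljung-Box", fitdf = 1)
# forecast::checkresiduals(fit)  # alternative: residual plots plus a Ljung-Box test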
So use the following loop:
farimamatrix <- matrix(NA, ncol = 100, nrow = 12)
m <- k <- vector("list", length = 100)
for (i in 1:100) {
  m[[i]] <- Arima(realmatrix[, i], order = c(0,1,0), seasonal = c(0,1,1))
  print(i)
  k[[i]] <- forecast.Arima(m[[i]], h = 12)
  farimamatrix[, i] <- k[[i]]$mean
}
Now all 100 model fits are successful.
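Since the forecasts cover July 2016 to June 2017 (12 months beyond the last observation), a small sketch to give the forecast matrix a time index and plot one series against its history, assuming the objects above:
colnames(farimamatrix) <- colnames(realmatrix)
farimamatrix_ts <- ts(farimamatrix, start = c(2016, 7), frequency = 12)
ts.plot(realmatrix[, 1], farimamatrix_ts[, 1], col = c("gray", "red"))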
---------
A retrospective reflection
Perhaps I should explain why the ARIMA(0,1,0)(1,0,1)[12] model works for the simulated data in my initial answer. Note how I simulate the data:
seasonal <- rep_len(sin((1:12) * pi / 6), 138)
Yes, the underlying seasonal pattern is a true replication and of course stationary.
