How to forecast and fit the optimal model for multiple time series? - r

I want to do batch forecasting among multiple series, for example, if I want to forecast time series with IDs that end with 1(1,11,21,31...), how can I do that?

Since you did not provide detailed information, I was not sure which forecasting method you want to use hence I give here an example of a univariate time series model:
Load required packages:
library(forecast)
library(dplyr)
We use example data from Rob Hyndman:
dta <- read.csv("https://robjhyndman.com/data/ausretail.csv")
Now change the column names:
colnames(dta) <- c("date", paste0("tsname_", seq_len(ncol(dta[,-1]))))
Select timeseries which end with 1:
dta_ends_with1 <- dplyr::select(dta, dplyr::ends_with("1"))
Create a ts object:
dta_ends_with1 <- ts(dta_ends_with1, start = c(1982,5), frequency = 12)
Specify how many steps ahead you want to forecast, here I set it to 6 steps ahead,
h <- 6
Now we prepare a matrix to save the forecast:
fc <- matrix(NA, ncol = ncol(dta_ends_with1), nrow = h)
Forecasting loop.
for (i in seq_len(ncol(dta_ends_with1))) {
fc[,i] <- forecast::forecast(forecast::auto.arima(dta_ends_with1[,i]),
h = h)$mean
}
Set the column names:
colnames(fc) <- colnames(dta_ends_with1)
head(fc)

Related

Time series prediction with and without NAs (ARIMA and Forecast package) in R

This is my first question on stack overflow.
Situation: I have 2 time series. Both series have the same values but the second series has 5 NAs at the start. Hence, first series has 105 observations, where 2nd series has 110 observations. I have fitted an ARIMA(0,1,0) using the Arima function to both series separately. And then I used the forecast package to predict 10 steps to the future.
Issue: Even though the ARIMA coefficient for both series are the same, the projections (10 steps) appear to be different. I am uncertain why this is the case. Has anyone come across this before? Any guidance is highly appreciated.
Tried: I tried setting seed, creating index manually, and using auto.ARIMA for the model fitting. However, none of the steps has helped me to reconcile the difference.
I have added a picture to show you what I see. Please note I have hidden the mid part of the series so that you can see the start and the end of the series. The yellow highlighted cells are the projection outputs from the 'Forecast' package. I have manually added the index to be years after extracting the results from R.
Time series projected and base in excel
Rates <- read.csv("Rates_for_ARIMA.csv")
set.seed(123)
#ARIMA with NA
Simple_Arima <- Arima(
ts(Rates$Rates1),
order = c(0,1,0),
include.drift = TRUE)
fcasted_Arima <- forecast(Simple_Arima, h = 10)
fcasted_Arima$mean
#ARIMA Without NA
Rates2 <- as.data.frame(Rates$Rates2)
##Remove the final spaces from the CSV
Rates2 <- Rates2[-c(106,107,108,109,110),]
Simple_Arima2 <- Arima(
ts(Rates2),
order = c(0,1,0),
include.drift = TRUE)
fcasted_Arima2 <- forecast(Simple_Arima2, h = 10)
fcasted_Arima2$mean
The link to data is here, CSV format
Could you share your data and code such that others can see if there is any issue with it?
I tried to come up with an example and got the same results for both series, one that includes NAs and one that doesn't.
library(forecast)
library(xts)
set.seed(123)
ts1 <- arima.sim(model = list(0, 1, 0), n = 105)
ts2 <- ts(c(rep(NA, 5), ts1), start = 1)
fit1 <- forecast::Arima(ts1, order = c(0, 1, 0))
fit2 <- forecast::Arima(ts2, order = c(0, 1, 0))
pred1 <- forecast::forecast(fit1, 10)
pred2 <- forecast::forecast(fit2, 10)
forecast::autoplot(pred1)
forecast::autoplot(pred2)
> all.equal(as.numeric(pred1$mean), as.numeric(pred2$mean))
[1] TRUE

Plotting Forecast and Real values in one plot using a Rolling Window

I have a code which takes the input as the Yield Spread (dependent var.) and Forward Rates(independent var.) and operate an auto.arima to get the orders. Afterwards, I am forecasting the next 25 dates (forc.horizon). My training data are the first 600 (training). Then I am moving the time window 25 dates, meaning using the data from 26 to 625, estimating the auto.arima and then forecasting the data from 626 to 650 and so on. My data sets are 2298 rows (date) and 30 columns (maturity).
I want to store all of the forecasts and then plot the forecasted and real values in the same plot.
This is the code I have, but it doesn't store the forecasts in a way to plot later.
forecast.func <- function(NS.spread, ind.v, maturity, training, forc.horizon){
NS.spread <- NS.spread/100
forc <- c()
j <- 0
for(i in 1:floor((nrow(NS.spread)-training)/forc.horizon)){
# test data
y <- NS.spread[(1+j):(training+j) , maturity]
f <- ind.v[(1+j):(training+j) , maturity]
# auto- arima
c <- auto.arima(y, xreg = f, test= "adf")
# forecast
e <- ind.v[(training+j+1):(training+j+forc.horizon) , maturity]
h <- forecast(c, xreg = lagmatrix(e, -1))
forc <- c(forc, list(h))
j <- j + forc.horizon
}
return(forc)
}
a <- forecast.func(spread.NS.JPM, Forward.rate.JPM, 10, 600, 25)
lapply(a, plot)
Here's a link to my two datasets:
https://drive.google.com/drive/folders/1goCxllYHQo3QJ0IdidKbdmfR-DZgrezN?usp=sharing
LOOK AT THE END for a full functional example on how to handle AUTO.ARIMA MODEL with DAILY DATA using XREG and FOURIER SERIES with ROLLING STARTING TIMES and cross validated training and test.
Without a reproducible example no one can help you, because they can't run your code. You need to provide data. :-(
Even if it's not part of StackOverflow to discuss statistics matters, why don't you do an auto.arima with xreg instead of lm + auto.arima on residuals? Especially, considering how you forecast at the end, that training method looks really wrong. Consider using:
fit <- auto.arima(y, xreg = lagmatrix(f, -1))
h <- forecast(fit, xreg = lagmatrix(e, -1))
auto.arima will automatically calculate the best parameters by max likelihood.
On your coding question..
forc <- c() should be outside of the for loop, otherwise at every run you delete your previous results.
Same for j <- 0: at every run you're setting it back to 0. Put it outside if you need to change its value at every run.
The output of forecast is an object of class forecast, which is actually a type of list. Therefore, you can't use cbind effectively.
I'm my opinion, you should create forc in this way: forc <- list()
And create a list of your final results in this way:
forc <- c(forc, list(h)) # instead of forc <- cbind(forc, h)
This will create a list of objects of class forecast.
You can then plot them with a for loop by getting access at every object or with a lapply.
lapply(output_of_your_function, plot)
This is as far as I can go without a reproducible example.
FINAL EDIT
FULL FUNCTIONAL EXAMPLE
Here I try to sum up a conclusion out of the million comments we wrote.
With the data you provided, I built a code that can handle everything you need.
From training and test to model, till forecast and finally plotting which have the X axis with the time as required in one of your comments.
I removed the for loop. lapply is much better for your case.
You can leave the fourier series if you want to. That's how Professor Hyndman suggests to handle daily time series.
Functions and libraries needed:
# libraries ---------------------------
library(forecast)
library(lubridate)
# run model -------------------------------------
.daily_arima_forecast <- function(init, training, horizon, tt, ..., K = 10){
# create training and test
tt_trn <- window(tt, start = time(tt)[init] , end = time(tt)[init + training - 1])
tt_tst <- window(tt, start = time(tt)[init + training], end = time(tt)[init + training + horizon - 1])
# add fourier series [if you want to. Otherwise, cancel this part]
fr <- fourier(tt_trn[,1], K = K)
frf <- fourier(tt_trn[,1], K = K, h = horizon)
tsp(fr) <- tsp(tt_trn)
tsp(frf) <- tsp(tt_tst)
tt_trn <- ts.intersect(tt_trn, fr)
tt_tst <- ts.intersect(tt_tst, frf)
colnames(tt_tst) <- colnames(tt_trn) <- c("y", "s", paste0("k", seq_len(ncol(fr))))
# run model and forecast
aa <- auto.arima(tt_trn[,1], xreg = tt_trn[,-1])
fcst <- forecast(aa, xreg = tt_tst[,-1])
# add actual values to plot them later!
fcst$test.values <- tt_tst[,1]
# NOTE: since I modified the structure of the class forecast I should create a new class,
# but I didnt want to complicate your code
fcst
}
daily_arima_forecast <- function(y, x, training, horizon, ...){
# set up x and y together
tt <- ts.intersect(y, x)
# set up all starting point of the training set [give it a name to recognize them later]
inits <- setNames(nm = seq(1, length(y) - training, by = horizon))
# remove last one because you wouldnt have enough data in front of it
inits <- inits[-length(inits)]
# run model and return a list of all your models
lapply(inits, .daily_arima_forecast, training = training, horizon = horizon, tt = tt, ...)
}
# plot ------------------------------------------
plot_daily_forecast <- function(x){
autoplot(x) + autolayer(x$test.values)
}
Reproducible Example on how to use the previous functions
# create a sample data
tsp(EuStockMarkets) <- c(1991, 1991 + (1860-1)/365.25, 365.25)
# model
models <- daily_arima_forecast(y = EuStockMarkets[,1],
x = EuStockMarkets[,2],
training = 600,
horizon = 25,
K = 5)
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[1]]
Example for the author of the post
# your data
load("BVIS0157_Forward.rda")
load("BVIS0157_NS.spread.rda")
spread.NS.JPM <- spread.NS.JPM / 100
# pre-work [out of function!!!]
set_up_ts <- function(m){
start <- min(row.names(m))
end <- max(row.names(m))
# daily sequence
inds <- seq(as.Date(start), as.Date(end), by = "day")
ts(m, start = c(year(start), as.numeric(format(inds[1], "%j"))), frequency = 365.25)
}
mts_spread.NS.JPM <- set_up_ts(spread.NS.JPM)
mts_Forward.rate.JPM <- set_up_ts(Forward.rate.JPM)
# model
col <- 10
models <- daily_arima_forecast(y = mts_spread.NS.JPM[, col],
x = stats::lag(mts_Forward.rate.JPM[, col], -1),
training = 600,
horizon = 25,
K = 5) # notice that K falls between ... that goes directly to the inner function
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[5]]

Export or save tsclust model

I am doing a time-series clustering on a large dataset (3000 time-series, with > 50 points each). Thus, I was wondering if once I have finished the analysis, it would be possible to:
export the model so that I can cluster new series swiftly.
export the "centroids" so that I can use them as the template for matching new series.
A simple MRE could look like this
my_matrix <- matrix(rnorm(1000), 100, 10)
k <- 5
clustering <- tsclust(my_matrix , k = k, trace = T)
# somehow saveRDS(clustering)
# new series coming
y <- rnorm(10)
my_clusters <- loadRDS(clustering)
# Somehow clusternew(y, my_clusters)
Thank you,
S

Forecasting multiple variable time series in R

I am trying to forecast three variables using R, but I am running into issues on how to deal with correlation.
The three variables I am trying to forecast are Revenue, Subscriptions and Price.
My initial approach was to do two independent time series forecast of subscriptions and price and multiply the outcomes to generate the revenue forecast.
I wanted to understand if this approach makes sense, as there is an inherent correlation between the price and the subscribers, and this is the part I do not know how to deal with.
# Load packages.
library(forecast)
# Read data
data <- read.csv("data.csv")
data.train <- data[0:57,]
data.test <- data[58:72,]
# Create time series for variables of interest
data.subs <- ts(data.train$subs, start=c(2014,1), frequency = 12)
data.price <- ts(data.train$price, start=c(2014,1), frequency = 12)
#Create model
subs.stlm <- stlm(data.subs)
price.stlm <- stlm(data.price)
#Forecast
subs.pred <- forecast(subs.stlm, h = 15, level = c(0.6, 0.75, 0.9))
price.pred <- forecast(price.stlm, h = 15, level = c(0.6, 0.75, 0.9))
Any help is greatly appreciated!
Looks like you can use the vector autoregression (VAR) model. Take a look at the description and the code provided here:
https://otexts.org/fpp2/VAR.html

How to get top down forecasts using `hts::combinef()`?

I am trying to compare forecast reconciliation methods from the hts package on previously existing forecasts. The forecast.gts function is not available to me since there is no computationally tractable way to create a user defined function that returns the values in a forecast object. Because of this, I am using the combinef() function in the package to redistribute the forecasts. I have been able to work of the proper weights to get the wls and nseries methods, and the ols version is the default. I was able to get the "bottom up" method using:
# Creates sample forecasts, taken from `combinef()` example
library(hts)
h <- 12
ally <- aggts(htseg1)
allf <- matrix(NA, nrow = h, ncol = ncol(ally))
for(i in 1:ncol(ally))
allf[,i] <- forecast(auto.arima(ally[,i]), h = h, PI = FALSE)$mean
allf <- ts(allf, start = 51)
# create the weight vector
numTS <- ncol(allf) # Get the total number of series
numBaseTS <- sum(tail(htseg1$nodes, 1)[[1]]) # Get the number of bottom level series
# Create weights of 0 for all aggregate ts and 1 for the base level
weightVals <- c(rep(0, numTS - numBaseTS), rep(1, numBaseTS))
y.f <- combinef(allf, htseg1$nodes, weights = weightVals)
I was hoping that something like making the first weight 1 and the rest 0 might give me one of the three top down forecast, but that just results in a bunch of 0s or NaN values depending on how you try to look at it.
combinef(allf, htseg1$nodes, weights = c(1, rep(0, numTS - 1)))
I know the top down methods aren't the hardest thing to compute manually, and I can just write a function to do that, but are there any tools in the hts package that can help with this? I'd like to keep the data format consistent to simplify my analysis. Most specifically, I would like to get the "top down forcasted proportions" or tdfp method.
The functions to reconcile the forecasts using the "top-down" method are currently not exported. Probably I should export them to make the "top-down" results as tractable as combinef() in the next version. The workaround is as follows:
hts:::TdFp(allf, nodes = htseg1$nodes)
Hope it helps.

Resources