I asked this question on RStudio Community and received no answer, so I figured I'd give it a go here. My question relates to what budugulo asked in Select models with lowest RMSE, but I'm wondering how I can go further: take the models with the best predictive accuracy on the test data and apply them across the entire original hierarchical dataset to forecast future observations.
I understand how to forecast into the future with an individual time series, but my dataset is hierarchical, and forecasting with the best model for each original series one at a time would take far too long. Is there a way to fit the best models (those with the lowest RMSE) to the original time series in a hierarchical dataset and forecast 3 years into the future (through 2020)? I tried using refit() but to no avail.
Hopefully the code below will help towards answering my question.
library(tidyverse)
library(tsibble)
library(fable)
library(fpp3)
fit <- tourism %>%
  filter(Quarter <= yearquarter("2015 Q1")) %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips)
  )
fc <- fit %>%
  forecast(new_data = filter(tourism, Quarter > yearquarter("2015 Q1")))
bestrmse <- accuracy(fc, tourism) %>%
  group_by(Region, State, Purpose) %>%
  filter(RMSE == min(RMSE)) %>%
  select(.model:Region)
bestfits <- fit %>%
  pivot_longer(cols = ets:arima, names_to = ".model", values_to = "fit") %>%
  right_join(bestrmse) %>%
  mutate(.model = "best") %>%
  pivot_wider(id_cols = Region:Purpose, names_from = ".model", values_from = "fit") %>%
  as_mable(key = c(Region, State, Purpose), model = best)
# Apply the 'best' models from bestfits to the full original time series
# (not split into training/test sets) and forecast future observations into 2020.
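A sketch of one way around refit(), assuming it is acceptable to re-estimate both candidate models on the full data: fit ETS and ARIMA to every complete series in a single model() call (tourism ends in 2017 Q4), forecast 12 quarters, and keep only the forecasts from the model that had the lowest test-set RMSE for each Region/State/Purpose. This swaps the question's bestfits mable for a semi_join() on the forecasts, and it assumes the model type chosen on the 2015 Q1 split is still the one you want after re-estimation (the ARIMA orders may change). The names full_fit and best_fc are illustrative.

```r
library(fpp3)  # attaches dplyr, tsibble, tsibbledata, fable, ...

# Training/test split and best-model selection, as in the question
fit <- tourism %>%
  filter(Quarter <= yearquarter("2015 Q1")) %>%
  model(ets = ETS(Trips), arima = ARIMA(Trips))

fc <- fit %>%
  forecast(new_data = filter(tourism, Quarter > yearquarter("2015 Q1")))

# One row per series naming the model with the lowest test RMSE
bestrmse <- accuracy(fc, tourism) %>%
  group_by(Region, State, Purpose) %>%
  filter(RMSE == min(RMSE)) %>%
  ungroup() %>%
  select(.model, Region, State, Purpose)

# Re-estimate both candidate models on the full series (data end in 2017 Q4)
full_fit <- tourism %>%
  model(ets = ETS(Trips), arima = ARIMA(Trips))

# Forecast 12 quarters ahead (2018 Q1 to 2020 Q4), then keep only the
# forecasts from the model that won on the test set for each series
best_fc <- full_fit %>%
  forecast(h = "3 years") %>%
  semi_join(bestrmse, by = c(".model", "Region", "State", "Purpose"))

best_fc
```

Because model() and forecast() work across all keys at once, this avoids looping over the series individually; the cost is that both models are fitted twice.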
Please consider this minimal reproducible example of a random forest regression estimate:
library(randomForest)
# impute missing values (column medians for numeric columns)
airquality <- na.roughfix(airquality)
set.seed(123)
#fit the random forest model
rf_fit <- randomForest(formula = Ozone ~ ., data = airquality)
#define new observation
new <- data.frame(Solar.R=250, Wind=8, Temp=70, Month=5, Day=5)
set.seed(123)
#use predict all on new observation
rf_predict <- predict(rf_fit, newdata = new, predict.all = TRUE)
rf_predict$aggregate
library(tidyverse)
predict_mean <- rf_predict$individual %>%
as_tibble() %>%
rowwise() %>%
transmute(avg = mean(V1:V500))
predict_mean
I was expecting to get the same value from rf_predict$aggregate and predict_mean.
Where and why am I wrong about this assumption?
My final objective is to get a confidence interval of the predicted value.
I believe your code needs to include a c_across() call for the calculation to be performed correctly:
The ?c_across documentation tells us:
c_across() is designed to work with rowwise() to make it easy to perform row-wise aggregations.
predict_mean <- rf_predict$individual %>%
  as_tibble() %>%
  rowwise() %>%
  transmute(avg = mean(c_across(V1:V500)))
>predict_mean
[1] 30.5
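An answer to a previous question points out that mean() can't handle a data.frame. In your code, though, the problem is how V1:V500 is evaluated: inside transmute() it is not a column selection but the sequence operator applied to the two values V1 and V500 of the current row, so mean() averages a sequence running from V1 towards V500 instead of the 500 tree predictions. c_across() evaluates V1:V500 as a tidyselect column range and passes that row's values to mean() as a vector.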
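A minimal base-R check, using the same data as the question: rf_predict$individual is a matrix with one row per new observation and one column per tree, so rowMeans() reproduces $aggregate without any tidyverse reshaping. The quantiles of the individual tree predictions are included only as a rough measure of spread toward the stated goal; they are an assumption-laden stand-in, not a calibrated confidence or prediction interval.

```r
library(randomForest)

airquality <- na.roughfix(airquality)
set.seed(123)
rf_fit <- randomForest(Ozone ~ ., data = airquality)

new <- data.frame(Solar.R = 250, Wind = 8, Temp = 70, Month = 5, Day = 5)
rf_predict <- predict(rf_fit, newdata = new, predict.all = TRUE)

# $individual: rows = new observations, columns = trees (500 by default).
# For regression, $aggregate is the average over trees, i.e. the row mean.
tree_mean <- rowMeans(rf_predict$individual)
all.equal(unname(tree_mean), unname(rf_predict$aggregate))

# Spread of the individual tree predictions (2.5% and 97.5% quantiles);
# only a rough indication of variability between trees
apply(rf_predict$individual, 1, quantile, probs = c(0.025, 0.975))
```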
My goal is to produce multi-step forecasts without re-estimating the model, updating it with each new observation before the next forecast.
I did not fit the model and call forecast(h = 7), because that uses the model's own forecasts, rather than the newly observed values, to forecast the following steps.
I used the following code with stretch_tsibble() to get 1-step-ahead forecasts.
library(fable)
library(dplyr)
library(tsibble)
library(feasts)
us_accidental_deaths <- as_tsibble(USAccDeaths)
stretch_dt <- us_accidental_deaths %>%
  stretch_tsibble(.init = 60, .step = 1)
fit_train <- stretch_dt %>%
  # keep the same estimation period for each .id
  filter_index(. ~ '1977 Dec') %>%
  model(
    stl_ets_mod = decomposition_model(
      STL(value ~ season(window = 12)),
      ETS(season_adjust ~ season("N")),
      SNAIVE(season_year)
    ),
    arima_mod = ARIMA(value)
  )
Refitting the ARIMA model works fine:
fit_train %>%
  select(arima_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
But I get an error when I refit the STL model:
fit_train %>%
  select(stl_ets_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
Many thanks!
The error you are getting is:
! no applicable method for 'refit' applied to an object of class "c('decomposition_model', 'model_combination')"
refit() is not available for all models.
It is not clear how a refit should work for an STL decomposition. The STL components are specific to the data set used for training. If the model is applied to a different data set, potentially of a different length, what should the components be?
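If re-estimating the decomposition model on each window is acceptable, a sketch that avoids refit() entirely is to fit decomposition_model() directly to the stretched tsibble, so each .id gets its own STL decomposition and ETS fit before the 1-step forecast. This is a swapped-in approach (time series cross-validation with re-estimation), not a way of forecasting without re-estimation, which is what the question asks for.

```r
library(fable)
library(feasts)
library(tsibble)
library(dplyr)

us_accidental_deaths <- as_tsibble(USAccDeaths)

# Expanding windows: the first has 60 months, each later one adds 1 month
stretch_dt <- us_accidental_deaths %>%
  stretch_tsibble(.init = 60, .step = 1)

# Fit the decomposition model separately to every window
# (STL components and ETS parameters ARE re-estimated for each .id)
stl_fc <- stretch_dt %>%
  model(stl_ets_mod = decomposition_model(
    STL(value ~ season(window = 12)),
    ETS(season_adjust ~ season("N")),
    SNAIVE(season_year)
  )) %>%
  forecast(h = 1)

stl_fc
```

USAccDeaths has 72 monthly observations, so this gives 13 windows and 13 one-step forecasts; the last one falls after the end of the data.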
I am doing hierarchical time series forecasting using fable. I am using optimal reconciliation method to reconcile the forecast. Here is the example code.
agg_sw <- df %>%
  aggregate_key(productcategory / brand / sku, sales = sum(sales))
# Fit the model
ets_fit <- agg_sw %>%
  model(ets = ETS(sales)) %>%
  reconcile(ols = min_trace(ets, method = "ols"))
# Forecast
fc <- forecast(ets_fit, h = "1 year")
Is it possible to use a different forecasting method at each level (e.g. sku/brand/productcategory) and then reconcile? If so, please let me know how to do it.
(Using the Orange dataset from library(Ecdat) for reproducibility.)
I am trying to fit a mean forecasting model in R using the tsibble and fable packages. The code below is pretty simple, but I get the error Error in NCOL(x) : object 'value' not found when I run the last model() step, even though value is a column name in o_ts, and I'm not sure why. I am following Rob J Hyndman's tutorial here (https://robjhyndman.com/hyndsight/fable/).
I would also appreciate help on whether ARIMA and a mean forecasting model are the same; if not, which function should I use instead of arima?
library(Ecdat)
library(tsibble)
library(feasts)
library(tidyverse)
library(fable)
o <- Orange
o_ts <- o %>% as_tsibble()
o_ts %>%
  filter(key == "priceoj") %>%
  model(
    arima = arima(value))
arima is from the stats package. I believe you want ARIMA from fable.
o_ts %>%
  filter(key == "priceoj") %>%
  model(
    arima = ARIMA(value)
  )
#> # A mable: 1 x 2
#> # Key: key [1]
#> key arima
#> <chr> <model>
#> 1 priceoj <ARIMA(1,1,0)(0,0,1)[12]>
If by a mean forecasting model you mean taking the mean of the last X observations (a moving average), then you should use MEAN().
ARIMA (AutoRegressive Integrated Moving Average) does include a moving-average part, but that term refers to a weighted moving average of past forecast errors, not of past observations. You can read more in section 9.4, Moving average models, of Forecasting: Principles and Practice.
o <- Orange
o_ts <- o %>% as_tsibble()
o_ts %>%
filter(key == "priceoj") %>%
model(mean = MEAN(value))
If you want to specify the number of observations to average over, add the window() special, as in value ~ window(size = X). Otherwise, all observations are used.
o_ts %>%
filter(key == "priceoj") %>%
model(mean = MEAN(value ~ window(size = 3)))
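A small sanity check of the windowed mean, assuming that MEAN() with window(size = 3) forecasts the average of the last three observations: compare its 1-step forecast with mean(tail(...)) computed by hand. data() with package = "Ecdat" is used here to avoid accidentally picking up datasets::Orange, a different dataset with the same name; fc and manual are illustrative names.

```r
library(Ecdat)
library(tsibble)
library(fable)
library(dplyr)

data("Orange", package = "Ecdat")

# as_tsibble() on a multivariate ts gives long data with `key` and `value`
priceoj <- as_tsibble(Orange) %>%
  filter(key == "priceoj")

# One-step forecast from a mean over the last 3 observations
fc <- priceoj %>%
  model(mean3 = MEAN(value ~ window(size = 3))) %>%
  forecast(h = 1)

# The same quantity computed by hand from the last 3 observed values
manual <- mean(tail(priceoj$value, 3))
all.equal(fc$.mean, manual)
```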
Suppose I am creating a recipe for my machine learning model and I need to preprocess my outcome.
How do I reverse the preprocessing of my outcome or my predictors?
If I preprocess my outcome, how do I transform the output of a model back to the original scale?
library(recipes)
biomass <- biomass
rec <- biomass %>%
  recipe(carbon ~ hydrogen) %>%
  step_BoxCox(all_outcomes()) %>%
  prep()
biomass_box <- rec %>% bake(biomass)
In this example I have made a BoxCox Transformation on my outcome. How do I get biomass_box$carbon back to its original values? recipes may have an easy way of undoing it, but I've been unable to find it.
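Have you tried using step_inverse()? Note that step_inverse() applies the reciprocal transformation 1/x, so on its own it does not reverse a Box-Cox transformation, and like any step it has to be added before prep() so that it is trained.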
library(recipes)
biomass <- biomass
rec <- biomass %>%
  recipe(carbon ~ hydrogen) %>%
  step_BoxCox(all_outcomes()) %>%
  step_inverse(all_predictors()) %>%
  prep()
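Because step_inverse() computes 1/x, it does not undo step_BoxCox(). A minimal sketch of reversing the outcome transformation by hand, assuming step_BoxCox() applies (x^lambda - 1) / lambda, or log(x) when lambda is close to 0: read the estimated lambda from the trained step with tidy(), then apply the algebraic inverse. The helper inv_boxcox() is hypothetical (not part of recipes), and its 0.001 cutoff for treating lambda as zero is an assumption meant to mirror that behaviour. The biomass data are loaded from the modeldata package, where current versions of recipes keep them.

```r
library(recipes)
data(biomass, package = "modeldata")

rec <- recipe(carbon ~ hydrogen, data = biomass) %>%
  step_BoxCox(all_outcomes()) %>%
  prep()

biomass_box <- bake(rec, new_data = biomass)

# tidy() on the first (trained) step returns the estimated lambda in `value`
lambda <- tidy(rec, number = 1)$value

# Hypothetical helper: inverse of (x^lambda - 1) / lambda, or of log(x)
inv_boxcox <- function(y, lambda, eps = 0.001) {
  if (abs(lambda) < eps) exp(y) else (lambda * y + 1)^(1 / lambda)
}

carbon_back <- inv_boxcox(biomass_box$carbon, lambda)
all.equal(carbon_back, biomass$carbon)
```

The same inv_boxcox() call can be applied to model predictions made on the transformed scale to bring them back to the original units.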