Adding theta model with fable forecasting estimates - r

I want to use the theta model implemented in the forecast package inside my fable forecasting model. This is what I am trying to do:
library(forecast)
library(tidyverse)
library(fable)
library(tsibble)
library(fabletools)
tourism_aus <- tourism %>%
  summarise(Trips = sum(Trips))
tourism_aus
fit <- tourism_aus %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips),
    theta = forecast::thetaf(Trips)
  ) %>%
  mutate(
    average = (ets + arima + theta) / 3
  )
fit
fit %>%
  forecast(h = "2 years") %>%
  autoplot(tourism_aus, level = 95, alpha = 0.5)
I am getting this error message:
Error in thetaf(Trips) : object 'Trips' not found
Is there any way I can use the theta method inside fable?

Models from the forecast package use a different interface, and so are not compatible with the model() function used by fable. The theta model will be added to fable in the next release.
You can create a fable yourself by using the output of forecast::thetaf() to identify an appropriate distribution. This can be useful for plotting, accuracy evaluation and reconciliation; ensembling, however, requires models to use the fable interface.
Update:
The THETA() model has now been added to fable.
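As a minimal sketch of what the update makes possible, the question's code can be rewritten with THETA(Trips) in place of forecast::thetaf(Trips). This assumes an installed fable version that includes THETA(), and that the tourism data comes from the tsibble package. ARIMA()'s automatic differencing may also need the feasts and urca packages installed.

```r
library(fable)
library(tsibble)
library(dplyr)

# Total quarterly trips across Australia (tourism ships with tsibble)
tourism_aus <- tsibble::tourism %>%
  summarise(Trips = sum(Trips))

# All three models use the fable interface, so they can be averaged
fit <- tourism_aus %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips),
    theta = THETA(Trips)
  ) %>%
  mutate(
    average = (ets + arima + theta) / 3
  )

# Two years of quarterly data is 8 steps per model
fc <- fit %>%
  forecast(h = "2 years")
```

The resulting fc can be passed to autoplot(tourism_aus, level = 95, alpha = 0.5) exactly as in the question.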

Related

How to get multiple-steps ahead forecast with STL model in fable-r?

My goal is to forecast multiple steps ahead without re-estimating the model, feeding each new observation into the next forecast.
I did not fit once and call forecast(h = 7), because that approach uses fitted values, not new observations, to forecast each following step.
I used the following code with stretch_tsibble() to get 1-step-ahead forecasts.
library(fable)
library(dplyr)
library(tsibble)
library(feasts)
us_accidental_deaths <- as_tsibble(USAccDeaths)
stretch_dt <- us_accidental_deaths %>%
  stretch_tsibble(.init = 60, .step = 1)
fit_train <- stretch_dt %>%
  # keep same estimate period with each .id
  filter_index(. ~ '1977 Dec') %>%
  model(
    stl_ets_mod = decomposition_model(
      STL(value, ~ season(window = 12)),
      ETS(season_adjust ~ season("N")),
      SNAIVE(season_year)
    ),
    arima_mod = ARIMA(value)
  )
Refitting the ARIMA model works:
fit_train %>%
  select(arima_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
But I get an error when I refit the STL model:
fit_train %>%
  select(stl_ets_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
Many thanks !!!
The error you are getting is
! no applicable method for 'refit' applied to an object of class "c('decomposition_model', 'model_combination')"
refit() is not available for all models.
It is not clear how a refit should work for an STL decomposition. The STL components are specific to the data set used for training. If the model is applied to a different data set, potentially of a different length, what should the components be?
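Since refit() is unavailable here, one fallback is to re-estimate the decomposition model on every stretched fold and take a 1-step forecast from each. This is a different technique from the one the question asks for: parameters are re-estimated in every fold rather than held fixed. The sketch below is only an illustration under that assumption. It writes the STL call in the single-formula form STL(value ~ season(window = 12)) and relies on decomposition_model() defaulting to SNAIVE for the seasonal component.

```r
library(fable)
library(feasts)
library(tsibble)
library(dplyr)

us_accidental_deaths <- as_tsibble(USAccDeaths)

# Expanding windows: 60 observations, then 61, ... up to all 72
folds <- us_accidental_deaths %>%
  stretch_tsibble(.init = 60, .step = 1)

# Re-estimate STL + ETS on each fold (not a refit with fixed parameters)
fc <- folds %>%
  model(
    stl_ets_mod = decomposition_model(
      STL(value ~ season(window = 12)),
      ETS(season_adjust ~ season("N"))
    )
  ) %>%
  forecast(h = 1)
```

Each row of fc is the 1-step-ahead forecast for the fold identified by .id.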

Hierarchical time series forecasting using Fable in R

I am doing hierarchical time series forecasting using fable. I am using optimal reconciliation method to reconcile the forecast. Here is the example code.
agg_sw <- df %>%
  aggregate_key(productcategory/brand/sku, sales = sum(sales))
# Fit the model
ets_fit <- agg_sw %>%
  model(ets = ETS(sales)) %>%
  reconcile(ols = min_trace(ets, method = "ols"))
# Forecast
fc <- forecast(ets_fit, h = "1 year")
Is it possible to use a different forecasting method at each level (e.g. sku/brand/product) and then reconcile? If so, kindly let me know how to do it.

When ets() is used, why R is not responding and crashes?

I am trying to find the best model to forecast the average monthly rainfall of a particular region.
So far I have used a seasonal naive method and SARIMA. But when trying to run ets(), R crashes without producing any output.
I tend to use fable and fabletools, the successors to forecast. The fpp3 package loads everything needed for working with tsibbles, dplyr and date objects.
I don't have any issues running any forecasting methods on your data. I tried both fable and forecast and got the same results. See the code below.
# load your data
df1 <- readxl::read_excel("datasets/Copy.xlsx")
colnames(df1) <- c("date", "rainfall")
library(fpp3)
fit <- df1 %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble() %>%
  model(ets = ETS(rainfall))
report(fit)
Series: rainfall
Model: ETS(M,N,A)
Smoothing parameters:
alpha = 0.002516949
gamma = 0.0001065384
Initial states:
l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9] s[-10]
86.7627 -77.53686 -57.90353 -18.72201 86.57944 150.0896 166.8125 60.45602 -39.25331 -55.94238 -68.85851 -70.52719
s[-11]
-75.19377
sigma^2: 0.1109
AIC AICc BIC
2797.766 2799.800 2850.708
Using forecast:
library(forecast)
fit <- forecast::ets(ts(df1[, 2], frequency = 12))
fit
ETS(M,N,A)
Call:
forecast::ets(y = ts(df1[, 2], frequency = 12))
Smoothing parameters:
alpha = 0.0025
gamma = 1e-04
Initial states:
l = 86.7627
s = -77.5369 -57.9035 -18.722 86.5794 150.0896 166.8125
60.456 -39.2533 -55.9424 -68.8585 -70.5272 -75.1938
sigma: 0.333
AIC AICc BIC
2797.766 2799.800 2850.708

How to extract random intercepts from mixed effects Tidymodels

I am trying to extract random intercepts from tidymodels using lme4 and multilevelmod. I am able to do this using lme4 directly, as shown below:
Using R and lme4:
library("tidyverse")
library("lme4")
# set up model
mod <- lmer(Reaction ~ Days + (1|Subject),data=sleepstudy)
# create expanded df
expanded_df <- with(sleepstudy,
  data.frame(
    expand.grid(Subject = levels(Subject),
                Days = seq(min(Days), max(Days), length = 51))))
# create predicted df with **random intercepts**
predicted_df <- data.frame(expanded_df, resp = predict(mod, newdata = expanded_df))
predicted_df
# plot intercepts
ggplot(predicted_df,aes(x=Days,y=resp,colour=Subject))+
geom_line()
Using tidymodels:
# example from
# https://github.com/tidymodels/multilevelmod
library("multilevelmod")
library("tidymodels")
library("tidyverse")
library("lme4")
#> Loading required package: parsnip
data(sleepstudy, package = "lme4")
# set engine to lme4
mixed_model_spec <- linear_reg() %>% set_engine("lmer")
# create model
mixed_model_fit_tidy <-
  mixed_model_spec %>%
  fit(Reaction ~ Days + (1 | Subject), data = sleepstudy)
expanded_df_tidy <- with(sleepstudy,
  data.frame(
    expand.grid(Subject = levels(Subject),
                Days = seq(min(Days), max(Days), length = 51))))
predicted_df_tidy <- data.frame(expanded_df_tidy, resp = predict(mixed_model_fit_tidy, new_data = expanded_df_tidy))
ggplot(predicted_df_tidy, aes(x = Days, y = .pred, colour = Subject)) +
  geom_line()
Using the predict() function seems to give only the fixed-effect predictions.
Is there a way to extract the random intercepts from tidymodels and multilevelmod? I know the package is still in development so it might not be possible at this stage.
I think you can work around this as follows:
predicted_df_tidy <- mutate(expanded_df_tidy,
  .pred = predict(mixed_model_fit_tidy,
                  new_data = expanded_df_tidy,
                  type = "raw", opts = list(re.form = NULL)))
bind_cols() instead of mutate() might be useful in some circumstances.
The issue is that multilevelmod internally sets the default for prediction to re.form = NA; the code above resets it to re.form = NULL (which is the lme4 default, i.e. include all random effects in the prediction).
If you actually want the random intercepts (only), I guess you could use predicted_df_tidy %>% filter(Days==0).
PS If you want to be more 'tidy' about this, I think you can use purrr::cross_df() in place of expand.grid and pipe the results directly to mutate() ...
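Another possible route, sketched here as an assumption rather than as the multilevelmod-recommended approach, is to pull the underlying lmer object out of the parsnip fit with extract_fit_engine() and apply lme4's own accessors to it. ranef() then returns each subject's deviation from the fixed intercept, and coef() returns the per-subject intercepts themselves.

```r
library(multilevelmod)
library(parsnip)
library(lme4)

data(sleepstudy, package = "lme4")

# Same specification as in the question, written without pipes
mixed_model_spec <- set_engine(linear_reg(), "lmer")
mixed_model_fit_tidy <- fit(mixed_model_spec,
                            Reaction ~ Days + (1 | Subject),
                            data = sleepstudy)

# Underlying lme4 model object
lmer_fit <- extract_fit_engine(mixed_model_fit_tidy)

# Random intercepts: per-subject deviations from the fixed intercept
random_intercepts <- ranef(lmer_fit)$Subject

# Per-subject coefficients: fixed effects plus random deviations
subject_coefs <- coef(lmer_fit)$Subject
```

random_intercepts has one row per Subject and a single "(Intercept)" column.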

Plotting the Hazard Function in R using survminer?

We can use survminer to plot the survival function or cumulative hazard function, but I cannot see a way to use it to plot the hazard function.
For example,
library(survival)
library(tidyverse)
library(survminer)
data(lung)
# Run Kaplan-Meier on the data
mod.lung <- survfit(Surv(time, status) ~ 1, data = lung)
# Kaplan-Meier Survival Curve
ggsurvplot(mod.lung)
# Cumulative Hazard
ggsurvplot(mod.lung, fun = function(y) -log(y))
Since the cumulative hazard function is H(t) = -log(S(t)) then I just need to add in fun = function(y) -log(y) to get the cumulative hazard plot.
The hazard function is h(t) = -d/dt log(S(t)), and so I am unsure how to use this to get the hazard function in a survminer plot.
An alternative definition of the hazard function is h(t) = f(t)/S(t), however, I'm unsure how to use this to get the plot.
I have found ways to get a hazard plot using ggplot2, for example
survival.table1 <- broom::tidy(mod.lung) %>% filter(n.event > 0)
survival.table1 <- survival.table1 %>% mutate(hazard = n.event / (n.risk * (lead(time) - time)))
ggplot() +
geom_step(data = survival.table1, aes(x = time, y = hazard)) +
labs(x = "Time", y = "Hazard")
However, I mainly wish to find a way with the survminer package, partly to have some consistency.
Thanks
In the rms package, you can use the survplot function with the "what" parameter set to "hazard" to plot the hazard function versus time.
https://rdrr.io/cran/rms/man/survplot.html
A couple of years late, but here for others.
survplot can only be used to plot the hazard if the estimate was created by the psm function. The psm function of the rms library fits the accelerated failure time family of parametric survival models (defaulting to a Weibull distribution). Other available distributions are listed in the documentation for survreg in the survival package:
These include "weibull", "exponential", "gaussian", "logistic","lognormal" and "loglogistic". Otherwise, it is assumed to be a user defined list conforming to the format described in survreg.distributions
library(rms)
mod.lung <- psm(Surv(time, status) ~ 1, data = lung)
survplot(mod.lung, what="hazard")
For non parametric survival models, the muhaz library might be more useful. This example uses the default "epanechnikov" boundary kernel function. You may wish to explore different bandwidth options - see the muhaz package documentation.
library(muhaz)
mod.lung <- muhaz(lung$time, lung$status - 1) # status must be 1 for failure and 0 for censored
plot(mod.lung)
Alternatively, to apply B-splines instead of kernel density smoothing to the hazard, have a look at the bshazard library
library(bshazard)
mod.lung <- bshazard(Surv(time, status) ~ 1, data = lung)
plot(mod.lung)
