Why does R stop responding and crash when ets() is used? - r

I am trying to find the best model to forecast the average monthly rainfall of a particular region.
So far I have used a seasonal naive method and SARIMA. But when trying to run ets(), R crashes without producing any output.

I tend to use fable and fabletools, the successors to forecast. Loading the fpp3 package attaches all the packages needed for working with tsibbles, dplyr and date objects.
I don't have any issues running any forecast methods on your data. I tried both fable and forecast and got the same results. See the code below.
# load your data
df1 <- readxl::read_excel("datasets/Copy.xlsx")
colnames(df1) <- c("date", "rainfall")
library(fpp3)
fit <- df1 %>%
  mutate(date = yearmonth(date)) %>%  # convert to a monthly time index
  as_tsibble(index = date) %>%        # being explicit about the index avoids guessing
  model(ets = ETS(rainfall))
report(fit)
Series: rainfall
Model: ETS(M,N,A)
Smoothing parameters:
alpha = 0.002516949
gamma = 0.0001065384
Initial states:
     l[0]      s[0]     s[-1]     s[-2]    s[-3]    s[-4]    s[-5]    s[-6]     s[-7]     s[-8]     s[-9]    s[-10]    s[-11]
  86.7627 -77.53686 -57.90353 -18.72201 86.57944 150.0896 166.8125 60.45602 -39.25331 -55.94238 -68.85851 -70.52719 -75.19377
sigma^2: 0.1109
AIC AICc BIC
2797.766 2799.800 2850.708
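A short follow-up sketch (not part of the original answer): once the mable is fitted, you can forecast ahead and plot straight from it.
# sketch: forecast two years ahead from the fable fit and plot it
fc <- fit %>% forecast(h = "2 years")
autoplot(fc)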
Using forecast:
library(forecast)
fit <- forecast::ets(ts(df1[, 2], frequency = 12))
fit
ETS(M,N,A)
Call:
forecast::ets(y = ts(df1[, 2], frequency = 12))
Smoothing parameters:
alpha = 0.0025
gamma = 1e-04
Initial states:
l = 86.7627
s = -77.5369 -57.9035 -18.722 86.5794 150.0896 166.8125 60.456 -39.2533 -55.9424 -68.8585 -70.5272 -75.1938
sigma: 0.333
AIC AICc BIC
2797.766 2799.800 2850.708
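The equivalent follow-up with the forecast package (again a sketch, not part of the original answer):
# sketch: 24 months ahead from the forecast::ets() fit, with base plotting
plot(forecast::forecast(fit, h = 24))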

Related

How to get multiple-step-ahead forecasts with an STL model in fable-r?

My goal is to produce multiple-step-ahead forecasts without re-estimating the model, updating it with each new observation before the next forecast.
I did not simply fit the model and apply forecast(h = 7), because that approach uses fitted values to forecast the next observations.
I used the following code with stretch_tsibble() to get 1-step-ahead forecasts.
library(fable)
library(dplyr)
library(tsibble)
library(feasts)
us_accidental_deaths <- as_tsibble(USAccDeaths)
stretch_dt <- us_accidental_deaths %>%
  stretch_tsibble(.init = 60, .step = 1)
fit_train <- stretch_dt %>%
  # keep the same estimation period for each .id
  filter_index(. ~ '1977 Dec') %>%
  model(
    stl_ets_mod = decomposition_model(
      STL(value, ~ season(window = 12)),
      ETS(season_adjust ~ season("N")),
      SNAIVE(season_year)
    ),
    arima_mod = ARIMA(value)
  )
It works fine when I refit the ARIMA model:
fit_train %>%
  select(arima_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
But I get an error when I refit the STL model:
fit_train %>%
  select(stl_ets_mod) %>%
  refit(stretch_dt) %>%
  forecast(h = 1)
Many thanks!
The error you are getting is
! no applicable method for 'refit' applied to an object of class "c('decomposition_model', 'model_combination')"
refit() is not available for all models.
It is not clear how a refit should work for an STL decomposition. The STL components are specific to the data set used for training. If the model is applied to a different data set, potentially of a different length, what should the components be?
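You can check which model classes provide a refit() method in your session (a small sketch; the exact list depends on your fable/fabletools versions):
# list the refit() S3 methods currently registered
library(fable)
methods("refit")
# typically includes methods for ARIMA, ETS, TSLM and the simple benchmark
# models, but not for decomposition_model() combinations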

Hierarchical time series forecasting using Fable in R

I am doing hierarchical time series forecasting using fable. I am using the optimal reconciliation method to reconcile the forecasts. Here is the example code.
agg_sw <- df %>%
  aggregate_key(productcategory/brand/sku, sales = sum(sales))
# Fit the model
ets_fit <- agg_sw %>%
  model(ets = ETS(sales)) %>%
  reconcile(ols = min_trace(ets, method = "ols"))
# Forecast
fc <- forecast(ets_fit, h = "1 year")
Is it possible to use a different forecasting method at each level (e.g. sku/brand/product category) and reconcile? If so, kindly let me know how to do it.
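No answer is included in this excerpt, but here is a heavily hedged sketch of one direction: a mable holds one row per aggregated series, so you can fit several candidate models to every series and build a combined column that picks a method per level. Whether if_else() can select between model columns row-wise depends on your dplyr/fabletools versions, so treat this as untested.
fit <- agg_sw %>%
  model(
    ets   = ETS(sales),   # candidate 1, fitted to every series
    arima = ARIMA(sales)  # candidate 2, fitted to every series
  ) %>%
  # untested assumption: pick ARIMA for SKU-level series, ETS elsewhere
  mutate(mixed = if_else(!is_aggregated(sku), arima, ets)) %>%
  reconcile(mixed_ols = min_trace(mixed, method = "ols"))
fc <- forecast(fit, h = "1 year")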

tbl_regression from the gtsummary package for negative binomial regressions

To assess whether there is an association between certain groups of patients (patient_group; categorical variable) and a disease (disease_outcome; count variable), I am running negative binomial regression models (due to overdispersion). To check for confounding by other variables, I am running 3 models with increasing numbers of covariates.
To display the IRRs and CIs I want to use the tbl_regression function from the package gtsummary (I am using the latest version, 1.3.7.9022). However, calling the function returns the IRRs and the corresponding 95% CIs non-exponentiated, even though I set exponentiate = TRUE:
# Load packages
library(haven)
library(magrittr)
library(MASS)
library(dplyr)
install.packages("gtsummary")
remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
# Load example data.
dat <- read_dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
# Run negative binomial regression and pipe in the tbl_regression function
model1 <-
  glm.nb(data = dat, formula = daysabs ~ prog) %>%
  tbl_regression(exponentiate = TRUE)
model1
This returns the summary table, but the regression coefficients have not been exponentiated. Is there a way to get gtsummary to return exponentiated coefficients and CIs?
Thanks!
I was just doing some poking around to see what is going on. The tbl_regression() function uses broom::tidy() in the background. Support for negbin models was just added 7 days ago, but for some reason an exponentiate= argument was not added for this type of model.
I am going to request that it be added. In the meantime, this code should get you up and running with negbin models.
library(gtsummary)
library(tidyverse)
# add a custom tidying function
my_negbin_tidy <- function(x, exponentiate = FALSE, ...) {
  df_tidy <- broom::tidy(x, ...)
  # exponentiate coef and CI if requested
  if (exponentiate) {
    df_tidy <-
      df_tidy %>%
      mutate_at(vars(any_of(c("estimate", "conf.low", "conf.high"))), exp)
  }
  df_tidy
}
# build model
mod <- MASS::glm.nb(response ~ age, gtsummary::trial)
# summarize model results
tbl <-
  tbl_regression(
    mod,
    exponentiate = TRUE,
    tidy_fun = my_negbin_tidy
  )
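Printing the object then renders the summary table with exponentiated IRRs and CIs (a trivial usage note, not in the original answer):
# print the table; estimates and CIs should now be exponentiated
tbl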
Created on 2021-04-12 by the reprex package (v2.0.0)

Adding theta model with fable forecasting estimates

I want to use the theta model implemented in the forecast package inside my fable forecasting model. This is what I am trying to do.
library(forecast)
library(tidyverse)
library(fable)
library(tsibble)
library(fabletools)
tourism_aus <- tourism %>%
  summarise(Trips = sum(Trips))
tourism_aus
fit <- tourism_aus %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips),
    theta = forecast::thetaf(Trips)
  ) %>%
  mutate(
    average = (ets + arima + theta) / 3
  )
fit
fit %>%
  forecast(h = "2 years") %>%
  autoplot(tourism_aus, level = 95, alpha = 0.5)
I am getting this error message:
Error in thetaf(Trips) : object 'Trips' not found
Is there any way I can use theta method inside fable?
Models from the forecast package use a different interface, and so are not compatible with the model() function used by fable. The theta model will be added to fable in the next release.
You can create a fable yourself by using the forecast output of forecast::thetaf() to identify an appropriate distribution. This can be useful for plotting, accuracy evaluation and reconciliation; however, ensembling requires models to use the fable interface.
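A rough sketch of that idea follows; recovering a normal distribution from the 95% interval is my assumption about how to translate thetaf() output, not code from the original answer.
library(distributional)
# theta forecast from the forecast package (tourism is quarterly from 1998 Q1)
y <- ts(tourism_aus$Trips, start = c(1998, 1), frequency = 4)
fc_theta <- forecast::thetaf(y, h = 8, level = 95)
# back out a normal distribution from the mean and the upper 95% bound,
# then convert the result into a fable
fbl_theta <- as_tsibble(fc_theta$mean) %>%
  mutate(
    Trips = dist_normal(
      value,
      (as.numeric(fc_theta$upper[, 1]) - value) / qnorm(0.975)
    )
  ) %>%
  select(-value) %>%
  as_fable(response = "Trips", distribution = Trips)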
Update:
The THETA() model has now been added to fable.
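With a current fable release, the original ensemble should therefore work by swapping forecast::thetaf() for THETA() (an untested sketch):
fit <- tourism_aus %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips),
    theta = THETA(Trips)
  ) %>%
  mutate(average = (ets + arima + theta) / 3)
fit %>%
  forecast(h = "2 years") %>%
  autoplot(tourism_aus, level = 95, alpha = 0.5)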

ARIMA modelling, prediction and plotting with CO2 dataset in R

I am working with arima0() and the co2 dataset. I would like to plot the fitted arima0() model over my data. I have tried fitted() and curve() with no success.
Here is my code:
###### Time Series
# format: time series
data(co2)
# format: matrix
dmn <- list(month.abb, unique(floor(time(co2))))
co2.m <- matrix(co2, 12, dimnames = dmn)
co2.dt <- pracma::detrend(co2.m, tt = 'linear')
co2.dt <- ts(as.numeric(co2.dt), start = c(1959,1), frequency=12)
# first diff
co2.dt.dif <- diff(co2.dt,lag = 12)
# Second diff
co2.dt.dif2 <- diff(co2.dt.dif,lag = 1)
With the data prepared, I ran the following arima0() call:
results <- arima0(co2.dt.dif2, order = c(2,0,0), method = "ML")
resultspredict <- predict(results, n.ahead = 36)
I would like to plot the model fit and the predictions together, and I am hoping there is a way to do this in base R.
Session 1: To begin with...
To be honest, I am somewhat concerned about the way you are modelling the co2 series. Something has already gone wrong in how you de-trended co2. Why use tt = "linear"? You fit a linear trend within each period (i.e., year) and take the residuals for further inspection. This is often not recommended, as it tends to introduce artificial effects into the residual series. I would be inclined to use tt = "constant", i.e., simply dropping the yearly average. This would at least preserve the within-season correlation of the original data.
Perhaps you want to see some evidence here. Consider using ACF to help you diagnose.
data(co2)
## de-trend by dropping yearly average (no need to use `pracma::detrend`)
yearlymean <- ave(co2, gl(39, 12), FUN = mean)
co2dt <- co2 - yearlymean
## de-trend by dropping within season linear trend
co2.m <- matrix(co2, 12)
co2.dt <- pracma::detrend(co2.m, tt = "linear")
co2.dt <- ts(as.numeric(co2.dt), start = c(1959, 1), frequency = 12)
## compare time series and ACF
par(mfrow = c(2, 2))
ts.plot(co2dt); acf(co2dt)
ts.plot(co2.dt); acf(co2.dt)
Both de-trended series have strong seasonal effect, thus a further seasonal differencing is required.
## seasonal differencing
co2dt.dif <- diff(co2dt, lag = 12)
co2.dt.dif <- diff(co2.dt, lag = 12)
## compare time series and ACF
par(mfrow = c(2, 2))
ts.plot(co2dt.dif); acf(co2dt.dif)
ts.plot(co2.dt.dif); acf(co2.dt.dif)
The ACF for co2.dt.dif has more significant negative correlations. This is a sign of over-de-trending, so we prefer co2dt. co2dt.dif is already stationary, and no more differencing is needed (otherwise we would just over-difference it and introduce more negative autocorrelation).
The big negative spike at lag 1 in the ACF of co2dt.dif suggests that we want a seasonal MA term. Also, the positive spikes within the season imply a mild AR process in general. So consider:
## we exclude the mean because its estimate is essentially 0 if we include it
fit <- arima0(co2dt.dif, order = c(1,0,0), seasonal = c(0,0,1), include.mean = FALSE)
To judge whether this model is doing well, we need to inspect the ACF of its residuals:
acf(fit$residuals)
Looks like this model is decent (actually pretty great).
For prediction purposes, it is actually a better idea to fold the seasonal differencing of co2dt into the model fitting, instead of fitting co2dt.dif separately. Let's do
fit <- arima0(co2dt, order = c(1,0,0), seasonal = c(0,1,1), include.mean = FALSE)
This gives exactly the same estimates for the AR and MA coefficients as the two-stage work above, but now prediction is easily handled by a single predict call.
## 3 years' ahead prediction (no prediction error; only mean)
predco2dt <- predict(fit, n.ahead = 36, se.fit = FALSE)
Let's plot co2dt, fitted model and prediction together:
fittedco2dt <- co2dt - fit$residuals
ts.plot(co2dt, fittedco2dt, predco2dt, col = 1:3)
The result looks very promising!
Now the final stage, is to actually map this back to the original co2 series. For fitted values, we just add back the yearly mean we have dropped off:
fittedco2 <- fittedco2dt + yearlymean
But for predictions it is more difficult, because we don't know what the yearly means in the future will be. In this regard, our modelling, though it looks good, is not practically useful. I will talk about a better idea in the next session. To finish this session, we plot co2 with its fitted values only:
ts.plot(co2, fittedco2, col = 1:2)
Session 2: A better idea for time series modelling
In the previous session, we saw the difficulty in prediction when de-trending and modelling of the de-trended series are done separately. Now we try to combine those two stages in one go.
The seasonal pattern of co2 is really strong, so we need a seasonal differencing anyway:
data(co2)
co2dt <- diff(co2, lag = 12)
par(mfrow = c(1,2)); ts.plot(co2dt); acf(co2dt)
After this seasonal differencing, co2dt still does not look stationary, so we need a further non-seasonal differencing.
co2dt.dif <- diff(co2dt)
par(mfrow = c(1,2)); ts.plot(co2dt.dif); acf(co2dt.dif)
The negative spikes within the season and between seasons suggest that an MA process is needed for both. I will not work with co2dt.dif; we can work with co2 directly:
fit <- arima0(co2, order = c(0,1,1), seasonal = c(0,1,1))
acf(fit$residuals)
Now the residuals are perfectly uncorrelated! So we have an ARIMA(0,1,1)(0,1,1)[12] model for co2 series.
As usual, fitted values are obtained by subtracting residuals from data:
co2fitted <- co2 - fit$residuals
Predictions are made by a single call to predict:
co2pred <- predict(fit, n.ahead = 36, se.fit = FALSE)
Let's plot them together:
ts.plot(co2, co2fitted, co2pred, col = 1:3)
Oh, this is just gorgeous!
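If you also want approximate interval bands on the plot, predict for arima0 fits returns standard errors by default (a small sketch building on the fit above):
pr <- predict(fit, n.ahead = 36)  # se.fit = TRUE is the default
upper <- pr$pred + 1.96 * pr$se
lower <- pr$pred - 1.96 * pr$se
ts.plot(co2, co2fitted, pr$pred, upper, lower,
        col = c(1, 2, 3, 4, 4), lty = c(1, 1, 1, 2, 2))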
Session 3: Model selection
The story could have finished by now, but I would like to make a comparison with auto.arima from the forecast package, which can automatically decide on the "best" model.
library(forecast)
autofit <- auto.arima(co2)
#Series: co2
#ARIMA(1,1,1)(1,1,2)[12]
#
#Coefficients:
#         ar1      ma1     sar1     sma1     sma2
#      0.2569  -0.5847  -0.5489  -0.2620  -0.5123
#s.e.  0.1406   0.1204   0.5880   0.5701   0.4819
#
#sigma^2 estimated as 0.08576: log likelihood=-84.39
#AIC=180.78 AICc=180.97 BIC=205.5
auto.arima has chosen ARIMA(1,1,1)(1,1,2)[12], which is considerably more complicated: on top of the same seasonal and non-seasonal differencing, it adds a non-seasonal AR term, a seasonal AR term and a second seasonal MA term.
Our model based on step-by-step investigation suggests an ARIMA(0,1,1)(0,1,1)[12]:
fit <- arima0(co2, order = c(0,1,1), seasonal = c(0,1,1))
#Call:
#arima0(x = co2, order = c(0, 1, 1), seasonal = c(0, 1, 1))
#
#Coefficients:
#          ma1     sma1
#      -0.3495  -0.8515
#s.e.   0.0497   0.0254
#
#sigma^2 estimated as 0.08262: log likelihood = -85.98, aic = 177.96
The AIC values suggest our model is better. So does BIC:
BIC = -2 * loglik + log(n) * p
We have n <- length(co2) observations and p <- length(fit$coef) + 1 parameters (the extra one for sigma^2), so our model has BIC
-2 * fit$loglik + log(n) * p
# [1] 196.5503
So auto.arima has over-fitted the data.
In fact, as soon as we see ARIMA(1,1,1)(1,1,2)[12], we should strongly suspect over-fitting, because different effects can "cancel" each other out. This happens with the additional seasonal MA and non-seasonal AR terms introduced by auto.arima: an AR term introduces positive autocorrelation, while an MA term introduces negative autocorrelation.
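As a quick illustration of such cancellation (my own sketch, not part of the original answer): an ARMA(1,1) process whose AR and MA coefficients nearly cancel is almost indistinguishable from white noise.
set.seed(1)
# AR(0.5) and MA(-0.45) nearly cancel, so the series looks close to white noise
x <- arima.sim(model = list(ar = 0.5, ma = -0.45), n = 500)
acf(x)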
