I have a grouped time series of items and their categories, and I would like to produce a 6-month sales forecast.
I would like to use the intermediate level (category) for the base forecasts, because seasonality and trends may be better captured there.
So I grouped my data by key, and I would like to use a middle-out approach: total sales are obtained bottom-up and single items are forecast top-down.
I'm using the fabletools middle_out() function, but when I try to forecast it doesn't work.
This is my code:
library(reshape)
library(tidyverse)
library(tsibble)
library(dplyr)
library(fable)
library(fpp2)
library(forecast)
#read data from csv
#example dataset
set.seed(42) ## for sake of reproducibility
n <- 6
data_example <- data.frame(Date=seq.Date(as.Date("2020-12-01"), as.Date("2021-05-01"), "month"),
No_=sample(1800:1830, n, replace=TRUE),
Category=rep(LETTERS[1:3], n),
Quantity=sample(18:24, n, replace=TRUE))
sell_full <- data_example %>% mutate(Month=yearmonth(Date)) %>% group_by(No_,Category, Month) %>% summarise(Quant = sum(Quantity), .groups = 'keep')
sell_full <- na.omit(sell_full)
#data
#conversion to tsibble for forecastings
sell_full <- as_tsibble(sell_full, key=c(No_, Category), index=Month)
sell_full <- sell_full %>% aggregate_key((Category/No_), Quant= sum(Quant))
#sell_full<- filter(sell_full, !is.na(sell_full$Quant))
sell_full <- sell_full %>% fill_gaps(Quant=0, .full=TRUE)
fit <- sell_full %>%model(ets = ETS(Quant~ error("A") + trend("A") + season("A")))%>% middle_out(split=1)
fc <- forecast(fit, h = "6 months", level=1,lambda="auto")
If I pass method = "mo" to forecast(), as the documentation says, it returns this error:
Error in meanf(object, h = h, level = level, fan = fan, lambda = lambda, :
unused argument (method = "mo")
If I don't pass a method to forecast(), it returns this error:
<error/vctrs_error_ptype2>
Error in `vec_compare()`:
! Can't combine `..1` <agg_vec> and `..2` <double>.
---
Backtrace:
1. generics::forecast(fit, h = "6 months", level = 1, lambda = "auto")
2. forecast:::forecast.default(fit, h = "6 months", level = 1, lambda = "auto")
3. forecast:::forecast.ts(object, ...)
4. forecast::meanf(...)
5. forecast::BoxCox(x, lambda)
6. forecast::BoxCox.lambda(x, lower = -0.9)
7. fabletools:::Ops.lst_mdl(x, 0)
11. fabletools:::map2(e1, e2, .Generic)
12. base::mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
13. vctrs:::`<=.vctrs_vctr`(dots[[1L]][[1L]], dots[[2L]][[1L]])
14. vctrs::vec_compare(e1, e2)
The documentation on this is very poor.
Can someone help me?
UPDATE:
As someone suggested, I tried removing some packages; now my libraries are:
library(tsibble)
library(dplyr)
library(fable)
library(fpp3)
library(conflicted)
Now the error has changed. When I call forecast() I get this error:
Error in build_key_data_smat(key_data) :
argument "key_data" is missing, with no default
And if I pass key_data = "Category" (Category is the split level), the error is:
fc <- forecast(fit, h = "6 months",level=1,lambda="auto", key_data= "Category")
Error in -ncol(x) : invalid argument to unary operator
library(conflicted)
library(fpp3)
library(tidyverse)
n <- 6
data_example <- data.frame(Date = seq.Date(as.Date("2020-12-01"), as.Date("2021-05-01"), "month"),
No_ = sample(1800:1830, n, replace = TRUE),
Category = rep(LETTERS[1:3], n),
Quantity = sample(18:24, n, replace = TRUE))
sell_full <- data_example |> mutate(Month = yearmonth(Date)) |> group_by(No_,Category, Month) |> summarise(Quant = sum(Quantity), .groups = 'keep')
sell_full <- ungroup(sell_full)
sell_full <- as_tsibble(sell_full, key = c(No_, Category), index = Month)
sell_full <- sell_full %>% aggregate_key((Category/No_), Quant = sum(Quant))
sell_full <- sell_full %>% fill_gaps(Quant = 0, .full = TRUE)
fit <- sell_full %>% model(ets = ETS(Quant~ error("A") + trend("A")))
fc <- fabletools::forecast(fit, h = "6 months", lambda = "auto")
Thought I'd have a look at the code to generate sell_full.
I added an ungroup(), took out the seasonal component, and took out the middle_out(). It runs now and no longer asks for key_data. I added ungroup() because it seemed that you were finished with the grouping, removed the seasonal component because it was not supported by the data, and removed middle_out() because it caused the prompt for key_data. I did spend a bit of time on why middle_out() leads forecast() to ask for key_data, though, hence the comment above.
This led me to try another way to do middle_out:
fit <- sell_full %>% model(ets = ETS(Quant~ error("A") + trend("A"))) |> reconcile(mo = middle_out(ets))
This runs fine. The idea came from fpp3 (Forecasting: Principles and Practice, 3rd edition). Hoping that this helps! :-)
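The arithmetic behind a middle-out reconciliation can be sketched in base R without fable. This is a minimal, hypothetical sketch: the toy data, the item names and the "mean of the last 3 months" base model are assumptions standing in for the ETS models above, and the items are split using each item's share of its category's historical total (only one of several possible top-down proportions).

```r
# Hypothetical toy history: 3 items in 2 categories over 6 months
sales <- data.frame(
  Month    = rep(1:6, times = 3),
  Category = rep(c("A", "A", "B"), each = 6),
  Item     = rep(c("a1", "a2", "b1"), each = 6),
  Quant    = c(10, 12, 11, 13, 12, 14,   # a1
               5, 6, 5, 7, 6, 7,         # a2
               20, 22, 21, 23, 22, 24),  # b1
  stringsAsFactors = FALSE
)
h <- 6  # forecast horizon in months

# 1. Base forecasts at the middle level (Category).
#    Stand-in model (assumption): mean of the last 3 months, held flat.
cat_series <- tapply(sales$Quant, list(sales$Month, sales$Category), sum)
cat_fc <- apply(cat_series, 2, function(y) rep(mean(tail(y, 3)), h))

# 2. Bottom-up: the total forecast is the sum of the category forecasts.
total_fc <- rowSums(cat_fc)

# 3. Top-down: split each category forecast across its items using
#    the item's share of the category's historical total.
items    <- unique(sales[, c("Item", "Category")])
item_tot <- tapply(sales$Quant, sales$Item, sum)[items$Item]
cat_tot  <- tapply(sales$Quant, sales$Category, sum)[items$Category]
share    <- as.numeric(item_tot) / as.numeric(cat_tot)
item_fc  <- sapply(seq_len(nrow(items)),
                   function(i) cat_fc[, items$Category[i]] * share[i])
colnames(item_fc) <- items$Item
```

For example, item a1 accounts for 72 of category A's 108 units, so it receives two thirds of A's forecast; by construction the item forecasts add up to the category forecasts, which add up to the total.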
I create some models like this using a nested tidyr data frame:
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(purrr)
fits <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0, sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1, sample(10, replace = T), sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data=-group) %>%
mutate(fit= map(data, ~glm(formula = colA ~ colB + colC, data = .x, family="binomial"))) %>%
dplyr::select(group, fit) %>%
tibble::column_to_rownames("group")
I would like to use this data to create some quick marginal effects plots with sjPlot::plot_models like this
plot_models(as.list(fits), type = "pred", terms = c("colB", "colA", "colC"))
Unfortunately, I get the error:
Error in if (fam.info$is_linear) tf <- NULL else tf <- "exp" :
argument is of length zero
In addition: Warning message:
Could not access model information.
I've played around a bit with the nesting of the data but I've been unable to get it into a format that sjPlot::plot_models will accept.
What I was expecting to get is a "Forest plot of multiple regression models" as described in the help file. Ultimately, the goal is to plot the marginal effects of regression models by group, which I was hoping the plot_models will do (please correct me if I'm wrong).
I think there are some issues with the original code as well as with the data. Some arguments in the function call come from plot_model() and are not supported by plot_models(). I first show an example of how plot_models() can be called and used with a nested tibble, using {ggplot2}'s diamonds data set. Then I apply this approach to the OP's sample data, which doesn't yield usable results*. Finally, I create some new toy data to show how the approach could be applied to a binomial model.
(* In the original toy data the dependent variable is either always 0 or always 1 within each model, so this is unlikely to yield usable results.)
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(ggplot2)
# general example
fits <- tibble(id = c("x", "y", "z")) %>%
rowwise() %>%
mutate(fit = list(glm(reformulate(
termlabels = c("cut", "color", "depth", "table", "price", id),
response = "carat"),
data = diamonds)))
plot_models(fits$fit)
# OP's example data
fits2 <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0,
sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1,
sample(10, replace = T),
sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data = -group) %>%
rowwise() %>%
mutate(fit = list(glm(formula = colA ~ colB + colC, data = data, family="binomial")))
plot_models(fits2$fit)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 4 rows containing missing values (geom_point).
# new data for binomial model
n <- 500
g <- round(runif(n, 0L, 1L), 0)
x1 <- runif(n,0,100)
x2 <- runif(n,0,100)
y <- (x2 - x1 + rnorm(n,sd=20)) < 0
fits3 <- tibble(g, y, x1, x2) %>%
nest_by(g) %>%
mutate(fit = list(glm(formula = y ~ x1 + x2, data = data, family="binomial")))
plot_models(fits3$fit)
Created on 2021-01-23 by the reprex package (v0.3.0)
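The footnote's point, that a binomial model needs both outcomes in every group, can be checked before fitting. The following base-R sketch is only an illustration on simulated data (the group names and the deliberately constant group3 are assumptions); it fits one glm per group with split() and lapply() and skips groups whose response never varies.

```r
set.seed(1)
n <- 600
d <- data.frame(
  g  = sample(c("group1", "group2", "group3"), n, replace = TRUE),
  x1 = runif(n, 0, 100),
  x2 = runif(n, 0, 100),
  stringsAsFactors = FALSE
)
d$y <- as.integer(d$x2 - d$x1 + rnorm(n, sd = 20) < 0)
d$y[d$g == "group3"] <- 0L  # simulate the OP's problem: a constant response

by_group <- split(d, d$g)

# A binomial GLM needs both 0s and 1s; flag groups where y never varies
has_both <- vapply(by_group, function(s) length(unique(s$y)) == 2, logical(1))

# Fit one model per usable group
fits <- lapply(by_group[has_both],
               function(s) glm(y ~ x1 + x2, data = s, family = binomial))
```

The resulting list of models has the same shape as fits3$fit above, so it can be passed to plot_models() in the same way.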
I'm trying to get a grid search for my ARIMA model working and I need some additional help with it.
I have the following data:
head(train)
Date Count
<date> <int>
1 2016-06-15 21
2 2016-06-16 21
3 2016-06-17 12
4 2016-06-18 20
5 2016-06-19 29
6 2016-06-20 30
The Date variable in the training data ranges from 2016-06-15 to 2019-06-30, with 1111 observations in total.
The Count variable in the training data ranges from min=3 to max=154, with mean=23.83 and sd=13.84.
I was able to define the hyper-parameters and create 36 ARIMA models with the following code:
#Create ts data
ts_train = xts(train[, -1], order.by = as.POSIXct(train$Date), frequency = 365)
#ARIMA model tune
#tibble helper function
to_tibble <- function(forecast_object){
point_estimate <- forecast_object$mean %>%
as_tsibble() %>%
rename(point_estimate = value,
date = index)
upper <- forecast_object$upper %>%
as_tsibble() %>%
spread(key, value) %>%
rename(date = index,
upper80 = `80%`,
upper95 = `95%`)
lower <- forecast_object$lower %>%
as_tsibble() %>%
spread(key, value) %>%
rename(date = index,
lower80 = `80%`,
lower95 = `95%`)
reduce(list(point_estimate, upper, lower), full_join)
}
#Trend hyper parameters
order_list <- list("p" = seq(0, 2),
"d" = seq(0, 1),
"q" = seq(0, 2)) %>%
cross() %>%
map(lift(c))
#Seasonal hyper-parameters
season_list <- list("P" = seq(0, 2),
"D" = seq(0, 1),
"Q" = seq(0, 2),
"period" = 365) %>%
cross() %>%
map(lift(c))
#Coerce vectors to tibbles
orderdf <- tibble("order" = order_list)
seasondf <- tibble("season" = season_list)
#Create grid of hyper-parameters
hyper_parameters_df <- crossing(orderdf, seasondf)
#Run grid search of ARIMA models
tic <- Sys.time()
models_df <- hyper_parameters_df %>%
mutate(models = map2(.x = order,
.y = season,
~possibly(arima, otherwise = NULL)(x = ts_train,
order = .x, seasonal = .y)))
running_time <- Sys.time() - tic
running_time
#Drop models which couldn't compute ARIMA
final_models = models_df %>% drop_na()
nrows <- nrow(final_models)
And then I get an error when I try to calculate the RMSE on my test data with the following code:
final_models <- final_models %>%
mutate(forecast = map(models, ~possibly(forecast, otherwise = NULL)(., h = 183))) %>%
mutate(point_forecast = map(forecast, ~.$`mean`)) %>%
mutate(true_value = rerun(nrows, test)) %>%
mutate(rmse = map2_dbl(point_forecast, true_value,
~sqrt(mean((.x - .y) ** 2))))
I get one error and one warning message:
Error in .x - .y : non-numeric argument to binary operator
In addition: Warning message:
In mean((.x - .y)^2) :
Incompatible methods ("Ops.ts", "Ops.data.frame") for "-"
Can someone please help me with that?
Here is my test data, in case it's needed to create dummy data:
head(test)
Date Count
<date> <int>
1 2019-07-02 20
2 2019-07-03 28
3 2019-07-04 35
4 2019-07-05 34
5 2019-07-06 60
6 2019-07-07 63
The Date variable in the test data ranges from 2019-07-01 to 2019-12-31, with 184 observations in total.
The Count variable in the test data ranges from min=6 to max=63, with mean=21.06 and sd=9.89.
The problem is that when computing the RMSE you are working with time series objects rather than plain vectors, so you have to convert both the predictions and the true values to numeric.
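To see the problem in isolation: in the question, test is a data frame (Date, Count), so rerun(nrows, test) repeats a data frame and .y inside map2_dbl() is a data frame, which cannot be subtracted from a ts. The sketch below uses made-up numbers (an assumption) and passes the numeric Count column to a small RMSE helper; calling as.numeric() on the whole data frame would fail, so with the question's data the column has to be selected first.

```r
# Made-up stand-ins: a forecast mean (ts) and a test data frame
point_forecast <- ts(c(20, 25, 30), start = c(2019, 183), frequency = 365)
test_df <- data.frame(Date  = as.Date("2019-07-02") + 0:2,
                      Count = c(22, 24, 33))

rmse <- function(pred, actual) {
  # Drop the ts attributes so the subtraction is plain vector arithmetic
  sqrt(mean((as.numeric(pred) - as.numeric(actual))^2))
}

# Pass the numeric column, not the data frame
err <- rmse(point_forecast, test_df$Count)
```

It is also worth checking that h matches the length of the test set: the question forecasts h = 183 steps against 184 test observations, so the two vectors would not line up.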
Here is my solution:
# Load libraries
library(fpp2)
library(dplyr)
library(xts)
library(purrr)
library(tidyr)
# Create sample dataset
dates <- seq.Date(as.Date("2019-07-02"), by = "day", length.out = length(WWWusage))
train <- data.frame(Date = dates, Count = WWWusage)
# Get test dataset using drift method
test <- forecast::rwf(WWWusage, h = 183, drift = TRUE)$mean
#Create ts data
ts_train = xts(train[, -1], order.by = as.POSIXct(train$Date), frequency = 365)
#ARIMA model tune
#tibble helper function
to_tibble <- function(forecast_object){
point_estimate <- forecast_object$mean %>%
as_tsibble() %>%
rename(point_estimate = value,
date = index)
upper <- forecast_object$upper %>%
as_tsibble() %>%
spread(key, value) %>%
rename(date = index,
upper80 = `80%`,
upper95 = `95%`)
lower <- forecast_object$lower %>%
as_tsibble() %>%
spread(key, value) %>%
rename(date = index,
lower80 = `80%`,
lower95 = `95%`)
reduce(list(point_estimate, upper, lower), full_join)
}
#Trend hyper parameters
order_list <- list("p" = seq(0, 2),
"d" = seq(0, 1),
"q" = seq(0, 2)) %>%
cross() %>%
map(lift(c))
#Seasonal hyper-parameters
season_list <- list("P" = seq(0, 2),
"D" = seq(0, 1),
"Q" = seq(0, 2),
"period" = 365) %>%
cross() %>%
map(lift(c))
#Coerce vectors to tibbles
orderdf <- tibble("order" = order_list)
seasondf <- tibble("season" = season_list)
#Create grid of hyper-parameters
hyper_parameters_df <- crossing(orderdf, seasondf)
#Run grid search of ARIMA models
tic <- Sys.time()
models_df <- hyper_parameters_df %>%
mutate(models =
map2(.x = order,
.y = season,
~possibly(arima, otherwise = NULL)(x = ts_train, order = .x, seasonal = .y)))
running_time <- Sys.time() - tic
running_time
#Drop models which couldn't compute ARIMA
final_models = models_df %>% drop_na()
nrows <- nrow(final_models)
# Estimate RMSE for each candidate
# Note: you have to make sure that both .x and .y are numeric
final_models2 <- final_models %>%
mutate(forecast = map(models, ~possibly(forecast, otherwise = NULL)(., h = 183))) %>%
mutate(point_forecast = map(forecast, ~.$`mean`)) %>%
mutate(true_value = rep(list(test), nrows)) %>%
mutate(rmse = map2_dbl(point_forecast, true_value,
~sqrt(mean((as.numeric(.x) - as.numeric(.y)) ** 2))))
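The same kind of grid search can also be sketched without purrr's cross() and lift() (both deprecated as of purrr 1.0.0) by building the grid with expand.grid() and scoring each order on a hold-out set. This is a minimal sketch under assumptions: it uses the built-in WWWusage series, a 10-step hold-out and non-seasonal orders only; models that fail to fit get an RMSE of NA.

```r
y <- as.numeric(WWWusage)          # built-in series, 100 observations
h <- 10                            # hold-out length (assumption)
train <- ts(y[seq_len(length(y) - h)])
test  <- y[(length(y) - h + 1):length(y)]

# Same non-seasonal orders as order_list: p in 0:2, d in 0:1, q in 0:2
grid <- expand.grid(p = 0:2, d = 0:1, q = 0:2)

rmse_for <- function(p, d, q) {
  fit <- tryCatch(arima(train, order = c(p, d, q)),
                  error = function(e) NULL)
  if (is.null(fit)) return(NA_real_)      # model could not be estimated
  pred <- predict(fit, n.ahead = h)$pred  # a ts object
  sqrt(mean((as.numeric(pred) - test)^2)) # compare as plain vectors
}

grid$rmse <- mapply(rmse_for, grid$p, grid$d, grid$q)
head(grid[order(grid$rmse), ], 3)         # best three orders by RMSE
```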
I have the following data frame:
library(tidyverse)
set.seed(1234)
df <- data.frame(
x = seq(1, 100, 1),
y = rnorm(100)
)
I then fit a smoothing spline with different numbers of knots:
nknots <- seq(4, 15, 1)
output <- map(nknots, ~ smooth.spline(x = df$x, y = df$y, nknots = .x))
What I need to do now is apply the same function to 2-point and 3-point averages of the data:
df_2 <- df %>%
group_by(x = round(x / 2) * 2) %>%
summarise(across(everything(), mean))
df_3 <- df %>%
group_by(x = round(x / 3) * 3) %>%
summarise(across(everything(), mean))
In summary, I need to apply the function used for output to each of the following data frames:
df
df_2
df_3
Of course, this is a minimal example, so I am looking for an efficient way of doing it, preferably with the purrr package.
Using lapply() and the zoo package to calculate the moving average in a simpler, more elegant manner:
library(zoo)
lapply(1:3,function(roll){
dftemp <- as.data.frame(rollmean(df,roll))
map(nknots, ~ smooth.spline(x = dftemp$x, y = dftemp$y, nknots = .x))
})
Here's one possible solution:
library(tidyverse)
set.seed(1234)
df <- data.frame(x = seq(1, 100, 1),
y = rnorm(100))
# function to get v-point averages
GetAverages = function(v) {
df %>%
group_by(x = round(x / v) * v) %>%
summarise(across(everything(), mean)) }
# specify number of knots
nknots <- seq(4, 15, 1)
dt_res = tibble(v=1:3) %>% # specify v-point averages
mutate(d = map(v, GetAverages)) %>% # get data for each v-point
crossing(., data.frame(nknots=nknots)) %>% # combine each dataset with a knot
mutate(res = map2(d, nknots, ~smooth.spline(x = .x$x, y = .x$y, nknots = .y))) # apply smooth spline
You can use dt_res$res[dt_res$v == 1] to see all results for your original dataset, dt_res$res[dt_res$v == 2] to see the results for the 2-point averages, etc.
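For comparison, the same grid (v-point averages × number of knots) can be built in base R with aggregate() and Map(). This is a minimal sketch that reuses the question's round(x / v) * v grouping rule. Note that zoo::rollmean() in the first answer computes overlapping moving averages (e.g. 99 rows for a 2-point mean of 100 points) rather than the non-overlapping groups built in the question, so the two approaches feed different data to smooth.spline().

```r
set.seed(1234)
df <- data.frame(x = seq(1, 100, 1), y = rnorm(100))
nknots <- 4:15

# Non-overlapping v-point averages, same grouping rule as the question
block_avg <- function(d, v) {
  d$x <- round(d$x / v) * v
  aggregate(y ~ x, data = d, FUN = mean)
}

datasets <- lapply(1:3, function(v) block_avg(df, v))
names(datasets) <- paste0("v", 1:3)

# Every combination of dataset and number of knots
grid <- expand.grid(v = names(datasets), nknots = nknots,
                    stringsAsFactors = FALSE)

fits <- Map(function(v, k) smooth.spline(datasets[[v]]$x, datasets[[v]]$y, nknots = k),
            grid$v, grid$nknots)
names(fits) <- paste0(grid$v, "_k", grid$nknots)
```

With this naming, fits[grep("^v2_", names(fits))] collects all fits for the 2-point averages.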
I'm trying to calculate a rolling beta regression for multiple stocks with a window of the past 12 months.
I have the following dataset, dt, with the columns Date, Stkcd (stock number), MKT_ex and r_rf.
I have searched a lot of posts, but somehow I can't get it to work for my data frame.
library(dplyr)
library(rollRegres)
func1 <- . %>% {
roll_regres.fit(x = cbind(1, .$MKT_ex),
y = .$r_rf, width = 12L)$coefs }
out <- dt %>%
group_by(Stkcd) %>%
# make it explicit that data needs to be sorted
arrange(Date, .by_group = TRUE) %>%
do(cbind(reg_col = select(., MKT_ex, r_rf) %>% func1,
date_col = select(., Date))) %>%
ungroup
I get the error message:
Error in roll_cpp(Y = y, X = x, window = width, do_compute_R_sqs = do_compute_R_sqs, : 'dchdd' failed with code -1
My goal is to get an output that contains the Date, the Stkcd (stock number) and the calculated beta (r_rf regressed on MKT_ex).
What did I miss in my code?
I also found this code in the forum from G. Grothendieck; sadly it doesn't work for my dataset and I can't find out why.
rolli <- function(ix) {
data.frame(coef = rollapplyr(ix, width = 12, function(ix) {
coef(lm(y ~ x, data = dat, subset = ix))[2]
}, by = 1), Date = dt$Date[ix][1], Stkcd = dt$Stkcd[ix][1])
}
do.call("rbind", by(1:nrow(dt), dat[c("Date", "Stkcd")], rolli))
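A dependency-free way to get the rolling beta is to compute, within each stock, the OLS slope over each 12-month window as cov(MKT_ex, r_rf) / var(MKT_ex), which equals the slope of lm(r_rf ~ MKT_ex) on that window. This is a minimal sketch on simulated data (the two stocks and their betas of 0.8 and 1.5 are assumptions); rows without 12 months of history get NA. It groups by stock only: the quoted rollapplyr() snippet groups by both Date and Stkcd (so each group holds a single row if there is one row per stock and month) and mixes the names dat and dt.

```r
# Simulated monthly data (assumption): two stocks, 24 months each
set.seed(1)
dt <- data.frame(
  Stkcd  = rep(c(1, 2), each = 24),
  Date   = rep(seq(as.Date("2015-01-01"), by = "month", length.out = 24), 2),
  MKT_ex = rnorm(48, 0, 0.05)
)
dt$r_rf <- ifelse(dt$Stkcd == 1, 0.8, 1.5) * dt$MKT_ex  # exact betas, no noise

rolling_beta <- function(d, width = 12) {
  d <- d[order(d$Date), ]               # windows must be in date order
  beta <- rep(NA_real_, nrow(d))
  if (nrow(d) >= width) {
    for (i in width:nrow(d)) {
      idx <- (i - width + 1):i          # current month and the width - 1 before it
      beta[i] <- cov(d$MKT_ex[idx], d$r_rf[idx]) / var(d$MKT_ex[idx])
    }
  }
  data.frame(Stkcd = d$Stkcd, Date = d$Date, beta = beta)
}

# One data frame per stock, then stacked back together
out <- do.call(rbind, lapply(split(dt, dt$Stkcd), rolling_beta, width = 12))
```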
I'm collecting time series data from Wikipedia and want to run a change-point analysis on each time series using dplyr. But when I do so, I get an error saying the data need to be numeric, even though class() reports that the views column is numeric. Hope you can help.
library(changepoint)
library(dplyr)
library(pageviews)
library(data.table)
articles <- c("Rugby_union", "Football")
foo <- function(x){article_pageviews(project = "en.wikipedia",
article = x,
start = as.Date('2017-01-01'),
end = as.Date("2017-12-31")
, user_type = "user", platform = c("mobile-web"))
}
output<-articles %>% foo
output %>%
select(article, views) %>%
do(cpt.mean(.))
class(output$views)
library(changepoint)
library(dplyr)
library(pageviews)
articles <- c("Rugby_union", "Football")
foo <- function(x){article_pageviews(project = "en.wikipedia", article = x,
start = as.Date('2017-01-01'),
end = as.Date("2017-12-31"),
user_type = "user", platform = c("mobile-web"))
}
output <- articles %>%
foo
df <- as.data.frame(table(output$article))
output1 <- output %>%
dplyr::select(article, views) %>%
dplyr::filter(article == df[1,1])
output2 <- output %>%
dplyr::select(article, views) %>%
dplyr::filter(article == df[2,1])
q <- floor((min(length(output1$views), length(output2$views)))/2 + 1)
cp1 <- changepoint::cpt.mean(data = output1$views, Q = q, method = "BinSeg", penalty = "SIC")
plot(cp1)
cp2 <- changepoint::cpt.mean(data = output2$views, Q = q, method = "BinSeg", penalty = "SIC")
plot(cp2)
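The error in the question most likely comes from do(cpt.mean(.)): the dot is the whole data frame (including the character article column) rather than the numeric views vector, and since the data frame is not grouped there is only one call. The answer's approach generalises to any number of articles by splitting views by article. In the sketch below, a hypothetical least-squares single change-in-mean function stands in for changepoint::cpt.mean so that it runs without extra packages, on simulated data; with the real package, the same split could be passed as lapply(split(output$views, output$article), changepoint::cpt.mean, method = "BinSeg", penalty = "SIC", Q = q).

```r
# Simulated stand-in for the pageviews output (assumption)
output <- data.frame(
  article = rep(c("Rugby_union", "Football"), each = 100),
  views   = c(rep(c(10, 30), each = 50),      # Rugby_union: shift after day 50
              rep(c(500, 200), c(30, 70))),   # Football: shift after day 30
  stringsAsFactors = FALSE
)

# Hypothetical helper: position of a single change in mean, chosen to
# minimise the total within-segment sum of squares
single_mean_changepoint <- function(y) {
  n <- length(y)
  rss <- vapply(seq_len(n - 1), function(k) {
    left  <- y[1:k]
    right <- y[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  which.min(rss)
}

# split() gives one numeric vector per article; each is analysed separately
cpts <- sapply(split(output$views, output$article), single_mean_changepoint)
cpts
```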