I am trying to forecast student behaviour by year. It isn't working, maybe because my data is too small. I'm using Arima; however, the trend line keeps showing a straight line which I'm not sure is right. Might be this because ARIMA shows ARIMA(0,0,0) with non-zero mean.
Year - General
Students - Numeric
How can I forecast a student's behaviour by year?
Small data set, but this is one way to go about it. Modelling each student separately (with student as a key), and using the tidyverts approach:
library(dplyr)
library(tidyr)
library(tsibble)
library(feasts)
library(fable)
Data set
df <- structure(list(Year = structure(c(1995, 1996, 1997), class = "numeric"),
Student1 = c(3, 1, 3), Student2 = c(2, 2, 2), Student3 = c(2,
3, 3), Student4 = c(2, 3, 2), Student5 = c(3, 3, 4)), row.names = c(NA,
3L), class = "data. Frame")
Tidy data
df <- df |> pivot_longer(names_to = "Student", cols = starts_with("Student")) |>
as_tsibble(index = Year, key = Student)
Visualise
df |> autoplot()
df |>
filter(Student == "Student1") |>
gg_tsdisplay(value, plot_type = 'partial')
Fit ARIMA
Stu_fit <- df |>
model(search = ARIMA(value, stepwise = FALSE))
Check fit
glance(Stu_fit)
Stu_fit$search
Stu_fit |>
filter(Student == "Student1") |>
gg_tsresiduals()
Forecast
Stu_fit |>
forecast(h = 5) |>
filter(.model == 'search') |>
autoplot()
Hope this is helps! :-)
Related
Trying to add a column of statistical test results in gtsummary::tbl_continuous().
Here is an example:
library(tidyverse)
library(gtsummary)
df <- tibble(
Group = as.factor(c(rep('A', 8), rep('B', 8), rep('C', 8))),
Variable = as.factor(rep(c(rep('x', 4), rep('y', 4)), 3)),
Value = runif(24, 0, 10)
)
df %>%
tbl_continuous(
variable = Value,
by = Variable
) %>%
add_p()
which produces:
I would like to add t-tests (or more generally any appropriate test, ANOVA for instance if there were three columns) to the p-value column for each row to compare (x, y) between groups. I tried implementing the suggestion here without success.
I have a grouped time series with items and their category and I would like to make 6months sales forecasting.
I would like to o use intermediate level (category) to make base forecasting because the stagionality and trends maybe are better valued.
So i grouped my data for key, and i would like to use middle_out approch, the total sales use bottom up and single item are forected useing top down approach
I'm using fabletools middle_out function, but when i try to make forecast it doesn't work
this is my code:
library(reshape)
library(tidyverse)
library(tsibble)
library(dplyr)
library(fable)
library(fpp2)
library(forecast)
#read data from csv
#example dataset
set.seed(42) ## for sake of reproducibility
n <- 6
data_example <- data.frame(Date=seq.Date(as.Date("2020-12-01"), as.Date("2021-05-01"), "month"),
No_=sample(1800:1830, n, replace=TRUE),
Category=rep(LETTERS[1:3], n),
Quantity=sample(18:24, n, replace=TRUE))
sell_full <- data_example %>% mutate(Month=yearmonth(Date)) %>% group_by(No_,Category, Month) %>% summarise(Quant = sum(Quantity), .groups = 'keep')
sell_full <- na.omit(sell_full)
#data
#conversion to tsibble for forecastings
sell_full <- as_tsibble(sell_full, key=c(No_, Category), index=Month)
sell_full <- sell_full %>% aggregate_key((Category/No_), Quant= sum(Quant))
#sell_full<- filter(sell_full, !is.na(sell_full$Quant))
sell_full <- sell_full %>% fill_gaps(Quant=0, .full=TRUE)
fit <- sell_full %>%model(ets = ETS(Quant~ error("A") + trend("A") + season("A")))%>% middle_out(split=1)
fc <- forecast(fit, h = "6 months", level=1,lambda="auto")
if I put method="mo" in forecast method as documentation says it return this error
Error in meanf(object, h = h, level = level, fan = fan, lambda = lambda, :
unused argument (method = "mo")
if i doesn't put method info in forecast it return this error:
<error/vctrs_error_ptype2>
Error in `vec_compare()`:
! Can't combine `..1` <agg_vec> and `..2` <double>.
---
Backtrace:
1. generics::forecast(fit, h = "6 months", level = 1, lambda = "auto")
2. forecast:::forecast.default(fit, h = "6 months", level = 1, lambda = "auto")
3. forecast:::forecast.ts(object, ...)
4. forecast::meanf(...)
5. forecast::BoxCox(x, lambda)
6. forecast::BoxCox.lambda(x, lower = -0.9)
7. fabletools:::Ops.lst_mdl(x, 0)
11. fabletools:::map2(e1, e2, .Generic)
12. base::mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
13. vctrs:::`<=.vctrs_vctr`(dots[[1L]][[1L]], dots[[2L]][[1L]])
14. vctrs::vec_compare(e1, e2)
The Documentions about it is very bad,
someone can help me?
UPDATE:
As someone suggest to me, I tried to remove some package, now my library are:
library(tsibble)
library(dplyr)
library(fable)
library(fpp3)
library(conflicted)
Now the error is changed. when I try to make forecast function I have this error:
Error in build_key_data_smat(key_data) :
argument "key_data" is missing, with no default
and if I put key_data = "Category" (Category is the split layer) the error is:
fc <- forecast(fit, h = "6 months",level=1,lambda="auto", key_data= "Category")
Error in -ncol(x) : invalid argument to unary operator
library(conflicted)
library(fpp3)
library(tidyverse)
n <- 6
data_example <- data.frame(Date = seq.Date(as.Date("2020-12-01"), as.Date("2021-05-01"), "month"),
No_ = sample(1800:1830, n, replace = TRUE),
Category = rep(LETTERS[1:3], n),
Quantity = sample(18:24, n, replace = TRUE))
sell_full <- data_example |> mutate(Month = yearmonth(Date)) |> group_by(No_,Category, Month) |> summarise(Quant = sum(Quantity), .groups = 'keep')
sell_full <- ungroup(sell_full)
sell_full <- as_tsibble(sell_full, key = c(No_, Category), index = Month)
sell_full <- sell_full %>% aggregate_key((Category/No_), Quant = sum(Quant))
sell_full <- sell_full %>% fill_gaps(Quant = 0, .full = TRUE)
fit <- sell_full %>% model(ets = ETS(Quant~ error("A") + trend("A")))
fc <- fabletools::forecast(fit, h = "6 months", lambda = "auto")
Thought I'd have a look at the code to generate sell_full.
Added an ungroup, took out the seasonal, and took out the middle_out. Runs now, and no longer asks for key_value. The ungroup, as it seemed that you were finished with the grouping. The seasonal as it was not supported by the data. The middle out as it would cause the prompt for key_value. Spent a bit of time on the middle_out leading to forecast asking for key_value, though, hence comment above.
This led me to try another way to do middle_out:
fit <- sell_full %>% model(ets = ETS(Quant~ error("A") + trend("A"))) |> reconcile(mo = middle_out(ets))
This runs fine. This idea came from fpp3 Hoping that this helps! :-)
I have a data set composed of 2 subjects and measures 8 times for each subject.
dat <- data.frame(c(1, 1, 2, 2), rep(c("t1", "t2"), 2), c(50, 52, 49, 51))
colnames(dat) <- c("subject", "time", "result")
dat <- dat %>% mutate(subject = as.factor(subject)) %>%
mutate(time = as.factor(time))
and so on for the rest of the 6 times left.
I am trying to apply a repeated-measures ANOVA to see if the effect of time is significant for each subject, but I keep getting DFd is zero, when it is actually 1.
aov <- dat %>% anova_test(dv = result, wid = subject, within = time, type = 2, detailed = TRUE)
get_anova_table(aov, correction = "none")
Can someone please help me?
I create some models like this using a nested tidyr dataframe:
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(purrr)
fits <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0, sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1, sample(10, replace = T), sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data=-group) %>%
mutate(fit= map(data, ~glm(formula = colA ~ colB + colC, data = .x, family="binomial"))) %>%
dplyr::select(group, fit) %>%
tibble::column_to_rownames("group")
I would like to use this data to create some quick marginal effects plots with sjPlot::plot_models like this
plot_models(as.list(fits), type = "pred", terms = c("colB", "colA", "colC"))
Unfortunately, I get the error
Error in if (fam.info$is_linear) tf <- NULL else tf <- "exp" :
argument is of length zero
In addition: Warning message:
Could not access model information.
I've played around a bit with the nesting of the data but I've been unable to get it into a format that sjPlot::plot_models will accept.
What I was expecting to get is a "Forest plot of multiple regression models" as described in the help file. Ultimately, the goal is to plot the marginal effects of regression models by group, which I was hoping the plot_models will do (please correct me if I'm wrong).
It think there are some issues with the original code as well as with the data. There are arguments from plot_model in the function call which are not supported in plot_models. I first show an example that shows how plot_models can be called and used with a nested tibble using {ggplot2}'s diamonds data set. Then I apply this approach to the OP's sample data, which doesn't yield useable results*. Finally, I create some new toy data to show how the approach could be applied to a binominal model.
(* In the original toy data the dependent variable is either always 0 or always 1 in each model so this is unlikely to yield useable results).
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(ggplot2)
# general example
fits <- tibble(id = c("x", "y", "z")) %>%
rowwise() %>%
mutate(fit = list(glm(reformulate(
termlabels = c("cut", "color", "depth", "table", "price", id),
response = "carat"),
data = diamonds)))
plot_models(fits$fit)
# OP's example data
fits2 <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0,
sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1,
sample(10, replace = T),
sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data = -group) %>%
rowwise() %>%
mutate(fit = list(glm(formula = colA ~ colB + colC, data = data, family="binomial")))
plot_models(fits2$fit)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 4 rows containing missing values (geom_point).
# new data for binominal model
n <- 500
g <- round(runif(n, 0L, 1L), 0)
x1 <- runif(n,0,100)
x2 <- runif(n,0,100)
y <- (x2 - x1 + rnorm(n,sd=20)) < 0
fits3 <- tibble(g, y, x1, x2) %>%
nest_by(g) %>%
mutate(fit = list(glm(formula = y ~ x1 + x2, data = data, family="binomial")))
plot_models(fits3$fit)
Created on 2021-01-23 by the reprex package (v0.3.0)
In the following example I try to compute the first coefficient from a linear model for time t = 1 until t. It's an expanding rolling window.
It works well with ungrouped data, but when grouped by case, I get the error Error: Columncoef1must be length 10 (the group size) or one, not 30.
How can I handle grouped data?
library(dplyr)
library(slider)
get_coef1 <- function(data) {
coef1 <- lm(data = data, r1 ~ r2 + r3) %>%
coef() %>%
.["r2"] %>%
unname()
return(coef1)
}
data <- tibble(t = rep(1:10, 3),
case = c(rep("a", 10), rep("b", 10), rep("c", 10)),
r1 = rnorm(30),
r2 = rnorm(30),
r3 = rnorm(30))
data %>%
# ungroup() %>%
group_by(case) %>%
mutate(coef1 = slider::slide_dbl(., ~get_coef1(.x),
.before = Inf, .complete = T))
You have to first tidyr::nest the cases. Within the nested tibbles (accessed via purrr::map) you can then apply slide (same technique as with purrr::map). The important point is that you do not want to slide across cases, but only within cases.
library(dplyr)
library(tidyr)
library(purrr)
library(slider)
get_coef1 <- function(data) {
coef1 <- lm(data = data, r1 ~ r2 + r3) %>%
coef() %>%
.["r2"] %>%
unname()
return(coef1)
}
data <- tibble(t = rep(1:10, 3),
case = c(rep("a", 10), rep("b", 10), rep("c", 10)),
r1 = rnorm(30),
r2 = rnorm(30),
r3 = rnorm(30))
data %>%
# ungroup() %>%
group_by(case) %>% nest() %>%
mutate(rollreg = map(data, ~ .x %>% mutate(coef1 = slider::slide_dbl(., ~get_coef1(.x), .before = Inf, .complete = TRUE)))) %>%
select(-data) %>% unnest(rollreg)
I have been trying for a while to use the new dplyr::nest_by() from dplyr 1.0.0 trying to use summarise in combination with the rowwise cases but couldn't get that to work.
I realize this is an old post, but for the sake of completeness, I offer another solution. Is this what you're looking for? Two subtle changes to the arguments to slide_dbl. The code runs.
data %>%
# ungroup() %>%
group_by(case) %>%
mutate(coef1 = slider::slide_dbl(.x = cur_data(), # use cur_data() instead of .; arg .x
.f = ~get_coef1(.x), # arg .f
.before = Inf, .complete = T))
See the slider() documentation for underlying reasons.