I am trying to forecast student behaviour by year. It isn't working, maybe because my data set is too small. I'm using ARIMA; however, the forecast keeps showing a straight line, which I'm not sure is right. Could this be because the selected model is ARIMA(0,0,0) with non-zero mean?
Year - General
Students - Numeric
How can I forecast a student's behaviour by year?
It's a small data set, but this is one way to go about it: model each student separately (with student as a key), using the tidyverts approach:
library(dplyr)
library(tidyr)
library(tsibble)
library(feasts)
library(fable)
Data set
df <- structure(list(Year = structure(c(1995, 1996, 1997), class = "numeric"),
Student1 = c(3, 1, 3), Student2 = c(2, 2, 2), Student3 = c(2,
3, 3), Student4 = c(2, 3, 2), Student5 = c(3, 3, 4)), row.names = c(NA,
3L), class = "data.frame")
Tidy data
df <- df |> pivot_longer(names_to = "Student", cols = starts_with("Student")) |>
as_tsibble(index = Year, key = Student)
Visualise
df |> autoplot()
df |>
filter(Student == "Student1") |>
gg_tsdisplay(value, plot_type = 'partial')
Fit ARIMA
Stu_fit <- df |>
model(search = ARIMA(value, stepwise = FALSE))
Check fit
glance(Stu_fit)
Stu_fit$search
Stu_fit |>
filter(Student == "Student1") |>
gg_tsresiduals()
Forecast
Stu_fit |>
forecast(h = 5) |>
filter(.model == 'search') |>
autoplot()
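On the straight-line forecast from the question: an ARIMA(0,0,0) model with non-zero mean treats the series as white noise around a constant, so its point forecast is that constant at every horizon. Here is a minimal base R sketch of this, using stats::arima rather than fable and borrowing the Student1 values from the data set above (the object names x, fit and fc are only illustrative):

```r
# Student1's three yearly values as a yearly time series
x <- ts(c(3, 1, 3), start = 1995)

# ARIMA(0,0,0) with a mean term: white noise around a constant level
fit <- arima(x, order = c(0, 0, 0), include.mean = TRUE)

# Point forecasts for the next 5 years
fc <- predict(fit, n.ahead = 5)$pred

# Every horizon gets the same value: the estimated mean of the series
print(fc)
print(mean(x))
```

So a flat line is what this particular model is expected to produce. Whether that model is a sensible choice for three observations is a separate question.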
Hope this helps! :-)
I have a data set of 2 subjects, each measured 8 times.
library(dplyr)
library(rstatix)
dat <- data.frame(c(1, 1, 2, 2), rep(c("t1", "t2"), 2), c(50, 52, 49, 51))
colnames(dat) <- c("subject", "time", "result")
dat <- dat %>% mutate(subject = as.factor(subject)) %>%
mutate(time = as.factor(time))
and so on for the remaining 6 time points.
I am trying to run a repeated-measures ANOVA to see if the effect of time is significant for each subject, but I keep getting a DFd of zero when it should be 1.
aov <- dat %>% anova_test(dv = result, wid = subject, within = time, type = 2, detailed = TRUE)
get_anova_table(aov, correction = "none")
Can someone please help me?
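For reference, in a one-way repeated-measures design with n subjects, k time points and one observation per cell, the time effect has k - 1 numerator degrees of freedom (DFn) and (n - 1)(k - 1) denominator degrees of freedom (DFd). A small sketch of that arithmetic follows; rm_df is a hypothetical helper written for illustration, not part of rstatix:

```r
# Degrees of freedom for the within-subject (time) effect
rm_df <- function(n_subjects, n_times) {
  c(DFn = n_times - 1,
    DFd = (n_subjects - 1) * (n_times - 1))
}

rm_df(2, 2)  # the four rows shown above
#> DFn DFd
#>   1   1
rm_df(2, 8)  # the full design with 8 time points
#> DFn DFd
#>   7   7
```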
I create some models like this using a nested tidyr dataframe:
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(purrr)
fits <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0, sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1, sample(10, replace = T), sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data=-group) %>%
mutate(fit= map(data, ~glm(formula = colA ~ colB + colC, data = .x, family="binomial"))) %>%
dplyr::select(group, fit) %>%
tibble::column_to_rownames("group")
I would like to use this data to create some quick marginal effects plots with sjPlot::plot_models like this
plot_models(as.list(fits), type = "pred", terms = c("colB", "colA", "colC"))
Unfortunately, I get the error
Error in if (fam.info$is_linear) tf <- NULL else tf <- "exp" :
argument is of length zero
In addition: Warning message:
Could not access model information.
I've played around a bit with the nesting of the data but I've been unable to get it into a format that sjPlot::plot_models will accept.
What I was expecting to get is a "Forest plot of multiple regression models" as described in the help file. Ultimately, the goal is to plot the marginal effects of regression models by group, which I was hoping the plot_models will do (please correct me if I'm wrong).
I think there are some issues with the original code as well as with the data. The call passes arguments that belong to plot_model and are not supported by plot_models. I first show how plot_models can be called on a nested tibble, using {ggplot2}'s diamonds data set. Then I apply this approach to the OP's sample data, which doesn't yield useable results*. Finally, I create some new toy data to show how the approach could be applied to a binomial model.
(* In the original toy data the dependent variable is either always 0 or always 1 in each model so this is unlikely to yield useable results).
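The footnote's point can be checked before fitting: a binomial model needs both outcomes to be present in each group. Here is a minimal base R sketch on made-up data (the data frame toy and its values are assumptions for illustration, not the OP's data):

```r
# Made-up data: group1 has a constant response, group2 does not
toy <- data.frame(
  group = rep(c("group1", "group2"), each = 4),
  colA  = c(0, 0, 0, 0, 0, 1, 0, 1)
)

# Number of distinct response values per group;
# a count of 1 means the response never varies in that group
n_levels <- tapply(toy$colA, toy$group, function(v) length(unique(v)))
print(n_levels)

# Groups that cannot support a meaningful binomial fit
names(n_levels)[n_levels < 2]
#> [1] "group1"
```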
set.seed(1)
library(tidyr)
library(dplyr)
library(sjPlot)
library(tibble)
library(ggplot2)
# general example
fits <- tibble(id = c("x", "y", "z")) %>%
rowwise() %>%
mutate(fit = list(glm(reformulate(
termlabels = c("cut", "color", "depth", "table", "price", id),
response = "carat"),
data = diamonds)))
plot_models(fits$fit)
# OP's example data
fits2 <- tribble(~group, ~colA, ~colB, ~colC,
sample(c("group1", "group2"), 10, replace = T), 0,
sample(10, replace = T), sample(10, replace = T),
sample(c("group1", "group2"), 10, replace = T), 1,
sample(10, replace = T),
sample(10, replace = T)) %>%
unnest(cols = c(colB, colC)) %>%
nest(data = -group) %>%
rowwise() %>%
mutate(fit = list(glm(formula = colA ~ colB + colC, data = data, family="binomial")))
plot_models(fits2$fit)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 4 rows containing missing values (geom_point).
# new data for binomial model
n <- 500
g <- round(runif(n, 0L, 1L), 0)
x1 <- runif(n,0,100)
x2 <- runif(n,0,100)
y <- (x2 - x1 + rnorm(n,sd=20)) < 0
fits3 <- tibble(g, y, x1, x2) %>%
nest_by(g) %>%
mutate(fit = list(glm(formula = y ~ x1 + x2, data = data, family="binomial")))
plot_models(fits3$fit)
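The same one-model-per-group idea can also be sketched without nesting: split() the data by group and fit with lapply(), which gives a plain list of glm objects, the same kind of object as fits3$fit. This sketch simulates its own data along the lines of the toy example above and uses hypothetical names (dat, models); it stops short of calling sjPlot.

```r
set.seed(1)
n <- 500

# Simulated data: g is the grouping variable, y a binary outcome
dat <- data.frame(
  g  = rbinom(n, 1, 0.5),
  x1 = runif(n, 0, 100),
  x2 = runif(n, 0, 100)
)
dat$y <- (dat$x2 - dat$x1 + rnorm(n, sd = 20)) < 0

# One logistic regression per group, returned as a named list
models <- lapply(split(dat, dat$g), function(d) {
  glm(y ~ x1 + x2, data = d, family = "binomial")
})

names(models)
lapply(models, coef)
```

If sjPlot is installed, plot_models(models) should accept this list in the same way as fits3$fit.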
Created on 2021-01-23 by the reprex package (v0.3.0)
I'm looking to run a function on each group of a dataset, and bind the output to the existing set inside the tidyverse environment. After the example set, I've added how I do it right now, which requires splitting the set and running lapply (I want to move everything towards the tidyverse).
library(TTR)
test = data.frame('high'=rnorm(100,10,0.1),'low'=rnorm(100,0,0.1), 'close'=rnorm(100,5,0.1))
stoch(test,
nFastK = 14, nFastD = 3, nSlowD = 3,
maType=list(list(SMA), list(SMA), list(SMA)),
bounded = TRUE,
smooth = 1)
Here is how it used to be done with lists:
get_stoch = function(dat_) {
stochs = stoch(dat_ %>% select(-ticker), nFastK = 14, nFastD = 3, nSlowD = 3,
maType=list(list(SMA), list(SMA), list(SMA)),
bounded = TRUE, smooth = 1)
dat_ = cbind(dat_,stochs)
}
test = data.frame('ticker'=c(rep('A',50),rep('B',50)),
'high'=rnorm(100,10,0.1),'low'=rnorm(100,0,0.1), 'close'=rnorm(100,5,0.1)) %>%
split(.,.$ticker) %>%
lapply(.,get_stoch) %>%
bind_rows
If you want to translate your code to the tidyverse, you can use:
library(dplyr)
library(purrr)
df %>% group_split(ticker) %>% map_dfr(get_stoch)
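Here is a self-contained sketch of the same pattern, with a hypothetical add_range() standing in for get_stoch() so that it runs without TTR; the data frame and column names are made up for illustration:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for get_stoch(): adds one column computed per group
add_range <- function(dat_) {
  dat_$range_from_min <- dat_$close - min(dat_$close)
  dat_
}

set.seed(42)
df <- data.frame(ticker = rep(c("A", "B"), each = 5),
                 close  = rnorm(10, 5, 0.1))

# Split by ticker, apply the function to each piece, row-bind the results
out <- df %>%
  group_split(ticker) %>%
  map_dfr(add_range)

out
```

Each piece passed to the function still contains the ticker column, which is why get_stoch() in the question can drop it with select(-ticker).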
You can use plyr::ddply to run a split-apply-bind method in tidyverse-like language:
df <- data.frame(ticker = c(rep('A', 50), rep('B', 50)),
high = rnorm(100, 10, 0.1),
low = rnorm(100, 0, 0.1),
close = rnorm(100, 5, 0.1))
test1 <- df %>%
split(.,.$ticker) %>%
lapply(.,get_stoch) %>%
bind_rows
test2 <- df %>%
plyr::ddply("ticker", get_stoch)
identical(test1, test2)
#> [1] TRUE
I have a dataframe like this:
df <- data.frame("subj.no" = rep(1:3, each = 24),
"trial.no" = rep(1:3, each = 8, length.out = 72),
"item" = c(rep(c("ball", "book"), 4), rep(c("doll", "rope"), 4), rep(c("fish", "box"), 4), rep(c("paper", "candle"), 4), rep(c("horse", "marble"), 4), rep(c("doll", "rope"), 4), rep(c("tree", "dog"), 4), rep(c("ball", "book"), 4), rep(c("horse", "marble"), 4)),
"rep.no" = rep(1:4, each = 2, length.out = 72),
"DV" = c(1,0,1,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,0,1,0,0,0,0,1,1,1,1,0,0,1,0,1,1,0,0,1,0,1,1,1,0,1,0,0,
1,0,0,1,1,0,1,0,0,1,1,1,1,0,0,0,0,0,0,1,0,1,0,1,1,0))
I now want to create another column, DV.no, that records the running count of 1s within each combination of subj.no, trial.no and item: the nth occurrence of DV == 1 gets the value n. For DV == 0, the value in the new column should be 0.
So the resulting vector should look like this:
DV.no = c(1,0,2,0,3,0,0,1,1,0,2,0,0,0,3,0,1,0,2,0,3,0,0,0,0,1,1,2,2,0,0,3,0,1,1,0,0,2,0,3,1,1,0,2,0,0,2,0,0,1,1,0,2,0,0,2,1,1,2,0,0,0,0,0,0,1,0,2,0,3,1,0)
So basically, for each unique combination of values in subj.no, trial.no and item, whenever the value of DV is 1, then 1 should be added to the count in the new variable.
(Remark: The column rep.no is not part of the relevant value combination. But it's in the df anyway, and since I didn't know if it's useful for the solution, I left it there.)
How can this be done in R?
We can do a grouped cumsum on the 'DV' column and multiply it by DV, so that rows with DV == 0 stay 0:
library(dplyr)
df %>%
group_by(subj.no, trial.no, item) %>%
mutate(DV.no = cumsum(DV) * DV)
Or in base R with ave:
df$DV.no <- with(df, DV * ave(DV, subj.no, trial.no, item, FUN = cumsum))
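Here is a quick check of the idea on the first trial of subject 1 from the question (DV = 1,0,1,0,1,0,0,1, with items alternating between ball and book), using only base R. The small data frame chk is rebuilt here for illustration:

```r
# First 8 rows of the question's data: one subject, one trial, two items
chk <- data.frame(
  item = rep(c("ball", "book"), 4),
  DV   = c(1, 0, 1, 0, 1, 0, 0, 1)
)

# Running count of 1s within each item, zeroed wherever DV is 0
chk$DV.no <- chk$DV * ave(chk$DV, chk$item, FUN = cumsum)

chk$DV.no
#> [1] 1 0 2 0 3 0 0 1
```

This matches the first eight values of the expected DV.no vector in the question.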