Trying to use tidy for a power analysis and using clmm2 - r

I'm trying to do a power analysis on a clmm2 analysis that I'm doing.
This is the code for the particular statistical model:
test <- clmm2(risk_sensitivity ~ treat + sex + dispersal +
sex*dispersal + treat*dispersal + treat*sex,random = id, data = datasocial, Hess=TRUE)
Now, I have the following function:
sim_experiment_power <- function(rep) {
s <- sim_experiment(n_sample = 1000,
prop_disp = 0.10,
prop_fem = 0.35,
disp_probability = 0.75,
nondisp_probability = 0.90,
fem_probability = 0.75,
mal_probability = 0.90)
broom.mixed::tidy(s) %>%
mutate(rep = rep)
}
my_power <- map_df(1:10, sim_experiment_power)
The details of sim_experiment are not relevant here because it works as expected. The important thing to know is that it returns a fitted clmm2 model. My objective with the function above is to do a power analysis. However, I get the following error:
Error: No tidy method for objects of class clmm2
I'm a bit new to R, but I guess it means that tidy doesn't work with clmm2. Does anyone know a work-around for this issue?
EDIT: This is what follows the code that I posted above, which is ultimately what I'm trying to get.
You can then plot the distribution of estimates across your simulations.
ggplot(my_power, aes(estimate, color = term)) +
geom_density() +
facet_wrap(~term, scales = "free")
You can also just calculate power as the proportion of p-values less than your alpha.
my_power %>%
group_by(term) %>%
summarise(power = mean(p.value < 0.05))

For what you need, you can write a function that returns the coefficient table with the same column names that tidy() uses:
library(ordinal)
library(dplyr)
library(purrr)
library(ggplot2) # needed for the plots further down
tidy_output_clmm = function(fit){
  results = as.data.frame(coefficients(summary(fit)))
  colnames(results) = c("estimate","std.error","statistic","p.value")
  results %>% tibble::rownames_to_column("term")
}
Then we apply it using an example where I sample the wine dataset in ordinal:
sim_experiment_power <- function(rep) {
idx = sample(nrow(wine),replace=TRUE)
s <- clmm2(rating ~ temp, random=judge, data=wine[idx,], nAGQ=10,Hess=TRUE)
tidy_output_clmm(s) %>% mutate(rep=rep)
}
my_power <- map_df(1:10, sim_experiment_power)
Plotting works:
ggplot(my_power, aes(estimate, color = term)) +
geom_density() +
facet_wrap(~term, scales = "free")
And so does power:
my_power %>% group_by(term) %>% summarise(power = mean(p.value < 0.05))
# A tibble: 5 x 2
  term     power
  <chr>    <dbl>
1 1|2        0.9
2 2|3        0.1
3 3|4        1
4 4|5        1
5 tempwarm   1
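The same pattern is not specific to clmm2: any model whose summary() exposes a four-column coefficient table can be handled this way. Below is a minimal base-R sketch, assuming an lm fit stands in for the clmm2 model so that it runs without extra packages. The names tidy_coef_table and simulate_once are hypothetical helpers, not part of broom or ordinal.

```r
# Hypothetical helper: same idea as tidy_output_clmm, applied to an lm fit
tidy_coef_table <- function(fit) {
  results <- as.data.frame(coef(summary(fit)))
  colnames(results) <- c("estimate", "std.error", "statistic", "p.value")
  results$term <- rownames(results)
  rownames(results) <- NULL
  results[, c("term", "estimate", "std.error", "statistic", "p.value")]
}

# One simulated experiment: generate data, fit, return the tidy table
simulate_once <- function(rep, n = 50, slope = 0.5) {
  x <- rnorm(n)
  y <- slope * x + rnorm(n)
  out <- tidy_coef_table(lm(y ~ x))
  out$rep <- rep
  out
}

set.seed(1)
sims <- do.call(rbind, lapply(1:200, simulate_once))

# Power per term = proportion of simulations with p < alpha
power_by_term <- aggregate(p.value ~ term, data = sims,
                           FUN = function(p) mean(p < 0.05))
names(power_by_term)[2] <- "power"
power_by_term
```

Note that the threshold rows (1|2, 2|3, ...) in the clmm2 output are intercept-like parameters, so the power figure that usually matters is the one for the effect of interest (tempwarm above).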

Related

Stop plotting predictions beyond data limits LME ggpredict Effects

Using the 'iris' dataset (slightly modified as below), I plot the results of an LME.
PLEASE NOTE: I am only using the iris dataset as mock data for the purpose of plotting, so please do not critique the appropriateness of this test. I'm not interested in the statistics, rather the plotting.
When I use the ggpredict function and plot the results, the plot extends the predictions beyond the range of the data. Is there a systematic way to plot predictions only within the range of each facet's data?
I can plot each facet separately, limit the axis per plot manually, and cowplot them back together, but if there is a way to say 'predict only to the max. and min. of the data for that group', this would be great.
Given that these are facets of a single model, perhaps not showing the predictions for different groups is in fact misleading, and I should rather create three different models if I only want predictions within those data subsets?
library(lme4)
library(ggeffects)
library(ggplot2)
library(dplyr) # provides %>%, glimpse(), group_by() and mutate() used below
data(iris)
glimpse(iris)
df = iris
glimpse(df)
df_ed = df %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "setosa",Sepal.Length+10,Sepal.Length+0))
df_ed = df_ed %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "versicolor",Sepal.Length-3,Sepal.Length+0))
glimpse(df_ed)
m_test =
lmer(Sepal.Width ~ Sepal.Length * Species +
(1|Petal.Width),
data = df_ed, REML = T)
summary(m_test)
test_plot = ggpredict(m_test, c("Sepal.Length", "Species"), type = "re") %>% plot(rawdata = T, dot.alpha = 0.6, facet = T, alpha = 0.3)
As per the OP's comment, I think this will provide a solution. In this example, I use data from the sleepstudy dataset that comes with the lme4 package. First, we have to postulate a mixed model, which I generically call fit.
Note that I do not perform any hypothesis test to formally select an appropriate random-effects structure. Of course, this is essential to adequately capture the correlations in the repeated measurements, but falls outside the scope of this post.
library(lme4)
library(splines)
# quantiles of Days
quantile(sleepstudy$Days, c(0.05, 0.95))
# 5% 95%
# 0 9
# mixed model
fit <- lmer(Reaction ~ ns(Days, df = 2, B = c(0, 9)) +
(Days | Subject), data = sleepstudy)
# new data.frame for prediction
ND <- with(sleepstudy, expand.grid(Days = seq(0L, 9L, len = 50)))
Then, we need a function that enables us to obtain predictions from fit for given values of the covariates. The function effectPlot_lmer() takes the following arguments:
object: the fitted merMod object (the mixed model itself, not its name).
ND: the new data.frame, which specifies the values of the covariates for which we want to obtain predictions.
orig_data: the data.frame on which the mixed model was fitted.
# function to obtain predicted reaction times
effectPlot_lmer <- function (object, ND, orig_data) {
form <- formula(object, fixed.only = TRUE)
namesVars <- all.vars(form)
betas <- fixef(object)
V <- vcov(object)
orig_data <- orig_data[complete.cases(orig_data[namesVars]), ]
Terms <- delete.response(terms(form))
mfX <- model.frame(Terms, data = orig_data)
Terms_new <- attr(mfX, "terms")
mfX_new <- model.frame(Terms_new, ND, xlev = .getXlevels(Terms, mfX))
X <- model.matrix(Terms_new, mfX_new)
pred <- c(X %*% betas)
ses <- sqrt(diag(X %*% V %*% t(X)))
ND$pred <- pred
ND$low <- pred - 1.96 * ses
ND$upp <- pred + 1.96 * ses
return(ND)
}
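The core of the function is the pair of lines computing X %*% betas for the predictions and sqrt(diag(X %*% V %*% t(X))) for their standard errors. As a sanity check of that algebra, here is a small sketch that applies it to an ordinary lm fit (an assumption made so the example needs no extra packages) and compares the result with what predict() reports. The object names fit_lm and new_data are illustrative only.

```r
# lm() stands in for the mixed model; mtcars ships with R
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.0, 3.5), hp = c(100, 150, 200))

# Design matrix for the new covariate values (response removed from the terms)
X <- model.matrix(delete.response(terms(fit_lm)), new_data)

# Point predictions and standard errors from the coefficient covariance
pred <- c(X %*% coef(fit_lm))
ses <- sqrt(diag(X %*% vcov(fit_lm) %*% t(X)))

# Reference values from predict()
ref <- predict(fit_lm, newdata = new_data, se.fit = TRUE)
all.equal(unname(pred), unname(ref$fit))
all.equal(unname(ses), unname(ref$se.fit))
```

For the mixed model, fixef() and vcov() play the roles of coef() and vcov() here, so the intervals describe the fixed-effects part only.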
Finally, we can make an effect plot with ggplot.
# effect plot
library(ggplot2)
ggplot(effectPlot_lmer(fit, ND, orig_data = sleepstudy),
aes(x = Days, y = pred)) +
geom_line(size = 1.2, colour = 'blue4') +
geom_ribbon(aes(ymin = low, ymax = upp), colour = NA,
fill = adjustcolor('blue4', 0.2)) +
theme_bw() + ylab('Expected Reaction (ms)')
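For the original question (predictions only within each group's data range), the prediction grid itself can be restricted per group before predicting. A minimal sketch of that idea follows, assuming the unmodified iris data and an lm fit in place of the lmer model so that it stays dependency-free. The names ranges, grid and fit_lm are illustrative.

```r
# Observed range of the focal predictor within each group
ranges <- lapply(split(iris$Sepal.Length, iris$Species), range)

# One grid per group, spanning only that group's observed range
grid <- do.call(rbind, lapply(names(ranges), function(sp) {
  data.frame(
    Species = factor(sp, levels = levels(iris$Species)),
    Sepal.Length = seq(ranges[[sp]][1], ranges[[sp]][2], length.out = 25)
  )
}))

# lm() stands in for the mixed model in this sketch
fit_lm <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)
grid$pred <- predict(fit_lm, newdata = grid)

# Each group's grid stops at that group's data limits
aggregate(Sepal.Length ~ Species, data = grid, FUN = range)
```

With an lmer fit, the same grid could be passed to predict(fit, newdata = grid, re.form = NA) for population-level predictions, or used as ND in effectPlot_lmer() above.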

Caterpillar plot of posterior brms samples: Order factors in a ggdist plot (stat_slab)

I ran a Bayesian linear mixed model with brms and can plot the estimates nicely, but I can't figure out how to order the single-subject estimates based on the mean of the posterior samples (so as to get a caterpillar plot). This is what I've done.
Toy data:
library(brms)
library(tidybayes)
library(tidyverse)
n = 20
n_condition = 6
ABC =
tibble(
condition = rep(c("A","B","C","D","E","F"), n),
response = rnorm(n * 6, c(0,1,2,1,-1,-2), 0.5),
treatment = rnorm(n * 6, c(0,1,2,1,-1,-2), 0.5),
subject = c(rep("X",(n_condition*n)/3),rep("Y",(n_condition*n)/3),rep("Z",(n_condition*n)/3))
)
Add a shift for some subjects
ABC$response[ABC$subject == "X"] = 20 + ABC$response[ABC$subject == "X"]
ABC$response[ABC$subject == "Y"] = -20 + ABC$response[ABC$subject == "Z"]
Run the model
m = brm(
response ~ treatment + (1|condition) + (1|subject),
data = ABC,
cores = 4, chains = 1,
iter = 500, warmup = 50
)
Plot
m %>%
spread_draws(b_treatment, r_subject[subject,]) %>%
mutate(subject_estimate = b_treatment + r_subject) %>%
mutate(subject = reorder(subject, sort(subject_estimate))) %>%
ggplot(aes(y = subject, x = subject_estimate)) +
stat_slab()
Gives me this:
The line mutate(subject = reorder(subject, sort(subject_estimate))) doesn't do anything, which might be fine as I probably need to reorder based on the mean of the posteriors, but when I try mutate(subject_order = reorder(subject, sort(mean(subject_estimate)))) I get the error message:
Error: Problem with mutate() input subject_order.
x arguments must have same length
ℹ Input subject_order is reorder(subject, sort(mean(subject_estimate))).
ℹ The error occurred in group 1: subject = "X".
Any pointers welcome
Two points for consideration:
Ungroup the result from spread_draws, otherwise you won't be able to reorder the levels of subject;
Use fct_reorder from the forcats package in tidyverse. It's designed for this exact purpose.
m %>%
spread_draws(b_treatment, r_subject[subject,]) %>%
ungroup() %>%
mutate(subject_estimate = b_treatment + r_subject) %>%
mutate(subject = fct_reorder(subject, subject_estimate, mean)) %>%
ggplot(aes(y = subject, x = subject_estimate)) +
stat_slab()
Result (data generated with set.seed(123)):
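For reference, the attempt in the question failed because reorder() expects a vector of values the same length as the factor plus a summary function, not an already summarised value. A small base-R sketch of the correct call on simulated draws (the draws data frame is made up for illustration and is not the brms output):

```r
set.seed(123)
# Made-up "posterior draws" for three subjects with well-separated means
draws <- data.frame(
  subject = rep(c("X", "Y", "Z"), each = 100),
  subject_estimate = c(rnorm(100, 20), rnorm(100, -20), rnorm(100, 0))
)

# Pass the raw values and let FUN do the summarising per level
draws$subject <- reorder(draws$subject, draws$subject_estimate, FUN = mean)
levels(draws$subject)
# "Y" "Z" "X"
```

fct_reorder(subject, subject_estimate, mean) does the same job; both only take effect inside mutate() once the data are ungrouped.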

Plot beta distribution in R

Using the dataset Lahman::Batting I've estimated parameters for the beta distribution. Now I want to plot this empirically derived beta distribution onto the histogram that I estimated it from.
library(dplyr)
library(tidyr)
library(ggplot2) # needed for the plots below
library(Lahman)
career <- Batting %>%
filter(AB > 0) %>%
anti_join(Pitching, by = "playerID") %>%
group_by(playerID) %>%
summarize(H = sum(H), AB = sum(AB)) %>%
mutate(average = H / AB)
I can plot the distribution of the batting average as:
career %>%
filter(AB > 500) %>%
ggplot(aes(x = average)) +
geom_histogram() +
geom_freqpoly(color = "red")
And obtain:
I know I can use + geom_freqpoly to obtain:
But I want the smooth beta distribution. I can estimate beta parameters by:
career_filtered <- career %>%
filter(AB >= 500)
m <- MASS::fitdistr(career_filtered$average, dbeta,
start = list(shape1 = 1, shape2 = 10))
alpha0 <- m$estimate[1] # parameter 1
beta0 <- m$estimate[2] # parameter 2
Now that I have parameters alpha0 and beta0, how do I plot the beta distribution so that I obtain something like this:
This question is based on a post I'm reading here.
All code, including the code for the plots, can be found here. The following code is used to get the requested plot:
ggplot(career_filtered) +
geom_histogram(aes(average, y = ..density..), binwidth = .005) +
stat_function(fun = function(x) dbeta(x, alpha0, beta0), color = "red",
size = 1) +
xlab("Batting average")
Hope this helps.
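The key detail in that plot is that the histogram is drawn on the density scale, so its area is 1 and the beta density is directly comparable. Here is a self-contained base-R sketch of the same overlay. It rests on two assumptions: the data are simulated with rbeta() rather than taken from Lahman, and the parameters are estimated by the method of moments instead of MASS::fitdistr.

```r
set.seed(42)
# Simulated stand-in for career_filtered$average (not the Lahman data)
avg <- rbeta(5000, shape1 = 80, shape2 = 220)

# Method-of-moments estimates of the two shape parameters
m <- mean(avg)
v <- var(avg)
common <- m * (1 - m) / v - 1
alpha_hat <- m * common
beta_hat <- (1 - m) * common

# Histogram on the density scale, with the fitted beta density on top
hist(avg, breaks = 50, freq = FALSE, xlab = "Batting average", main = "")
curve(dbeta(x, alpha_hat, beta_hat), add = TRUE, col = "red", lwd = 2)
```

With freq = TRUE (counts) the curve would sit flat along the bottom of the plot, which is the base-graphics analogue of forgetting y = ..density.. in geom_histogram.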

Plot regression coefficient with confidence intervals

Suppose I have 2 data frames, one for 2015 and one for 2016. I want to run a regression for each data frame and plot one of the coefficients from each regression with its confidence interval. For example:
set.seed(1020022316)
library(dplyr)
library(stargazer)
df16 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
select(-e)
df15 <- data.frame(
x1 = rnorm(1000, 0, 2),
t = sample(c(0, 1), 1000, T),
e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
select(-e)
lm16 <- lm(y ~ x1 + t, data = df16)
lm15 <- lm(y ~ x1 + t, data = df15)
stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)
I want to plot t=1.558, x=2015, and t=2.797, x=2016 with their respective .95 CI. What is the best way of doing this?
I could do it 'by hand', but I hope there is a better way.
library(ggplot2)
df.plot <-
data.frame(
y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
x = c(2015, 2016),
lb = c(
confint(lm15, 't', level = 0.95)[1],
confint(lm16, 't', level = 0.95)[1]
),
ub = c(
confint(lm15, 't', level = 0.95)[2],
confint(lm16, 't', level = 0.95)[2]
)
)
df.plot %>% ggplot(aes(x, y)) + geom_point() +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
geom_hline(aes(yintercept=0), linetype="dashed")
By "best" I mean: figure quality (it looks nice), code elegance, and ease of expansion (to more than 2 regressions).
This is a bit too long for a comment, so I post it as a partial answer.
It is unclear from your post whether your main problem is getting the data into the right shape or the plotting itself. But just to follow up on one of the comments, let me show you how to run several models using dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset:
library(dplyr)
library(broom)
models <- mtcars %>% group_by(cyl) %>%
do(data.frame(tidy(lm(mpg ~ disp, data = .),conf.int=T )))
head(models) # I have abbreviated the following output a bit
cyl term estimate std.error statistic p.value conf.low conf.high
(dbl) (chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
4 (Intercept) 40.8720 3.5896 11.39 0.0000012 32.752 48.99221
4 disp -0.1351 0.0332 -4.07 0.0027828 -0.210 -0.06010
6 (Intercept) 19.0820 2.9140 6.55 0.0012440 11.591 26.57264
6 disp 0.0036 0.0156 0.23 0.8259297 -0.036 0.04360
You see that this gives you all coefficients and confidence intervals in one nice data frame, which makes plotting with ggplot easier. For instance, if your datasets have identical columns, you could add a year identifier to them (e.g. df1$year <- 2000; df2$year <- 2001 etc.) and bind them together afterwards (e.g. using bind_rows, or you can use bind_rows's .id option). Then you can use the year identifier instead of cyl in the above example.
The plotting then is simple. To use the mtcars data again, let's plot the coefficients for disp only (though you could also use faceting, grouping, etc):
ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) +
geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))
To use your data:
df <- bind_rows(`2016` = df16, `2015` = df15, .id = "years")
df %>% group_by(years) %>%
  do(data.frame(tidy(lm(y ~ x1+t, data = .),conf.int=T ))) %>%
  filter(term == "t") %>%
  ggplot(aes(x=years, y=estimate)) + geom_point() +
  geom_errorbar(aes(ymin=conf.low, ymax=conf.high))
Note that you can easily add more and more models just by binding more and more data to the main data frame. You can also easily use faceting, grouping or position-dodging to adjust the look of the corresponding plot if you want to plot more than one coefficient.
This is the solution I have right now:
gen_df_plot <- function(reg, coef_name){
df <- data.frame(y = reg$coefficients[[coef_name]],
lb = confint(reg, coef_name, level = 0.95)[1],
ub = confint(reg, coef_name, level = 0.95)[2])
return(df)
}
df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')
df.plot <- data.table::rbindlist(df.plot)
df.plot$x <- as.factor(c(2015, 2016))
df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()
I don't love it, but it works.
Here is what might be generalized code. I have made a change to how "x" is defined so that you don't have to worry about alphabetic reordering of the factor.
#
# Paul Gronke and Paul Manson
# Early Voting Information Center at Reed College
#
# August 27, 2019
#
#
# Code to plot a single coefficient from multiple models, provided
# as an easier alternative to "coefplot" and "dotwhisker". Some users
# may find those packages more capable
#
# Code adapted from https://stackoverflow.com/questions/35582052/plot-regression-coefficient-with-confidence-intervals
# gen_df_plot function will create a tidy data frame for your plot
# Currently set up to display 95% confidence intervals
gen_df_plot <- function(reg, coef_name){
df <- data.frame(y = reg$coefficients[[coef_name]],
lb = confint(reg, coef_name, level = 0.95)[1],
ub = confint(reg, coef_name, level = 0.95)[2])
return(df)
}
# Populate the data frame with a list of your model results.
df.plot <- lapply(list(model1, # List your models here
model2),
gen_df_plot,
coef_name = 'x1') # Coefficient name
# Convert the list to a tidy data frame
df.plot <- data.table::rbindlist(df.plot)
# Provide the coefficient or regression labels below, in the
# order that you want them to appear. The "levels=unique(.)" parameter
# overrides R's desire to order the factor alphabetically
df.plot$x <- c("Group 1",
"Group 2") %>%
factor(., levels = unique(.),
ordered = TRUE)
# Create your plot
df.plot %>% ggplot(aes(x, y)) +
geom_point(size=4) +
geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") +
geom_hline(aes(yintercept=0), linetype="dashed") +
theme_bw() +
ggtitle("Comparing Coefficients") +
ylab("Coefficient Value")
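One possible refinement of gen_df_plot, sketched below under the assumption that the models are supplied as a named list: the labels are then taken from the list names, so they cannot drift out of step with the rows. The function name coef_ci and the mtcars models are illustrative and not part of the code above.

```r
# Hypothetical helper: one row per model, labelled by the list names
coef_ci <- function(models, coef_name, level = 0.95) {
  rows <- lapply(names(models), function(label) {
    fit <- models[[label]]
    ci <- confint(fit, coef_name, level = level)
    data.frame(x = label,
               y = unname(coef(fit)[coef_name]),
               lb = ci[1, 1],
               ub = ci[1, 2])
  })
  out <- do.call(rbind, rows)
  # Keep the order in which the models were listed, not alphabetical order
  out$x <- factor(out$x, levels = names(models))
  out
}

models <- list(
  "4 cyl" = lm(mpg ~ disp, data = subset(mtcars, cyl == 4)),
  "6 cyl" = lm(mpg ~ disp, data = subset(mtcars, cyl == 6)),
  "8 cyl" = lm(mpg ~ disp, data = subset(mtcars, cyl == 8))
)
df.plot <- coef_ci(models, "disp")
df.plot
```

The resulting data frame has the same x, y, lb and ub columns as before, so the ggplot call above works on it unchanged.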

ggplot2 geom_ribbon from mgcv::gamm

I'm trying to add a ribbon based on predictions from a gamm model. This seems a little harder than expected, as gamm is somewhat different from gam.
I first tried directly with stat_smooth, but that does not work (and would not use my entire model, which also includes several other covariates).
library(tidyverse); library(mgcv)
dt = cbind(V1=scale(sample(1000)),
           Age=rnorm(n = 1000, mean = 40, sd = 10),
           ID=rep(seq(1:500),each=2)) %>% as.data.frame()
# Works fine ----
dt %>% ggplot(aes(x=Age, y=V1)) +
stat_smooth(method="gam", formula= y~s(x,bs="cr"))
# Fails horribly :P
dt %>% ggplot(aes(x=Age, y=V1)) +
stat_smooth(method="gamm", formula= y~s(x,bs="cr"))
Maximum number of PQL iterations: 20
iteration 1
Warning message:
Computation failed in `stat_smooth()`:
no applicable method for 'predict' applied to an object of class "c('gamm', 'list')"
I've tried using the predict function on the $gam component of the model, but I'm not sure how to use it or how to make the CI ribbon:
dt.model = gamm(V1 ~ s(Age, bs="cr") + s(ID, bs = 're'), data=dt, family="gaussian", discrete=T)
dt$pred = predict(dt.model$gam)
dt %>% ggplot(aes(x = Age, y = V1)) +
geom_line(aes(group=ID), alpha=.3) +
geom_point(alpha=.2) +
geom_smooth(aes(y=pred))
I recognise this is shitty example data because this is a stupid shape.
But I'd like to be able to add a ribbon with the CI along the line as predicted by the fitted model. And I'd prefer to do this in ggplot, particularly as I want a spaghetti plot in the background.
Use se.fit=TRUE inside predict:
library(tidyverse)
library(mgcv)
dt <- cbind(V1=scale(sample(1000)),
Age=rnorm(n = 1000, mean = 40, sd = 10),
ID=rep(seq(1:500),each=2)) %>% as.data.frame()
dt.model <- gamm(V1 ~ s(Age, bs="cr") + s(ID, bs = "re"),
data=dt, family="gaussian", discrete=T)
pred <- predict(dt.model$gam, se.fit=T)
dt %>% ggplot(aes(x = Age, y = V1)) +
geom_line(aes(group=ID), alpha=.3) +
geom_point(alpha=.2) +
geom_ribbon(aes(ymin=pred$fit-1.96*pred$se.fit,
ymax=pred$fit+1.96*pred$se.fit), alpha=0.2, fill="red")+
geom_line(aes(y=pred$fit), col="blue", lwd=1)
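Because the observed Age values are unsorted and unevenly spaced, it can be cleaner to predict on a regular grid and keep the fit and its bounds together in one data frame. A rough sketch of that variant follows. Several things here are assumptions on my part rather than the approach above: the data are simulated with a real smooth trend and ID effect, the random intercept is specified through gamm's random argument instead of s(ID, bs = "re"), and the names grid and ribbon_df are made up.

```r
library(mgcv)

set.seed(1)
n_id <- 200
id_effect <- rnorm(n_id, sd = 0.5)
dt <- data.frame(ID = factor(rep(seq_len(n_id), each = 2)),
                 Age = rnorm(2 * n_id, mean = 40, sd = 10))
dt$V1 <- sin(dt$Age / 10) + id_effect[as.integer(dt$ID)] +
  rnorm(nrow(dt), sd = 0.3)

# Random intercept per ID via the `random` argument of gamm()
dt.model <- gamm(V1 ~ s(Age, bs = "cr"), random = list(ID = ~1), data = dt)

# Predict the smooth on an evenly spaced Age grid, with standard errors
grid <- data.frame(Age = seq(min(dt$Age), max(dt$Age), length.out = 100))
pred <- predict(dt.model$gam, newdata = grid, se.fit = TRUE)

# Fit and approximate 95% bounds in one data frame
ribbon_df <- data.frame(
  Age = grid$Age,
  fit = as.numeric(pred$fit),
  lower = as.numeric(pred$fit - 1.96 * pred$se.fit),
  upper = as.numeric(pred$fit + 1.96 * pred$se.fit)
)
head(ribbon_df)
```

ribbon_df can then be handed to geom_ribbon(data = ribbon_df, aes(x = Age, ymin = lower, ymax = upper), inherit.aes = FALSE) and geom_line(data = ribbon_df, aes(x = Age, y = fit)) on top of the spaghetti plot.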
