i have the following data and created a model with the package glmmTMB in R for plant diameters ~ plant density (number of plants) with a random plot effect:
d <- data.frame (diameter = c(17,16,15,13,11, 19,17,15,11,11, 19,15,14,11,8),
plant_density = c(1000,2000,3000,4000,5000, 1000,2000,3000,4000,5000, 1000,2000,3000,4000,5000),
plot = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3))
glmm.model <- glmmTMB(diameter ~ plant_density + (1|plot),
data = d,
na.action = na.omit,
family="gaussian",
ziformula = ~ 0)
My intention was to create a plot with predicted diameter data for different plant densities with an included random plot effect. So i tried to predict the data:
new.dat <- data.frame(diameter= d$diameter,
plant_density = d$plant_density,
plot= d$plot)
new.dat$prediction <- predict(glmm.model, new.data = new.dat,
type = "response", re.form = NA)
Unfortunately I get an output for every plot but wanted a generalized prediction for the diameter ~ plant density.
My goal is to create a plot like here, but with a regression model from glmmTMB which consider the random effect.
Thanks for ur help!
The ggeffects package makes this type of thing very easy to implement and customize.
For example
library('ggplot2')
library('glmmTMB')
library('ggeffects')
d <- data.frame (diameter = c(17,16,15,13,11, 19,17,15,11,11, 19,15,14,11,8),
plant_density = c(1000,2000,3000,4000,5000, 1000,2000,3000,4000,5000, 1000,2000,3000,4000,5000),
plotx = as.factor( c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3)))
glmm.model <- glmmTMB(diameter ~ plant_density + (1|plotx),
data = d,
family="gaussian")
# basically what your looking for
plot(ggpredict(glmm.model, terms = "plant_density"))
# with additional a change of limits on the y-axis
plot(ggpredict(glmm.model, terms = "plant_density")) +
scale_y_continuous(limits = c(0, 20))
You can really do whatever you'd like with it from there, changing colors, themes, scales, the package has some nice vignettes as well.
Related
Using the 'iris' dataset (sightly modified as below), I plot the results of an LME.
PLEASE NOTE: I am only using the iris dataset as mock data for the purpose of plotting, so please do not critique the appropriateness of this test. I'm not interested in the statistics, rather the plotting.
Using ggpredict function and plotting the results, the plot extends the predictions beyond the range of the data. Is there a systematic way plot predictions only within the range of each faceted data?
I can plot each facet separately, limit the axis per plot manually, and cowplot them back together, but if there is way to say 'predict only to the max. and min. of the data for that group', this would be great.
Given that these are facets of a single model, perhaps not showing the predictions for different groups is in fact misleading, and I should rather create three different models if I only want predictions within those data subsets?
library(lme4)
library(ggeffects)
library(ggplot2)
data(iris)
glimpse(iris)
df = iris
glimpse(df)
df_ed = df %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "setosa",Sepal.Length+10,Sepal.Length+0))
df_ed = df_ed %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "versicolor",Sepal.Length-3,Sepal.Length+0))
glimpse(df_ed)
m_test =
lmer(Sepal.Width ~ Sepal.Length * Species +
(1|Petal.Width),
data = df_ed, REML = T)
summary(m_test)
test_plot = ggpredict(m_test, c("Sepal.Length", "Species"), type = "re") %>% plot(rawdata = T, dot.alpha = 0.6, facet = T, alpha = 0.3)
As per the OP's comment, I think this will provide a solution. In this example, I use data from the sleepstudy dataset that comes with the lme4 package. First, we have to postulate a mixed model, which I generically call fit.
Note that I do not perform any hypothesis test to formally select an appropriate random-effects structure. Of course, this is essential to adequately capture the correlations in the repeated measurements, but falls outside the scope of this post.
library(lme4)
library(splines)
# quantiles of Days
quantile(sleepstudy$Days, c(0.05, 0.95))
# 5% 95%
# 0 9
# mixed model
fit <- lmer(Reaction ~ ns(Days, df = 2, B = c(0, 9)) +
(Days | Subject), data = sleepstudy)
# new data.frame for prediction
ND <- with(sleepstudy, expand.grid(Days = seq(0L, 9L, len = 50)))
Then, we need a fucntion that enables us to obtain predictions from fit for certain values of the covariates. The function effectPlot_lmer() takes the following arguments:
object: a character string indicating the merMod object that was fitted (the mixed model).
ND: a character string indicating the new data.frame, which specifies the values of the covariates for which we want to obtain predictions.
orig_data: a character string specifying the data on which the mixed model was fitted.
# function to obtain predicted reaction times
effectPlot_lmer <- function (object, ND, orig_data) {
form <- formula(object, fixed.only = TRUE)
namesVars <- all.vars(form)
betas <- fixef(object)
V <- vcov(object)
orig_data <- orig_data[complete.cases(orig_data[namesVars]), ]
Terms <- delete.response(terms(form))
mfX <- model.frame(Terms, data = orig_data)
Terms_new <- attr(mfX, "terms")
mfX_new <- model.frame(Terms_new, ND, xlev = .getXlevels(Terms, mfX))
X <- model.matrix(Terms_new, mfX_new)
pred <- c(X %*% betas)
ses <- sqrt(diag(X %*% V %*% t(X)))
ND$pred <- pred
ND$low <- pred - 1.96 * ses
ND$upp <- pred + 1.96 * ses
return(ND)
}
Finally, we can make an effect plot with ggplot.
# effect plot
library(ggplot2)
ggplot(effectPlot_lmer(fit, ND, orig_data = sleepstudy),
aes(x = Days, y = pred)) +
geom_line(size = 1.2, colour = 'blue4') +
geom_ribbon(aes(ymin = low, ymax = upp), colour = NA,
fill = adjustcolor('blue4', 0.2)) +
theme_bw() + ylab('Expected Reaction (ms)')
Problem: Using multilevel (mixed-effects) model and not sure what to set grouping variable to in order to generate predicted probabilities for a measured group-level variable from glmer model using merTools' predictInterval function.
Goal: Generate predicted probabilities and SEs/CIs across a range of values from a "second level" group-level variable.
Seeking: advice on how to properly do this or other recommendations to generate predicted probabilities and CIs the range of values for a group level variable from a glmer model.
library(lme4)
library(merTools)
library(ggplot2)
hier_data <- data_frame(pass = sample(c(0, 1), size = 1000, replace = T),
wt = rnorm(1000),
ht = sample(1:5, size = 1000, replace = T, prob = c(.1, .1, .6, .1, .1)),
school_funding = rnorm(1000),
school = rep(c("A", "B", "C", "D", "E"), each = 200))
mod <- glmer(pass ~ wt + ht + school_funding + (1 | school),
family = binomial("logit"), data = hier_data)
### Without school: error
ndata <- data.frame(wt = median(hier_data$wt),
ht = median(hier_data$ht),
school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100))
pp <- cbind(ndata, predictInterval(merMod = mod,
newdata = ndata,
type = "probability"))
### Problem, when adding school variable: which school?
ndata <- data.frame(wt = median(hier_data$wt),
ht = median(hier_data$ht),
school_funding = seq(from = min(hier_data$school_funding), to =max(hier_data$school_funding), length.out = 100),
school = "A")
pp <- cbind(ndata, predictInterval(merMod = mod,
newdata = ndata,
type = "probability"))
ggplot(pp, aes(x = school_funding, y = fit)) +
geom_point() +
geom_errorbar(aes(ymin = lwr, ymax = upr))
It seems what you are trying to achieve is effects plots for your variables, with fast prediction intervals. Note first of all that predictInterval does not incorporate the uncertainty in the estimated values of the variance parameters, theta. If more accurate confidence intervals are needed, you should use the bootMer function as described in ?bootMer which uses bootstrapping to estimate the uncertainty. However it might simply be infeasible as the model size and complexity increases. Alternatively the effects package contains the capability to illustrate the effects of merMod objects (however the documentation is simply atrocious).
In general when illustrating the effects of merMod objects a question is "which effects?". Are you interested in the marginal effects or the conditional effects (such as variability in random effects?). If your model contains only first-order random effects (no random slopes), and you are interested in the uncertainty of the fixed-effect coefficient or the effect on the conditional mean, you can get away with using any school and specifying which = "fixed" in predictInterval as
pp <- cbind(ndata, predictInterval(merMod = mod,
newdata = ndata, #<= any school chosen
type = "probability",
which = "fixed"))
Note the size will depend on the chosen school and remaining coefficients as in standard models, and are thus not causal.
If you are interested in the marginal effect, there are multiple methods for approximating this. The optimal would be to bootstrap the predicted values of the marginal mean. Alternatively if the number of independent groups in your grouping variable is "large" enough, you could (maybe) average predicts intervals across groups as illustrated below
newData <- expand.grid(wt = median(hier_data$wt),
ht = median(hier_data$ht),
school = levels(hier_data$school),
school_funding = seq(from = min(hier_data$school_funding),
to = max(hier_data$school_funding),
length.out = 100))
pp <- predictInterval(merMod = mod,
newdata = newData,
type = "probability")
#Split predictions by every column but school
# And calculate estimated means
predictions <- do.call("rbind", lapply(split(as.data.frame(pp),
newData[, !names(newData) == "school"]),
colMeans))
rownames(predictions) <- 1:nrow(predictions)
#create a plot
ggplot(as.data.frame(cbind(predictions, funding = newData$school_funding[newData$school == "A"])),
aes(x = funding, y = fit, ymax = upr, ymin = lwr)) +
geom_point() +
geom_errorbar()
For this example the model is more often than not singular and contains very few groups, and as such the result is a unlikely to be a great estimator for the marginal effect, but outside of extracting the simulations from predictInterval it might suffice. It is likely going to improve with models with more grouping levels in the random effect. predictInterval doesn't seem to incorporate a method for this situation directly.
An alternative for looking at marginal effects would be assuming marginal mean of the form 1/(1+exp(-eta) (which is often assumed for new groupings of the random effect). This isn't directly implemented in the predictInterval function, but can be achieved by substracting the random effect from the linear predicter, and only estimating the randomness of the fixed effects, as below:
pp <- predictInterval(merMod = mod,
newdata = ndata, #<= any school chosen
type = "linear.prediction",
which = "fixed")
#remove random effects
pp <- sweep(pp, 1, predict(mod, newdata = ndata, random.only = TRUE), "-")
pp <- 1/(1+exp(-pp))
which could then be plotted using similar methods. For fewer groups this might be a better predictor for the marginal mean(?, someone might correct me here).
In either case, adding a bit of x-jitter might improve the plot.
In all cases there might be some golden nuggets in the references to GLMM FAQ by bolker and others.
I try to make a plot for standard purposes with zero inflated model and zero inflated mixed model using ggplot2 without success. For this, I try:
#Packages
library(pscl)
library(glmmTMB)
library(ggplot2)
library(gridExtra)
# Artificial data set
set.seed(007)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
DF$y <- rnbinom(n * K, size = 2, mu = exp(1.552966))
str(DF)
Using zero inflated poisson model with pscl package
time2<-(DF$time)^2
mZIP <- zeroinfl(y~time+time2+sex|time+sex, data=DF)
summary(mZIP)
If I imagine thal all coefficients are significant
# Y estimated
pred.data1 = data.frame(
time<-DF$time,
time2<-(DF$time)^2,
sex<-DF$sex)
pred.data1$y = predict(mZIP, newdata=pred.data1, type="response")
Now using zero inflated poisson mixed model with glmmTMB package
mZIPmix<- glmmTMB(y~time+time2+sex+(1|id),
data=DF, ziformula=~1,family=poisson)
summary(mZIPmix)
#
# new Y estimated
pred.data2 = data.frame(
time<-DF$time,
time2<-(DF$time)^2,
sex<-DF$sex,
id<-DF$id)
pred.data2$y = predict(mZIPmix, newdata=pred.data2, type="response")
Plot zero inflated poisson model and mixed poisson model
par(mfrow=c(1,2))
plot1<-ggplot(DF, aes(time, y, colour=sex)) +
labs(title="Zero inflated model") +
geom_point() +
geom_line(data=pred.data1) +
stat_smooth(method="glm", family=poisson(link="log"), formula = y~poly(x,2),fullrange=TRUE)
plot2<-ggplot(DF, aes(time, y, colour=sex)) +
labs(title="Zero inflated mixed model") +
geom_point() +
geom_line(data=pred.data2) +
stat_smooth(method="glm", family=poisson(link="log"), formula = y~poly(x,2),fullrange=TRUE)## here a don't find any method to mixed glm
grid.arrange(plot1, plot2, ncol=2)
#-
Doesn't work of sure. Is possible to make this using ggplot2?
Thanks in advance
I'm not sure, but it looks to me that you're looking for marginal effects. You can do this with the ggeffects-package. Here are two examples, using your simulated data, that create a ggplot-object, one with and one w/o raw data.
library(glmmTMB)
library(ggeffects)
mZIPmix<- glmmTMB(y~poly(time,2)+sex+(1|id), data=DF, ziformula=~1,family=poisson)
# compute marginal effects and create a plot.
# the tag "[all]" is useful for polynomial terms, to produce smoother plots
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = TRUE, jitter = .01)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = FALSE)
Created on 2019-05-16 by the reprex package (v0.2.1)
Note that sex only has an "additive" effect. Maybe you want to model an intercation between time and sex?
mZIPmix<- glmmTMB(y~poly(time,2)*sex+(1|id), data=DF, ziformula=~1,family=poisson)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = TRUE, jitter = .01)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot()
Created on 2019-05-16 by the reprex package (v0.2.1)
After variable selection I usually end up in a model with a numerical covariable (2nd or 3rd degree). What I want to do is to plot using emmeans package preferentially. Is there a way of doing it?
I can do it using predict:
m1 <- lm(mpg ~ poly(disp,2), data = mtcars)
df <- cbind(disp = mtcars$disp, predict.lm(m1, interval = "confidence"))
df <- as.data.frame(df)
ggplot(data = df, aes(x = disp, y = fit)) +
geom_line() +
geom_ribbon(aes(ymin = lwr, ymax = upr, x = disp, y = fit),alpha = 0.2)
I didn't figured out a way of doing it using emmip neither emtrends
For illustration purposes, how could I do it using mixed models via lme?
m1 <- lme(mpg ~ poly(disp,2), random = ~1|factor(am), data = mtcars)
I suspect that your issue is due to the fact that by default, covariates are reduced to their means in emmeans. You can use theat or cov.reduce arguments to specify a larger number of values. See the documentation for ref_grid and vignette(“basics”, “emmeans”), or the index of vignette topics.
Using sjPlot:
plot_model(m1, terms = "disp [all]", type = "pred")
gives the same graphic.
Using emmeans:
em1 <- ref_grid(m1, at = list(disp = seq(min(mtcars$disp), max(mtcars$disp), 1)))
emmip(em1, ~disp, CIs = T)
returns a graphic with a small difference in layout. An alternative is to add the result to an object and plot as the way that I want to:
d1 <- emmip(em1, ~disp, CIs = T, plotit = F)
I have this data frame in R:
x = rep(seq(-10,10,1),each=5)
y = rep(0,length(x) )
weights = sample( seq(1,20,1) ,length(x), replace = TRUE)
weights = weights/sum(weights)
groups = rep( letters[1:5], times =length(x)/5 )
and some data that looks like this:
library(ggplot2)
ggplot(data = dat, aes(x = x, y = y, color = group))+geom_point( aes(size = weights))+
ylab("outcome")+
xlab("predictor x1")+
geom_vline(xintercept = 0)+ geom_hline(yintercept = 0)
fit_brms = brm(y~ s(x)+(1|group), data = dat)
by_group = marginal_effects(fit_brms, conditions = data.frame(group = dat$group) ,
re_formula = NULL, method = "predict")
plot(by_group, ncol = 5, points = TRUE)
I'd like to make a hierarchical nonlinear model so that there is a different nonlinear fit for each group.
In brms I have the code below which is doing a spline fit on the x predictor with random intercepts on group the fitted line is the same for all groups. the difference is where the lines cross the y intercept. Is there a way to make the non-linear fit be different for each group's data points?
ON page 13 here : https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf
It states "As the smooth term itself cannot be modeled as varying by year in a multilevel manner,we add a basic varying intercept in an effort to account for variation between years"
So the spline will be the same for all groups it appears? The only difference in the plots is where the spline cross the y intercept. That seems very restrictive. Can this be modified to make the spline unique to each group?
Use the formula: y ~ s(x, by = group) + (1|group)