How to plot the results of a mixed model - r

I use lme4 in R to fit the mixed model
lmer(value~status+(1|experiment)))
where value is continuous, status(N/D/R) and experiment are factors, and I get
Linear mixed model fit by REML
Formula: value ~ status + (1 | experiment)
AIC BIC logLik deviance REMLdev
29.1 46.98 -9.548 5.911 19.1
Random effects:
Groups Name Variance Std.Dev.
experiment (Intercept) 0.065526 0.25598
Residual 0.053029 0.23028
Number of obs: 264, groups: experiment, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.78004 0.08448 32.91
statusD 0.20493 0.03389 6.05
statusR 0.88690 0.03583 24.76
Correlation of Fixed Effects:
(Intr) statsD
statusD -0.204
statusR -0.193 0.476
I would like to graphically represent the fixed effects evaluation. However the seems to be no plot function for these objects. Is there any way I can graphically depict the fixed effects?

Using coefplot2 (on r-forge):
Stealing the simulation code from #Thierry:
set.seed(101)
dataset <- expand.grid(experiment = factor(seq_len(10)),
status = factor(c("N", "D", "R"), levels = c("N", "D", "R")),
reps = seq_len(10))
X <- model.matrix(~status,dataset)
dataset <- transform(dataset,
value=rnorm(nrow(dataset), sd = 0.23) + ## residual
rnorm(length(levels(experiment)), sd = 0.256)[experiment] + ## block effects
X %*% c(2.78,0.205,0.887)) ## fixed effects
Fit model:
library(lme4)
model <- lmer(value~status+(1|experiment), data = dataset)
Plot:
install.packages("coefplot2",repos="http://r-forge.r-project.org")
library(coefplot2)
coefplot2(model)
edit:
I have frequently been having problems with the R-Forge build. This fallback should work if the R-Forge build is not working:
install.packages("coefplot2",
repos="http://www.math.mcmaster.ca/bolker/R",
type="source")
Note that the coda dependency must already be installed.

I like the coefficient confidence interval plots, but it may be useful to consider some additional plots to understand the fixed effects..
Stealing the simulation code from #Thierry:
library(ggplot2)
library(lme4)
library(multcomp)
dataset <- expand.grid(experiment = factor(seq_len(10)), status = factor(c("N", "D", "R"), levels = c("N", "D", "R")), reps = seq_len(10))
dataset$value <- rnorm(nrow(dataset), sd = 0.23) + with(dataset, rnorm(length(levels(experiment)), sd = 0.256)[experiment] + ifelse(status == "D", 0.205, ifelse(status == "R", 0.887, 0))) + 2.78
model <- lmer(value~status+(1|experiment), data = dataset)
Get a look at the structure of the data...looks balanced..
library(plotrix); sizetree(dataset[,c(1,2)])
It might be interesting to track the correlation between fixed effects, especially if you fit different correlation structures. There's some cool code provided at the following link...
http://hlplab.wordpress.com/2012/03/20/correlation-plot-matrices-using-the-ellipse-library/
my.plotcorr(
matrix(c(1, .891, .891,
.891, 1, .891,
.891, .891, 1), nrow=3)
)
Finally it seems relevant to look at the variability across the 10 experiments as well as the variability across "status" within experiments. I'm still working on the code for this as I break it on unbalanced data, but the idea is...
My2Boxes(m=4,f1=dataset$experiment,f2=dataset$status,x=dataset$value,color=c("red","yellow","green"))
Finally the already mentioned Piniero and Bates (2000) book strongly favored lattice from what little I've skimmed.. So you might give that a shot. Maybe something like plotting the raw data...
lattice::xyplot(value~status | experiment, groups=experiment, data=dataset, type=c('p','r'), auto.key=F)
And then plotting the fitted values...
lattice::xyplot(fitted(model)~status | experiment, groups=experiment, data=dataset, type=c('p','r'), auto.key=F)

Here are a few suggestions.
library(ggplot2)
library(lme4)
library(multcomp)
# Creating datasets to get same results as question
dataset <- expand.grid(experiment = factor(seq_len(10)),
status = factor(c("N", "D", "R"),
levels = c("N", "D", "R")),
reps = seq_len(10))
dataset$value <- rnorm(nrow(dataset), sd = 0.23) +
with(dataset, rnorm(length(levels(experiment)),
sd = 0.256)[experiment] +
ifelse(status == "D", 0.205,
ifelse(status == "R", 0.887, 0))) +
2.78
# Fitting model
model <- lmer(value~status+(1|experiment), data = dataset)
# First possibility
tmp <- as.data.frame(confint(glht(model, mcp(status = "Tukey")))$confint)
tmp$Comparison <- rownames(tmp)
ggplot(tmp, aes(x = Comparison, y = Estimate, ymin = lwr, ymax = upr)) +
geom_errorbar() + geom_point()
# Second possibility
tmp <- as.data.frame(confint(glht(model))$confint)
tmp$Comparison <- rownames(tmp)
ggplot(tmp, aes(x = Comparison, y = Estimate, ymin = lwr, ymax = upr)) +
geom_errorbar() + geom_point()
# Third possibility
model <- lmer(value ~ 0 + status + (1|experiment), data = dataset)
tmp <- as.data.frame(confint(glht(model))$confint)
tmp$Comparison <- rownames(tmp)
ggplot(tmp, aes(x = Comparison, y = Estimate, ymin = lwr, ymax = upr)) +
geom_errorbar() + geom_point()

This answer illustrates the newer dotwhisker::dwplot + broom.mixed solution.
Adding one more variable in the simulation:
dataset <- transform(dataset,
value=rnorm(nrow(dataset), sd = 0.23) + ## residual
rnorm(length(levels(experiment)), sd = 0.256)[experiment] + ## block effects
X %*% c(2.78,0.205,0.887),
var2=rnorm(nrow(dataset))) ## fixed effects
Fitting two different models:
library(lme4)
model <- lmer(value~status+var2 + (1|experiment), data = dataset)
model2 <- update(model, . ~ . -var2)
Plotting:
library(broom.mixed)
library(dotwhisker)
dwplot(list(first=model,second=model2), effects="fixed")+
geom_vline(xintercept=0, lty=2)
(using effects="fixed" gets us just the fixed-effect parameters, dropping the intercept by default).
broom.mixed has many other options. When I want to do something complex I may use ggplot + ggstance::geom_pointrangeh (+ position="position_dodgev") to make my own custom plot rather than relying on dotwhisker::dwplot().

Related

Stop plotting predictions beyond data limits LME ggpredict Effects

Using the 'iris' dataset (sightly modified as below), I plot the results of an LME.
PLEASE NOTE: I am only using the iris dataset as mock data for the purpose of plotting, so please do not critique the appropriateness of this test. I'm not interested in the statistics, rather the plotting.
Using ggpredict function and plotting the results, the plot extends the predictions beyond the range of the data. Is there a systematic way plot predictions only within the range of each faceted data?
I can plot each facet separately, limit the axis per plot manually, and cowplot them back together, but if there is way to say 'predict only to the max. and min. of the data for that group', this would be great.
Given that these are facets of a single model, perhaps not showing the predictions for different groups is in fact misleading, and I should rather create three different models if I only want predictions within those data subsets?
library(lme4)
library(ggeffects)
library(ggplot2)
data(iris)
glimpse(iris)
df = iris
glimpse(df)
df_ed = df %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "setosa",Sepal.Length+10,Sepal.Length+0))
df_ed = df_ed %>% group_by(Species) %>% mutate(Sepal.Length = ifelse(Species == "versicolor",Sepal.Length-3,Sepal.Length+0))
glimpse(df_ed)
m_test =
lmer(Sepal.Width ~ Sepal.Length * Species +
(1|Petal.Width),
data = df_ed, REML = T)
summary(m_test)
test_plot = ggpredict(m_test, c("Sepal.Length", "Species"), type = "re") %>% plot(rawdata = T, dot.alpha = 0.6, facet = T, alpha = 0.3)
As per the OP's comment, I think this will provide a solution. In this example, I use data from the sleepstudy dataset that comes with the lme4 package. First, we have to postulate a mixed model, which I generically call fit.
Note that I do not perform any hypothesis test to formally select an appropriate random-effects structure. Of course, this is essential to adequately capture the correlations in the repeated measurements, but falls outside the scope of this post.
library(lme4)
library(splines)
# quantiles of Days
quantile(sleepstudy$Days, c(0.05, 0.95))
# 5% 95%
# 0 9
# mixed model
fit <- lmer(Reaction ~ ns(Days, df = 2, B = c(0, 9)) +
(Days | Subject), data = sleepstudy)
# new data.frame for prediction
ND <- with(sleepstudy, expand.grid(Days = seq(0L, 9L, len = 50)))
Then, we need a fucntion that enables us to obtain predictions from fit for certain values of the covariates. The function effectPlot_lmer() takes the following arguments:
object: a character string indicating the merMod object that was fitted (the mixed model).
ND: a character string indicating the new data.frame, which specifies the values of the covariates for which we want to obtain predictions.
orig_data: a character string specifying the data on which the mixed model was fitted.
# function to obtain predicted reaction times
effectPlot_lmer <- function (object, ND, orig_data) {
form <- formula(object, fixed.only = TRUE)
namesVars <- all.vars(form)
betas <- fixef(object)
V <- vcov(object)
orig_data <- orig_data[complete.cases(orig_data[namesVars]), ]
Terms <- delete.response(terms(form))
mfX <- model.frame(Terms, data = orig_data)
Terms_new <- attr(mfX, "terms")
mfX_new <- model.frame(Terms_new, ND, xlev = .getXlevels(Terms, mfX))
X <- model.matrix(Terms_new, mfX_new)
pred <- c(X %*% betas)
ses <- sqrt(diag(X %*% V %*% t(X)))
ND$pred <- pred
ND$low <- pred - 1.96 * ses
ND$upp <- pred + 1.96 * ses
return(ND)
}
Finally, we can make an effect plot with ggplot.
# effect plot
library(ggplot2)
ggplot(effectPlot_lmer(fit, ND, orig_data = sleepstudy),
aes(x = Days, y = pred)) +
geom_line(size = 1.2, colour = 'blue4') +
geom_ribbon(aes(ymin = low, ymax = upp), colour = NA,
fill = adjustcolor('blue4', 0.2)) +
theme_bw() + ylab('Expected Reaction (ms)')

How to plot marginal effect of an interaction after felm() function

I ran a regression based on a "giant" panel data with a bunch of unit fixed effects. So I employed function "felm()" from package "lfe". In addition, I have an interaction term of two continuous variables in the regression. But when plotting how the marginal effects of x on y vary with x2, it seems that the objects produced by "felm()" are often incompatible to most plotting functions like "ggplot", "interplot()" and "meplot". But I have to use "felm()" because I need to control for a large amount of unit fixed effects (like people do by "reghdfe" in Stata). So, how could I address this issues in R? Feel free to let me know some ways out. Thanks!
Here is an example about how interplot() does not work with felm():
# An example data:
library(lfe)
library(ggplot2)
library(interplot)
oldopts <- options(lfe.threads=1)
x <- rnorm(1000)
x2 <- rnorm(length(x))
id <- factor(sample(10,length(x),replace=TRUE))
firm <- factor(sample(3,length(x),replace=TRUE,prob=c(2,1.5,1)))
year <- factor(sample(10,length(x),replace=TRUE,prob=c(2,1.5,rep(1,8))))
id.eff <- rnorm(nlevels(id))
firm.eff <- rnorm(nlevels(firm))
year.eff <- rnorm(nlevels(year))
y <- x + 0.25*x2 + id.eff[id] + firm.eff[firm] +
year.eff[year] + rnorm(length(x))
mydata <- data.frame(cbind(y, x, x2, id, firm, year))
# Regression using felm():
reg1 <- felm(y ~ x + x2 + x:x2|id+firm+year|0|id, data=mydata)
summary(reg1)
# Using interplot() to plot marginal effects
interplot(m=reg1, var1="x", var2="x2", ci=0.9)
Then errors appear:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘sim’ for signature ‘"felm"’
Also I tried meplot() but it still does not work:
# Using meplot() to plot marginal effects
library(evir)
meplot(model=reg1, var1="x", var2="x2", int="x:x2", vcov=vcov(reg1), data=mydata)
I got an error:
Error in meplot(model = reg1, var1 = "x", var2 = "x2", int = "x:x2", vcov = vcov(reg1), :
(list) object cannot be coerced to type 'double'
I have used ggplot2, coef() and vcov() to realize what I want, plotting the marginal effects by hands.
library(ggplot2)
beta.hat <- coef(reg1)
vcov1 <- vcov(reg1)
z0 <- seq(min(x2), max(x2), length.out = 1000)
dy.dx <- beta.hat["x"] + beta.hat["x:x2"]*z0
se.dy.dx <- sqrt(vcov1["x", "x"] + (z0^2)*vcov1["x1:x2", "x1:x2"] + 2*z0*vcov1["x", "x1:x2"])
upr <- dy.dx + 1.96*se.dy.dx
lwr <- dy.dx - 1.96*se.dy.dx
ggplot(data=NULL, aes(x=z0, y=dy.dx)) +
labs(x="x2", y="Marginal Effects",
title=paste("Marginal Effects of x on y vary with x2"),
cex=4) +
geom_line(aes(z0, dy.dx),size = 1) +
geom_line(aes(z0, lwr), size = 1, linetype = 2, color="blue") +
geom_line(aes(z0, upr), size = 1, linetype = 2, color="blue") +
geom_hline(yintercept=0, size = 1, linetype=3) +
geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=0.3)

Plot poisson mixed models with ggplot2

I try to make a plot for standard purposes with zero inflated model and zero inflated mixed model using ggplot2 without success. For this, I try:
#Packages
library(pscl)
library(glmmTMB)
library(ggplot2)
library(gridExtra)
# Artificial data set
set.seed(007)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
DF$y <- rnbinom(n * K, size = 2, mu = exp(1.552966))
str(DF)
Using zero inflated poisson model with pscl package
time2<-(DF$time)^2
mZIP <- zeroinfl(y~time+time2+sex|time+sex, data=DF)
summary(mZIP)
If I imagine thal all coefficients are significant
# Y estimated
pred.data1 = data.frame(
time<-DF$time,
time2<-(DF$time)^2,
sex<-DF$sex)
pred.data1$y = predict(mZIP, newdata=pred.data1, type="response")
Now using zero inflated poisson mixed model with glmmTMB package
mZIPmix<- glmmTMB(y~time+time2+sex+(1|id),
data=DF, ziformula=~1,family=poisson)
summary(mZIPmix)
#
# new Y estimated
pred.data2 = data.frame(
time<-DF$time,
time2<-(DF$time)^2,
sex<-DF$sex,
id<-DF$id)
pred.data2$y = predict(mZIPmix, newdata=pred.data2, type="response")
Plot zero inflated poisson model and mixed poisson model
par(mfrow=c(1,2))
plot1<-ggplot(DF, aes(time, y, colour=sex)) +
labs(title="Zero inflated model") +
geom_point() +
geom_line(data=pred.data1) +
stat_smooth(method="glm", family=poisson(link="log"), formula = y~poly(x,2),fullrange=TRUE)
plot2<-ggplot(DF, aes(time, y, colour=sex)) +
labs(title="Zero inflated mixed model") +
geom_point() +
geom_line(data=pred.data2) +
stat_smooth(method="glm", family=poisson(link="log"), formula = y~poly(x,2),fullrange=TRUE)## here a don't find any method to mixed glm
grid.arrange(plot1, plot2, ncol=2)
#-
Doesn't work of sure. Is possible to make this using ggplot2?
Thanks in advance
I'm not sure, but it looks to me that you're looking for marginal effects. You can do this with the ggeffects-package. Here are two examples, using your simulated data, that create a ggplot-object, one with and one w/o raw data.
library(glmmTMB)
library(ggeffects)
mZIPmix<- glmmTMB(y~poly(time,2)+sex+(1|id), data=DF, ziformula=~1,family=poisson)
# compute marginal effects and create a plot.
# the tag "[all]" is useful for polynomial terms, to produce smoother plots
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = TRUE, jitter = .01)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = FALSE)
Created on 2019-05-16 by the reprex package (v0.2.1)
Note that sex only has an "additive" effect. Maybe you want to model an intercation between time and sex?
mZIPmix<- glmmTMB(y~poly(time,2)*sex+(1|id), data=DF, ziformula=~1,family=poisson)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot(rawdata = TRUE, jitter = .01)
ggpredict(mZIPmix, c("time [all]", "sex")) %>% plot()
Created on 2019-05-16 by the reprex package (v0.2.1)

predict lme4 for type "response" (binary choices) -- Confidence intervals

I ran a mixed effect logistig regression with lme4 (type="response"). Now I used the predict feature and wanted also to determine confidence intervals.
I found this code ( http://glmm.wikidot.com/faq) for predictions, it works, but the CIs are not suitable for binary reponses (my predictions are between 0 and 1, and suddenly the CIs are between -3 and 3. Does anyone know where to adjust this?
library(lme4)
library(ggplot2) # Plotting
fm1<- glmer(choice~rating + indi + rating*indi + (1|ID),data=z,family="binomial")
newdat<-data.frame(indi=factor(c(1,1,1,1,1,1,2,2,2,2,2,2)),
rating=factor(1:6), ID=factor(rep(c(1:30), each=12)), choice=0)
newdat$prob<-predict(fm1,newdata=newdat, re.form=NULL, type="response")
mm <- model.matrix(terms(fm1),newdat)
newdat$choice<- predict(fm1,newdat)
## or newdat$choice<- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$ID[1] ## must be adapted
tvar1 <-
newdat <- data.frame(
newdat
, plo = newdat$choice-2*sqrt(pvar1)
, phi = newdat$choice+2*sqrt(pvar1)
, tlo = newdat$choice-2*sqrt(tvar1)
, thi = newdat$choice+2*sqrt(tvar1)
)
#plot confidence
g0 <- ggplot(newdat, aes(x=rating, y=choice, colour=indi))+geom_point()
g0 + geom_errorbar(aes(ymin = plo, ymax = phi))+
opts(title="CI based on fixed-effects uncertainty ONLY")
#plot prediction
g0 + geom_errorbar(aes(ymin = tlo, ymax = thi))+
opts(title="CI based on FE uncertainty + RE variance")
Thanks a lot!
You need to use the inverse-link function, plogis() in this case:
newdat <- transform(newdat,
plo = plogis(plo),
phi = plogis(phi),
tlo = plogis(tlo),
thi = plogis(thi))
More generally if you have fitted a model gm1 the inverse-link function will be stored in gm1#resp$family$linkinv (although mucking around with the internals of an object like this is not guaranteed to remain compatible with future versions).

R: Determine the threshold that maximally separates two groups based on a continuous variable?

Say I have 200 subjects, 100 in group A and 100 in group B, and for each I measure some continuous parameter.
require(ggplot2)
set.seed(100)
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep('A', 100), rep('B', 100))
data <- data.frame(value, group)
ggplot(data = data, aes(x = value)) +
geom_bar(aes(color = group))
I would like to determine the value (Threshold? Breakpoint?) that maximizes separation and minimizes misclassification between the groups. Does such a function exist in R?
I've tried searching along the lines of "r breakpoint maximal separation between groups," and "r threshold minimize misclassification," but my google-foo seems to be off today.
EDIT:
Responding to #Thomas's comment, I have tried to fit the data using logistic regression and then solve for the threshold, but I haven't gotten very far.
lr <- glm(group~value)
coef(lr)
# (Intercept) value
# 1.1857435 -0.0911762
So Bo = 1.1857435 and B1 = -0.0911762
From Wikipedia, I see that F(x) = 1/(1+e^-(Bo + B1x)), and solving for x:
x = (ln(F(x) / (1 - F(x))) - Bo)/B1
But trying this in R, I get an obviously incorrect answer:
(log(0.5/(1 - 0.5)) - 1.1857435)/-0.0911762 # 13.00497
A simple approach is to write a function that calculates the accuracy given a threshold:
accuracy = Vectorize(function(th) mean(c("A", "B")[(value > th) + 1] == group))
Then find the maximum using optimize:
optimize(accuracy, c(min(value), max(value)), maximum=TRUE)
# $maximum
# [1] 8.050888
#
# $objective
# [1] 0.86
I've gotten the answer I need thanks to help from #Thomas and #BenBolker.
Summary
The problem with my attempt at solving it through logistic regression was that I hadn't specified family = binomial
The dose.p() function in MASS will do the work for me given a glm fit
Code
# Include libraries
require(ggplot2)
require(MASS)
# Set seed
set.seed(100)
# Put together some dummy data
value <- c(rnorm(100, mean = 5, sd = 3), rnorm(100, mean = 10, sd = 3))
group <- c(rep(0, 100), rep(1, 100))
data <- data.frame(value, group)
# Plot the distribution -- visually
# The answer appears to be b/t 7 and 8
ggplot(data = data, aes(x = value)) +
geom_bar(aes(color = group))
# Fit a glm model, specifying the binomial distribution
my.glm <- glm(group~value, data = data, family = binomial)
b0 <- coef(my.glm)[[1]]
b1 <- coef(my.glm)[[2]]
# See what the probability function looks like
lr <- function(x, b0, b1) {
prob <- 1 / (1 + exp(-1*(b0 + b1*x)))
return(prob)
}
# The line appears to cross 0.5 just above 7.5
x <- -0:12
y <- lr(x, b0, b1)
lr.val <- data.frame(x, y)
ggplot(lr.val, aes(x = x, y = y)) +
geom_line()
# The inverse of this function computes the threshold for a given probability
inv.lr <- function(p, b0, b1) {
x <- (log(p / (1 - p)) - b0)/b1
return(x)
}
# With the betas from this function, we get 7.686814
inv.lr(0.5, b0, b1)
# Or, feeding the glm model into dose.p from MASS, we get the same answer
dose.p(my.glm, p = 0.5)
Thanks, everyone, for your help!

Resources