Problem combining ranef plots of two models with grid.arrange()

I have two LMEs:
lme1 <- lmer(F1 ~ (phoneme|individual) + (1|word) + frequency,
data = nurse_female)
lme2 <- lmer(F2 ~ (phoneme|individual) + (1|word) +
frequency + age + (1|zduration),
data = nurse_female)
I created simple dot plots of the random effects with dotplot(ranef(lme1)), which creates one plot per random-effects term. However, I am only interested in the phoneme|individual one.
Normally I would use grid.arrange(), but I can't get it to select only the phoneme|individual plots. Is there a way to do this?

A reproducible example would be useful; I hope this example does what you want. I think the key here is to recognize that the dotplot.ranef.mer method returns a list of plots, one per grouping factor:
library(lme4)
library(lattice) # provides the dotplot() generic
fm1 <- lmer(angle ~ (1|recipe) + (1|recipe:replicate), cake, REML = FALSE)
dd <- dotplot(ranef(fm1))
length(dd) ## 2
They're not necessarily in the same order as in the formula:
names(dd) ## [1] "recipe:replicate" "recipe"
print(dd[["recipe"]])
print(dd[["recipe:replicate"]])
So you would want something like this (in both of your models the grouping factor is individual):
f <- function(m) dotplot(ranef(m))[["individual"]]
gridExtra::grid.arrange(f(lme1),f(lme2))
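If you would rather not depend on gridExtra, lattice's own print() method for trellis objects can put several plots on one page through its split and more arguments. The sketch below uses two stand-in dotplots built from made-up data (the data frames d1 and d2 are hypothetical); with the models from the question you would print f(lme1) and f(lme2) instead.

```r
library(lattice)

# Hypothetical stand-ins for f(lme1) and f(lme2): two trellis dotplots
d1 <- data.frame(level = factor(letters[1:5]),
                 value = c(0.3, -0.1, 0.5, -0.4, 0.2))
d2 <- data.frame(level = factor(letters[1:5]),
                 value = c(-0.2, 0.4, 0.1, 0.0, -0.3))
p1 <- dotplot(level ~ value, data = d1, main = "model 1")
p2 <- dotplot(level ~ value, data = d2, main = "model 2")

# split = c(column, row, number of columns, number of rows);
# more = TRUE keeps the page open so the next plot is drawn on it too
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1), more = FALSE)
```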


plotting an interaction term in moderated regression using MICE imputation

I'm using imputed data to test a series of regression models, including some moderation models.
Imputation
imp_data <- mice(data,m=20,maxit=20,meth='cart',seed=12345)
I then convert this to long format so I can recode/sum variables as needed, before turning it back into mids format:
impdatlong_mids<-as.mids(impdat_long)
Example model:
model1 <- with(impdatlong_mids,
lm(Outcome ~ p1_sex + p2 + p3 + p4
+ p5 + p6 + p7 + p8 + p9 + p10
+ p11 + p1_sex*p12 + p1_sex*p13 + p14))
In non-imputed data, to create a graphical representation of the significant interaction, I'd use (e.g.):
interact_plot(model = model1, pred = p1_sex, modx = p12)
This doesn't work with imputed data / mids objects.
Has anyone plotted an interaction using imputed data, and able to help or share examples?
Thanks
EDIT: Reproducible example
library(tidyverse)
library(interactions)
library(mice)
# library(reprex) does not work with this
set.seed(42)
options(warn=-1)
#---------------------------------------#
# Data preparations
# loading an editing data
d <- mtcars
d <- d %>% mutate_at(c('cyl','am'),factor)
# create missing data and impute it
mi_d <- d
nr_of_NAs <- 30
for (i in 1:nr_of_NAs) {
mi_d[sample(nrow(mi_d),1),sample(ncol(mi_d),1)] <- NA
}
mi_d <- mice(mi_d, m=2, maxit=2)
#---------------------------------------#
# regressions
#not imputed
lm_d <- lm(qsec ~ cyl*am + mpg*disp, data=d)
#imputed dataset
lm_mi <- with(mi_d,lm(qsec ~ cyl*am + mpg*disp))
lm_mi_pool <- pool(lm_mi)
#---------------------------------------#
# interaction plots
# not imputed
#continuous
interactions::interact_plot(lm_d, pred=mpg,modx=disp, interval=T,int.width=0.3)
#categorical
interactions::cat_plot(lm_d, pred = cyl, modx = am)
#---------------------------------------#
# interaction plots
# imputed
#continuous
interactions::interact_plot(lm_mi_pool, pred=mpg,modx=disp, interval=T,int.width=0.3)
# Error in model.frame.default(model) : object is not a matrix
#categorical
interactions::cat_plot(lm_mi_pool, pred = cyl, modx = am)
# Error in model.frame.default(model) : object is not a matrix
The problem seems to be that neither interact_plot(), nor cat_plot(), nor any other function from the available packages allows (at least categorical) interaction plotting with objects of class mipo or with pooled regression outputs.
I am using the walking data from the mice package as an example. One way to get an interaction plot (well, a version of one type of interaction plot) is to use the gtsummary package. Under the hood, tbl_regression() calls pool() from mice to combine model1 across the imputed datasets, and plot() then draws the pooled coefficients, including the interaction terms, with their confidence intervals.
library(mice)
library(dplyr)
library(gtsummary)
imp_data <- mice(mice::walking,m=20,maxit=20,meth='cart',seed=12345)
model1 <- with(imp_data,
lm(age ~ sex*YA))
model1 %>%
tbl_regression() %>%
plot()
The emmeans package allows you to extract interaction effects from a mira object. Here is a gentle introduction. After that, the interactions can be plotted with an appropriate ggplot call. This example is for categorical variables but could be extended to the continuous case; after the emmeans part, things get relatively straightforward.
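library(ggplot2)
library(ggstance) # geom_linerangeh()
library(emmeans)
library(khroma) # scale_color_bright()
library(jtools) # theme_nice()
library(tibble) # as_tibble()
# Assumes gear and carb were converted to factors (like cyl and am in the question)
# before imputation; numeric covariates would just be held at their means by emmeans
lm_mi <- with(mi_d,lm(qsec ~ gear*carb))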
#extracting interaction effects
emcatcat <- emmeans(lm_mi, ~gear*carb)
tidy <- as_tibble(emcatcat)
#plotting
pd <- position_dodge(0.5)
ggplot(tidy, aes(y=gear, x=emmean, colour=carb)) +
geom_linerangeh(aes(xmin=lower.CL, xmax=upper.CL), position=pd,size = 2) +
geom_point(position=pd,size = 4)+
ggtitle('Interactions') +
labs(x = "aggregated interaction effect") +
scale_color_bright() +
theme_nice()
This can be extended to a three-way interaction plot with facet_grid(), as long as the third interaction term is also categorical.
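A package-free alternative (a different technique from the emmeans route above, sketched under stated assumptions) is to fit the model on each completed dataset, predict on a grid of predictor values, and average the predictions; the mean over imputations is the pooled point estimate under Rubin's rules. Here the list completed is a stand-in for mice::complete(imp, action = "all") and is faked by adding noise to mtcars, and pool_predictions() is a hypothetical helper. The sketch does not pool variances, so it gives no confidence intervals.

```r
# Hypothetical helper: average predictions over a list of completed datasets
pool_predictions <- function(completed, formula, newdata) {
  preds <- do.call(cbind, lapply(completed, function(dat) {
    predict(lm(formula, data = dat), newdata = newdata)
  }))
  # Pooled point estimate (Rubin's rules): mean across imputations
  newdata$fit <- rowMeans(preds)
  newdata
}

set.seed(1)
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
# Stand-in for mice::complete(imp, action = "all"): 5 noisy copies of d
completed <- lapply(1:5, function(i) {
  transform(d, qsec = qsec + rnorm(nrow(d), sd = 0.1))
})

# One row per cyl x am cell
grid <- expand.grid(cyl = levels(d$cyl), am = levels(d$am))
pooled <- pool_predictions(completed, qsec ~ cyl * am, grid)
print(pooled)

# Base-R interaction plot of the pooled predictions
interaction.plot(pooled$cyl, pooled$am, pooled$fit,
                 xlab = "cyl", ylab = "pooled predicted qsec",
                 trace.label = "am")
```

Because a prediction is a linear combination of the coefficients, averaging predictions gives the same point estimates as predicting from the averaged coefficients.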

Not getting a smooth curve using ggplot2

I am trying to fit a mixed-effects model using the lme4 package. Unfortunately, I cannot share the data I am working with, and I couldn't find a toy data set relevant to my problem. So here I show the steps I have followed so far:
First, I plotted the overall trend of the data as follows:
p21 <- ggplot(data = sub_data, aes(x = age_cent, y = y))
p21+ geom_point() + geom_smooth()
Based on this, there seems to be a nonlinear trend in the data. Hence I tried to fit a quadratic model as follows:
sub_data$age_cent=sub_data$age-mean((sub_data)$age)
sub_data$age_centsqr=(sub_data$age-mean((sub_data)$age))^2
m1= lmer(y ~ 1 + age_cent + age_centsqr +(1 | id) , sub_data, REML = TRUE)
In the above model I only included a random intercept, because I don't have enough data to include both a random slope and a random intercept. Then I extracted the population-level predictions of this model as follows:
pred1=predict(m1,re.form=NA)
Next, I plotted these predictions along with a smooth quadratic fit:
p21 + geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), col = "red") +
geom_line(aes(y = pred1, group = id), col = "blue", lwd = 0.5)
In the above plot, the curve corresponding to the predictions is not smooth. Can anyone help me figure out the reason for that?
Am I doing anything wrong here?
Update :
As eipi10 pointed out, this may be due to fitting different curves for different people.
But when I tried the same thing with a toy data set from the lme4 package, I got the same curve for each person:
m1 <- lmer(Reaction ~ 1+I(Days) + (1+ Days| Subject) , data = sleepstudy)
pred1new1=predict(m1,re.form=NA)
p21 <- ggplot(data = sleepstudy, aes(x = Days, y = Reaction))
p21+ geom_point() + geom_smooth()
p21+ geom_point() + geom_smooth()+ geom_line(aes(y=pred1new1,group = Subject) ,col="red", lwd = 0.5)
What may be the reason for the different results? Is this due to the data being unbalanced?
The data I used were collected at 3 time points, and some people don't have measurements at all 3. The toy data set, by contrast, is balanced.
Thank you
tl;dr: use expand.grid() or something like it to generate a balanced, evenly spaced sample for every group (if you have a strongly nonlinear curve, you may want to generate a larger, more finely spaced set of x values than in the original data).
You could also take a look at the sjPlot package, which does a lot of this stuff automatically ...
You need both an unbalanced data set and a non-linear (e.g. polynomial) model for the fixed effects to see this effect.
if the model is linear, then you don't notice missing values because the linear interpolation done by geom_line() works perfectly
if the data are balanced then there are no gaps to get weirdly filled by linear interpolation
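The interpolation argument can be checked numerically. geom_line() joins consecutive points with straight segments, which is what approx() computes. The sketch below uses a hypothetical quadratic curve y = x^2 and a hypothetical linear curve y = 2x + 1, both observed only at x = 1 and x = 3 (as if the x = 2 observation were missing for that group).

```r
quad <- function(x) x^2         # hypothetical nonlinear population curve
lin  <- function(x) 2 * x + 1   # hypothetical linear population curve
x_obs <- c(1, 3)                # x = 2 is missing for this group

# Value the straight segment between the observed points takes at x = 2
quad_joined <- approx(x_obs, quad(x_obs), xout = 2)$y
lin_joined  <- approx(x_obs, lin(x_obs), xout = 2)$y

c(quad_joined = quad_joined, quad_true = quad(2))  # 5 vs 4: a visible kink
c(lin_joined = lin_joined, lin_true = lin(2))      # 5 vs 5: the gap is invisible
```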
Generate an example with quadratic effects and an unbalanced data set; fit the model
library(lme4)
set.seed(101)
dd <- expand.grid(id=factor(1:10),x=1:10)
dd$y <- simulate(~poly(x,2)+(poly(x,2)|id),
newdata=dd,
family=gaussian,
newparams=list(beta=c(0,0,0.1),
theta=rep(0.1,6),
sigma=1))[[1]]
## subsample randomly (missing values)
dd <- dd[sort(sample(nrow(dd),size=round(0.7*nrow(dd)))),]
m1 <- lmer(y ~ poly(x,2) + (poly(x,2)|id) , data = dd)
Naive prediction and plot:
dd$pred1 <- predict(m1,re.form=NA)
library(ggplot2)
p11 <- (ggplot(data = dd, aes(x = x, y = y))
+ geom_point() + geom_smooth(method="lm",formula=y~poly(x,2))
)
p11 + geom_line(aes(y=pred1,group = id) ,col="red", lwd = 0.5)
Now generate a balanced data set. This version generates 51 evenly spaced points between the min and max; this will be useful if the original data are unevenly spaced. If you have NA values in your x variable, don't forget na.rm=TRUE ...
pframe <- with(dd, expand.grid(id = levels(id), x = seq(min(x), max(x), length.out = 51)))
Make predictions, and overlay them on the original plot:
pframe$pred1 <- predict(m1,newdata=pframe,re.form=NA)
p11 + geom_line(data=pframe,aes(y=pred1,group = id) ,col="red", lwd = 0.5)
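To see what the balanced prediction frame looks like, and why the na.rm=TRUE remark matters, here is the same expand.grid() construction on a tiny hypothetical data frame whose x column contains an NA. Without na.rm = TRUE, min() and max() return NA and seq() stops with an error.

```r
# Tiny hypothetical data: two groups, unevenly spaced x, one missing x
toy <- data.frame(id = factor(c("a", "a", "b", "b")),
                  x  = c(1, NA, 2.5, 4))

# Every level of id gets the same evenly spaced x values
pframe_toy <- with(toy, expand.grid(
  id = levels(id),
  x  = seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 4)
))

table(pframe_toy$id)  # 4 rows per group
```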

R multcomp: extract cld output

I built a linear mixed model to analyse my data and used the package multcomp for pairwise comparisons. I already created a ggplot with the model output and I'd like to put the cld() output (letters) above the bars.
But I have no idea how to extract the letters from the cld() output. It is a list of 10 containing many objects and vectors, and frankly I don't fully understand half of them (I also have little experience with lists: although I have read about them several times and done many basic exercises, I still find them difficult to grasp).
When you run cld() in R, it shows you exactly what you need in the console, and that's what I'd like to extract. My internet searches did not produce any solution so far.
code
# packages
library(lme4)
library(multcomp)
library(ggplot2)
# dummy dataset
treatment <- c(rep("X",4),rep("Y",4),rep("Z",4))
replicate <- rep(c("A","B","C","D"),3)
Y <- c(18.853,20.165,20.120,21.000,18.772,19.825,20.874,19.001,22.007,21.875,21.235,21.904)
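# treatment must be a factor for mcp(); since R 4.0.0, data.frame() keeps strings as character
data <- data.frame(treatment = factor(treatment), replicate, Y)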
# model
lm1 <- lmer(Y ~ treatment + (1|replicate), data=data)
drop1(lm1, test="Chisq")
# pairwise comparison
pc1 <- glht(lm1, mcp(treatment = "Tukey"))
summary(pc1)
cld(pc1, level=0.05) # how to extract this result?
# parameter estimates
lm1.e <- lmer(Y ~ treatment -1 + (1|replicate), data=data)
# plotfile
pfile <- data.frame(treatment=c("X","Y","Z"))
pfile$Y <- fixef(lm1.e)
pfile$SE <- summary(lm1.e)$coefficients[,2]
pfile$minSE <- pfile$Y-pfile$SE
pfile$maxSE <- pfile$Y+pfile$SE
# plot
ggplot(pfile, aes(treatment, Y)) + # how to add cld() result above bars?
xlab("Treatment") +
ylab(expression(paste("Y (g)", sep=""))) +
geom_col(position = "dodge", color="black", fill="white") +
geom_errorbar(aes(ymin=minSE, ymax=maxSE), width=.2, position=position_dodge(0.9))
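With multcomp:::print.cld you can see the code, where you'll see that it basically calls print(x$mcletters$Letters). So the letters are stored in cld(pc1)$mcletters$Letters, a character vector named by treatment level.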
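Putting that together with the question's plot: the sketch below pulls the letters out of cld(), matches them to the bars by treatment name rather than by position, and adds them with geom_text(). It reuses the question's dummy data, with treatment converted to a factor because mcp() expects one; the column name letters and the vjust offset are arbitrary choices.

```r
library(lme4)
library(multcomp)
library(ggplot2)

# Dummy data from the question (treatment as a factor for mcp())
data <- data.frame(
  treatment = factor(rep(c("X", "Y", "Z"), each = 4)),
  replicate = rep(c("A", "B", "C", "D"), 3),
  Y = c(18.853, 20.165, 20.120, 21.000, 18.772, 19.825,
        20.874, 19.001, 22.007, 21.875, 21.235, 21.904)
)
lm1 <- lmer(Y ~ treatment + (1 | replicate), data = data)
pc1 <- glht(lm1, linfct = mcp(treatment = "Tukey"))

# The letters printed by cld(), as a character vector named by level
lett <- cld(pc1, level = 0.05)$mcletters$Letters

# Means and SEs from the cell-means parameterisation, as in the question
lm1.e <- lmer(Y ~ treatment - 1 + (1 | replicate), data = data)
pfile <- data.frame(treatment = levels(data$treatment),
                    Y  = unname(fixef(lm1.e)),
                    SE = unname(summary(lm1.e)$coefficients[, 2]))
pfile$minSE <- pfile$Y - pfile$SE
pfile$maxSE <- pfile$Y + pfile$SE
# Look the letters up by treatment name
pfile$letters <- unname(lett[pfile$treatment])

ggplot(pfile, aes(treatment, Y)) +
  geom_col(color = "black", fill = "white") +
  geom_errorbar(aes(ymin = minSE, ymax = maxSE), width = 0.2) +
  geom_text(aes(y = maxSE, label = letters), vjust = -0.5)
```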

Using ggplot2 to plot an already-existing linear model

Let's say that I have some data and I have created a linear model to fit the data. Then I plot the data using ggplot2 and I want to add the linear model to the plot. As far as I know, this is the standard way of doing it (using the built-in cars dataset):
library(ggplot2)
fit <- lm(dist ~ speed, data = cars)
summary(fit)
p <- ggplot(cars, aes(speed, dist))
p <- p + geom_point()
p <- p + geom_smooth(method='lm')
p
However, the above violates the DRY principle ('don't repeat yourself'): it involves creating the linear model in the call to lm and then recreating it in the call to geom_smooth. This seems inelegant to me, and it also introduces a space for bugs. For example, if I change the model that is created with lm but forget to change the model that is created with geom_smooth, then the summary and the plot won't be of the same model.
Is there a way of using ggplot2 to plot an already existing linear model, e.g. by passing the lm object itself to the geom_smooth function?
What one needs to do is to create a new data frame with the observations from the old one plus the predicted values from the model, then plot that dataframe using ggplot2.
library(ggplot2)
# create and summarise model
cars.model <- lm(dist ~ speed, data = cars)
summary(cars.model)
# add 'fit', 'lwr', and 'upr' columns to dataframe (generated by predict)
cars.predict <- cbind(cars, predict(cars.model, interval = 'confidence'))
# plot the points (actual observations), regression line, and confidence interval
p <- ggplot(cars.predict, aes(speed,dist))
p <- p + geom_point()
p <- p + geom_line(aes(speed, fit))
p <- p + geom_ribbon(aes(ymin=lwr,ymax=upr), alpha=0.3)
p
The great advantage of doing this is that if one changes the model (e.g. cars.model <- lm(dist ~ poly(speed, 2), data = cars)) then the plot and the summary will both change.
Thanks to Plamen Petrov for making me realise what was needed here. As he points out, this approach will only work if predict is defined for the model in question; if not, one has to define it oneself.
I believe you want to do something along the lines of:
library(ggplot2)
# install.packages('dplyr')
library(dplyr)
fit <- lm(dist ~ speed, data = cars)
cars %>%
mutate( my_model = predict(fit) ) %>%
ggplot() +
geom_point( aes(speed, dist) ) +
geom_line( aes(speed, my_model) )
This will also work for more complex models, as long as the corresponding predict method is defined; otherwise you will need to define it yourself.
In the case of a linear model, you can add the confidence/prediction bands with slightly more work and reproduce your plot.
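As a sketch of that extra work for an lm() fit: ask predict() for both interval types on an evenly spaced grid and draw each as a ribbon. The 100-point grid and the column names in bands are arbitrary choices; inherit.aes = FALSE is needed on the ribbons because bands has no dist column.

```r
library(ggplot2)

fit <- lm(dist ~ speed, data = cars)

# Evenly spaced grid, so curved models (e.g. poly(speed, 2)) also plot smoothly
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
conf <- predict(fit, newdata = grid, interval = "confidence")
pred <- predict(fit, newdata = grid, interval = "prediction")
bands <- data.frame(grid,
                    fit      = conf[, "fit"],
                    conf_lwr = conf[, "lwr"], conf_upr = conf[, "upr"],
                    pred_lwr = pred[, "lwr"], pred_upr = pred[, "upr"])

ggplot(cars, aes(speed, dist)) +
  # Wider band: prediction interval for new observations
  geom_ribbon(data = bands, aes(x = speed, ymin = pred_lwr, ymax = pred_upr),
              inherit.aes = FALSE, alpha = 0.15) +
  # Narrower band: confidence interval for the mean response
  geom_ribbon(data = bands, aes(x = speed, ymin = conf_lwr, ymax = conf_upr),
              inherit.aes = FALSE, alpha = 0.3) +
  geom_point() +
  geom_line(data = bands, aes(x = speed, y = fit))
```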

Overlay 2 allEffects graphs

I have the following models:
library(effects)
library(nlme) # lme()
fit <- lme(x ~ y, data, random = ~1|item)
plot(allEffects(fit))
fit2 <- lme(x ~ y, data2, random = ~1|item)
plot(allEffects(fit2))
How can I plot fit and fit2 overlaid? I have tried par(new=T), but it does not work. The graphs plot fine individually.
I'm not sure there's a very nice way to do this. I usually extract the information from the effects structure and plot it with ggplot (lattice would be possible too).
Here's an example:
library(effects)
library(nlme)
library(plyr) ## utilities
Fit a model to the first and second halves of one of the standard example data sets:
fm1 <- lme(distance ~ age, random = ~1|Subject,
data = Orthodont[1:54,])
fm2 <- update(fm1, data = Orthodont[55:108,])
a1 <- allEffects(fm1)
a2 <- allEffects(fm2)
Extract the information from the efflist object. This is the part that isn't completely general ... the hard part is getting out the predictor variable.
as.data.frame.efflist <- function(x) {
ldply(x,
function(z) {
r <- with(z,data.frame(fit,
var=variables[[1]]$levels,
lower,upper))
return(plyr::rename(r,setNames(z$variables[[1]]$name,"var")))
})
}
For convenience, use ldply to put the results of both models together:
comb <- ldply(list(fm1=a1,fm2=a2),as.data.frame,.id="model")
Now plot:
library(ggplot2); theme_set(theme_bw())
ggplot(comb,aes(age,fit,
ymin=lower,ymax=upper,
colour=model,fill=model))+
geom_line()+
geom_ribbon(alpha=0.2,colour=NA)+
geom_rug(sides="b")
The rug plot component is a little silly here.
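Recent versions of the effects package also provide an as.data.frame() method for single eff objects, which avoids reaching into the object internals. A sketch assuming that method is available in your installed version (see ?as.data.frame.eff): it computes Effect() for the single predictor age in each model and stacks the two results with rbind(). The name comb_eff is arbitrary.

```r
library(nlme)
library(effects)
library(ggplot2)

fm1 <- lme(distance ~ age, random = ~1|Subject, data = Orthodont[1:54, ])
fm2 <- update(fm1, data = Orthodont[55:108, ])

# One data frame per model with columns age, fit, se, lower, upper
e1 <- as.data.frame(Effect("age", fm1))
e2 <- as.data.frame(Effect("age", fm2))
comb_eff <- rbind(cbind(e1, model = "fm1"), cbind(e2, model = "fm2"))

ggplot(comb_eff, aes(age, fit, ymin = lower, ymax = upper,
                     colour = model, fill = model)) +
  geom_line() +
  geom_ribbon(alpha = 0.2, colour = NA)
```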
