plotting an interaction term in moderated regression using MICE imputation - r

I'm using imputed data to test a series of regression models, including some moderation models.
Imputation
imp_data <- mice(data,m=20,maxit=20,meth='cart',seed=12345)
I then convert this to long format so I can recode / sum variables as needed, beore turning back to mids format
impdatlong_mids<-as.mids(impdat_long)
Example model:
model1 <- with(impdatlong_mids,
lm(Outcome ~ p1_sex + p2 + p3 + p4
+ p5+ p6+ p7+ p8+ p9+ p10
+ p11+ p1_sex*p12+ p1_sex*p13 + p14)
in non-imputed data, to create a graphic representation of the significant ineraction, I'd use (e.g.)
interact_plot (model=model1, pred = p1_sex, modx = p12)
This doesn't work with imputed data / mids objects.
Has anyone plotted an interaction using imputed data, and able to help or share examples?
Thanks
EDIT: Reproducible example
library(tidyverse)
library(interactions)
library(mice)
# library(reprex) does not work with this
set.seed(42)
options(warn=-1)
#---------------------------------------#
# Data preparations
# loading an editing data
d <- mtcars
d <- d %>% mutate_at(c('cyl','am'),factor)
# create missing data and impute it
mi_d <- d
nr_of_NAs <- 30
for (i in 1:nr_of_NAs) {
mi_d[sample(nrow(mi_d),1),sample(ncol(mi_d),1)] <- NA
}
mi_d <- mice(mi_d, m=2, maxit=2)
#---------------------------------------#
# regressions
#not imputed
lm_d <- lm(qsec ~ cyl*am + mpg*disp, data=d)
#imputed dataset
lm_mi <- with(mi_d,lm(qsec ~ cyl*am + mpg*disp))
lm_mi_pool <- pool(lm_mi)
#---------------------------------------#
# interaction plots
# not imputed
#continuous
interactions::interact_plot(lm_d, pred=mpg,modx=disp, interval=T,int.width=0.3)
#categorical
interactions::cat_plot(lm_d, pred = cyl, modx = am)
#---------------------------------------#
# interaction plots
# imputed
#continuous
interactions::interact_plot(lm_mi_pool, pred=mpg,modx=disp, interval=T,int.width=0.3)
# Error in model.frame.default(model) : object is not a matrix
#categorical
interactions::cat_plot(lm_mi_pool, pred = cyl, modx = am)
# Error in model.frame.default(model) : object is not a matrix
The problem seems to be that neither interact_plot, cat_plot or any other available package allows for (at least categorical) interaction plotting with objects of class mipo or pooled regression outputs.

I am using the walking data from the mice package as an example. One way to get the interaction plot (well version of one type of interaction plot) is to use the gtsummary package. Under the hood it will take the model1 use pool() from mice to average over the models and then use a combo of tbl_regression() and plot() to output a plot of the coefficients in the model. The tbl_regression() function is what is calling the pool() function.
library(mice)
library(dplyr)
library(gtsummary)
imp_data <- mice(mice::walking,m=20,maxit=20,meth='cart',seed=12345)
model1 <- with(imp_data,
lm(age ~ sex*YA))
model1 %>%
tbl_regression() %>%
plot()

The package emmeans allows you to extract interaction effects from a mira object. Here is a gentle introduction. After that, the interactions can be plotted with appropriate ggplot. This example is for the categorical variables but could be extended to the continous case - after the emmeans part things get relatively straighforward.
library(ggplot2)
library(ggstance)
library(emmeans)
library(khroma)
library(jtools)
lm_mi <- with(mi_d,lm(qsec ~ gear*carb))
#extracting interaction effects
emcatcat <- emmeans(lm_mi, ~gear*carb)
tidy <- as_tibble(emcatcat)
#plotting
pd <- position_dodge(0.5)
ggplot(tidy, aes(y=gear, x=emmean, colour=carb)) +
geom_linerangeh(aes(xmin=lower.CL, xmax=upper.CL), position=pd,size = 2) +
geom_point(position=pd,size = 4)+
ggtitle('Interactions') +
labs (x = "aggreageted interaction effect") +
scale_color_bright() +
theme_nice()
this can be extended to a three-way interaction plot with facet_grid as long as you have a third categorical interaction term.

Related

Getting an interaction plot from a pooled lme model with mids object

Preface - I really hope this makes sense!
I ran a linear-mixed effect model using an imputed dataset (FYI, the data is a mids object imputed using mice). The model has a three-way interaction with 3 continuous variables. I am now trying to plot the interaction using the interactions::interact_plot function. However, I'm receiving an error when I run the plot code, which I believe is due to the fact that the model came from a mids object and not a data frame. Does anyone know how to address this error or if there's a better way to get the plot that I'm trying to get?
Thanks very much in advance!
MIDmod1 <- with(data = df.mids, exp = lmer(GC ~ Age + Sex + Edu + Stress*Time*HLI + (1|ID)))
summary(pool(MIDmod1))
interact_plot(
model=MIDmod1,
pred = Time,
modx=Stress,
mod2=HLI,
data = df.mids,
interval=TRUE,
y.label='Global cognition composite score',
modx.labels=c('Low Baseline Stress (-1SD)','Moderate Baseline Stress (Mean)', 'High Baseline Stress (+1SD)'),
mod2.labels=c('Low HLI (-1SD)', 'Moderate HLI (Mean)', 'High HLI (+1SD)'),
legend.main='') + ylim(-2,2)
Error:
Error in rep(1, times = nrow(data)) : invalid 'times' argument
Note - I also get an error if I don't include the data argument (optional argument for this function).
Error in formula.default(object, env = baseenv()) : invalid formula
BTW - I am able to generate the plot when the model comes from a data frame - an example of what this should look like is included here: 1
Sorry, but it won’t be that easy. Multiple imputation object will definitely require special treatment, and none of the many R packages which can plot interactions are likely to work out of hte box.
Here’s a minimal example, adapted from the multiple imputation vignette of the marginaleffects package. (Disclaimer: I am the author.)
library(mice)
library(lme4)
library(ggplot2)
library(marginaleffects)
# insert missing data in an existing dataset and impute
iris_miss <- iris
iris_miss$Sepal.Width[sample(1:nrow(iris), 20)] <- NA
iris_mice <- mice(iris_miss, m = 20, printFlag = FALSE, .Random.seed = 1024)
iris_mice <- complete(iris_mice, "all")
# fit a model on 1 imputed datatset and use the `plot_predictions()` function
# with the `draw=FALSE` argument to extract the data that we want to plot
fit <- function(dat) {
mod <- lmer(Sepal.Width ~ Petal.Width * Petal.Length + (1 | Species), data = dat)
out <- plot_predictions(mod, condition = list("Petal.Width", "Petal.Length" = "threenum"), draw = FALSE)
# `mice` requires a unique row identifier called "term"
out$term <- out$rowid
class(out) <- c("custom", class(out))
return(out)
}
# `tidy.custom()` is needed by `mice` to combine datasets, but the output of fit() also has
# the right structure and column names, so it is useless
tidy.custom <- function(x, ...) return(x)
# Fit on each imputation
mod_mice <- lapply(iris_mice, fit)
# Pool
mod_pool <- pool(mod_mice)$pooled
# Merge back some of the covariates
datplot <- data.frame(mod_pool, mod_mice[[1]][, c("Petal.Width", "Petal.Length")])
# Plot
ggplot(datplot, aes(Petal.Width, estimate, color = Petal.Length)) +
geom_line() +
theme_minimal()

Extract Model for Specific Factor

Say I've fit a model as follows fit = lm(Y ~ X + Dummy1 + Dummy2)
How can I extract the regression for a specific dummy variable?
I'm hoping to do something like the following to plot all the regressions:
plot(...)
abline(extracted.lm.dummy1)
abline(extracted.lm.dummy2)
I would look into the sjPlot package. Here is the documentation for sjp.lm, which can be used to visualize linear models in various ways. The package also has some nice tools for tabular summaries of models.
An example:
library(sjPlot)
library(dplyr)
# add a second categorical variable to the iris dataset
# then generate a linear model
set.seed(123)
fit <- iris %>%
mutate(Category = factor(sample(c("A", "B"), 150, replace = TRUE))) %>%
lm(Sepal.Length ~ Sepal.Width + Species + Category, data = .)
Different kinds of plot include:
Marginal effects plot, probably closest to what you want
sjp.lm(fit, type = "eff", vars = c("Category", "Species"))
"Forest plot" (beta coefficients + confidence interval)
sjp.lm(fit)

Add raw data points to jp.int (sjPlot)

For my manuscript, I plotted a lme with an interaction of two continuous variables:
Create data
mydata <- data.frame( SID=sample(1:150,400,replace=TRUE),age=sample(50:70,400,replace=TRUE), sex=sample(c("Male","Female"),200, replace=TRUE),time= seq(0.7, 6.2, length.out=400), Vol =rnorm(400),HCD =rnorm(400))
mydata$time <- as.numeric(mydata$time)
Run the model:
model <- lme(HCD ~ age*time+sex*time+Vol*time, random=~time|SID, data=mydata)
Make plot:
sjp.int(model, swap.pred=T, show.ci=T, mdrt.values="meansd")
The reviewer now wants me to add the raw data points to this plot. How can I do this? I tried adding geom_point() referring to mydata, but that is not possible.
Any ideas?
Update:
I thought that maybe I could extract the random slope of HCD and then residuals HCD for the covariates and also residuals Vol for the covariates and plot those two to make things easier (then I could plot the points in a 2D plot).
So, I tried to extract the slopes and use these to fit a linear regression, but the results are different (in the reproducible example less significant, but in my data: the interaction became non-significant (and was significant in the lme)). Not sure what that means or whether this just shows that I should not try to plot it this way.
get the slopes:
model <- lme(HCD ~ time, random=~time|SID, data=mydata)
slopes <- rbind(row.names(model$coefficients$random$SID), model$coef$random$SID[,2])
slopes2 <- data.frame(matrix(unlist(slopes), nrow=144, byrow=T))
names(slopes2)[1] <- "SID"
names(slopes2)[2] <- "slopes"
(save the slopes2 and reopen, because somehow R sees it as a factor)
Then create a cross-sectional dataframe and merge the slopes:
mydata$time2 <- round(mydata$time)
new <- reshape(mydata,idvar = "SID", timevar="time2", direction="wide")
newdata <- dplyr::left_join(new, slop, by="SID")
The lm:
modelw <- lm(slop$slopes ~ age.1+sex.1+Vol.1, data=newdata)
Vol now has a p-value of 0.8 (previously this was 0.14)

Overlay 2 allEffects graphs

I have the following model
require(effects)
fit<-lme(x ~ y, data, random= ~1|item)
plot(allEffects(fit)
fit2<-lme(x ~ y, data2, random = ~1|item)
plot(allEffects(fit2)
How can I plot fit and fit2 overlaying? I have tried the par(new=T), but it does not work. The graphs plot fine individually.
I'm not sure there's a very nice way to do this. I usually extract the information from the effects structure and plot it with ggplot (lattice would be possible too).
Here's an example:
library(effects)
library(nlme)
library(plyr) ## utilities
Fit a model to the first and second half of one of the standard example data sets:
fm1 <- lme(distance ~ age, random = ~1|Subject,
data = Orthodont[1:54,])
fm2 <- update(fm1, data = Orthodont[55:108,])
a1 <- allEffects(fm1)
a2 <- allEffects(fm2)
Extract the information from the efflist object. This is the part that isn't completely general ... the hard part is getting out the predictor variable.
as.data.frame.efflist <- function(x) {
ldply(x,
function(z) {
r <- with(z,data.frame(fit,
var=variables[[1]]$levels,
lower,upper))
return(plyr::rename(r,setNames(z$variables[[1]]$name,"var")))
})
}
For convenience, use ldply to put the results of both models together:
comb <- ldply(list(fm1=a1,fm2=a2),as.data.frame,.id="model")
Now plot:
library(ggplot2); theme_set(theme_bw())
ggplot(comb,aes(age,fit,
ymin=lower,ymax=upper,
colour=model,fill=model))+
geom_line()+
geom_ribbon(alpha=0.2,colour=NA)+
geom_rug(sides="b")
The rug plot component is a little silly here.

How to plot the survival curve generated by survreg (package survival of R)?

I’m trying to fit and plot a Weibull model to a survival data. The data has just one covariate, cohort, which runs from 2006 to 2010. So, any ideas on what to add to the two lines of code that follows to plot the survival curve of the cohort of 2010?
library(survival)
s <- Surv(subSetCdm$dur,subSetCdm$event)
sWei <- survreg(s ~ cohort,dist='weibull',data=subSetCdm)
Accomplishing the same with the Cox PH model is rather straightforward, with the following lines. The problem is that survfit() doesn’t accept objects of type survreg.
sCox <- coxph(s ~ cohort,data=subSetCdm)
cohort <- factor(c(2010),levels=2006:2010)
sfCox <- survfit(sCox,newdata=data.frame(cohort))
plot(sfCox,col='green')
Using the data lung (from the survival package), here is what I'm trying to accomplish.
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
#plot weibull survival curves, per sex, DOES NOT RUN
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(survfit(sWei,newdata=data.frame(sex=1)),col='red')
lines(survfit(sWei,newdata=data.frame(sex=2)),col='red')
Hope this helps and I haven't made some misleading mistake:
copied from above:
#create a Surv object
s <- with(lung,Surv(time,status))
#plot kaplan-meier estimate, per sex
fKM <- survfit(s ~ sex,data=lung)
plot(fKM)
#plot Cox PH survival curves, per sex
sCox <- coxph(s ~ as.factor(sex),data=lung)
lines(survfit(sCox,newdata=data.frame(sex=1)),col='green')
lines(survfit(sCox,newdata=data.frame(sex=2)),col='green')
for Weibull, use predict, re the comment from Vincent:
#plot weibull survival curves, per sex,
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The trick here was reversing the quantile orders for plotting vs predicting. There is likely a better way to do this, but it works here. Good luck!
An alternative option is to make use of the package flexsurv. This offers some additional functionality over the survival package - including that the parametric regression function flexsurvreg() has a nice plot method which does what you ask.
Using lung as above;
#create a Surv object
s <- with(lung,Surv(time,status))
require(flexsurv)
sWei <- flexsurvreg(s ~ as.factor(sex),dist='weibull',data=lung)
sLno <- flexsurvreg(s ~ as.factor(sex),dist='lnorm',data=lung)
plot(sWei)
lines(sLno, col="blue")
You can plot on the cumulative hazard or hazard scale using the type argument, and add confidence intervals with the ci argument.
This is just a note clarifying Tim Riffe's answer, which uses the following code:
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01)),seq(.99,.01,by=-.01),col="red")
The reason for the two mirror-image sequences, seq(.01,.99,by=.01) and seq(.99,.01,by=-.01), is because the predict() method is giving quantiles for the event distribution f(t) - that is, values of the inverse CDF of f(t) - while a survival curve is plotting 1-(CDF of f) versus t. In other words, if you plot p versus predict(p), you'll get the CDF, and if you plot 1-p versus predict(p) you'll get the survival curve, which is 1-CDF. The following code is more transparent and generalizes to arbitrary vectors of p values:
pct <- seq(.01,.99,by=.01)
lines(predict(sWei, newdata=list(sex=1),type="quantile",p=pct),1-pct,col="red")
lines(predict(sWei, newdata=list(sex=2),type="quantile",p=pct),1-pct,col="red")
In case someone wants to add a Weibull distribution to the Kaplan-Meyer curve in the ggplot2 ecosystem, we can do the following:
library(survminer)
library(tidyr)
s <- with(lung,Surv(time,status))
fKM <- survfit(s ~ sex,data=lung)
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
pred.sex1 = predict(sWei, newdata=list(sex=1),type="quantile",p=seq(.01,.99,by=.01))
pred.sex2 = predict(sWei, newdata=list(sex=2),type="quantile",p=seq(.01,.99,by=.01))
df = data.frame(y=seq(.99,.01,by=-.01), sex1=pred.sex1, sex2=pred.sex2)
df_long = gather(df, key= "sex", value="time", -y)
p = ggsurvplot(fKM, data = lung, risk.table = T)
p$plot = p$plot + geom_line(data=df_long, aes(x=time, y=y, group=sex))
In case you'd like to use the survival function itself S(t) (instead of the inverse survival function S^{-1}(p) used in other answers here) I've written a function to implement that for the case of the Weibull distribution (following the same inputs as the pec::predictSurvProb family of functions:
survreg.predictSurvProb <- function(object, newdata, times){
shape <- 1/object$scale # also equals 1/exp(fit$icoef[2])
lps <- predict(object, newdata = newdata, type = "lp")
surv <- t(sapply(lps, function(lp){
sapply(times, function(t) 1 - pweibull(t, shape = shape, scale = exp(lp)))
}))
return(surv)
}
You can then do:
sWei <- survreg(s ~ as.factor(sex),dist='weibull',data=lung)
times <- seq(min(lung$time), max(lung$time), length.out = 1000)
new_dat <- data.frame(sex = c(1,2))
surv <- survreg.predictSurvProb(sWei, newdata = new_dat, times = times)
lines(times, surv[1, ],col='red')
lines(times, surv[2, ],col='red')

Resources