R lme4: Plot lmer residuals ~ fitted by factor levels in ggplot

I would like to reproduce lmer diagnostic plots in ggplot2. In particular, I know that for an lmer model DV ~ Factor1 * Factor2 + (1|SubjID) I can simply call plot(model, resid(.) ~ fitted(.) | Factor1 + Factor2) to generate a lattice-based residuals vs. fitted plot, faceted by each Factor1 + Factor2 combination.
I would like to generate the same plot, but using ggplot2. I tried qplot(resid(model), fitted(model)) and variations of it with different arguments, but the factor information needed for faceting does not come through with this (or similar) calls.
I'd appreciate any advice on how to achieve this, thanks!
EDIT
The core of my question: given any lmer model, how can I create a data-frame including fitted and residual values AND the Factor information for each value? something like:
Factor1 Factor2 Fitted Resid
0 0 987 654
0 0 123 456
(...)
I could not figure that out from the lme4 documentation for the resid() and fitted() functions.

Here is a reproducible example of what we want:
library(lme4)
data(Orthodont, package = "nlme")
Orthodont$age <- as.factor(Orthodont$age)
model <- lmer(distance ~ age * Sex + (1 | Subject), Orthodont)
plot(model, resid(.) ~ fitted(.) | age + Sex)
Answer
library(ggplot2)
# (edited) ggplot() accepts the fitted model directly; an explicit fortify(model) isn't necessary.
ggplot(model, aes(.fitted, .resid)) + geom_point() +
  facet_wrap(~ Sex + age, ncol = 4)
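For the data-frame part of the question, one possible sketch is to bind the model frame to the fitted values and residuals and then facet as usual. This assumes the Orthodont model from the example above; `diag_df` and the column names `Fitted` and `Resid` are names chosen here for illustration only.

```r
library(lme4)
library(ggplot2)

data(Orthodont, package = "nlme")
Orthodont$age <- as.factor(Orthodont$age)
model <- lmer(distance ~ age * Sex + (1 | Subject), Orthodont)

# model.frame() returns the rows actually used in the fit, so the factor
# columns line up one-to-one with fitted() and resid().
diag_df <- cbind(
  as.data.frame(model.frame(model)),
  Fitted = fitted(model),
  Resid  = resid(model)
)
head(diag_df)

# Residuals vs. fitted, one panel per Sex + age combination
ggplot(diag_df, aes(Fitted, Resid)) +
  geom_point() +
  facet_wrap(~ Sex + age, ncol = 4)
```

Having an explicit data frame also makes it easy to add other layers, such as a horizontal reference line at zero with geom_hline(yintercept = 0).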

Related

Different estimates between bam and gam model (mgcv) and interaction term estimates edf of 0

I am new to fitting gamm models and ran into two problems with my analysis.
I ran the same model using the gam and the bam function of the package mgcv. The models give me different estimates, and I don't understand why and how to choose which function to use. Can anyone explain to me why these functions give different estimates?
I am estimating a model including an interaction between age and condition (a binary factor with 2 conditions). For some reason, one of the interaction terms (age:conditioncomputer or age:conditioncozmo) looks weird: it always gives an EDF and chi-square of 0 and a p-value of 0.5, as if it were fixed to that. I tried using sum-to-zero and dummy contrasts, but that did not change the output. What is weird to me is that there is a significant age effect, but this effect is not significant in either condition. So I have the strong feeling that something is going wrong here.
Did anyone ever run into this before and can help me figure out if this is even a problem or normal, and how to solve it if it is a problem?
My model syntax is the following:
`bam(reciprocity ~ s(age,k=8) + condition + s(age, by = condition, k=8) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))`
This is the model output:
My df looks somewhat like this:
In short, I've used the code below:
library(tidyverse)
library(psych)
library(mgcv)
library(ggplot2)
library(cowplot)
library(patchwork)
library(rstatix)
library(car)
library(yarrr)
library(itsadug)
df <- read.csv("/Users/lucaleisten/Desktop/Private/Master/Major_project/Data/test/test.csv", sep=",")
df$ID <- as.factor(as.character(df$ID))
df$condition <- as.factor(df$condition)
df$school <- as.factor(df$school)
df$class <- as.factor(df$class)
df$reciprocity <- as.factor(as.character(df$reciprocity))
summary(df)
model_reciprocity <- bam(reciprocity ~ s(age,k=7) +condition + s(age, by = condition, k=7) + s(ID, bs="re") + s(class, bs="re") + s(school, bs="re"), data=df, family=binomial(link="logit"))
summary(model_reciprocity)

Which R package can I use to visualize mixed effect model coefficient

I am working on a mixed effect model using the lmer function in R. My response variable is Productivity, which is a continuous variable, and I am trying to find the effect of 5 predictors on productivity (SR, NRI, CWM_H, CWM_Chl and FDispersion); all the predictors are continuous variables. I want to visualize the model result using a coefficient plot in the same way as you can see in the image (using different colors for positive and negative predictors). The data were collected in 2017 and 2018 from 32 plots (repeated measures), so I used the predictors mentioned above as fixed effects and the plot as a random effect in the model. Which package should I use to visualize the coefficients? Any help would be greatly appreciated!
Formula I have used:
mixed_model <- lmer(Productivity_log ~ SpR + NRI + CWM_Height + CWM_Chlorophyl +
  FDispersion + (1 | Plot), data = datsc, REML = TRUE)
sjPlot::plot_model() does exactly this. A quick demo:
library(lme4)
library(sjPlot)
data(msleep, package = "ggplot2")
mixed_model <- lmer(
sleep_total ~ brainwt + bodywt + factor(vore) + (1 | order),
data = msleep
)
plot_model(mixed_model)
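If more control over the colors is wanted, a hand-rolled version is also possible. The following is only a sketch under stated assumptions: it reuses the msleep data from the demo above rather than the asker's data, uses Wald intervals from confint(), and `coef_df` and `direction` are names introduced here for illustration.

```r
library(lme4)
library(ggplot2)

data(msleep, package = "ggplot2")
fit <- lmer(sleep_total ~ brainwt + bodywt + (1 | order), data = msleep)

# Fixed-effect estimates and Wald 95% intervals (fixed effects only)
est <- fixef(fit)
ci  <- confint(fit, parm = "beta_", method = "Wald")

coef_df <- data.frame(
  term     = names(est),
  estimate = unname(est),
  lower    = ci[, 1],
  upper    = ci[, 2]
)
# Drop the intercept and flag the sign of each coefficient
coef_df <- coef_df[coef_df$term != "(Intercept)", ]
coef_df$direction <- ifelse(coef_df$estimate > 0, "positive", "negative")

ggplot(coef_df, aes(x = estimate, y = term, colour = direction)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(xmin = lower, xmax = upper))
```

With predictors on very different scales, as here, standardizing them before fitting usually makes such a plot easier to read.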

R plot combined levels of a factor (ggpredict)

I am using the function ggpredict to display a lmer model's result.
The model has a continuous response (RT), one continuous predictor (RC1) and 4 discrete factors (2x2x2x14).
Model:
SailorJupiter <- lmer(RT~RC1*m2*m3*m5*m4 + (1|Trial:sonTrial) + (1|Subject) + (1|Trial) + (1|sonleft) + (1|sonright), data=audiostim, REML=FALSE)
library(see)
library(ggeffects)
a <- ggpredict(SailorJupiter, c("RC1","m2","m3","m4","m5"), dependencies=TRUE)
plot(a)
Example of plot without the 14-levels factor because it's too big
Question 1:
I'd like the groups in the results to be combinations of m3 and m4, in order to simplify the graphs. I tried:
a <- ggpredict(SailorJupiter, c("RC1","m2","m3:m4","m5"), dependencies=TRUE)
plot(a)
But it doesn't work.
Question 2: Is there a way to use only one level of a factor in order to simplify the plot? I know some other plotting packages allow it, but I can't find it in ggpredict().

Predicting probability of disease according to a continuous variable adjusting for confounding variables

I have a question about the R package "margins". I'm estimating a logistic model:
modelo1 <- glm(VD2 ~ VE12 + VE.cont + VE12:VE.cont + VC1 + VC2 + VC3 + VC4, family="binomial", data=data)
Where:
VD2 is a dichotomous variable (1 disease / 0 not disease)
VE12 is a dichotomous exposure variable (with values 0 and 1)
VE.cont a continuous exposure variable
VCx (the rest of variables) are confounding variables.
My objective is to obtain predicted probability of disease (VD2) for a vector of values of VE.cont and for each VE12 group, but adjusting by VCx variables. In other words, I would like to obtain the dose-response line between VD2 and VE.cont by VE12 group but assuming the same distribution of VCx for each dose-response line (i.e. without confounding).
Following the nomenclature of this article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4052139/) I think that I should do a "marginal standardisation" (method 1) that can be done with stata, but I'm not sure how can I do it with R.
I'm using this syntax (with R):
cdat0 <- cplot(modelo1, x="VE.cont", what="prediction", data = data[data[["VE12"]] == 0,], draw=TRUE, ylim=c(0,0.3))
cdat1 <- cplot(modelo1, x="VE.cont", what="prediction", data = data[data[["VE12"]] == 1,], draw="add", col="blue")
but I'm not sure if I'm doing it right because this approach gives similar results as using the model without confounding variables and the function predict.glm.
modelo0 <- glm(VD2 ~ VE12 + VE.cont + VE12:VE.cont, family="binomial", data=data)
Perhaps, I should use the margins option but I don't understand the results because the values obtained in the column VE.cont are not in the probability scale (between 0 and 1).
x <- c(1,2,3,4,5)
margins::margins(modelo1, at=list("VE.cont"=x, "VE12"=c(0,1)), type="response")
This is an example of the figure that I would like to obtain:
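The marginal standardisation described in the question can also be sketched by hand with predict(): for each combination of VE12 and VE.cont, every subject is assigned those exposure values while keeping their own confounders, and the predicted probabilities are averaged. The data below are simulated purely for illustration (only two confounders, arbitrary coefficients), and `sim`, `fit` and `grid` are hypothetical names, not part of the original analysis.

```r
set.seed(1)
n <- 500

# Simulated stand-in for the real data (an assumption for this sketch)
sim <- data.frame(
  VE12    = rbinom(n, 1, 0.5),
  VE.cont = runif(n, 1, 5),
  VC1     = rnorm(n),
  VC2     = rbinom(n, 1, 0.3)
)
lp <- -2 + 0.4 * sim$VE12 + 0.2 * sim$VE.cont +
  0.1 * sim$VE12 * sim$VE.cont + 0.5 * sim$VC1 + 0.3 * sim$VC2
sim$VD2 <- rbinom(n, 1, plogis(lp))

fit <- glm(VD2 ~ VE12 * VE.cont + VC1 + VC2, family = binomial, data = sim)

# One row per exposure combination to be standardised
grid <- expand.grid(VE.cont = 1:5, VE12 = c(0, 1))

# Set the exposures for everyone, keep each subject's own confounders,
# then average the predicted probabilities over the whole sample.
grid$prob <- mapply(function(x, g) {
  counterfactual <- sim
  counterfactual$VE.cont <- x
  counterfactual$VE12 <- g
  mean(predict(fit, newdata = counterfactual, type = "response"))
}, grid$VE.cont, grid$VE12)

grid
```

Because both dose-response lines average over the same confounder distribution (the full sample), they differ only through VE12 and VE.cont. Plotting prob against VE.cont, grouped by VE12, gives the two standardised curves.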

Plotting a multiple logistic regression for binary and continuous values in R

I have a data frame of mammal genera. Each row is a different genus. There are three columns: a column of each genus's geographic range size (a continuous variable), a column stating whether a genus is found inside or outside of river basins (a binary variable), and a column stating whether the genus is found in the fossil record (a binary variable).
I have performed a multiple logistic regression to see if geographic range size and presence in/out of basins is a predictor of presence in the fossil record using the following R code.
Regression<-glm(df[ ,"FossilRecord"] ~ log(df[ ,"Geographic Range"]) + df[ ,"Basin"], family="binomial")
I am trying to find a way to visually summarize the output of this regression (other than a table of the regression summary).
I know how to do this for a single-variable regression. For example, I could use a plot like this one if I wanted to see the relationship between just geographic range size and presence in the fossil record.
However, I do not know how to make a similar or equivalent plot when there are two independent variables, and one of them is binary. What are some plotting and data visualization techniques I could use in this case?
Thanks for the help!
Visualization is important and yet it can be very hard. With your example, I would recommend plotting one line for predicted FossilRecord versus GeographicRange for each level of your categorical covariate (Basin). Here's an example of how to do it with the ggplot2 package
##generating data
ssize <- 100
set.seed(12345)
dat <- data.frame(
Basin = rbinom(ssize, 1,.4),
GeographicRange = rnorm(ssize,10,2)
)
# cap the probability at 1 so rbinom() never receives an invalid value
dat$FossilRecord <- rbinom(ssize, 1, pmin(1, .3 + .1*dat$Basin + 0.04*dat$GeographicRange))
##fitting model
fit <- glm(FossilRecord ~ Basin + GeographicRange, family=binomial(), data=dat)
We can use the predict() function to obtain predicted response values for many GeographicRange values and for each Basin category.
##getting predicted response from model
plotting_dfm <- expand.grid(GeographicRange = seq(from=0, to = 20, by=0.1),
Basin = (0:1))
plotting_dfm$preds <- plogis( predict(fit , newdata=plotting_dfm))
Now you can plot the predicted results:
##plotting the predicted response on the two covariates
library(ggplot2)
pl <- ggplot(plotting_dfm, aes(x=GeographicRange, y =preds, color=as.factor(Basin)))
pl +
geom_point( ) +
ggtitle("Predicted FossilRecord by GeoRange and Basin") +
ggplot2::ylab("Predicted FossilRecord")
This will produce a figure like this:
You can plot a separate curve for each value of the categorical variable. You didn't provide sample data, so here's an example with another data set:
library(ggplot2)
# Data
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
# Model. gre is continuous. rank has four categories.
m1 = glm(admit ~ gre + rank, family=binomial, data=mydata)
# Predict admit probability
newdata = expand.grid(gre=seq(200,800, length.out=100), rank=1:4)
newdata$prob = predict(m1, newdata, type="response")
ggplot(newdata, aes(gre, prob, color=factor(rank), group=rank)) +
geom_line()
UPDATE: To respond to @Provisional.Modulation's comment: There are lots of options, depending on what you want to highlight and what is visually clear enough to understand, given your particular data and model output.
Here's an example using the built-in mtcars data frame and a logistic regression with one categorical and two continuous predictor variables:
m1 = glm(vs ~ cyl + mpg + hp, data=mtcars, family=binomial)
Now we create a new data frame with the unique values of cyl, five quantiles of hp and a continuous sequence of mpg, which we'll put on the x-axis (you could also of course do quantiles of mpg and use hp as the x-axis variable). If you have many continuous variables, you may need to set some of them to a single value, say, the median, when you graph the relationships between other variables.
newdata = with(mtcars, expand.grid(cyl=unique(cyl),
mpg=seq(min(mpg),max(mpg),length=20),
hp = quantile(hp)))
newdata$prob = predict(m1, newdata, type="response")
Here are three potential graphs, with varying degrees of legibility.
ggplot(newdata, aes(mpg, prob, colour=factor(cyl))) +
geom_line() +
facet_grid(. ~ hp)
ggplot(newdata, aes(mpg, prob, colour=factor(hp), linetype=factor(cyl))) +
geom_line()
ggplot(newdata, aes(mpg, prob, colour=factor(hp))) +
geom_line() +
facet_grid(. ~ cyl)
And here's another approach using geom_tile to include two continuous dimensions in each plot panel.
newdata = with(mtcars, expand.grid(cyl=unique(cyl),
mpg=seq(min(mpg),max(mpg),length=100),
hp =seq(min(hp),max(hp),length=100)))
newdata$prob = predict(m1, newdata, type="response")
ggplot(newdata, aes(mpg, hp, fill=prob)) +
geom_tile() +
facet_grid(. ~ cyl) +
scale_fill_gradient2(low="red",mid="yellow",high="blue",midpoint=0.5,
limits=c(0,1))
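The curves above show point predictions only. If uncertainty bands are wanted, one common approach is to predict on the link scale with standard errors and back-transform the interval limits. The sketch below assumes a smaller mtcars model (vs ~ mpg + am) chosen here for illustration, and uses approximate 95% Wald intervals; `fit_ci` and `grid_ci` are names introduced for this example.

```r
# Illustrative model: one continuous and one binary predictor
fit_ci <- glm(vs ~ mpg + am, data = mtcars, family = binomial)

grid_ci <- expand.grid(
  mpg = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 50),
  am  = c(0, 1)
)

# Predict on the link (log-odds) scale, where the Wald interval is symmetric
link <- predict(fit_ci, newdata = grid_ci, type = "link", se.fit = TRUE)

# Back-transform the estimate and interval limits to probabilities
grid_ci$prob  <- plogis(link$fit)
grid_ci$lower <- plogis(link$fit - 1.96 * link$se.fit)
grid_ci$upper <- plogis(link$fit + 1.96 * link$se.fit)

head(grid_ci)
```

The lower and upper columns can then be drawn with geom_ribbon(aes(ymin = lower, ymax = upper)) underneath geom_line(), with one ribbon per level of am.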
If you're looking for a canned solution, the visreg package might work for you.
An example using @eipi10's data:
library(visreg)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
m1 = glm(admit ~ gre + rank, family=binomial, data=mydata)
# the first argument after the model is the predictor for the x-axis (gre), not the response
visreg(m1, "gre", by = "rank")
Many more options are described in the documentation.
