How to plot multiple glmer models into one single plot? - r

I have two glmer models with two covariates each that I'm trying to plot into a single figure.
MWE:
## generalized linear mixed model
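library(lme4)    # provides glmer() and the cbpp data
library(sjPlot)  # provides plot_model(), plot_models() and plot_grid()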
cbpp$response <- sample(c(0,1), replace=TRUE, size=nrow(cbpp))
gm1 <- glmer(response ~ size + incidence + (1 | herd),
data = cbpp, family = binomial)
cbpp$obs <- 1:nrow(cbpp)
gm2 <- glmer(response ~ size + incidence + (1 | herd) + (1|obs),
family = binomial, data = cbpp)
I am trying to plot the predicted values against each covariate for each model. I found the sjPlot library and its plot_model function, which can plot these predictions when using type = "pred". Calling this function on each model individually works perfectly and yields two separate figures like this for each model:
However, I'm not familiar with R and I am having a hard time plotting the 4 plots in the same figure.
The plot_model function has a grid parameter, which only works for models with a Poisson distribution. For gm1 and gm2, I get the following error when I call plot_model(gm1, type = "pred", grid = TRUE):
Error in if (attr(x, "logistic", exact = TRUE) == "1" && attr(x, "is.trial", : missing value where TRUE/FALSE needed
Anyway, I would not be able to plot both models in one figure this way, so I tried three different approaches. First, I found the plot_models function, which takes multiple models as input. When I pass the two models as arguments by calling plot_models(gm1, gm2), I get the following error:
Error: $ operator not defined for this S4 class
Second, I tried using the par function setting the mfrow and then calling plot_model again without success. I don't get any error but the plots keep showing as individual figures.
Third, I tried using the gridExtra library. Calling
p1 <- plot_model(gm1, type = "pred")
p2 <- plot_model(gm2, type = "pred")
grid.arrange(p1, p2)
results in the following error:
Error in gList(list(ppt = list(data = list(x = c(-2, -1, 0, 1, 2, 3, 4, : only 'grobs' allowed in "gList"
Does anyone have an insight on this?
EDIT

This seems to work:
pp1 <- plot_model(gm1,type="pred")
pp2 <- plot_model(gm2,type="pred")
plot_grid(c(pp1,pp2))
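
The gridExtra attempt fails for a reason the working edit hints at: plot_model(..., type = "pred") appears to return a plain list with one ggplot per covariate, not a single plot, so grid.arrange(p1, p2) receives two lists instead of two grobs. Below is a minimal sketch of the same fix using gridExtra, assuming that list structure. make_plots() is a hypothetical stand-in for the plot_model calls, so the example runs without the fitted models.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in for plot_model(gm, type = "pred"):
# returns a plain list holding one ggplot per covariate.
make_plots <- function(label) {
  lapply(c("wt", "hp"), function(v) {
    ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
      geom_point() +
      ggtitle(paste(label, v))
  })
}
p1 <- make_plots("model 1:")
p2 <- make_plots("model 2:")

# c() concatenates the two lists into one list of four ggplot objects,
# which is the form the `grobs` argument expects.
all_plots <- c(p1, p2)
combined <- arrangeGrob(grobs = all_plots, ncol = 2)
grid::grid.draw(combined)
```

With the real models, the equivalent call would be grid.arrange(grobs = c(p1, p2), ncol = 2), where p1 and p2 come from plot_model.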

Related

Getting an interaction plot from a pooled lme model with mids object

Preface - I really hope this makes sense!
I ran a linear-mixed effect model using an imputed dataset (FYI, the data is a mids object imputed using mice). The model has a three-way interaction with 3 continuous variables. I am now trying to plot the interaction using the interactions::interact_plot function. However, I'm receiving an error when I run the plot code, which I believe is due to the fact that the model came from a mids object and not a data frame. Does anyone know how to address this error or if there's a better way to get the plot that I'm trying to get?
Thanks very much in advance!
MIDmod1 <- with(data = df.mids, exp = lmer(GC ~ Age + Sex + Edu + Stress*Time*HLI + (1|ID)))
summary(pool(MIDmod1))
interact_plot(
model=MIDmod1,
pred = Time,
modx=Stress,
mod2=HLI,
data = df.mids,
interval=TRUE,
y.label='Global cognition composite score',
modx.labels=c('Low Baseline Stress (-1SD)','Moderate Baseline Stress (Mean)', 'High Baseline Stress (+1SD)'),
mod2.labels=c('Low HLI (-1SD)', 'Moderate HLI (Mean)', 'High HLI (+1SD)'),
legend.main='') + ylim(-2,2)
Error:
Error in rep(1, times = nrow(data)) : invalid 'times' argument
Note - I also get an error if I don't include the data argument (optional argument for this function).
Error in formula.default(object, env = baseenv()) : invalid formula
BTW - I am able to generate the plot when the model comes from a data frame - an example of what this should look like is included here: 1
Sorry, but it won’t be that easy. Multiple imputation objects definitely require special treatment, and none of the many R packages which can plot interactions are likely to work out of the box.
Here’s a minimal example, adapted from the multiple imputation vignette of the marginaleffects package. (Disclaimer: I am the author.)
library(mice)
library(lme4)
library(ggplot2)
library(marginaleffects)
# insert missing data in an existing dataset and impute
iris_miss <- iris
iris_miss$Sepal.Width[sample(1:nrow(iris), 20)] <- NA
iris_mice <- mice(iris_miss, m = 20, printFlag = FALSE, seed = 1024)
iris_mice <- complete(iris_mice, "all")
# fit a model on 1 imputed dataset and use the `plot_predictions()` function
# with the `draw=FALSE` argument to extract the data that we want to plot
fit <- function(dat) {
mod <- lmer(Sepal.Width ~ Petal.Width * Petal.Length + (1 | Species), data = dat)
out <- plot_predictions(mod, condition = list("Petal.Width", "Petal.Length" = "threenum"), draw = FALSE)
# `mice` requires a unique row identifier called "term"
out$term <- out$rowid
class(out) <- c("custom", class(out))
return(out)
}
# `tidy.custom()` is needed by `mice` to combine datasets. The output of fit() already has
# the right structure and column names, so the method just returns its input unchanged
tidy.custom <- function(x, ...) return(x)
# Fit on each imputation
mod_mice <- lapply(iris_mice, fit)
# Pool
mod_pool <- pool(mod_mice)$pooled
# Merge back some of the covariates
datplot <- data.frame(mod_pool, mod_mice[[1]][, c("Petal.Width", "Petal.Length")])
# Plot
ggplot(datplot, aes(Petal.Width, estimate, color = Petal.Length)) +
geom_line() +
theme_minimal()
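
For intuition about what the pooling step does to each predicted value, here is a minimal base R sketch of Rubin's rules for a single estimate. The numbers and the helper name pool_rubin() are made up for illustration. mice's pool() additionally computes degrees of freedom and related diagnostics, which are omitted here.

```r
# Hypothetical per-imputation estimates and standard errors for one prediction
est <- c(3.10, 3.05, 3.20, 2.95, 3.15)
se  <- c(0.20, 0.22, 0.19, 0.21, 0.20)

pool_rubin <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)             # pooled point estimate
  w <- mean(se^2)                # within-imputation variance
  b <- var(est)                  # between-imputation variance
  t_var <- w + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, std.error = sqrt(t_var))
}

pooled <- pool_rubin(est, se)
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation spread is added on top.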

Calculating piecewise quantile linear regression with segmented package R

I am looking for a way to fit a piecewise linear quantile regression in R. I have been able to compute the quantile regression with the quantreg package. However, I don't want just one slope; I want to check for breakpoints in my dataset. I have seen that the segmented package can do this. While it works well if the fit is carried out with lm or glm (as shown in the example below), it does not work for quantile regression.
On the segmented package info I have read that there is a segmented.default which can be used for specific regression models, such as Quantiles. However, when I apply it for my quantile outcome it gives me the following errors:
Error in diag(vv) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
cannot compute the covariance matrix
If instead of using K=2 I supply psi, for example, I get a different error:
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
I have created an example with the mtcars data so you can see the errors that I get.
library(quantreg)
library(segmented)
data(mtcars)
out.rq <- rq(mpg ~ wt, data= mtcars)
out.lm <- lm(mpg ~ wt, data= mtcars)
# Plotting the results
plot(mpg ~ wt, data = mtcars, pch = 1, main = "mpg ~ wt")
abline(out.lm, col = "red", lty = 2)
abline(out.rq, col = "blue", lty = 2)
legend("topright", legend = c("linear", "quantile"), col = c("red", "blue"), lty = 2)
#Generating segmented LM
o <- segmented(out.lm, seg.Z= ~wt, npsi=2, control=seg.control(display=FALSE))
plot(o, lwd=2, col=2:6, main="Segmented regression", res=FALSE) #lwd: line width #col: from 2 to 6 #RES: show datapoints
#Generating segmented Quantile
#using K=2
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, control=seg.control(display=FALSE, K=2))
# using psi
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, psi=list(wt=c(2,4)), control=seg.control(display=FALSE))
I came across this post after a long time because I have the same issue. Just in case others might be stuck with the problem in the future, I wanted to point out what the problem is.
I examined "segmented.default". There is a line in the source code as follows:
Cov <- try(vcov(objF), silent = TRUE)
vcov is used to calculate the covariance matrix but does not work for quantile regression object objF. To get the covariance matrix for quantile regression, you need:
summary(objF,se="boot",cov=TRUE)$cov
Here, I used bootstrap method to compute the covariance matrix by selecting se="boot" but you should choose the appropriate method for you. Check ?summary.rq then "se" section for different methods.
Additionally, you need to assign the row/column names as follows:
dimnames(Cov)[[1]] <- dimnames(Cov)[[2]] <- unlist(attributes(objF$coef))
After modifying the function, it worked for me.
Maybe the other answer isn't particularly clean, as you need to modify a package function.
Additionally, maybe boot isn't such a good idea for SEs, according to this answer.
To get it working a bit easier, add a function to your workspace:
vcov.rq <- function(object, ...) {
result = summary(object, se = "nid", covariance = TRUE)$cov
rownames(result) = colnames(result) = names(coef(object))
return(result)
}
Caveats from the Cross-Validated link apply.
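
A quick way to check the helper on its own, before retrying the segmented.default call, is to confirm that vcov() now returns a named covariance matrix for an rq fit. This is only a sketch on the mtcars example from the question. It does not show that the segmented fit itself will converge.

```r
library(quantreg)

# Same helper as above: covariance matrix of an rq fit via summary.rq()
vcov.rq <- function(object, ...) {
  result <- summary(object, se = "nid", covariance = TRUE)$cov
  rownames(result) <- colnames(result) <- names(coef(object))
  result
}

fit <- rq(mpg ~ wt, data = mtcars)  # median regression (default tau = 0.5)
V <- vcov(fit)                      # dispatches to the vcov.rq() defined above
print(V)
print(sqrt(diag(V)))                # standard errors of the coefficients
```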

How to plot a SVM model in R [duplicate]

I am trying to plot my svm model.
library(foreign)
library(e1071)
x <- read.arff("contact-lenses.arff")
#alt: x <- read.arff("http://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/contact-lenses.arff")
model <- svm(`contact-lenses` ~ . , data = x, type = "C-classification", kernel = "linear")
The contact lens arff is the inbuilt data file in weka.
However, I now run into an error when trying to plot the model.
plot(model, x)
Error in plot.svm(model, x) : missing formula.
The problem is that your model has multiple covariates. plot() will only run automatically if your data= argument has exactly three columns (one of which is the response). For example, from the ?plot.svm help page, you can call
data(cats, package = "MASS")
m1 <- svm(Sex~., data = cats)
plot(m1, cats)
So since you can only show two dimensions on a plot, you need to specify what you want to use for x and y when you have more than one to choose from
cplus<-cats
cplus$Oth<-rnorm(nrow(cplus))
m2 <- svm(Sex~., data = cplus)
plot(m2, cplus) #error
plot(m2, cplus, Bwt~Hwt) #Ok
plot(m2, cplus, Hwt~Oth) #Ok
So that's why you're getting the "Missing Formula" error.
There is another catch as well. The plot.svm will only plot continuous variables along the x and y axes. The contact-lenses data.frame has only categorical variables. The plot.svm function simply does not support this as far as I can tell. You'll have to decide how you want to summarize that information in your own visualization.
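
When a model has more than two continuous predictors, the formula picks the two axes and the remaining predictors are held constant. As far as I can tell from ?plot.svm, the slice argument sets those constant values. The sketch below uses iris rather than the contact-lenses data, since it needs continuous predictors. The slice values are arbitrary choices for illustration.

```r
library(e1071)

# iris has four continuous predictors, so plot() needs a formula
m <- svm(Species ~ ., data = iris)

# Show two predictors on the axes and hold the other two fixed via `slice`
plot(m, iris, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 4))
```

Changing the slice values shows how the decision regions shift in the other dimensions.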

How to directly plot ROC of h2o model object in R

My apologies if I'm missing something obvious. I've been thoroughly enjoying working with h2o in the last few days using R interface. I would like to evaluate my model, say a random forest, by plotting an ROC. The documentation seems to suggest that there is a straightforward way to do that:
Interpreting a DRF Model
By default, the following output displays:
Model parameters (hidden)
A graph of the scoring history (number of trees vs. training MSE)
A graph of the ROC curve (TPR vs. FPR)
A graph of the variable importances
...
I've also seen that in python you can apply roc function here. But I can't seem to be able to find the way to do the same in R interface. Currently I'm extracting predictions from the model using h2o.cross_validation_holdout_predictions and then use pROC package from R to plot the ROC. But I would like to be able to do it directly from the H2O model object, or, perhaps, a H2OModelMetrics object.
Many thanks!
A naive solution is to use plot() generic function to plot a H2OMetrics object:
logit_fit <- h2o.glm(colnames(training)[-1], 'y',
training_frame = training.hex, validation_frame = validation.hex,
family = 'binomial')
plot(h2o.performance(logit_fit, valid = TRUE), type = 'roc')
This will give us a plot:
But it is hard to customize, especially to change the line type, since the type parameter is already taken by 'roc'. Also, I have not found a way to plot multiple models' ROC curves together in one plot. I have come up with a method that extracts the true positive rate and false positive rate from the H2OMetrics object and uses ggplot2 to plot the ROC curves in one plot. Here is the example code (it uses a lot of tidyverse syntax):
# for example I have 4 H2OModels
list(logit_fit,dt_fit,rf_fit,xgb_fit) %>%
# map a function to each element in the list
map(function(x) x %>% h2o.performance(valid=T) %>%
# from all these 'paths' in the object
.@metrics %>% .$thresholds_and_metric_scores %>%
# extracting true positive rate and false positive rate
.[c('tpr','fpr')] %>%
# add (0,0) and (1,1) for the start and end point of ROC curve
add_row(tpr=0,fpr=0,.before=1) %>%
add_row(tpr=1,fpr=1)) %>%
# add a column of model name for future grouping in ggplot2
map2(c('Logistic Regression','Decision Tree','Random Forest','Gradient Boosting'),
function(x,y) x %>% add_column(model=y)) %>%
# reduce four data.frame to one
reduce(rbind) %>%
# plot fpr and tpr, map model to color as grouping
ggplot(aes(fpr,tpr,col=model))+
geom_line()+
geom_segment(aes(x=0,y=0,xend = 1, yend = 1),linetype = 2,col='grey')+
xlab('False Positive Rate')+
ylab('True Positive Rate')+
ggtitle('ROC Curve for Four Models')
Then the ROC curve is:
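
The endpoint handling in the pipeline above can be checked without an H2O cluster. The following base R sketch computes TPR and FPR from simulated scores, adds the (0,0) and (1,1) endpoints, and plots the curve. The data are simulated and roc_points() is a hypothetical helper, not part of h2o.

```r
set.seed(1)
labels <- rbinom(200, 1, 0.4)
# Positives score higher on average than negatives
scores <- ifelse(labels == 1, rnorm(200, mean = 1), rnorm(200, mean = 0))

roc_points <- function(scores, labels) {
  # Sweep thresholds from high to low so the curve runs from (0,0) to (1,1)
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  # Add the start and end points of the ROC curve explicitly
  data.frame(fpr = c(0, fpr, 1), tpr = c(0, tpr, 1))
}

roc <- roc_points(scores, labels)

# Trapezoidal area under the curve
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(0, 1, lty = 2, col = "grey")
```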
You can get the ROC curve by passing the model performance metrics to H2O's plot function.
Shortened code snippet, which assumes you created a model, called glm, and split your dataset into training and validation sets:
perf <- h2o.performance(glm, newdata = validation)
h2o.plot(perf)
Full code snippet below:
h2o.init()
# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath = system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex = h2o.importFile(path = prostatePath, destination_frame = "prostate.hex")
glm = h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), training_frame = prostate.hex, family = "binomial", nfolds = 0, alpha = 0.5, lambda_search = FALSE)
perf <- h2o.performance(glm, newdata = prostate.hex)
h2o.plot(perf)
and this will produce the following:
There is not currently a function in the H2O R or Python client to plot the ROC curve directly. The roc method in Python returns the data necessary to plot the ROC curve, but does not plot the curve itself. ROC curve plotting directly from R and Python seems like a useful thing to add, so I've created a JIRA ticket for it here: https://0xdata.atlassian.net/browse/PUBDEV-4449
The reference to the ROC curve in the docs refers to the H2O Flow GUI, which will automatically plot a ROC curve for any binary classification model in your H2O cluster. All the other items in that list are in fact available directly in R and Python, however.
If you train a model in R, you can visit the Flow interface (e.g. localhost:54321) and click on a binomial model to see its ROC curves (training, validation and cross-validated versions). It will look like this:
Building off @Lauren's example, after you run h2o.performance you can extract all the necessary information for ggplot from perf@metrics$thresholds_and_metric_scores. This code produces the ROC curve, but you can also add precision and recall to the selected variables to plot the PR curve.
Here is some example code using the same model as above.
library(h2o)
library(dplyr)
library(ggplot2)
h2o.init()
# Run GLM of CAPSULE ~ AGE + RACE + PSA + DCAPS
prostatePath <- system.file("extdata", "prostate.csv", package = "h2o")
prostate.hex <- h2o.importFile(
path = prostatePath,
destination_frame = "prostate.hex"
)
glm <- h2o.glm(
y = "CAPSULE",
x = c("AGE", "RACE", "PSA", "DCAPS"),
training_frame = prostate.hex,
family = "binomial",
nfolds = 0,
alpha = 0.5,
lambda_search = FALSE
)
# Model performance
perf <- h2o.performance(glm, newdata = prostate.hex)
# Extract info for ROC curve
curve_dat <- data.frame(perf@metrics$thresholds_and_metric_scores) %>%
select(c(tpr, fpr))
# Plot ROC curve
ggplot(curve_dat, aes(x = fpr, y = tpr)) +
geom_point() +
geom_line() +
geom_segment(
aes(x = 0, y = 0, xend = 1, yend = 1),
linetype = "dotted",
color = "grey50"
) +
xlab("False Positive Rate") +
ylab("True Positive Rate") +
ggtitle("ROC Curve") +
theme_bw()
Which produces this plot:

How to get only the plots from gam.check

When applying gam.check in the mgcv package, R produces some residual plots and basis dimension output. Is there a way to only produce the plots and not the printed output?
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat)
plot(b, pages=1)
gam.check(b, pch=19, cex=.3)
There are four plots, from top left, moving down and across we have:
A QQ plot of the residuals
A histogram of the residuals
A plot of residuals vs the linear predictor
A plot of observed vs fitted values.
In the code below, I assume b contains your fitted model, as per your example. First some things we need
type <- "deviance" ## "pearson" & "response" are other valid choices
resid <- residuals(b, type = type)
linpred <- napredict(b$na.action, b$linear.predictors)
observed.y <- napredict(b$na.action, b$y)
Note the last two lines are applying the NA handling method used when the model was fitted to the information on the linear.predictors and y, the stored copy of the response data.
The above code and that shown below is all given in the first 10 or so lines of the gam.check() source. To view this, just enter
gam.check
at the R prompt.
Each plot is produced as follows:
QQ plot
This is produced via qq.gam():
qq.gam(b, rep = 0, level = 0.9, type = type, rl.col = 2,
rep.col = "gray80")
Histogram of residuals
This is produced using
hist(resid, xlab = "Residuals", main = "Histogram of residuals")
Residuals vs linear predictor
This is produced using
plot(linpred, resid, main = "Resids vs. linear pred.",
xlab = "linear predictor", ylab = "residuals")
Observed vs fitted values
This is produced using
plot(fitted(b), observed.y, xlab = "Fitted Values",
ylab = "Response", main = "Response vs. Fitted Values")
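
The four pieces above can be wrapped into one helper that draws the same 2 x 2 layout without printing anything to the console. This is a sketch assembled from the snippets in this answer. check_plots() is a hypothetical name, not an mgcv function.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 200, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Draw only the four diagnostic plots, with no printed diagnostics
check_plots <- function(model, type = "deviance") {
  resid <- residuals(model, type = type)
  linpred <- napredict(model$na.action, model$linear.predictors)
  observed.y <- napredict(model$na.action, model$y)

  old_par <- par(mfrow = c(2, 2))
  on.exit(par(old_par))  # restore the previous layout afterwards

  qq.gam(model, rep = 0, level = 0.9, type = type, rl.col = 2,
         rep.col = "gray80")
  hist(resid, xlab = "Residuals", main = "Histogram of residuals")
  plot(linpred, resid, main = "Resids vs. linear pred.",
       xlab = "linear predictor", ylab = "residuals")
  plot(fitted(model), observed.y, xlab = "Fitted Values",
       ylab = "Response", main = "Response vs. Fitted Values")
  invisible(NULL)
}

printed <- capture.output(check_plots(b))
```

Here capture.output() is used only to confirm that nothing is printed. In normal use, calling check_plots(b) directly is enough.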
There are now the two packages gratia and mgcViz which have functions to produce the gam.check output as ggplots which you can store as an object. The former doesn't print anything to console, the latter does.
require(gratia)
appraise(b)
require(mgcViz)
b = getViz(b)
check(b)
