Get sjPlot in R to show and sort estimates - r

I am trying to make an interaction plot in sjPlot showing percent probabilities of my outcome under two conditions of my predictor variable. Everything works perfectly except the show.values = T and sort.est = T arguments, which don't seem to do anything. Is there a way to get these to work? If not, how can I extract the data frame sjPlot uses to create this figure? I'm looking for some way to either label or tabulate the displayed probability values. Thank you!
Here is some example data and what I have so far:
set.seed(100)
dat <- data.frame(Species = rep(letters[1:10], each = 5),
threat_cat = rep(c("recreation", "climate", "pollution", "fire", "invasive_spp"), 10),
impact.pres = sample(0:1, size = 50, replace = T),
threat.pres = sample(0:1, size = 50, replace = T))
mod <- glm(impact.pres ~ 0 + threat_cat/threat.pres,
data = dat, family = "binomial")
library(sjPlot)
library(ggpubr)
plot_model(mod, type = "int",
title = "",
axis.title = c("Threat category", "Predicted probabilities of threat being observed"),
legend.title = "Threat predicted",
colors = c("#f2bf10",
"#4445ad"),
line.size = 2,
dot.size = 4,
sort.est = T,
show.values = T)+
coord_flip()+
theme_pubr(legend = "right", base_size = 30)

sjPlot produces a ggplot object, so you can examine the aesthetic mappings and underlying data. After a bit of digging around you will find the default mapping is already correct for the x, y placements of text labels, so all you need to do is add a geom_text to the plot, and only need to specify the labels as an aesthetic mapping. You can get the labels from a column called predicted stored in the ggplot object.
The upshot is that if you add the following layer to your plot:
geom_text(aes(label = scales::percent(predicted)),
position = position_dodge(width = 1), size = 8)
You get
Getting the labels in order is trickier. You have to fiddle with the internal components of the plot to do this. Suppose we store the above plot as p, then we can sort by the predicted percentages by doing:
p$data <- as.data.frame(p$data)
ord <- p$data$x[p$data$group == 1][order(p$data$predicted[p$data$group == 1])]
p$data$x <- match(p$data$x, ord)
p$scales$scales[[1]]$labels <- p$scales$scales[[1]]$labels[ord]
p
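If a table of the probabilities is enough, here is a minimal sketch that skips the plot object and asks the model directly, using only base R. It assumes the values sjPlot draws are the model's predicted probabilities for each threat_cat / threat.pres combination, which should hold for this model since it has no other covariates, but I have not checked this against sjPlot's internals. The names newdat and pred_table are made up for illustration.

```r
# Recreate the example data and model from the question
set.seed(100)
dat <- data.frame(
  Species = rep(letters[1:10], each = 5),
  threat_cat = rep(c("recreation", "climate", "pollution", "fire", "invasive_spp"), 10),
  impact.pres = sample(0:1, size = 50, replace = TRUE),
  threat.pres = sample(0:1, size = 50, replace = TRUE)
)
mod <- glm(impact.pres ~ 0 + threat_cat/threat.pres,
           data = dat, family = "binomial")

# One row per combination of threat category and threat presence
newdat <- expand.grid(threat_cat = sort(unique(dat$threat_cat)),
                      threat.pres = 0:1,
                      stringsAsFactors = FALSE)

# Predicted probabilities on the response (0-1) scale
newdat$predicted <- predict(mod, newdata = newdat, type = "response")

# Percent labels, then sort by predicted probability
newdat$label <- sprintf("%.1f%%", 100 * newdat$predicted)
pred_table <- newdat[order(newdat$predicted), ]
print(pred_table)
```

The same table could also be merged back onto a plot as labels, but for a quick tabulation this avoids touching the internals of the ggplot object.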

Related

How to replicate correlation plot with greyscale coefficients in the lower half and circles in upper half?

I'm looking to replicate this correlation plot, or at least get as close as possible to it.
Specifically, I want:
the correlation values in the lower half, with values varying on a greyscale based on absolute value
the circles in the top half, with varying diameter and on the colour scale.
I want to be able to edit the axis scale labels so that full descriptions are on the y-axis, and numeric references on the x-axis
I have gotten relatively close, but have not managed precise enough replication. I describe my closest attempts below with reproducible code. The corrplot package has gotten me closest.
# general preparation
library(car)
correlations = cor(mtcars)
corrplot package
library(corrplot)
corrplot.mixed(correlations,
upper = "number", #upper.col = ???
lower = "circle", #lower.col = ???
tl.pos = "lt", tl.col = "black", tl.cex = 0.5)
Notes:
there is a way to make the coefficients in greyscale, but I don't understand it: https://rdrr.io/cran/corrplot/man/COL1.html
For some bizarre reason, when I use my own data (as opposed to mtcars), the coefficient colours don't match the actual correlation values. I can't give a reproducible example here, because it works fine with the mtcars data.
cormat package
source("http://www.sthda.com/upload/rquery_cormat.r")
rquery.cormat(mtcars)
ggcorrplot
library("ggcorrplot")
# circles separate
ggcorrplot(correlations, # correlation matrix
method = "circle", # circles instead of squares
type = "upper", # show only upper triangle
show.diag = F, # don't show diagonal values (1)
lab = F, # don't show cor coeffs
outline.col = "white", # no outline of circles
ggtheme = theme_bw, # theme
colors = c("#440154FF","#238A8DFF","#FDE725FF"))
# coefs separate
ggcorrplot(correlations, # correlation matrix
method = "circle", # circles instead of squares
type = "upper", # show only upper triangle
show.diag = F, # don't show diagonal values (1)
lab = T, # show cor coeffs
outline.col = NA, # don't show circles
ggtheme = theme_bw, # theme
colors = c("#440154FF","#238A8DFF","#FDE725FF"))
# can't combine both plots?
corrgram package
library(corrgram)
corrgram(correlations,
labels = indices_all,
lower.panel = "panel.fill",
upper.panel = "panel.cor")
Some other notes:
It seems the halves of the plots tend to run via the opposite diagonal than in the example plot, but I guess that's not a big concern.
Out-of-the-box options are quick and nice. However, when it comes to customizing, IMHO it may be worthwhile to build the plot from scratch using ggplot2. As a first step this involves some data wrangling to get your correlation matrix into the right shape. In this step I also convert the categories to factors and add a numeric id. Based on the ids I split the data into the upper and lower triangle values, which can then be plotted separately using a geom_point and a geom_text. Besides that, it's important to add drop = FALSE to the x and y scales to keep all factor levels in the right order. I also use some functions to get the desired axis labels:
EDIT: Following the suggestion by @AllanCameron I added a coord_equal as the "final" touch to get a nice square-matrix look. And thanks to @RichtieSacramento the code now maps the absolute value to the size aes.
library(dplyr)
library(tidyr)
library(ggplot2)
correlations = cor(mtcars)
levels <- colnames(mtcars)
corr_long <- correlations %>%
data.frame() %>%
mutate(row = factor(rownames(.), levels = levels),
rowid = as.numeric(row)) %>%
pivot_longer(-c(row, rowid), names_to = "col") %>%
mutate(col = factor(col, levels = levels),
colid = as.numeric(col))
ggplot(corr_long, aes(col, row)) +
geom_point(aes(size = abs(value), fill = value),
data = ~filter(.x, rowid > colid), shape = 21) +
geom_text(aes(label = scales::number(value, accuracy = .01), color = abs(value)),
data = ~filter(.x, rowid < colid), size = 8 / .pt) +
scale_x_discrete(labels = ~ attr(.x, "pos"), drop = FALSE) +
scale_y_discrete(labels = ~ paste0(.x, " (", attr(.x, "pos"), ")"), drop = FALSE) +
scale_fill_viridis_c(limits = c(-1, 1)) +
scale_color_gradient(low = grey(.8), high = grey(.2)) +
coord_equal() +
guides(size = "none", color = "none") +
theme(legend.position = "bottom",
panel.grid = element_blank(),
axis.ticks = element_blank()) +
labs(x = NULL, y = NULL, fill = NULL)
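For anyone who wants to see the reshaping and the upper/lower split in isolation, here is a rough base R sketch of the same idea, without dplyr or tidyr. The names vars, lower_half and upper_half are just illustrative; the column names row, col, rowid and colid mirror the ones used above.

```r
correlations <- cor(mtcars)
vars <- colnames(correlations)

# Long format: one row per (row variable, column variable) pair
corr_long <- as.data.frame(as.table(correlations),
                           responseName = "value",
                           stringsAsFactors = FALSE)
names(corr_long)[1:2] <- c("row", "col")

# Numeric position of each variable in the original column order
corr_long$rowid <- match(corr_long$row, vars)
corr_long$colid <- match(corr_long$col, vars)

# Split on the ids: one half for the circles, the other for the text labels
lower_half <- corr_long[corr_long$rowid > corr_long$colid, ]
upper_half <- corr_long[corr_long$rowid < corr_long$colid, ]

head(lower_half)
```

With 11 variables in mtcars this gives 121 pairs in total and 55 in each half; the diagonal is dropped from both.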

create faceted plots 2-way interaction in R using 'interactions' package

I am using the cat_plot function from the 'interactions' package in R (which is a wrapper for ggplot) to plot a 2-way interaction with 2 categorical variables. I can do this easily using the code below (reprex from the "diamonds" dataset)
require(interactions)
data("diamonds", package = "ggplot2")
m <- glm(price ~ cut*color, data = diamonds)
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1")
This produces the following graph
However, what I would like is to have a faceted graph with each of the cuts presented separately, to make it visually easier to interpret. This can be done for 3-way interactions using the facet.modx = TRUE command, but when I try this with only a 2-way interaction with cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1", facet.modx = TRUE) I get the following error
Error in prep_data(model = model, pred = pred, modx = modx, pred.values = pred.values, :
formal argument "facet.modx" matched by multiple actual arguments
Is there a way to easily facet the graph for 2 way interactions? My real-life dataset is actually a glmer model so I would prefer to stay within the "interactions" package if possible.
EDIT: based on the suggestion from @stefan I tried the following syntax cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") + facet_wrap(~cut) which produced the graph below. This is almost exactly what I want, except it seems to have kept the other 'cuts' on the x-axis and just removed the bars. Ideally, colours would be on the x-axis instead.
EDIT 2:
I have recreated the problem using data which is more similar to what I am actually working with, with a binary outcome, random effects from glmer etc.
require(lme4)
require(interactions)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each = 1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4, length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice, control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod + (1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1")
I would like to preserve the response scale on the y-axis, and the model accounting for the random intercept, which is why I was trying to avoid using ggplot directly, as the interactions package is already built to accommodate glmms, which is super convenient.
SOLVED
Following the suggestion from @RStam I modified the code slightly so that all y-axes had the same scale, and removed the duplicate facet labels at the bottom.
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar", colors = "Set1") +
scale_x_discrete(labels = c(a = " ", b = " ", c = " ")) +
facet_wrap(condition~., scales= "free_x")
This was the final result
Original Answer
cat_plot(m, pred = cut, modx = color, geom = "bar", colors = "Set1") +
facet_wrap(~cut, scales = "free_x")
Edit 1
Since that still didn't resolve your issue, I've updated my answer. This should resolve it.
library(tidyverse)
ggplot(diamonds, aes(x=color,y=price, fill = color)) +
geom_col() + facet_wrap(~cut, scales = "free")
Edit 2
Using your new data and the interactions package I found a rather unpleasant 'hack' using scale_x_discrete() but it should give the desired outcome.
library(interactions)
library(lme4)
set.seed(123)
id <- rep(1:150, each = 4)
condition <- rep(c("a", "b", "c"), each = 4, times = 50)
cat_mod <- rep(c("cat_1", "cat_2", "cat_3", "cat_4"), each =
1, length.out = 600)
control_mod <- rep(c("control_1", "control_2"), each = 4,
length.out = 600)
binary_choice <- rbinom(600, 1, 0.5)
simdat <- data.frame(id, condition, cat_mod, binary_choice,
control_mod)
m <- glmer(binary_choice ~ condition*cat_mod + control_mod +
(1 | id), family=binomial, data = simdat)
cat_plot(m, pred = condition, modx = cat_mod, geom = "bar",
colors = "Set1") + scale_x_discrete() +
facet_wrap(condition~., scales= "free")

How to produce a similar plot? [R]

The authors of this paper (https://www.sciencedirect.com/science/article/pii/S0092867415006418) mention in their supplementary file that these were produced in Matlab. Due to lack of proficiency, time to learn it, and the license, I was trying to replicate the figure below (Figure 2 of the paper, specifically figure 2A on the left) in R:
Any suggestions? What is this plot called more generally?
Thank you!
To me it looks like a classic point plot! You can reproduce this kind of plot in R with ggplot:
# Fake dataframe with xy coordinates, type of data (for the coloring), pvalue (for size), and different panel
df <- data.frame(
x = rep(1:20, 10),
y = rnorm(200, mean = 0, sd = 2),
type = rep(rep(LETTERS[1:5], each = 4), 10),
pvalue = sample(0:50, size = 200, replace = T)/1000,
panel = sample(rep(paste0("panel", 1:4), each = 50), 200, replace = F))
# plot
library(ggplot2)
ggplot(df, aes(x, y*x , color = type, size = pvalue)) + geom_hline(yintercept = 0) + geom_point() + facet_wrap(~panel, ncol = 2)
ggsave("demo.png")

R: Change axis label in plot.Mclust AND/OR plot uncertainty of mclust model with ggplot2

I am really confused. I would like to change the axis labels of a plot (classification or uncertainty) for a 'Mclust' model object in R and I don't understand why it's working for a simple object with just two variables, but not several ones.
Here is an example:
require(mclust)
mod1 = Mclust(iris[,1:2])
plot(mod1, what = "uncertainty", dimens = c(1,2), xlab = "test")
# changed x-axis-label
mod2 = Mclust(iris[,1:4])
plot(mod2, what = "uncertainty", dimens = c(1,2), xlab = "test")
# no changed x-axis-label
Another way I tried was with coordProj:
coordProj(data= iris[, -5], dimens = c(1,2), parameters = mod2$parameters,
z = mod2$z, what = "uncertainty", xlab = "test")
# Error in plot.default(data[, 1], data[, 2], pch = 19, main = "", xlab = xlab, :
# formal argument "xlab" matched by multiple actual arguments
So I thought, maybe it will work with ggplot2 (and that would be my favourite option). Now I can change the axis labels and so on but I don't know how to plot the ellipses?
require(ggplot2)
ggplot(data = iris) +
geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = mod2$uncertainty)) +
scale_x_continuous(name = "test")
It would be nice, if someone might know a solution to change the axis labels in plot.Mclust or to add the ellipses to ggplot.
Thanks a lot!
I started to look at the code for plot.Mclust, but then I just used stat_ellipse and changed the level until the plots looked the same. It appears to be a joint t-distribution (the default) at 50% confidence (instead of the default 95%). There's probably a better way to do it using the actual covariance matrix (mod2$parameters$variance$sigma), but this gets you to where you want.
require(dplyr)
iris %>%
mutate(uncertainty = mod2$uncertainty,
classification = factor(mod2$classification)) %>%
ggplot(aes(Sepal.Length, Sepal.Width, size = uncertainty, colour = classification)) +
geom_point() +
guides(size = "none", colour = "none") + theme_classic() +
stat_ellipse(level = 0.5, type = "t") +
labs(x = "Label X", y = "Label Y")
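As a rough sketch of the covariance-matrix route mentioned above: an ellipse can be drawn from a mean vector and a 2 x 2 covariance matrix by transforming a unit circle with the Cholesky factor. To keep this self-contained I use the sample mean and covariance of two iris columns as stand-ins; with a fitted model you would presumably take one cluster's mean and the matching 2 x 2 slice of mod2$parameters$variance$sigma instead (that indexing is my assumption). The helper ellipse_points and its radius argument are made up, and I have not checked which radius plot.Mclust itself uses.

```r
# Hypothetical helper: points on the ellipse at Mahalanobis distance
# `radius` from `mu` under covariance `sigma` (2 x 2)
ellipse_points <- function(mu, sigma, radius = 1, n = 100) {
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))   # 2 x n unit circle
  L <- t(chol(sigma))                       # lower triangular, L %*% t(L) == sigma
  pts <- t(mu + radius * (L %*% circle))    # n x 2 after transposing
  data.frame(x = pts[, 1], y = pts[, 2])
}

# Stand-in parameters; replace with one cluster's mean and covariance
mu <- colMeans(iris[, 1:2])
sigma <- cov(iris[, 1:2])

ell <- ellipse_points(mu, sigma)
head(ell)
```

The resulting data frame can be added to the plot with geom_path(data = ell, aes(x, y), inherit.aes = FALSE), one ellipse per cluster.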

R: Plotting IRF manually

Consider following script to plot an impulse response function:
library(vars)
data(Canada)
Canada <- Canada * 999
var <- VAR(Canada, p = 2, type = "both")
plot(irf(var, impulse = "rw", response = "U", boot = T, cumulative = FALSE, n.ahead = 20))
plot(irf(var, impulse = "rw", response = "U", boot = T, cumulative = TRUE, n.ahead = 20))
I wonder how I could access the data of the plot (and 95% intervals)?
It would be great to print a plot with a color-filled confidence band, a green impulse response line and different axis descriptions. A solution with R's built-in plot features would be preferred over ggplot.
Thanks!
You can view the data returned by irf:
library("vars")
# generate some dummy data
df <- data.frame(n=rnorm(100), p=rpois(100, 2))
var <- VAR(df, p = 2, type = "both")
irf <- irf(var, impulse = "n", response = "p", boot = T,
cumulative = FALSE, n.ahead = 20)
# inspect coefficients object
str(irf)
All the data you need is accessible from here (e.g. check irf$Lower and irf$Upper).
One way to customise the default plot would be to look at the source of the function being called when you run plot(irf):
vars:::plot.varirf
In this case it's a bit involved but you can copy the body of this function and edit the code to change the colours, draw a filled polygon and edit the labels of the axes to get them exactly the way you want.
Updated:
Here's a starting point for the confidence bands:
# set up the base plot
plot(irf$irf$n, type="n", ylim = c(-.3, .5),
ylab = "Your label", xlab = "Another label")
abline(h=0)
# draw the filled polygon for confidence intervals
polygon(
c(1:length(irf$Upper$n), length(irf$Lower$n):1),
c(irf$Upper$n, rev(irf$Lower$n)),
col = "grey80", border = NA)
# add coefficient estimate line
lines(irf$irf$n, col = "darkgreen")
I had a similar problem, so I modeled it myself. I am not an advanced R user so maybe someone can put that into a function or so.
This method creates a plot of all IRFs, with a horizontal line at y=0, the names of the impulses on the x-axis and the responses on the y-axis. The IRF plots are also size-adjusted.
"VAR_BS_9016_5VAR" is my "varest" object. I used 5 variables but this method can easily be shortened or expanded.
par(mfrow=c(5,5), oma = c(0,0,0,0) + 0.1, mar = c(5,5,0,0) + 0.1)
for (i in 1:5){
for (j in 1:5){
var_plot=irf(VAR_BS_9016_5VAR, impulse = paste(colnames(VAR_BS_9016_5VAR$y)[i]), response=paste(colnames(VAR_BS_9016_5VAR$y)[j]), n.ahead = 20, ortho=TRUE, boot=TRUE, runs=1000, ci=0.9)
plot(x=c(1:21), y=unlist(var_plot$Lower), type="l", lwd = 3, lty=2,col="red", ylab=paste(colnames(VAR_BS_9016_5VAR$y)[j]), xlab=paste(var_plot$impulse), ylim=range(c(unlist(var_plot$Lower),unlist(var_plot$Upper))) )
lines(x=c(1:21),y=unlist(var_plot$Upper),type="l",lwd = 3, lty=2,col="red")
lines(x=c(1:21),y=unlist(var_plot$irf),type="l", lwd = 3)
abline(a = NULL, h = 0)
}
}
Here is my solution for obtaining a data frame that can be used in ggplot when you have multiple impulses and multiple responses.
For the pipe operator please get library(dplyr). Be careful since dplyr and MASS (dependency of vars-package) have naming conflicts (e.g., for "select"):
getIRFPlotData <- function(impulse, response, list) {
cbind.data.frame(Week = 0:(nrow(list[[1]][[1]])-1),
Lower = list[[2]][names(list[[2]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% pull(1),
irf = list[[1]][names(list[[1]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% pull(1),
Upper = list[[3]][names(list[[3]]) == impulse][[1]] %>% as.data.frame() %>% dplyr::select(dplyr::all_of(response)) %>% pull(1),
Impulse = impulse,
Response = response, stringsAsFactors = FALSE)
}
This returns a data.frame with the columns Week, Lower, irf, Upper, Impulse and Response. Using dplyr::bind_rows() you can stack the data frames for different impulse/response pairs on top of each other, and with ggplot2::facet_wrap() or facet_grid() you can produce charts similar to the ones output by vars:::plot.varirf(), while staying fully flexible to add layers and work with the data.
getIRFPlotData("Spendings", "Returns", irf4c) %>% ggplot(.) + geom_line(aes(Week, Lower), linetype="dashed") + geom_line(aes(Week, irf)) + geom_line(aes(Week, Upper),linetype="dashed") + geom_ribbon(aes(Week, ymin=Lower, ymax=Upper), alpha = 0.3) + theme_minimal()
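If you'd rather not pull in dplyr at all, the same extraction can be sketched in base R. This assumes the structure described above: the irf result holds $irf, $Lower and $Upper, each a list named by impulse whose elements are matrices with one column per response. The helper irf_to_df and the object mock_irf are made up for illustration; mock_irf only imitates that structure so the example runs without fitting a VAR.

```r
# Hypothetical helper: one impulse/response pair as a tidy data frame
irf_to_df <- function(x, impulse, response) {
  data.frame(
    horizon  = seq_len(nrow(x$irf[[impulse]])) - 1,   # 0, 1, ..., n.ahead
    lower    = x$Lower[[impulse]][, response],
    irf      = x$irf[[impulse]][, response],
    upper    = x$Upper[[impulse]][, response],
    impulse  = impulse,
    response = response,
    stringsAsFactors = FALSE
  )
}

# Stand-in for the output of irf(): impulse "n", response "p", 20 steps ahead
h <- 0:20
est <- matrix(0.5 * 0.8^h, ncol = 1, dimnames = list(NULL, "p"))
mock_irf <- list(irf   = list(n = est),
                 Lower = list(n = est - 0.1),
                 Upper = list(n = est + 0.1))

irf_df <- irf_to_df(mock_irf, impulse = "n", response = "p")
head(irf_df)
```

Several pairs can be stacked with do.call(rbind, ...) and then plotted with either base graphics or ggplot2.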
