Controlling facet column number with visreg - r

While I am enjoying using package visreg to visualize my regressions, there's one thing that I can't yet control: the number of columns when faceting. See the following factor-by-curve generalized additive regression for example:
library(dplyr)
library(mgcv)
library(visreg)
data(airquality)
test <- gam(
Ozone ~ s(Temp, by = Month),
data = airquality %>% mutate(Month = as.factor(Month))
)
If I do
visreg(test, xvar = "Temp", by = "Month", gg = TRUE)
I get a 1-row, 5-column factor-by-curves.
Funnily enough, if I take the gg = TRUE out, it becomes 2-row. But whichever is the case I would like to be able to control the number of columns and rows when faceting. So far I have been unsuccessful, by either manipulating the ellipsis argument of visreg or by directly trying to manipulate the resulting ggplot object.
So for example, if I wanted to do visreg with gg = TRUE with 3-row, 2-column, what would be my best chance---or is there another package that is recommended?

You can modify the ggplot object returned by visreg and add facet_wrap in the usual way (ggplot2 has to be attached so that facet_wrap and vars are available):
library(ggplot2)
p <- visreg(test, xvar = "Temp", by = "Month", gg = TRUE)
p + facet_wrap(vars(Month), nrow = 3)
You don't actually need to create p first; this gives the same result:
visreg(test, xvar = "Temp", by = "Month", gg = TRUE) +
facet_wrap(vars(Month), nrow = 3)
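To check the resulting grid without visreg in the picture, here is a minimal sketch using plain ggplot2. It assumes that adding a second facet_wrap() replaces the first one, which is the behaviour the answer relies on; the object names aq, p2 and layout are made up for illustration.

```r
library(ggplot2)

# airquality ships with base R; Month becomes a factor as in the question
aq <- transform(airquality, Month = factor(Month))

# Start from a plot that has a single row of five panels
p <- ggplot(aq, aes(Temp, Ozone)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(vars(Month), nrow = 1)

# Adding another facet specification overrides the existing one
p2 <- p + facet_wrap(vars(Month), nrow = 3, ncol = 2)

# Inspect the panel layout that ggplot2 computed
layout <- ggplot_build(p2)$layout$layout
print(layout[, c("PANEL", "ROW", "COL")])
```

Setting both nrow and ncol is optional; either one alone is enough, as long as the grid has room for all five months.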

Related

Combine correlation plot with coefficient table (ggplot2 -> ggstatsplot)

What are your preferred techniques for combining a table with a plot in one image using R? I remember using tableGrob() and either patchwork or cowplot months ago but cannot remember the details.
This example uses the ggstatsplot package. I would like to add the correlation coefficients to the correlogram (correlation plot).
installed_packages <- rownames(installed.packages())
if (!('ggstatsplot' %in% installed_packages)) {
devtools::install_github('https://github.com/IndrajeetPatil/ggstatsplot')
}
needed_pkgs <- setdiff(c('ggstatsplot', 'statsExpressions',
'dplyr', 'nnet', 'MASS'),
installed_packages)
if (length(needed_pkgs) > 0) {
install.packages(needed_pkgs)
}
library(ggstatsplot)
library(statsExpressions)
library(dplyr)
library(nnet)
library(MASS)
utils::example(topic = birthwt, echo = FALSE)
# model
bwt.mu <-
nnet::multinom(
formula = low ~ .,
data = bwt,
trace = FALSE
)
original_cols <- colnames(bwt)
bwt.mu_coefstats <- ggcoefstats(x = bwt.mu, output = "tidy") %>%
# skipping first row = intercept
slice(2:n()) %>%
dplyr::filter(term %in% original_cols) %>%
arrange(desc(p.value)) %>%
dplyr::select(term, estimate, p.value)
# Correlogram
cor_plot_out <-
ggstatsplot::ggcorrmat(bwt %>% dplyr::select(low, lwt, age))
Want to combine
bwt.mu_coefstats
cor_plot_out
The key element is tableGrob() from the gridExtra package.
tableGrob() turns a data frame into a table grob that can be laid out like a plot, so you can pass it to grid.arrange() together with the correlogram:
library(gridExtra)
bwt.mu_coefstats <- tableGrob(
bwt.mu_coefstats,
theme = ttheme_default(
base_size = 10,
base_colour = "grey25",
parse = T
),
rows = NULL
)
grid.arrange(cor_plot_out, bwt.mu_coefstats,
heights = c(10, 4))
OR with patchwork:
library(patchwork)
cor_plot_out + bwt.mu_coefstats
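If you want to try the table-plus-plot layout without installing ggstatsplot, here is a small sketch using only ggplot2 and gridExtra. The mtcars subset and the names tbl_df, tbl and combined are stand-ins I picked for illustration; they are not part of the original example.

```r
library(ggplot2)
library(gridExtra)

# Any ggplot works here; this one stands in for the correlogram
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# A small data frame standing in for the coefficient table
tbl_df <- head(mtcars[, c("mpg", "wt", "cyl")], 4)
tbl <- tableGrob(tbl_df, rows = NULL, theme = ttheme_default(base_size = 10))

# arrangeGrob() builds the layout without drawing it;
# grid.arrange() takes the same arguments and draws immediately
combined <- arrangeGrob(p, tbl, heights = c(10, 4))

grid::grid.newpage()
grid::grid.draw(combined)
```

The heights argument gives relative row heights, so the plot gets roughly 10/14 of the page and the table the rest.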

Changing aesthetics in ggplot generated by svars package in R

I'm using the svars package to generate some IRF plots. The plots are rendered using ggplot2, however I need some help with changing some of the aesthetics.
Is there any way I can change the fill and alpha of the shaded confidence bands, as well as the color of the solid line? I know in ggplot2 you can pass fill and alpha arguments to geom_ribbon (and col to geom_line), just unsure of how to do the same within the plot function of this package's source code.
# Load Dataset and packages
library(tidyverse)
library(svars)
data(USA)
# Create SVAR Model
var.model <- vars::VAR(USA, lag.max = 10, ic = "AIC" )
svar.model <- id.chol(var.model)
# Wild Bootstrap
cores <- parallel::detectCores() - 1
boot.svar <- wild.boot(svar.model, n.ahead = 30, nboot = 500, nc = cores)
# Plot the IRFs
plot(boot.svar)
I'm also looking at the command for a historical decomposition plot (see below). Is there any way I could omit the first two facets and plot only the bottom three lines on the same facet?
hist.decomp <- hd(svar.model, series = 1)
plot(hist.decomp)
Your first goal is easily achieved by resetting the aes_params of the layers after calling plot. For your second goal, there is probably a way to manipulate the ggplot object, but my approach below constructs the plot from scratch instead. I copied the data wrangling code from svars:::plot.hd and filtered the prepared dataset for the desired series:
# Plot the IRFs
p <- plot(boot.svar)
p$layers[[1]]$aes_params$fill <- "pink"
p$layers[[1]]$aes_params$alpha <- .5
p$layers[[2]]$aes_params$colour <- "green"
p
# Helper to convert to long dataframe. Source: svars:::plot.hd
hd2PlotData <- function(x) {
PlotData <- as.data.frame(x$hidec)
if (inherits(x$hidec, "ts")) {
tsStructure = attr(x$hidec, which = "tsp")
PlotData$Index <- seq(from = tsStructure[1], to = tsStructure[2],
by = 1/tsStructure[3])
PlotData$Index <- as.Date(zoo::yearmon(PlotData$Index))
}
else {
PlotData$Index <- 1:nrow(PlotData)
PlotData$V1 <- NULL
}
dat <- reshape2::melt(PlotData, id = "Index")
dat
}
hist.decomp <- hd(svar.model, series = 1)
dat <- hd2PlotData(hist.decomp)
dat %>%
filter(grepl("^Cum", variable)) %>%
ggplot(aes(x = Index, y = value, color = variable)) +
geom_line() +
xlab("Time") +
theme_bw()
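The aes_params trick is not specific to svars. Here is a minimal sketch on a hand-made ribbon plot, assuming the ribbon is the first layer and the line the second (the same assumption made for the IRF plot); the data frame df is invented for illustration.

```r
library(ggplot2)

# Toy data with a band around a line
df <- data.frame(x = 1:10, y = (1:10) / 2)
df$lo <- df$y - 1
df$hi <- df$y + 1

p <- ggplot(df, aes(x, y)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), fill = "grey70", alpha = 0.3) +
  geom_line(colour = "black")

# Fixed (non-mapped) aesthetics are stored in each layer's aes_params list
p$layers[[1]]$aes_params$fill <- "pink"
p$layers[[1]]$aes_params$alpha <- 0.5
p$layers[[2]]$aes_params$colour <- "green"
p
```

If a package sets these aesthetics through a mapping and a scale instead of fixed parameters, this will have no effect, so it is worth checking p$layers first.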
EDIT: One approach to change the facet labels is a custom labeller function (a different approach changes the facet labels via the data):
# Assumes the plot has nine panels; labels are applied in panel order
my_labels <- LETTERS[1:9]
my_labeller <- function(new_labels) {
function(labels, multi_line = TRUE) {
data.frame(variable = new_labels)
}
}
p + facet_wrap(~variable, labeller = my_labeller(my_labels))
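For a fixed lookup from facet values to display labels, ggplot2's as_labeller() may be simpler than a hand-written labeller function. Here is a sketch on mtcars; the label text and the name cyl_labels are made up for illustration.

```r
library(ggplot2)

# Hypothetical lookup: names are the facet values, values are the labels shown
cyl_labels <- c(`4` = "Four cylinders",
                `6` = "Six cylinders",
                `8` = "Eight cylinders")

p_cyl <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  facet_wrap(~cyl, labeller = as_labeller(cyl_labels))
p_cyl
```

The lookup is by value rather than by panel position, so it does not depend on the order in which the panels are drawn.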

Annotate different equations on facetted ggplot

I'm trying to use a combination of this answer for annotating equations onto a ggplot plot and this answer of putting different texts onto different facets.
The problem I'm getting is that I can't get different formulas using mathematical expressions onto different facets.
#Required package
library(ggplot2)
#Split the mtcars dataset by the number of cylinders in each engine
cars.split <- split(mtcars, mtcars$cyl)
#Create a linear model to get the equation for the line for each cylinder
cars.mod <- lapply(cars.split, function(x){
lm(wt ~ mpg, data = x)
})
#Create predicted data set to add a 'geom_line()' in ggplot2
cars.pred <- as.data.frame(do.call(rbind,
mapply(x = cars.split, y = cars.mod,
FUN = function(x, y){
newdata <- data.frame(mpg = seq(min(x$mpg),
max(x$mpg),
length.out = 100))
pred <- data.frame(wt = predict(y, newdata),
mpg = newdata$mpg)
}, SIMPLIFY = F)))
cars.pred$cyl <- rep(c(4,6,8), each = 100)
(cars.coef <- as.data.frame(do.call(rbind, lapply(cars.mod, function(x)x$coefficients))))
#Create a data frame of line equations with a 'cyl' variable to facilitate facetting
#as per second link. I had to MANUALLY take the values 'cars.coef' and put them
#into the data frame.
equation.text <- data.frame(label = c('y = 4.69-0.09x^{1}',
'y = 6.42-0.17x^{1}',
'y = 6.91-0.19x^{1}'),
cyl = c(4,6,8))
#Plot it
ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
geom_point() +
geom_line(data = cars.pred, mapping = aes(x = mpg, y = wt)) +
geom_text(data = equation.text, mapping = aes(x = 20, y = 5, label = label)) +
facet_wrap(.~ cyl)
The equation in the plot is exactly as I had written in the equation.text data frame, which is no surprise since the equations are in ''. But I'm trying to get it to be in mathematical notation, like $y = 4.69–0.09x^1$
I know I need to use expression as it said in the first link I had, but when I try to put it into a data frame:
equation.text <- data.frame(label = c(expression(y==4.69-0.09*x^{1}),
expression(y==6.42-0.17*x^{1}),
expression(y==6.91-0.19*x^{1})),
cyl = c(4,6,8))
I get an error saying expressions can't be put into data frames:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class '"expression"' to a data.frame
My questions are:
How can I get different equations in mathematical notation (italicized letters, superscripts, subscripts) in different facets?
What's a more automated way of getting values from the cars.coef data frame into the equations table (rather than typing out all the numbers!)?
UPDATE: This has been brought to my attention, but a lot of the answers seem to work for linear models. Is there a way to do it for, say, a non-linear model as well?
Hopefully this satisfies both parts of the question. I'm also not great with putting together expressions.
For the first part, you can create a data frame of equation text from your data frame of intercepts and coefficients, and format it how you need. I set up the sprintf to match the number of decimal places you had, and to flag the coefficient's sign.
library(ggplot2)
# same preparation as in question
# renamed just to have standard column names
names(cars.coef) <- c("intercept", "mpg")
equation.text <- data.frame(
cyl = unique(cars.pred$cyl),
label = sprintf("y == %1.2f %+1.2f*x^{1}", cars.coef$intercept, cars.coef$mpg),
stringsAsFactors = FALSE
)
The label column looks like this:
"y == 4.69 -0.09*x^{1}" "y == 6.42 -0.17*x^{1}" "y == 6.91 -0.19*x^{1}"
For the second part, you can just set parse = T in your geom_text, similar to the argument available in annotate.
ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
geom_point() +
geom_line(data = cars.pred, mapping = aes(x = mpg, y = wt)) +
geom_text(data = equation.text, mapping = aes(x = 20, y = 5, label = label), parse = T) +
facet_wrap(.~ cyl)
Notes on sprintf: % marks where a format specification starts. The + flag always prints the sign (plus or minus), so the coefficient shows as being added or subtracted. In %1.2f, the 1 is the minimum field width and the .2 asks for 2 digits after the decimal point; this can be adjusted as needed, but it displays numbers such as 4.69. Arguments are substituted into the format string in the order they are passed to sprintf.
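To see what the format string produces for both signs, here is a small sketch. The third coefficient pair is hypothetical and only there to show a positive slope; parse(text = ...) is used as a rough check that each label is something geom_text can parse.

```r
# Hypothetical coefficients; the last pair shows a positive slope
intercept <- c(4.69, 6.42, 1.5)
slope     <- c(-0.09, -0.17, 0.5)

# %+1.2f always prints the sign, so no separate "+" or "-" is needed
labels <- sprintf("y == %1.2f %+1.2f*x^{1}", intercept, slope)
print(labels)

# Labels must be valid expressions for parse = TRUE to work in geom_text
exprs <- parse(text = labels)
```

The same pattern extends to models with more terms by adding one %+1.2f per coefficient. A non-linear model would need its own format string written by hand.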

ggplot axis order (factor) changes when using last_plot()

I've been able to successfully create a dot plot in ggplot for percentages across gender. But I want to highlight the significant differences. I thought I could do this with a combination of subsetting and the use of last_plot().
Here’s my data:
require(ggplot2)
require(reshape2)
prog <- c("Honors", "Academic", "Social", "Media")
m <- c(30,35,40,23)
f <- c(25,40,45,15)
s <- c(0.7, 0.4, 0.1, 0.03)
temp <- as.data.frame(cbind(prog, m, f, s), stringsAsFactors=FALSE)
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt <- as.data.frame(cbind(first.melt,temp[,4]), stringsAsFactors=FALSE)
names(first.melt) <- c("program", "Gender", "Percent", "sig")
first.melt$program <- as.factor(first.melt$program)
Here’s where I reverse the order of my Program variable, so that when graphed it will be alphabetical from top to bottom.
first.melt[,1] = with(first.melt, factor(first.melt[,1], levels = rev(levels(first.melt[,1]))))
first.melt$sig <- as.numeric(as.character(first.melt$sig))
first.melt$Percent <- as.numeric(as.character(first.melt$Percent))
Now, I subset...
first.melt.ns <- subset(first.melt,sig > 0.05)
first.melt.sig <- subset(first.melt,sig <= 0.05)
ggplot(first.melt.ns, aes(program, y=Percent, shape=Gender)) +
geom_point(size=3) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
The first run at ggplot gets me my non-significant Program pairs, and they are in the right order. So I add the two new points for male and female (making them solid, to draw attention to them as a significant pair):
last_plot() +
geom_point(data=first.melt.sig, aes(program[Gender=="m"], y=Percent[Gender=="m"]), size=3, shape=19) +
geom_point(data=first.melt.sig, aes(program[Gender=="f"], y=Percent[Gender=="f"]),size=4, shape=18)
The points get added just fine – ggplot works. But notice my Program axis – it’s correct, but reversed now.
First, you really should avoid as.data.frame(cbind(...)). It is dramatically increasing the amount of work necessary to prepare your data. The function for creating data frames is (naturally) data.frame. Use it!
What you're doing here is basically trying to get around the limitation of only having one shape scale. It's probably easiest to just do this:
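# stringsAsFactors = TRUE keeps prog a factor (R >= 4.0 no longer does this by default), which levels() below relies on
temp <- data.frame(prog, m, f, s, stringsAsFactors = TRUE)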
first <- temp[,1:3]
first.melt <- melt(first, id.vars = 'prog', variable.name = 'Gender', value.name = 'Percent')
first.melt$sig <- rep(temp$s,times = 2)
first.melt[,1] = with(first.melt, factor(first.melt[,1], levels = rev(levels(first.melt[,1]))))
first.melt.sig <- subset(first.melt,sig < 0.05)
first.melt$Percent[first.melt$sig < 0.05] <- NA
ggplot() +
geom_point(data = first.melt,aes(x = prog,y = Percent,shape = Gender),size = 3) +
geom_point(data = first.melt.sig[1,],aes(x = prog,y = Percent),shape = 19) +
geom_point(data = first.melt.sig[2,],aes(x = prog,y = Percent),shape = 18) +
coord_flip() +
scale_shape_manual(values=c("m"=1, "f"=5))
In general, work to structure your ggplot code so that you're subsetting data frames, not variables inside of aes. That gets both tricky and dangerous, because ggplot is assuming certain things about what you pass inside of aes in order for the evaluation to work properly.
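As a sketch of that advice, each layer below gets its own pre-filtered data frame and aes() only names columns. The values come from the question; the names dat_ns and dat_sig, the red highlight colour, and the explicit scale_x_discrete(limits = ...) are my own choices, the last one being an assumption about how to keep the axis order fixed whichever subset is drawn first.

```r
library(ggplot2)

# Data from the question, written directly in long form
dat <- data.frame(
  prog    = rep(c("Honors", "Academic", "Social", "Media"), times = 2),
  Gender  = rep(c("m", "f"), each = 4),
  Percent = c(30, 35, 40, 23, 25, 40, 45, 15),
  sig     = rep(c(0.7, 0.4, 0.1, 0.03), times = 2)
)

# Reversed alphabetical levels, so the flipped axis reads A to Z from the top
dat$prog <- factor(dat$prog, levels = rev(sort(unique(dat$prog))))

# Subset the data frames, not the variables inside aes()
dat_ns  <- subset(dat, sig >= 0.05)
dat_sig <- subset(dat, sig < 0.05)

p <- ggplot(mapping = aes(x = prog, y = Percent, shape = Gender)) +
  geom_point(data = dat_ns, size = 3) +
  geom_point(data = dat_sig, size = 4, colour = "red") +
  scale_x_discrete(limits = levels(dat$prog)) +  # pin the axis order
  scale_shape_manual(values = c(m = 1, f = 5)) +
  coord_flip()
p
```

Because both layers share one mapping and one shape scale, the legend stays consistent and no indexing such as program[Gender == "m"] is needed.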

Only one of two densities is shown in ggplot2

So I have two sets of data (of different length) that I am trying to group up and display the density plots for:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont")))
ggplot(dat, aes(x = dens, group=lines, fill = lines)) + geom_density(alpha = .5)
when I run the code it spits an error about the different lengths, i.e.
"arguments imply different num of rows: x, y"
I then augment the code to:
dat <- data.frame(dens = c(nEXP,nCNT),lines = rep(c("Exp","Cont"),X))
Where X is the length of the longer argument so the lengths of "lines" will match that of dens.
Now the issue is that when I go to plot the data I am only getting ONE density plot. I know there should be two, since plotting the densities with plot/lines clearly shows two non-equal overlapping distributions, so I am assuming the error is with the grouping.
hope that makes sense.
I am not sure why, but I simply had to build the grouping column manually instead of using rep():
A <- data.frame(dens = nEXP, key = "exp")
B <- data.frame(dens = nCNT, key = "con")
dat <- rbind(A, B)
ggplot(dat, aes(x = dens, fill = key)) + geom_density(alpha = .5)
You need to tell rep how many times to repeat each element so that the labels line up with the values:
dat <- data.frame(dens = c(nEXP,nCNT),
lines = rep(c("Exp","Cont"), c(length(nEXP),length(nCNT))))
That should give you a dat you can use with your ggplot call.
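To see why the single-count rep() mislabels the rows, here is a small sketch with simulated stand-ins for nEXP and nCNT (the real data is not shown in the question, so the values and lengths here are made up).

```r
set.seed(1)
nEXP <- rnorm(5, mean = 1)  # simulated stand-in, length 5
nCNT <- rnorm(3, mean = 0)  # simulated stand-in, length 3

# A single count repeats the whole vector, so the labels alternate
lines_bad <- rep(c("Exp", "Cont"), 4)

# A vector of counts repeats each element that many times
lines_ok <- rep(c("Exp", "Cont"), times = c(length(nEXP), length(nCNT)))

dat <- data.frame(dens = c(nEXP, nCNT), lines = lines_ok)
print(table(dat$lines))
```

With the alternating labels each group is a near-identical mix of both samples, which would explain two densities drawn on top of each other looking like one.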
