Handle ggplot2 axis text face programmatically - r

(x-posted to community.rstudio.com)
I'm wondering if it's possible to change the axis text in ggplot2 programmatically, or if there is some native way to do this in ggplot2. In this reprex, I want to bold the axis text for each level of y whose x has an absolute value over 1.5. I can add it manually via theme(), and that works fine:
library(ggplot2)
library(dplyr)
library(forcats)
set.seed(2939)
df <- data.frame(x = rnorm(15), y = paste0("y", 1:15), group = rep(1:3, 5))
df <- mutate(df, big_number = abs(x) > 1.5, face = ifelse(big_number, "bold",
"plain"))
p <- ggplot(df, aes(x = x, y = fct_inorder(y), col = big_number)) + geom_point() +
theme(axis.text.y = element_text(face = df$face))
p
Plot 1 with no facets
But if I facet it by group, y gets reordered and ggplot2 has no idea how face is connected to df and thus y, so it just bolds in the same order as the first plot.
p + facet_grid(group ~ .)
Plot 2 with facets
And it's worse if I use a different scale for each.
p + facet_grid(group ~ ., scales = "free")
Plot 3 with facets and different scales
What do you think? Is there a general way to handle this that would work consistently here?

Idea: Don't change theme, change y-axis labels. Create a call for every y with if/else condition and parse it with parse.
Not the most elegant solution (using for loop), but works (need loop as bquote doesn't work with ifelse). I always get confused when trying to work with multiple expressions (more on that here).
Code:
# Create data
library(tidyverse)
set.seed(2939)
df <- data.frame(x = rnorm(15), y = paste0("y", 1:15), group = rep(1:3, 5)) %>%
mutate(yF = fct_inorder(y),
big_number = abs(x) > 1.5)
# Expressions for y-axis
# ifelse doesn't work
# ifelse(df$big_number, bquote(bold(1)), bquote(plain(2)))
yExp <- c() # Ignore terrible way of concatenating
for(i in 1:nrow(df)) {
if (df$big_number[i]) {
yExp <- c(yExp, bquote(bold(.(as.character(df$yF[i])))))
} else {
yExp <- c(yExp, bquote(plain(.(as.character(df$yF[i])))))
}
}
# Plot with facets
ggplot(df, aes(x, yF, col = big_number)) +
geom_point() +
scale_y_discrete(breaks = levels(df$yF),
labels = parse(text = yExp)) +
facet_grid(group ~ ., scales = "free")
Result:

Inspired by @PoGibas, I also used a function in scale_y_discrete(), which works, too.
bold_labels <- function(breaks) {
big_nums <- filter(df, y %in% breaks) %>%
pull(big_number)
labels <- purrr::map2(
breaks, big_nums,
~ if (.y) bquote(bold(.(.x))) else bquote(plain(.(.x)))
)
parse(text = labels)
}
ggplot(df, aes(x, fct_inorder(y), col = big_number)) +
geom_point() +
scale_y_discrete(labels = bold_labels) +
facet_grid(group ~ ., scales = "free")
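The bold_labels() function above relies on filter() returning rows in the same order as breaks, which holds here but is not guaranteed in general. A minimal sketch of an order-independent variant follows, assuming a named lookup vector is acceptable; the name is_big is introduced here for illustration and is not part of the original answers.

```r
library(ggplot2)

set.seed(2939)
df <- data.frame(x = rnorm(15), y = paste0("y", 1:15), group = rep(1:3, 5))
df$big_number <- abs(df$x) > 1.5
df$y <- factor(df$y, levels = unique(df$y))  # keep y1..y15 in data order

# Hypothetical helper object: maps each y label to TRUE/FALSE by name
is_big <- setNames(df$big_number, as.character(df$y))

bold_labels <- function(breaks) {
  # Look each break up by name, so the order of `breaks` does not matter
  wrapper <- ifelse(is_big[breaks], "bold", "plain")
  # Build plotmath strings such as bold("y3") and parse them into expressions
  parse(text = sprintf('%s("%s")', wrapper, breaks))
}

p <- ggplot(df, aes(x, y, col = big_number)) +
  geom_point() +
  scale_y_discrete(labels = bold_labels) +
  facet_grid(group ~ ., scales = "free")
p
```

Because the lookup is by name, the function gives the same label for a given break regardless of which panel asks for it or in what order.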

Related

Change axes label and scale using ggplot and patchwork in R

(I am trying to make this question as short and concise as possible, as other related answers may be tough for the non-savvy like myself.)
With the following code in mind, is it possible to have both y-axes on the same scale (that of the graph with the highest y-limit), and to have independent labels for each of the axes (namely the y-axes)? (I tried to use facet_wrap but haven't been able to succeed so far, as Layer 1 is missing.)
library(ggplot2)
library(patchwork)
d <- cars
d$Obs <- c(1:50)
f1 <- function(a) {
ggplot(data=d, aes_string(x="Obs", y=a)) +
geom_line() +
labs(x="Observation",y="Speed/Distance")
}
f1("speed") + f1("dist")
You could add two additional arguments to your function, one for the axis label and one for your desired limits.
library(ggplot2)
library(patchwork)
d <- cars
d$Obs <- c(1:50)
f1 <- function(a, y_lab) {
ggplot(data = d, aes_string(x = "Obs", y = a)) +
geom_line() +
scale_y_continuous(limits = range(c(d$speed, d$dist))) +
labs(x = "Observation", y = y_lab)
}
f1("speed", "Speed") + f1("dist", "Distance")
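aes_string() is deprecated in recent ggplot2 releases. A rough equivalent using the .data pronoun might look like the sketch below; it also exposes the limits as an argument, which the answer mentions but hard-codes. The names y_limits and limits are introduced here for illustration.

```r
library(ggplot2)
library(patchwork)

d <- cars
d$Obs <- seq_len(nrow(d))

# Shared y-range covering both columns, so both panels use the same scale
y_limits <- range(c(d$speed, d$dist))

f1 <- function(a, y_lab, limits = y_limits) {
  # .data[[a]] looks up the column whose name is stored in the string `a`
  ggplot(d, aes(x = Obs, y = .data[[a]])) +
    geom_line() +
    scale_y_continuous(limits = limits) +
    labs(x = "Observation", y = y_lab)
}

f1("speed", "Speed") + f1("dist", "Distance")
```

Passing a different limits value to one call gives that panel its own range while the other keeps the shared one.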
Reshape wide-to-long, then use facet. Instead of having different y-axis labels we will have facet labels:
library(ggplot2)
library(tidyr)
pivot_longer(d, 1:2, names_to = "grp") %>%
ggplot(aes(x = Obs, y = value)) +
geom_line() +
facet_wrap(vars(grp))

How to apply separate coord_cartesian() to "zoom in" into individual panels of a facet_grid()?

Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degrees of freedom) influences the approximation and the first and second derivative. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all data. Unfortunately, one panel has some "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But the limit has been applied to all panels of the grid, so in row 1 details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution requires writing code to create three separate plots, and it duplicates strips and axes and adds whitespace, which takes away space from the display of the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row to NA. This can be replaced by any other logic for defining outliers.
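To make the outlier rule easier to swap out, it could be pulled into a small helper. The following is only a sketch of the 10*IQR rule described above; flag_outliers and its argument k are hypothetical names, and the input vector is toy data, not the spline data from the question.

```r
# Hypothetical helper: TRUE for values more than k * IQR outside the quartiles
flag_outliers <- function(y, k = 10) {
  q <- quantile(y, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  margin <- (q[2] - q[1]) * k
  y < q[1] - margin | y > q[2] + margin
}

# Toy data: nine ordinary values and one extreme value
y <- c(1:9, 1000)
flag_outliers(y)
```

Inside the pipeline from Step 1, the same rule could then be applied per facet row with something like mutate(y = ifelse(flag_outliers(y), NA, y)) after group_by(deriv).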

How to support loop drawing in ggplot2?

data <- data.frame(a=1:10, b=1:10 * 2, c=1:10 * 3)
library(ggplot2)
p <- ggplot(NULL, aes(x = 1:10))
# Using for loop will cause the plot only to draw the last line.
for (i in names(data)){
p <- p + geom_line(aes(y = data[[i]], colour = i))
}
# The lines below work fine.
# p <- p + geom_line(aes(y = data[["a"]], colour = "a"))
# p <- p + geom_line(aes(y = data[["b"]], colour = "b"))
# p <- p + geom_line(aes(y = data[["c"]], colour = "c"))
print(p)
Why doesn't plotting in a loop work as expected?
Is this a lazy plotting method?
You don't actually have to loop to get your lines. You just need to reshape your data and actually include x in your data frame. Your data is wide, and ggplot2 likes long data. This is how you can easily make multiple lines in a single plot.
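As an aside, your loop doesn't work because aes() captures its arguments unevaluated: data[[i]] and i are only evaluated when the plot is built, after the loop has finished. By then i holds its final value, so every layer uses the last column and you see only the last line.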
library(ggplot2)
library(tidyr)
data <- data.frame(x = 1:10, a=1:10, b=1:10 * 2, c=1:10 * 3)
df <- gather(data, name, value, -x)
ggplot(df, aes(x = x, y = value, color = name)) +
geom_line()
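If a loop is really needed (for example, when the layers are built dynamically), one option is to inject the current value of i into aes() with !!, so that it is fixed at the time each layer is created. This is a sketch that assumes a ggplot2 version with tidy evaluation support (3.0.0 or later); the x column is added here for illustration.

```r
library(ggplot2)

data <- data.frame(a = 1:10, b = 1:10 * 2, c = 1:10 * 3)
data$x <- seq_len(nrow(data))

p <- ggplot(data, aes(x = x))
for (i in c("a", "b", "c")) {
  # !!i substitutes the current string, so each layer keeps its own column
  p <- p + geom_line(aes(y = .data[[!!i]], colour = !!i))
}
print(p)
```

Reshaping to long format, as shown above, is usually still the simpler route; the loop version is mainly useful when the layers genuinely differ from one another.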

How to plot three point lines using ggplot2 instead of the default plot in R

I have three matrices and I want to plot them using ggplot2. I have the data below.
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
w <- matrix(W[4,])
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
I want to combine the three series into one plot using ggplot2.
Moreover, I want to make the points with different values have different colors.
I'm not quite sure what you're after, here's a guess
Your data...
max <- c(175523.9, 33026.97, 21823.36, 12607.78, 9577.648, 9474.148, 4553.296, 3876.221, 2646.405, 2295.504)
min <- c(175523.9, 33026.97, 13098.45, 5246.146, 3251.847, 2282.869, 1695.64, 1204.969, 852.1595, 653.7845)
w <- c(175523.947, 33026.971, 21823.364, 5246.146, 3354.839, 2767.610, 2748.689, 1593.822, 1101.469, 1850.013)
Slight modification to your base plot code to make it work...
plot(1:10,max,type='b',xlab='Number',ylab='groups',col=3)
points(1:10,min,type='b', col=2)
points(1:10,w,type='b',col=1)
Is this what you meant?
If you want to reproduce this with ggplot2, you might do something like this...
# ggplot likes a long table, rather than a wide one, so reshape the data, and add the 'time' variable explicitly (ie. my_time = 1:10)
require(reshape2)
df <- melt(data.frame(max, min, w, my_time = 1:10), id.var = 'my_time')
# now plot, with some minor customisations...
require(ggplot2); require(scales)
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
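The same reshape can also be done with tidyr instead of reshape2. The sketch below reuses the numbers from the answer; the column names max_w and min_w are chosen here for illustration so that the base functions max() and min() are not masked.

```r
library(ggplot2)
library(tidyr)

# Wide table: one column per series, plus an explicit time index
wide <- data.frame(
  my_time = 1:10,
  max_w = c(175523.9, 33026.97, 21823.36, 12607.78, 9577.648,
            9474.148, 4553.296, 3876.221, 2646.405, 2295.504),
  min_w = c(175523.9, 33026.97, 13098.45, 5246.146, 3251.847,
            2282.869, 1695.64, 1204.969, 852.1595, 653.7845),
  w = c(175523.947, 33026.971, 21823.364, 5246.146, 3354.839,
        2767.610, 2748.689, 1593.822, 1101.469, 1850.013)
)

# Long table: one row per (time, series) pair
long <- pivot_longer(wide, -my_time, names_to = "variable", values_to = "value")

ggplot(long, aes(x = my_time, y = value, colour = variable)) +
  geom_point(size = 3) +
  geom_line() +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()
```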
UPDATE after the question was edited and the example data changed, here's an edit to suit the new example data:
Here's your example data (there's scope for simplification and speed gains here, but that's another question):
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
wss <- NULL
W=matrix(data=NA,ncol=10,nrow=100)
for(j in 1:100){
k=10
for(i in 1: k){
wss[i]=kmeans(x,i)$tot.withinss
}
W[j,]=as.matrix(wss)
}
max_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
max_Wmk[,i]=max(W[,i],na.rm=TRUE)
}
min_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
min_Wmk[,i]=min(W[,i],na.rm=TRUE)
}
w <- matrix(W[4,])
Here's what you need to do to make the three objects into vectors so you can make the data frame as expected:
max_Wmk <- as.numeric(max_Wmk)
min_Wmk <- as.numeric(min_Wmk)
w <- as.numeric(w)
Now reshape and plot as before...
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
And here's the result:

How to use subscripts in ggplot2 legends [R]

Can I use subscripts in ggplot2 legends? I see this question on greek letters in legends and elsewhere, but I can't figure out how to adapt it.
I thought that using expression(), which works in axis labels, would do the trick. But my attempt below fails. Thanks!
library(ggplot2)
library(reshape2)  # provides melt()
temp <- data.frame(a = rep(1:4, each = 100), b = rnorm(4 * 100), c = 1 + rnorm(4 * 100))
names(temp)[2:3] <- c("expression(b[1])", "expression(c[1])")
temp.m <- melt(temp, id.vars = "a")
ggplot(temp.m, aes(x = value, linetype = variable)) + geom_density() + facet_wrap(~ a)
The following should work (remove your line with names(temp) <-...):
ggplot(temp.m, aes(x = value, linetype = variable)) +
geom_density() + facet_wrap(~ a) +
scale_linetype_discrete(breaks=levels(temp.m$variable),
labels=c(expression(b[1]), expression(c[1])))
See help(scale_linetype_discrete) for available customization (e.g. legend title via name=).
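For reference, a self-contained version of this approach might look like the sketch below. It assumes tidyr's pivot_longer() in place of melt(); the seed and the name legend_labels are arbitrary choices made here.

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
temp <- data.frame(a = rep(1:4, each = 100),
                   b = rnorm(4 * 100),
                   c = 1 + rnorm(4 * 100))

# Long format: one row per (a, variable) observation
temp_long <- pivot_longer(temp, c(b, c), names_to = "variable", values_to = "value")

# One plotmath expression per legend entry, in the same order as the breaks
legend_labels <- c(expression(b[1]), expression(c[1]))

p <- ggplot(temp_long, aes(x = value, linetype = variable)) +
  geom_density() +
  facet_wrap(~ a) +
  scale_linetype_discrete(breaks = c("b", "c"), labels = legend_labels)
p
```

The column names stay plain strings in the data; only the legend labels are replaced with expressions.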
If you want to incorporate Greek symbols etc. into the major tick labels, use an unevaluated expression.
For a bar graph, I did the following:
library(ggplot2)
data <- data.frame(names=tolower(LETTERS[1:4]),mean_p=runif(4))
p <- ggplot(data,aes(x=names,y=mean_p))
p <- p + geom_col(colour="black",fill="white")  # geom_col() draws bar heights from y; geom_bar() would count rows
p <- p + xlab("expressions") + scale_y_continuous(expression(paste("Wacky Data")))
p <- p + scale_x_discrete(labels=c(a=expression(paste(Delta^2)),
b=expression(paste(q^n)),
c=expression(log(z)),
d=expression(paste(omega / (x + 13)^2))))
p
