R: Programmatically changing ggplot scale labels to Greek letters with expressions - r

I am trying to change the labels in a ggplot object to Greek symbols for an arbitrary number of labels. Thanks to this post, I can do this manually when I know the number of labels in advance and the number is not too large:
# Simulate data
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
# Create a plot with greek letters for labels
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = c("alpha1" = expression(alpha[1]),
"alpha2" = expression(alpha[2])))
For our purposes, assume I need to change k default labels, where each of the k labels is the pre-fix "alpha" followed by a number 1:k. Their corresponding updated labels would substitute the greek letter for "alpha" and use a subscript. An example of this is below:
# default labels
paste0("alpha", 1:k)
# desired labels
for (i in 1:k) { expression(alpha[i]) }
I was able to hack together the below programmatic solution that appears to produce the desired result thanks to this post:
ggplot(df, aes(x = value, y = name)) + geom_density() +
scale_y_discrete(labels = parse(text = paste("alpha[", 1:length(unique(df$name)), "]")))
However, I do not understand this code and am seeking clarification about:
What is parse() doing here that expression() otherwise would do?
While I understand everything to the right-hand side of =, what is text doing on the left-hand side of the =?

Another option to achieve your desired result would be to add a new column to your data which contains the ?plotmath expression as a string and map this new column on y. Afterwards you could use scales::label_parse() to parse the expressions:
set.seed(123)
df <- data.frame(name = rep(c("alpha1","alpha2"), 50),
value = rnorm(100))
df$label <- gsub("^(.*?)(\\d+)$", "\\1[\\2]", df$name)
library(ggplot2)
library(scales)
ggplot(df, aes(x = value, y = label)) + geom_density() +
scale_y_discrete(labels = scales::label_parse())
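To address the two questions directly, here is a minimal base R sketch. It assumes k = 3, and the variable names (labels_chr, labels_expr, manual) are only illustrative. In parse(text = ...), text is simply a named argument of parse(): it tells parse() to read R code from character strings rather than from a file. expression() instead captures code that is typed out literally, which is why it cannot be built from a vector of strings.

```r
k <- 3

# Build the labels as character strings in plotmath syntax.
labels_chr <- paste0("alpha[", 1:k, "]")

# parse() turns each string into an unevaluated call; `text` is the
# argument that supplies the strings to be parsed.
labels_expr <- parse(text = labels_chr)

# Naming the entries after the default labels ties each expression to
# its break instead of relying on position.
labels_expr <- setNames(labels_expr, paste0("alpha", 1:k))

# The hand-written equivalent: expression() needs every label typed out.
manual <- expression(alpha[1], alpha[2], alpha[3])

# Compare the two routes element by element.
same <- vapply(seq_len(k), function(i) {
  identical(deparse(labels_expr[[i]]), deparse(manual[[i]]))
}, logical(1))
print(same)
```

The named expression vector can then be passed as scale_y_discrete(labels = labels_expr), in the same way as the manually named vector in the question.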

Related

How do I pass a string of symbols for bquote to evaluate in ggplot?

The axis labels vary for a ggplot that I create within a function. Some of the labels have super/subscripts, while others don't. Example:
m.data <- data.frame(x = runif(10), y = runif(10))
x.labs <- c("rain, mm", "light*','~W~m^-2")
for (i in 1:2) {
ggplot(m.data, aes(x = x, y = y)) +
labs(title = bquote(.(x.labs[i])))
}
The title for the graph when i=2 is literally
light*','~W~m^-2
rather than the formatted version of same. With the same result, I also tried moving bquote inside each string.
x.labs <- c("bquote(rain*','~mm)", "bquote(light*','~W~m^-2)")
and
title = x.labs[i]
Of the many questions about ggplot and bquote, none seem to address passing in a symbol like the superscript indicator.
One alternative is to use expression() in your vector of titles instead of bquote().
For example
x.labs <- c("rain, mm", expression("light,"~W~m^-2))
ggplot(m.data, aes(x = x, y = y)) +
labs(title = x.labs[2])
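If the titles have to stay as character strings, another possibility is to write every string in plotmath syntax and convert it with parse(text = ...) at the point of use. The sketch below assumes that each entry of x.labs is valid plotmath (so "rain, mm" is rewritten as "rain*','~mm") and uses small made-up data in place of runif():

```r
library(ggplot2)

m.data <- data.frame(x = c(0.1, 0.5, 0.9), y = c(0.3, 0.7, 0.2))

# Each title is a string that parse() can read as a plotmath expression.
x.labs <- c("rain*','~mm", "light*','~W~m^-2")

# Build one plot per title; parse() converts the string to an expression.
plots <- lapply(x.labs, function(lab) {
  ggplot(m.data, aes(x = x, y = y)) +
    geom_point() +
    labs(title = parse(text = lab))
})
```

Calling print(plots[[2]]) should then draw the title with the exponent typeset rather than as literal text.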

R ggplot custom label axis ticks with other variable

When labelling axis ticks I'd like to add other information from the dataframe in parentheses. For example, using code snippet below I'd like to automatically include the information on Area in parentheses next to the label X. In other words, the label might say 'Chicago (45)' instead of just 'Chicago'. I know I can do this manually by setting labels in scale_x_discrete. However, can I do this automatically? My dataset has a large number of entries so I would like to avoid doing this manually.
dataset <- data.frame(Area = sample(c(NA, 1:100), 3, rep = TRUE),
Y = rnorm(3), X = c("Chicago","New York", "Orlando"))
ggplot(dataset, aes(X, Y)) + geom_point()
You can build a new column with your desired ticks labels:
library(dplyr)
library(stringr)
library(ggplot2)
dataset = dataset %>% mutate(label = str_c(X, " (", Area, ")"))
Then use the column label in the aesthetics
ggplot(dataset, aes(label, Y)) + geom_point()
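A variation that keeps X as the mapped variable is to pass a named character vector to scale_x_discrete(), where the names are the original values and the entries are the new tick labels. This is only a sketch: the fixed Area and Y values and the name tick_labels are made up for illustration.

```r
library(ggplot2)

dataset <- data.frame(Area = c(45, 12, 78),
                      Y = c(0.2, -1.1, 0.6),
                      X = c("Chicago", "New York", "Orlando"))

# Lookup vector: names are the values of X, entries are the tick labels.
tick_labels <- setNames(paste0(dataset$X, " (", dataset$Area, ")"), dataset$X)

p <- ggplot(dataset, aes(X, Y)) +
  geom_point() +
  scale_x_discrete(labels = tick_labels)
```

Because the axis is still built from X, the axis title and any other layers that refer to X are unaffected.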

Extracting the column names in R with lapply

So here is my code
h <- lapply(select(winedata, -quality), function(variable){
return(ggplot(aes(x = variable), data = winedata) +
geom_histogram(bins = 30) + xlab(variable))})
There is one problem: xlab(variable) displays the first value of the column as the x-axis title, and if I use variable[2] it displays the second value of the column instead. How do I get it to use the column name as the x-axis title? names(variable) does not seem to work.
You can use Map:
library(ggplot2)
library(dplyr)
Map(function(var, names){
return(ggplot(iris, aes(x = var)) +
geom_histogram(bins = 30) + xlab(names))},
select(iris, -Species), names(iris)[1:4])
Map is essentially mapply with SIMPLIFY=FALSE, which takes multiple inputs and returns a list.
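An alternative sketch iterates over the column names only and looks each column up inside aes() with the .data pronoun, so the name is available for xlab() without a second argument. It assumes a ggplot2 version that supports .data in aes(); the names num_cols and plots are illustrative.

```r
library(ggplot2)

# Names of the numeric columns to plot.
num_cols <- names(iris)[1:4]

# One histogram per column; nm is the column name as a string.
plots <- lapply(num_cols, function(nm) {
  ggplot(iris, aes(x = .data[[nm]])) +
    geom_histogram(bins = 30) +
    xlab(nm)
})
names(plots) <- num_cols
```

Each plot can then be retrieved by name, for example plots[["Sepal.Width"]].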

R - Interpreting a subscript in a variable used in ggplot

I'm using ggplot to do some multiline plots that are constructed with lots of variables and the use of paste. I have not been able to figure out how to get the subscript 3 in O3 to appear in the following simplified version of the code.
gasSubscript <- "O[3]"
color1 <- paste(gasSubscript,"some additional text")
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10), y = c(10,9,8,7,6,5,4,3,2,1))
testPlot <- ggplot(data = df, aes(x = x)) + geom_line(aes(y = y, color = color1))
color1 contains
"O[3] some additional text"
The legend displays as "O[3] some additional text" rather than with a subscripted 3.
The problem is that you need the label in the scale to be an expression so that, when it is rendered, it is rendered according to the rules of plotmath. However, ggplot works with data.frames and data.frames can not have a column which is a vector of expressions. So the way around this is to store the information as the text (string) version of the plotmath expression and, as the last step for making the labels, turn these into expressions. This can be done because the labels argument to the scale functions can itself be a function which can transform/format the labels.
Putting this together with your example:
color1 <- paste(gasSubscript,"*\" some additional text\"")
This is now in a format that can be made into an expression.
> color1
[1] "O[3] *\" some additional text\""
> cat(color1)
O[3] *" some additional text"
> parse(text=color1)
expression(O[3] *" some additional text")
With that format, you can force the scale to interpret the labels as expressions which will cause them to be rendered as per the rules of plotmath.
testPlot <- ggplot(data = df, aes(x = x)) +
geom_line(aes(y = y, color = color1)) +
scale_colour_discrete(labels = function(x) parse(text=x))
Using the labels function approach works for data which is stored in the data.frame as well, so long as the strings are formatted so that they can be parsed.
DF <- data.frame(x=1:4, y=rep(1:2, times=2),
group = rep(c('O[3]*" some additional text"',
'H[2]*" some different text"'), each = 2))
ggplot(DF, aes(x, y, colour=group)) +
geom_line() +
scale_colour_discrete(labels=function(x) parse(text=x))
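Whether a string "can be parsed" can be checked before plotting. The following base R sketch uses a hypothetical helper, parses_ok(), which is not part of any package; it simply reports whether parse(text = ...) succeeds on a string.

```r
# Hypothetical helper: TRUE if the string is syntactically valid R
# (and therefore usable as a plotmath label), FALSE otherwise.
parses_ok <- function(s) {
  tryCatch({
    parse(text = s)
    TRUE
  }, error = function(e) FALSE)
}

candidates <- c('O[3]*" some additional text"',  # text quoted and joined with *
                "O[3] some additional text")     # bare words after O[3]

ok <- vapply(candidates, parses_ok, logical(1), USE.NAMES = FALSE)
print(ok)
```

The second candidate fails because the bare words after O[3] are not valid R syntax, which is why the plain text is quoted and attached with * in the answer above.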
This should do what I think you want. It took me a little tinkering to get the right order of paste and expression.
require(ggplot2)
test <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10), y = c(10,9,8,7,6,5,4,3,2,1))
colour1 <- "1"
testPlot <- ggplot(data = test, aes(x = x)) + geom_line(aes(y = y, colour = colour1))
testPlot + scale_colour_discrete(labels = c(expression(paste(O[3], "some other text here"))))
It also returns the warning
Warning message:
In is.na(scale$labels) :
is.na() applied to non-(list or vector) of type 'expression'
to which I haven't been able to find an explanation.

How to specify columns in facet_grid OR how to change labels in facet_wrap

I have a large number of data series that I want to plot using small multiples. A combination of ggplot2 and facet_wrap does what I want, typically resulting in a nice little block of 6 x 6 facets.
The problem is that I don't have adequate control over the labels in facet strips. The names of the columns in the data frame are short and I want to keep them that way, but I want the labels in the facets to be more descriptive. I can use facet_grid so that I can take advantage of the labeller function but then there seems to be no straightforward way to specify the number of columns and a long row of facets just doesn't work for this particular task. Am I missing something obvious?
Q. How can I change the facet labels when using facet_wrap without changing the column names? Alternatively, how can I specify the number of columns and rows when using facet_grid?
Code for a simplified example follows. In real life I am dealing with multiple groups each containing dozens of data series, each of which changes frequently, so any solution would have to be automated rather than relying on manually assigning values.
require(ggplot2)
require(reshape)
# Random data with short column names
set.seed(123)
myrows <- 30
mydf <- data.frame(date = seq(as.Date('2012-01-01'), by = "day", length.out = myrows),
aa = runif(myrows, min=1, max=2),
bb = runif(myrows, min=1, max=2),
cc = runif(myrows, min=1, max=2),
dd = runif(myrows, min=1, max=2),
ee = runif(myrows, min=1, max=2),
ff = runif(myrows, min=1, max=2))
# Plot using facet wrap - we want to specify the columns
# and the rows and this works just fine, we have a little block
# of 2 columns and 3 rows
mydf <- melt(mydf, id = c('date'))
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variable, ncol = 2)
print (p1)
# Problem: we want more descriptive labels without changing column names.
# We can change the labels, but doing so requires us to
# switch from facet_wrap to facet_grid
# However, in facet_grid we can't specify the columns and rows...
mf_labeller <- function(var, value){ # lifted bodily from the R Cookbook
value <- as.character(value)
if (var=="variable") {
value[value=="aa"] <- "A long label"
value[value=="bb"] <- "B Partners"
value[value=="cc"] <- "CC Inc."
value[value=="dd"] <- "DD Company"
value[value=="ee"] <- "Eeeeeek!"
value[value=="ff"] <- "Final"
}
return(value)
}
p2 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_grid( ~ variable, labeller = mf_labeller)
print (p2)
I don't quite understand. You've already written a function that converts your short labels to long, descriptive labels. What is wrong with simply adding a new column and using facet_wrap on that column instead?
mydf <- melt(mydf, id = c('date'))
mydf$variableLab <- mf_labeller('variable',mydf$variable)
p1 <- ggplot(mydf, aes(y = value, x = date, group = variable)) +
geom_line() +
facet_wrap( ~ variableLab, ncol = 2)
print (p1)
To change the label names, just change the factor levels of the factor you use in facet_wrap. These will be used in facet_wrap on the strips. You can use a similar setup as you would using the labeller function in facet_grid. Just do something like:
new_labels = sapply(levels(df$factor_variable), custom_labeller_function)
# Use labels (not levels) so the existing values are renamed rather than turned into NA
df$factor_variable = factor(df$factor_variable, labels = new_labels)
Now you can use factor_variable in facet_wrap.
Just add labeller = label_wrap_gen(width = 25, multi_line = TRUE) to the facet_wrap() arguments.
E.g.: ... + facet_wrap( ~ variable, labeller = label_wrap_gen(width = 25, multi_line = TRUE))
More info: ?ggplot2::label_wrap_gen
Simply add labeller = label_both to the facet_wrap() arguments.
... + facet_wrap( ~ variable, labeller = label_both)
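In ggplot2 versions where facet_wrap() accepts a labeller argument, the short-to-long mapping can also be supplied as a named character vector through as_labeller(), which leaves both the column names and the factor levels untouched. The sketch below uses a small hand-made long data frame instead of the melted mydf; long_df and strip_names are illustrative names.

```r
library(ggplot2)

# Small long-format data: three series of five daily values each.
long_df <- data.frame(
  date = rep(seq(as.Date("2012-01-01"), by = "day", length.out = 5), times = 3),
  variable = rep(c("aa", "bb", "cc"), each = 5),
  value = rep(c(1.1, 1.4, 1.2, 1.8, 1.5), times = 3)
)

# Lookup vector: names are the short codes, entries are the strip labels.
strip_names <- c(aa = "A long label", bb = "B Partners", cc = "CC Inc.")

p <- ggplot(long_df, aes(x = date, y = value, group = variable)) +
  geom_line() +
  facet_wrap(~ variable, ncol = 2, labeller = as_labeller(strip_names))
```

Since the lookup vector is ordinary data, it can be generated programmatically when the set of series changes.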
