geom_text / geom_label with the bquote function - r

My data:
library(ggplot2); library(dplyr)
dat <- tibble(x = c(1,2,3,4,5,6), y = c(2,2,2,6,2,2))
I wish to display this expression next to the point (x = 4, y = 6):
expression <- bquote(paste(frac(a[z], b[z]), " = ", .(dat[which.max(dat$y),"y"] %>% as.numeric())))
But when I use this expression with ggplot:
ggplot() +
geom_point(data = dat, aes(x = x, y = y)) +
geom_label(data = dat[which.max(dat$y),], aes(x = x, y = y, label = expression))
I get this error message:
Error: Aesthetics must be either length 1 or the same as the data (1): label

You could use the following code (keeping your definitions of the data and the expression).
Not related to your question, but: it is usually better to define shared aesthetics in the ggplot() call so that subsequent layers reuse them. If needed, you can override them (or the data) in an individual layer, as done below in geom_label.
ggplot(data = dat, aes(x = x, y = y)) +
geom_point() +
geom_label(data = dat[4,], label = deparse(expression), parse = TRUE,
hjust = 0, nudge_x = .1)
hjust and nudge_x position the label relative to the point. You could also use nudge_y to keep the whole label inside the plotting area.
Please let me know whether this is what you want.
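For reference, a self-contained sketch of the same fix. Storing the deparsed string in a column of the one-row data frame (named `lab` here, purely illustrative) and mapping it with aes() is my choice, and base data.frame() is used instead of the deprecated data_frame().

```r
library(ggplot2)

dat <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(2, 2, 2, 6, 2, 2))
top <- dat[which.max(dat$y), ]  # one-row data frame holding the highest point

# Build the plotmath call with bquote(), then turn it into a single string:
# geom_label() stores labels as character and re-parses them when parse = TRUE.
top$lab <- paste(deparse(bquote(paste(frac(a[z], b[z]), " = ", .(top$y)))),
                 collapse = "")

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_label(data = top, aes(label = lab), parse = TRUE,
             hjust = 0, nudge_x = 0.1)
```

Printing `p` draws the plot; the label is only parsed into an expression at that point.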

Related

How to define a function that invokes ggplot2 functions and takes a variable name among its arguments?

As an example, suppose that I have this snippet of code:
binwidth <- 0.01
my.histogram <- ggplot(my.data, aes(x = foo, fill = type)) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = "foo", y = "density")
Further, suppose that my.data has many other columns besides foo that could be plotted using pretty much the same code. Therefore, I would like to define a helper function make.histogram, so that I could replace the assignment above with something like:
my.histogram <- make.histogram(foo, binwidth = 0.01)
Actually, this looks a bit weird to me. Would R complain that foo is not defined? Maybe the call would have to be this instead:
my.histogram <- make.histogram("foo", binwidth = 0.01)
Be that as it may, how would one define make.histogram?
For the purpose of this question, make.histogram may treat my.data as a global variable.
Also, note that in the snippet above, foo appears twice: once (as a variable) as the x argument in the first aes call, and once (as a string) as the x argument in the labs call. In other words, the make.histogram function somehow needs to translate the column specified in its first argument into both a variable name and a string.
I'm not sure I understand your question.
Why not use aes_string() and define a function like the one below?
make.histogram <- function(variable) {
  p <- ggplot(my.data, aes_string(x = variable, fill = "type")) +
    geom_histogram(position = "identity", alpha = 0.5) + # plus the other layers from the question
    xlab(variable)
  print(p)
}
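aes_string() has been deprecated since ggplot2 3.0.0 in favour of tidy evaluation inside aes(). A minimal sketch of the same helper using the `.data[[variable]]` pronoun; the toy `my.data` (columns `foo`, `bar`, `type`) is invented for illustration, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for my.data: two numeric columns and a grouping column.
set.seed(1)
my.data <- data.frame(
  foo  = runif(200),
  bar  = runif(200),
  type = sample(c("a", "b"), 200, replace = TRUE)
)

make.histogram <- function(variable, binwidth = 0.01) {
  # .data[[variable]] selects the column whose name is stored in `variable`
  ggplot(my.data, aes(x = .data[[variable]], fill = type)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   position = "identity", alpha = 0.5) +
    labs(x = variable, y = "density")
}

p.foo <- make.histogram("foo", binwidth = 0.05)
p.bar <- make.histogram("bar", binwidth = 0.05)
```

Because the column name arrives as a string, the same value can be passed straight to labs(), which covers the "variable name and string" requirement from the question.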
Since ggplot is part of the tidyverse, I think tidyeval will come in handy:
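make.histogram <- function(var = "foo", binwidth = 0.01) {
  varName <- as.name(var)
  enquo_varName <- enquo(varName)
  ggplot(my.data, aes(x = !!enquo_varName, fill = type)) +
    geom_histogram(binwidth = binwidth, aes(y = ..density..), # layers from the question
                   position = "identity", alpha = 0.5) +
    labs(x = var)
}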
Basically, with as.name() we generate a name object that matches var (here var is a string like "foo"). Then, following Programming with dplyr, we use enquo() to look at that name and return the associated value as a quosure. This quosure can then be unquoted inside the ggplot() call using !!.
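If the helper should accept either a bare column name (`foo`) or a string (`"foo"`), as the question wonders, rlang::ensym() handles both: it converts a string argument to a symbol. A sketch under that assumption; the toy `my.data` is invented for illustration, and rlang is available because ggplot2 depends on it.

```r
library(ggplot2)

# Hypothetical stand-in for my.data
set.seed(1)
my.data <- data.frame(foo  = runif(200),
                      type = sample(c("a", "b"), 200, replace = TRUE))

make.histogram <- function(var, binwidth = 0.01) {
  # ensym() captures the argument as a symbol; "foo" becomes the symbol foo
  var <- rlang::ensym(var)
  ggplot(my.data, aes(x = !!var, fill = type)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   position = "identity", alpha = 0.5) +
    labs(x = rlang::as_string(var), y = "density")  # symbol back to a string
}

p1 <- make.histogram(foo, binwidth = 0.05)
p2 <- make.histogram("foo", binwidth = 0.05)
```

Both calls should build the same layer data, so callers can use whichever style they prefer.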
After reading the material that @andrew.punnett linked in his comment, it was very easy to code the desired function:
make.histogram <- function(column.name, binwidth = 0.02) {
base.aes <- eval(substitute(aes(x = column.name, fill = type)))
x.label <- deparse(substitute(column.name))
ggplot(my.data, base.aes) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = x.label, y = "density")
}
my.histogram <- make.histogram(foo, binwidth = 0.01)
The benefit of this solution is its generality: it relies only on base R functions (substitute, eval, and deparse), so it can be easily ported to situations outside of the ggplot2 context.
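To illustrate the portability point, here is the same substitute/deparse/eval toolkit with no ggplot2 involved. `column.summary` is a hypothetical helper: it takes a bare column name, evaluates it inside the data frame, and reuses the column's name as a label.

```r
# Hypothetical helper: takes a bare column name, evaluates it inside the
# data frame, and reuses the column's name as a label.
column.summary <- function(df, column.name) {
  expr   <- substitute(column.name)        # unevaluated argument, e.g. the symbol foo
  label  <- deparse(expr)                  # its text, e.g. "foo"
  values <- eval(expr, df, parent.frame()) # columns of df are found first
  list(label = label, mean = mean(values))
}

toy <- data.frame(foo = c(1, 2, 3), bar = c(10, 20, 30))
column.summary(toy, foo)  # label "foo", mean 2
column.summary(toy, bar)  # label "bar", mean 20
```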

How to use bquote in combination with ggplot2 geom_label?

I've read the following article:
https://trinkerrstuff.wordpress.com/2018/03/15/2246/
Now I'm trying to use the suggested approach with bquote in my plot. However, I can't get it to work. I have the following code:
x <- seq(0, 2*pi, 0.01)
y <- sin(x)
y_median <- median(y)
ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
geom_label(aes(label = bquote("median" ~ y==~.y_median), x = 1, y = y_median))
I get the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘"formula"’ to a data.frame
What am I doing wrong?
ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
geom_label(aes(label = list(bquote("median" ~ y==~.(y_median))),
x = 1, y = y_median),
parse=TRUE)
Key points:
1.) I added the missing parentheses: the term must be written "median" ~ y==~.(y_median), i.e. .(y_median) rather than .y_median. This is described on the ?bquote help page:
‘bquote’ quotes its argument except that terms wrapped in ‘.()’ are evaluated in the specified ‘where’ environment.
2.) Put the bquote inside a list, because aes expects a vector or a list.
3.) Tell geom_label to parse the expression by setting its parse parameter to TRUE.
ggplot2 does not handle expressions well as labels in geom_text or geom_label layers; it expects strings. So instead you can do
ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
annotate("text", label = deparse(bquote("median" ~ y==~.(y_median))), x = 1, y = y_median, parse=TRUE)
We use deparse() to turn the expression into a string, then set parse = TRUE so that ggplot parses it back into an expression. Also, I used annotate() here since you aren't really mapping this value to your data.
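A sketch of the same string round trip with a label box, since the question asked about geom_label: annotate("label", ...) draws a labelled box without mapping any data. Rounding the median to three decimals is my addition, and the paste(collapse = "") guards against deparse() splitting a long call into several strings.

```r
library(ggplot2)

x <- seq(0, 2 * pi, 0.01)
y <- sin(x)
y_median <- median(y)

# bquote() splices the rounded value into the plotmath call; deparse() turns
# that call into text. Long calls can deparse to several strings, so collapse.
lab <- paste(deparse(bquote("median" ~ y == ~.(round(y_median, 3)))),
             collapse = "")

p <- ggplot(mapping = aes(x = x, y = y)) +
  geom_line() +
  annotate("label", label = lab, x = 1, y = y_median, parse = TRUE)
```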

ggplot: remove NA factor level in legend

How can I omit the NA level of a factor from a legend?
From the nycflights13 database, I created a new continuous variable called tot_delay, and then created a factor called delay_class with 4 levels. When I plot, I filter out NA values, but they still appear in the legend. Here's my code:
library(nycflights13); library(ggplot2); library(dplyr)  # dplyr provides filter() and %>%
flights$tot_delay = flights$dep_delay + flights$arr_delay
flights$delay_class <- cut(flights$tot_delay,
c(min(flights$tot_delay, na.rm = TRUE), 0, 20 , 120,
max(flights$tot_delay, na.rm = TRUE)),
labels = c("none", "short","medium","long"))
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
The parent example isn't a good illustration of the problem (of course unexpected NA values should be tracked down and eliminated), but this is the top result on Google, so it should be noted that there is now an option in scale_XXX_XXX to prevent NA levels from displaying in the legend by setting na.translate = F. For example:
# default
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4)
# with na.translate = F
ggplot(data = data.frame(x = c(1,2,NA), y = c(1,1,NA), a = c("A","B",NA)),
aes(x, y, colour = a)) + geom_point(size = 4) +
scale_colour_discrete(na.translate = F)
This works in ggplot2 3.1.0.
You have one data point where delay_class is NA, but tot_delay isn't. This point is not being caught by your filter. Changing your code to:
filter(flights, !is.na(delay_class)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
does the trick.
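Why is that one point NA? cut() builds right-closed intervals (a, b] by default, so a value equal to the lowest break (here the minimum of tot_delay) falls outside every interval. A small base-R sketch with made-up values; include.lowest = TRUE is one way to keep that point.

```r
delay  <- c(-10, -5, 0, 10, 50, 200)  # made-up values; -10 is the minimum
breaks <- c(min(delay), 0, 20, 120, max(delay))
labels <- c("none", "short", "medium", "long")

by.default  <- cut(delay, breaks, labels = labels)
# NA none none short medium long   (the minimum is dropped)

with.lowest <- cut(delay, breaks, labels = labels, include.lowest = TRUE)
# none none none short medium long
```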
Alternatively, if you absolutely must have that extra point, you can override the fill legend as follows:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_manual( breaks = c("none","short","medium","long"),
values = scales::hue_pal()(4) )
UPDATE: As pointed out in @gatsky's answer, all discrete scales also include the na.translate argument. The feature has actually existed since ggplot2 2.2.0; I just wasn't aware of it when I posted my answer. For completeness, its usage in the original question would look like this:
filter(flights, !is.na(tot_delay)) %>%
ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill") +
scale_fill_discrete(na.translate=FALSE)
I like @Artem's method above, i.e., getting to the bottom of why there are NAs in your data frame. However, sometimes you know there are NAs and you just want to exclude them. In that case, simply using na.omit() should work:
na.omit(flights) %>% ggplot() +
geom_bar(mapping = aes(x = carrier, fill = delay_class), position = "fill")
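One caveat: na.omit() drops every row that has an NA in any column, so if other columns contain NAs it removes those rows too. A toy base-R comparison (the data frame is made up) against filtering on the one column that matters:

```r
df <- data.frame(
  carrier     = c("AA", "UA", "DL", "B6"),
  delay_class = factor(c("short", NA, "long", "none")),
  tailnum     = c(NA, "N1", "N2", "N3")  # unrelated column with an NA
)

nrow(na.omit(df))                   # 2: also drops the NA tailnum row
nrow(df[!is.na(df$delay_class), ])  # 3: drops only the NA delay_class row
```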

How to suppress warnings when plotting with ggplot

When passing missing values to ggplot, it kindly warns us that they are present. This is acceptable in an interactive session, but when writing reports you do not want the output cluttered with warnings, especially if there are many of them. The example below has one label missing, which produces a warning.
library(ggplot2)
library(reshape2)
mydf <- data.frame(
species = sample(c("A", "B"), 100, replace = TRUE),
lvl = factor(sample(1:3, 100, replace = TRUE))
)
labs <- melt(with(mydf, table(species, lvl)))
names(labs) <- c("species", "lvl", "value")
labs[3, "value"] <- NA
ggplot(mydf, aes(x = species)) +
geom_bar() +  # stat_bin() errors on a discrete x in current ggplot2
geom_text(data = labs, aes(x = species, y = value, label = value), vjust = -0.5) +
facet_wrap(~ lvl)
If we wrap suppressWarnings around the last expression, we get a summary of how many warnings there were. For the sake of argument, let's say that this isn't acceptable (but is indeed very honest and correct). How to (completely) suppress warnings when printing a ggplot2 object?
You need to suppressWarnings() around the print() call, not the creation of the ggplot() object:
suppressWarnings(print(
  ggplot(mydf, aes(x = species)) +
    geom_bar() +
    geom_text(data = labs, aes(x = species, y = value,
                               label = value), vjust = -0.5) +
    facet_wrap(~ lvl)))
It might be easier to assign the final plot to an object and then print().
plt <- ggplot(mydf, aes(x = species)) +
  geom_bar() +
  geom_text(data = labs, aes(x = species, y = value,
                             label = value), vjust = -0.5) +
  facet_wrap(~ lvl)
suppressWarnings(print(plt))
The reason for the behaviour is that the warnings are only generated when the plot is actually drawn, not when the object representing the plot is created. R will auto print during interactive usage, so whilst
R> suppressWarnings(plt)
Warning message:
Removed 1 rows containing missing values (geom_text).
doesn't work because, in effect, you are calling print(suppressWarnings(plt)), whereas
R> suppressWarnings(print(plt))
R>
does work because suppressWarnings() can capture the warnings arising from the print() call.
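The deferred-warning behaviour can be reproduced without ggplot2. In this sketch `toy_plot`, `print.toy_plot` and `count_warnings` are hypothetical names: a toy S3 object whose print method warns stands in for a ggplot object, and a small counter built on withCallingHandlers() shows where the warning is actually raised.

```r
# A toy S3 object whose print method warns, mimicking a plot that only
# emits warnings when it is drawn.
toy_plot <- structure(list(), class = "toy_plot")
print.toy_plot <- function(x, ...) {
  warning("Removed 1 rows containing missing values")
  invisible(x)
}

# Count the warnings raised while evaluating `expr`, muffling each one.
count_warnings <- function(expr) {
  n <- 0
  withCallingHandlers(expr, warning = function(w) {
    n <<- n + 1
    invokeRestart("muffleWarning")
  })
  n
}

count_warnings(toy_plot)                          # 0: returning the object is silent
count_warnings(print(toy_plot))                   # 1: drawing raises the warning
count_warnings(suppressWarnings(print(toy_plot))) # 0: suppressed where it is raised
```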
A more targeted, plot-by-plot approach is to add na.rm = TRUE to the layer that receives the missing values.
E.g.:
ggplot(mydf, aes(x = species)) +
geom_bar() +
geom_text(data = labs, aes(x = species, y = value,
label = value), vjust = -0.5, na.rm = TRUE) +
facet_wrap(~ lvl)
In your question, you mention report writing, so it might be better to set the global warning level:
options(warn=-1)
The default is:
options(warn=0)
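Setting options(warn = -1) globally also hides warnings you may want to see later in the same session. A sketch of a more contained variant, assuming the report code can be wrapped in a function: options() returns the previous settings, so they can be restored with on.exit(). `run_quietly` is a hypothetical helper name.

```r
# Hypothetical helper: run f() with warnings disabled, then restore the
# previous warning level even if f() throws an error.
run_quietly <- function(f) {
  old <- options(warn = -1)          # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  f()
}

before <- getOption("warn")
run_quietly(function() warning("this warning is ignored"))
identical(getOption("warn"), before)  # TRUE: global setting untouched
```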

stacking geom_ribbon

I am trying to use geom_ribbon to mimic the behavior of geom_area, but I am not successful. Do you have any hint as to why the following does not work?
I used Hadley's statement from the ggplot2 geom_area documentation page:
"An area plot is a special case of geom_ribbon, where the minimum of the range is fixed to 0, and the position adjustment defaults to position_stacked."
test <- expand.grid(Param = LETTERS[1:3], x = 1:5)
test$y <- test$x
# Ok
p <- ggplot(test)
p <- p + geom_area(aes(x = x, y = y, group = Param, fill = Param), alpha = 0.3)
p
# not ok - initial idea
p <- ggplot(test)
p <- p + geom_ribbon(aes(x = x, ymin = 0, ymax = y, group = Param, fill = Param), alpha = 0.3, position = position_stack())
p
Further, how can I look at the code of functions written the way the geom_XXX functions are?
My usual approach gives the following, which is not very useful:
> geom_ribbon
function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
na.rm = FALSE, ...)
GeomRibbon$new(mapping = mapping, data = data, stat = stat, position = position,
na.rm = na.rm, ...)
Thanks for your help
Regards
Pascal
You just didn't map a variable to y in your geom_ribbon call. Adding y = y makes it work for me. In general, geom_ribbon doesn't require a y aesthetic, but I believe it does when stacking. I presume there's a well-thought-out reason for that, but you never know...
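If stacking still does not behave as expected in your ggplot2 version, one version-independent alternative is to compute the stacked bounds yourself and draw the ribbons without any position adjustment. A sketch; the stacking order (A at the bottom) is my choice and may differ from geom_area()'s default.

```r
library(ggplot2)

test <- expand.grid(Param = LETTERS[1:3], x = 1:5)
test$y <- test$x

# Sort so that, within each x, the Params appear in a fixed order,
# then accumulate y to get each ribbon's upper bound.
test <- test[order(test$x, test$Param), ]
test$ymax <- ave(test$y, test$x, FUN = cumsum)
test$ymin <- test$ymax - test$y  # lower bound = top of the ribbon below

p <- ggplot(test, aes(x = x, ymin = ymin, ymax = ymax, fill = Param)) +
  geom_ribbon(alpha = 0.3)       # position defaults to "identity"
```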
Also, all the source code for ggplot2 is on GitHub (in the tidyverse/ggplot2 repository).
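The listing in the question comes from an older, proto-based ggplot2. In ggproto-based releases (2.0.0 and later), geom_ribbon() only builds a layer, and the drawing code lives in the exported ggproto object GeomRibbon. A sketch of how one might inspect it; reading the inner function through environment() relies on how ggproto wraps its methods, so treat it as an implementation detail rather than a stable API.

```r
library(ggplot2)

# geom_ribbon() is a thin constructor; the drawing logic lives in the
# ggproto object GeomRibbon.
geom_ribbon            # prints the constructor
GeomRibbon             # lists the object's fields and methods
GeomRibbon$draw_group  # prints one method (wrapper plus inner function)

# The inner function of a ggproto method is kept in the wrapper's environment.
inner <- environment(GeomRibbon$draw_group)$f
```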
