How to use bquote in combination with ggplot2 geom_label? - r

I've read the following article:
https://trinkerrstuff.wordpress.com/2018/03/15/2246/
Now I'm trying to use the suggested approach with bquote in my plot. However, I can't get it to work. I have the following code:
x <- seq(0, 2*pi, 0.01)
y <- sin(x)
y_median <- median(y)
ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
geom_label(aes(label = bquote("median" ~ y==~.y_median), x = 1, y = y_median))
I get the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘"formula"’ to a data.frame
What am I doing wrong?

ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
geom_label(aes(label = list(bquote("median" ~ y==~.(y_median))),
x = 1, y = y_median),
parse=TRUE)
Key points:
1.) I fixed the missing parentheses in "median" ~ y==~.(y_median). This is discussed in ?bquote help page:
‘bquote’ quotes its
argument except that terms wrapped in ‘.()’ are evaluated in the
specified ‘where’ environment.
2.) Put the bquote inside a list, because aes expects a vector or a list.
3.) Tell geom_label to parse the expression by setting parse = TRUE.
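As a minimal sketch of point 1, independent of ggplot2, the following base R snippet shows how .() splices a value into the quoted expression, and how the result survives a round trip through a string (which is what parse = TRUE relies on). The value of y_median is an arbitrary stand-in, not the one from the question.

```r
# Arbitrary stand-in value; in the question this is median(sin(x)).
y_median <- 0.5

# .() is evaluated; everything else is kept as an unevaluated call.
label_call <- bquote("median" ~ y == .(y_median))

# Without .(), the symbol y_median itself stays in the expression.
unspliced_call <- quote("median" ~ y == y_median)

# Convert the call to a string; parsing that string gives a call back.
label_string <- deparse(label_call)
reparsed_call <- parse(text = label_string)[[1]]
```

Inspecting label_call[[3]][[3]] shows the spliced number, while unspliced_call[[3]][[3]] is still the symbol y_median.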

geom_text and geom_label layers don't work well with expression objects as labels; they expect strings. So instead you can do
ggplot(mapping = aes(x = x, y = y)) +
geom_line() +
annotate("text", label = deparse(bquote("median" ~ y==~.(y_median))), x = 1, y = y_median, parse=TRUE)
We use deparse() to turn it into a string, then use parse=TRUE to have ggplot parse it back into an expression. Also I just used annotate() here since you aren't really mapping this value to your data at all.
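One caveat with deparse(): for long expressions it can split its output across several strings (its width.cutoff argument defaults to 60), while a label needs to be a single string. A hedged sketch of one way to guard against that, using a deliberately long, made-up label:

```r
y_median <- 0.5  # stand-in value for illustration only

# A deliberately long plotmath call (hypothetical label text).
long_call <- bquote(paste("median of the sampled sine curve: ", y == .(y_median),
                          ", computed over the interval from 0 to ", 2 * pi))

# Raise width.cutoff and collapse, so the result is always one string.
label_string <- paste(deparse(long_call, width.cutoff = 500L), collapse = " ")
```

label_string can then be passed as label = label_string together with parse = TRUE, exactly as in the annotate() call above.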

Related

Is there a way to pass the data of a ggplot2 call to the scale_* functions that works with .+gg in one pass [duplicate]

I would like to use a variable of the data frame passed to the data parameter of the ggplot function in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x in the dataframe passed to the data parameter in ggplot in another function scale_x_continuous such as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size= 1000, replace = T)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate its parameters in the data environment. One reason for this is that each layer can have its own data source, so by the time you get to the scales it would no longer be clear which data environment is the "correct" one.
You could write a helper function to initialize the plot with your default. For example
helper <- function(df, col) {
ggplot(data = df, mapping = aes_string(x = col)) +
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
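In recent ggplot2 releases aes_string() is deprecated in favour of the .data pronoun. A sketch of the same helper written that way follows; it assumes ggplot2 3.0.0 or later, and helper_data is a hypothetical name, not part of the original answer.

```r
library(ggplot2)

# Hypothetical variant of helper() using the .data pronoun instead of aes_string().
helper_data <- function(df, col) {
  col_range <- range(df[[col]])
  ggplot(data = df, mapping = aes(x = .data[[col]])) +
    scale_x_continuous(breaks = seq(col_range[1], col_range[2])) +
    labs(x = col)  # set the axis title explicitly from the column name
}

set.seed(2017)
samp <- sample(x = 20, size = 1000, replace = TRUE)
p <- helper_data(data.frame(x = samp), "x") + geom_bar()
```

The call site is unchanged: the column is still passed as a string.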
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or, since you just want breaks at integer values, you can calculate the breaks from the default limits without needing to specify the data at all. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
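To see what the breaks function does, it can be called directly on a pair of limits. The sketch below pulls the anonymous function out under a hypothetical name, integer_breaks, and feeds it made-up limits of the kind a bar chart of the values 1 to 20 might produce.

```r
# Same body as the anonymous function passed to breaks above.
integer_breaks <- function(limits) {
  seq(ceiling(min(limits)), floor(max(limits)))
}

# Made-up limits: bars centred on 1..20 extend a bit less than half a unit
# beyond the outermost values.
example_breaks <- integer_breaks(c(0.55, 20.45))
print(example_breaks)
```

For these limits the result is the integers 1 through 20, with no reference to the data frame itself.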
Another, less-than-ideal alternative is to use the . placeholder from magrittr's %>% pipe in combination with {}.
Enclosing the ggplot call in curly braces allows one to reference . multiple times.
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}

How to define a function that invokes ggplot2 functions and takes a variable name among its arguments?

As an example, suppose that I have this snippet of code:
binwidth <- 0.01
my.histogram <- ggplot(my.data, aes(x = foo, fill = type)) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = "foo", y = "density")
Further, suppose that my.data has many other columns besides foo that could be plotted using pretty much the same code. Therefore, I would like to define a helper function make.histogram, so that I could replace the assignment above with something like:
my.histogram <- make.histogram(foo, binwidth = 0.01)
Actually, this looks a bit weird to me. Would R complain that foo is not defined? Maybe the call would have to be this instead:
my.histogram <- make.histogram("foo", binwidth = 0.01)
Be that as it may, how would one define make.histogram?
For the purpose of this question, make.histogram may treat my.data as a global variable.
Also, note that in the snippet above, foo appears twice, once (as a variable) as the x argument in the first aes call, and once (as a string) as the x argument in the labs call. In other words, the make.histogram function needs to somehow translate the column specified in its first argument into both a variable name and a string.
I'm not sure I understand your question.
Why couldn't you use aes_string() and define a function like the one below?
make.histogram <- function(variable) {
p <- ggplot(my.data, aes_string(x = variable, fill = "type")) + (...) + xlab(variable)
print(p)
}
Since ggplot is part of the tidyverse, I think tidyeval will come in handy:
make.histogram <- function(var = "foo", binwidth = 0.01) {
varName <- as.name(var)
enquo_varName <- enquo(varName)
ggplot(my.data, aes(x = !!enquo_varName, fill = type)) +
...
labs(x = var)
}
Basically, with as.name() we generate a name object that matches var (here var is a string like "foo"). Then, following Programming with dplyr, we use enquo() to look at that name and return the associated value as a quosure. This quosure can then be unquoted inside the ggplot() call using !!.
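For a column name given as a string, a shorter route than as.name() plus enquo() is the .data pronoun. The following is a sketch, not the answerer's code: my.data here is simulated stand-in data, make_histogram is a hypothetical name, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Simulated stand-in for my.data (the real data is not shown in the question).
set.seed(1)
my.data <- data.frame(
  foo  = runif(200),
  type = rep(c("a", "b"), each = 100)
)

# Hypothetical helper: `var` is a column name given as a string.
make_histogram <- function(var = "foo", binwidth = 0.01) {
  ggplot(my.data, aes(x = .data[[var]], fill = type)) +
    geom_histogram(aes(y = after_stat(density)),
                   binwidth = binwidth,
                   position = "identity",
                   alpha = 0.5) +
    labs(x = var, y = "density")
}

my.histogram <- make_histogram("foo", binwidth = 0.01)
```

Because var is already a string, it can be reused directly for the axis label.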
After reading the material that @andrew.punnett linked in his comment, it was very easy to code the desired function:
make.histogram <- function(column.name, binwidth = 0.02) {
base.aes <- eval(substitute(aes(x = column.name, fill = type)))
x.label <- deparse(substitute(column.name))
ggplot(my.data, base.aes) +
geom_histogram(binwidth = binwidth,
aes(y = ..density..),
position = "identity",
alpha = 0.5) +
lims(x = c(0 - binwidth, 1 + binwidth), y = c(0, 100)) +
labs(x = x.label, y = "density")
}
my.histogram <- make.histogram(foo, binwidth = 0.01)
The benefit of this solution is its generality: it relies only on base R functions (substitute, eval, and deparse), so it can be easily ported to situations outside of the ggplot2 context.
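As an illustration of that portability, here is a small sketch of the same substitute/deparse/eval pattern with no ggplot2 involved. The function name describe_column is hypothetical, and mtcars is used only because it ships with R.

```r
# Hypothetical helper: takes an unquoted column name, returns its label and mean.
describe_column <- function(data, column.name) {
  col_expr  <- substitute(column.name)   # capture the unevaluated argument
  col_label <- deparse(col_expr)         # the column name as a string
  values    <- eval(col_expr, envir = data, enclos = parent.frame())
  list(label = col_label, mean = mean(values))
}

result <- describe_column(mtcars, mpg)
```

Here result$label is the string "mpg" and result$mean is the mean of that column, both derived from the single unquoted argument.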

geom_text / geom_label with the bquote function

My data :
dat <- data_frame(x = c(1,2,3,4,5,6), y = c(2,2,2,6,2,2))
I wish to display this expression beside the point (x=4,y=6) :
expression <- bquote(paste(frac(a[z], b[z]), " = ", .(dat[which.max(dat$y),"y"] %>% as.numeric())))
But, when I am using this expression with ggplot :
ggplot() +
geom_point(data = dat, aes(x = x, y = y)) +
geom_label(data = dat[which.max(dat$y),], aes(x = x, y = y, label = expression))
I get this error message :
Error: Aesthetics must be either length 1 or the same as the data (1): label
As an aside, unrelated to your question: it is generally cleaner to define the aesthetics once in the ggplot() call so that subsequent layers reuse them. Where needed, a layer can still override them, as geom_label does below with its own data.
You could use the following code (keeping your definitions of the data and the expression):
ggplot(data = dat, aes(x = x, y = y)) +
geom_point() +
geom_label(data = dat[4,], label = deparse(expression), parse = TRUE,
hjust = 0, nudge_x = .1)
hjust and nudge_x are used to position the label relative to the point. You may also want to use nudge_y so that the whole label stays inside the plot area.
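An alternative sketch that avoids bquote() and deparse() altogether is to build the plotmath label as a string and store it in a column, which geom_label then parses. This is one possible variation, not the answerer's code; the column name label is made up, and == is plotmath's notation for an equals sign.

```r
library(ggplot2)

dat <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(2, 2, 2, 6, 2, 2))

# Row with the largest y, plus a plotmath string describing it.
top <- dat[which.max(dat$y), ]
top$label <- paste0("frac(a[z], b[z]) == ", top$y)

p <- ggplot(data = dat, aes(x = x, y = y)) +
  geom_point() +
  geom_label(data = top, aes(label = label), parse = TRUE,
             hjust = 0, nudge_x = 0.1)
```

Keeping the label in a data frame column also scales naturally to several labelled points, one string per row.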
yielding this plot:
Please let me know whether this is what you want.

Use stat_summary to annotate plot with number of observations

How can I use stat_summary to label a plot with n = x, where x is a variable? Here's an example of the desired output:
I can make that above plot with this rather inefficient code:
nlabels <- sapply(1:length(unique(mtcars$cyl)), function(i) as.vector(t(as.data.frame(table(mtcars$cyl))[,2][[i]])))
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
geom_text(aes(x = 1, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[1]]), label = paste0("n = ",nlabels[[1]]) )) +
geom_text(aes(x = 2, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[2]]), label = paste0("n = ",nlabels[[2]]) )) +
geom_text(aes(x = 3, y = median(mtcars$mpg[mtcars$cyl==sort(unique(mtcars$cyl))[3]]), label = paste0("n = ",nlabels[[3]]) ))
This is a follow up to this question: How to add a number of observations per group and use group mean in ggplot2 boxplot? where I can use stat_summary to calculate and display the number of observations, but I haven't been able to find a way to include n = in the stat_summary output. Seems like stat_summary might be the most efficient way to do this kind of labelling, but other methods are welcome.
You can write your own function to use inside stat_summary(). Here n_fun computes the y position as the median() of the group and builds a label consisting of "n = " followed by the number of observations. It is important to return a data.frame() rather than c(): paste0() produces a character value while y must stay numeric, and c() would coerce both to character. Then pass this function to stat_summary() with geom = "text". This ensures that, for each x value, the position and label are computed only from that level's data.
n_fun <- function(x){
return(data.frame(y = median(x), label = paste0("n = ",length(x))))
}
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
stat_summary(fun.data = n_fun, geom = "text")
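To check what the text layer will be given, n_fun can be applied by hand to each group's mpg values. This base R sketch only mimics the per-group call; it is not how stat_summary() is implemented internally.

```r
n_fun <- function(x) {
  data.frame(y = median(x), label = paste0("n = ", length(x)))
}

# Split mpg by cylinder count and summarise each group separately.
per_group <- lapply(split(mtcars$mpg, mtcars$cyl), n_fun)
summary_table <- do.call(rbind, per_group)
print(summary_table)
```

Each row of summary_table holds a numeric y position and a character label for one cylinder group, which is the shape geom = "text" needs.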
Most things in R are vectorized, so you can leverage that.
nlabels <- table(mtcars$cyl)
# To create the median labels, you can use by
meds <- c(by(mtcars$mpg, mtcars$cyl, median))
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
geom_text(data = data.frame(), aes(x = names(meds) , y = meds,
label = paste("n =", nlabels)))
Regarding nlabels:
Instead of your sapply statement, you can simply use:
nlabels <- table(mtcars$cyl)
Notice that your current code is taking the above, converting it, transposing it, then iterating over each row only to grab the values one by one, then put them back together into a single vector.
If you really want them as an un-dimensioned integer vector, use c()
nlabels <- c(table(mtcars$cyl))
but of course, even this is not needed to accomplish the above.
