Functions and non-standard evaluation in dplyr - r

I just finished reading 'Programming with dplyr' and 'Define aesthetic mappings programatically' to start to get a grip on non-standard evaluation of functions. The specific question for this post is, "How do I write the code directly below using the tidyverse (eg quo(), !!, etc.)" instead of the base-R approach eval(), substitute, etc..
library(tidyverse)
xy <- data.frame(xvar = 1:10, yvar = 11:20)
plotfunc <- function(data, x, y){
y.sqr <- (eval(substitute(y), envir = data))^2
print(
ggplot(data, aes_q(x = substitute(x), y = substitute(y.sqr))) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
Can you provide the answer? It would be a bonus if you could work in the following concept, being, why is the function above non-standard whereas this other function below is standard? I read the Advanced R chapters on functions and non-standard evaluation, but it's above my head at this point. Can you explain in layperson terms? The function below is clear and concise (to me) whereas the function above is a hazy mess.
rescale01 <- function(x) {
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(0, 5, 10))

You could do the following :
library(tidyverse)
xy <- data.frame(xvar = 1:10, yvar = 11:20)
plotfunc <- function(data, x, y){
x <- enquo(x)
y <- enquo(y)
print(
ggplot(data, aes(x = !!x, y = (!!y)^2)) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
Non standard evaluation basically means that you're passing the argument as an expression rather than a value. quo and enquo also associate an evaluation environment to this expression.
Hadley Wickham introduces it like this in his book :
In most programming languages, you can only access the values of a
function’s arguments. In R, you can also access the code used to
compute them. This makes it possible to evaluate code in non-standard
ways: to use what is known as non-standard evaluation, or NSE for
short. NSE is particularly useful for functions when doing interactive
data analysis because it can dramatically reduce the amount of typing.

With rlang_0.4.0, we can use the tidy-evaluation operator ({{...}}) or curly-curly which abstracts quote-and-unquote into a single interpolation step. This makes it easier to create functions
library(rlang)
library(ggplot2)
plotfunc <- function(data, x, y){
print(
ggplot(data, aes(x = {{x}}, y = {{y}}^2)) +
geom_line()
)
}
plotfunc(xy, xvar, yvar)
-output

Related

What do the new tidy eval braces {{ }} do? Passing arguments across functions and using as_label

While looking to fix an issue with passing arguments from a dplyr-based function to a ggplot function recently, I was surprised to discover that there was a brand-new programming with dplyr vignette and a corresponding version for ggplot2 in packages. I was hoping to kill two birds with one stone: learning the new tidyeval incantations and getting rid of my problem.
I wanted a function to do custom plotting, which may occasionally be called by another function that does some pre-processing of the data supplied; but it was failing as follows:
library(ggplot2)
library(dplyr)
my_plot <- function(df, x_var, colour_var = "cyl") {
char_col <- df %>%
pull(colour_var) %>%
is.character()
if(!char_col ) cat(colour_var, "is not character.")
ggplot(df) +
geom_point(aes(x = !! ensym(x_var), y = hp, colour = !!ensym(colour_var))) +
labs(title = paste("Passed in x variable:", x_var))
}
process_n_plot <- function(x_var, val, colour_var) {
cat("You are filtering variable", x_var, "\n")
mtcars %>%
filter(!!ensym(x_var) > val) %>%
my_plot(x_var = x_var, colour_var = colour_var)
}
process_n_plot("disp", 200, "cyl")
#> You are filtering variable "disp"
#> cyl is not character.
my_plot(mtcars, "disp", "cyl")
#> cyl is not character.
I realise I could have just used aes_string... But I was actually operating in ggraph and forgot 'cause I'd never used aes_string there. Also, having strings as arguments was me assuming it would the most straightforward way, but still preferred to call the functions with unquoted variable names.
So, things worked when calling my_plot directly; and almost worked when called "indirectly". The vignettes didn't quite cover these use cases, so I had to test.
However, replacing !!ensym(x_var) with {{x_var}} above does not work; neither does the naïve approach below with bare variable names. {{}} as per the vignettes seems to combine the steps of enquo(s) and !!(!) but that poses a problem when trying to use something like as_label/as_string, which want you to enquo but not !!.
library(ggplot2)
library(dplyr)
library(rlang)
my_plot <- function(df, x_var, colour_var = "cyl") {
discrete_col <- df %>%
pull(colour_var) %>%
is.character()
if(!discrete_col) cat(colour_var, "is not character.")
ggplot(df) +
geom_point(aes(x = {{ x_var }}, y = hp, colour = {{ colour_var }})) +
labs(title = paste("Passed in x variable:", as_label({{ x_var }})))
}
process_n_plot <- function(x_var, val, colour_var) {
cat("You are filtering variable", as_label( {{ x_var }} ), "\n")
mtcars %>%
filter({{ x_var }} > val) %>%
my_plot(x_var = x_var, colour_var = colour_var)
}
process_n_plot(disp, 200, cyl)
#> Error in is_quosure(quo): object 'disp' not found
my_plot(mtcars, disp, cyl)
#> Error: object 'cyl' not found
Note that removing the labs does make my_plot work fine, as expected from the vignette.
While waiting for the real experts to explain some of the gory details, a solution is as follows:
library(ggplot2)
library(dplyr)
library(rlang)
my_plot <- function(df, x_var, colour_var = cyl) {
char_col <- df %>%
pull({{colour_var}}) %>%
is.character()
if(!char_col) cat(as_label( enquo(colour_var)), "is not character.")
ggplot(df) +
geom_point(aes(x = {{ x_var }}, y = hp, colour = {{ colour_var }})) +
labs(title = paste("Passed in x variable:", as_label( enquo(x_var))))
}
process_n_plot <- function(x_var, val, colour_var) {
cat("You are filtering variable", as_label( enquo(x_var) ), "\n")
mtcars %>%
filter({{ x_var }} > val) %>%
my_plot(x_var = {{ x_var }}, colour_var = {{ colour_var }})
}
process_n_plot(disp, 200, cyl)
#> You are filtering variable disp
#> cyl is not character.
my_plot(mtcars, disp, cyl)
#> cyl is not character.
In the end, you can enquo so that as_label works and, surprisingly, {{}} will still know what to do. The same applies if you use ensym and as_string.
You could also simply do:
geom_point(aes(x = {{ x_var }} , y = hp, colour = {{colour_var}})) +
labs(title = paste("Passed in x variable:", as_label( enquo(x_var) )))
Using {{passing_arg}} (same as using !!enquo(passing_arg)) solves the passing of arguments across functions.
As to why the original function was only working for colour_var, for reasons I still don't quite understand, ensym() was interpreting the x_var argument promise as a simple string. Once you had transformed the promise, e.g. by using it in another function call as I did with colour_var, then it worked.
You are basically correct, the {{ }} syntax is a shortcut to !!enquo(). Using {{}} assumes you never need the intermediary evaluated expression. Since you want to call as_label, then that shortcut is not appropriate for you because you need the unevaluated expression in that case.
Also note that enquo and ensym behave differently when passed a string. ensym() will turn that value into a symbol while enquo() will keep it as a character literal
f1 <- function(x) rlang::qq_show(!!enquo(x))
f2 <- function(x) rlang::qq_show(!!ensym(x))
f1("hello")
# ^"hello"
f2("hello")
# hello
which also effects how they are turned into labels
g1 <- function(x) as_label(enquo(x))
g2 <- function(x) as_label(ensym(x))
g1("hello")
# [1] "\"hello\""
g1(hello)
# [1] "hello"
g2("hello")
# [1] "hello"
g2(hello)
# [1] "hello"
When working in the tidyverse, it's important to keep track of which functions need symbols/expressions to be injected into the call, which functions need the unevaulated symbols/expressions themselves, and which can accept raw character values.
You also need to be mindful when passing values through to other functions. Normally the functions only look at the variable names passed directly to them. If you want to pass values "through" without evaluating them, you would need to use !! or {{}} as long as there is a rlang function there on the other side to make sense of it.
foo <- function(x) {
ensym(x)
}
a1 <- function(x) {
foo(x)
}
a2 <- function(x) {
foo({{x}})
}
a3 <- function(x) {
foo(!!ensym(x))
}
a1(test)
# x
a2(test)
# test
a3(test)
# test
See how only the latter two look all "all the way up" to find test
There are three ways to capture parmaters with rlang: quo (quosure), expr (expression) and sym (symbol). A sym() or symbol is just a single variable or column name. An expr() or expression can be a variable with function calls or operators involving other variables. Examples would be x+y or foo(x). And a quo() or quosure is an expression that also keeps track of where it was defined in case it needs to look up values for any of the variables you pass in. Knowing which one is right for your particular use case can also make a big difference.

Is it somehow possibe to link a mathematical axis unit expression from a list or dataframe onto an ggplot axis?

I wonder if it's possible to craft a semi-automatic parsing of 'complicated' mathematical expressions into the axis of ggplot by maintaining some sort of lookup table?
So, for example, for data-mining, I regularly have to produce hundreds of scatterplots, which I want to discuss with colleagues. To do so, I want correct axis-legends, of course - this is rather cumbersome.
Here a simple example of what would like to read out from a database into the labs() by using a formula: expression(paste(delta^{18},"O (\u2030)")
So, what I was wondering is if there's a way to link those labs() to predefined lists or tables in a way like labs(y = list[3])?
This works just fine for simple names like: "Dissolved oxygen saturation / %", but when trying the same for the above, it generates:
paste(delta^{
18
}, "O (‰)")
(including the breaks - which is obviously not what I want)
Thanks,
Alex
You could tinker with the math_format() function from the scales package a bit to take in pre-substituted expressions:
library(patchwork)
library(ggplot2)
splitiris <- split(iris, iris$Species)
# Example expressions
exprs <- list(
substitute(10^.x),
substitute(log[.x]~"(%)"),
substitute(frac(.x, 2))
)
# Near-copy of scales::math_format
math_format2 <- function(expr = subsitute(10^.x), format = force) {
.x <- NULL
subs <- function(x) {
do.call("substitute", list(expr, list(.x = x)))
}
function(x) {
x <- format(x)
ret <- lapply(x, subs)
ret <- as.expression(ret)
ret[is.na(x)] <- NA
names(ret) <- names(x)
ret
}
}
# Generate plots
plots <- lapply(seq_along(splitiris), function(i) {
ggplot(splitiris[[i]], aes(Sepal.Width, Sepal.Length)) +
geom_point() +
scale_x_continuous(labels = math_format2(exprs[[i]]))
})
plots[[1]] + plots[[2]] + plots[[3]]
Created on 2020-05-27 by the reprex package (v0.3.0)

ggplot with aesthetics generated from input data

Since I will need to make a lot of different plots in R I'm trying to put some more logic in preparing the data (add column names corresponding to the aesthetics) and less logic in the plot itself.
Consider the following default iris plot:
library(ggplot2)
library(data.table)
scatter <- ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width))
scatter + geom_point(aes(color=Species, shape=Species))
Now I make a modified iris data with column names matching to the desired aesthetics:
iris2 <- as.data.table(iris)
iris2 <- iris2[,.(x=Sepal.Length, y=Sepal.Width, color=Species,
shape=Species)]
That I want to plot in a function in such a way that it basically builds the following command only slightly more dynamic, so you use all the aesthetics supplied in the data.
ggplot(data, aes(x=x, y=y)) + geom_point(aes(color=color, shape=shape))
It has been a long time since I read anything about nonstandard evaluation, expressions and quotation and I noticed that there are quite some developments with rlang and quosures (cheatsheet). [This] question was kind of helpful, but it did not resolve the fact that I want to infer the aesthetics from the data.
In the end I have tried a lot of stuff, and looked inside aes. In there I see:
exprs <- rlang::enquos(x = x, y = y, ...)
and I think this is the reason that all attempts that I made like:
ggplot(iris2, aes(x=x, y=y)) +
geom_point(aes(rlang::quo(expr(color=color))))
did not work out since aes is trying to 'enquos' my quosure(s).
QUESTION Is there any way to supply arguments to aes in a dynamic way based on the contents of the data (so you do not know in advance which aesthetics you will need?
If my question is not clear enough, in the end I made something that works, only I have a feeling that this totally not necessary because I don't know/understand the right way to do it. So the stuff below works and is what I have in mind, but what I e.g. don't like is that I had to modify aes:
The block below is stand alone and can be executed without the code chunks above.
library(data.table)
library(ggplot2)
library(rlang)
iris2 <- as.data.table(iris)
iris2 <- iris2[,.(x=Sepal.Length, y=Sepal.Width, color=Species, shape=Species)]
myaes <- function (x, y, myquo=NULL, ...) {
exprs <- rlang::enquos(x = x, y = y, ...)
exprs <- c(exprs, myquo)
is_missing <- vapply(exprs, rlang::quo_is_missing, logical(1))
aes <- ggplot2:::new_aes(exprs[!is_missing], env = parent.frame())
ggplot2:::rename_aes(aes)
}
generalPlot <- function(data, f=geom_point,
knownaes=c('color'=expr(color), 'shape'=expr(shape))){
myquo <- list()
for(i in names(knownaes)){
if(i %in% names(data)){
l <- list(rlang::quo(!!knownaes[[i]]))
names(l) <- i
myquo <- c(myquo, l)
}
}
ggplot(data, aes(x=x, y=y)) +
f(myaes(myquo=myquo))
}
generalPlot(iris2[,.(x, y, color)])
generalPlot(iris2[,.(x, y, color, shape)])
You can use this custom function that parses input data colnames and generates an aes text string that is passed to eval().
generateAES <- function(foo) {
eval(parse(text = paste0("aes(",
paste(
lapply(foo, function(i) paste(i, "=", i)),
collapse = ","),
")"
)))
}
You can use it with:
ggplot(iris2, generateAES(colnames(iris2))) +
geom_point()
Or with pipes:
library(magrittr)
iris2 %>%
ggplot(generateAES(colnames(.))) +
geom_point()
generateAES output is aes like:
Aesthetic mapping:
* `x` -> `x`
* `y` -> `y`
* `colour` -> `color`
* `shape` -> `shape`
That is generated from text string "aes(x = x,y = y,color = color,shape = shape)"
So if your data as a "color" or "shape" column, you just want to map that to the color or shape aesthetic? I think a simpler way to do that would be
generalPlot <- function(data, f=geom_point, knownaes=c('color', 'shape')) {
match_aes <- intersect(names(data), knownaes)
my_aes_list <- purrr::set_names(purrr::map(match_aes, rlang::sym), match_aes)
my_aes <- rlang::eval_tidy(quo(aes(!!!my_aes_list)))
ggplot(data, aes(x=x, y=y)) +
f(mapping=my_aes)
}
Then you can do
generalPlot(iris2[,.(x, y)])
generalPlot(iris2[,.(x, y, color)])
generalPlot(iris2[,.(x, y, color, shape)])
and it doesn't require the additional myaes function.
I'm kind of surprised I had to use eval_tidy but for some reason you can't seem to use !!! with aes().
x <- list(color=sym("color"))
ggplot(iris2, aes(x,y)) + geom_point(aes(!!!x))
# Error: Can't use `!!!` at top level
(Tested with ggplot2_3.1.0)

Programming a function for "lm" using tidyeval

I am trying to write a function around "lm" using tidyeval (non-standard evaluation).Using base R NSE, it works:
lm_poly_raw <- function(df, y, x, degree = 1, ...){
lm_formula <-
substitute(expr = y ~ poly(x, degree, raw = TRUE),
env = list(y = substitute(y),
x = substitute(x),
degree = degree))
eval(lm(lm_formula, data = df, ...))
}
lm_poly_raw(mtcars, hp, mpg, degree = 2)
However, I have not figured out how to write this function using tidyeval and rlang. I assume that substitute should be replaced be enquo, and eval by !!. There are some hints in Hadley's Adv-R, but I could not figure it out.
Here is the kind of formula constructor that might make its way in rlang in the future:
f <- function(x, y, flatten = TRUE) {
x <- enquo(x)
y <- enquo(y)
# Environments should be the same
# They could be different if forwarded through dots
env <- get_env(x)
stopifnot(identical(env, get_env(y)))
# Flatten the quosures. This warns the user if nested quosures are
# found. Those are not supported by functions like lm()
if (flatten) {
x <- quo_expr(x, warn = TRUE)
y <- quo_expr(y, warn = TRUE)
}
new_formula(x, y, env = env)
}
# This can be used for unquoting symbols
var <- "cyl"
lm(f(disp, am + (!! sym(var))), data = mtcars)
The tricky parts are:
The LHS and RHS could come from different environments if forwarded through different layers of .... We need to check for this.
We need to check that the user doesn't unquote quosures. lm() and co do not support those. quo_expr() flattens all the quosures and optionally warns if some were found.

Functions inside aes

Question: why can't I call sapply inside aes()?
Goal of following figure: Create histogram showing proportion that died/lived so that the proportion for each combination of group/type sums to 1 (example inspired by previous post).
I know you could make the figure by summarising outside of ggplot but the question is really about why the function isn't working inside of aes.
## Data
set.seed(999)
dat <- data.frame(group=factor(rep(1:2, 25)),
type=factor(sample(1:2, 50, rep=T)),
died=factor(sample(0:1, 50, rep=T)))
## Setup the figure
p <- ggplot(dat, aes(x=died, group=interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
ylab("Proportion")
## Proportions, all groups/types together sum to 1 (not wanted)
p + geom_histogram(aes(y=..count../sum(..count..)), position=position_dodge())
## Look at groups
stuff <- ggplot_build(p)
stuff$data[[1]]
## The long way works: proportions by group/type
p + geom_histogram(
aes(y=c(..count..[..group..==1] / sum(..count..[..group..==1]),
..count..[..group..==2] / sum(..count..[..group..==2]),
..count..[..group..==3] / sum(..count..[..group..==3]),
..count..[..group..==4] / sum(..count..[..group..==4]))),
position='dodge'
)
## Why can't I call sapply there?
p + geom_histogram(
aes(y=sapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g]))),
position='dodge'
)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'expr' of mode 'function' was not found
So, the issue arises because of a recursive call to ggplot2:::strip_dots for any aesthetics that include 'calculated aesthetics'. There is some discussion around the calculated aesthetics in this SO question and answer. The relevant code in layer.r is here:
new <- strip_dots(aesthetics[is_calculated_aes(aesthetics)])
i.e. strip_dots is called only if there are calculated aesthetics, defined using the regex "\\.\\.([a-zA-z._]+)\\.\\.".
strip_dots in takes a recursive approach, working down through the nested calls and stripping out the dots. The code is like this:
function (expr)
{
if (is.atomic(expr)) {
expr
}
else if (is.name(expr)) {
as.name(gsub(match_calculated_aes, "\\1", as.character(expr)))
}
else if (is.call(expr)) {
expr[-1] <- lapply(expr[-1], strip_dots)
expr
}
else if (is.pairlist(expr)) {
as.pairlist(lapply(expr, expr))
}
else if (is.list(expr)) {
lapply(expr, strip_dots)
}
else {
stop("Unknown input:", class(expr)[1])
}
}
If we supply an anonymous function this code as follows:
anon <- as.call(quote(function(g) mean(g)))
ggplot2:::strip_dots(anon)
we reproduce the error:
#Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
Working through this, we can see that anon is a call. For calls, strip_dots will use lapply to call strip_dots on the second and third elements of the call. For an anonymous function like this, the second element is the formals of the function. If we look at the formals of anon using dput(formals(eval(anon))) or dput(anon[[2]]) we see this:
#pairlist(g = )
For pairlists, strip_dots tries to lapply it to itself. I'm not sure why this code is there, but certainly in this circumstance it leads to the error:
expr <- anon[[2]]
lapply(expr, expr)
# Error in get(as.character(FUN), mode = "function", envir = envir) :
# object 'expr' of mode 'function' was not found
TL; DR At this stage, ggplot2 doesn't support the use of anonymous functions within aes where a calculated aesthetic (such as ..count..) is used.
Anyway, the desired end result can be achieved using dplyr; in general I think it makes for more readable code to separate out the data summarisation from the plotting:
newDat <- dat %>%
group_by(died, type, group) %>%
summarise(count = n()) %>%
group_by(type, group) %>%
mutate(Proportion = count / sum(count))
p <- ggplot(newDat, aes(x = died, y = Proportion, group = interaction(group, type), fill=group, alpha=type)) +
theme_bw() +
scale_alpha_discrete(range=c(0.5, 1)) +
geom_bar(stat = "identity", position = "dodge")
ggplot2 fix
I've forked ggplot2 and have made two changes to aes_calculated.r which fix the problem. The first was to correct the handling of pairlists to lapply strip_dots instead of expr, which I think must have been the intended behaviour. The second was that for formals with no default value (like in the examples provided here), as.character(as.name(expr)) throws an error because expr is an empty name, and while this is a valid construct, it's not possible to create one from an empty string.
Forked version of ggplot2 at https://github.com/NikNakk/ggplot2 and pull request just made.
Finally, after all that, the sapply example given doesn't work because it returns a 2 row by 4 column matrix rather than an 8 length vector. The corrected version is like this:
p + geom_histogram(
aes(y=unlist(lapply(unique(..group..), function(g)
..count..[..group..==g] / sum(..count..[..group..==g])))),
position='dodge'
)
This gives the same output as the dplyr solution above.
One other thing to note is that this lapply code assumes that the data at that stage is sorted by group. I think this is always the case, but if for whatever reason it weren't you would end up with the y data out of order. An alternative which preserves the order of the rows in the calculated data would be:
p + geom_histogram(
aes(y={grp_total <- tapply(..count.., ..group.., sum);
..count.. / grp_total[as.character(..group..)]
}),
position='dodge'
)
It's also worth being aware that these expressions are evaluated in baseenv(), the namespace of the base package. This means that any functions from other packages, even standard ones like stats and utils, need to be used with the :: operator (e.g. stats::rnorm).
After playing around a little, the problem appears to be using anonymous functions with ..group.. or ..count.. inside aes:
xy <- data.frame(x=1:10,y=1:10) #data
ggplot(xy, aes(x = x, y = sapply(y, mean))) + geom_line() #sapply is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., mean))) #sapply with ..group.. is fine
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(..group.., function(g) {mean(g)})))
#broken, with same error
ggplot(xy, aes(x = x, group = y)) +
geom_bar(aes(y = sapply(y, function(g) {mean(g)})), stat = "identity")
#sapply with anonymous functions works fine!
It seems like a really weird bug, unless I'm missing something stupid.

Resources