I'm working on an R package that takes strings as function arguments. Now I want to use non-standard evaluation to allow for non-string input. To keep backwards compatibility, I'd also like the functions to keep accepting strings.
Hadley gives an example with the subset function and suggests that every NSE function should be accompanied by a standard evaluation function.
library(lazyeval)
# standard evaluation
subset2_ <- function(df, condition) {
  r <- lazy_eval(condition, df)
  r <- r & !is.na(r)
  df[r, , drop = FALSE]
}
subset2_(mtcars, lazy(mpg > 31))
# NSE can be written easily afterwards
subset2 <- function(df, condition) {
  subset2_(df, lazy(condition))
}
While the SE function now also takes quoted input,
subset2_(mtcars, "mpg > 31")
the NSE function throws an error:
subset2(mtcars, "mpg > 31")
But I want the user to have the same function (the NSE function) for both quoted as well as unquoted arguments.
Any ideas?
The NSE function takes NSE input. That’s the point of this pattern, isn’t it?
subset2(mtcars, mpg > 31)
You can of course allow the NSE function to take character input as well but I’d advise against this — strongly. Don’t mix SE and NSE, there’s no advantage to be had, and it sows confusion (and potentially bugs, since you’re mixing domains).
That said, the following of course works:
subset2 <- function(df, condition) {
  if (is.character(substitute(condition)))
    subset2_(df, condition)
  else
    subset2_(df, lazy(condition))
}
If you want to allow NSE and SE in the same function for backwards compatibility reasons, I suggest phasing out the SE version in a future version, and adding a deprecation warning for now. To add the deprecation warning:
subset2 <- function(df, condition) {
  if (is.character(substitute(condition))) {
    msg <- sprintf(paste0('Calling %s with a quoted expression is',
                          ' deprecated. Pass an unquoted expression',
                          ' instead, or use %s.'),
                   sQuote('subset2'), sQuote('subset2_'))
    .Deprecated(msg = msg)
    subset2_(df, condition)
  } else {
    subset2_(df, lazy(condition))
  }
}
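For readers without lazyeval, the same SE/NSE split can be sketched in base R. This is only an illustration under assumptions: the names subset_se and subset_nse and the env argument are made up for this sketch, and converting strings with parse() is just one possible choice.

```r
# SE version: takes an already-quoted expression or a string.
# `env` is where non-column symbols are looked up (hypothetical argument).
subset_se <- function(df, condition, env = parent.frame()) {
  if (is.character(condition)) {
    condition <- parse(text = condition)[[1]]  # string -> unevaluated call
  }
  r <- eval(condition, df, env)                # columns first, then env
  r <- r & !is.na(r)
  df[r, , drop = FALSE]
}

# NSE version: captures the argument unevaluated and delegates.
subset_nse <- function(df, condition) {
  subset_se(df, substitute(condition), env = parent.frame())
}

subset_nse(mtcars, mpg > 31)
subset_se(mtcars, quote(mpg > 31))
subset_se(mtcars, "mpg > 31")
```

Because a string literal is captured as-is by substitute(), subset_nse(mtcars, "mpg > 31") would also reach the string branch in this sketch, which is exactly the kind of mixing the answer advises against.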
I am writing an R function that allows the user to specify an individual time variable when creating a survival::survfit object. The survival package has a "string-free" syntax, which means that the name of the time variable (in my case "dtime") does not need any quotation marks:
survival::survfit(formula = survival::Surv(dtime, death) ~ 1, rotterdam)
Hence, I figured that I could use tidy evaluation for my purpose. Here is my code:
# My function
get_survfit <- function(.data, .time){
  return(survival::survfit(formula = survival::Surv({{ .time }}, status) ~ 1, .data))
}
# Application example
library(survival)
data(cancer)
rotterdam
colnames(rotterdam)[which(names(rotterdam) == "death")] = "status"
get_survfit(.data=rotterdam, .time=dtime)
However, I always get the following errors:
Error in survival::Surv({ : object 'dtime' not found
and when looking into 'dtime' in the debugger, I get:
Error during wrapup :
promise already under evaluation: recursive default argument reference or earlier problems?
How can I fix this and still obtain my feature?
survfit() is not a tidy eval function so you're using {{ out of context. See https://rlang.r-lib.org/reference/topic-inject-out-of-context.html.
Instead you'll need to use inject() to enable injection from the outside. Since survfit() is not a tidy eval function it won't support quosures, which {{ creates. So you'll need to use !! instead, together with ensym().
library(rlang)  # provides ensym(), inject() and !!
get_survfit <- function(data, time) {
  time <- ensym(time)
  inject(
    survival::survfit(formula = survival::Surv(!!time, status) ~ 1, data)
  )
}
ensym() ensures that simple variable names are passed, and defuses them. !! injects the variable in the formula.
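The same idea (defuse the argument, then splice it into a call to a function that does not support tidy evaluation) can also be sketched in base R with substitute() and bquote(). This is an illustrative analogue, not the rlang implementation. It uses lm() and mtcars so that it runs without the survival package, and fit_on is a hypothetical name.

```r
fit_on <- function(data, predictor) {
  predictor <- substitute(predictor)       # capture the bare name, unevaluated
  stopifnot(is.name(predictor))            # only simple variable names allowed
  # Splice the symbol into the formula before lm() ever sees the call
  model_call <- bquote(lm(mpg ~ .(predictor), data = data))
  eval(model_call)                         # evaluated here, where `data` exists
}

fit <- fit_on(mtcars, wt)
formula(fit)
```

The check with is.name() plays the role that ensym() plays above: it rejects anything that is not a plain variable name.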
Building on Lionel Henry's answer and inspired by another question I asked, I would like to add that it is also possible to achieve the same result with enquo and get_expr:
get_survfit <- function(.data, .time){
  .time = rlang::enquo(.time)
  .time = rlang::get_expr(.time)
  rlang::inject(
    return(survival::survfit(formula = survival::Surv(!!.time, status) ~ 1, .data))
  )
}
Data for reproducibility
.i <- tibble(a=2*1:4+1, b=2*1:4)
This function is supposed to take its data and other arguments as unquoted names, find those names in the data, and use them to add a column and filter out the top row. It does not work. Mutate says it can not find a.
t1 <- function(.j=.i, X=a, Y=b){
  e_X <- enquo(X)
  e_Y <- enquo(Y)
  mutate(.data=.j, pass=UQ(e_X)+1) %>%
    filter(UQ(e_Y) > 3) -> out
  out
}
t1(a,b)
This function, which I found through a typo (note the .i instead of .j in the mutate statement), does what the previous function was supposed to do. And I don't know why. I think it is skipping over the function arguments and finding .i in the global environment. Or maybe it is using a Ouija board.
t2 <- function(.j=.i, X=a, Y=b){
  e_X <- enquo(X)
  e_Y <- enquo(Y)
  mutate(.data=.i, pass=UQ(e_X)+1) %>%
    filter(UQ(e_Y) > 3) -> out
  out
}
t2(a,b)
Since mutate could not find .j when passed to it in the usual R way, maybe it needs to be passed in an rlang-style quosure, like the formals X and Y. This function also does not work, with UQ in mutate saying that it can not find a. Like the first function above, it works if the .j in mutate is replaced with a .i. (Seems like there should be an "enquos" to parallel quos).
t3 <- function(.j=.i, X=a, Y=b){
  e_j <- enquo(.j)
  e_X <- enquo(X)
  e_Y <- enquo(Y)
  mutate(.data=UQ(.j), pass=UQ(e_X)+1) %>%
    filter(UQ(e_Y) > 3) -> out
  out
}
t3(a,b)
Finally, it appears that, once the .i substitution in mutate is made, t4() no longer needs a data argument at all. See below, where I replace it with bop_foo_foo. If, however, you replace bop_foo_foo throughout with the name of the data, .i, (t5()) then UQ again fails to find a.
bop_foo_foo <- 0
t4 <- function(bop_foo_foo, X=a, Y=b){
  e_j <- enquo(bop_foo_foo)
  e_X <- enquo(X)
  e_Y <- enquo(Y)
  mutate(.data=UQ(.i), pass=UQ(e_X)+1) %>%
    filter(UQ(e_Y) > 3) -> out
  out
}
t4(a,b)
The functions above seem to me to be relatively minor variants on a single function. I have run dozens more, and although I have observed some patterns, and read the enquo and UQ help files I do not know how many times, a real understanding continues to elude me.
I would like to know why the functions above that don't work don't, and why the ones that do work do. I don't necessarily need a function-by-function critique. If you can state general principles that embody the required understanding, that would be delightful. And more than sufficient.
"I think it is skipping over the function arguments and finding .i in the global environment."
Yes, scope of symbols in R is hierarchical. The variables local to a function are looked up first, and then the surrounding environment of the function is inspected, and so on.
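A small base-R sketch of that lookup, on the assumption that it mirrors what happens in t2(): the function body never touches its first argument, so R's lazy evaluation never forces it, and the free symbol is resolved in the environment where the function was defined. The names dat_outer, uses_outer and uses_arg are made up for illustration.

```r
dat_outer <- data.frame(a = c(3, 5, 7, 9), b = c(2, 4, 6, 8))

uses_outer <- function(.j) {
  # .j is never used, so its promise is never evaluated ...
  nrow(dat_outer)  # ... and dat_outer is found outside the function
}

# `no_such_object` does not exist, yet no error is raised
uses_outer(no_such_object)

# As soon as the argument is actually used, the missing object matters
uses_arg <- function(.j) nrow(.j)
res <- try(uses_arg(no_such_object), silent = TRUE)
inherits(res, "try-error")
```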
mutate(.data = UQ(.j), ...)
I think you are missing the difference between regular arguments and (quasi)quoted arguments. Unquoting is only relevant for quasiquoted arguments. Since the .data argument of mutate() is not quasiquoted it does not make sense to try and unquote stuff. The quasiquoted arguments are the ones that are captured/quoted with enexpr() or enquo(). You can tell whether an argument is quasiquoted either by looking at the documentation or by recognising that the argument supports direct references to columns (regular arguments need to be explicit about where to find the columns).
In the next version of rlang, the exported UQ() function will throw an error to make it clear that it should not be called directly and that it can only be used in quasiquoted arguments.
I would suggest:
Call the first argument of your function data or df rather than .i.
Don't give it a default. The user should always supply the data.
Don't capture it with enquo() or enexpr() or substitute(). Instead pass it directly to the data argument of other verbs.
Once this is out of the way it will be easier to work out the rest.
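Put together, these suggestions might look like the sketch below. It assumes dplyr is installed and unquotes only inside the quasiquoted arguments of mutate() and filter(), using !! in place of the UQ() calls from the question. The function name add_and_filter and the object dat are illustrative, not from the original post.

```r
library(dplyr)

add_and_filter <- function(data, x, y) {
  x <- enquo(x)   # capture the column expressions ...
  y <- enquo(y)
  data %>%        # ... but pass the data along as a regular argument
    mutate(pass = (!!x) + 1) %>%
    filter((!!y) > 3)
}

dat <- tibble(a = 2 * 1:4 + 1, b = 2 * 1:4)
out <- add_and_filter(dat, a, b)
out
```

Here the data frame has no default and is always supplied by the caller, so the call add_and_filter(dat, a, b) cannot accidentally bind a column name to the data argument.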
Actual question
How can I mix standard and lazy evaluation of function arguments while giving the user a unified and simple syntax when calling my functions?
Background
I'm a huge fan of dplyr, but what I don't quite like about it is that it makes you use distinct function names (e.g. select vs. select_) and that it makes you think too much about how to write function calls when you want to express your arguments as a "mixed bag": some are expressed as character strings, for others you want lazy evaluation, for yet others you want standard evaluation. Also see John Mount's blog post on wrapr for another example of where it becomes overly complex to do a simple thing due to standard vs. lazy evaluation.
Example
This is the simplest way of writing my dplyr::select expression that I know of:
x <- "disp"
select_(mtcars, "mpg", ~cyl, x)
After playing around, here's a draft of the solution I'm after:
select2 <- function(dat, ...) {
  args <- substitute(list(...))
  ## Express names as character //
  idx <- which(sapply(args, class) == "name")[-1]
  ## We don't care about the first one as it's going to be
  ## substituted anyway
  if (length(idx)) {
    for (ii in idx) args[[ii]] <- as.character(args[[ii]])
  }
  ## Ensure `c()` //
  args[[1]] <- quote(c)
  ## Standard eval for variables containing actual column name //
  idx <- which(!eval(args) %in% names(dat)) + 1
  if (length(idx)) {
    for (ii in idx) args[[ii]] <- eval(as.name(args[[ii]]))
  }
  ## Indexing expression //
  exprsn <- substitute(dat[, J], list(J = eval(args)))
  eval(exprsn)
}
x <- "disp"
(select2(mtcars, "mpg", cyl, x))
It works, but of course it's very poorly implemented with regard to efficiency ;-)
To make it better and to understand more about evaluation in R, I'd particularly like to know how to get rid of the for loops and how I could best leverage existing functionality of the dplyr and lazyeval packages as well as base-R functionality like do.call("[.data.frame", ...), with() or the like. Especially the indexing and assignment methods ("[.*" and "<-.*") and how to call them directly are still kind of a mystery to me.
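One possible loop-free direction, offered only as a sketch and not as an established answer: capture the arguments once, resolve each one to a column name with vapply(), and index a single time. The name select_mixed is hypothetical, formulas such as ~cyl are not handled, and the rule "a bare name that matches a column wins over a variable of the same name" is an assumption of this sketch.

```r
select_mixed <- function(dat, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  env <- parent.frame()                        # where variables like `x` live
  cols <- vapply(exprs, function(e) {
    if (is.character(e)) {
      e                                        # "mpg": already a column name
    } else if (is.name(e) && as.character(e) %in% names(dat)) {
      as.character(e)                          # cyl: bare column name
    } else {
      eval(e, env)                             # x: variable holding a name
    }
  }, character(1), USE.NAMES = FALSE)
  dat[, cols, drop = FALSE]
}

x <- "disp"
names(select_mixed(mtcars, "mpg", cyl, x))
```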
I tried a nested return in R, without success. I come from Mathematica, where this pattern works well. Here is a toy example:
fstop <- function(x){
  if(x>0) return(return("Positive Number"))
}
f <- function(x){
  fstop(x)
  "Negative or Zero Number"
}
If I evaluate f(1), I get:
[1] "Negative or Zero Number"
When I expected just:
[1] "Positive Number"
The question is: is there some non-standard evaluation I can do in fstop so that I get just fstop's result, without changing the function f?
PS: I know I can put the if directly inside f, but in my real case the structure is not so simple, and this structure would make my code simpler.
Going to stick my neck out and say...
No.
Making a function return not to its caller but to its caller's caller would involve changing its execution context. This is how things like return and other control-flow things are implemented in the source. See:
https://github.com/wch/r-source/blob/trunk/src/main/context.c
Now, I don't think R-level code has access to execution contexts like this. Maybe you could write some C-level code that could do it, but it's not clear. You could always write a do_return_return function in the style of do_return in eval.c and build a custom version of R... It's not worth it.
So the answer is most likely "no".
I think Spacedman is right, but if you're willing to evaluate your expressions in a wrapper, then it is possible by leveraging the tryCatch mechanism to break out of the evaluation stack.
First, we need to define a special RETURN function:
RETURN <- function(x) {
  cond <- simpleCondition("")  # dummy message required
  class(cond) <- c("specialReturn", class(cond))
  attr(cond, "value") <- x
  signalCondition(cond)
}
Then we re-write your functions to use our new RETURN:
f <- function(x) {
  fstop(x)
  "Negative or Zero"
}
fstop <- function(x) if(x > 0) RETURN("Positive Number") # Note `RETURN` not `return`
Finally, we need the wrapper function (wsr here stands for "with special return") to evaluate our expressions:
wsr <- function(x) {
  tryCatch(
    eval(substitute(x), envir=parent.frame()),
    specialReturn=function(e) attr(e, "value")
  )
}
Then:
wsr(f(-5))
# [1] "Negative or Zero"
wsr(f(5))
# [1] "Positive Number"
Obviously this is a little hacky, but in day-to-day use it would not be much different from evaluating expressions in with() or calling code with source(). One shortcoming is that this will always return to the level you call wsr from.
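A related base-R option is callCC(), which hands the enclosed code an exit function that can be called from deeper in the call stack. Note that this sketch does change f (which the question wanted to avoid) and passes the exit function to fstop as an extra, hypothetical exit argument, so it illustrates the same non-local-exit idea rather than solving the original constraint.

```r
fstop <- function(x, exit) {
  if (x > 0) exit("Positive Number")  # jumps straight out of callCC()
}

f <- function(x) {
  callCC(function(exit) {
    fstop(x, exit)
    "Negative or Zero Number"         # only reached if exit() was not called
  })
}

f(1)
# [1] "Positive Number"
f(-1)
# [1] "Negative or Zero Number"
```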
I've been fascinated lately by Hadley Wickham's Non-standard Evaluation examples in R, but I'm not sure what I want to do is possible.
I want to have a closure-based environment where you pass expressions that get evaluated (in NSE ways), similar to how subset works. The problem though, is that to do so I think I need to fundamentally change how arguments are passed.
For example,
g <- function(setup_stuff){
  function(x) {
    substitute(x)
  }
}
will give me the expression assigned to x if I do something like:
test <- g("Setup stuff")
test(1:10)
# 1:10
Similarly, I can do something like:
g <- function(setup_stuff){
  function(x) {
    sys.call()  # the call as written; sys.call() expects a frame number, not x
  }
}
Which will usually give me kind of what I'm looking for--a completely unevaluated argument list:
test <- g("setup variables")
test(1:10)
# test(1:10)
But this all relies on the idea that I pass variables the "standard" way, by delimiting assigned parameters with commas. I want to have something like:
g <- function(setup_stuff){
  function(...) {
    # Capture named expression(s) before evaluation
    substitute(...)
  }
}
Such that, for example, I can evaluate the arguments based on their names and the operators passed in. For example, I've been trying to overload the logical operator &, but I just receive an error before the function that does the NSE is even called:
test <- g("Setup stuff")
test(a=1 & b=2)
# > test(a=1 & b=2)
# Error: unexpected '=' in "test(a=1 & b="
I know I could probably half-accomplish this by overloading the '&' and the '=' operators for some specific class and just returning the unevaluated call, but then a and b would need to be objects of that class. I was wondering if I was missing something that someone can easily see?
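One point that seems worth spelling out: the error above comes from R's parser, so it is raised before test() or any NSE code runs, and no argument capturing inside the function can intercept it. Below is a sketch of what can be captured instead, assuming it is acceptable to write == inside the expression or to pass separate named arguments. capture_all is a hypothetical variant of g().

```r
capture_all <- function(setup_stuff) {
  function(...) {
    # Capture every argument unevaluated, keeping any argument names
    as.list(substitute(list(...)))[-1]
  }
}

test <- capture_all("Setup stuff")

cond <- test(a == 1 & b == 2)[[1]]
cond            # the unevaluated call: a == 1 & b == 2
as.list(cond)   # its pieces: `&`, a == 1, b == 2

named <- test(a = 1, b = 2)
names(named)    # argument names survive the capture

# The original spelling never gets as far as a function call:
parse_result <- try(parse(text = "test(a=1 & b=2)"), silent = TRUE)
inherits(parse_result, "try-error")
```

Neither a nor b needs to exist for this to work, since nothing is evaluated. The captured call can then be walked or rewritten before any evaluation takes place.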