Actual question
How can I mix standard and lazy evaluation of function arguments while giving the user a unified and simple syntax when calling my functions?
Background
I'm a huge fan of dplyr, but what I don't quite like about it is that it makes you use distinct function names (e.g. select vs. select_) and that it makes you think too much about how to write function calls when you want to express your arguments as a "mixed bag": some are expressed as character strings, for others you want lazy evaluation, and for yet others you want standard evaluation. Also see John Mount's blog post on wrapr for another example of how standard vs. lazy evaluation makes a simple thing overly complex.
Example
This is the simplest way of writing my dplyr::select expression that I know of:
x <- "disp"
select_(mtcars, "mpg", ~cyl, x)
After playing around, here's a draft of the solution I'm after:
select2 <- function(dat, ...) {
  args <- substitute(list(...))
  ## Express names as character //
  ## (we don't care about the first element as it's going to be
  ## substituted anyway)
  idx <- which(sapply(args, class) == "name")[-1]
  if (length(idx)) {
    for (ii in idx) args[[ii]] <- as.character(args[[ii]])
  }
  ## Ensure `c()` //
  args[[1]] <- quote(c)
  ## Standard eval for variables containing actual column names //
  idx <- which(!eval(args) %in% names(dat)) + 1
  if (length(idx)) {
    for (ii in idx) args[[ii]] <- eval(as.name(args[[ii]]))
  }
  ## Indexing expression //
  exprsn <- substitute(dat[, J], list(J = eval(args)))
  eval(exprsn)
}
x <- "disp"
(select2(mtcars, "mpg", cyl, x))
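As a quick sanity check, the result matches the plain base-R column selection (assuming x <- "disp" as above):
identical(select2(mtcars, "mpg", cyl, x), mtcars[, c("mpg", "cyl", "disp")])
# [1] TRUE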
It works, but of course it's very poorly implemented with regard to efficiency ;-)
To make it better, and to understand more about evaluation in R, I'd in particular like to know how to get rid of the for loops, and how I could best leverage existing functionality of the dplyr and lazyeval packages as well as base-R functionality like do.call("[.data.frame", ...), with(), or the like. Especially the indexing and assignment methods ("[.*" and "<-.*"), and how to call them directly, are still kind of a mystery to me.
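For what it's worth, here is one possible vapply-based sketch of the same normalisation that avoids the explicit for loops and uses do.call("[", ...) for the indexing; select3 and its internals are hypothetical names of mine, not an established API:
select3 <- function(dat, ...) {
  env <- parent.frame()
  args <- as.list(substitute(list(...)))[-1]  # drop the `list` symbol
  ## Bare names become strings; everything else is evaluated as usual
  cols <- vapply(args, function(a) {
    if (is.name(a)) as.character(a) else eval(a, env)
  }, character(1))
  ## Strings that aren't column names are assumed to hold a column name
  hit <- cols %in% names(dat)
  cols[!hit] <- vapply(cols[!hit], get, character(1), envir = env)
  do.call("[", list(dat, cols))  # single-bracket column selection: dat[cols]
}
x <- "disp"
select3(mtcars, "mpg", cyl, x)  # same columns as the select2() call above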
Related
I tried using a nested return function in R, but without success. I came from Mathematica, where this code works well. Here is a toy example:
fstop <- function(x){
  if(x > 0) return(return("Positive Number"))
}
f <- function(x){
  fstop(x)
  "Negative or Zero Number"
}
If I evaluate f(1), I get:
[1] "Negative or Zero Number"
When I expected just:
[1] "Positive Number"
The question is: is there some non-standard evaluation I can do in fstop so that I get just the fstop result, without changing the f function?
PS: I know I can put the if directly inside f, but in my real case the structure is not so simple, and this structure would make my code simpler.
Going to stick my neck out and say...
No.
Making a function return not to its caller but to its caller's caller would involve changing its execution context. This is how return and other control-flow constructs are implemented in the source. See:
https://github.com/wch/r-source/blob/trunk/src/main/context.c
Now, I don't think R-level code has access to execution contexts like this. Maybe you could write some C-level code that could do it, but it's not clear. You could always write a do_return_return function in the style of do_return in eval.c and build a custom version of R... It's not worth it.
So the answer is most likely "no".
I think Spacedman is right, but if you're willing to evaluate your expressions in a wrapper, then it is possible by leveraging the tryCatch mechanism to break out of the evaluation stack.
First, we need to define a special RETURN function:
RETURN <- function(x) {
  cond <- simpleCondition("")  # dummy message required
  class(cond) <- c("specialReturn", class(cond))
  attr(cond, "value") <- x
  signalCondition(cond)
}
Then we re-write your functions to use our new RETURN:
f <- function(x) {
  fstop(x)
  "Negative or Zero"
}
fstop <- function(x) if(x > 0) RETURN("Positive Number")  # Note `RETURN`, not `return`
Finally, we need the wrapper function (wsr here stands for "with special return") to evaluate our expressions:
wsr <- function(x) {
  tryCatch(
    eval(substitute(x), envir = parent.frame()),
    specialReturn = function(e) attr(e, "value")
  )
}
Then:
wsr(f(-5))
# [1] "Negative or Zero"
wsr(f(5))
# [1] "Positive Number"
Obviously this is a little hacky, but in day-to-day use it would not be much different from evaluating expressions in with or calling code with source. One shortcoming is that this will always return to the level from which you call wsr.
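An alternative sketch of the same escape using base R's restart mechanism instead of a custom condition class (with_escape, fstop2, and f2 are hypothetical names); it shares the same shortcoming of always returning to wherever the wrapper is called:
with_escape <- function(expr) {
  withRestarts(expr, escape = function(value) value)
}
fstop2 <- function(x) if (x > 0) invokeRestart("escape", "Positive Number")
f2 <- function(x) {
  fstop2(x)
  "Negative or Zero"
}
with_escape(f2(5))
# [1] "Positive Number"
with_escape(f2(-5))
# [1] "Negative or Zero"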
I am reading Hadley Wickham's book on GitHub, in particular this part on lazy evaluation. There he gives an example of the consequences of lazy evaluation, in the part with the add/adders functions. Let me quote that bit:
This [lazy evaluation] is important when creating closures with lapply or a loop:
add <- function(x) {
  function(y) x + y
}
adders <- lapply(1:10, add)
adders[[1]](10)
adders[[10]](10)
x is lazily evaluated the first time that you call one of the adder functions. At this point, the loop is complete and the final value of x is 10. Therefore all of the adder functions will add 10 on to their input, probably not what you wanted! Manually forcing evaluation fixes the problem:
add <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- lapply(1:10, add)
adders2[[1]](10)
adders2[[10]](10)
I do not seem to understand that bit, and the explanation there is minimal. Could someone please elaborate on that particular example and explain what happens there? I am specifically puzzled by the sentence "at this point, the loop is complete and the final value of x is 10". What loop? What final value, where? It must be something simple I am missing, but I just don't see it. Thanks a lot in advance.
This is no longer true as of R 3.2.0!
The corresponding line in the change log reads:
Higher order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures.
And indeed:
add <- function(x) {
  function(y) x + y
}
adders <- lapply(1:10, add)
adders[[1]](10)
# [1] 11
adders[[10]](10)
# [1] 20
The goal of:
adders <- lapply(1:10, function(x) add(x))
is to create a list of adder functions: the first adds 1 to its input, the second adds 2, and so on. Because of lazy evaluation, R does not evaluate x when each adder is created, but only when that adder is first called. By then the lapply loop has completed, and the value supplied for x has reached its final value, 10. So when you call the first adder and lazy evaluation finally fetches x, it is no longer 1 but 10.
In short, every adder waits until after the lapply loop has finished before actually binding x, so they all end up built with the same value, 10. The fix Hadley suggests is to force x to be evaluated immediately inside add, sidestepping lazy evaluation and giving each adder its own correct value of x.
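Note that the R 3.2.0 change covers lapply and the other higher-order functions; a plain for loop still shows the capture problem on current R, since nothing forces x before the loop variable changes. A quick sketch reusing the force-less add from above:
adders_loop <- vector("list", 10)
for (i in 1:10) adders_loop[[i]] <- add(i)
adders_loop[[1]](10)
# [1] 20  (x is the promise `i`, which is forced only after the loop ends)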
The normal approach to writing functions in R (as I understand) is to avoid side-effects and return a value from a function.
contained <- function(x) {
  x_squared <- x^2
  return(x_squared)
}
In this case, the value computed from the function's input is returned, but the variable x_squared is not available outside the function.
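For instance, the squared value comes back as the return value, while no x_squared binding leaks into the workspace (assuming a fresh session):
contained(3)
# [1] 9
exists("x_squared")
# [1] FALSE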
But if you need to violate this basic functional-programming tenet (and I'm not sure how serious R is about this issue) and create an object outside the function's own scope, you have two choices.
escape <- function(x){
  x_squared <<- x^2
  assign("x_times_x", x*x, envir = .GlobalEnv)
}
Both objects, x_squared and x_times_x, are created in the global environment. Is one method preferable to the other, and why?
Thomas Lumley answered this in a superb post on r-help the other day. <<- is about the enclosing environment, so you can do things like this (again, quoting his post from April 22 in this thread):
make.accumulator <- function() {
  a <- 0
  function(x) {
    a <<- a + x
    a
  }
}
> f<-make.accumulator()
> f(1)
[1] 1
> f(1)
[1] 2
> f(11)
[1] 13
> f(11)
[1] 24
This is a legitimate use of <<- as "super-assignment" with lexical scope, not simply a way to assign in the global environment. For that, Thomas has these choice words:
The Evil and Wrong use is to modify variables in the global environment.
Very good advice.
According to the manual page here,
The operators <<- and ->> cause a search to be made through parent environments for an existing definition of the variable being assigned.
I've never had to do this in practice, but to my mind assign wins a lot of points for specifying the environment exactly, without even having to think about R's scoping rules. <<- performs a search through enclosing environments and is therefore a little harder to interpret.
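To make that search concrete: if <<- finds no existing binding in any enclosing environment, it falls all the way through and assigns in the global environment (a small sketch with hypothetical names h and z):
h <- function() z <<- 42  # no `z` in any enclosing environment
h()
z
# [1] 42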
EDIT: In deference to @Dirk and @Hadley, it sounds like assign is the appropriate way to actually assign into the global environment (when that's what you know you want), while <<- is the appropriate way to "bump up" to a broader scope.
As pointed out by @John in his answer, assign lets you specify the environment explicitly. A specific application is the following:
testfn <- function(x){
  x_squared <- NULL
  escape <- function(x){
    x_squared <<- x^2
    assign("x_times_x", x*x, envir = parent.frame(n = 1))
  }
  escape(x)
  print(x_squared)
  print(x_times_x)
}
where we use both <<- and assign. Notice that if you want <<- to assign to the environment of the top-level function, you need to declare/initialise the variable there first. With assign, however, you can use parent.frame(n = 1) to target the calling function's frame directly.
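For illustration, calling the function prints both values (output shown as comments):
testfn(3)
# [1] 9
# [1] 9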