Capture Arbitrary Conditions with `withCallingHandlers` - r

The Problem
I'm trying to write a function that will evaluate code and store the results, including any possible conditions signaled in the code. I've got this working perfectly fine, except for the situation when my function (let's call it evalcapt) is run within an error handling expression.
The problem is that withCallingHandlers will keep looking for matching condition handlers and if someone has defined such a handler outside of my function, my function loses control of execution. Here is simplified example of the problem:
evalcapt <- function(expr) {
conds <- list()
withCallingHandlers(
val <- eval(expr),
condition=function(e) {
message("Caught condition of class ", deparse(class(e)))
conds <<- c(conds, list(e))
} )
list(val=val, conditions=conds)
}
myCondition <- simpleCondition("this is a custom condition")
class(myCondition) <- c("custom", class(myCondition))
expr <- expression(signalCondition(myCondition), 25)
tryCatch(evalcapt(expr))
Works as expected
Caught condition of class c("custom", "simpleCondition", "condition")
$val
[1] 25
$conditions
$conditions[[1]]
<custom: this is a custom condition>
but:
tryCatch(
evalcapt(expr),
custom=function(e) stop("Hijacked `evalcapt`!")
)
Doesn't work:
Caught condition of class c("custom", "simpleCondition", "condition")
Error in value[[3L]](cond) : Hijacked `evalcapt`!
A Solution I don't Know How To Implement
What I really need is a way of defining a restart right after the condition is signaled in the code which frankly is the way withCallingHandlers appears to work normally (when my handler is the last available handler), but I don't see the restart established when I browse in my handling function and use computeRestarts.
Things That Seem Like Solutions That Won't Work
Use tryCatch
tryCatch does not have the same problem as withCallingHandlers because it does not continue looking for handlers after it finds the first one. The big problem with is it also does not continue to evaluate the code after the condition. If you look at the example that worked above, but sub in tryCatch for withCallingHandlers, the value (25) does not get returned because execution is brought back to the tryCatch frame after the condition is handled.
So basically, I'm looking for a hybrid between tryCatch and withCallingHandlers, one that returns control to the condition signaler, but also stops looking for more handlers after the first one is found.
Break Up The Expression Into Sub-expression, then Use tryCatch
Okay, but how do you break up (and more complex functions with signaled conditions all over the place):
fun <- function(myCondition) {
signalCondition(myCondition)
25
}
expr <- expression(fun())
Misc
I looked for the source code associated with the .Internal(.signalCondition()) call to see if I can figure out if there is a behind the scenes restart being set, but I'm out of my depth there. It seems like:
void R_ReturnOrRestart(SEXP val, SEXP env, Rboolean restart)
{
int mask;
RCNTXT *c;
mask = CTXT_BROWSER | CTXT_FUNCTION;
for (c = R_GlobalContext; c; c = c->nextcontext) {
if (c->callflag & mask && c->cloenv == env)
findcontext(mask, env, val);
else if (restart && IS_RESTART_BIT_SET(c->callflag))
findcontext(CTXT_RESTART, c->cloenv, R_RestartToken);
else if (c->callflag == CTXT_TOPLEVEL)
error(_("No function to return from, jumping to top level"));
}
}
from src/main/errors.c is doing some of that restart invocation, and this is called by do_signalCondition, but I don't have a clue how I would go about messing with this.

I think what you're looking for is to use withRestarts when your special condition is signaled, like from warning:
withRestarts({
.Internal(.signalCondition(cond, message, call))
.Internal(.dfltWarn(message, call))
}, muffleWarning = function() NULL)
so
evalcapt <- function(expr) {
conds <- list()
withCallingHandlers(
val <- eval(expr),
custom=function(e) {
message("Caught condition of class ", deparse(class(e)))
conds <<- c(conds, list(e))
invokeRestart("muffleCustom")
} )
list(val=val, conditions=conds)
}
expr <- expression(withRestarts({
signalCondition(myCondition)
}, muffleCustom=function() NULL), 25)
leads to
> tryCatch(evalcapt(expr))
Caught condition of class c("custom", "simpleCondition", "condition")
$val
[1] 25
$conditions
$conditions[[1]]
<custom: this is a custom condition>
> tryCatch(
+ evalcapt(expr),
+ custom=function(e) stop("Hijacked `evalcapt`!")
+ )
Caught condition of class c("custom", "simpleCondition", "condition")
$val
[1] 25
$conditions
$conditions[[1]]
<custom: this is a custom condition>

As far as I can tell there isn't and can't be a simple solution to this problem (I'm happy to be proven wrong). The source of the problem can be seen if we look at how tryCatch and withCallingHandlers register the handlers:
.Internal(.addCondHands(name, list(handler), parentenv, environment(), FALSE)) # tryCatch
.Internal(.addCondHands(classes, handlers, parentenv, NULL, TRUE)) # withCallingHandlers
The key point is the last argument, FALSE in tryCatch, TRUE in withCallingHandlers. This argument leads to the gp bit getting set by do_addCondHands > mkHandlerEntry in src/main/errors.c.
That same bit is then consulted by do_signalCondition (still in src/main/errors.c) when a condition is signaled:
// simplified code excerpt from `do_signalCondition
PROTECT(oldstack = R_HandlerStack);
while ((list = findConditionHandler(cond)) != R_NilValue) {
SEXP entry = CAR(list);
R_HandlerStack = CDR(list);
if (IS_CALLING_ENTRY(entry)) { // <<------------- Consult GP bit
... // Evaluate handler
} else gotoExitingHandler(cond, ecall, entry); // Evaluate handler and exit
}
R_HandlerStack = oldstack;
return R_NilValue;
Basically, if the GP bit is set, then we evaluate the handler, and keep iterating through the handler stack. If it isn't set, then we run gotExitingHandler which runs the handler but then returns control to the handling control structure rather than resuming the code where the condition was signaled.
Since the GP bit can only tell you to do one of two things, there is no straightforward way to modify the behavior of this call (i.e. you either iterate through all the handlers if using withCallingHandlers, or you stop at the first matching one registered by tryCatch).
I toyed with the idea of traceing signalConditions to add a restart there, but that seems too hackish.

With a bit of C you can evaluate an expression within a ToplevelExec() to isolate it from all handlers registered on the stack.
We will expose it at R level in the next rlang version.

I may be a bit late, but I've been digging into the condition-system as well, and I think I've found some other solutions.
But first: some reasons why this is necessarily a hard problem, not something that can easily be solved generally.
The question is which function is signalling a condition, and whether this function can continue execution if it throws a condition. Errors are implemented as "just a condition" as well, but most functions don't expect to be continued after they've thrown a stop().
And some functions may pass on a condition, expecting not be bothered by it again.
Normally, this means that control can only be returned after a stop if a function has explicitly said it can accept that: with a restart provided.
There may also be other serious conditions that can be signalled, and if a function expects such a condition to always be caught, and you force it to return execution, things break badly.
What should happen when you would have written it as follows and execution would resume?
myfun <- function(deleteFiles=NULL) {
if (!all(haveRights(deleteFiles))) stop("Access denied")
file.remove(deleteFiles)
}
withCallingHandlers(val <- eval(myfun(myProtectedFiles)),
error=function(e) message("I'm just going to ignore everything..."))
If no other handlers are called (which alert the user that stop has been called), the files would be removed, even though this function has a (small) safeguard against that.
In the case of an error this is clear, but there could be also cases for other conditions, so I think that's the main reason R doesn't really support it if you stop the passing on of conditions, unless it means halting.
Nonetheless, I think I've found 2 ways of hacking your problem.
The first is simply executing expr step by step, which is quite close to Martin Morgans solution, but moves the withRestarts into your function:
evalcapt <- function(expr) {
conds <- list()
for (i in seq_along(expr)) {
withCallingHandlers(
val <- withRestarts(
eval(expr[[i]]),
muffleCustom = function()
NULL
),
custom = function(e) {
message("Caught condition of class ", deparse(class(e)))
conds <<- c(conds, list(e))
invokeRestart(findRestart("muffleCustom"))
})
}
list(val = val, conditions = conds)
}
The main disadvantage is that this doesn't dig into functions, expr is executed for each instruction at the level it is called.
So if you call evalcapt(myfun()), the for-loop sees this as one instruction. And this one instruction throws a condition --> so does not return --> so you can't see any output that would have been there would you not have been catching anything.
OTOH, evalcapt(expression(signalCondition(myCondition), 25)) does work as requested, as this is an expression with 2 elements, each of which is called.
If you want to go hardcore, I think you could try evaluating myfun() step-by-step, but there is always the question how deep you want to go. If myfun() calls myotherfun(), which calls myotherotherfun(), do you want to return control to the point where myfun failed, or myotherfun, or myotherotherfun?
Basically, it's just a guess about what level you want to halt execution, and where you want to resume.
So a second solution: hijack any call to signalCondition. This means you'll probably end up at a quite deep level, although not the very deepest (no primitives, or code that calls .signalCondition).
I think this works best if you're really sure that your custom condition is only thrown by code that is written by you: it means that execution resumes directly after signalCondition.
Which gives me this function:
evalcapt <- function(expr) {
if(exists('conds', parent.frame(), inherits=FALSE)) {
conds_backup <- get('conds', parent.frame(), inherits=FALSE)
on.exit(assign('conds', conds_backup, parent.frame(), inherits=FALSE), add=TRUE)
} else {
on.exit(rm('conds', pos=parent.frame(), inherits=FALSE), add=TRUE)
}
assign('conds', list(), parent.frame(), inherits=FALSE)
origsignalCondition <- signalCondition
if(exists('signalCondition', parent.frame(), inherits=FALSE)) {
signal_backup <- get('signalCondition', parent.frame(), inherits=FALSE)
on.exit(assign('signalCondition', signal_backup, parent.frame(), inherits=FALSE), add=TRUE)
} else {
on.exit(rm('signalCondition', pos=parent.frame(), inherits=FALSE), add=TRUE)
}
assign('signalCondition', function(e) {
if(is(e, 'custom')) {
message("Caught condition of class ", deparse(class(e)))
conds <<- c(conds, list(e))
} else {
origsignalCondition(e)
}
}, parent.frame())
val <- eval(expr, parent.frame())
list(val=val, conditions=conds)
}
It looks way messier, but that's mostly because there are more issues with which environment to use. The differences are that here, I use the calling environment as context, and to hijack signalCondition() that needs to be there too. And afterwards we need to clean up.
But the main use is overwriting signalCondition: if we see a custom error we log it, and return control. If it's another condition, we pass on control.
Here there may be some smaller disadvantages:
You may end up in a deeper function, where the bug is the way myfun calls myotherfun, but you end up in myotherfun (or deeper).
It only catches occurrences where signalCondition is called. If you call e.g. warning(myCondition), nothing is caught.
If a function in another package/another environment calls signalCondition, then it uses its own searchpath, meaning our signalCondition might be bypassed, and base::signalCondition is used instead.
When debugging, it's a lot uglier. Variables are assigned in environments where you don't expect them (and then disappear when you exit a function), the scope for different functions may be unclear, parent.frame() might give others results then you'd expect, etc.
And as said before: all functions must be able to handle re-entrance after throwing a condition.

Related

R: determine if `suppressMessages()` has been invoked

R's suppressMessages function remains a mystery. What does it do? It does not seem to change global options. It does not appear to add anything to the env. So how does it work? I would like to create a function that detects if suppressMessages has been invoked. However, I don't know where to even start.
Simple example:
#super basic function
myfunfction=function(x = 1){
y = x * 2
return(y)
}
#I can call this function as is
myfunfction(x=4)
#I can call it with suppressMessages
suppressMessages(myfunfction(x=4))
Is there any way for myfunfction to 'know' that suppressMessages is used?
Broader context: Long-running Rcpp functions that benefit from having status messages. Rprintf and Rcpp::Rcout are not silenced by suppressMessages. Being able to detect it would be helpful.
It goes like this:
is_suppressed <- function() {
withRestarts({
signalCondition(structure(list(), class = c("message", "condition")))
FALSE
}, muffleMessage = function() TRUE)
}
is_suppressed()
#> FALSE
suppressMessages(is_suppressed())
#> TRUE
It has a small side effect, though. It emits a "message" condition, every time it is called, and if some code on the stack (e.g. code calling your package/function) is catching these conditions, they might be affected.

R: Exit from the calling function

In R, is there a way to exit from the calling function and return a value? Something like return(), but from the parent function?
parent <- function(){
child()
# stuff afterward should not be executed
}
child <- function(){
returnFromParent("a message returned by parent()")
}
It seems stop() is doing something like that. What I want to do is to write a small replacement for stop() that returns the message that stop() writes to stderr.
Update after G5W's suggestion: I have a large number of checks, each resulting in a stop() if the test fails, but subsequent conditions cannot be evaluated if earlier checks fail, so the function must exit after a failing one. To do this 'properly', I would have to build up a huge if else construct, which I wanted to avoid.
Got it. I guess I was looking for something like this:
parent <- function(){
parent_killing_child()
print("do not run this")
}
parent_killing_child <- function(){
do.call(return, list("my message"), envir = sys.frame(-1))
}
parent()
Thanks for all the advices.
Disclaimer: This sounds a XY problem, printing the stop message to stdout has few to no value, if interactive it should not be a problem, if in a script just use the usual redirection 2 > &1 to write stderr messages to stdout, or maybe use sink as in answer in this question.
Now, if I understood properly what you're after I'll do something like the following to avoid too much code refactoring.
First define a function to handle errors:
my_stop <- function() {
e <- geterrmessage()
print(e)
}
Now configure the system to send errors to your function (error handler) and suppress error messages:
options(error = my_stop)
options(show.error.messages=FALSE)
Now let's test it:
f1 <- function() {
f2()
print("This should not be seen")
}
f2 <- function() {
stop("This is a child error message")
}
Output:
> f1()
[1] "Error in f2() : This is a child error message\n"
For the parent function, make a list of tests. Then loop over the tests, and return your message at the first failed test. Subsequent tests will not be executed after the first failure.
Sample code:
test1 <- function(){criteria <- T; return(ifelse(criteria,T,F))}
test2 <- function(){criteria <- F; return(ifelse(criteria,T,F))}
test3 <- function(){criteria <- T; return(ifelse(criteria,T,F))}
parent <- function() {
tests <- c('test1', 'test2', 'test3')
for (i in 1:length(tests)) {
passed <- do.call(tests[i],args = list())
#print(passed)
if (!passed){
return(paste("Testing failed on test ", i, ".", sep=''))
}
}
return('Congrats! All tests passed!')
}
parent()
Update
Kudos to #chris for their clever application of do.call() in their successful solution.
In five years since then, the R team has released the rlang package within the tidyverse, which provides the apt function rlang::return_from() in tandem with rlang::return_to().
While base::return() can only return from the current local frame,
these two functions will return from any frame on the current
evaluation stack, between the global and the currently active context.
They provide a way of performing arbitrary non-local jumps out of the
function currently under evaluation.
Solution
Thus, you can simply do
child <- function() {
rlang::return_from(
# Return from the parent context (1 frame back).
frame = rlang::caller_env(n = 1),
# Return the message text.
value = "some text returned by parent()"
)
}
where the parent is identified via rlang::caller_env().
Results
When called from a parent() function
parent <- function() {
child()
# stuff afterward should not be executed
return("text that should NOT be returned by parent()")
}
the child() function will force parental behavior like this:
parent()
#> [1] "some text returned by parent()"
Bonus
See my solution here for throwing an error from a parent (or from any arbitrary "ancestor").

Catching use of return without parentheses in R

I just tracked down a silly bug in some R code that I had written. The bug was equivalent to this:
brokenEarlyReturn = function(x=TRUE) {
if (x) return # broken with bare return
stop("Should not get here if x is TRUE. x == ", x)
}
brokenEarlyReturn(TRUE)
# Error in brokenEarlyReturn(TRUE) :
# Should not get here if x is TRUE. x == TRUE
The problem is that instead of return() I had just a bare return without the following parentheses. This causes the if statement be roughly equivalent to if (x) constant, where the body is a bareword that performs no action. In this case, the bareword is the definition of the return function itself, and the function continues rather than returning. The correct version would look like this:
workingEarlyReturn = function(x=TRUE) {
if (x) return() # parentheses added to return
stop("Should not get here if x is TRUE. x == ", x)
}
It makes sense that R requires parentheses after return, but as a C programmer I'm likely to occasionally forget to add them. Usually there would be a parsing error if they are omitted, but in this case of a bare return in the body of an if statement there is not.
Assuming I want the ability to put a "guard" statement at the top of a function that will return without a value if some condition is not met, how I can avoid making this error in the future? Or at least, how can I make it easier to track down this error when I do make it? Is there some "expression has no effect" warning that I can turn on?

Explicitly calling return in a function or not

A while back I got rebuked by Simon Urbanek from the R core team (I believe) for recommending a user to explicitly calling return at the end of a function (his comment was deleted though):
foo = function() {
return(value)
}
instead he recommended:
foo = function() {
value
}
Probably in a situation like this it is required:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
His comment shed some light on why not calling return unless strictly needed is a good thing, but this was deleted.
My question is: Why is not calling return faster or better, and thus preferable?
Question was: Why is not (explicitly) calling return faster or better, and thus preferable?
There is no statement in R documentation making such an assumption.
The main page ?'function' says:
function( arglist ) expr
return(value)
Is it faster without calling return?
Both function() and return() are primitive functions and the function() itself returns last evaluated value even without including return() function.
Calling return() as .Primitive('return') with that last value as an argument will do the same job but needs one call more. So that this (often) unnecessary .Primitive('return') call can draw additional resources.
Simple measurement however shows that the resulting difference is very small and thus can not be the reason for not using explicit return. The following plot is created from data selected this way:
bench_nor2 <- function(x,repeats) { system.time(rep(
# without explicit return
(function(x) vector(length=x,mode="numeric"))(x)
,repeats)) }
bench_ret2 <- function(x,repeats) { system.time(rep(
# with explicit return
(function(x) return(vector(length=x,mode="numeric")))(x)
,repeats)) }
maxlen <- 1000
reps <- 10000
along <- seq(from=1,to=maxlen,by=5)
ret <- sapply(along,FUN=bench_ret2,repeats=reps)
nor <- sapply(along,FUN=bench_nor2,repeats=reps)
res <- data.frame(N=along,ELAPSED_RET=ret["elapsed",],ELAPSED_NOR=nor["elapsed",])
# res object is then visualized
# R version 2.15
The picture above may slightly difffer on your platform.
Based on measured data, the size of returned object is not causing any difference, the number of repeats (even if scaled up) makes just a very small difference, which in real word with real data and real algorithm could not be counted or make your script run faster.
Is it better without calling return?
Return is good tool for clearly designing "leaves" of code where the routine should end, jump out of the function and return value.
# here without calling .Primitive('return')
> (function() {10;20;30;40})()
[1] 40
# here with .Primitive('return')
> (function() {10;20;30;40;return(40)})()
[1] 40
# here return terminates flow
> (function() {10;20;return();30;40})()
NULL
> (function() {10;20;return(25);30;40})()
[1] 25
>
It depends on strategy and programming style of the programmer what style he use, he can use no return() as it is not required.
R core programmers uses both approaches ie. with and without explicit return() as it is possible to find in sources of 'base' functions.
Many times only return() is used (no argument) returning NULL in cases to conditially stop the function.
It is not clear if it is better or not as standard user or analyst using R can not see the real difference.
My opinion is that the question should be: Is there any danger in using explicit return coming from R implementation?
Or, maybe better, user writing function code should always ask: What is the effect in not using explicit return (or placing object to be returned as last leaf of code branch) in the function code?
If everyone agrees that
return is not necessary at the end of a function's body
not using return is marginally faster (according to #Alan's test, 4.3 microseconds versus 5.1)
should we all stop using return at the end of a function? I certainly won't, and I'd like to explain why. I hope to hear if other people share my opinion. And I apologize if it is not a straight answer to the OP, but more like a long subjective comment.
My main problem with not using return is that, as Paul pointed out, there are other places in a function's body where you may need it. And if you are forced to use return somewhere in the middle of your function, why not make all return statements explicit? I hate being inconsistent. Also I think the code reads better; one can scan the function and easily see all exit points and values.
Paul used this example:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
Unfortunately, one could point out that it can easily be rewritten as:
foo = function() {
if(a) {
output <- a
} else {
output <- b
}
output
}
The latter version even conforms with some programming coding standards that advocate one return statement per function. I think a better example could have been:
bar <- function() {
while (a) {
do_stuff
for (b) {
do_stuff
if (c) return(1)
for (d) {
do_stuff
if (e) return(2)
}
}
}
return(3)
}
This would be much harder to rewrite using a single return statement: it would need multiple breaks and an intricate system of boolean variables for propagating them. All this to say that the single return rule does not play well with R. So if you are going to need to use return in some places of your function's body, why not be consistent and use it everywhere?
I don't think the speed argument is a valid one. A 0.8 microsecond difference is nothing when you start looking at functions that actually do something. The last thing I can see is that it is less typing but hey, I'm not lazy.
This is an interesting discussion. I think that #flodel's example is excellent. However, I think it illustrates my point (and #koshke mentions this in a comment) that return makes sense when you use an imperative instead of a functional coding style.
Not to belabour the point, but I would have rewritten foo like this:
foo = function() ifelse(a,a,b)
A functional style avoids state changes, like storing the value of output. In this style, return is out of place; foo looks more like a mathematical function.
I agree with #flodel: using an intricate system of boolean variables in bar would be less clear, and pointless when you have return. What makes bar so amenable to return statements is that it is written in an imperative style. Indeed, the boolean variables represent the "state" changes avoided in a functional style.
It is really difficult to rewrite bar in functional style, because it is just pseudocode, but the idea is something like this:
e_func <- function() do_stuff
d_func <- function() ifelse(any(sapply(seq(d),e_func)),2,3)
b_func <- function() {
do_stuff
ifelse(c,1,sapply(seq(b),d_func))
}
bar <- function () {
do_stuff
sapply(seq(a),b_func) # Not exactly correct, but illustrates the idea.
}
The while loop would be the most difficult to rewrite, because it is controlled by state changes to a.
The speed loss caused by a call to return is negligible, but the efficiency gained by avoiding return and rewriting in a functional style is often enormous. Telling new users to stop using return probably won't help, but guiding them to a functional style will payoff.
#Paul return is necessary in imperative style because you often want to exit the function at different points in a loop. A functional style doesn't use loops, and therefore doesn't need return. In a purely functional style, the final call is almost always the desired return value.
In Python, functions require a return statement. However, if you programmed your function in a functional style, you will likely have only one return statement: at the end of your function.
Using an example from another StackOverflow post, let us say we wanted a function that returned TRUE if all the values in a given x had an odd length. We could use two styles:
# Procedural / Imperative
allOdd = function(x) {
for (i in x) if (length(i) %% 2 == 0) return (FALSE)
return (TRUE)
}
# Functional
allOdd = function(x)
all(length(x) %% 2 == 1)
In a functional style, the value to be returned naturally falls at the ends of the function. Again, it looks more like a mathematical function.
#GSee The warnings outlined in ?ifelse are definitely interesting, but I don't think they are trying to dissuade use of the function. In fact, ifelse has the advantage of automatically vectorizing functions. For example, consider a slightly modified version of foo:
foo = function(a) { # Note that it now has an argument
if(a) {
return(a)
} else {
return(b)
}
}
This function works fine when length(a) is 1. But if you rewrote foo with an ifelse
foo = function (a) ifelse(a,a,b)
Now foo works on any length of a. In fact, it would even work when a is a matrix. Returning a value the same shape as test is a feature that helps with vectorization, not a problem.
It seems that without return() it's faster...
library(rbenchmark)
x <- 1
foo <- function(value) {
return(value)
}
fuu <- function(value) {
value
}
benchmark(foo(x),fuu(x),replications=1e7)
test replications elapsed relative user.self sys.self user.child sys.child
1 foo(x) 10000000 51.36 1.185322 51.11 0.11 0 0
2 fuu(x) 10000000 43.33 1.000000 42.97 0.05 0 0
____EDIT __________________
I proceed to others benchmark (benchmark(fuu(x),foo(x),replications=1e7)) and the result is reversed... I'll try on a server.
My question is: Why is not calling return faster
It’s faster because return is a (primitive) function in R, which means that using it in code incurs the cost of a function call. Compare this to most other programming languages, where return is a keyword, but not a function call: it doesn’t translate to any runtime code execution.
That said, calling a primitive function in this way is pretty fast in R, and calling return incurs a minuscule overhead. This isn’t the argument for omitting return.
or better, and thus preferable?
Because there’s no reason to use it.
Because it’s redundant, and it doesn’t add useful redundancy.
To be clear: redundancy can sometimes be useful. But most redundancy isn’t of this kind. Instead, it’s of the kind that adds visual clutter without adding information: it’s the programming equivalent of a filler word or chartjunk).
Consider the following example of an explanatory comment, which is universally recognised as bad redundancy because the comment merely paraphrases what the code already expresses:
# Add one to the result
result = x + 1
Using return in R falls in the same category, because R is a functional programming language, and in R every function call has a value. This is a fundamental property of R. And once you see R code from the perspective that every expression (including every function call) has a value, the question then becomes: “why should I use return?” There needs to be a positive reason, since the default is not to use it.
One such positive reason is to signal early exit from a function, say in a guard clause:
f = function (a, b) {
if (! precondition(a)) return() # same as `return(NULL)`!
calculation(b)
}
This is a valid, non-redundant use of return. However, such guard clauses are rare in R compared to other languages, and since every expression has a value, a regular if does not require return:
sign = function (num) {
if (num > 0) {
1
} else if (num < 0) {
-1
} else {
0
}
}
We can even rewrite f like this:
f = function (a, b) {
if (precondition(a)) calculation(b)
}
… where if (cond) expr is the same as if (cond) expr else NULL.
Finally, I’d like to forestall three common objections:
Some people argue that using return adds clarity, because it signals “this function returns a value”. But as explained above, every function returns something in R. Thinking of return as a marker of returning a value isn’t just redundant, it’s actively misleading.
Relatedly, the Zen of Python has a marvellous guideline that should always be followed:
Explicit is better than implicit.
How does dropping redundant return not violate this? Because the return value of a function in a functional language is always explicit: it’s its last expression. This is again the same argument about explicitness vs redundancy.
In fact, if you want explicitness, use it to highlight the exception to the rule: mark functions that don’t return a meaningful value, which are only called for their side-effects (such as cat). Except R has a better marker than return for this case: invisible. For instance, I would write
save_results = function (results, file) {
# … code that writes the results to a file …
invisible()
}
But what about long functions? Won’t it be easy to lose track of what is being returned?
Two answers: first, not really. The rule is clear: the last expression of a function is its value. There’s nothing to keep track of.
But more importantly, the problem in long functions isn’t the lack of explicit return markers. It’s the length of the function. Long functions almost (?) always violate the single responsibility principle and even when they don’t they will benefit from being broken apart for readability.
A problem with not putting 'return' explicitly at the end is that if one adds additional statements at the end of the method, suddenly the return value is wrong:
foo <- function() {
dosomething()
}
This returns the value of dosomething().
Now we come along the next day and add a new line:
foo <- function() {
dosomething()
dosomething2()
}
We wanted our code to return the value of dosomething(), but instead it no longer does.
With an explicit return, this becomes really obvious:
foo <- function() {
return( dosomething() )
dosomething2()
}
We can see that there is something strange about this code, and fix it:
foo <- function() {
dosomething2()
return( dosomething() )
}
I think of return as a trick. As a general rule, the value of the last expression evaluated in a function becomes the function's value -- and this general pattern is found in many places. All of the following evaluate to 3:
local({
1
2
3
})
eval(expression({
1
2
3
}))
(function() {
1
2
3
})()
What return does is not really returning a value (this is done with or without it) but "breaking out" of the function in an irregular way. In that sense, it is the closest equivalent of GOTO statement in R (there are also break and next). I use return very rarely and never at the end of a function.
if(a) {
return(a)
} else {
return(b)
}
... this can be rewritten as if(a) a else b which is much better readable and less curly-bracketish. No need for return at all here. My prototypical case of use of "return" would be something like ...
ugly <- function(species, x, y){
if(length(species)>1) stop("First argument is too long.")
if(species=="Mickey Mouse") return("You're kidding!")
### do some calculations
if(grepl("mouse", species)) {
## do some more calculations
if(species=="Dormouse") return(paste0("You're sleeping until", x+y))
## do some more calculations
return(paste0("You're a mouse and will be eating for ", x^y, " more minutes."))
}
## some more ugly conditions
# ...
### finally
return("The end")
}
Generally, the need for many return's suggests that the problem is either ugly or badly structured.
[EDIT]
return doesn't really need a function to work: you can use it to break out of a set of expressions to be evaluated.
getout <- TRUE
# if getout==TRUE then the value of EXP, LOC, and FUN will be "OUTTA HERE"
# .... if getout==FALSE then it will be `3` for all these variables
EXP <- eval(expression({
1
2
if(getout) return("OUTTA HERE")
3
}))
LOC <- local({
1
2
if(getout) return("OUTTA HERE")
3
})
FUN <- (function(){
1
2
if(getout) return("OUTTA HERE")
3
})()
identical(EXP,LOC)
identical(EXP,FUN)
The argument of redundancy has come up a lot here. In my opinion that is not reason enough to omit return().
Redundancy is not automatically a bad thing. When used strategically, redundancy makes code clearer and more maintenable.
Consider this example: Function parameters often have default values. So specifying a value that is the same as the default is redundant. Except it makes obvious the behaviour I expect. No need to pull up the function manpage to remind myself what the defaults are. And no worry about a future version of the function changing its defaults.
With a negligible performance penalty for calling return() (as per the benchmarks posted here by others) it comes down to style rather than right and wrong. For something to be "wrong", there needs to be a clear disadvantage, and nobody here has demonstrated satisfactorily that including or omitting return() has a consistent disadvantage. It seems very case-specific and user-specific.
So here is where I stand on this.
function(){
#do stuff
...
abcd
}
I am uncomfortable with "orphan" variables like in the example above. Was abcd going to be part of a statement I didn't finish writing? Is it a remnant of a splice/edit in my code and needs to be deleted? Did I accidentally paste/move something from somewhere else?
function(){
#do stuff
...
return(abdc)
}
By contrast, this second example makes it obvious to me that it is an intended return value, rather than some accident or incomplete code. For me this redundancy is absolutely not useless.
Of course, once the function is finished and working I could remove the return. But removing it is in itself a redundant extra step, and in my view more useless than including return() in the first place.
All that said, I do not use return() in short unnamed one-liner functions. There it makes up a large fraction of the function's code and therefore mostly causes visual clutter that makes code less legible. But for larger formally defined and named functions, I use it and will likely continue to so.
return can increase code readability:
foo <- function() {
if (a) return(a)
b
}

Environment chaining in R

In my R development I need to wrap function primitives in proto objects so that a number of arguments can be automatically passed to the functions when the $perform() method of the object is invoked. The function invocation internally happens via do.call(). All is well, except when the function attempts to access variables from the closure within which it is defined. In that case, the function cannot resolve the names.
Here is the smallest example I have found that reproduces the behavior:
library(proto)
make_command <- function(operation) {
proto(
func = operation,
perform = function(., ...) {
func <- with(., func) # unbinds proto method
do.call(func, list(), envir=environment(operation))
}
)
}
test_case <- function() {
result <- 100
make_command(function() result)$perform()
}
# Will generate error:
# Error in function () : object 'result' not found
test_case()
I have a reproducible testthat test that also outputs a lot of diagnostic output. The diagnostic output has me stumped. By looking up the parent environment chain, my diagnostic code, which lives inside the function, finds and prints the very same variable the function fails to find. See this gist..
How can the environment for do.call be set up correctly?
This was the final answer after an offline discussion with the poster:
make_command <- function(operation) {
proto(perform = function(.) operation())
}
I think the issue here is clearer and easier to explore if you:
Replace the anonymous function within make_command() with a named one.
Make that function open a browser() (instead of trying to get result). That way you can look around to see where you are and what's going on.
Try this, which should clarify the cause of your problem:
test_case <- function() {
result <- 100
myFun <- function() browser()
make_command(myFun)$perform()
}
test_case()
## Then from within the browser:
#
parent.env(environment())
# <environment: 0x0d8de854>
# attr(,"class")
# [1] "proto" "environment"
get("result", parent.env(environment()))
# Error in get("result", parent.env(environment())) :
# object 'result' not found
#
parent.frame()
# <environment: 0x0d8ddfc0>
get("result", parent.frame()) ## (This works, so points towards a solution.)
# [1] 100
Here's the problem. Although you think you're evaluating myFun(), whose environment is the evaluation frame of test_case(), your call to do.call(func, ...) is really evaluating func(), whose environment is the proto environment within which it was defined. After looking for and not finding result in its own frame, the call to func() follows the rules of lexical scoping, and next looks in the proto environment. Neither it nor its parent environment contains an object named result, resulting in the error message you received.
If this doesn't immediately make sense, you can keep poking around within the browser. Here are a few further calls you might find helpful:
environment(get("myFun", parent.frame()))
ls(environment(get("myFun", parent.frame())))
environment(get("func", parent.env(environment())))
ls(environment(get("func", parent.env(environment()))))

Resources