I have a function bowtieIndex(bowtieBuildLocation, filename) that calls bowtie2-build. It's parameters are the bowtie2-build location on my system and the output filename.
How do I write a test for this function without hard-coding the bowtie2-build location?
If all you are doing inside the function is calling an external program, then you can't test if that works directly. Without the code itself, this is hard to answer specifically, but if, for example, bowtieIndex is calling system to run an external program, you can mock what system will do, pretending that it worked, or failed. Look at the documentation for testthat::with_mock.
Three examples follow. The first parameter is the new code that will be executed instead of the real system function. The parameters that the system function was called with are available as normal in a function definition, if you want to do very specific things. I find it easier to test repeatedly rather than write a complex replacement function. Everything inside the second parameter, a code block { }, is executed with a context that see only the replacement system function, including all nested function calls. After this code block, the with_mock function exits and the real base::system is back in scope automatically. There are some limits to what functions can be mocked, but a surprising number of base functions can be over-ridden.
# Pretend that the system call exited with a specific error
with_mock(
`base::system`= function(...) {
stop("It didn't work");
}, {
expect_error( bowtieIndex( bowtieBuildLocation, filename ),
"It didn't work"
)
})
# Pretend that the system call exited with a specific return value
with_mock(
`base::system`= function(...) {
return(127);
}, {
expect_error( bowtieIndex( bowtieBuildLocation, filename ),
"Error from your function on bad return value." )
expect_false( file.exists( "someNotGeneratedFile" ))
})
# Pretend that the system call worked in a specific way
with_mock(
`base::system`= function(...) {
file.create("someGeneratedFile")
return(0);
}, {
# No error
expect_error( got <- bowtieIndex(bowtieBuildLocation, filename), NA )
# File was created
expect_true( file.exists( "someGeneratedFile" ))
})
# got variable holding results is actually available here, outside
# with_mock, so don't need to test everything within the function.
test_equal( got, c( "I", "was", "returned" ))
Is there an R function that lists all the functions in an R script file along with their arguments?
i.e. an output of the form:
func1(var1, var2)
func2(var4, var10)
.
.
.
func10(varA, varB)
Using [sys.]source has the very undesirable side-effect of executing the source inside the file. At the worst this has security problems, but even “benign” code may simply have unintended side-effects when executed. At best it just takes unnecessary time (and potentially a lot).
It’s actually unnecessary to execute the code, though: it is enough to parse it, and then do some syntactical analysis.
The actual code is trivial:
file_parsed = parse(filename)
functions = Filter(is_function, file_parsed)
function_names = unlist(Map(function_name, functions))
And there you go, function_names contains a vector of function names. Extending this to also list the function arguments is left as an exercise to the reader. Hint: there are two approaches. One is to eval the function definition (now that we know it’s a function definition, this is safe); the other is to “cheat” and just get the list of arguments to the function call.
The implementation of the functions used above is also not particularly hard. There’s probably even something already in R core packages (‘utils’ has a lot of stuff) but since I’m not very familiar with this, I’ve just written them myself:
is_function = function (expr) {
if (! is_assign(expr)) return(FALSE)
value = expr[[3L]]
is.call(value) && as.character(value[[1L]]) == 'function'
}
function_name = function (expr) {
as.character(expr[[2L]])
}
is_assign = function (expr) {
is.call(expr) && as.character(expr[[1L]]) %in% c('=', '<-', 'assign')
}
This correctly recognises function declarations of the forms
f = function (…) …
f <- function (…) …
assign('f', function (…) …)
It won’t work for more complex code, since assignments can be arbitrarily complex and in general are only resolvable by actually executing the code. However, the three forms above probably account for ≫ 99% of all named function definitions in practice.
UPDATE: Please refer to the answer by #Konrad Rudolph instead
You can create a new environment, source your file in that environment and then list the functions in it using lsf.str() e.g.
test.env <- new.env()
sys.source("myfile.R", envir = test.env)
lsf.str(envir=test.env)
rm(test.env)
or if you want to wrap it as a function:
listFunctions <- function(filename) {
temp.env <- new.env()
sys.source(filename, envir = temp.env)
functions <- lsf.str(envir=temp.env)
rm(temp.env)
return(functions)
}
Suppose I have a script Foo.r which looks like this.
source('Bar.r')
print("Hello World")
Now suppose further that in Bar.r, I want to return immediately if some condition is true, instead of executing the rest of the script.
if (DO_NOT_RUN_BAR) return; // What should return here be replaced with?
// Remainder of work in bar
In particular, if DO_NOT_RUN_BAR is set, I want Bar.r to return and Foo.r to continue execution, printing Hello World.
I realize that one way to do this is to invert the if statement in Bar.t and wrap the entire script inside an if statement that checks for the logical negation of DO_NOT_RUN_BAR, but I still want to know whether I can replace the return statement in the code above to skip execution of the rest of Bar.r when it is sourced.
By defintion , source parse a file until the end of the file.
So, I would split bar file in 2 parts :
source("bar1.R")
if (DO_RUN_BAR2) source("bar2.R")
print("Hello World")
This is safer than managing global variables with dangerous side effect.
In Bar.r you can do
if (DO_NOT_RUN_BAR) stop();
Then in Foo.r you call it like this
try(source("./bar.r"), silent=T)
Create your script with a multi-line expression with {...} enclosing it, and use stop in the middle:
{ print("test")
print("test")
print("test"); stop()
print("test")
print("test")
print("test") }
#[1] "test"
#[1] "test"
#[1] "test"
#Error:
I have a program that does some data analysis and is a few hundred lines long.
Very early on in the program, I want to do some quality control and if there is not enough data, I want the program to terminate and return to the R console. Otherwise, I want the rest of the code to execute.
I've tried break,browser, and quit and none of them stop the execution of the rest of the program (and quit stops the execution as well as completely quitting R, which is not something I want to happen). My last resort is creating an if-else statement as below:
if(n < 500){}
else{*insert rest of program here*}
but that seems like bad coding practice. Am I missing something?
You could use the stopifnot() function if you want the program to produce an error:
foo <- function(x) {
stopifnot(x > 500)
# rest of program
}
Perhaps you just want to stop executing a long script at some point. ie. like you want to hard code an exit() in C or Python.
print("this is the last message")
stop()
print("you should not see this")
Edited. Thanks to #Droplet, who found a way to make this work without the .Internal(): Here is a way to implement an exit() command in R.
exit <- function() { invokeRestart("abort") }
print("this is the last message")
exit()
print("you should not see this")
Only lightly tested, but when I run this, I see this is the last message and then the script aborts without any error message.
Below is the uglier version from my original answer.
exit <- function() {
.Internal(.invokeRestart(list(NULL, NULL), NULL))
}
Reverse your if-else construction:
if(n >= 500) {
# do stuff
}
# no need for else
Edit: Seems the OP is running a long script, in that case one only needs to wrap the part of the script after the quality control with
if (n >= 500) {
.... long running code here
}
If breaking out of a function, you'll probably just want return(), either explicitly or implicitly.
For example, an explicit double return
foo <- function(x) {
if(x < 10) {
return(NA)
} else {
xx <- seq_len(x)
xx <- cumsum(xx)
}
xx ## return(xx) is implied here
}
> foo(5)
[1] 0
> foo(10)
[1] 1 3 6 10 15 21 28 36 45 55
By return() being implied, I mean that the last line is as if you'd done return(xx), but it is slightly more efficient to leave off the call to return().
Some consider using multiple returns bad style; in long functions, keeping track of where the function exits can become difficult or error prone. Hence an alternative is to have a single return point, but change the return object using the if () else () clause. Such a modification to foo() would be
foo <- function(x) {
## out is NA or cumsum(xx) depending on x
out <- if(x < 10) {
NA
} else {
xx <- seq_len(x)
cumsum(xx)
}
out ## return(out) is implied here
}
> foo(5)
[1] NA
> foo(10)
[1] 1 3 6 10 15 21 28 36 45 55
This is an old question but there is no a clean solution yet. This probably is not answering this specific question, but those looking for answers on 'how to gracefully exit from an R script' will probably land here. It seems that R developers forgot to implement an exit() function. Anyway, the trick I've found is:
continue <- TRUE
tryCatch({
# You do something here that needs to exit gracefully without error.
...
# We now say bye-bye
stop("exit")
}, error = function(e) {
if (e$message != "exit") {
# Your error message goes here. E.g.
stop(e)
}
continue <<-FALSE
})
if (continue) {
# Your code continues here
...
}
cat("done.\n")
Basically, you use a flag to indicate the continuation or not of a specified block of code. Then you use the stop() function to pass a customized message to the error handler of a tryCatch() function. If the error handler receives your message to exit gracefully, then it just ignores the error and set the continuation flag to FALSE.
Here:
if(n < 500)
{
# quit()
# or
# stop("this is some message")
}
else
{
*insert rest of program here*
}
Both quit() and stop(message) will quit your script. If you are sourcing your script from the R command prompt, then quit() will exit from R as well.
I had a similar issue: Exit the current function, but not wanted to finish the rest of the code.
Finally I solved it by a for() loop that runs only once. Inside the for loop you can set several differenct conditions to leave the current loop (function).
for (i in T) {
print('hello')
if (leave.condition) next
print('good bye')
}
You can use the pskill function in the R "tools" package to interrupt the current process and return to the console. Concretely, I have the following function defined in a startup file that I source at the beginning of each script. You can also copy it directly at the start of your code, however. Then insert halt() at any point in your code to stop script execution on the fly. This function works well on GNU/Linux and judging from the R documentation, it should also work on Windows (but I didn't check).
# halt: interrupts the current R process; a short iddle time prevents R from
# outputting further results before the SIGINT (= Ctrl-C) signal is received
halt <- function(hint = "Process stopped.\n") {
writeLines(hint)
require(tools, quietly = TRUE)
processId <- Sys.getpid()
pskill(processId, SIGINT)
iddleTime <- 1.00
Sys.sleep(iddleTime)
}
A while back I got rebuked by Simon Urbanek from the R core team (I believe) for recommending a user to explicitly calling return at the end of a function (his comment was deleted though):
foo = function() {
return(value)
}
instead he recommended:
foo = function() {
value
}
Probably in a situation like this it is required:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
His comment shed some light on why not calling return unless strictly needed is a good thing, but this was deleted.
My question is: Why is not calling return faster or better, and thus preferable?
Question was: Why is not (explicitly) calling return faster or better, and thus preferable?
There is no statement in R documentation making such an assumption.
The main page ?'function' says:
function( arglist ) expr
return(value)
Is it faster without calling return?
Both function() and return() are primitive functions and the function() itself returns last evaluated value even without including return() function.
Calling return() as .Primitive('return') with that last value as an argument will do the same job but needs one call more. So that this (often) unnecessary .Primitive('return') call can draw additional resources.
Simple measurement however shows that the resulting difference is very small and thus can not be the reason for not using explicit return. The following plot is created from data selected this way:
bench_nor2 <- function(x,repeats) { system.time(rep(
# without explicit return
(function(x) vector(length=x,mode="numeric"))(x)
,repeats)) }
bench_ret2 <- function(x,repeats) { system.time(rep(
# with explicit return
(function(x) return(vector(length=x,mode="numeric")))(x)
,repeats)) }
maxlen <- 1000
reps <- 10000
along <- seq(from=1,to=maxlen,by=5)
ret <- sapply(along,FUN=bench_ret2,repeats=reps)
nor <- sapply(along,FUN=bench_nor2,repeats=reps)
res <- data.frame(N=along,ELAPSED_RET=ret["elapsed",],ELAPSED_NOR=nor["elapsed",])
# res object is then visualized
# R version 2.15
The picture above may slightly difffer on your platform.
Based on measured data, the size of returned object is not causing any difference, the number of repeats (even if scaled up) makes just a very small difference, which in real word with real data and real algorithm could not be counted or make your script run faster.
Is it better without calling return?
Return is good tool for clearly designing "leaves" of code where the routine should end, jump out of the function and return value.
# here without calling .Primitive('return')
> (function() {10;20;30;40})()
[1] 40
# here with .Primitive('return')
> (function() {10;20;30;40;return(40)})()
[1] 40
# here return terminates flow
> (function() {10;20;return();30;40})()
NULL
> (function() {10;20;return(25);30;40})()
[1] 25
>
It depends on strategy and programming style of the programmer what style he use, he can use no return() as it is not required.
R core programmers uses both approaches ie. with and without explicit return() as it is possible to find in sources of 'base' functions.
Many times only return() is used (no argument) returning NULL in cases to conditially stop the function.
It is not clear if it is better or not as standard user or analyst using R can not see the real difference.
My opinion is that the question should be: Is there any danger in using explicit return coming from R implementation?
Or, maybe better, user writing function code should always ask: What is the effect in not using explicit return (or placing object to be returned as last leaf of code branch) in the function code?
If everyone agrees that
return is not necessary at the end of a function's body
not using return is marginally faster (according to #Alan's test, 4.3 microseconds versus 5.1)
should we all stop using return at the end of a function? I certainly won't, and I'd like to explain why. I hope to hear if other people share my opinion. And I apologize if it is not a straight answer to the OP, but more like a long subjective comment.
My main problem with not using return is that, as Paul pointed out, there are other places in a function's body where you may need it. And if you are forced to use return somewhere in the middle of your function, why not make all return statements explicit? I hate being inconsistent. Also I think the code reads better; one can scan the function and easily see all exit points and values.
Paul used this example:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
Unfortunately, one could point out that it can easily be rewritten as:
foo = function() {
if(a) {
output <- a
} else {
output <- b
}
output
}
The latter version even conforms with some programming coding standards that advocate one return statement per function. I think a better example could have been:
bar <- function() {
while (a) {
do_stuff
for (b) {
do_stuff
if (c) return(1)
for (d) {
do_stuff
if (e) return(2)
}
}
}
return(3)
}
This would be much harder to rewrite using a single return statement: it would need multiple breaks and an intricate system of boolean variables for propagating them. All this to say that the single return rule does not play well with R. So if you are going to need to use return in some places of your function's body, why not be consistent and use it everywhere?
I don't think the speed argument is a valid one. A 0.8 microsecond difference is nothing when you start looking at functions that actually do something. The last thing I can see is that it is less typing but hey, I'm not lazy.
This is an interesting discussion. I think that #flodel's example is excellent. However, I think it illustrates my point (and #koshke mentions this in a comment) that return makes sense when you use an imperative instead of a functional coding style.
Not to belabour the point, but I would have rewritten foo like this:
foo = function() ifelse(a,a,b)
A functional style avoids state changes, like storing the value of output. In this style, return is out of place; foo looks more like a mathematical function.
I agree with #flodel: using an intricate system of boolean variables in bar would be less clear, and pointless when you have return. What makes bar so amenable to return statements is that it is written in an imperative style. Indeed, the boolean variables represent the "state" changes avoided in a functional style.
It is really difficult to rewrite bar in functional style, because it is just pseudocode, but the idea is something like this:
e_func <- function() do_stuff
d_func <- function() ifelse(any(sapply(seq(d),e_func)),2,3)
b_func <- function() {
do_stuff
ifelse(c,1,sapply(seq(b),d_func))
}
bar <- function () {
do_stuff
sapply(seq(a),b_func) # Not exactly correct, but illustrates the idea.
}
The while loop would be the most difficult to rewrite, because it is controlled by state changes to a.
The speed loss caused by a call to return is negligible, but the efficiency gained by avoiding return and rewriting in a functional style is often enormous. Telling new users to stop using return probably won't help, but guiding them to a functional style will payoff.
#Paul return is necessary in imperative style because you often want to exit the function at different points in a loop. A functional style doesn't use loops, and therefore doesn't need return. In a purely functional style, the final call is almost always the desired return value.
In Python, functions require a return statement. However, if you programmed your function in a functional style, you will likely have only one return statement: at the end of your function.
Using an example from another StackOverflow post, let us say we wanted a function that returned TRUE if all the values in a given x had an odd length. We could use two styles:
# Procedural / Imperative
allOdd = function(x) {
for (i in x) if (length(i) %% 2 == 0) return (FALSE)
return (TRUE)
}
# Functional
allOdd = function(x)
all(length(x) %% 2 == 1)
In a functional style, the value to be returned naturally falls at the ends of the function. Again, it looks more like a mathematical function.
#GSee The warnings outlined in ?ifelse are definitely interesting, but I don't think they are trying to dissuade use of the function. In fact, ifelse has the advantage of automatically vectorizing functions. For example, consider a slightly modified version of foo:
foo = function(a) { # Note that it now has an argument
if(a) {
return(a)
} else {
return(b)
}
}
This function works fine when length(a) is 1. But if you rewrote foo with an ifelse
foo = function (a) ifelse(a,a,b)
Now foo works on any length of a. In fact, it would even work when a is a matrix. Returning a value the same shape as test is a feature that helps with vectorization, not a problem.
It seems that without return() it's faster...
library(rbenchmark)
x <- 1
foo <- function(value) {
return(value)
}
fuu <- function(value) {
value
}
benchmark(foo(x),fuu(x),replications=1e7)
test replications elapsed relative user.self sys.self user.child sys.child
1 foo(x) 10000000 51.36 1.185322 51.11 0.11 0 0
2 fuu(x) 10000000 43.33 1.000000 42.97 0.05 0 0
____EDIT __________________
I proceed to others benchmark (benchmark(fuu(x),foo(x),replications=1e7)) and the result is reversed... I'll try on a server.
My question is: Why is not calling return faster
It’s faster because return is a (primitive) function in R, which means that using it in code incurs the cost of a function call. Compare this to most other programming languages, where return is a keyword, but not a function call: it doesn’t translate to any runtime code execution.
That said, calling a primitive function in this way is pretty fast in R, and calling return incurs a minuscule overhead. This isn’t the argument for omitting return.
or better, and thus preferable?
Because there’s no reason to use it.
Because it’s redundant, and it doesn’t add useful redundancy.
To be clear: redundancy can sometimes be useful. But most redundancy isn’t of this kind. Instead, it’s of the kind that adds visual clutter without adding information: it’s the programming equivalent of a filler word or chartjunk).
Consider the following example of an explanatory comment, which is universally recognised as bad redundancy because the comment merely paraphrases what the code already expresses:
# Add one to the result
result = x + 1
Using return in R falls in the same category, because R is a functional programming language, and in R every function call has a value. This is a fundamental property of R. And once you see R code from the perspective that every expression (including every function call) has a value, the question then becomes: “why should I use return?” There needs to be a positive reason, since the default is not to use it.
One such positive reason is to signal early exit from a function, say in a guard clause:
f = function (a, b) {
if (! precondition(a)) return() # same as `return(NULL)`!
calculation(b)
}
This is a valid, non-redundant use of return. However, such guard clauses are rare in R compared to other languages, and since every expression has a value, a regular if does not require return:
sign = function (num) {
if (num > 0) {
1
} else if (num < 0) {
-1
} else {
0
}
}
We can even rewrite f like this:
f = function (a, b) {
if (precondition(a)) calculation(b)
}
… where if (cond) expr is the same as if (cond) expr else NULL.
Finally, I’d like to forestall three common objections:
Some people argue that using return adds clarity, because it signals “this function returns a value”. But as explained above, every function returns something in R. Thinking of return as a marker of returning a value isn’t just redundant, it’s actively misleading.
Relatedly, the Zen of Python has a marvellous guideline that should always be followed:
Explicit is better than implicit.
How does dropping redundant return not violate this? Because the return value of a function in a functional language is always explicit: it’s its last expression. This is again the same argument about explicitness vs redundancy.
In fact, if you want explicitness, use it to highlight the exception to the rule: mark functions that don’t return a meaningful value, which are only called for their side-effects (such as cat). Except R has a better marker than return for this case: invisible. For instance, I would write
save_results = function (results, file) {
# … code that writes the results to a file …
invisible()
}
But what about long functions? Won’t it be easy to lose track of what is being returned?
Two answers: first, not really. The rule is clear: the last expression of a function is its value. There’s nothing to keep track of.
But more importantly, the problem in long functions isn’t the lack of explicit return markers. It’s the length of the function. Long functions almost (?) always violate the single responsibility principle and even when they don’t they will benefit from being broken apart for readability.
A problem with not putting 'return' explicitly at the end is that if one adds additional statements at the end of the method, suddenly the return value is wrong:
foo <- function() {
dosomething()
}
This returns the value of dosomething().
Now we come along the next day and add a new line:
foo <- function() {
dosomething()
dosomething2()
}
We wanted our code to return the value of dosomething(), but instead it no longer does.
With an explicit return, this becomes really obvious:
foo <- function() {
return( dosomething() )
dosomething2()
}
We can see that there is something strange about this code, and fix it:
foo <- function() {
dosomething2()
return( dosomething() )
}
I think of return as a trick. As a general rule, the value of the last expression evaluated in a function becomes the function's value -- and this general pattern is found in many places. All of the following evaluate to 3:
local({
1
2
3
})
eval(expression({
1
2
3
}))
(function() {
1
2
3
})()
What return does is not really returning a value (this is done with or without it) but "breaking out" of the function in an irregular way. In that sense, it is the closest equivalent of GOTO statement in R (there are also break and next). I use return very rarely and never at the end of a function.
if(a) {
return(a)
} else {
return(b)
}
... this can be rewritten as if(a) a else b which is much better readable and less curly-bracketish. No need for return at all here. My prototypical case of use of "return" would be something like ...
ugly <- function(species, x, y){
if(length(species)>1) stop("First argument is too long.")
if(species=="Mickey Mouse") return("You're kidding!")
### do some calculations
if(grepl("mouse", species)) {
## do some more calculations
if(species=="Dormouse") return(paste0("You're sleeping until", x+y))
## do some more calculations
return(paste0("You're a mouse and will be eating for ", x^y, " more minutes."))
}
## some more ugly conditions
# ...
### finally
return("The end")
}
Generally, the need for many return's suggests that the problem is either ugly or badly structured.
[EDIT]
return doesn't really need a function to work: you can use it to break out of a set of expressions to be evaluated.
getout <- TRUE
# if getout==TRUE then the value of EXP, LOC, and FUN will be "OUTTA HERE"
# .... if getout==FALSE then it will be `3` for all these variables
EXP <- eval(expression({
1
2
if(getout) return("OUTTA HERE")
3
}))
LOC <- local({
1
2
if(getout) return("OUTTA HERE")
3
})
FUN <- (function(){
1
2
if(getout) return("OUTTA HERE")
3
})()
identical(EXP,LOC)
identical(EXP,FUN)
The argument of redundancy has come up a lot here. In my opinion that is not reason enough to omit return().
Redundancy is not automatically a bad thing. When used strategically, redundancy makes code clearer and more maintenable.
Consider this example: Function parameters often have default values. So specifying a value that is the same as the default is redundant. Except it makes obvious the behaviour I expect. No need to pull up the function manpage to remind myself what the defaults are. And no worry about a future version of the function changing its defaults.
With a negligible performance penalty for calling return() (as per the benchmarks posted here by others) it comes down to style rather than right and wrong. For something to be "wrong", there needs to be a clear disadvantage, and nobody here has demonstrated satisfactorily that including or omitting return() has a consistent disadvantage. It seems very case-specific and user-specific.
So here is where I stand on this.
function(){
#do stuff
...
abcd
}
I am uncomfortable with "orphan" variables like in the example above. Was abcd going to be part of a statement I didn't finish writing? Is it a remnant of a splice/edit in my code and needs to be deleted? Did I accidentally paste/move something from somewhere else?
function(){
#do stuff
...
return(abdc)
}
By contrast, this second example makes it obvious to me that it is an intended return value, rather than some accident or incomplete code. For me this redundancy is absolutely not useless.
Of course, once the function is finished and working I could remove the return. But removing it is in itself a redundant extra step, and in my view more useless than including return() in the first place.
All that said, I do not use return() in short unnamed one-liner functions. There it makes up a large fraction of the function's code and therefore mostly causes visual clutter that makes code less legible. But for larger formally defined and named functions, I use it and will likely continue to so.
return can increase code readability:
foo <- function() {
if (a) return(a)
b
}