How to pass FsCheck Test Correctly - functional-programming

let list p = if List.contains " " p || List.contains null p then false else true
I have such a function to check if the list is well formatted or not. The list shouldn't have an empty string and nulls. I don't get what I am missing since Check.Verbose list returns falsifiable output.
How should I approach the problem?

I think you don't quite understand FsCheck yet. When you do Check.Verbose someFunction, FsCheck generates a bunch of random input for your function, and fails if the function ever returns false. The idea is that the function you pass to Check.Verbose should be a property that will always be true no matter what the input is. For example, if you reverse a list twice then it should return the original list no matter what the original list was. This property is usually expressed as follows:
let revTwiceIsSameList (lst : int list) =
List.rev (List.rev lst) = lst
Check.Verbose revTwiceIsSameList // This will pass
Your function, on the other hand, is a good, useful function that checks whether a list is well-formed in your data model... but it's not a property in the sense that FsCheck uses the term (that is, a function that should always return true no matter what the input is). To make an FsCheck-style property, you want to write a function that looks generally like:
let verifyMyFunc (input : string list) =
if (input is well-formed) then // TODO: Figure out how to check that
myFunc input = true
else
myFunc input = false
Check.Verbose verifyMyFunc
(Note that I've named your function myFunc instead of list, because as a general rule, you should never name a function list. The name list is a data type (e.g., string list or int list), and if you name a function list, you'll just confuse yourself later on when the same name has two different meanings.)
Now, the problem here is: how do you write the "input is well-formed" part of my verifyMyFunc example? You can't just use your function to check it, because that would be testing your function against itself, which is not a useful test. (The test would essentially become "myFunc input = myFunc input", which would always return true even if your function had a bug in it — unless your function returned random input, of course). So you'd have to write another function to check if the input is well-formed, and here the problem is that the function you've written is the best, most correct way to check for well-formed input. If you wrote another function to check, it would boil down to not (List.contains "" || List.contains null) in the end, and again, you'd be essentially checking your function against itself.
In this specific case, I don't think FsCheck is the right tool for the job, because your function is so simple. Is this a homework assignment, where your instructor is requiring you to use FsCheck? Or are you trying to learn FsCheck on your own, and using this exercise to teach yourself FsCheck? If it's the former, then I'd suggest pointing your instructor to this question and see what he says about my answer. If it's the latter, then I'd suggest finding some slightly more complicated function to use to learn FsCheck. A useful function here would be one where you can find some property that should always be true, like in the List.rev example (reversing a list twice should restore the original list, so that's a useful property to test with). Or if you're having trouble finding an always-true property, at least find a function that you can implement in at least two different ways, so that you can use FsCheck to check that both implementations return the same result for any given input.

Adding to #rmunn's excellent answer:
if you wanted to test myFunc (yes I also renamed your list function) you could do it by creating some fixed cases that you already know the answer to, like:
let myFunc p = if List.contains " " p || List.contains null p then false else true
let tests =
testList "myFunc" [
testCase "empty list" <| fun()-> "empty" |> Expect.isTrue (myFunc [ ])
testCase "nonempty list" <| fun()-> "hi" |> Expect.isTrue (myFunc [ "hi" ])
testCase "null case" <| fun()-> "null" |> Expect.isFalse (myFunc [ null ])
testCase "empty string" <| fun()-> "\"\"" |> Expect.isFalse (myFunc [ "" ])
]
Tests.runTests config tests
Here I am using a testing library called Expecto.
If you run this you would see one of the tests fails:
Failed! myFunc/empty string:
"". Actual value was true but had expected it to be false.
because your original function has a bug; it checks for space " " instead of empty string "".
After you fix it all tests pass:
4 tests run in 00:00:00.0105346 for myFunc – 4 passed, 0 ignored, 0
failed, 0 errored. Success!
At this point you checked only 4 simple and obvious cases with zero or one element each. Many times functions fail when fed more complex data. The problem is how many more test cases can you add? The possibilities are literally infinite!
FsCheck
This is where FsCheck can help you. With FsCheck you can check for properties (or rules) that should always be true. It takes a little bit of creativity to think of good ones to test for and granted, sometimes it is not easy.
In your case we can test for concatenation. The rule would be like this:
If two lists are concatenated the result of MyFunc applied to the concatenation should be true if both lists are well formed and false if any of them is malformed.
You can express that as a function this way:
let myFuncConcatenation l1 l2 = myFunc (l1 # l2) = (myFunc l1 && myFunc l2)
l1 # l2 is the concatenation of both lists.
Now if you call FsCheck:
FsCheck.Verbose myFuncConcatenation
It tries a 100 different combinations trying to make it fail but in the end it gives you the Ok:
0:
["X"]
["^"; ""]
1:
["C"; ""; "M"]
[]
2:
[""; ""; ""]
[""; null; ""; ""]
3:
...
Ok, passed 100 tests.
This does not necessarily mean your function is correct, there still could be a failing combination that FsCheck did not try or it could be wrong in a different way. But it is a pretty good indication that it is correct in terms of the concatenation property.
Testing for the concatenation property with FsCheck actually allowed us to call myFunc 300 times with different values and prove that it did not crash or returned an unexpected value.
FsCheck does not replace case by case testing, it complements it:
Notice that if you had run FsCheck.Verbose myFuncConcatenation over the original function, which had a bug, it would still pass. The reason is the bug was independent of the concatenation property. This means that you should always have the case by case testing where you check the most important cases and you can complement that with FsCheck to test other situations.
Here are other properties you can check, these test the two false conditions independently:
let myFuncHasNulls l = if List.contains null l then myFunc l = false else true
let myFuncHasEmpty l = if List.contains "" l then myFunc l = false else true
Check.Quick myFuncHasNulls
Check.Quick myFuncHasEmpty
// Ok, passed 100 tests.
// Ok, passed 100 tests.

Related

function return in R programming language

I need help on returning a value/object from function
noReturnKeyword <- function(){
'noReturnKeyword'
}
justReturnValue <- function(){
returnValue('returnValue')
}
justReturn <- function(){
return('justReturn')
}
When I invoked these functions: noReturnKeyword(), justReturnValue(), justReturn(), I got output as [1] "noReturnKeyword", [1] "returnValue", [1] "justReturn" respectively.
My question is, even though I have not used returnValue or return keywords explicitly in noReturnKeyword() I got the output (I mean the value returned by the function).
So what is the difference in these function noReturnKeyword(), justReturnValue(), justReturn()
What is the difference in these words returnValue('') , return('')? are these one and the same?
When to go for returnValue('') and return('') in R functions ?
In R, according to ?return
If the end of a function is reached without calling return, the value of the last evaluated expression is returned.
return is the explicite way to exit a function and set the value that shall be returned. The advantage is that you can use it anywhere in your function.
If there is no explicit return statement R will return the value of the last evaluated expression
returnValueis only defined in a debugging context. The manual states:
the experimental returnValue() function may be called to obtain the
value about to be returned by the function. Calling this function in
other circumstances will give undefined results.
In other words, you shouldn't use that except in the context of on.exit. It does not even work when you try this.
justReturnValue <- function(){
returnValue('returnValue')
2
}
This function will return 2, not "returnValue". What happened in your example is nothing more than the second approach. R evaluates the last statement which is returnValue() and returns exactly that.
If you use solution 1 or 2 is up to you. I personally prefer the explicit way because I believe it makes the code clearer. But that is more a matter of opinion.

How to find if item is contained in Dict in Julia

I'm fairly new to Julia and am trying to figure out how to check if the given expression is contained in a Dict I've created.
function parse( expr::Array{Any} )
if expr[1] == #check here if "expr[1]" is in "owl"
return BinopNode(owl[expr[1]], parse( expr[2] ), parse( expr[3] ) )
end
end
owl = Dict(:+ => +, :- => -, :* => *, :/ => /)
I've looked at Julia's documentation and other resources, but can't find any answer to this.
"owl" is the name of my dictionary that I'm trying to check. I want to run the return statement should expr[1] be either "+,-,* or /".
A standard approach to check if some dictionary contains some key would be:
:+ in keys(owl)
or
haskey(owl, :+)
Your solution depends on the fact that you are sure that 0 is not one of the values in the dictionary, which might not be true in general. However, if you want to use such an approach (it is useful when you do not want to perform a lookup in the dictionary twice: once to check if it contains some key, and second time to get the value assigned to the key if it exists) then normally you would use nothing as a sentinel and then perform the check get_return_value !== nothing (note two = here - they are important for the compiler to generate an efficient code). So your code would look like this:
function myparse(expr::Array{Any}, owl) # better pass `owl` as a parameter to the function
v = get(expr[1], owl, nothing)
if v !== nothing
return BinopNode(v, myparse(expr[2]), myparse(expr[3]))
end
# and what do we do if v === nothing?
end
Note that I use myparse name, as parse is a function defined in Base, so we do not want to have a name clash. Finally your myparse is recursive so you should define a second method to this function handling the case when expr is not an Array{Any}.
I feel like an idiot for finding this so fast, but I came up with the following solution: (Willing to hear more efficient answers however)
yes = 1
yes = get(owl,expr[1],0)
if yes != 0
#do return statement here
"yes" should get set equal to 0 if the expression is not found in the dictionary "owl". So a simple != if statement to see if it's zero fixes my problem.

Julia pass optional parameter to inner function call

I have something like the following
function test(; testvar=nothing)
# only pass testvar to function if it has a useful value
inner_function(testvar != nothing ? testvar2=testvar : # leave it out)
end
# library function, outside my control, testvar2 can't be nothing
function inner_function(; testvar2=useful value)
# do something
end
I know I can use if/else statements within test() but inner_function has lots of parameters so I would prefer to avoid that from a code duplication standpoint. Is this possible?
Note: inner_function cannot have testvar2 = nothing, if testvar2 is passed it has to have a valid value.
As #elsuizo points out, multiple dispatch is one of the core julia features which allows you to perform such operations. Find bellow a quick example:
>>> inner_func(x) = true;
>>> inner_func(::Type{Void}) = false;
Note: it seems Nothing has been renamed to Void (as a warning prompts out when I try to use it).
inner_func has 2 method definitions, if Void is passed as a parameter, the function will just return false (change this behaviour by do nothing or whatever you want to do). Instead, if the function receives anything else than Void, it will just do something else (in this case, return true).
Your test wouldn't have to perform any check at all. Just pass the parameters to the inner_func, and it will decide what to do (which of the 2 methods of the function inner_func to call) depending on the parameter type.
An example:
>>> a = [1, 2, 3, Void, 5];
>>> filter(inner_func, a)
4-element Array{Any,1}:
1
2
3
5
For the numerical elements, the function calls the method of the function that returns true, and thus, the numerical elements are returned. For the Void element, the second method of the function is called, returning false, and thus, not returning such element.

Erlang sudoku solver - How to find the empty spots and try possible values recursively

I have been busy with a sudoku solver in Erlang yesterday and today. The working functionality I have now is that I can check if a sudoku in the form of a list, e.g.,
[6,7,1,8,2,3,4,9,5,5,4,9,1,7,6,3,2,8,3,2,8,5,4,9,1,6,7,1,3,2,6,5,7,8,4,9,9,8,6,4,1,2,5,7,3,4,5,7,3,9,8,6,1,2,8,9,3,2,6,4,7,5,1,7,1,4,9,3,5,2,8,6,2,6,5,7,8,1,9,3,4].
is valid or not by looking at the constraints (no duplicates in squares, rows, and columns).
This function is called valid(S) which takes a sudoku S and returns true if it is a valid sudoku and false if it is not. The function ignores 0's, which are used to represent empty values. This is an example of the same sudoku with some random empty values:
[0,7,1,8,2,3,4,0,5,5,4,9,0,7,6,3,2,8,3,0,8,5,0,9,1,6,7,1,3,2,6,5,7,8,4,9,0,8,6,4,1,2,5,7,0,4,5,7,3,9,8,6,1,0,8,9,3,2,6,4,7,5,1,7,1,4,9,3,0,2,8,6,2,6,5,7,8,1,9,3,4].
The next step is to find the first 0 in the list, and try a value from 1 to 9 and check if it produces a valid sudoku. If it does we can continue to the next 0 and try values there and see if it is valid or not. Once we cannot go further we go back to the previous 0 and try the next values et cetera until we end up with a solved sudoku.
The code I have so far looks like this (based on someone who got it almost working):
solve(First,Nom,[_|Last]) -> try_values({First,Nom,Last},pos()).
try_values(_,[]) -> {error, "No solution found"};
try_values({First,Nom,Last},[N|Pos]) ->
case valid(First++[N]++Last) of
true ->
case solve({First++[N]},Nom,Last) of
{ok,_} -> {ok, "Result"};
{error,_} -> try_values({First,N,Last},Pos)
end;
false -> try_values({First,N,Last},Pos)
end.
pos() is a list consisting of the values from 1 to 9. The idea is that we enter an empty list for First and a Sudoku list for [_|Last] in which we look for a 0 (Nom?). Then we try a value and if the list that results is valid according to our function we continue till we fail the position or have a result. When we fail we return a new try_values with remaining (Pos) values of our possibitilies.
Naturally, this does not work and returns:
5> sudoku:solve([],0,S).
** exception error: bad argument
in operator ++/2
called as {[6]}
++
[1,1,8,2,3,4,0,5,5,4,9,0,7,6,3,2,8,3,2,8,5,4,9,1,6,7,1,3,2|...]
in call from sudoku:try_values/2 (sudoku.erl, line 140)
in call from sudoku:try_values/2 (sudoku.erl, line 142)
With my inexperience I cannot grasp what I need to do to make the code logical and working. I would really appreciate it if someone with more experience could give me some pointers.
try_values([], []) -> error("No solution found");
try_values([Solution], []) -> Solution;
try_values(_, []) -> error("Bad sudoku: multiple solutions");
try_values(Heads, [0|Tail]) ->
NewHeads = case Heads of
[] -> [[P] || P <- pos()];
_ -> [Head++[P] || P <- pos(), Head <- Heads]
end,
ValidHeads = [Head || Head <- NewHeads, valid(Head++Tail)],
try_values(ValidHeads, Tail);
try_values([], [H|Tail]) -> try_values([[H]], Tail);
try_values(Heads, [H|Tail]) -> try_values([Head++[H] || Head <- Heads], Tail).
solve(Board) ->
case valid(Board) of
true -> try_values([], Board);
false -> error("No solution found")
end.
try_values does what you described. It builds solution by going through Board, trying all possible solutions (from pos()) when it finds 0 and collecting valid solutions in ValidHeads to pass them further to continue. Thus, it goes all possible ways, if at some point there are multiple valid sudoku they all will be added to Heads and will be tested on validity on following steps. solve is just a wrapper to call try_values([], Board).
Basically, the way to iterate recursively over 0's is to skip all non-zeros (2 last try_values expression) and do the job on zeros (fourth try_values expression).
First three try_values expressions check if solution is exist and single and return it in that case.

Explicitly calling return in a function or not

A while back I got rebuked by Simon Urbanek from the R core team (I believe) for recommending a user to explicitly calling return at the end of a function (his comment was deleted though):
foo = function() {
return(value)
}
instead he recommended:
foo = function() {
value
}
Probably in a situation like this it is required:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
His comment shed some light on why not calling return unless strictly needed is a good thing, but this was deleted.
My question is: Why is not calling return faster or better, and thus preferable?
Question was: Why is not (explicitly) calling return faster or better, and thus preferable?
There is no statement in R documentation making such an assumption.
The main page ?'function' says:
function( arglist ) expr
return(value)
Is it faster without calling return?
Both function() and return() are primitive functions and the function() itself returns last evaluated value even without including return() function.
Calling return() as .Primitive('return') with that last value as an argument will do the same job but needs one call more. So that this (often) unnecessary .Primitive('return') call can draw additional resources.
Simple measurement however shows that the resulting difference is very small and thus can not be the reason for not using explicit return. The following plot is created from data selected this way:
bench_nor2 <- function(x,repeats) { system.time(rep(
# without explicit return
(function(x) vector(length=x,mode="numeric"))(x)
,repeats)) }
bench_ret2 <- function(x,repeats) { system.time(rep(
# with explicit return
(function(x) return(vector(length=x,mode="numeric")))(x)
,repeats)) }
maxlen <- 1000
reps <- 10000
along <- seq(from=1,to=maxlen,by=5)
ret <- sapply(along,FUN=bench_ret2,repeats=reps)
nor <- sapply(along,FUN=bench_nor2,repeats=reps)
res <- data.frame(N=along,ELAPSED_RET=ret["elapsed",],ELAPSED_NOR=nor["elapsed",])
# res object is then visualized
# R version 2.15
The picture above may slightly difffer on your platform.
Based on measured data, the size of returned object is not causing any difference, the number of repeats (even if scaled up) makes just a very small difference, which in real word with real data and real algorithm could not be counted or make your script run faster.
Is it better without calling return?
Return is good tool for clearly designing "leaves" of code where the routine should end, jump out of the function and return value.
# here without calling .Primitive('return')
> (function() {10;20;30;40})()
[1] 40
# here with .Primitive('return')
> (function() {10;20;30;40;return(40)})()
[1] 40
# here return terminates flow
> (function() {10;20;return();30;40})()
NULL
> (function() {10;20;return(25);30;40})()
[1] 25
>
It depends on strategy and programming style of the programmer what style he use, he can use no return() as it is not required.
R core programmers uses both approaches ie. with and without explicit return() as it is possible to find in sources of 'base' functions.
Many times only return() is used (no argument) returning NULL in cases to conditially stop the function.
It is not clear if it is better or not as standard user or analyst using R can not see the real difference.
My opinion is that the question should be: Is there any danger in using explicit return coming from R implementation?
Or, maybe better, user writing function code should always ask: What is the effect in not using explicit return (or placing object to be returned as last leaf of code branch) in the function code?
If everyone agrees that
return is not necessary at the end of a function's body
not using return is marginally faster (according to #Alan's test, 4.3 microseconds versus 5.1)
should we all stop using return at the end of a function? I certainly won't, and I'd like to explain why. I hope to hear if other people share my opinion. And I apologize if it is not a straight answer to the OP, but more like a long subjective comment.
My main problem with not using return is that, as Paul pointed out, there are other places in a function's body where you may need it. And if you are forced to use return somewhere in the middle of your function, why not make all return statements explicit? I hate being inconsistent. Also I think the code reads better; one can scan the function and easily see all exit points and values.
Paul used this example:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
Unfortunately, one could point out that it can easily be rewritten as:
foo = function() {
if(a) {
output <- a
} else {
output <- b
}
output
}
The latter version even conforms with some programming coding standards that advocate one return statement per function. I think a better example could have been:
bar <- function() {
while (a) {
do_stuff
for (b) {
do_stuff
if (c) return(1)
for (d) {
do_stuff
if (e) return(2)
}
}
}
return(3)
}
This would be much harder to rewrite using a single return statement: it would need multiple breaks and an intricate system of boolean variables for propagating them. All this to say that the single return rule does not play well with R. So if you are going to need to use return in some places of your function's body, why not be consistent and use it everywhere?
I don't think the speed argument is a valid one. A 0.8 microsecond difference is nothing when you start looking at functions that actually do something. The last thing I can see is that it is less typing but hey, I'm not lazy.
This is an interesting discussion. I think that #flodel's example is excellent. However, I think it illustrates my point (and #koshke mentions this in a comment) that return makes sense when you use an imperative instead of a functional coding style.
Not to belabour the point, but I would have rewritten foo like this:
foo = function() ifelse(a,a,b)
A functional style avoids state changes, like storing the value of output. In this style, return is out of place; foo looks more like a mathematical function.
I agree with #flodel: using an intricate system of boolean variables in bar would be less clear, and pointless when you have return. What makes bar so amenable to return statements is that it is written in an imperative style. Indeed, the boolean variables represent the "state" changes avoided in a functional style.
It is really difficult to rewrite bar in functional style, because it is just pseudocode, but the idea is something like this:
e_func <- function() do_stuff
d_func <- function() ifelse(any(sapply(seq(d),e_func)),2,3)
b_func <- function() {
do_stuff
ifelse(c,1,sapply(seq(b),d_func))
}
bar <- function () {
do_stuff
sapply(seq(a),b_func) # Not exactly correct, but illustrates the idea.
}
The while loop would be the most difficult to rewrite, because it is controlled by state changes to a.
The speed loss caused by a call to return is negligible, but the efficiency gained by avoiding return and rewriting in a functional style is often enormous. Telling new users to stop using return probably won't help, but guiding them to a functional style will payoff.
#Paul return is necessary in imperative style because you often want to exit the function at different points in a loop. A functional style doesn't use loops, and therefore doesn't need return. In a purely functional style, the final call is almost always the desired return value.
In Python, functions require a return statement. However, if you programmed your function in a functional style, you will likely have only one return statement: at the end of your function.
Using an example from another StackOverflow post, let us say we wanted a function that returned TRUE if all the values in a given x had an odd length. We could use two styles:
# Procedural / Imperative
allOdd = function(x) {
for (i in x) if (length(i) %% 2 == 0) return (FALSE)
return (TRUE)
}
# Functional
allOdd = function(x)
all(length(x) %% 2 == 1)
In a functional style, the value to be returned naturally falls at the ends of the function. Again, it looks more like a mathematical function.
#GSee The warnings outlined in ?ifelse are definitely interesting, but I don't think they are trying to dissuade use of the function. In fact, ifelse has the advantage of automatically vectorizing functions. For example, consider a slightly modified version of foo:
foo = function(a) { # Note that it now has an argument
if(a) {
return(a)
} else {
return(b)
}
}
This function works fine when length(a) is 1. But if you rewrote foo with an ifelse
foo = function (a) ifelse(a,a,b)
Now foo works on any length of a. In fact, it would even work when a is a matrix. Returning a value the same shape as test is a feature that helps with vectorization, not a problem.
It seems that without return() it's faster...
library(rbenchmark)
x <- 1
foo <- function(value) {
return(value)
}
fuu <- function(value) {
value
}
benchmark(foo(x),fuu(x),replications=1e7)
test replications elapsed relative user.self sys.self user.child sys.child
1 foo(x) 10000000 51.36 1.185322 51.11 0.11 0 0
2 fuu(x) 10000000 43.33 1.000000 42.97 0.05 0 0
____EDIT __________________
I proceed to others benchmark (benchmark(fuu(x),foo(x),replications=1e7)) and the result is reversed... I'll try on a server.
My question is: Why is not calling return faster
It’s faster because return is a (primitive) function in R, which means that using it in code incurs the cost of a function call. Compare this to most other programming languages, where return is a keyword, but not a function call: it doesn’t translate to any runtime code execution.
That said, calling a primitive function in this way is pretty fast in R, and calling return incurs a minuscule overhead. This isn’t the argument for omitting return.
or better, and thus preferable?
Because there’s no reason to use it.
Because it’s redundant, and it doesn’t add useful redundancy.
To be clear: redundancy can sometimes be useful. But most redundancy isn’t of this kind. Instead, it’s of the kind that adds visual clutter without adding information: it’s the programming equivalent of a filler word or chartjunk).
Consider the following example of an explanatory comment, which is universally recognised as bad redundancy because the comment merely paraphrases what the code already expresses:
# Add one to the result
result = x + 1
Using return in R falls in the same category, because R is a functional programming language, and in R every function call has a value. This is a fundamental property of R. And once you see R code from the perspective that every expression (including every function call) has a value, the question then becomes: “why should I use return?” There needs to be a positive reason, since the default is not to use it.
One such positive reason is to signal early exit from a function, say in a guard clause:
f = function (a, b) {
if (! precondition(a)) return() # same as `return(NULL)`!
calculation(b)
}
This is a valid, non-redundant use of return. However, such guard clauses are rare in R compared to other languages, and since every expression has a value, a regular if does not require return:
sign = function (num) {
if (num > 0) {
1
} else if (num < 0) {
-1
} else {
0
}
}
We can even rewrite f like this:
f = function (a, b) {
if (precondition(a)) calculation(b)
}
… where if (cond) expr is the same as if (cond) expr else NULL.
Finally, I’d like to forestall three common objections:
Some people argue that using return adds clarity, because it signals “this function returns a value”. But as explained above, every function returns something in R. Thinking of return as a marker of returning a value isn’t just redundant, it’s actively misleading.
Relatedly, the Zen of Python has a marvellous guideline that should always be followed:
Explicit is better than implicit.
How does dropping redundant return not violate this? Because the return value of a function in a functional language is always explicit: it’s its last expression. This is again the same argument about explicitness vs redundancy.
In fact, if you want explicitness, use it to highlight the exception to the rule: mark functions that don’t return a meaningful value, which are only called for their side-effects (such as cat). Except R has a better marker than return for this case: invisible. For instance, I would write
save_results = function (results, file) {
# … code that writes the results to a file …
invisible()
}
But what about long functions? Won’t it be easy to lose track of what is being returned?
Two answers: first, not really. The rule is clear: the last expression of a function is its value. There’s nothing to keep track of.
But more importantly, the problem in long functions isn’t the lack of explicit return markers. It’s the length of the function. Long functions almost (?) always violate the single responsibility principle and even when they don’t they will benefit from being broken apart for readability.
A problem with not putting 'return' explicitly at the end is that if one adds additional statements at the end of the method, suddenly the return value is wrong:
foo <- function() {
dosomething()
}
This returns the value of dosomething().
Now we come along the next day and add a new line:
foo <- function() {
dosomething()
dosomething2()
}
We wanted our code to return the value of dosomething(), but instead it no longer does.
With an explicit return, this becomes really obvious:
foo <- function() {
return( dosomething() )
dosomething2()
}
We can see that there is something strange about this code, and fix it:
foo <- function() {
dosomething2()
return( dosomething() )
}
I think of return as a trick. As a general rule, the value of the last expression evaluated in a function becomes the function's value -- and this general pattern is found in many places. All of the following evaluate to 3:
local({
1
2
3
})
eval(expression({
1
2
3
}))
(function() {
1
2
3
})()
What return does is not really returning a value (this is done with or without it) but "breaking out" of the function in an irregular way. In that sense, it is the closest equivalent of GOTO statement in R (there are also break and next). I use return very rarely and never at the end of a function.
if(a) {
return(a)
} else {
return(b)
}
... this can be rewritten as if(a) a else b which is much better readable and less curly-bracketish. No need for return at all here. My prototypical case of use of "return" would be something like ...
ugly <- function(species, x, y){
if(length(species)>1) stop("First argument is too long.")
if(species=="Mickey Mouse") return("You're kidding!")
### do some calculations
if(grepl("mouse", species)) {
## do some more calculations
if(species=="Dormouse") return(paste0("You're sleeping until", x+y))
## do some more calculations
return(paste0("You're a mouse and will be eating for ", x^y, " more minutes."))
}
## some more ugly conditions
# ...
### finally
return("The end")
}
Generally, the need for many return's suggests that the problem is either ugly or badly structured.
[EDIT]
return doesn't really need a function to work: you can use it to break out of a set of expressions to be evaluated.
getout <- TRUE
# if getout==TRUE then the value of EXP, LOC, and FUN will be "OUTTA HERE"
# .... if getout==FALSE then it will be `3` for all these variables
EXP <- eval(expression({
1
2
if(getout) return("OUTTA HERE")
3
}))
LOC <- local({
1
2
if(getout) return("OUTTA HERE")
3
})
FUN <- (function(){
1
2
if(getout) return("OUTTA HERE")
3
})()
identical(EXP,LOC)
identical(EXP,FUN)
The argument of redundancy has come up a lot here. In my opinion that is not reason enough to omit return().
Redundancy is not automatically a bad thing. When used strategically, redundancy makes code clearer and more maintenable.
Consider this example: Function parameters often have default values. So specifying a value that is the same as the default is redundant. Except it makes obvious the behaviour I expect. No need to pull up the function manpage to remind myself what the defaults are. And no worry about a future version of the function changing its defaults.
With a negligible performance penalty for calling return() (as per the benchmarks posted here by others) it comes down to style rather than right and wrong. For something to be "wrong", there needs to be a clear disadvantage, and nobody here has demonstrated satisfactorily that including or omitting return() has a consistent disadvantage. It seems very case-specific and user-specific.
So here is where I stand on this.
function(){
#do stuff
...
abcd
}
I am uncomfortable with "orphan" variables like in the example above. Was abcd going to be part of a statement I didn't finish writing? Is it a remnant of a splice/edit in my code and needs to be deleted? Did I accidentally paste/move something from somewhere else?
function(){
#do stuff
...
return(abdc)
}
By contrast, this second example makes it obvious to me that it is an intended return value, rather than some accident or incomplete code. For me this redundancy is absolutely not useless.
Of course, once the function is finished and working I could remove the return. But removing it is in itself a redundant extra step, and in my view more useless than including return() in the first place.
All that said, I do not use return() in short unnamed one-liner functions. There it makes up a large fraction of the function's code and therefore mostly causes visual clutter that makes code less legible. But for larger formally defined and named functions, I use it and will likely continue to so.
return can increase code readability:
foo <- function() {
if (a) return(a)
b
}

Resources