Nargin function in R (number of function inputs) - r

Goal
I am trying to create a function in R to replicate the functionality of a homonymous MATLAB function which returns the number of arguments that were passed to a function.
Example
Consider the function below:
addme <- function(a, b) {
if (nargin() == 2) {
c <- a + b
} else if (nargin() == 1) {
c <- a + a
} else {
c <- 0
}
return(c)
}
Once the user runs addme(), I want nargin() to look at how many arguments were passed (2, i.e. a and b; only 1, i.e. a; or none) and calculate c accordingly.
What I have tried
After spending a lot of time messing around with environments, this is the closest I ever got to a working solution:
nargin <- function() {
length(as.list(match.call(envir = parent.env(environment()))))
}
The problem with this function is that it always returns 0, and the reason why is that I think it's looking at its own environment instead of its parent's (in spite of my attempt to throw in a parent.env there).
I know I can use missing() and args() inside addme() to achieve the same functionality, but I'll be needing this quite a few other times throughout my project, so wrapping it in a function is definitely something I should try to do.
Question
How can I get nargin() to return the number of arguments that were passed to its parent function?

You could use
nargin <- function() {
if(sys.nframe()<2) stop("must be called from inside a function")
length(as.list(sys.call(-1)))-1
}
Basically, you use sys.call(-1) to go up the call stack to the calling function and get its call, then count the number of elements and subtract one for the function name itself.
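A minimal usage sketch, assuming the nargin() above: the addme() example from the question, rewritten without the intermediate c. As an aside, base R also provides nargs(), which returns the number of arguments supplied to the function it is called from, so the helper may not be needed at all; addme2 is a hypothetical name for that variant.

```r
# Count the arguments supplied in the call to the calling function
nargin <- function() {
  if (sys.nframe() < 2) stop("must be called from inside a function")
  # sys.call(-1) is the call of the function that called nargin();
  # its first element is the function name, so subtract one
  length(as.list(sys.call(-1))) - 1
}

addme <- function(a, b) {
  if (nargin() == 2) {
    a + b
  } else if (nargin() == 1) {
    a + a
  } else {
    0
  }
}

addme(1, 2)  # 3
addme(4)     # 8
addme()      # 0

# Same behaviour using base R's nargs(); the unnamed last value is the default
addme2 <- function(a, b) {
  switch(as.character(nargs()), "2" = a + b, "1" = a + a, 0)
}
```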

Related

Is it valid to access global variables in R function and how to assign it in a package?

I have a package which provides a script and some functions. Within the script I assign a variable which will be used by the function. This works if the function gets executed within the script but might fail if I just call the function since the variable doesn't exist.
If I use devtools::check(), I get warnings that the variable used within the function isn't defined. How can I handle this properly?
Edit
I am thinking about using get() within the function to assign the variable, to get rid of these warnings. So the question is: is myp2 the correct way of doing something like this? Maybe with some tryCatch to handle errors?
ab <- c(1,2,3)
myp1 <- function() {
print(ab)
return(1)
}
myp2 <- function() {
ab <- get('ab')
print(ab)
return(1)
}
myp1()
myp2()
You could do something like
if(!exists("your variable")){
stop("You have not defined your variable")}
This checks whether the object you are looking for exists. A better practice is to make the variable an argument of the function (optionally with a default value) and pass the object in explicitly:
myp <- function(x) {
print(x)
return(1)
}
ab <- c(1,2,3)
myp(x = ab)
If possible, it would also be better to replace the script with a function.
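A sketch of the "replace the script with a function" suggestion, assuming the script only exists to build ab: make_ab() is a hypothetical name for a function that returns the value, and myp() takes it as an argument whose default calls make_ab(). Since make_ab is itself a function defined in the package, the checker should find a visible binding for it. If a global variable really is intended, utils::globalVariables() is the usual way to declare it to R CMD check.

```r
# Hypothetical: returns the value the script used to assign globally
make_ab <- function() c(1, 2, 3)

# The value arrives as an argument; the default is computed only when needed
myp <- function(x = make_ab()) {
  print(x)
  1
}

myp()             # prints 1 2 3
myp(x = c(4, 5))  # prints 4 5
```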

Rcpp function to construct a function

In R the possibility exists to have a function that creates another function, e.g.
create_ax2 <- function(a) {
ax2 <- function(x) {
y <- a * x^2
return(y)
}
return(ax2)
}
The result of which is
> fun <- create_ax2(3)
> fun(1)
[1] 3
> fun(2)
[1] 12
> fun(2.5)
[1] 18.75
I have such a complicated create function in R which takes a couple of arguments, sets some of the constants used in the returned function, does some intermediate computations, etc. But the resulting function is way too slow. Hence I tried to translate the code to C++ to use it with Rcpp. However, I can't figure out a way to construct a function inside a C++ function and return it to be used in R.
This is what I have so far:
Rcpp::Function createax2Rcpp(int a) {
double ax2(double x) {
return(a * pow(x, 2));
};
return (ax2);
}
This gives me the error 'function definition is not allowed here', and I am stuck on how to create the function.
EDIT: The question RcppArmadillo pass user-defined function comes close, but as far as I can tell, it only provides a way to pass a C++ function to R. It does not provide a way to initialise some values in the C++ function before it is passed to R.
OK, as far as I understand, you want a function returning a function with a closure, i.e. "the function defined in the closure 'remembers' the environment in which it was created."
In C++11 and up it is quite possible to define such a function, along these lines:
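In R terms, the "remembered" environment can be inspected directly. A small sketch reusing the question's create_ax2 (shortened); force(a) is added so the value is captured when the closure is created rather than on first use.

```r
create_ax2 <- function(a) {
  force(a)             # evaluate the promise now, not on the first call
  function(x) a * x^2  # the returned function closes over a
}

fun <- create_ax2(3)
environment(fun)$a     # 3: the captured value lives in the closure's environment
fun(c(1, 2, 2.5))      # 3.00 12.00 18.75, computed in one vectorized call
```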
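#include <cmath>
#include <functional>
std::function<double(double)> createax2Rcpp(int a) {
// the lambda captures a by value, so it "remembers" a after createax2Rcpp returns
auto ax2 = [a](double x) { return double(a) * std::pow(x, 2); };
return ax2;
}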
What happens: the compiler creates an anonymous class with an overloaded operator() and an object of that class, which captures a and is moved out of the creator function. The returned object is then wrapped in an instance of std::function through type erasure.
But a C/C++ function callable from R has to be of a specific type, which is narrower (as opposed to wider: you can wrap a narrow object in a wide one, but not vice versa).
Thus, I don't know how to turn a std::function into a proper R function; it looks impossible.
Perhaps emulating the closure, as below, might help:
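#include <Rcpp.h>
#include <cmath>
// Emulated closure: a file-level static shared by every call
// (identifiers starting with __ are reserved in C++, hence a_)
static int a_;
// [[Rcpp::export]]
double ax2(double x) {
return a_ * std::pow(x, 2);
}
// [[Rcpp::export]]
void createax2Rcpp(int a) {
a_ = a;  // only one "closure" exists at a time; call ax2() afterwards
}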

Converting positional arguments to named parameters in an R function based on variable name

In R there's a common function calling pattern that looks like this:
child = function(a, b, c) {
a * b - c
}
parent = function(a, b, c) {
result = child(a=a, b=b, c=c)
}
This repetition of names is helpful because it prevents potentially insidious errors if the ordering of the child's arguments were to change, or if an additional variable were to be added into the list:
childReordered = function(c, b, a) { # same args in different order
a * b - c
}
parent = function(a, b, c) {
result = childReordered(a, b, c) # probably an error
}
But this becomes cumbersome when the names of the arguments get long:
child = function(longVariableNameA, longVariableNameB, longVariableNameC) {
longVariableNameA * longVariableNameB - longVariableNameC
}
parent = function(longVariableNameA, longVariableNameB, longVariableNameC) {
child(longVariableNameA=longVariableNameA, longVariableNameB=longVariableNameB, longVariableNameC=longVariableNameC)
}
I'd like to have a way to get the safety of the named parameters without needing to actually type the names again. And I'd like to be able to do this when I can modify only the parent, and not the child. I'd envision something like this:
parent = function(a, b, c) {
result = child(params(a, b, c))
}
Where params() would be a new function that converts the unnamed arguments to named parameters based on the names of the variables. For example:
child(params(c,b,a)) == child(a=a, b=b, c=c)
There are a couple function in 'pryr' that come close to this, but I haven't figured out how to combine them to do quite what I want. named_dots(c,b,a) returns list(c=c, b=b, a=a), and standardise_call() has a similar operation, but I haven't figured out how to be able to convert the results into something that can be passed to an unmodified child().
I'd like to be able to use a mixture of implicit and explicitly named parameters:
child(params(b=3, c=a, a)) == child(b=3, c=a, a=a)
It would also be nice to be able to mix in some unnamed constants (not variables), and have them treated as named arguments when passed to the child.
child(params(7, c, b=x)) == child(a=7, c=c, b=x) # desired
named_dots(7, c, b=x) == list("7"=7, c=c, b=x) # undesired
But in a non-R way, I'd prefer to raise errors rather than trying to muddle through with what are likely programmer mistakes:
child(params(c, 7, b=x)) # ERROR: unnamed parameters must be first
Are there tools that already exist to do this? Simple ways to piece together existing functions to do what I want? Better ways to accomplish the same goal of getting safety in the presence of changing parameter lists without unwieldy repetition? Improvements to my suggested syntax to make it even safer?
Pre-bounty clarification: Both the parent() and child() functions should be considered unchangeable. I'm not interested in wrapping either with a different interface. Rather, I'm looking here for a way to write the proposed params() function in a general manner that can rewrite the list of arguments on the fly so that both parent() and child() can be used directly with a safe but non-verbose syntax.
Post-bounty clarification: While inverting the parent-child relationship and using do.call() is a useful technique, it's not the one I'm looking for here. Instead, I'm looking for a way to accept a '...' argument, modify it to have named parameters, and then return it in a form that the enclosing function will accept. It's possible that, as others suggest, this is truly impossible. Personally, I currently think it is possible with a C-level extension, and my hope is that this extension already exists. Perhaps the vadr package does what I want? https://github.com/crowding/vadr#dot-dot-dot-lists-and-missing-values
Partial credit: I feel silly just letting the bounty expire. If there are no full solutions, I'll award it to anyone who gives a proof of concept of at least one of the necessary steps. For example, modifying a '...' argument within a function and then passing it to another function without using do.call(). Or returning an unmodified '...' argument in a way that the parent can use it. Or anything that best points the way toward a direct solution, or even some useful links: http://r.789695.n4.nabble.com/internal-manipulation-of-td4682090.html But I'm reluctant to award it to an answer that starts with the (otherwise entirely reasonable) premise that "you don't want to do that", or "that's impossible so here's an alternative".
Bounty awarded: There are several really useful and practical answers, but I chose to award the bounty to @crowding. While he (probably correctly) asserts that what I want is impossible, I think his answer comes closest to the 'idealistic' approach I'm aiming for. I also think that his vadr package might be a good starting point for a solution, whether it matches my (potentially unrealistic) design goals or not. The 'accepted answer' is still up for grabs in case someone figures out a way to do the impossible. Thanks for the other answers and suggestions, and hopefully they will help someone put together the pieces for a more robust R syntax.
I think attempting to overwrite the built-in argument matching functionality of R is somewhat dangerous, so here is a solution that uses do.call.
It was unclear how much of parent is changeable:
# This ensures that only named arguments to formals get passed through
parent = function(a, b, c) {
do.call("child", mget(names(formals(child))))
}
A second option, based on the "magic" of write.csv
# this second option replaces the call to parent with child and passes the
# named arguments that have been matched within the call to parent
#
parent2 <- function(a,b,c){
Call <- match.call()
Call[[1]] <- quote(child)
eval.parent(Call) # evaluate in the caller's frame, as write.csv does, so argument expressions resolve there
}
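A small check of the match.call() approach, with child's formals deliberately reordered and parent2 called from inside another function (caller is a hypothetical name). The rewritten call is evaluated with eval.parent(Call), the same step write.csv uses in its own source, so that argument expressions such as x resolve in the caller's frame rather than in parent2's.

```r
child <- function(c, b, a) a * b - c  # formals in a different order

parent2 <- function(a, b, c) {
  Call <- match.call()      # e.g. parent2(a = x, b = y, c = z)
  Call[[1]] <- quote(child) # now child(a = x, b = y, c = z)
  eval.parent(Call)         # evaluate where parent2 was called from
}

caller <- function() {
  x <- 5; y <- 6; z <- 7
  parent2(x, y, z)
}

caller()          # 23, i.e. 5 * 6 - 7 despite the reordered formals
parent2(5, 6, 7)  # 23
```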
You can't change the parameters to a function from inside the function call. The next best way would be to write a simple wrapper around the call. Perhaps something like this can help
with_params <- function(f, ...) {
dots <- substitute(...())
dots <- setNames(dots, sapply(dots, deparse))
do.call(f, as.list(dots), envir=parent.frame())
}
And we can test with something like
parent1 <- function(a, b, c) {
child(a, b, c)
}
parent2 <- function(a, b, c) {
with_params(child, a, b, c)
}
child <- function(a, b, c) {
a * b - c
}
parent1(5,6,7)
# [1] 23
parent2(5,6,7)
# [1] 23
child <- function(c, a, b) {
a * b - c
}
parent1(5,6,7)
# [1] 37
parent2(5,6,7)
# [1] 23
Note that parent2 is robust to changes in the parameter order of the child, while parent1 is not.
It's not easy to get the exact syntax you're proposing. R is lazily evaluated, so syntax appearing in an argument to a function is only looked at after the function call is started. By the time the interpreter encounters params() it will have already started the call to child() and bound all of child's arguments and executed some of child's code. Can't rewrite child's arguments after the horse has left the barn, so to speak.
(Most non-lazy languages wouldn't let you do this either, for various reasons)
So the syntax will need to be something that has the references to 'child' and 'params' in its arguments. vadr has a %()% operator that can work. It applies a dotlist given on the right to a function given on the left. So you would write:
child %()% params(a, b, c)
where params catches its arguments in a dotlist and manipulates them:
params <- function(...) {
d <- dots(...)
ex <- expressions(d)
nm <- names(d) %||% rep("", length(d))
for (i in seq_along(d)) { # seq_along() avoids iterating over 1:0 when d is empty
if (is.name(ex[[i]]) && nm[[i]] == "") {
nm[[i]] = as.character(ex[[i]])
}
}
names(d) <- nm
d
}
This was meant to be a comment but it doesn't fit the limits. I commend you for your programming ambition and purity, but I think the goal is unattainable.
Let's assume params exists and apply it to the function list. By the definition of params, identical(list(a = a, b = b, c = c), list(params(a, b, c))). From this it follows that identical(a, params(a, b, c)), by taking the first element of the first and second arguments of identical. From which it follows that params does not depend on its second and later arguments, a contradiction. Q.E.D.
But I think your idea is a lovely example of DRY in R, and I am perfectly happy with do.call(f, params(a,b,c)), which has an additional do.call but no repetition. With your permission I would like to incorporate it in my package bettR, which collects various ideas to improve the R language.
A related idea I was toying with is creating a function that allows another function to get missing args from the calling frame. That is, instead of calling f(a = a, b = b), one could call f() and inside f there would be something like args = eval(formals(f), parent.frame()), but encapsulated into a macro-like construct such as args = get.args.by.name. This is distinct from your idea in that it requires f to be programmed deliberately to have this feature.
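For the do.call(f, params(a, b, c)) form, a base-R params() is short to write. This is a hypothetical sketch, not the vadr implementation: it evaluates its arguments in the caller's frame via list(...) and names every unnamed bare-symbol argument after the symbol. It does not implement the question's rules for unnamed constants or argument order.

```r
# Hypothetical params(): name unnamed symbol arguments after the symbol
params <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # unevaluated arguments
  nm <- names(exprs)
  if (is.null(nm)) nm <- rep("", length(exprs))
  for (i in seq_along(exprs)) {
    if (nm[[i]] == "" && is.name(exprs[[i]])) {
      nm[[i]] <- as.character(exprs[[i]])
    }
  }
  vals <- list(...)  # evaluated in the caller's frame
  names(vals) <- nm
  vals
}

child  <- function(c, b, a) a * b - c  # reordered formals
parent <- function(a, b, c) do.call(child, params(a, b, c))

parent(5, 6, 7)  # 23: matched by name, so 5 * 6 - 7
```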
Here's an answer that at first would appear to work. Diagnosing why it does not may lead to enlightenment as to why it cannot (hint: see the first paragraph of @crowding's answer).
params<-function(...) {
dots<-list(...)
names(dots)<-eval(substitute(alist(...)))
child.env<-sys.frame(-1)
child.fun<-sys.function(sys.parent()+1)
args<-names(formals(child.fun))
for(arg in args) {
assign(arg,dots[[arg]],envir=child.env)
}
dots[[args[[1]]]]
}
child1<-function(a,b,c) a*b-c
parent1<-function(a,b,c) child1(params(a,b,c))
parent1(1,2,3)
#> -1
child2<-function(a,c,b) a*b-c #swap b and c in formals
parent2<-function(a,b,c) child2(params(a,b,c)) #mirrors parent1
parent2(1,2,3)
#> -1
Both produce 1*2-3 == -1 even though the order of b and c has been swapped in the formal argument list of child2 versus child1.
This is basically a clarification to Mnel's answer. If it happens to answer your question, please do not accept it or award the bounty; it should go to Mnel. First, we define call_as_parent, which you use inside a function to call another function with the arguments the outer function was called with:
call_as_parent <- function(fun) {
par.call <- sys.call(sys.parent())
new.call <- match.call(
eval(par.call[[1L]], parent.frame(2L)),
call=par.call
)
new.call[[1L]] <- fun
eval(new.call, parent.frame())
}
Then we define parent and child:
child <- function(c, b, a) a - b - c
parent <- function(a, b, c) call_as_parent(child)
And finally, some examples
child(5, 10, 20)
# [1] 5
parent(5, 10, 20)
# [1] -25
Notice how clearly in the second example the 20 is getting matched to c, as it should.

Explicitly calling return in a function or not

A while back I got rebuked by Simon Urbanek from the R core team (I believe) for recommending that a user explicitly call return at the end of a function (his comment was deleted, though):
foo = function() {
return(value)
}
instead he recommended:
foo = function() {
value
}
Probably in a situation like this it is required:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
His comment shed some light on why not calling return unless strictly needed is a good thing, but this was deleted.
My question is: Why is not calling return faster or better, and thus preferable?
Question was: Why is not (explicitly) calling return faster or better, and thus preferable?
There is no statement in R documentation making such an assumption.
The main page ?'function' says:
function( arglist ) expr
return(value)
Is it faster without calling return?
Both function() and return() are primitive functions, and the function itself returns the last evaluated value even without a call to return().
Calling return() as .Primitive('return') with that last value as an argument does the same job but needs one more call, so this (often) unnecessary .Primitive('return') call can consume additional resources.
Simple measurement however shows that the resulting difference is very small and thus can not be the reason for not using explicit return. The following plot is created from data selected this way:
bench_nor2 <- function(x,repeats) { system.time(replicate(repeats,
# without explicit return; replicate() re-evaluates the call each time,
# whereas rep() would evaluate it only once and copy the result
(function(x) vector(length=x,mode="numeric"))(x)
)) }
bench_ret2 <- function(x,repeats) { system.time(replicate(repeats,
# with explicit return
(function(x) return(vector(length=x,mode="numeric")))(x)
)) }
maxlen <- 1000
reps <- 10000
along <- seq(from=1,to=maxlen,by=5)
ret <- sapply(along,FUN=bench_ret2,repeats=reps)
nor <- sapply(along,FUN=bench_nor2,repeats=reps)
res <- data.frame(N=along,ELAPSED_RET=ret["elapsed",],ELAPSED_NOR=nor["elapsed",])
# res object is then visualized
# R version 2.15
[Plot of the elapsed times ELAPSED_RET and ELAPSED_NOR against N, not reproduced here; results may differ slightly on your platform.]
Based on the measured data, the size of the returned object makes no difference, and the number of repeats (even if scaled up) makes only a very small difference, which in the real world, with real data and a real algorithm, would not be noticeable or make your script run faster.
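A simpler base-R timing sketch (the function names are hypothetical): each function is called a million times in a for loop, which really re-evaluates the call on every iteration. No particular numbers are claimed here; they depend on the machine and the R version (the byte-code JIT compiler, enabled by default since R 3.4.0, also affects the result).

```r
with_ret    <- function(x) return(x)  # explicit return
without_ret <- function(x) x          # implicit: the last value is returned

# Elapsed seconds for n calls of f; the loop evaluates f(i) every time
time_calls <- function(f, n = 1e6) {
  system.time(for (i in seq_len(n)) f(i))[["elapsed"]]
}

timings <- c(with_return    = time_calls(with_ret),
             without_return = time_calls(without_ret))
timings
```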
Is it better without calling return?
Return is a good tool for clearly marking the "leaves" of code where the routine should end, jump out of the function and return a value.
# here without calling .Primitive('return')
> (function() {10;20;30;40})()
[1] 40
# here with .Primitive('return')
> (function() {10;20;30;40;return(40)})()
[1] 40
# here return terminates flow
> (function() {10;20;return();30;40})()
NULL
> (function() {10;20;return(25);30;40})()
[1] 25
>
Which style to use depends on the strategy and programming style of the programmer; one can omit return() as it is not required.
R core programmers use both approaches, i.e. with and without explicit return(), as can be seen in the sources of 'base' functions.
Often return() is used with no argument, returning NULL, to stop the function conditionally.
It is not clear whether it is better or not, as a standard user or analyst using R cannot see the real difference.
My opinion is that the question should be: Is there any danger in using explicit return coming from R implementation?
Or, maybe better, user writing function code should always ask: What is the effect in not using explicit return (or placing object to be returned as last leaf of code branch) in the function code?
If everyone agrees that
return is not necessary at the end of a function's body
not using return is marginally faster (according to @Alan's test, 4.3 microseconds versus 5.1)
should we all stop using return at the end of a function? I certainly won't, and I'd like to explain why. I hope to hear if other people share my opinion. And I apologize if it is not a straight answer to the OP, but more like a long subjective comment.
My main problem with not using return is that, as Paul pointed out, there are other places in a function's body where you may need it. And if you are forced to use return somewhere in the middle of your function, why not make all return statements explicit? I hate being inconsistent. Also I think the code reads better; one can scan the function and easily see all exit points and values.
Paul used this example:
foo = function() {
if(a) {
return(a)
} else {
return(b)
}
}
Unfortunately, one could point out that it can easily be rewritten as:
foo = function() {
if(a) {
output <- a
} else {
output <- b
}
output
}
The latter version even conforms with some coding standards that advocate one return statement per function. I think a better example could have been:
bar <- function() {
while (a) {
do_stuff
for (b) {
do_stuff
if (c) return(1)
for (d) {
do_stuff
if (e) return(2)
}
}
}
return(3)
}
This would be much harder to rewrite using a single return statement: it would need multiple breaks and an intricate system of boolean variables for propagating them. All this to say that the single return rule does not play well with R. So if you are going to need to use return in some places of your function's body, why not be consistent and use it everywhere?
I don't think the speed argument is a valid one. A 0.8 microsecond difference is nothing when you start looking at functions that actually do something. The last thing I can see is that it is less typing but hey, I'm not lazy.
This is an interesting discussion. I think that @flodel's example is excellent. However, I think it illustrates my point (and @koshke mentions this in a comment) that return makes sense when you use an imperative instead of a functional coding style.
Not to belabour the point, but I would have rewritten foo like this:
foo = function() ifelse(a,a,b)
A functional style avoids state changes, like storing the value of output. In this style, return is out of place; foo looks more like a mathematical function.
I agree with #flodel: using an intricate system of boolean variables in bar would be less clear, and pointless when you have return. What makes bar so amenable to return statements is that it is written in an imperative style. Indeed, the boolean variables represent the "state" changes avoided in a functional style.
It is really difficult to rewrite bar in functional style, because it is just pseudocode, but the idea is something like this:
e_func <- function() do_stuff
d_func <- function() ifelse(any(sapply(seq(d),e_func)),2,3)
b_func <- function() {
do_stuff
ifelse(c,1,sapply(seq(b),d_func))
}
bar <- function () {
do_stuff
sapply(seq(a),b_func) # Not exactly correct, but illustrates the idea.
}
The while loop would be the most difficult to rewrite, because it is controlled by state changes to a.
The speed loss caused by a call to return is negligible, but the efficiency gained by avoiding return and rewriting in a functional style is often enormous. Telling new users to stop using return probably won't help, but guiding them to a functional style will pay off.
@Paul return is necessary in imperative style because you often want to exit the function at different points in a loop. A functional style doesn't use loops, and therefore doesn't need return. In a purely functional style, the final call is almost always the desired return value.
In Python, functions require a return statement. However, if you programmed your function in a functional style, you will likely have only one return statement: at the end of your function.
Using an example from another StackOverflow post, let us say we wanted a function that returned TRUE if all the values in a given x had an odd length. We could use two styles:
# Procedural / Imperative
allOdd = function(x) {
for (i in x) if (length(i) %% 2 == 0) return (FALSE)
return (TRUE)
}
# Functional
allOdd = function(x)
all(lengths(x) %% 2 == 1) # lengths() gives each element's length; length(x) would only give the length of x
In a functional style, the value to be returned naturally falls at the ends of the function. Again, it looks more like a mathematical function.
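A quick sketch checking that the two styles agree (the _imperative and _functional suffixes are only for side-by-side comparison). lengths() returns the length of each element of a list and is available since R 3.2.0; testing length(x) instead would check only the length of x itself, which happens to give the right answer for x1 and x2 below but not for x3.

```r
allOdd_imperative <- function(x) {
  for (i in x) if (length(i) %% 2 == 0) return(FALSE)
  TRUE
}
allOdd_functional <- function(x) all(lengths(x) %% 2 == 1)

x1 <- list(1, 1:3, letters[1:5])  # lengths 1, 3, 5
x2 <- list(1, 1:2)                # lengths 1, 2
x3 <- list(1, 1:2, 1:3)           # lengths 1, 2, 3

sapply(list(x1, x2, x3), allOdd_imperative)  # TRUE FALSE FALSE
sapply(list(x1, x2, x3), allOdd_functional)  # TRUE FALSE FALSE
```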
@GSee The warnings outlined in ?ifelse are definitely interesting, but I don't think they are trying to dissuade use of the function. In fact, ifelse has the advantage of being vectorized, so functions built on it are vectorized automatically. For example, consider a slightly modified version of foo:
foo = function(a) { # Note that it now has an argument
if(a) {
return(a)
} else {
return(b)
}
}
This function works fine when length(a) is 1. But if you rewrote foo with an ifelse
foo = function (a) ifelse(a,a,b)
Now foo works on any length of a. In fact, it would even work when a is a matrix. Returning a value the same shape as test is a feature that helps with vectorization, not a problem.
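A small sketch of the shape-preserving behaviour, assuming b is a global set to -1 purely for illustration: ifelse() returns a result with the attributes (here the dimensions) of its test argument, whereas if() needs a single TRUE/FALSE (since R 4.2.0, a condition of length greater than one is an error).

```r
b <- -1                               # assumed global fallback value
foo <- function(a) ifelse(a, a, b)

m <- matrix(c(2, 0, 5, 0), nrow = 2)  # nonzero entries count as TRUE
foo(m)  # a 2 x 2 matrix: nonzero entries kept, zeros replaced by b
dim(foo(m))
```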
It seems that without return() it's faster...
library(rbenchmark)
x <- 1
foo <- function(value) {
return(value)
}
fuu <- function(value) {
value
}
benchmark(foo(x),fuu(x),replications=1e7)
test replications elapsed relative user.self sys.self user.child sys.child
1 foo(x) 10000000 51.36 1.185322 51.11 0.11 0 0
2 fuu(x) 10000000 43.33 1.000000 42.97 0.05 0 0
EDIT
I ran another benchmark (benchmark(fuu(x),foo(x),replications=1e7)) and the result was reversed... I'll try on a server.
My question is: Why is not calling return faster
It’s faster because return is a (primitive) function in R, which means that using it in code incurs the cost of a function call. Compare this to most other programming languages, where return is a keyword, but not a function call: it doesn’t translate to any runtime code execution.
That said, calling a primitive function in this way is pretty fast in R, and calling return incurs a minuscule overhead. This isn’t the argument for omitting return.
or better, and thus preferable?
Because there’s no reason to use it.
Because it’s redundant, and it doesn’t add useful redundancy.
To be clear: redundancy can sometimes be useful. But most redundancy isn’t of this kind. Instead, it’s of the kind that adds visual clutter without adding information: it’s the programming equivalent of a filler word or chartjunk.
Consider the following example of an explanatory comment, which is universally recognised as bad redundancy because the comment merely paraphrases what the code already expresses:
# Add one to the result
result = x + 1
Using return in R falls in the same category, because R is a functional programming language, and in R every function call has a value. This is a fundamental property of R. And once you see R code from the perspective that every expression (including every function call) has a value, the question then becomes: “why should I use return?” There needs to be a positive reason, since the default is not to use it.
One such positive reason is to signal early exit from a function, say in a guard clause:
f = function (a, b) {
if (! precondition(a)) return() # same as `return(NULL)`!
calculation(b)
}
This is a valid, non-redundant use of return. However, such guard clauses are rare in R compared to other languages, and since every expression has a value, a regular if does not require return:
sign = function (num) {
if (num > 0) {
1
} else if (num < 0) {
-1
} else {
0
}
}
We can even rewrite f like this:
f = function (a, b) {
if (precondition(a)) calculation(b)
}
… where if (cond) expr is the same as if (cond) expr else NULL.
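A runnable sketch of both versions of f, with hypothetical stand-ins for the placeholder functions precondition() and calculation(). Both versions return the same values; the second relies on if without else yielding NULL (invisibly) when the condition is FALSE.

```r
# Hypothetical stand-ins for the placeholders used above
precondition <- function(a) is.numeric(a)
calculation  <- function(b) b * 2

f_guard <- function(a, b) {
  if (!precondition(a)) return()  # early exit; the value is NULL
  calculation(b)
}

f_if <- function(a, b) {
  if (precondition(a)) calculation(b)  # implicit else NULL
}

f_guard(1, 5)    # 10
f_if(1, 5)       # 10
f_guard("x", 5)  # NULL
f_if("x", 5)     # NULL, returned invisibly so nothing is printed
```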
Finally, I’d like to forestall three common objections:
Some people argue that using return adds clarity, because it signals “this function returns a value”. But as explained above, every function returns something in R. Thinking of return as a marker of returning a value isn’t just redundant, it’s actively misleading.
Relatedly, the Zen of Python has a marvellous guideline that should always be followed:
Explicit is better than implicit.
How does dropping redundant return not violate this? Because the return value of a function in a functional language is always explicit: it’s its last expression. This is again the same argument about explicitness vs redundancy.
In fact, if you want explicitness, use it to highlight the exception to the rule: mark functions that don’t return a meaningful value, which are only called for their side-effects (such as cat). Except R has a better marker than return for this case: invisible. For instance, I would write
save_results = function (results, file) {
# … code that writes the results to a file …
invisible()
}
But what about long functions? Won’t it be easy to lose track of what is being returned?
Two answers: first, not really. The rule is clear: the last expression of a function is its value. There’s nothing to keep track of.
But more importantly, the problem in long functions isn’t the lack of explicit return markers. It’s the length of the function. Long functions almost (?) always violate the single responsibility principle and even when they don’t they will benefit from being broken apart for readability.
A problem with not putting 'return' explicitly at the end is that if one adds additional statements at the end of the method, suddenly the return value is wrong:
foo <- function() {
dosomething()
}
This returns the value of dosomething().
Now we come along the next day and add a new line:
foo <- function() {
dosomething()
dosomething2()
}
We wanted our code to return the value of dosomething(), but instead it no longer does.
With an explicit return, this becomes really obvious:
foo <- function() {
return( dosomething() )
dosomething2()
}
We can see that there is something strange about this code, and fix it:
foo <- function() {
dosomething2()
return( dosomething() )
}
I think of return as a trick. As a general rule, the value of the last expression evaluated in a function becomes the function's value -- and this general pattern is found in many places. All of the following evaluate to 3:
local({
1
2
3
})
eval(expression({
1
2
3
}))
(function() {
1
2
3
})()
What return does is not really returning a value (this is done with or without it) but "breaking out" of the function in an irregular way. In that sense, it is the closest equivalent of a GOTO statement in R (there are also break and next). I use return very rarely and never at the end of a function.
if(a) {
return(a)
} else {
return(b)
}
... this can be rewritten as if(a) a else b, which is much more readable and less curly-bracketish. No need for return at all here. My prototypical case of use of "return" would be something like ...
ugly <- function(species, x, y){
if(length(species)>1) stop("First argument is too long.")
if(species=="Mickey Mouse") return("You're kidding!")
### do some calculations
if(grepl("mouse", species)) {
## do some more calculations
if(species=="Dormouse") return(paste0("You're sleeping until", x+y))
## do some more calculations
return(paste0("You're a mouse and will be eating for ", x^y, " more minutes."))
}
## some more ugly conditions
# ...
### finally
return("The end")
}
Generally, the need for many return's suggests that the problem is either ugly or badly structured.
[EDIT]
return doesn't really need a function to work: you can use it to break out of a set of expressions to be evaluated.
getout <- TRUE
# if getout==TRUE then the value of EXP, LOC, and FUN will be "OUTTA HERE"
# .... if getout==FALSE then it will be `3` for all these variables
EXP <- eval(expression({
1
2
if(getout) return("OUTTA HERE")
3
}))
LOC <- local({
1
2
if(getout) return("OUTTA HERE")
3
})
FUN <- (function(){
1
2
if(getout) return("OUTTA HERE")
3
})()
identical(EXP,LOC)
identical(EXP,FUN)
The argument of redundancy has come up a lot here. In my opinion that is not reason enough to omit return().
Redundancy is not automatically a bad thing. When used strategically, redundancy makes code clearer and more maintainable.
Consider this example: Function parameters often have default values. So specifying a value that is the same as the default is redundant. Except it makes obvious the behaviour I expect. No need to pull up the function manpage to remind myself what the defaults are. And no worry about a future version of the function changing its defaults.
With a negligible performance penalty for calling return() (as per the benchmarks posted here by others) it comes down to style rather than right and wrong. For something to be "wrong", there needs to be a clear disadvantage, and nobody here has demonstrated satisfactorily that including or omitting return() has a consistent disadvantage. It seems very case-specific and user-specific.
So here is where I stand on this.
function(){
#do stuff
...
abcd
}
I am uncomfortable with "orphan" variables like in the example above. Was abcd going to be part of a statement I didn't finish writing? Is it a remnant of a splice/edit in my code and needs to be deleted? Did I accidentally paste/move something from somewhere else?
function(){
#do stuff
...
return(abcd)
}
By contrast, this second example makes it obvious to me that it is an intended return value, rather than some accident or incomplete code. For me this redundancy is absolutely not useless.
Of course, once the function is finished and working I could remove the return. But removing it is in itself a redundant extra step, and in my view more useless than including return() in the first place.
All that said, I do not use return() in short, unnamed one-liner functions. There it makes up a large fraction of the function's code and mostly adds visual clutter that makes the code less legible. But for larger, formally defined and named functions, I use it and will likely continue to do so.
return can increase code readability:
foo <- function(a, b) {
if (a) return(a)
b
}
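A concrete version of that pattern, as a sketch with a hypothetical function `safe_log`: the early return acts as a guard clause for the special case, so the main path does not need to be nested in an else branch. The if/else version is included only for comparison.

```r
# Guard clause: handle the special case first and leave immediately
safe_log <- function(x) {
  if (x <= 0) return(NA_real_)
  log(x)  # main path, no else branch needed
}

# Same behaviour without return(): every path sits inside if/else
safe_log_else <- function(x) {
  if (x <= 0) {
    NA_real_
  } else {
    log(x)
  }
}

safe_log(-1)  # NA
safe_log(1)   # 0
```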

Forcing specific data types as arguments to a function

I was just wondering if there was a way to force a function to only accept certain data types, without having to check for it within the function; or, is this not possible because R's type-checking is done at runtime (as opposed to those programming languages, such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
public int addone (int n) {
return n+1;
}
}
In R, a similar function might be
addone <- function(n)
{
return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, is the only way to do this to put a condition inside the function, along the lines of
addone <- function(n)
{
if(is.vector(n) && length(n)==1)
{
return(n+1)
} else
{
return ("You must enter a single integer")
}
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context of R, since I can't think of a practical reason why one would want to create a class of a single value. Nonetheless, it is possible. As an added bonus, I demonstrate how the function addone can be used to add one to numeric vectors (trivial) and to a character string (so A turns into B, etc.):
Start by creating an S3 generic for addone, using the S3 dispatch mechanism UseMethod:
addone <- function(x){
UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
# Note: this masks base R's own as.single() in the current session
as.single <- function(x){
ret <- unlist(x)[1]
class(ret) <- "single"
ret
}
Now create methods to handle the various classes. The default method is called whenever no method exists for the object's class:
addone.default <- function(x) x + 1
addone.character <- function(x) rawToChar(as.raw(as.numeric(charToRaw(x)) + 1))
addone.single <- function(x) x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
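Within S3 itself, one way to get closer to "only accept certain types" is to define methods only for the accepted classes and make the default method raise an error. This is a sketch with a hypothetical generic `addone_strict`; it relies on integer vectors dispatching to a `numeric` method through their implicit class.

```r
# Hypothetical generic that only accepts numeric input
addone_strict <- function(x) UseMethod("addone_strict")

# Method for the accepted class (integer and double both dispatch here)
addone_strict.numeric <- function(x) x + 1

# Fallback for every other class: refuse instead of guessing
addone_strict.default <- function(x) {
  stop("addone_strict() does not support objects of class ",
       paste(class(x), collapse = "/"), call. = FALSE)
}

addone_strict(1:3)      # 2 3 4
try(addone_strict("a")) # error from the default method
```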
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things, including the S3 object system.
The S3 approach doesn't provide strict typing and can easily be abused. For stricter object orientation, have a look at S4 classes, reference classes, or the proto package for prototype-based object programming.
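As a rough sketch of the S4 route: give a generic a method only for `numeric`, and calls with any other class fail at dispatch time. The generic name `addone4` is hypothetical, chosen so it does not clash with the S3 `addone` above.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# S4 generic: dispatch is on the class of argument n
setGeneric("addone4", function(n) standardGeneric("addone4"))

# Only a numeric method exists, so other classes have nothing to dispatch to
setMethod("addone4", "numeric", function(n) n + 1)

addone4(41)       # 42
try(addone4("a")) # error: no method for signature "character"
```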
You could write a wrapper like the following:
check.types = function(classes, func) {
  n = as.name
  # Copy func's formal arguments so the wrapper has the same signature
  params = formals(func)
  param.names = lapply(names(params), n)
  handler = function() { }
  formals(handler) = params
  # Build one assert.class(<arg>, <class>) call per parameter
  checks = lapply(seq_along(param.names), function(I) {
    as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
  })
  # Wrapper body: { <checks>; .func <- func; .func(<args>) }
  body(handler) = as.call(c(
    list(n('{')),
    checks,
    list(as.call(list(n('<-'), n('.func'), func))),
    list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
  ))
  handler
}
# Stop unless cls is one of the classes of x
assert.class = function(x, cls) {
  stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
This is made somewhat inconvenient by R not having decorators. It is kind of hacky, and it suffers from some serious problems:
1. You lose lazy evaluation, because you must evaluate an argument to determine its type.
2. You still can't check the types until call time; real static type checking lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful, because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably do this by transforming expressions, but I don't think you want to go there.
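Problem (1) can be seen without the full wrapper: any check that inspects an argument forces its promise. In this sketch (both function names are made up), `lazy()` never touches `y`, so a stop() passed as `y` never runs, while `strict()` triggers it simply by checking the argument's type.

```r
# y is never used, so its promise is never forced
lazy <- function(x, y) x

# Checking y's type forces its promise, so the argument is evaluated
strict <- function(x, y) {
  stopifnot(is.numeric(y))
  x
}

lazy(1, stop("y was evaluated"))        # returns 1
try(strict(1, stop("y was evaluated"))) # error: the check forced y
```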
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
stopifnot(is.vector(n) && length(n)==1)
print(n)
}
It is useful because it gives the user a fairly clear error message when the condition is false.
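Since R 4.0.0, stopifnot() also accepts named conditions and uses the name as the error message, which can make the message clearer still. A sketch, assuming R 4.0.0 or later, with `check_one` as a hypothetical function name:

```r
check_one <- function(n) {
  # Each condition is checked in order; the first failure's name becomes the message
  stopifnot(
    "n must be numeric"    = is.numeric(n),
    "n must have length 1" = length(n) == 1
  )
  n + 1
}

check_one(41)       # 42
try(check_one(1:2)) # fails on the length condition
try(check_one("a")) # fails on the numeric condition
```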
