This thread discusses two basic approaches to using functions inside other functions in R: "What are the benefits of defining and calling a function inside another function in R?"
The top answer says the second approach (defining the function externally and merely calling it by name inside the outer function) is faster: "f2 needs to be redefined every time you call f1, which adds some overhead (not very much overhead, but definitely there)". My question is: is this overhead caused by the assignment itself, or by passing through the function itself?
For example, consider this third option besides the two in that thread:
# Approach 1
fun1a <- function(x) {
  fun1b <- function(y) { return(y^2) }
  return(fun1b(x))
}

# Approach 2
fun2a <- function(y) { return(y^2) }
fun2b <- function(x) { return(fun2a(x)) }

# Approach 3
fun3 <- function(x) {
  return(function(x) { return(x^2) })
}
It was confirmed that Approach 2 is faster than Approach 1, because Approach 1 has to redefine fun1b every time fun1a is called. But what about Approach 3 -- basically Approach 1, except the inner function is never assigned to a name each time you run it? Is that always faster?
If so, why wouldn't everyone just use Approach 3 for everything? That is, what disadvantages does it have compared to Approach 2 (or 1)?
Some of these (but not all) are already mentioned in the link in the question, but here is a longer list.
Visibility. Functions defined within a function are not visible outside it, which increases the modularity of the software when the inner function is not needed elsewhere. It provides a sort of poor man's namespace. For example, an alternative to using an anonymous function in an lapply inside a function is to define it as a named function within the outer function: it stays hidden from the rest of the program, and its name forms a sort of documentation.
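A minimal sketch of the visibility point (the names f and square are mine):

f <- function(xs) {
  square <- function(x) x^2  # named, but visible only inside f
  sapply(xs, square)
}
f(1:3)
## [1] 1 4 9
exists("square")
## [1] FALSE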
Scope. Functions defined within a function can access variables defined in the outer function without having them passed as arguments.
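For example (a sketch, with names of my choosing):

outer <- function(n) {
  inner <- function(x) x * n  # n is found in outer's frame, not passed in
  inner(2)
}
outer(10)
## [1] 20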
Cache. Functions defined within a function and passed back out can use the outer function to cache results, so that they are remembered the next time the passed-out function is run. Here makeIncr is a factory function that constructs a new counter function each time it is run; the counter functions return the next number in the sequence each time they are called.
makeIncr <- function(init) function() { init <<- init + 1; init }
counter1 <- makeIncr(0)
counter1()
## [1] 1
counter1()
## [1] 2
counter2 <- makeIncr(0)
counter2()
## [1] 1
Object Orientation. Functions defined within functions can be used to emulate a limited form of object orientation. For an example, run: demo(scoping)
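A stripped-down sketch in the spirit of demo(scoping) (the names are mine): the "methods" share state through the enclosing environment.

open.account <- function(balance) {
  list(
    deposit = function(amount) balance <<- balance + amount,
    balance = function() balance
  )
}
acc <- open.account(100)
acc$deposit(50)
acc$balance()
## [1] 150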
Debugging can be a bit more awkward with functions within functions. For example, debug(makeIncr) using makeIncr above does not debug the counters which would have to be debugged separately.
I am not sure that the performance issue discussed is really material since the functions would be byte compiled the first time the outer function is run. In most cases you would want to make a decision based on other factors.
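If the overhead does matter in your case, measure it directly rather than guessing; a quick sketch (assuming the microbenchmark package, though any timing tool works):

library(microbenchmark)
microbenchmark(fun1a(10), fun2b(10), times = 10000)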
Is this bad practice? It seems like a lot could go wrong here.*
I am using an argument of the outer function as, in effect, a global variable for the function defined inside it. I am just doing this to work around some existing code.
f <- function(a, b) { h <- function(c) { print(b); b + c } }
myh <- f(1, 2)
myh(7)
#[1] 2
#[1] 9
*On the other hand, it's perfectly acceptable to write something like
h <- function(c) { print(7); 7 + c }
Creating a function that creates functions (a function factory) is a totally acceptable coding practice. See https://adv-r.hadley.nz/function-factories.html for more details on how this technique works in R.
It is most often used when you need to create functions at runtime or need to create many similar functions.
The function factory you have created could be seen as a relative of one that creates counters with different step sizes, each telling the user how much the count was incremented by.
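For instance, a hedged sketch of that idea (makeIncrBy is my name, not from the question):

makeIncrBy <- function(by) {
  total <- 0
  function() {
    total <<- total + by
    cat("incremented by", by, "\n")
    total
  }
}
count5 <- makeIncrBy(5)
count5()
## incremented by 5
## [1] 5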
It is important, however, to keep track of the functions you create this way.
Let me know if you'd like more clarification on anything.
(One possible bad practice in the function you have created, though, is the unused argument a.)
I'd like to save computation time by avoiding running the same function with the same arguments multiple times. Given the following code:
f1 <- function(a, b) return(a + b)
f2 <- function(c, d, f) return(c * d * f)
x <- 3
y <- 4
f2(1, 2, f1(x, y))
Let's assume that computing the f argument is expensive, and I'd like to cache the result somehow, so that I'd know whether it had ever been computed before.
Here is my main question: I assume I can generate a key myself for f1(3,4), for example key <- paste('f1', x, y), do my own bookkeeping, and avoid running it again. However, is it possible for f2 to generate such a key from f automatically and return it to me, for any function with any arguments? If not, can I alternatively generate such a key in a generic manner before I pass f1(x,y), in a way that would work for any function with any arguments?
Thanks much.
Interesting question. I never thought about this.
A quick Google search found this package: R.cache.
The function addMemoization takes a function as argument, and returns a function that should cache its results.
I haven't used this package myself, so I don't know how well it works, but it seems to fit what you are looking for.
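If you want to see the mechanics, here is a minimal hand-rolled sketch (not R.cache's actual implementation) that also shows one generic way to build the key you asked about: deparse the argument values.

memoize <- function(fn) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = " ")  # generic key from values
    if (!exists(key, envir = cache)) {
      assign(key, fn(...), envir = cache)
    }
    get(key, envir = cache)
  }
}
f1 <- function(a, b) { Sys.sleep(1); a + b }  # stand-in for a slow function
f1m <- memoize(f1)
f1m(3, 4)  # slow: computed and stored
f1m(3, 4)  # fast: fetched from the cache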
I would like to create a promise in R programmatically. I know that the language supports it, but for some reason there does not seem to be a way to do this.
To give more detail: I would like to have components of a list lazily evaluated. E.g.
x <- list(node = i, children = promise(some_expensive_function(i)))
I only want to access the second component of the list for very few values of the list. Pre-populating the list with lazy expressions results in very clear, compact, and readable code. The background of this algorithm is a tree search; essentially, I am trying to emulate coroutine behaviour here. Right now I am using closures for this, but the code lacks elegance.
Is there a third-party package that exposes the hidden promise construction mechanism in R? Or is this mechanism explicitly tied to environment bindings rather than expressions?
P.S. Yes, I am aware of delayedAssign. It does not do what I want. Yes, I can juggle around with intermediate environments, but that is also messy.
Any programming language that has first-class functions (including R) can pretty easily implement lazy evaluation through thunks (Wikipedia entry on this).
The basic idea is that functions are not evaluated until they're called, so just wrap the elements of your list in anonymous functions that return their value when called.
delayed <- list(function() 1, function() 2, function() 3)
lapply(delayed, function(x) x())
Those are just numbers wrapped in there, but you can easily place some_expensive_function(i) in there instead to supply the argument while delaying evaluation.
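For instance (a sketch reusing the question's placeholder some_expensive_function):

make_node <- function(i) {
  list(node = i,
       children = function() some_expensive_function(i))  # a thunk
}
n <- make_node(5)
children <- n$children()  # the expensive call happens only here, on demand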
Edit: I noticed the remark about closures just now, so I assume you're already using a similar method. Can you elaborate on the "inelegance" of it? This is all in the eye of the beholder, but thunking seems fairly straightforward and involves a lot less boilerplate if you're just looking for lazy evaluation.
At the moment your use case is too vague to get my head around. I'm wondering if one of quote, expression or call is what you are asking for:
x <- list(node = i, children = quote(mean(i)))
x
#----------
$node
[1] 768
$children
mean(i)
#------------
x <- list(node = i, children = call('mean', i))
x
#-------------
$node
[1] 768
$children
mean(768L)
#-----------
x <- list(node = i, children = expression(mean(i)))
x
#------------
$node
[1] 768
$children
expression(mean(i))
A test of the last one, with evaluation obviously occurring in the globalenv():
eval( x$children)
#[1] 768
I have ended up using environments and delayedAssign for this one.
node <- new.env()
node$name <- X[1, 1]
node$level <- names(X)[1]
delayedAssign('subtaxa', split_taxons_lazy(X[-1]), assign.env = node)
node
This works well for my case, and while I would have preferred to use lists, that does not seem to be possible in R. Thanks for the comments!
"R passes promises, not values. The promise is forced when it is first evaluated, not when it is passed.", see this answer by G. Grothendieck. Also see this question referring to Hadley's book.
In simple examples such as
> funs <- lapply(1:10, function(i) function() print(i))
> funs[[1]]()
[1] 10
> funs[[2]]()
[1] 10
it is possible to take such unintuitive behaviour into account.
However, I find myself frequently falling into this trap during daily development. I follow a rather functional programming style, which means that I often have a function A returning a function B, where B is in some way depending on the parameters with which A was called. The dependency is not as easy to see as in the above example, since calculations are complex and there are multiple parameters.
Overlooking such an issue leads to difficult-to-debug problems, since all calculations run smoothly -- except that the result is incorrect. Only an explicit validation of the results reveals the problem.
What comes on top is that even if I have noticed such a problem, I am never really sure which variables I need to force and which I don't.
How can I make sure not to fall into this trap? Are there any programming patterns that prevent this or that at least make sure that I notice that there is a problem?
You are creating functions with implicit parameters, which isn't necessarily best practice. In your example, the implicit parameter is i. Another way to rework it would be:
library(functional)
myprint <- function(x) print(x)
funs <- lapply(1:10, function(i) Curry(myprint, i))
funs[[1]]()
# [1] 1
funs[[2]]()
# [1] 2
Here, we explicitly specify the parameters to the function by using Curry. Note we could have curried print directly but didn't here for illustrative purposes.
Curry creates a new version of the function with parameters pre-specified. This makes the parameter specification explicit and avoids the potential issues you are running into because Curry forces evaluations (there is a version that doesn't, but it wouldn't help here).
Another option is to capture the entire environment of the parent function, copy it, and make it the parent env of your new function:
funs2 <- lapply(
  1:10, function(i) {
    fun.res <- function() print(i)
    environment(fun.res) <- list2env(as.list(environment()))  # force parent env copy
    fun.res
  }
)
funs2[[1]]()
# [1] 1
funs2[[2]]()
# [1] 2
but I don't recommend this since you will be potentially copying a whole bunch of variables you may not even need. Worse, this gets a lot more complicated if you have nested layers of functions that create functions. The only benefit of this approach is that you can continue your implicit parameter specification, but again, that seems like bad practice to me.
As others pointed out, this might not be the best style of programming in R. But one simple option is to just get into the habit of forcing everything. If you do this, realize that you don't need to actually call force; just evaluating the symbol will do it. To make it less ugly, you could make it a practice to start functions like this:
myfun <- function(x, y, z) {
  x; y; z
  ## code
}
There is some work in progress to improve how R's higher-order functions (the apply functions, Reduce, and such) handle situations like these. Whether this makes it into R 3.2.0, to be released in a few weeks, depends on how disruptive the changes turn out to be. It should become clear in a week or so.
R has a function that helps safeguard against lazy evaluation, in situations like closure creation: forceAndCall().
From the online R help documentation:
forceAndCall is intended to help defining higher order functions like apply to behave more reasonably when the result returned by the function applied is a closure that captured its arguments.
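A small sketch of its use (make_printer is my name, not from the documentation):

make_printer <- function(i) function() print(i)
# forceAndCall(1, ...) forces make_printer's first argument before the
# call, so each returned closure captures the value of i, not the promise:
funs <- lapply(1:3, function(i) forceAndCall(1, make_printer, i))
funs[[1]]()
## [1] 1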
I have a couple of functions that convert between coordinate systems, and they all rely on constants from the WGS84 ellipsoid, etc. I'd rather not have these constants pollute the global namespace. Similarly, not all of the functions need to be visible globally.
In Java, I'd encapsulate all the coordinate stuff in a utility class and only expose the coordinate transformation methods.
What's a low-overhead way to do this in R? Ideally, I could:
source("coordinateStuff.R")
at the top of my file and call the "public" functions as needed. It might make a nice package down the road, but that's not a concern right now.
Edit for initial approach:
I started coords.R with:
coords <- new.env()
with(coords, {
  ## Semi-major axis (center to equator)
  a <- 6378137.0
  ## And so on...
})
The with statement and indentation clearly indicate that something is different about the assigned variables. And it sure beats typing a zillion assign statements.
The first cut at functions looked like:
ecef2geodetic <- function(x, y, z) {
  attach(coords)
  on.exit(detach(coords))
  ## ... conversion code using the constants in coords ...
}
The on.exit() call ensures that we detach coords when the function exits. But the attach() statements caused trouble when one function in coords called another in coords. See this question to see how things went from there.
Utility classes in Java are a code smell, and this is not what you want in R.
There are several ways of solving this in R. For medium to large scale things, the way to go is to put your stuff into a package and use it in the remaining code. That encapsulates your “private” variables nicely and exposes a well-defined interface.
For smaller things, an excellent way of doing this is to put your code into a local call which, as the name suggests, executes its argument in a local scope:
x <- 23
result <- local({
  foo <- 42
  bar <- x
  foo * bar
})
Finally, you can put your objects into a list or environment (there are differences but you may ignore them for now), and then just access them via listname$objname:
coordinateStuff <- list(
  foo = function() cat('42\n'),
  bar = 23
)
coordinateStuff$foo()
If you want something similar to your source statement, take a look at my xsource command which solves this to some extent (although it’s work in progress and has several issues!). This would allow you to write
cs <- xsource(coordinateStuff)
# Use cs as if it were an environment, e.g.
cs$public_function()
# or even:
cs::public_function()
A package is the solution... but for a fast solution you could use environments: http://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html
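A quick sketch of that route using local() to build the environment (the names are mine):

coords <- local({
  a <- 6378137.0                      # WGS84 semi-major axis
  flatten <- function(x) x / a        # "private" helper, not visible outside
  list(ecef2geodetic = function(x) flatten(x))  # the "public" interface
})
coords$ecef2geodetic(6378137.0)
## [1] 1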