Create a promise/lazily evaluated expression in R

I would like to create a promise in R programmatically. I know the language supports them, but there does not seem to be a way to do this.
To give more detail: I would like to have components of a list lazily evaluated. E.g.
x <- list(node=i, children=promise(some_expensive_function(i)))
I only want to access the second component of the list for very few elements, so pre-populating the list with lazy expressions results in very clear, compact and readable code. The background of this algorithm is a tree search; essentially, I am trying to emulate coroutine behaviour here. Right now I am using closures for this, but the code lacks elegance.
Is there a third-party package that exposes the hidden promise construction mechanism in R? Or is this mechanism explicitly tied to environment bindings rather than expressions?
P.S. Yes, I am aware of delayedAssign. It does not do what I want. Yes, I can juggle around with intermediate environments, but it's also messy.
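For context, here is what delayedAssign does and why it falls short here: it binds a promise to a name in an environment, so building a list from that name forces the promise immediately.
delayedAssign("y", { cat("computing...\n"); 42 })
x <- list(children = y)  # building the list forces the promise right here: "computing..." is printed
x$children               # already plain 42; the laziness is gone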

Any programming language that has first-class functions (including R) can implement lazy evaluation fairly easily through thunks (see the Wikipedia entry on thunks).
The basic idea is that function bodies are not evaluated until the function is called, so just wrap the elements of your list in anonymous functions that return their value when called.
delayed <- list(function() 1, function() 2, function() 3)
lapply(delayed, function(x) x())
Those are just numbers wrapped in there, but you can easily place some_expensive_function(i) in there instead to supply the argument while delaying evaluation.
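For example (a sketch; some_expensive_function and i stand in for your actual code):
make_node <- function(i) {
  force(i)  # evaluate i now so the thunk captures this value, not a later one
  list(node = i, children = function() some_expensive_function(i))
}
x <- make_node(5)
x$children()  # the expensive call happens only here, when the thunk is invoked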
Edit: I noticed the bit about closures just now, so I assume you're already using a similar method. Can you elaborate on the "inelegance" of it? This is all in the eye of the beholder, but thunking seems fairly straightforward and a lot less boilerplate if you're just looking for lazy evaluation.

At the moment your use case is too vague for me to get my head around. I'm wondering if one of quote, expression or call is what you are asking for:
x <- list(node=i, children=quote(mean(i)) )
x
#----------
$node
[1] 768
$children
mean(i)
#------------
x <- list(node=i, children=call('mean',i))
x
#-------------
$node
[1] 768
$children
mean(768L)
#-----------
x <- list(node=i, children=expression(mean(i)) )
x
#------------
$node
[1] 768
$children
expression(mean(i))
A test of the last one (the evaluation obviously happens in the globalenv()):
eval(x$children)
#[1] 768
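If the children expression should be forced later in a controlled place, one variant (a sketch beyond the original answer) is to store the relevant environment alongside the unevaluated call:
x <- list(node = i, children = quote(mean(i)), env = environment())
eval(x$children, x$env)  # forced only here, in the captured environment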

I have ended up using environments and delayedAssign for this one.
node <- new.env()
node$name <- X[1, 1]
node$level <- names(X)[1]
delayedAssign('subtaxa', split_taxons_lazy(X[-1]), assign.env=node)
node
This works well for my case, and while I would have preferred to use lists, that does not seem to be possible in R. Thanks for the comments!
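For reference, such a node then behaves like this (split_taxons_lazy and X are the poster's own function and data):
node$name     # ordinary fields are available immediately
node$subtaxa  # first access forces split_taxons_lazy(X[-1]); the result is cached for later accesses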

Related

Pipe with additional Arguments

I read in several places that pipes in Julia only work with functions that take only one argument. This is not true, since I can do the following:
function power(a, b = 2) a^b end
3 |> power
> 9
and it works fine.
However, I can't completely get my head around the pipe. E.g., why is this not working?
3 |> power()
> MethodError: no method matching power()
What I would actually like to do is use a pipe and supply additional arguments, e.g. keyword arguments, so that it is clear which argument is passed when piping (namely the only positional one):
function power(a; b = 2) a^b end
3 |> power(b = 3)
Is there any way to do something like this?
I know I could work around this with the Pipe package, but to be honest it feels kind of clunky to write @pipe at the start of half of the lines.
In R, the magrittr package has convincing logic (in my opinion): it passes what's left of the pipe by default as the first argument to the function on the right. I'm looking for something similar.
power as defined in the first snippet has two methods: one with one argument and one with two. So the point about |> working only with one-argument methods still holds.
The kind of thing you want to do is called "partial application", and it is very common in functional languages. You can always write
3 |> (a -> power(a, 3))
but that gets clunky quickly. Other languages have syntax like power(%1, 3) to denote that lambda. There is discussion about adding something similar to Julia, but it is difficult to get right. Pipe is exactly the macro-based fix for this.
If you have control over the defined method, you can also implement methods with an interface that return partially applied versions as you like -- many predicates in Base do this already, e.g., ==(1). There's also the option of Base.Fix2(power, 3), but that's not really an improvement, if you ask me (apart from maybe being nicer to the compiler).
And note that magrittr's pipes are also "macro"-based. The difference is that argument passing in R is far more complicated, and you can't tell from the outside whether an argument is used as a value or as an expression (essentially, R passes a thunk containing the expression and a pointer to the parent environment, and automatically evaluates and caches it once you use it as a value; see substitute).
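To illustrate that last point about R, a small sketch (mine, not the answerer's):
f <- function(x) {
  print(substitute(x))  # the unevaluated expression carried in by the promise
  x * 2                 # using x as a value forces the promise (and caches the result)
}
f(1 + 2)
# 1 + 2
# [1] 6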

How to not fall into R's 'lazy evaluation trap'

"R passes promises, not values. The promise is forced when it is first evaluated, not when it is passed.", see this answer by G. Grothendieck. Also see this question referring to Hadley's book.
In simple examples such as
> funs <- lapply(1:10, function(i) function() print(i))
> funs[[1]]()
[1] 10
> funs[[2]]()
[1] 10
it is possible to take such unintuitive behaviour into account (e.g., by forcing i before the inner function is created, as in the snippet below).
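A standard fix, not part of the original question:
funs <- lapply(1:10, function(i) { force(i); function() print(i) })
funs[[1]]()
# [1] 1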
However, I find myself frequently falling into this trap during daily development. I follow a rather functional programming style, which means that I often have a function A returning a function B, where B in some way depends on the parameters with which A was called. The dependency is not as easy to see as in the above example, since the calculations are complex and there are multiple parameters.
Overlooking such an issue leads to hard-to-debug problems, since all calculations run smoothly, except that the result is incorrect. Only an explicit validation of the results reveals the problem.
On top of that, even if I have noticed such a problem, I am never really sure which variables I need to force and which I don't.
How can I make sure not to fall into this trap? Are there any programming patterns that prevent this or that at least make sure that I notice that there is a problem?
You are creating functions with implicit parameters, which isn't necessarily best practice. In your example, the implicit parameter is i. Another way to rework it would be:
library(functional)
myprint <- function(x) print(x)
funs <- lapply(1:10, function(i) Curry(myprint, i))
funs[[1]]()
# [1] 1
funs[[2]]()
# [1] 2
Here, we explicitly specify the parameters to the function by using Curry. Note we could have curried print directly but didn't here for illustrative purposes.
Curry creates a new version of the function with parameters pre-specified. This makes the parameter specification explicit and avoids the potential issues you are running into because Curry forces evaluations (there is a version that doesn't, but it wouldn't help here).
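A hand-rolled equivalent (my sketch, not how the functional package actually implements Curry) makes the forcing explicit:
my_curry <- function(f, x) {
  force(x)  # evaluate x now, so the returned function is bound to this value
  function(...) f(x, ...)
}
funs <- lapply(1:10, function(i) my_curry(print, i))
funs[[3]]()
# [1] 3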
Another option is to capture the entire environment of the parent function, copy it, and make it the parent env of your new function:
funs2 <- lapply(
  1:10,
  function(i) {
    fun.res <- function() print(i)
    environment(fun.res) <- list2env(as.list(environment()))  # force parent env copy
    fun.res
  }
)
funs2[[1]]()
# [1] 1
funs2[[2]]()
# [1] 2
but I don't recommend this since you will potentially be copying a whole bunch of variables you may not even need. Worse, it gets a lot more complicated if you have nested layers of functions that create functions. The only benefit of this approach is that you can keep your implicit parameter specification, but again, that seems like bad practice to me.
As others pointed out, this might not be the best style of programming in R. But one simple option is to just get into the habit of forcing everything. If you do this, realize you don't need to actually call force; just evaluating the symbol will do it. To make it less ugly, you could make it a practice to start functions like this:
myfun <- function(x, y, z) {
  x; y; z  # evaluating the symbols forces all three arguments
  ## code
}
There is some work in progress to improve how R's higher-order functions (the apply functions, Reduce, and such) handle situations like these. Whether this makes it into R 3.2.0, to be released in a few weeks, depends on how disruptive the changes turn out to be. It should become clear in a week or so.
R has a function that helps safeguard against lazy evaluation in situations like closure creation: forceAndCall().
From the online R help documentation:
forceAndCall is intended to help define higher order functions like apply to behave more reasonably when the result returned by the function applied is a closure that captured its arguments.
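A sketch of its use (the closure factory f is my own example; note that since R 3.2.0 functions like lapply already use forceAndCall internally):
f <- function(i) function() print(i)                    # factory whose result captures its argument
funs <- lapply(1:3, function(i) forceAndCall(1, f, i))  # force f's first argument when calling it
funs[[1]]()
# [1] 1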

Does the R compiler pass-by-reference if an argument is not modified?

When playing with large objects, the memory and speed implications of pass-by-value can be substantial.
R has several ways to pass by reference:
Reference Classes
R.oo
C/C++/other external languages
Environments
However, many of them require considerable overhead (in terms of code complexity and programmer time).
In particular, I'm envisioning something like what you could use constant references for in C++: pass a large object, compute on it without modifying it, and return the results of that computation.
Since R does not have a concept of constants, I suspect that if this happens anywhere, it's in compiled R functions, where the compiler could see that the formal argument is not modified anywhere in the code and pass it by reference.
Does the R compiler pass by reference if an argument is not modified? If not, are there any technical barriers to it doing so, or has it just not been implemented yet?
Example code:
n <- 10^7
bigdf <- data.frame(x = runif(n), y = rnorm(n), z = rt(n, 5))
myfunc <- function(dat) invisible(with(dat, x^2 + mean(y) + sqrt(exp(z))))
library(compiler)
mycomp <- cmpfun(myfunc)  # cmpfun() compiles a function; compile() works on expressions
tracemem(bigdf)
> myfunc(bigdf)
> # No object was copied! The question was unnecessary.
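For contrast, a quick check (my own addition, reusing the setup above): a function that modifies its argument does trigger a copy, which tracemem reports.
myfunc2 <- function(dat) { dat$x <- dat$x + 1; invisible(dat) }
myfunc2(bigdf)  # tracemem reports a copy here: copy-on-modify kicks in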
This may be way off base for what you need, but what about wrapping the object in a closure? This function makes a function that knows about the object given to its parent; here I use the tiny volcano dataset to do a very simple job.
mkFun <- function(x) {
  function(rownumbers) {
    rowSums(x[rownumbers, , drop = FALSE])
  }
}
fun <- mkFun(volcano)
fun(2)    ## [1] 6493
fun(2:3)  ## [1] 6493 6626
Now fun can get passed around by worker functions to do its job as it likes.

In Python, what is the difference between map(func, list) and [func(x) for x in list]?

As far as I can tell, the only difference is speed, and you have to be a bit trickier in how you define lambda functions.
For instance:
map(lambda x: x + 1, range(4)) == [(lambda x: x + 1)(y) for y in range(4)]
It seems to me like the second way is more pythonic, but I am not sure why.
EDIT:
Yes, I understand that the lambda would be excluded in the second example; I was just trying to show code that is as equivalent as possible.
The right way to do this would be
[y + 1 for y in range(4)]
No need to construct a lambda function here. Your code would unnecessarily build a new function object in every single iteration of the list comprehension.
That said, you can write any call to map() as an equivalent list comprehension. If the first argument to map() is a lambda function, the list comprehension is usually preferred. If the first argument to map() is a function name, both variants are fine. Some people (including me) prefer, say,
map(str, my_list)
while others prefer
[str(x) for x in my_list]
There is no difference, but the pythonic way would be to omit the lambda completely:
[y + 1 for y in range(4)]
Note also that if your mapping function is a "built-in" (written in C) rather than a Python function or a lambda, map will be faster.
Another pythonic, but uncommon, way (which avoids an unnecessary lambda) would be:
map(1 .__add__, range(4)) # thanks to SvenMarnach for this
It is usually preferable to avoid lambdas in mapping forms, because a list comprehension will always be more efficient and clearer. By contrast, using multi-line functions is perfectly acceptable; there is no way to write them inline, and even if you could, it would likely be less clear.
Another difference is that, because map can take multiple sequences to map over and passes them as positional parameters to the mapping function, one can avoid the zipping that would be required in a list comprehension:
[x+y for x,y in zip(range(4), range(2,6))]
# vs
from operator import add
map(add, range(4), range(2,6))

Simple and short if clauses for combined statements

TRUE/FALSE if clauses are easily and quickly written in R. However, if the condition gets more complex, the code also gets ugly very soon.
For instance:
I might want to execute different operations for a row (foo) depending on the value in one cell (foo[1]).
Let the intervals be 0:39, 40:59 and 60:100.
Something like this does not exist:
if (foo[1] "in" 40:60) { ...
In fact, I only see solutions with at least two if clauses and two else statements, with the action for the first interval ending up somewhere at the bottom of the code. With more intervals (or any other condition) it gets more complex.
Is there a best practice (for this purpose or others) with a simple construction and a design that is nice to read?
Not totally sure, but I would suggest using something like:
f <- approxfun(0:100, c(rep(1, 40), rep(2, 20), rep(3, 41)), method = "constant")
fac <- f(foo)
tapply(foo, fac, FUN, ...)
where you can use any function FUN.
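For example, with a made-up foo (my illustration, not the poster's data), and base R's findInterval as an equivalent without approxfun:
foo <- c(10, 45, 72, 39, 60)
f(foo)
# [1] 1 2 3 1 3
findInterval(foo, c(0, 40, 60))  # same bucketing using base R only
# [1] 1 2 3 1 3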
I'm not totally following your question. Are you looking for a switch statement? Have a look at this example:
ccc <- c("b", "QQ", "a", "A", "bb")
for (ch in ccc)
  cat(ch, ":", switch(EXPR = ch, a = 1, b = 2:3), "\n")
