Background
I’m in the process of creating a shortcut for lambdas, since the repeated use of function (…) … clutters my code considerably. As a remedy, I’m trying out alternative syntaxes inspired by other languages such as Haskell, as far as this is possible in R. Simplified, my code looks like this:
f <- function (...) {
  args <- match.call(expand.dots = FALSE)$...
  last <- length(args)
  params <- c(args[-last], names(args)[[last]])
  function (...)
    eval(args[[length(args)]],
         envir = setNames(list(...), params),
         enclos = parent.frame())
}
This allows the following code:
f(x = x * 2)(5) # => 10
f(x, y = x + y)(1, 2) # => 3
etc.
Of course the real purpose is to use this with higher-order functions¹:
Map(f(x = x * 2), 1 : 10)
The problem
Unfortunately, I sometimes have to nest higher-order functions and then it stops working:
f(x = Map(f(y = x + y), 1:2))(10)
yields “Error in eval(expr, envir, enclos): object 'x' not found”. The conceptually equivalent code using function instead of f works. Furthermore, other nesting scenarios also work:
f(x = f(y = x + y)(2))(3) # => 5
I suspect that the culprit is the parent environment of the nested f inside the Map: it’s the top-level environment rather than the outer f’s. But I have no idea how to fix this, and it also puzzles me that the second scenario above works. Related questions (such as this one) suggest workarounds that are not applicable in my case.
Clearly I have a gap in my understanding of environments in R. Is what I want possible at all?
¹ Of course this example could simply be written as (1 : 10) * 2. The real application is with more complex objects / operations.
The answer is to capture parent.frame() when f() itself is called and store it in the returned function's environment:
f <- function (...) {
  args <- match.call(expand.dots = FALSE)$...
  last <- length(args)
  params <- c(args[-last], names(args)[[last]])
  e <- parent.frame()
  function (...)
    eval(args[[length(args)]],
         envir = setNames(list(...), params),
         enclos = e)
}
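As a quick check (not part of the original answer), the sketch below repeats the revised f() and runs it on the question's examples, including the nested Map() call. The expected values come from tracing the lookup by hand: the inner f() captures the frame in which Map()'s argument is evaluated, and that frame binds x = 10.

```r
# Revised f(): the caller's frame is captured when f() itself is called.
f <- function(...) {
  args <- match.call(expand.dots = FALSE)$...
  last <- length(args)
  params <- c(args[-last], names(args)[[last]])
  e <- parent.frame()                  # frame in which f(...) was called
  function(...)
    eval(args[[last]],
         envir = setNames(list(...), params),
         enclos = e)                   # free symbols are looked up there
}

f(x = x * 2)(5)                        # 10
f(x, y = x + y)(1, 2)                  # 3
nested <- f(x = Map(f(y = x + y), 1:2))(10)
str(nested)                            # a list holding 11 and 12
```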
Hopefully someone can explain why this works and yours doesn't. Feel free to edit.
Great question.
Why your code fails
Your code fails because the enclos= argument you supply to eval() does not point far enough up the call stack to reach the environment in which you want it to search next for unresolved symbols.
Here is a partial diagram of the call stack, at the bottom of which your call to parent.frame() occurs. (To make sense of it, keep in mind that the function call from which parent.frame() is called here is not f(), but a call to the anonymous function returned by f(); let's call it fval.)
## Note: E.F. = "Evaluation Frame"
## fval = anonymous function returned as value of nested call to f()

f( <---------------------------- ## E.F. you want, ptd to by parent.frame(n=3)
  Map(
    mapply( <------------------- ## E.F. pointed to by parent.frame(n=1)
      fval(                    |
        parent.frame(n=1) -----|
In this particular case, redefining the function returned by f() to call parent.frame(n=3) rather than parent.frame(n=1) produces working code, but that's not a good general solution. For instance, if you wanted to call f(x = mapply(f(y = x + y), 1:2))(10), the call stack would then be one step shorter, and you'd instead need parent.frame(n=2).
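The dependence on stack depth can be checked directly. In this sketch the helper names probe, via_map and via_mapply are made up for illustration: a function returned by probe(n) hands back parent.frame(n). With Map() defined, as in current base R, as a thin wrapper that calls mapply(), the caller's frame sits three levels up through Map() but only two through mapply(). That count is an implementation detail of Map(), which is why hard-coding n is fragile.

```r
# probe(n) returns a function that hands back the frame n levels above it.
probe <- function(n) function(y) parent.frame(n)

via_map <- function() {
  here <- environment()                  # this function's own frame
  got <- Map(probe(3), 1)[[1]]           # fval <- mapply <- Map <- via_map
  identical(got, here)
}

via_mapply <- function() {
  here <- environment()
  got <- mapply(probe(2), 1, SIMPLIFY = FALSE)[[1]]  # fval <- mapply <- via_mapply
  identical(got, here)
}

via_map()     # TRUE
via_mapply()  # TRUE
```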
Why flodel's code works
flodel's code provides a more robust solution by calling parent.frame() during evaluation of the inner call to f in the nested chain f(Map(f(), ...)) (rather than during the subsequent evaluation of the anonymous function fval returned by f()).
To understand why his parent.frame(n=1) points to the appropriate environment, recall that in R, supplied arguments are evaluated in the evaluation frame of the calling function. In the OP's nested code, the inner f() is evaluated while Map()'s supplied arguments are being processed, so its evaluation environment is that of the function calling Map(). Here, the function calling Map() is the outer call to f(), and its evaluation frame is exactly where you want eval() to look next for symbols:
f( <----------------------- ## Evaluation frame of the nested call to f()
  Map(f(                  |
        parent.frame(n=1) -|
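A small check of the claim that a supplied argument is evaluated in the caller's frame (capture_env and outer_fn are names invented for this sketch): capture_env() runs while Map()'s argument is being forced, so its parent.frame() is outer_fn()'s frame, however many calls sit between Map() and the function it eventually calls.

```r
# capture_env() records the frame it was called from and returns a
# function that simply hands that frame back.
capture_env <- function() {
  e <- parent.frame()
  function(y) e
}

outer_fn <- function(mapper) {
  here <- environment()
  got <- mapper(capture_env(), 1)[[1]]   # capture_env() is an argument here
  identical(got, here)
}

outer_fn(Map)                                                # TRUE
outer_fn(function(f, ...) mapply(f, ..., SIMPLIFY = FALSE))  # TRUE
```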
Related
I'm creating an S3 method for a generic defined in another package. An earlier method for the generic produces some console output that isn't returned as part of the function's return value; it's only printed to the console. I'd like to capture that output for use in my own method.
I tried using capture.output() on NextMethod(), but that just results in a bizarre error:
foo <- function(x, ...) UseMethod("foo")
foo.bar <- function(x, ...) cat(x, "\n")
foo.baz <- function(x, ...) capture.output(NextMethod())
foo(structure(1, class = "bar"))
#> 1
foo(structure(1, class = c("baz", "bar")))
#> Error: 'function' is not a function, but of type 8
Is this expected behaviour, a known limitation, or a bug? I couldn't find anything matching this error with a quick search.
How can I capture the output of the next S3 method in another S3 method?
This is... "expected behavior." I say that because I believe it's technically true, though there's probably no way a user could be expected to anticipate it. If you don't care why it happens and just want a workaround, skip down to the heading "The Fix", because the following explanation of the error is a little involved.
What does 'function' is not a function, but of type 8 mean?
"type 8" refers to SEXPTYPE 8. From Section 1 of the R Internals manual:
What R users think of as variables or objects are symbols which are
bound to a value. The value can be thought of as either a SEXP (a
pointer), or the structure it points to, a SEXPREC...
Currently SEXPTYPEs 0:10 and 13:25 are in use....
no  SEXPTYPE    Description
...
3   CLOSXP      closures
...
8   BUILTINSXP  builtin functions
NextMethod() expects a CLOSXP, not a BUILTINSXP. We can see this in the source code (around line 717) of do_nextmethod(), the C function underlying NextMethod():
SEXP attribute_hidden do_nextmethod(SEXP call, SEXP op, SEXP args, SEXP env)
{
    // Some code omitted
    if (TYPEOF(s) != CLOSXP){ /* R_LookupMethod looked for a function */
        if (s == R_UnboundValue)
            error(_("no calling generic was found: was a method called directly?"));
        else
            errorcall(R_NilValue,
                      _("'function' is not a function, but of type %d"),
                      TYPEOF(s));
    }
    // Remaining code omitted
}
So why did that happen here? This is where it gets tricky. I believe it's because, when NextMethod() is passed through capture.output(), it gets evaluated via eval(), which is a built-in (see builtins()).
So how can we deal with this? Read on...
The Fix
We can simulate capture.output() with sink(), tempfile(), and readLines():
foo.baz <- function(x, ...) {
  # Create a temporary file to store the output
  tmp <- tempfile("tmp.txt")
  # start sink()
  sink(tmp)
  # call NextMethod() just for the purpose of capturing output
  NextMethod()
  # stop sink'ing
  sink()
  # store the output in an R object
  y <- readLines(tmp)
  # here we'll cat() the output to make sure it worked
  cat("The output was:", y, "\n")
  # destroy the temporary file
  unlink(tmp)
  # and call NextMethod for its actual execution
  NextMethod()
}
foo(structure(1, class = c("baz", "bar")))
# The output was: 1
# 1
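A variant of the same sink() idea, sketched here as an alternative rather than taken from the answer: sink into a textConnection() instead of a temporary file, and use on.exit() so the diversion is undone even if the next method signals an error. Like the fix above, it relies on NextMethod() being called directly in the method body; the variable name captured is arbitrary.

```r
foo <- function(x, ...) UseMethod("foo")
foo.bar <- function(x, ...) cat(x, "\n")

foo.baz <- function(x, ...) {
  # local = TRUE creates and updates a character vector `captured`
  # in this function's frame as complete lines are written.
  con <- textConnection("captured", open = "w", local = TRUE)
  sink(con)
  # Restore console output and release the connection on any exit path.
  on.exit({
    sink()
    close(con)
  })
  NextMethod()   # called directly, so method dispatch still works
  captured       # the return value is taken before on.exit() runs
}

res <- foo(structure(1, class = c("baz", "bar")))
res   # "1 " (cat() puts a space before the "\n")
```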
I'm not sure if what you saw is documented or not: the documentation ?NextMethod makes clear that it isn't a regular function, but I didn't follow all the details to see if your usage would be allowed.
One way to do what you want would be
foo.baz <- function(x, ...) {class(x) <- class(x)[-1]; capture.output(foo(x, ...))}
This assumes that the method was called directly from a call to the generic; it won't work if there's a third level, and foo.baz was itself invoked by NextMethod().
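A quick check of this approach with the question's foo() and foo.bar(): dropping "baz" from the class and calling the generic again means capture.output() evaluates an ordinary generic call rather than NextMethod().

```r
foo <- function(x, ...) UseMethod("foo")
foo.bar <- function(x, ...) cat(x, "\n")

# Strip the first class and re-dispatch through the generic, so the
# "bar" method runs inside a call that capture.output() can evaluate.
foo.baz <- function(x, ...) {
  class(x) <- class(x)[-1]
  capture.output(foo(x, ...))
}

out <- foo(structure(1, class = c("baz", "bar")))
out   # "1 "
```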
I am currently working on user defined functions aimed at modelling empirical data and I have problems with objects / parameters passed to the function:
bestModel <- function(k=4L, R2=0.994){
  print(k) # here, everything is still fine
  lmX <- mixlm::lm(getLinearModelFunction(k), data)
  best <- mixlm::best.subsets(lmX, nbest=1)
  # ...
}
At first, everything works as expected, but as soon as I want to pass the parameter k to another user defined function getLinearModelFunction(), an error is thrown:
Error in getLinearModelFunction(k) : object 'k' not found
It doesn't help if I assign k to a new variable, e.g. l <- k, and try to pass that on. The parameter doesn't seem to be available to the other function. I ran into this problem not only with primitive data types, but also with complex structures. On the command line, everything works, as long as the objects are in my workspace.
To sum it up: passing parameters works only within that function, but calling other functions from there results in an error. Why? And what can I do about it?
EDIT:
While trying to resolve the problem, things got really weird. I stripped down all the functions:
functionA <- function(data, k){
  lmX <- mixlm::lm(functionB(k), data)
  summary(lmX)
  # best <- mixlm::best.subsets(lmX,nbest=1)
}
functionB <- function(k=4){
  if(k==1){
    return(formula("raw ~ L1"))
  }else if(k==2){
    return(formula("raw ~ L1 + L2"))
  }else if(k==3){
    return(formula("raw ~ L1 + L2 + L3 "))
  }else if(k==4){
    return(formula("raw ~ L1 + L2 + L3 + L4"))
  }
}
Let's say we have a data.frame d with the variables raw, L1, L2, L3, L4 ... As long as the line starting with best is commented out, it works. As soon as the # is removed, calling functionA(d, 3) results in
Error in functionB(k) : object 'k' not found
even though k plays no role in that function, and the call worked before.
Ok, indeed, this was an environment thing. The solution is to get the current environment and to take the object from there:
functionA <- function(data, k){
  e <- environment()
  lmX <- mixlm::lm(functionB(e$k), e$data)
  summary(lmX)
  best <- mixlm::best.subsets(lmX,nbest=1)
}
This is usually not a problem when working directly in R at the console, because the objects are then usually in the global environment. When working with functions, each function has its own environment. I managed to solve this when I started learning about packaging code: http://adv-r.had.co.nz/Environments.html
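We have not checked how mixlm::best.subsets() uses the fitted model, so the following is an assumption: the symptoms match code that re-evaluates the model's stored call (lmX$call) in an environment where k does not exist. The sketch reproduces that pattern with stats::lm() rather than mixlm, using a compact stand-in for functionB() and made-up data, and shows one common workaround: building the call with do.call() so that the formula and data values, not the symbols k and data, are stored in it. fit_direct() and fit_docall() are hypothetical names.

```r
# Compact stand-in for the question's functionB(): "raw ~ L1 + ... + Lk".
functionB <- function(k = 4) reformulate(paste0("L", seq_len(k)), response = "raw")

# Made-up data with the question's column names.
set.seed(1)
d <- as.data.frame(matrix(rnorm(100), ncol = 5,
                          dimnames = list(NULL, c("raw", paste0("L", 1:4)))))

fit_direct <- function(data, k) lm(functionB(k), data)
fit_docall <- function(data, k) do.call("lm", list(formula = functionB(k), data = data))

m1 <- fit_direct(d, 3)
m2 <- fit_docall(d, 3)
m1$call   # lm(formula = functionB(k), data = data): symbols only

# Re-evaluate each stored call where neither k nor data is defined.
refit1 <- tryCatch(eval(m1$call, new.env(parent = environment())),
                   error = conditionMessage)
refit2 <- eval(m2$call, new.env(parent = environment()))
refit1                                 # "object 'k' not found"
all.equal(coef(refit2), coef(m2))      # TRUE
```

The trade-off is that the whole data frame is embedded in m2$call, so printing the call, or summary(m2), becomes very verbose.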
I am writing some functions for doing repeated tasks, but I am trying to minimize the amount of times I load the data. Basically I have one function that takes some information and makes a plot. Then I have a second function that will loop through and output multiple plots to a .pdf. In both functions I have the following line of code:
if(load.dat) load("myworkspace.RData")
where load.dat is a logical and the data I need is stored in myworkspace.RData. When I am calling the wrapper function that loops through and outputs multiple plots I do not want to reload the workspace in every call to the inner function. I thought I could just load the workspace once in the wrapper function, then the inner function could access that data, but I got an error stating otherwise.
My understanding was that when a function cannot find a variable in its local environment (created when the function gets called), it looks for the variable in the parent environment.
I assumed the parent environment to the inner function call would be the outer function call. Obviously this is not true:
func1 <- function(...){
  print(var1)
}
func2 <- function(...){
  var1 <- "hello"
  func1(...)
}
> func2()
Error in print(var1) : object 'var1' not found
After reading numerous questions, the language manual, and this really helpful blog post, I came up with the following:
var1 <- "hello"
save(list="var1",file="test.RData")
rm(var1)
func3 <- function(...){
  attach("test.RData")
  func1(...)
  detach("file:test.RData")
}
> func3()
[1] "hello"
Is there a better way to do this? Why doesn't func1 look for undefined variables in the local environment created by func2, when it was func2 that called func1?
Note: I did not know how to name this question. If anyone has better suggestions I will change it and edit this line out.
To illustrate lexical scoping, consider the following:
First let's create a sandbox environment, only to avoid the oh-so-common R_GlobalEnv:
sandbox <- new.env()
Now we put two functions inside it: f, which looks for a variable named x; and g, which defines a local x and calls f:
sandbox$f <- function()
{
  value <- if(exists("x")) x else "not found."
  cat("This is function f looking for symbol x:", value, "\n")
}
sandbox$g <- function()
{
  x <- 123
  cat("This is function g. ")
  f()
}
Technicality: entering function definitions at the console causes them to have their enclosing environment set to R_GlobalEnv, so we manually force the enclosures of f and g to match the environment where they "belong":
environment(sandbox$f) <- sandbox
environment(sandbox$g) <- sandbox
Calling g. The local variable x=123 is not found by f:
> sandbox$g()
This is function g. This is function f looking for symbol x: not found.
Now we create an x in the global environment and call g. The function f will look for x first in sandbox, and then in the parent of sandbox, which happens to be R_GlobalEnv:
> x <- 456
> sandbox$g()
This is function g. This is function f looking for symbol x: 456
Just to check that f looks for x first in its enclosure, we can put an x there and call g:
> sandbox$x <- 789
> sandbox$g()
This is function g. This is function f looking for symbol x: 789
Conclusion: symbol lookup in R follows the chain of enclosing environments, not the evaluation frames created during execution of nested function calls.
EDIT: Just adding a link to this very interesting answer from Martin Morgan on the related subject of parent.frame() vs parent.env()
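For contrast with the lexical lookup above, a function can explicitly ask for its caller's frame with parent.frame(). The sketch below is a rewrite of the question's func1, not something from the answer: it makes func1 find func2's var1 that way. It works, but it ties func1 to whoever happens to call it, so passing var1 as an argument is usually the cleaner design.

```r
func1 <- function(...) {
  # Start the search for var1 in the caller's frame (dynamic lookup)
  # rather than in func1's enclosing environment (lexical lookup).
  print(get("var1", envir = parent.frame()))
}

func2 <- function(...) {
  var1 <- "hello"
  func1(...)
}

func2()   # [1] "hello"
```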
You could use closures:
f2 <- function(...){
  f1 <- function(...){
    print(var1)
  }
  var1 <- "hello"
  f1(...)
}
f2()
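Another option for the question's original use case, offered as a sketch rather than taken from either answer: load the workspace once into a dedicated environment with load(file, envir = ...) and pass that environment down, so the inner function loads the file itself only when nothing was passed (playing the role of load.dat). plot_one(), plot_many() and the temporary file standing in for myworkspace.RData are hypothetical, and the functions return a string instead of drawing a plot.

```r
# Hypothetical stand-in for myworkspace.RData.
ws_file <- tempfile(fileext = ".RData")
var1 <- "hello"
save(var1, file = ws_file)
rm(var1)

# Inner function: uses `dat` if given, otherwise loads the file itself.
plot_one <- function(i, dat = NULL, file) {
  if (is.null(dat)) {
    dat <- new.env()
    load(file, envir = dat)
  }
  paste(dat$var1, i)          # a real version would draw plot i here
}

# Wrapper: load once, then share the same environment with every call.
plot_many <- function(n, file) {
  dat <- new.env()
  load(file, envir = dat)
  vapply(seq_len(n), function(i) plot_one(i, dat = dat), character(1))
}

plot_one(1, file = ws_file)   # "hello 1"
plot_many(3, ws_file)         # "hello 1" "hello 2" "hello 3"
```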
This is something I find difficult to understand:
cl = makeCluster(rep("localhost", 8), "SOCK")
# This will not work, error: dat not found in the nodes
pmult = function(cl, a, x)
{
  mult = function(s) s*x
  parLapply(cl, a, mult)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
# This will work
pmult = function(cl, a, x)
{
  x
  mult = function(s) s*x
  parLapply(cl, a, mult)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
# This will work
pmult = function(cl, a, x)
{
  mult = function(s, x) s*x
  parLapply(cl, a, mult, x)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
The first function doesn't work because of lazy evaluation of arguments. But what is lazy evaluation? When mult() is executed, does it not require x to be evaluated? The second one works because it forces x to be evaluated. Now the strangest thing happens in the third function: nothing is done except making mult() receive x as an extra argument, and suddenly everything works!
Another thing is, what should I do if I don't want to define all the variables and functions inside the function calling parLapply()? The following definitely will not work:
pmult = function(cl)
{
  source("a_x_mult.r")
  parLapply(cl, a, mult, x)
}
scalars = 1:4
dat = rnorm(4)
pmult(cl, scalars, dat)
I can pass all these variables and functions as arguments:
f1 = function(i)
{
  return(rnorm(i))
}
f2 = function(y)
{
  return(f1(y)^2)
}
f3 = function(v)
{
  return(v - floor(v) + 100)
}
test = function(cl, f1, f2, f3)
{
  x = f2(15)
  parLapply(cl, x, f3)
}
test(cl, f1, f2, f3)
Or I can use clusterExport(), but it'll be cumbersome when there are lots of objects to be exported. Is there a better way?
To understand this, you have to realize that there is an environment associated with every function, and what that environment is depends on how the function was created. A function that is simply created in a script is associated with the global environment, but a function that is created by another function is associated with the local environment of the creating function. In your example, pmult creates mult, so the environment associated with mult contains the formal arguments cl, a, and x.
The problem with the first case is that parLapply doesn't know anything about x: it is just an unevaluated formal argument that is serialized as part of the environment of mult by parLapply. Since x isn't evaluated when mult is serialized and sent to the cluster workers, it causes an error when the workers execute mult, since dat isn't available in that context. In other words, by the time mult evaluates x, it's too late.
The second case works because x is evaluated before mult is serialized, so the actual value of x is serialized along with the environment of mult. It does what you would expect if you knew about closures but not lazy argument evaluation.
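Lazy evaluation can be seen without a cluster. In this sketch (make_mult and make_mult_forced are illustrative names), the promise for x is only resolved when the returned function first uses it, so a later change to dat leaks in; force(x) resolves it at creation time, which is what the bare x line in the second pmult does.

```r
# x stays an unevaluated promise until the returned function uses it.
make_mult <- function(x) function(s) s * x

# force(x) evaluates the promise now, as the bare `x` line in pmult does.
make_mult_forced <- function(x) {
  force(x)
  function(s) s * x
}

dat <- 2
lazy <- make_mult(dat)
eager <- make_mult_forced(dat)
dat <- 10          # change dat after both functions were created

lazy(1)            # 10: x was evaluated only now, and saw the new dat
eager(1)           # 2: x was fixed when make_mult_forced() ran
```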
The third case works because you're having parLapply handle x for you. There's no trickery going on at all.
I should warn you that in all of these cases, a is being evaluated (by parLapply) and serialized along with the environment of mult. parLapply is also splitting a into chunks and sending those chunks to each worker, so the copy of a in the environment of mult is completely unnecessary. It doesn't cause an error, but it could hurt performance, since mult is sent to the workers in every task object. Fortunately, this is much less of a problem with parLapply, since there is only one task per worker. It would be a much worse problem with clusterApply or clusterApplyLB where the number of tasks is equal to the length of a.
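To see what travels with mult, one can inspect its environment and its serialized size. The sketch is illustrative: pmult_mult and make_mult are made-up names, and force(a) stands in for parLapply() evaluating a. The closure created inside a pmult-style function carries cl, a and x, while one built by a small factory function carries only x.

```r
# Shaped like the question's pmult, minus the parLapply() call.
pmult_mult <- function(cl, a, x) {
  force(a)                      # parLapply() would evaluate `a` anyway
  mult <- function(s) s * x
  mult
}

# A factory whose environment holds only what the worker needs.
make_mult <- function(x) {
  force(x)
  function(s) s * x
}

a_big <- rnorm(1e5)
m_pmult <- pmult_mult(NULL, a_big, 2)
m_small <- make_mult(2)

ls(environment(m_pmult))          # "a" "cl" "mult" "x"
ls(environment(m_small))          # "x"
length(serialize(m_pmult, NULL))  # roughly 800 kB: a_big rides along
length(serialize(m_small, NULL))  # far smaller
```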
I talk about a number of issues relating to functions and environments in the "snow" chapter of my book. There are some subtle issues involved, and it's easy to get burned, sometimes without realizing that it happened.
As for your second question, there are various strategies for exporting functions to the workers, but some people do use source to define functions on the workers rather than using clusterExport. Keep in mind that source has a local argument that controls where the parsed expressions are evaluated, and you may need to specify the absolute path to the script. Finally, if you're using remote cluster workers, you may need to scp the script to the workers if you don't have a distributed file system.
Here is a simple method of exporting all of the functions in your global environment to the cluster workers:
ex <- Filter(function(x) is.function(get(x, .GlobalEnv)), ls(.GlobalEnv))
clusterExport(cl, ex)
I am not surprised that this function doesn't work, but I cannot quite understand why.
computeMeans <- function(data,dv,fun) {
  x <- with(data,aggregate(dv,
                           list(
                             method=method,
                             hypo=hypothesis,
                             pre.group=pre.group,
                             pre.smooth=pre.smooth
                           ),
                           fun ) )
  return(x)
}
computeMeans(df.basic,dprime,mean)
Where df.basic is a dataframe with factors method, hypothesis, etc, and several dependent variables (and I specify one with the dv parameter, dprime).
I have multiple dependent variables and several dataframes all of the same form, so I wanted to write this little function to keep things "simple". The error I get is:
Error in aggregate(dv, list(method = method, hypo = hypothesis,
pre.group = pre.group, :
object 'dprime' not found
But dprime does exist in df.basic, which is referenced with with(). Can anyone explain the problem? Thank you!
EDIT: This is the R programming language. http://www.r-project.org/
Although dprime exists in df.basic, when you pass it to computeMeans, R has no idea what you are referring to unless you reference it explicitly.
computeMeans(df.basic,df.basic$dprime,mean)
will work.
Alternatively
computeMeans <- function(data,dv,fun) {
  dv <- eval(substitute(dv), envir=data)
  x <- with(data,aggregate(dv,
                           list(
                             method=method,
                             hypo=hypothesis,
                             pre.group=pre.group,
                             pre.smooth=pre.smooth
                           ),
                           fun ) )
  return(x)
}
You might think that since dv is in the with(data, (.)) call, it gets evaluated within the environment of data. It does not.
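A minimal reproduction on made-up data (df.toy, naive_mean and nse_mean are names invented for this sketch): inside with(), dv is still a promise whose expression dprime is looked up in the caller's environment, so it fails; capturing the expression with substitute() and evaluating it in data works.

```r
df.toy <- data.frame(method = c("a", "a", "b", "b"), dprime = c(1, 3, 5, 7))

# dv is a promise created in the caller's frame; with() does not change that.
naive_mean <- function(data, dv) with(data, mean(dv))

# Capture the unevaluated expression, then evaluate it inside `data`.
nse_mean <- function(data, dv) {
  dv <- eval(substitute(dv), envir = data, enclos = parent.frame())
  mean(dv)
}

tryCatch(naive_mean(df.toy, dprime), error = conditionMessage)
# "object 'dprime' not found"
nse_mean(df.toy, dprime)   # 4
```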
When a function is called the arguments are matched and then each of
the formal arguments is bound to a promise. The expression that was
given for that formal argument and a pointer to the environment the
function was called from are stored in the promise.
Until that argument is accessed there is no value associated with the
promise. When the argument is accessed, the stored expression is
evaluated in the stored environment, and the result is returned. The
result is also saved by the promise.
source
A promise is therefore evaluated within the environment in which it was created (ie, the environment where the function was called), regardless of the environment in which the promise is first called. Observe:
delayedAssign("x", y)
local({
  y <- 10
  x
})
Error in eval(expr, envir, enclos) : object 'y' not found
w <- 10
delayedAssign("z", w)
local({
  w <- 11
  z
})
[1] 10
Note that delayedAssign creates a promise. In the first example, x is assigned the value of y via a promise in the global environment, but y has not been defined in the global environment. x is accessed in an environment where y has been defined, yet accessing x still results in an error indicating that y does not exist. This demonstrates that x is evaluated in the environment in which the promise was defined, not in its current environment.
In the second example, z is assigned the value of w via a promise in the global environment, and w is defined in the global environment. z is then accessed in an environment where w has been assigned a different value, yet z still returns the value of w in the environment where the promise was created.
Passing in the dprime argument as a character string would allow you to sidestep the scoping and evaluation rules discussed in Michael's answer:
computeMeans <- function(data, dv, fun) {
  x <- aggregate(data[[dv]],
                 list(
                   method = data[["method"]],
                   hypo = data[["hypothesis"]],
                   pre.group = data[["pre.group"]],
                   pre.smooth = data[["pre.smooth"]]
                 ),
                 fun)
  return(x)
}
computeMeans(df.basic, "dprime", mean)
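A usage sketch of this version on made-up data (df.toy is invented here, with the column names from the question; the function body is the answer's, minus the intermediate x). Each of the 16 factor combinations gets two rows, so each mean is easy to check by hand.

```r
computeMeans <- function(data, dv, fun) {
  aggregate(data[[dv]],
            list(method     = data[["method"]],
                 hypo       = data[["hypothesis"]],
                 pre.group  = data[["pre.group"]],
                 pre.smooth = data[["pre.smooth"]]),
            fun)
}

# Made-up data: 16 factor combinations, each appearing twice.
df.toy <- expand.grid(method = c("m1", "m2"), hypothesis = c("h1", "h2"),
                      pre.group = c("g1", "g2"), pre.smooth = c("s1", "s2"))
df.toy <- rbind(df.toy, df.toy)
df.toy$dprime <- seq_len(nrow(df.toy))

res <- computeMeans(df.toy, "dprime", mean)
head(res)   # grouping columns, then the means in a column named x
```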