Manually exporting missed globals with future (R)

I defined a function f in a package that takes data and an R expression as input and then applies the user-defined expression on the data. Here's an example of the function's use:
f <- function(data, expr) {
expr <- substitute(expr)
eval(expr, envir = data)
}
data <- data.frame(a = 1:2, b = 3:4)
f(data, mean(a))
#> [1] 1.5
The problem arises in the parallel version of this function, which uses explicit futures, when the expression refers to a user-defined object. Here is a toy version:
library(future)
f <- function(data, expr) {
expr <- substitute(expr)
y <- future::future(eval(expr, envir = data))
future::value(y)
}
data <- data.frame(a = 1:2, b = 3:4)
myfun <- function(x){sum(sqrt(x))}
plan(sequential)
f(data, myfun(a))
#> [1] 2.414214
plan(multiprocess)
f(data, myfun(a))
#> Error in myfun(a) : could not find function "myfun"
The problem is that myfun cannot trivially be found by future and thus must be exported manually. I'm able to fix this issue by analyzing expr with future::getGlobalsAndPackages and then manually adding objects:
future::future(..., globals = structure(TRUE, add = globals))
I'm wondering if there is a better/good way to do that since it looks like a hack to me.
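The "analyze expr, then add the missing objects" step described above could be sketched in base R roughly as follows. This is only an illustration, not what future does internally: user_globals is a hypothetical helper name, and the rule "anything not found in base or an attached package counts as user-defined" is an assumption made for the sketch. The resulting names could then be supplied through the add attribute shown above.

```r
# Hypothetical helper (not part of future): names used in `expr` that are
# neither columns of `data` nor provided by base R or an attached package.
user_globals <- function(expr, data, env = parent.frame()) {
  candidates <- setdiff(unique(all.names(expr)), names(data))
  is_user_defined <- vapply(candidates, function(name) {
    e <- env
    # Walk up the enclosing environments until the name is found
    while (!identical(e, emptyenv())) {
      if (exists(name, envir = e, inherits = FALSE)) {
        where <- environmentName(e)
        return(!(where == "base" || startsWith(where, "package:")))
      }
      e <- parent.env(e)
    }
    FALSE  # not found anywhere: nothing to export
  }, logical(1))
  candidates[is_user_defined]
}

data <- data.frame(a = 1:2, b = 3:4)
myfun <- function(x) sum(sqrt(x))

user_globals(quote(myfun(a)), data)  # "myfun"
user_globals(quote(mean(a)), data)   # character(0)
```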

I finally found that the arguments passed through the ellipsis (...) of plan are propagated to future:
plan(multiprocess, globals = myfun)

How to call the `function` function [duplicate]

Hadley Wickham recently asked an interesting question on the r-devel mailing list, and being unable to find an existing question on the topic on StackOverflow, I thought it might be useful for it to exist here as well.
To paraphrase:
An R function consists of three elements: an argument list, a body and an environment. Can we construct a function programmatically from these three elements?
(A fairly comprehensive answer is reached at the end of the thread in the r-devel link above. I will leave this open for others to recreate the benchmarking of the various solutions themselves and supply it as an answer, but be sure to cite Hadley if you do. If no one steps up in a few hours I'll do it myself.)
This is an expansion on the discussion here.
Our three pieces need to be an argument list, a body and an environment.
For the environment, we will simply use env = parent.frame() by default.
We do not really want a regular old list for the arguments, so instead we use alist
which has some different behavior:
"...values are not evaluated, and tagged arguments with no value are allowed"
args <- alist(a = 1, b = 2)
For the body, we quote our expression to get a call:
body <- quote(a + b)
One option is to convert args to a pairlist and then simply call the `function` function
using eval:
make_function1 <- function(args, body, env = parent.frame()) {
args <- as.pairlist(args)
eval(call("function", args, body), env)
}
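As a quick check of this first approach, the sketch below repeats the definition and calls the result. The name add is only illustrative.

```r
make_function1 <- function(args, body, env = parent.frame()) {
  args <- as.pairlist(args)
  eval(call("function", args, body), env)
}

# Build function(a = 1, b = 2) a + b from its parts
add <- make_function1(alist(a = 1, b = 2), quote(a + b))

add()       # 3, using the default arguments
add(10, 5)  # 15
```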
Another option is to create an empty function, and then fill it with the desired values:
make_function2 <- function(args, body, env = parent.frame()) {
f <- function() {}
formals(f) <- args
body(f) <- body
environment(f) <- env
f
}
A third option is to simply use as.function:
make_function3 <- function(args, body, env = parent.frame()) {
as.function(c(args, body), env)
}
And finally, this seems very similar to the first method to me, except
we are using a somewhat different idiom to create the function call, using
substitute rather than call:
make_function4 <- function(args, body, env = parent.frame()) {
subs <- list(args = as.pairlist(args), body = body)
eval(substitute(`function`(args, body), subs), env)
}
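The env argument decides where free variables in the body are looked up. A minimal sketch, assuming a hypothetical environment scope that holds a constant k (make_function here is just the first approach under a shorter name):

```r
make_function <- function(args, body, env = parent.frame()) {
  eval(call("function", as.pairlist(args), body), env)
}

# `k` is not an argument, so it is resolved in the function's environment
scope <- new.env()
scope$k <- 10

times_k <- make_function(alist(x = ), quote(x * k), env = scope)

times_k(2)                              # 20
identical(environment(times_k), scope)  # TRUE
```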
library(microbenchmark)
microbenchmark(
make_function1(args, body),
make_function2(args, body),
make_function3(args, body),
make_function4(args, body),
function(a = 1, b = 2) a + b
)
Unit: nanoseconds
                          expr   min      lq  median      uq    max
1 function(a = 1, b = 2) a + b   187   273.5   309.0   363.0    673
2   make_function1(args, body)  4123  4729.5  5236.0  5864.0  13449
3   make_function2(args, body) 50695 52296.0 53423.0 54782.5 147062
4   make_function3(args, body)  8427  8992.0  9618.5  9957.0  14857
5   make_function4(args, body)  5339  6089.5  6867.5  7301.5  55137
rlang has a function called new_function that does this:
Usage
new_function(args, body, env = caller_env())
library(rlang)
g <- new_function(alist(x = ), quote(x + 3))
g
# function (x)
# x + 3
There is also the issue of creating alist objects programmatically as that can be useful for creating functions when the number of arguments is variable.
An alist whose arguments have no default values is simply a named list of empty symbols. These empty symbols can be created with substitute(). So:
make_alist <- function(args) {
res <- replicate(length(args), substitute())
names(res) <- args
res
}
identical(make_alist(letters[1:2]), alist(a=, b=))
## [1] TRUE
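To illustrate the variable-argument use case, here is a sketch that builds a function summing however many named arguments it is given. make_sum_function is a hypothetical name, and this version creates the empty symbols by repeating alist(x = ) rather than with substitute().

```r
# Hypothetical helper: build function(a, b, ...) a + b + ... from argument names
make_sum_function <- function(arg_names) {
  # One empty symbol per argument, i.e. arguments without defaults
  args <- rep(alist(x = ), length(arg_names))
  names(args) <- arg_names
  # Fold the argument names into the call a + b + ...
  body <- Reduce(function(lhs, rhs) call("+", lhs, rhs),
                 lapply(arg_names, as.name))
  eval(call("function", as.pairlist(args), body))
}

sum3 <- make_sum_function(c("a", "b", "c"))
sum3(1, 2, 3)  # 6
```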
I am not sure this will help, but the code below might be useful in some scenarios.
hello_world is a string holding the code for the function body, and assign is used to give the resulting function the name Hello:
hello_world <- "print('Hello World')"
assign("Hello",function()
{
eval(parse(text = hello_world))
}, envir = .GlobalEnv)
This will create a function called Hello in the global environment; calling Hello() parses and evaluates the string stored in hello_world.

R: how to find what S3 method will be called on an object?

I know about methods(), which returns all methods for a given class. Suppose I have x and I want to know what method will be called when I call foo(x). Is there a oneliner or package that will do this?
The shortest I can think of is:
sapply(class(x), function(y) try(getS3method('foo', y), silent = TRUE))
and then to check the class of the results... but is there not a builtin for this?
Update
The full one liner would be:
fm <- function (x, method) {
cls <- c(class(x), 'default')
results <- lapply(cls, function(y) try(getS3method(method, y), silent = TRUE))
Find(function (x) !inherits(x, 'try-error'), results)
}
This will work with most things but be aware that it might fail with some complex objects. For example, according to ?S3Methods, calling foo on matrix(1:4, 2, 2) would try foo.matrix, then foo.numeric, then foo.default; whereas this code will just look for foo.matrix and foo.default.
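One way to cover the implicit-class case is to walk the dispatch classes reported by .class2(), which is available in R 4.0.0 and later. The sketch below assumes that version; find_s3 is a hypothetical name, and it returns the method name rather than the function.

```r
# Hypothetical helper: name of the S3 method that `generic` would dispatch to
find_s3 <- function(x, generic) {
  # .class2() includes implicit classes such as "integer" and "numeric"
  for (cls in c(.class2(x), "default")) {
    method <- utils::getS3method(generic, cls, optional = TRUE)
    if (!is.null(method)) return(paste(generic, cls, sep = "."))
  }
  NULL  # no applicable method found
}

find_s3(iris, "print")        # "print.data.frame"
find_s3(ordered(3), "print")  # "print.factor"
```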
findMethod defined below is not a one-liner but its body has only 4 lines of code (and if we required that the generic be passed as a character string it could be reduced to 3 lines of code). It will return a character string representing the name of the method that would be dispatched by the input generic given that generic and its arguments. (Replace the last line of the body of findMethod with get(X(...)) if you want to return the method itself instead.) Internally it creates a generic X and an X method corresponding to each method of the input generic such that each X method returns the name of the method of the input generic that would be run. The X generic and its methods are all created within the findMethod function so they disappear when findMethod exits. To get the result we just run X with the input argument(s) as the final line of the findMethod function body.
findMethod <- function(generic, ...) {
ch <- deparse(substitute(generic))
f <- X <- function(x, ...) UseMethod("X")
for(m in methods(ch)) assign(sub(ch, "X", m, fixed = TRUE), "body<-"(f, value = m))
X(...)
}
Now test it. (Note that the one-liner in the question fails with an error in several of these tests but findMethod gives the expected result.)
findMethod(as.ts, iris)
## [1] "as.ts.default"
findMethod(print, iris)
## [1] "print.data.frame"
findMethod(print, Sys.time())
## [1] "print.POSIXct"
findMethod(print, 22)
## [1] "print.default"
# in this example it looks at 2nd component of class vector as no print.ordered exists
class(ordered(3))
## [1] "ordered" "factor"
findMethod(print, ordered(3))
## [1] "print.factor"
findMethod(`[`, BOD, 1:2, "Time")
## [1] "[.data.frame"
I use this:
s3_method <- function(generic, class, env = parent.frame()) {
fn <- get(generic, envir = env)
ns <- asNamespace(topenv(fn))
tbl <- ns$.__S3MethodsTable__.
for (c in class) {
name <- paste0(generic, ".", c)
if (exists(name, envir = tbl, inherits = FALSE)) {
return(get(name, envir = tbl))
}
if (exists(name, envir = globalenv(), inherits = FALSE)) {
return(get(name, envir = globalenv()))
}
}
NULL
}
For simplicity this doesn't return methods defined by assignment in the calling environment. The global environment is checked for convenience during development. These are the same rules used in r-lib packages.

clusterExport, environment and variable scoping

I wrote a function in which I define variables and load objects. Here's a simplified version:
fn1 <- function(x) {
load("data.RData") # a vector named "data"
source("myFunctions.R")
library(raster)
library(rgdal)
a <- 1
b <- 2
r1 <- raster(ncol = 10, nrow = 10)
r1 <- init(r1, fun = runif)
r2 <- r1 * 100
names(r1) <- "raster1"
names(r2) <- "raster2"
m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]
c <- fn2(m)
}
Function "fn2" can be found in "myFunctions.R" and is defined as:
fn2 <- function(x) {
fn3 <- function(y) {
x[[y]] * 100 * data
}
cl <- makeSOCKcluster(8)
clusterExport(cl, list("x"), envir = environment())
clusterExport(cl, list("a", "b", "data"))
clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b)))
f <- parLapply(cl, names(x), fn3)
stopCluster(cl)
}
Now, when I run fn1, I get an error like this:
Error in get(name, envir = envir) : object 'a' not found
From what I understand from ?clusterExport, the default value for envir is .GlobalEnv, so I would assume that "a" and "b" would be accessible to fn2. However, it doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?
So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?
Thanks a lot for your help.
You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport looks for the variables in .GlobalEnv, whereas fn1 defines them in its own local environment.
An alternative is to pass the local environment of fn1 to fn2, and specify that environment to clusterExport. The call to fn2 would be:
c <- fn2(m, environment())
If the arguments to fn2 are function(x, env), then the call to clusterExport would be:
clusterExport(cl, list("a", "b", "data"), envir = env)
Since environments are passed by reference, there should be no performance problem doing this.
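A stripped-down sketch of this approach, using only the parallel package and a two-worker cluster instead of the raster workflow from the question. scale_shift and the numeric inputs are illustrative stand-ins, not the original code.

```r
library(parallel)

# Worker function: `a` and `b` are free variables, resolved on each worker
scale_shift <- function(y) y * a + b

fn2 <- function(x, env) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  # Look up "a" and "b" in the caller's environment rather than .GlobalEnv
  clusterExport(cl, c("a", "b"), envir = env)
  unlist(parLapply(cl, x, scale_shift))
}

fn1 <- function() {
  a <- 1
  b <- 2
  fn2(1:3, environment())  # pass fn1's local environment
}

fn1()  # 3 4 5
```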
