parallel::clusterExport how to pass nested functions from global environment? - r

I'm making a function (myFUN) that calls parallel::parApply at one point, with a function yourFUN that is supplied as an argument.
In many situations, yourFUN will contain custom functions from the global environment.
So, while I can pass "yourFUN" to parallel::clusterExport, I cannot know the names of the functions used inside it beforehand, and the call fails with an error because the workers cannot find them.
I don't want to export the whole enclosing environment of yourFUN, since it might be very big.
Is there a way for me to export only the variables necessary for running yourFUN?
The actual function is very long, so here is a minimal example of the error:
mydata <- matrix(data = 1:9, 3, 3)
perfFUN <- function(x) 2*x
opt_perfFUN <- function(y) max(perfFUN(y))
avg_perfFUN <- function(w) perfFUN(mean(w))
myFUN <- function(data, yourFUN, n_cores = 1){
cl <- parallel::makeCluster(n_cores)
parallel::clusterExport(cl, varlist = c("yourFUN"), envir = environment())
parallel::parApply(cl, data, 1, yourFUN)
}
myFUN(data = mydata, yourFUN = opt_perfFUN)
myFUN(data = mydata, yourFUN = avg_perfFUN)
Error in checkForRemoteErrors(val) : one node produced an error: could not find function "perfFUN"
Thank you very much!

One possible solution is to export everything in the enclosing environment of yourFUN, and to stop the cluster on exit:
myFUN <- function(data, yourFUN, n_cores = 1) {
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  envir <- environment(yourFUN)
  parallel::clusterExport(cl, varlist = ls(envir), envir = envir)
  parallel::parApply(cl, data, 1, yourFUN)
}
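The solution above exports every object in the enclosing environment of yourFUN. If that is too much, a narrower export can be sketched with codetools::findGlobals(), which statically lists the global names a function refers to. The helper name find_deps is hypothetical, and the sketch assumes the dependencies are defined directly in the enclosing environment of yourFUN. Names reached only through get() or other dynamic lookups would be missed.

```r
# Functions from the question, repeated so the sketch is self-contained
perfFUN <- function(x) 2 * x
opt_perfFUN <- function(y) max(perfFUN(y))
avg_perfFUN <- function(w) perfFUN(mean(w))

# Hypothetical helper: recursively collect the names that `fun` uses and that
# are defined directly in `envir` (base and package functions are skipped)
find_deps <- function(fun, envir = environment(fun), seen = character()) {
  globals <- codetools::findGlobals(fun, merge = TRUE)
  defined_here <- vapply(globals, exists, logical(1),
                         envir = envir, inherits = FALSE)
  new <- setdiff(globals[defined_here], seen)
  seen <- union(seen, new)
  for (nm in new) {
    obj <- get(nm, envir = envir, inherits = FALSE)
    # Follow user-defined functions to pick up their own dependencies
    if (is.function(obj) && !is.primitive(obj)) {
      seen <- find_deps(obj, envir, seen)
    }
  }
  seen
}

# Same shape as the solution above, but exporting only the detected names
myFUN <- function(data, yourFUN, n_cores = 1) {
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  envir <- environment(yourFUN)
  parallel::clusterExport(cl, varlist = find_deps(yourFUN, envir), envir = envir)
  parallel::parApply(cl, data, 1, yourFUN)
}
```

With the functions from the question, find_deps(opt_perfFUN) should pick up only perfFUN, so myFUN(data = mydata, yourFUN = opt_perfFUN) would export that one function rather than the whole environment.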

Related

nested lapply to allow calling functions with multiple inputs

EDIT: I have a few functions with multiple arguments I'd like to apply over a list of lists. One of the arguments is also a list of lists.
Both functions have multiple parameters, two of which I have to indicate recursively across the lists of lists.
Say I have the two following lists of lists.
mylist <- list(list(10,12,13,14,15), list(5,6,7,8,9))
m <- list(list(2,2,2,3,4), list(3,3,4,4,5))
and the functions
func1 <- function(x, att1 = m, const = 10){
e <- x^m + const
return(e)
}
func2 <- function(x, att2 = m, const = 10){
d <- sqrt(x)/m + const
return(d)
}
I don't know how to address the right argument, att1 or att2, when I want to call each function.
I tried the function below using eval(substitute(att1 = a))
nested_function <- function(df_list, FUN = func1, changing_param = a, ...){
nested_output <- lapply(seq(df_list), function(i){
lapply(seq(df_list[[i]]), function(j){
FUN(df[[i]][[j]], eval(substitute(changing_param))[[i]][[j]],...)
})
})
return(nested_output)
}
result <- nested_function(df_list, FUN = func1, changing_param = 'att1 = a')
and got the following error:
Error in df[[i]] : object of type 'closure' is not subsettable
6.
FUN(df[[i]][[j]], eval(substitute(changing_param))[[i]][[j]],
...)
5.
FUN(X[[i]], ...)
4.
lapply(seq(df_list[[i]]), function(j) {
FUN(df[[i]][[j]], eval(substitute(changing_param))[[i]][[j]],
...)
})
3.
FUN(X[[i]], ...)
2.
lapply(seq(df_list), function(i) {
lapply(seq(df_list[[i]]), function(j) {
FUN(df[[i]][[j]], eval(substitute(changing_param))[[i]][[j]],
...) ...
1.
nested_function(mylist, changing_param = m)
My problem is how to make the FUN() call recognize a as att1 in func1 and att2 in func2, since I have to designate them for each function (I can't just put the argument there).
any suggestions?
If you need to dynamically build parameter names, you often end up having to use do.call (at least with base R). I'm not quite sure how all your variables were supposed to work in your example, so here's an adapted version that runs.
df_list <- list(list(10,12,13,14,15), list(5,6,7,8,9))
param_list <- list(list(2,2,2,3,4), list(3,3,4,4,5))
func1 <- function(x, att1 = m, const = 10){
  e <- x^att1 + const
  return(e)
}
func2 <- function(x, att2 = m, const = 10){
  d <- sqrt(x)/att2 + const
  return(d)
}
nested_function <- function(df_list, param_list, FUN = func1, changing_param = "a", ...){
  nested_output <- lapply(seq(df_list), function(i){
    lapply(seq(df_list[[i]]), function(j){
      params <- list(df_list[[i]][[j]], param_list[[i]][[j]], ...)
      names(params)[2] <- changing_param
      do.call(FUN, params)
    })
  })
  return(nested_output)
}
nested_function(df_list, param_list, func1, changing_param = 'att1', const=1)
nested_function(df_list, param_list, func2, changing_param = 'att2', const=2)
Here we pass in the name of the parameter we want as a string. When we build the list of arguments to pass to the function, we rename the second element with the name supplied, and then call the function with do.call.
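The naming step can also be looked at in isolation. The following is a minimal sketch of one variation, where the argument list is built with its name already attached via setNames() instead of being renamed afterwards. The helper name call_with is hypothetical, and func1 and func2 are simplified here to have no default for att1 and att2.

```r
func1 <- function(x, att1, const = 10) x^att1 + const
func2 <- function(x, att2, const = 10) sqrt(x) / att2 + const

# Hypothetical helper: call FUN(x, <param_name> = value, ...)
call_with <- function(FUN, x, value, param_name, ...) {
  args <- c(list(x),                              # first argument, positional
            setNames(list(value), param_name),    # argument named at run time
            list(...))                            # any extra arguments
  do.call(FUN, args)
}

res1 <- call_with(func1, 2, 3, "att1")              # func1(2, att1 = 3)
res2 <- call_with(func2, 16, 4, "att2", const = 2)  # func2(16, att2 = 4, const = 2)
```

The same helper could replace the three lines inside the inner lapply of nested_function.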

How to pass list of arguments to method in R?

I'm trying to pass arguments as a list to a method. I'm creating methods of stuff to pass to a data.frame. Example:
dfApply <- function(df, ...) {
UseMethod("dfApply", df)
}
dfApply.sample <- function(df, size, ...) {
# Stuff
df <- sample_frac(df, size = size)
return(df)
}
Now, if I call the function:
args <- list(size = 0.5)
class(df) <- c("sample", class(df))
df <- dfApply(df, args)
The method still receives it as a list().
Is there a way to pass arguments like this?
EDIT:
As mentioned in the comments, do.call() solves the problem (for now), but I have to define every argument in args:
args <- list(df = df, size = 0.5)
class(df) <- c("sample", class(df))
df <- do.call(dfApply, args)
Is this a wise way to implement methods? Doesn't seem right.
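Building on the do.call() approach from the edit, one option that avoids storing df inside args is to prepend it when making the call. This is only a sketch: the method body uses a base-R stand-in for sample_frac (an assumption, to keep the example free of package dependencies), and the data frame is made up for illustration.

```r
dfApply <- function(df, ...) {
  UseMethod("dfApply", df)
}

dfApply.sample <- function(df, size, ...) {
  # Base-R stand-in for sample_frac(df, size = size)
  n <- round(nrow(df) * size)
  df[sample(nrow(df), n), , drop = FALSE]
}

df <- data.frame(x = 1:10)
class(df) <- c("sample", class(df))

args <- list(size = 0.5)                      # only the method-specific arguments
res <- do.call(dfApply, c(list(df), args))    # same as dfApply(df, size = 0.5)
```

Wrapping df in list() matters here: c(df, args) would splice the columns of the data frame into the argument list instead of passing it as one object.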

Write a wrapper function to successfully take additional arguments (like subset) via ellipsis (...)

I am writing a function that calls another function (e.g. lm), and I would like to pass other
arguments to it using ellipsis (...). However, the data to be used is not
in the global environment, but inside a list. A minimal example:
L <- list(data = chickwts, other = 1:5)
wrapper <- function(list, formula = NULL, ...){
if (missing(formula)) formula <- formula(weight~feed)
lm(formula, data = list$data, ...)
}
wrapper(L, subset = feed != "casein") #fails
I can make it work using attach, but I'm sure there are more efficient ways of doing it by specifying the evaluation frame...?
wrapper2 <- function(list, formula = NULL, ...){
if (missing(formula)) formula <- formula(weight~feed)
attach(list$data)
m <- lm(formula, ...)
detach(list$data)
return(m)
}
wrapper2(L, subset = feed != "casein") #works
Another solution I have used before is to use list(...) and deal with the arguments manually, but that would not be practical in the real situation.
I can see that this is fairly basic, but I couldn't find a solution. Any suggestion to the specific problem and also a link to a good conceptual explanation of environments in general would be appreciated.
We would need to construct a call and eval it.
wrapper <- function(list, formula = NULL, ...){
  if (missing(formula)) formula <- weight ~ feed
  cl <- match.call()
  cl$list <- NULL
  cl$formula <- formula
  cl$data <- quote(list$data)
  cl[[1]] <- quote(stats::lm)
  eval(cl)
}
Reproducible example:
L <- list(data = trees, other = 1:5)
wrapper(L, Height ~ Girth, subset = Volume > 20)
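To see what the wrapper above hands to eval(), a variant that returns the constructed call instead of evaluating it can help. The name show_call is hypothetical; everything else mirrors the wrapper.

```r
# Same construction as wrapper(), but the call is returned unevaluated
show_call <- function(list, formula = NULL, ...) {
  if (missing(formula)) formula <- weight ~ feed
  cl <- match.call()            # the call as written, with ... expanded
  cl$list <- NULL               # drop the argument lm() does not know
  cl$formula <- formula
  cl$data <- quote(list$data)   # looked up later in the wrapper's frame
  cl[[1]] <- quote(stats::lm)   # swap the function being called
  cl
}

L <- list(data = trees, other = 1:5)
built <- show_call(L, Height ~ Girth, subset = Volume > 20)
print(built)
```

The point of the construction is that subset stays an unevaluated expression inside the call, so lm() can evaluate it against the columns of list$data rather than in the caller's environment.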

foreach behaviour in function, in package, with custom iterator

I have created a custom iterator, inheriting from the iter class in the iterators package. The iterator and its methods are not exported from the package. Here is the iterator and a test function in a script that is reproducible and runnable; the iterator is called pairsRef:
library(Biostrings)
library(iterators)
library(foreach)
setGeneric("maskSequences", function(object, seqnames, invert = TRUE, append = "union"){
standardGeneric("maskSequences")
})
setMethod("maskSequences",
signature("DNAMultipleAlignment", "character", "logical", "character"),
function(object, seqnames, invert, append) {
sequenceNames <- rownames(object)
rowmask(object, append = append, invert = invert) <-
which(sequenceNames %in% seqnames)
return(object)
})
pairsRef <- function(obj, ...){
UseMethod('pairsRef')
}
pairsRef.DNAMultipleAlignment <-
function(obj, ref = NULL, checkFun = function(x, ...) TRUE){
state <- new.env()
state$i <- 0L
state$obj <- obj
if(is.null(ref)){
state$ref <- rownames(obj)[1]
} else {
state$ref <- ref
}
state$nonRefs <- rownames(obj)
state$nonRefs <- state$nonRefs[state$nonRefs != state$ref]
it <- list(state=state, checkFun = checkFun)
class(it) <- c("pairsRef", "abstractiter", "iter")
return(it)
}
nextElem.pairsRef <- function(obj, ...){
repeat {
obj$state$i <- obj$state$i + 1L
if(obj$state$i > length(obj$state$nonRefs))
stop('StopIteration', call.=FALSE)
pair <- maskSequences(obj$state$obj,
c(obj$state$ref, obj$state$nonRefs[obj$state$i]),
invert = TRUE,
append = "replace"
)
if(obj$checkFun(pair)){
return(pair)
}
}
}
Test2 <- function(dna, ref){
pit <- pairsRef(dna, ref = ref, checkFun = function(x) TRUE)
results <- foreach(x = pit, .combine = c, .multicombine = TRUE) %do% {x}
return(results)
}
dna <-
readDNAMultipleAlignment(filepath =
system.file("extdata",
"msx2_mRNA.aln",
package="Biostrings"),
format="clustal")
Test2(dna, rownames(dna)[1])
However, I want to use this iterator to do foreach loops in a package of mine.
If I put Test2 in a package (exported), keep all the other functions in the package (unexported), and have the package namespace import Biostrings, iterators, and foreach, it does not work. In a fresh R session, loading the package and then running:
dna <-
Biostrings::readDNAMultipleAlignment(filepath =
system.file("extdata",
"msx2_mRNA.aln",
package="Biostrings"),
format="clustal")
Test2(dna, rownames(dna)[1])
Results in an error: "Error in { : attempt to apply non-function"
Is this because the custom iterator is internal to the package? Any help or suggestions are greatly appreciated.
[EDIT] - If I export the iterator and its functions from the package, then all works fine. But I don't necessarily want to export the iterators of the package.
Thanks,
Ben.
For this to work inside the package, the nextElem generic must be imported from iterators, and the package's own nextElem method (nextElem.pairsRef) must be exported, so that it is visible to the functions in the foreach package namespace.
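In NAMESPACE terms, the answer above could look roughly like the fragment below. This is a sketch under assumptions: it treats "exporting the method" as registering it with an S3method() directive, which is the usual way to make an S3 method visible to dispatch from other namespaces, and it leaves out the package's other import directives.

```
# NAMESPACE fragment (sketch, not the author's actual file)
importFrom(iterators, nextElem)   # bring the generic into the package
S3method(nextElem, pairsRef)      # register nextElem.pairsRef for dispatch
export(Test2)                     # the user-facing function
```

With this layout the pairsRef constructor itself can stay unexported; only the method registration has to be visible outside the package.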

clusterExport, environment and variable scoping

I wrote a function in which I define variables and load objects. Here's a simplified version:
fn1 <- function(x) {
load("data.RData") # a vector named "data"
source("myFunctions.R")
library(raster)
library(rgdal)
a <- 1
b <- 2
r1 <- raster(ncol = 10, nrow = 10)
r1 <- init(r1, fun = runif)
r2 <- r1 * 100
names(r1) <- "raster1"
names(r2) <- "raster2"
m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]
c <- fn2(m)
}
Function "fn2" can be found in "myFunctions.R" and is defined as:
fn2 <- function(x) {
fn3 <- function(y) {
x[[y]] * 100 * data
}
cl <- makeSOCKcluster(8)
clusterExport(cl, list("x"), envir = environment())
clusterExport(cl, list("a", "b", "data"))
clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b)))
f <- parLapply(cl, names(x), fn3)
stopCluster(cl)
}
Now, when I run fn1, I get an error like this:
Error in get(name, envir = envir) : object 'a' not found
From what I understand from ?clusterExport, the default value for envir is .GlobalEnv, so I would assume that "a" and "b" would be accessible to fn2. However, it doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?
So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?
Thanks a lot for your help.
You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport is trying to find the variables in .GlobalEnv, but fn1 isn't setting them in .GlobalEnv but in its own local environment.
An alternative is to pass the local environment of fn1 to fn2, and specify that environment to clusterExport. The call to fn2 would be:
c <- fn2(m, environment())
If the arguments to fn2 are function(x, env), then the call to clusterExport would be:
clusterExport(cl, list("a", "b", "data"), envir = env)
Since environments are passed by reference, there should be no performance problem doing this.
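A stripped-down version of this pattern, without the raster dependencies, might look as follows. It is a sketch: the values of a and b, the worker function scale_one, and the single-worker cluster are all made up for illustration.

```r
library(parallel)

# Worker function defined at top level, so on the workers it looks up
# x, a and b among the exported variables
scale_one <- function(i) x[[i]] * a + b

# fn2 receives the caller's environment explicitly as `env`
fn2 <- function(x, env) {
  cl <- makeCluster(1)
  on.exit(stopCluster(cl), add = TRUE)
  clusterExport(cl, "x", envir = environment())  # x lives in fn2's own frame
  clusterExport(cl, c("a", "b"), envir = env)    # a and b live in fn1's frame
  parLapply(cl, seq_along(x), scale_one)
}

fn1 <- function() {
  a <- 2
  b <- 1
  fn2(list(10, 20), environment())  # pass fn1's local environment along
}
```

Calling fn1() exports a and b from the frame of fn1 rather than from .GlobalEnv, which is where clusterExport would otherwise look.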
