R foreach parallel can not find global function

I am trying to replace an lapply call with a parallel foreach loop.
I have two functions:
dRet <- function(x, per, sloss, daysToReopt) {
  ...
}
getSum <- function(curEnv, perTP) {
  ...
  dRetlst <- function(x) return(dRet(x, perTP, sl, days))
  Es_1 <- lapply(stlst, dRetlst)
  Es_2 <- foreach(a = stlst) %do% dRetlst(a)
  ...
}
perTP, sl, and days are constants.
stlst is a list of lists (of xts objects).
Done this way, everything is OK (Es_1 equals Es_2).
I then replaced the getSum function with:
getSum <- function(curEnv, perTP) {
  ...
  dRetlst <- function(x) return(dRet(x, perTP, sl, days))
  cl <- makeCluster(2)
  registerDoParallel(cl)
  #registerDoSNOW(cl)
  Es_2 <- foreach(a = stlst) %dopar% dRetlst(a)
  stopCluster(cl)
  ...
}
As a result, I get the error:
Error in dRetlst(a) : task 1 failed - "can not find function "dRet""
How can I solve this problem without defining dRet inside getSum?
(R version 3.1.2, Windows 8)

Use the foreach .export option to explicitly export dRet to the workers:
Es_2 <- foreach(a = stlst, .export='dRet') %dopar% dRetlst(a)
Also, I think the foreach loop would be more readable as:
Es_2 <- foreach(a = stlst, .export='dRet') %dopar% {
  dRet(a, perTP, sl, days)
}
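Alternatively, since the cluster is created inside getSum anyway, you can copy dRet to the workers yourself with parallel::clusterExport() before the loop. A minimal sketch, assuming dRet is visible from the master's global environment:
cl <- makeCluster(2)
registerDoParallel(cl)
clusterExport(cl, "dRet")  # copy dRet from the master's global environment to each worker
Es_2 <- foreach(a = stlst) %dopar% dRetlst(a)
stopCluster(cl)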

Related

Parallel computations with foreach and %dopar% do not generate a file

I am trying to use the doParallel package with foreach and %dopar% for the first time, as I need to speed up my computation.
Although the code executes without raising any error, no file is stored in the output folder.
I have a list of file paths (list_files) and a function (my_function) that I previously validated using sapply. With sapply, the output is stored in the output location; with foreach and %dopar%, nothing appears there.
# Define my function and call it my_function
my_function <- function(input_dir, output_dir) {
  tryCatch(
    expr = {
      file <- read.csv(input_dir, sep = "\t", col.names = c("column1", "column2"))
      file <- as_tibble(file)
      file_noNA <- file %>% filter(!is.na(column1))
      name <- substr(input_dir, nchar(input_dir)-8, nchar(input_dir)-4)
      save(file_noNA, file = paste0(output_dir, name, ".rds"))
    }
  )
}
library("parallel")
library("foreach")
library("doParallel")
# Set number of cores
n.cores <- 5
# Register the doParallel backend
doParallel::registerDoParallel(n.cores)
getDoParWorkers()
# Apply function with parallel computing
foreach(i = list_files) %dopar% function(x) {
  my_function(
    input_dir = x,
    output_dir = output_location)
}
This is what I have tried (without success):
- Assigned the result of the foreach call to a variable
- Used foreach(i = list_files, .combine = 'c') %dopar% function(x) {...}
- Used a single file instead of list_files
- Reduced the number of cores
Do I need to add an export statement, e.g. .export = ls(envir = globalenv()) or .export = ls()?
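Exporting is probably not the issue here. As written, the %dopar% body evaluates to an anonymous function that each task builds and returns without ever calling it, so my_function never runs and nothing is written (note also that the loop variable is i, not x). A minimal corrected sketch, assuming output_location is defined in the calling environment:
foreach(i = list_files) %dopar% {
  my_function(input_dir = i, output_dir = output_location)
}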

Foreach .combine Function to combine lists in R

The following is a parallel loop I am trying to run in R:
cl <- makeCluster(30, type = "SOCK")
registerDoSNOW(cl)
results <- foreach(i = 1:30, .combine = 'bindlist', .multicombine = TRUE) %dopar% {
  test <- i
  test <- as.list(test)
  list(test)
}
stopCluster(cl)
The output of my code is always a list, and I want to combine the lists into one large list. Thus I wrote the following .combine function:
bindlist <- function(x, y, ...) {
  append(list(x), list(y), list(...))
}
As I am doing multiple runs and the number of results changes, I tried to use the ... argument. However, it does not work. How can I rewrite the .combine function so that it works with a changing number of results?
Have you considered using 'c'?
results <- foreach(i = 1:4, .combine = 'c', .multicombine = TRUE) %dopar% {
  test <- i
  test <- as.list(test)
  list(test)
}
If this adds an additional unwanted 'level' to your results, you could use 'unlist' to remove that level.
unlist(results, recursive = FALSE)
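If you do want a custom .combine that accepts a changing number of results, a minimal sketch (not from the original answer) is to make it variadic with ... and concatenate; with .multicombine = TRUE, foreach may hand it many results in a single call:
bindlist <- function(...) c(...)  # accept any number of task results and concatenate them
results <- foreach(i = 1:4, .combine = bindlist, .multicombine = TRUE) %dopar% {
  list(as.list(i))
}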

R foreach parallel and package variables

I have the following code, where I get an error like:
object 'a' could not be found
This file is inside a package.
It works when I use %do% instead of %dopar%.
a <- 2

fun1 <- function(x)
{
  y <- x*a
  return(y)
}

fun2 <- function(n)
{
  foreach(data = 1:n, .combine = rbind, .multicombine = TRUE, .export = c("a", "fun1")) %dopar%
  {
    load_all() # works; the error returns if this line is removed
    y <- fun1(data)
    return(y)
  }
}
In my main file, where I use devtools::load_all() to load the package and doParallel to run my foreach on multiple cores, I execute fun2(5) and get the error.
If I use a directly in the foreach body, everything works. But when I call a function that uses the a variable, I get the error.
My main issue is that I want to be able to call fun1 both from fun2 and standalone from the package.
The cluster is created as:
cl <- makeCluster(16)
registerDoParallel(cl)
clusterCall(cl, function(x) .libPaths(x), .libPaths())
# %dopar% code
stopCluster(cl)
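One workaround, sketched under the assumption that a can be passed explicitly rather than looked up from the package namespace (which the workers never load): make fun1 take a as an argument, so its value travels with the call:
fun1 <- function(x, a)
{
  y <- x*a
  return(y)
}

fun2 <- function(n)
{
  foreach(data = 1:n, .combine = rbind, .multicombine = TRUE, .export = c("a", "fun1")) %dopar%
  {
    y <- fun1(data, a)  # a is exported into the loop's evaluation environment, so this lookup succeeds
    return(y)
  }
}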

Nested do.call within a foreach %dopar% environment can't find function passed with .export

I am nesting multiple levels of do.call (each of them calling functions named in the parameters, not hard-coded) within a %dopar% parallelized loop, and a function from my outer environment can't be found by the innermost function. I know about the .export parameter of foreach and am using it, but somehow the named function isn't propagating down the entire chain.
I reduced my issue to the following test case, which does exhibit the problem:
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)

simple.func <- function(a, b) {
  return(a+b)
}

inner.func <- function(a, b) {
  return(do.call(simple.func, list(a=a, b=b)))
}

outer.func <- function(a, b, my.func=inner.func) {
  return(do.call(my.func, list(a=a, b=b)))
}

main.func <- function(my.list=1:10, my.func=outer.func,
                      my.args=list(my.func=inner.func)) {
  results <- foreach(i=my.list, .multicombine=TRUE, .inorder=FALSE,
                     .export="simple.func") %dopar% {
    return(do.call(my.func, c(list(a=i, b=i+1), my.args)))
  }
  return(results)
}
Rather than giving the correct answer (a list with some numbers), I get:
Error in { : task 1 failed - "object 'simple.func' not found"
Adding if (!exists("simple.func")) stop("Could not find simple.func in scope main.func") to the start of each function (changing the name of the scope as appropriate) reveals that it is inner.func which doesn't see simple.func, even though outer.func does see it.
I also tested a couple of variations of the above, with either main.func or outer.func having the next-level function hard-coded rather than taking it from a parameter. Both of these variations do work (i.e., give the expected result), but for the real-world case I want to retain the generalizability of taking the sub-functions as parameters.
# Variation number one: Replace main.func() with this version
main.func <- function(my.list=1:10, my.func=outer.func,
                      my.args=list(my.func=inner.func)) {
  results <- foreach(i=my.list, .multicombine=TRUE, .inorder=FALSE,
                     .export=c("simple.func", "outer.func", "inner.func")) %dopar% {
    return(do.call(outer.func, list(a=i, b=i+1, my.func=inner.func)))
  }
  return(results)
}
# Variation number two: Replace outer.func() and main.func() with these versions
outer.func <- function(a, b, my.func=inner.func) {
  return(do.call(inner.func, list(a=a, b=b)))
}

main.func <- function(my.list=1:10, my.func=outer.func,
                      my.args=list(my.func=inner.func)) {
  results <- foreach(i=my.list, .multicombine=TRUE, .inorder=FALSE,
                     .export=c("simple.func", "inner.func")) %dopar% {
    return(do.call(my.func, c(list(a=i, b=i+1), my.args)))
  }
  return(results)
}
I could also pass simple.func down the chain manually, by including it as an extra parameter, but this looks extra messy, and why should it be necessary when simple.func should just be passed along as part of the environment?
# Variation number three: Replace inner.func(), outer.func(), and main.func()
# with these versions
inner.func <- function(a, b, innermost.func=simple.func) {
  return(do.call(innermost.func, list(a=a, b=b)))
}

outer.func <- function(a, b, my.func=inner.func,
                       innermost.args=list(innermost.func=simple.func)) {
  return(do.call(my.func, c(list(a=a, b=b), innermost.args)))
}

main.func <- function(my.list=1:10, my.func=outer.func,
                      my.args=list(my.func=inner.func,
                                   innermost.args=list(innermost.func=simple.func))) {
  results <- foreach(i=my.list, .multicombine=TRUE, .inorder=FALSE,
                     .export="simple.func") %dopar% {
    return(do.call(my.func, c(list(a=i, b=i+1), my.args)))
  }
  return(results)
}
Does anyone have ideas for less-kludgy solutions, or the underlying cause of this problem?
For doParallel, and any other doNnn adaptor that doesn't fork the current process, I think the following hack would do it:
main.func <- function(my.list = 1:10, my.func=outer.func,
                      my.args = list(my.func=inner.func)) {
  results <- foreach(i = my.list, .multicombine = TRUE, .inorder = FALSE,
                     .export="simple.func") %dopar% {
    environment(my.args$my.func) <- environment()  ## <= HACK
    return(do.call(my.func, args = c(list(a=i, b=i+1), my.args)))
  }
  return(results)
}
Presumably it works because .export places simple.func in the worker-side environment where the foreach body is evaluated, while inner.func arrives with its serialized enclosing environment, which does not contain simple.func on the worker; re-pointing that environment at the body's environment makes the lookup succeed.
Alternatively, you can use the doFuture adaptor (I'm the author). Then you don't have to worry about global objects because they are automatically identified and exported. That is, there is no need for specifying .export (or .packages). For example, in your case the following works:
library("doFuture")
registerDoFuture()
plan(multisession, workers = 4)
main.func <- function(my.list = 1:10, my.func = outer.func,
                      my.args = list(my.func = inner.func)) {
  foreach(i = my.list, .multicombine = TRUE, .inorder = FALSE) %dopar% {
    do.call(my.func, args = c(list(a = i, b = i+1), my.args))
  }
}
res <- main.func(1:3)
str(res)
## List of 3
##  $ : num 3
##  $ : num 5
##  $ : num 7
You can also skip foreach() altogether and do:
library("future.apply")  # future_lapply() lives in the future.apply package; attaching it also attaches future for plan()
plan(multisession, workers = 4)

main <- function(my.list = 1:10, my.func = outer.func,
                 my.args = list(my.func = inner.func)) {
  future_lapply(my.list, FUN = function(i) {
    do.call(my.func, args = c(list(a = i, b = i+1), my.args))
  })
}
PS. There are lots of different plan() backends to choose from; the only case not covered is doRedis.

Assign function output with assign

I am using
library(foreach)
library(doSNOW)
and I have a function mystoploss(data, n=14).
I then call it like this (I want to loop over n=14 for now):
registerDoSNOW(makeCluster(4, type = "SOCK"))
foreach(i = 14) %dopar% {
  assign(paste("Performance", i, sep=""), mystoploss(data=mydata, n=i))
}
Afterwards, Performance14 does not exist.
Is there some way to assign so that the output ends up in Performance14?
And if I use
foreach(i = 14) %dopar% {
  assign(paste("Performance", i, sep=""), mystoploss(data=mydata, n=i), envir = .GlobalEnv)
}
I get the error:
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
worker initialization failed: Error in as.name
This is because the assign operations happen in the worker processes. The values of the variables are sent back (see your R session console), but not with the names you assigned. You would need to capture these values and name them again; see this related question.
The following is an alternative that may be of help: assign the output of foreach to an intermediate variable, then assign it to your desired variables in the current 'master process' environment.
PerformanceAll <- foreach(i = 1:14, .combine = "c") %dopar% { mystoploss(data = mydata, n = i) }  # pick .combine appropriately
for (i in 1:14) { assign(paste("Performance", i, sep = ""), PerformanceAll[i]) }
Here is the full example I tried:
library(foreach)
library(doSNOW)

mystoploss <- function(data = 1, n = 1) {
  return(runif(data))  # some operation; returns a scalar
}

mydata <- 1
registerDoSNOW(makeCluster(4, type = "SOCK"))
PerformanceAll <- foreach(i = 1:14, .combine = "c") %dopar% { mystoploss(data = mydata, n = i) }  # pick .combine appropriately
for (i in 1:14) { assign(paste("Performance", i, sep = ""), PerformanceAll[i]) }
Edit: If the output of mystoploss is a list, then make the following changes:
mystoploss <- function(data = 1, n = 1) {  # example
  return(list(a = runif(data), b = 1))  # some operation; returns a list
}

PerformanceAll <- foreach(i = 1:14) %dopar% { mystoploss(data = mydata, n = i) }  # remove .combine
for (i in 1:14) { assign(paste("Performance", i, sep = ""), PerformanceAll[[i]]) }  # double brackets
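A slightly tidier variant (a sketch, not from the original answer) keeps the results in a named list and pushes them into the global environment in one step with base R's list2env():
PerformanceAll <- foreach(i = 1:14) %dopar% { mystoploss(data = mydata, n = i) }
names(PerformanceAll) <- paste0("Performance", 1:14)  # name each element
list2env(PerformanceAll, envir = globalenv())         # creates Performance1 ... Performance14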
