I'm trying to use foreach to do multicore computing in R.
A <-function(....) {
foreach(i=1:10) %dopar% {
B()
}
}
then I call function A in the console. The problem is I'm calling a function Posdef inside B that is defined in another script file which I source. I had to put Posdef in the list of export argument of foreach: .export=c("Posdef"). However I get the following error:
Error in { : task 3 failed - "could not find function "Posdef""
Why cant R find this defined function?
So I can reproduce this, for the curious:
require(doSNOW)
registerDoSNOW(makeCluster(5, type="SOCK"))
getDoParWorkers()
getDoParName()
getDoParVersion()
fib <- function(n) {
if (n <= 1) { return(1) }
return(fib(n-1) + fib(n-2))
}
my.matrix <- matrix(runif(2500, 10, 50), nrow=50)
calcLotsaFibs <- function() {
result <- foreach(row.num=1:nrow(my.matrix), .export=c("fib", "my.matrix")) %dopar% {
return(Vectorize(fib)(my.matrix[row.num,]))
}
return(result)
}
lotsa.fibs <- calcLotsaFibs()
I have been able to get around this by putting the function in another file and loading that file in the body of the foreach. You could also obviously move the function definition into the body of the foreach itself.
[EDIT -- I had previously suggested that perhaps .export doesn't work properly with function names, but was corrected below.]
The short answer is that this was a bug in parallel backends such as doSNOW, doParallel and doMPI, but it has since been fixed.
The slightly longer answer is that foreach exports functions to the workers using a special "export" environment, not the global environment. That used to cause problems for functions that were created in the global environment, because the "export" environment wasn't in their scope, even though they were now defined in that same "export" environment. Thus, they couldn't see any other functions or variables defined in the "export" environment, such as "Posdef" in your case.
The doSNOW, doParallel and doMPI backends now change the associated environment from the global to the "export" environment for functions exported via ".export", and seems to have resolved these issues.
Quick fix for problem with foreach %dopar% is to reinstall these packages:
install.packages("doSNOW")
install.packages("doParallel")
install.packages("doMPI")
It worked in my case.
Related
I am trying to a debug an issue with using unit tests with testthat. The code runs fine if run manually, however it seems when running test(), the workers inside the foreach don't seem to have access to the package or functions inside the package I am testing. The code is quite complex so I don't have a great working example, but here is the outline of the structure:
unit test in tests/testthat:
test_that("dataset runs successful", {
expect_snapshot_output(myFunc(dataset, params))
})
MyFunc calls another func, and inside that func, creates workers to run some code:
final_out <- foreach(i = 1:nrow(data),
.combine = c,
.export = c("func1", "func2", "params"),
.packages = c("fields", "dplyr")) %dopar% {
output = func1(stuff)
more = func2(stuff)
out = rbind(output, more)
return (out)
}
The workers don't seem to have access to func1, func2 etc..
I tried adding the name of the package to packages in this line, but it doesn't work either.
Any ideas?
As I mentioned, this is only an issue when trying to run the unit tests and I suspect it is somehow related to how the package I am testing is being loaded?
When the workers are started they do not have the full set of packages a normal session has; pass all package names of the packages in the search path when the tests are running in the the local session to the .packages argument.
I'm attempting to build an R package from code that works outside a package. My first try and it is rather complex, nested functions that end up doing parallel processing using doMPI and foreach. Also using RStudio 1.01.43 on Ubuntu 16.04. I build the package and works ok. Then when I try to run the top level function which calls the next it throws an error:
Error in { : task 6 failed - "object 'RunOys' not found"
I'm setting the boolean variable RunOys=TRUE manually before calling the top level function, when it gets down to the one where this variable is called for an ifelse statement it fails. Before I call the top level function I check the globalenv() and
> RunOys
[1] TRUE
In the foreach parallel code I have this statement, which works find until compiled into an R package:
FinalCalcs <- function (...) {
results <- data.frame ( foreach::`%dopar%`(
foreach::`%:%`(foreach::foreach(j = 1:NumSim, .combine = acomb,
.options.mpi=opts1),
foreach::foreach (i = 1:PopSize, .combine=rbind,
.options.mpi=opts2,
.export = c(ls(globalenv())),
.packages = c("zoo", "msm", "FAdist", "qmra"))),
{
which should export all of the objects in globalenv() to each slave.
I can't understand why some variables seem to get passed and not other. Do I need to specify it explicitly as a #param in the file for the function where it is called?
With foreach, the better is to have all the needed variables present in the same environment where foreach is called. So basically, I always use foreach inside a function and pass all the variables that are needed in the foreach to this function.
Do as if foreach couldn't see past its calling function. You won't need to export anything. For functions, use package::function (like in packages so that you don't need to #import packages).
How can I export the global environment for the beginning of each parallel simulation in foreach? The following code is part of a function that is called to run the simulations.
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)
sim.result.list <- foreach(r = 1:simulations,
.combine = list,
.multicombine = TRUE,
) %dopar% {
#...tons of calculations using many variables...
list(vals1,
vals2,
vals3)
}
stopCluster(cluztrr)
Is it necessary to use .export with a character vector of every variable and function that I use? Would that be slow in execution time?
If the foreach loop is in the global environment, variables should be exported automatically. If not, you can use .export = ls(globalenv()) (or .GlobalEnv).
For functions from other packages, you just need to use the syntax package::function.
The "If [...] in the global environment, ..." part of F. Privé reply is very important here. The foreach framework will only identify global variables in that case. It will not do so if the foreach() call is done within a function.
However, if you use the doFuture backend (disclaimer: I'm the author);
library("doFuture")
registerDoFuture()
plan(cluster, workers = cl)
global variables that are needed will be automatically identified and exported (which is then done by the future framework and not the foreach framework). Now, if you rely on this, and don't explicitly specify .export, then your code will only work with doFuture and none of the other backends. That's a decision you need to make as a developer.
Also, automatic exporting of globals is neat, but be careful that you know how much is exported; exporting too many too large objects can be quite costly and introduce lots of overhead in your parallel code.
I'm testing the doRedis package by running a worker one machine and the master/server on another. The code on my master looks like this:
#Register ...
r <- foreach(a=1:numreps, .export(...)) %dopar% {
train <- func1(..)
best <- func2(...)
weights <- func3(...)
return ...
}
In every function, a global variable is accessed, but not modified. I've exported the global variable in the .export portion of the foreach loop, but whenever I run the code, an error occurs stating that the variable was not found. Interestingly, the code works when all my workers on one machine, but crashes when I have an "outside" worker. Any ideas why this error is occurring, and how to correct it?
Thanks!
UPDATE: I have a gist of some code here: https://gist.github.com/liangricha/fbf29094474b67333c3b
UPDATE2: I asked a another to doRedis related question: "Would it be possible allow each worker machine to utilize all of its cores?
#Steve Weston responded: "Starting one redis worker per core will often fully utilize a machine."
This kind of code was a problem for the doParallel, doSNOW, and doMPI packages in the past, but they were improved in the last year or so to handle it better. The problem is that variables are exported to a special "export" environment, not to the global environment. That is preferable in various ways, but it means that the backend has to do more work so that the exported variables are in the scope of the exported functions. It looks like doRedis hasn't been updated to use these improvements.
Here is a simple example that illustrates the problem:
library(doRedis)
registerDoRedis('jobs')
startLocalWorkers(3, 'jobs')
glob <- 6
f1 <- function() {
glob
}
f2 <- function() {
foreach(1:3, .export=c('f1', 'glob')) %dopar% {
f1()
}
}
f2() # fails with the error: "object 'glob' not found"
If the doParallel backend is used, it succeeds:
library(doParallel)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)
f2() # works with doParallel
One workaround is to define the function "f1" inside function "f2":
f2 <- function() {
f1 <- function() {
glob
}
foreach(1:3, .export=c('glob')) %dopar% {
f1()
}
}
f2() # works with doParallel and doRedis
Another solution is to use some mechanism to export the variables to the global environment of each of the workers. With doParallel or doSNOW, you could do that with the clusterExport function, but I'm not sure how to do that with doRedis.
I'll report this issue to the author of the doRedis package and suggest that he update doRedis to handle exported functions like doParallel.
I am using doSMP as a parallel backend in Windows 7, with R 2.12.2. I incur in an error, and would like to understand the likely cause. Here is some sample code to reproduce the error.
require(foreach)
require(doSMP)
require(data.table)
wrk <- startWorkers(workerCount = 2)
registerDoSMP(wrk)
DF = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
setkey(DF,x)
foreach( i=1:2) %dopar% {
DF[J("a"),]
}
The error message is
Error in { : task 1 failed - "could not find function "J""
I've not used doSMP, but I did some digging around and it looks like this post gets at a similar issue.
so it looks like you should be able to do:
foreach( i=1:2, .packages="data.table") %dopar% {
DF[J("a"),]
}
I can't test as I don't have a Windows machine handy.
OK, I asked Revolution computing, and Steve Weller (of RC) replied:
The problem is a R scoping issue. By
default, foreach() will look for
variables defined in it's own
'environment'. Any objects defined
outside of it's scope need to be
explicitly passed to it via the
'.export' argument.
In your case, you will need to modify
your 'foreach()' call to pass in the
objects 'DF' and 'J':
...
foreach(i=1:2, .export=c("DF","J")) %dopar% {
...
I haven't tried either solution yet, but I trust both JD and RC...