Problems calling a function in a foreach loop in R

I'm running 200 simulations, varying one of my six parameters at a time, using a standard R setup and a normal for loop. It takes about 2 hours per parameter I vary.
It was recommended that I run the loop on parallel cores, so I found the foreach function and the doSNOW package. I've been able to run simple examples posted on various R blogs and on Stack Overflow on my computer, but so far I've had problems with my own function.
I get the following error:
Error in { : task 1 failed - "object 'delta' not found"
Here is generic code describing the function:
simulation <- function(x) {
  # Parameter guesses
  alpha        <- x[1]
  mean_ability <- x[2]
  delta        <- x[3]
  var          <- x[4]
  lambda_0     <- x[5]
  lambda_1     <- x[6]

  # ... here the simulation part is done ...

  # Put the moments together
  c(lam_1_hat, lam_0_hat, delta_hat, mean_within, between_var, average_wage)
}
I put this function inside the foreach function:
foreach(kk = 1:length(alpha_vec), .combine = 'c',
        .packages = c(...)) %dopar% {   # c(...) stands for the packages the simulation needs
  simulation(c(lambda_1[3], lambda_0[3], delta[3], alpha[kk], var[3], mean_abil[3]))[4]
}
So I keep every element fixed except alpha in this case.
During the simulation part I draw random numbers. The set.seed() call is made outside the foreach loop; I tried including it inside, but the error was the same.
I have also tried to include the packages I use via the .packages argument of foreach.
I could make it work by including the whole function body inside the foreach call, but this is surely not the optimal way.
Any suggestions on how to solve my problem?

I think you should pass another argument to your foreach call, namely .export.
Like this:
foreach(kk = 1:length(alpha_vec), .combine = 'c',
        .packages = c(...),              # the packages the simulation needs
        .export   = c("simulation")) %dopar% {
  simulation(c(lambda_1[3], lambda_0[3], delta[3], alpha[kk], var[3], mean_abil[3]))[4]
}
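For context, a complete call might look like the following sketch. The doSNOW backend, the worker count, and the toy stand-ins for simulation() and the parameter vectors are assumptions, since only fragments are shown above. Note that any other global objects the loop body uses (delta, lambda_1 and so on) may also need to appear in .export, because PSOCK-style workers do not share the master's global environment.
library(foreach)
library(doSNOW)

cl <- makeCluster(4)                       # worker count is an assumption
registerDoSNOW(cl)

# Toy stand-ins for the real objects, purely for illustration
simulation <- function(x) rep(sum(x), 6)
alpha_vec  <- seq(0.1, 2, length.out = 20)
lambda_1 <- lambda_0 <- delta <- var <- mean_abil <- rep(1, 10)
alpha <- alpha_vec

res <- foreach(kk = seq_along(alpha_vec), .combine = 'c',
               .export = c("simulation", "lambda_1", "lambda_0", "delta",
                           "alpha", "var", "mean_abil")) %dopar% {
  simulation(c(lambda_1[3], lambda_0[3], delta[3], alpha[kk], var[3], mean_abil[3]))[4]
}

stopCluster(cl)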

Related

How to remotely execute myfun on a scattered vector using Rmpi

myfun is a user-defined function, for example
myfun <- function(x) { x^3 }
Now I want to run myfun on a vector, say x = 1:10, in parallel using 3 slaves. My code looks as follows:
mpi.spawn.Rslaves(nslaves = 3)
source("myfun.R")
mpi.bcast.cmd(myfun)                   # broadcast myfun to slaves
x <- 1:10
grp <- ceiling(seq_along(x) / 3)
grp[10] <- 3
sx <- split(x, grp)
mpi.scatter.Robj2slave(sx)             # scatter x in 3 groups to slaves
y <- mpi.remote.exec(cmd = myfun, sx)  # this does not work!
print(y)
mpi.close.Rslaves()
mpi.quit()
The problem is that Rmpi won't execute myfun on the scattered sx properly. In the manual, the entry for mpi.remote.exec says:
...used as arguments to cmd (function command) for passing their
(master) values to R slaves, i.e., if ‘myfun(x)’ will be executed on R
slaves with ‘x’ as master variable, use mpi.remote.exec(cmd=myfun, x).
Since 'x' is a master variable, does this mean it isn't possible to execute myfun on the scattered x on the slaves? If not, what's the right way to parallelise the above example using Rmpi?
Inside myfun, use
i <- mpi.comm.rank()
to pick off a component of sx
myfun <- function(x) {
  i <- mpi.comm.rank()
  x[[i]]^3
}
I think your issue is with the use of
mpi.bcast.cmd(myfun) #broadcast myfun to slaves
I think this should be
mpi.bcast.Robj2slave(myfun)
Note the description in the help files.
There could be a second issue with your myfun, as detailed by @zmirlig, but it is hard to say without seeing the function's code.
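Putting the two suggestions together, a possible end-to-end sketch (untested, and assuming an MPI setup where three slaves can be spawned) would look like this; note that it passes the master's full sx list as the argument rather than scattering it:
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 3)

# Per the first suggestion: each slave picks off its own component by rank
myfun <- function(x) {
  i <- mpi.comm.rank()
  x[[i]]^3
}

# Per the second suggestion: ship the function object itself to the slaves
mpi.bcast.Robj2slave(myfun)

x <- 1:10
grp <- ceiling(seq_along(x) / 3)
grp[10] <- 3
sx <- split(x, grp)

# Pass the master's sx as an argument; each slave indexes it with its rank,
# so the explicit scatter step is no longer needed
y <- mpi.remote.exec(cmd = myfun, sx)
print(y)

mpi.close.Rslaves()
mpi.quit()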

Using rm(list=ls()) in a parallel environment in R

I am running R code that executes a function in parallel. The code sets some parameters initially and loads libraries etc., then calls a function (called Calibrate) that runs across several workers, with different input parameters on each worker, and returns the results back to the master. It works, and a number of iterations take place (sometimes more than 100 over a couple of hours), but it crashes after a while, and I suspect it is a memory resource issue. Hence I want to include an rm-type command to reduce memory usage.
At first the function looked like this:
Calibrate <- function() {
  rm(list = ls())
  gc()
  # ...rest of code calling other functions
}
but this had very little effect. When I looked closely and ran the code line by line, I realised that rm(list = ls()) does very little inside a function, since ls() there only lists the function's own local objects.
So, I thought I would change the code to:
Calibrate <- function() {
  ENV <- globalenv()
  ll <- ls(envir = ENV)
  lf <- lsf.str(envir = ENV)
  ll <- ll[ll != lf]
  rm(list = ll, envir = ENV)
  # ...rest of code calling other functions
}
This now gets rid of all the variables but not the functions. However, I am worried that it will also remove the variables on all the other workers, which will still be running. The code runs in parallel, but not necessarily at the same speed on every worker, so execution is effectively staggered. I only want to remove the variables for an individual worker when the Calibrate function is called on it.
So my question: what should I be doing to clear the variables (rm) for one worker, and not the whole system, when running in parallel?
Any help is really appreciated.
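For what it's worth, with snow/doSNOW-style (PSOCK) clusters every worker is a separate R process, so code like the above, when run inside Calibrate on a worker, only touches that worker's own global environment, not the master's or the other workers'. A sketch of a per-worker cleanup along those lines (the Filter-based selection of non-function objects is an assumption, chosen to keep functions and drop everything else):
Calibrate <- function() {
  genv <- globalenv()   # on a PSOCK worker this is the worker's own global environment
  vars <- Filter(function(nm) !is.function(get(nm, envir = genv)),
                 ls(envir = genv))
  rm(list = vars, envir = genv)
  gc()
  # ... rest of the calibration code calling other functions ...
}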

How to use plyr mdply with failsafe execution in parallel

I have to run an analysis on multiple datasets. I use plyr (mdply) with the doSNOW package to use multiple cores.
Sometimes the analysis code will fail, raising an error and stopping execution. I want the analysis to continue for the other datasets. How can I achieve that?
Solution 1: Code so that all errors are caught, which is not feasible.
Solution 2: A failsafe plyr wrapper to run the function in parallel that returns all valid results, and indicates where something went wrong.
I implemented the second solution (see answer below). The tricky part was that I wanted a single function call to accomplish the failsafe-and-return-a-data.frame feature.
How I went about constructing the function:
The actual function call is wrapped in tryCatch. It is made from within a callfailsafe function, which in turn is needed to pass the individual function name (here simple) and the respective parameters in (...) to the whole procedure.
Maybe I made it overly complicated... but it works.
Be sure that your simple function does not rely on any globally defined functions or parameters, as these will not be available when run with .parallel = TRUE and doSNOW.
Here is my test dataset: there are 100 tasks, and for each one a function simple is called; however, sometimes the function fails. I typically use this on tasks that autonomously load many .RData files, do extensive processing, save some output and finally return a data.frame object.
library(plyr)
library(doSNOW)

N <- 100
multiargtab <- data.frame(ID = 1:N, A = round(runif(N, 0, 1)), B = round(runif(N, 0, 1)))

simple <- function(ID, A, B) {  # a function that will sometimes fail
  if (B == 0) rm(B)
  data.frame(A = A, B = B, AB = A/B, ID = ID)
}
The signature of the calling function is:
res2=mdply.anyfun.parallel.failsafe(multiargtab,simple)
The function mdply.anyfun.parallel.failsafe takes a data.frame and a function name myfunction (as a character string) as parameters. myfunction is then called for every row in the data.frame and passed all column values as parameters, like the original mdply. In addition to the original mdply functionality, the function does not stop when a task fails but continues with the other tasks. The error message of the failed task is returned in the column "error".
library(doSNOW)
library(plyr)

mdply.anyfun.parallel.failsafe <- function(multiargtab, myfunction) {
  cl <- makeCluster(4)
  registerDoSNOW(cl)

  callfailsafe <- function(...) {
    r <- tryCatch.W.E(FUN(...))
    val <- r$value[[1]]
    if (!"simpleError" %in% class(val)) {
      return(val)
    } else {
      return(data.frame(..., error = (as.character(val))))
    }
  }

  tryCatch.W.E <- function(expr) {
    # pass a function call; it is run and its result returned; if an error
    # occurs the error object is returned instead - BUT the call will not fail
    W <- NULL
    w.handler <- function(w) {  # warning handler
      W <<- w
      invokeRestart("muffleWarning")
    }
    list(value = list(withCallingHandlers(tryCatch(expr, error = function(e) e),
                                          warning = w.handler)),
         warning = W)
  }

  FUN <- match.fun(myfunction)
  res <- mdply(multiargtab, callfailsafe, .parallel = TRUE)
  stopCluster(cl)
  res
}
Testing the function:
res2=mdply.anyfun.parallel.failsafe(multiargtab,simple)
This generally works fine. I only get some strange errors when multiargtab is of type data.table:
Error in data.table(..., key = key(..1)) :
Item 1 has no length. Provide at least one item
I circumvented the error by casting with as.data.frame, although it would be interesting to know why data.table does not work.
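For illustration, the workaround amounts to something like this (multiargtab_dt is a hypothetical data.table version of the input):
library(data.table)

multiargtab_dt <- as.data.table(multiargtab)

# Passing the data.table directly triggers the error above;
# coercing to a plain data.frame first avoids it
res3 <- mdply.anyfun.parallel.failsafe(as.data.frame(multiargtab_dt), simple)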

Error when using %dopar% instead of %do% in R (package doParallel)

I've run into a strange error.
Suppose I have 10 xts objects in a list called data. I now generate every combination of three of them using
data_names <- names(data)
combs <- combn(data_names, 3)
My basic goal is to do a PCA on each of those triples.
To speed things up I wanted to use the doParallel package. Here is the snippet, shortened to the point where the error occurs:
list <- foreach(i = 1:ncol(combs)) %dopar% {
  tmp_triple <- combs[, i]
  p1 <- data[tmp_triple[[1]]][[1]]
  p2 <- data[tmp_triple[[2]]][[1]]
  p3 <- data[tmp_triple[[3]]][[1]]
  data.merge <- merge(p1, p2, p3, all = FALSE)
}
Here, the merge function seems to be the problem. The error is
task 1 failed - "cannot coerce class 'c("xts", "zoo")' into a data.frame"
However, when changing %dopar% to a normal serial %do%, everything works as expected.
So far I have not been able to find any solution to this problem, and I'm not even sure what to look for.
A better solution than explicitly loading the libraries within the loop body is to use the .packages argument of the foreach() function:
list <- foreach(i = 1:ncol(combs), .packages = c("xts", "zoo")) %dopar% {
  tmp_triple <- combs[, i]
  p1 <- data[tmp_triple[[1]]][[1]]
  p2 <- data[tmp_triple[[2]]][[1]]
  p3 <- data[tmp_triple[[3]]][[1]]
  data.merge <- merge(p1, p2, p3, all = FALSE)
}
The problem is likely that you haven't called library(xts) on each of the workers. You don't say what backend you're using, so I can't be 100% sure.
If that's the problem, then this code will fix it:
list <- foreach(i = 1:ncol(combs)) %dopar% {
  library(xts)
  tmp_triple <- combs[, i]
  p1 <- data[tmp_triple[[1]]][[1]]
  p2 <- data[tmp_triple[[2]]][[1]]
  p3 <- data[tmp_triple[[3]]][[1]]
  data.merge <- merge(p1, p2, p3, all = FALSE)
}
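Either way, both snippets assume that a parallel backend has already been registered; a minimal doParallel setup might look like this sketch (the worker count is an assumption):
library(doParallel)

cl <- makeCluster(4)     # worker count is an assumption
registerDoParallel(cl)

# ... run the foreach() call shown above ...

stopCluster(cl)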
A quick fix for problems with foreach %dopar% is to reinstall these packages:
install.packages("doSNOW")
install.packages("doParallel")
install.packages("doMPI")
These are responsible for parallelism in R. A bug that existed in old versions of these packages has since been fixed. It worked in my case.

could not find function inside foreach loop

I'm trying to use foreach to do multicore computing in R.
A <- function(...) {
  foreach(i = 1:10) %dopar% {
    B()
  }
}
I then call function A in the console. The problem is that inside B I call a function Posdef, which is defined in another script file that I source. I had to put Posdef in the .export argument of foreach: .export = c("Posdef"). However, I get the following error:
Error in { : task 3 failed - "could not find function "Posdef""
Why can't R find this function, even though it is defined?
So I can reproduce this, for the curious:
require(doSNOW)
registerDoSNOW(makeCluster(5, type = "SOCK"))
getDoParWorkers()
getDoParName()
getDoParVersion()

fib <- function(n) {
  if (n <= 1) { return(1) }
  return(fib(n - 1) + fib(n - 2))
}

my.matrix <- matrix(runif(2500, 10, 50), nrow = 50)

calcLotsaFibs <- function() {
  result <- foreach(row.num = 1:nrow(my.matrix),
                    .export = c("fib", "my.matrix")) %dopar% {
    return(Vectorize(fib)(my.matrix[row.num, ]))
  }
  return(result)
}

lotsa.fibs <- calcLotsaFibs()
I have been able to get around this by putting the function in another file and loading that file in the body of the foreach. You could also obviously move the function definition into the body of the foreach itself.
[EDIT -- I had previously suggested that perhaps .export doesn't work properly with function names, but was corrected below.]
The short answer is that this was a bug in parallel backends such as doSNOW, doParallel and doMPI, but it has since been fixed.
The slightly longer answer is that foreach exports functions to the workers using a special "export" environment, not the global environment. That used to cause problems for functions that were created in the global environment, because the "export" environment wasn't in their scope, even though they were now defined in that same "export" environment. Thus, they couldn't see any other functions or variables defined in the "export" environment, such as "Posdef" in your case.
The doSNOW, doParallel and doMPI backends now change the associated environment from the global environment to the "export" environment for functions exported via ".export", which seems to have resolved these issues.
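With current versions of those backends, the original .export pattern should therefore work; a self-contained sketch (a toy stand-in replaces the sourced Posdef, and the worker count is an assumption):
library(doSNOW)

cl <- makeCluster(2)                  # worker count is an assumption
registerDoSNOW(cl)

Posdef <- function(n) diag(n)         # toy stand-in for the sourced helper

res <- foreach(i = 1:3, .export = "Posdef") %dopar% {
  Posdef(i)                           # the exported function is visible on the workers
}

stopCluster(cl)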
A quick fix for problems with foreach %dopar% is to reinstall these packages:
install.packages("doSNOW")
install.packages("doParallel")
install.packages("doMPI")
It worked in my case.

Resources