My challenge is to compute a recursive function in parallel. However, the recursion is quite deep, and therefore (in my own novice words) there is an issue with allocating a worker when all the workers are busy. In short, it crashes.
Here is some reproducible code. The code is very stupid, but the structure is what counts. This is a simplified version of what is going on.
I work on a Windows machine; if the solution is to go Linux, just say the word. Because the real function can be quite deep, managing the number of workers called for at the upper level will not solve the issue. Is there perhaps a way to know at what level of the recursion we are?
FUN <- function(optimizer, neighbors, considered, x) {
  considered <- c(considered, optimizer)
  neighbors  <- setdiff(x = neighbors, y = considered)
  if (length(neighbors) == 0) {
    # Base case: this loop is STUPID, but it is just an example.
    z <- numeric(100)
    for (i in 1:100) {
      z[i] <- sample(x, 1)
    }
    return(max(z))
  } else {
    # Recursive case: something embarrassingly parallel,
    # but cannot be vectorized.
    z <- foreach(i = 1:10, .combine = 'c') %dopar% {
      FUN(optimizer = neighbors[1], neighbors = neighbors,
          considered = considered, x = x)
    }
    return(max(z))
  }
}
require(doParallel, quietly = TRUE)
cl <- makeCluster(3)
clusterExport(cl, c("FUN"))
registerDoParallel(cl)
getDoParWorkers()
>FUN(optimizer=1,neighbors=c(2),considered=c(),x=1:500)
[1] 500
>FUN(optimizer=1,neighbors=c(2,3),considered=c(),x=1:500)
Error in { : task 1 failed - "could not find function "%dopar%""
Is this error really because the recursion is too deep, or is it just because you haven't got require(doParallel) in your FUN function? When FUN is called on the workers, that instance of R doesn't have the package loaded, so %dopar% can't be found there.
Your first example doesn't hit this because it is simple enough never to reach the inner %dopar% loop.
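A minimal sketch of that fix, reusing FUN and cl from the question (not the OP's exact solution). Either way, note that the workers themselves have no parallel backend registered, so nested %dopar% calls there fall back to running sequentially (with a warning), which also sidesteps the worker-exhaustion worry:

# Option 1: attach foreach/doParallel on every worker right after makeCluster(),
# so %dopar% is found when FUN recurses on a worker.
clusterEvalQ(cl, library(doParallel))

# Option 2: declare the dependency in FUN's inner foreach call instead,
# so the package is attached on whichever worker runs that task.
z <- foreach(i = 1:10, .combine = 'c', .packages = 'foreach') %dopar% {
  FUN(optimizer = neighbors[1], neighbors = neighbors,
      considered = considered, x = x)
}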
I am a beginner in parallel computing with R. I recently started using foreach and parallel computing with the doParallel package. I have an issue when I try to index a list after splitting an iterator into chunks.
library(itertools)
library(foreach)
library(doParallel)
n <- 10000
iter <- 1:n
cores <- detectCores() - 1
c <- makeCluster(cores)
clusterExport(c, c("mod_function", "test_list", "cores"))
registerDoParallel(c)
output <- foreach(i = isplitVector(iter, chunks = cores)) %dopar%
{
  mod_function(test_list[[i]])
}
stopCluster(c)
I get the error
Error in { : task 1 failed - "recursive indexing failed at level 3"
I do not get the error when I do not split the iteration vector into chunks. I am not sure what exactly isplitVector returns or how to go about indexing the list. This works for me:
n <- 10000
iter <- 1:n
cores <- detectCores() - 1
c <- makeCluster(cores)
registerDoParallel(c)
output <- foreach(i = 1:n) %dopar%
{
  mod_function(test_list[[i]])
}
stopCluster(c)
Since I have a lot of iterations, I thought the best way to speed up my foreach was to chunk the iterations to the cluster. Any help in this direction would be very helpful. Thanks in advance.
The isplitVector function returns an iterator that returns sub-vectors (or sub-lists) of its first argument. You're getting an error because you're using [[ to index into test_list with a vector. You might be able to use [ instead, but that would fail if mod_function doesn't accept list arguments.
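To see why, here is a tiny illustration with a toy list (a stand-in, not the OP's test_list): single-bracket [ with a vector index returns a sub-list, while double-bracket [[ with a vector index indexes recursively.

x <- list(a = 1, b = 2, c = 3)   # toy stand-in for test_list
x[c(1, 2)]      # [  with a vector: a sub-list holding elements 1 and 2
x[[c(1, 2)]]    # [[ with a vector: recursive indexing, i.e. x[[1]][[2]],
                # which errors because x[[1]] has no second element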
Here's one way to break up your example into cores tasks that works even if mod_function doesn't accept list arguments:
output <-
foreach(s=isplitVector(test_list, chunks=cores), .combine='c') %dopar% {
lapply(s, mod_function)
}
Note that it uses c to combine the lists returned by lapply into a single list.
I'm running the following code (extracted from the doParallel vignette) on a PC (OS Linux) with 4 physical and 8 logical cores.
Running the code with iter=1e+6 or less, everything is fine, and I can see from the CPU usage that all cores are employed in the computation. However, with a larger number of iterations (e.g. iter=4e+6), parallel computing does not seem to kick in: when I monitor the CPU usage, just one core is involved (100% usage).
Example1
require("doParallel")
require("foreach")
registerDoParallel(cores=8)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iter=4e+6
ptime <- system.time({
r <- foreach(i=1:iter, .combine=rbind) %dopar% {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}
})[3]
Do you have any idea what could be the reason? Could memory be the cause?
I googled around and found THIS question, which is relevant to mine, but the point is that I'm not getting any kind of error, and the OP there seemingly came up with a solution by loading the necessary packages inside the foreach loop. No package is used inside my loop, as can be seen.
UPDATE1
My problem is still not solved. Based on my experiments, I don't think memory is the reason. I have 8GB of memory on the system, on which I run the following simple parallel iteration (over all 8 logical cores):
Example2
require("doParallel")
require("foreach")
registerDoParallel(cores=8)
iter=4e+6
ptime <- system.time({
r <- foreach(i=1:iter, .combine=rbind) %dopar% {
i
}
})[3]
I have no problem running this code, but when I monitor the CPU usage, just one core (out of 8) is at 100%.
UPDATE2
As for Example2, @SteveWeston (thanks for pointing this out) stated in the comments: "The example in your update is suffering from having tiny tasks. Only the master has any real work to do, which consists of sending tasks and processing results. That's fundamentally different than the problem with the original example which did use multiple cores on a smaller number of iterations."
However, Example1 still remains unsolved. When I run it and monitor the processes with htop, here is what happens in more detail:
Let's name the 8 created processes p1 through p8. The status (column S in htop) for p1 is R, meaning that it's running, and it remains unchanged. For p2 through p8, however, the status changes after some minutes to D (i.e. uninterruptible sleep) and, after some more minutes, to Z (i.e. terminated but not reaped by its parent). Do you have any idea why this happens?
I think you're running low on memory. Here's a modified version of that example that should work better when you have many tasks. It uses doSNOW rather than doParallel because doSNOW allows you to process the results with the combine function as they're returned by the workers. This example writes those results to a file in order to use less memory; it then reads the results back into memory at the end using a ".final" function, but you could skip that step if you don't have enough memory.
library(doSNOW)
library(iterators)  # for idiv()
library(tcltk)      # for the Tk progress bar
nw <- 4 # number of workers
cl <- makeSOCKcluster(nw)
registerDoSNOW(cl)
x <- iris[which(iris[,5] != 'setosa'), c(1,5)]
niter <- 15e+6
chunksize <- 4000 # may require tuning for your machine
maxcomb <- nw + 1 # this count includes fobj argument
totaltasks <- ceiling(niter / chunksize)
comb <- function(fobj, ...) {
  for (r in list(...))
    writeBin(r, fobj)
  fobj
}

final <- function(fobj) {
  close(fobj)
  t(matrix(readBin('temp.bin', what='double', n=niter*2), nrow=2))
}

mkprogress <- function(total) {
  pb <- tkProgressBar(max=total,
                      label=sprintf('total tasks: %d', total))
  function(n, tag) {
    setTkProgressBar(pb, n,
                     label=sprintf('last completed task: %d of %d', tag, total))
  }
}
opts <- list(progress=mkprogress(totaltasks))
resultFile <- file('temp.bin', open='wb')
r <-
foreach(n=idiv(niter, chunkSize=chunksize), .combine='comb',
.maxcombine=maxcomb, .init=resultFile, .final=final,
.inorder=FALSE, .options.snow=opts) %dopar% {
do.call('c', lapply(seq_len(n), function(i) {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}))
}
I included a progress bar since this example takes several hours to execute.
Note that this example also uses the idiv function from the iterators package to increase the amount of work in each of the tasks. This technique is called chunking, and often improves the parallel performance. However, using idiv messes up the task indices, since the variable i is now a per-task index rather than a global index. For a global index, you can write a custom iterator that wraps idiv:
idivix <- function(n, chunkSize) {
  i <- 1
  it <- idiv(n, chunkSize=chunkSize)
  nextEl <- function() {
    m <- nextElem(it)  # may throw 'StopIteration'
    value <- list(i=i, m=m)
    i <<- i + m
    value
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
The values emitted by this iterator are lists, each containing a starting index and a count. Here's a simple foreach loop that uses this custom iterator:
r <-
foreach(a=idivix(10, chunkSize=3), .combine='c') %dopar% {
do.call('c', lapply(seq(a$i, length.out=a$m), function(i) {
i
}))
}
Of course, if the tasks are compute intensive enough, you may not need chunking and can use a simple foreach loop as in the original example.
At first I thought you were running into memory problems, because submitting many tasks does use more memory, and that can eventually bog down the master process; that is why my original answer shows several techniques for using less memory. However, it now sounds like there are startup and shutdown phases where only the master process is busy, while the workers are busy for some period in the middle. I think the issue is that the tasks in this example aren't very compute intensive, so when you have a lot of tasks you really start to notice the startup and shutdown times. I timed the actual computations and found that each task only takes about 3 milliseconds. In the past, you wouldn't have gotten any benefit from parallel computing with tasks that small; now, depending on your machine, you can get some benefit, but the overhead is significant, and with a great many tasks you really notice that overhead.
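For reference, here is one rough way to measure that per-iteration cost yourself (a sketch reusing the glm body from Example1; timings will of course vary by machine):

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
one_iter <- function() {
  ind <- sample(100, 100, replace = TRUE)
  coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
}
# elapsed seconds per iteration, averaged over 1000 runs
system.time(for (k in 1:1000) one_iter())[3] / 1000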
I still think that my other answer works well for this problem, but since you have enough memory, it's overkill. The most important technique to use is chunking. Here is an example that uses chunking with minimal changes to the original example:
require("doParallel")
nw <- 8
registerDoParallel(nw)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
niter <- 4e+6
r <- foreach(n=idiv(niter, chunks=nw), .combine='rbind') %dopar% {
do.call('rbind', lapply(seq_len(n), function(i) {
ind <- sample(100, 100, replace=TRUE)
result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
coefficients(result1)
}))
}
Note that this does the chunking slightly differently than my other answer. It only uses one task per worker by using the idiv chunks option, rather than the chunkSize option. This reduces the amount of work done by the master and is a good strategy if you have enough memory.
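To see the difference concretely, here is a small illustration (using %do%, so no cluster is needed) of how idiv splits 10 iterations under the two options; the exact sizes may differ slightly, but they always sum to the total:

library(foreach)
library(iterators)

# chunks = 4: exactly four tasks, sized as evenly as possible (e.g. 3 3 2 2)
foreach(n = idiv(10, chunks = 4), .combine = 'c') %do% n

# chunkSize = 3: as many tasks as needed, each of at most 3 iterations (e.g. 3 3 3 1)
foreach(n = idiv(10, chunkSize = 3), .combine = 'c') %do% n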
I am new to parallel computing in R. I have gone through various links on Stack Overflow on the topic and wrote some initial code:
library(doParallel)
library(foreach)
detectCores()
## [1] 4
# Create cluster with desired number of cores
cl <- makeCluster(3)
# Register cluster
registerDoParallel(cl)
# Find out how many cores are being used
getDoParWorkers()
My objective is to do a repetitive calculation on each row. My function looks something like this:
func2 <- function(i)
{
  msgbody <- tolower(as.character(purchase$msg_body[i]))
  purchase$category[i] <- category_fun(i, msgbody)
}
For this purpose I have written a foreach loop
foreach(i = 1:nrow(purchase)) %dopar% func2(i)
But the issue is that func2 is supposed to write back to the data frame, yet it is not writing anything back; all the entries are the same as before.
Appreciate your help.
I believe this would work better in the scenario you're indicating. You can write a function that takes each msg_body string (along with its row index, since category_fun needs it):
func2 <- function(i, msg_body)
{
  category_fun(i, tolower(as.character(msg_body)))
}
result <- foreach(i = 1:nrow(purchase), .combine = c) %dopar% {
  func2(i, purchase$msg_body[i])
}
purchase$category <- result
I do think you'll be better off using apply() to solve this though.
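For completeness, a minimal serial sketch of that apply-style alternative, assuming the same purchase data frame and category_fun from the question:

# sapply over the row indices; no cluster or foreach needed
purchase$category <- sapply(seq_len(nrow(purchase)), function(i) {
  category_fun(i, tolower(as.character(purchase$msg_body[i])))
})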
Iterpc begins each loop from the same point. This has created an amusing, though frustrating issue, illustrated below:
####Load Packages:
library("doParallel")
library("foreach")
library("iterpc")
####Define variables:
n<-2
precision<-0.1
support<-matrix(seq(0+precision,1-precision,by=precision), ncol=1)
nodes<-2 #preparing for multicore.
cl<-makeCluster(nodes)
####Prep iterations
I<-iterpc(table(support),n, ordered=TRUE,replace=FALSE)
steps<-((factorial(length(support)) / factorial(length(support)-n)))/n
####Run loop to get the combined values:
registerDoParallel(cl)
support_n<-foreach(m=1:n,.packages="iterpc", .combine='cbind') %dopar% {
t(getnext(I,steps))
} #????
Which returns
support_n
I was hoping that this would run the sets in parallel, with one half of the permutations assigned to each node. However, it only does the first half of the permutations... twice. ([,1] is equal to [,37].) How do I get it to return all of the permutations and combine them in parallel?
Assume there will be an arbitrarily large number of permutations, so memory management and speed are nontrivial.
Previous research: All possible permutations for large n
Just for anyone who comes here by searching "foreach iterpc R", as I did: the approach marked as the accepted answer does not really differ much from
result <- foreach(a=1:10) %dopar% {
a
}
because a=getnext(I,d=(2*steps)) will simply return the first 2*steps combinations, and then the foreach package will iterate in parallel over these combinations.
When you have a very large number of combinations returned by iterpc (which is what it is built for), you cannot in fact use such an approach.
In that case, the only thing I believe one can do is to write an iterator wrapper over the iterpc object.
# register parallel backend
library(doParallel)
registerDoParallel(cores = 3)
#create iterpc object
library(iterpc)
combinations <- iterpc(4,2)
library(iterators)
iterpc_iterator <- function(iterpc_object, iteration_length) {
  # our own nextElem(), because on a finished iteration iterpc returns NULL
  # from subsequent getnext() invocations instead of signalling 'StopIteration'
  nextEl <- function() {
    if (iteration_length > 0)
      iteration_length <<- iteration_length - 1
    else
      stop('StopIteration')
    getnext(iterpc_object)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('irep', 'abstractiter', 'iter')
  obj
}
it <- iterpc_iterator(combinations, getlength(combinations))
library(foreach)
result <- foreach(i=it) %dopar% {
i
}
You can simply use iterpc::iter_wrapper.
The relevant line from your example:
support_n <-foreach(a = iter_wrapper(I), .combine='cbind') %dopar% a
After further investigation I believe the following does in fact execute the command in parallel.
registerDoParallel(cl)
system.time(
support_n<-foreach(a=getnext(I,d=(2*steps)),.combine='cbind') %dopar% a
)
support_n<-t(support_n)
Thank you for your assistance.
I'm trying to run a NetLogo simulation (using RNetLogo package) in R using parallel processing on my laptop. I'm trying to assess "t-feeding of females" using 3 (i.e., 0, 25, and 50) different "minimum-separation" values. For each "minimum-separation" value, I'd like to replicate the simulation 10 times. I can run everything correctly just using lapply but I'm having trouble with parLapply. I've just started using the package "parallel" so I'm sure it is something in the syntax.
#Set up clusters for parallel
processors <- detectCores()
cl <- makeCluster(processors)
#Simulation
sim3 <- function(min_sep) {
NLCommand("set minimum-separation ", min_sep, "setup")
ret <- NLDoReport(720, "go", "[t-feeding] of females", as.data.frame=TRUE)
tot <- sum(ret[,1])
return(tot)
}
#Replicate simulations 10 times using lapply and create boxplots. This one works.
rep.sim3 <- function(min_sep, rep) {
return(
lapply(min_sep, function(min_sep) {
replicate(rep, sim3(min_sep))
})
)
}
d <- seq(0,50,25)
res <- rep.sim3(d,10)
boxplot(res,names=d, xlab="Minimum Separation", ylab="Time spent feeding")
#Replicate simulations 10 times using parLapply. This one does not work.
rep.sim3 <- function(min_sep, rep) {
return(
parLapply(cl, min_sep, function(min_sep) {
replicate(rep, sim3(min_sep))
})
)
}
d <- seq(0,50,25)
res <- rep.sim3(d,10)
# Error in checkForRemoteErrors(val) : 3 nodes produced errors; first error: could not find function "sim3"
#Replicate simulations 10 times using parLapply. This one does work but creates a list of the wrong length and therefore the boxplot cannot be plotted correctly.
rep.sim3 <- function(min_sep, rep) {
return(
parLapply(cl, replicate(rep, d), sim3))
}
d <- seq(0,50,25)
res <- rep.sim3(d,10)
Ideally I'd like to make the first parLapply work. Alternatively, I guess I could modify res from the parLapply that works so that the list has a length of max_sep instead of 30. However, I can't seem to do that. Any help would be much appreciated!
Thanks in advance.
You need to initialize the cluster workers before executing rep.sim3. The error message indicates that your workers can't execute the sim3 function because you haven't exported it to them. I also noticed that you haven't loaded the RNetLogo package on the workers either.
The easiest way to initialize the workers is with the clusterEvalQ and clusterExport functions:
clusterEvalQ(cl, library(RNetLogo))
clusterExport(cl, 'sim3')
Note that you shouldn't do this in your rep.sim3 function, since that would be inefficient and unnecessary. Do it just once, after the cluster object has been created and sim3 has been defined.
This initialization is necessary because the workers started via makeCluster don't know anything about your variables or functions, or anything else about your R session. And parLapply doesn't analyze the function that you pass to it any more than lapply does. The difference is that lapply executes in your local R session where sim3 is defined and the RNetLogo package is loaded. parLapply executes the specified function in remote R sessions that have not been initialized by executing your R script.
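Putting it together, here is a sketch of the order things need to happen in, mirroring the question's code (each worker R session would also need its own NetLogo instance, e.g. via NLStart, before sim3 can actually run there; that part is not shown):

library(parallel)
processors <- detectCores()
cl <- makeCluster(processors)

# ... define sim3() and rep.sim3() exactly as in the question ...

clusterEvalQ(cl, library(RNetLogo))   # load the package on every worker
clusterExport(cl, 'sim3')             # ship sim3() to every worker

d <- seq(0, 50, 25)
res <- rep.sim3(d, 10)                # the parLapply version can now find sim3
stopCluster(cl)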