Suppose that I want to do something in R that would normally (in one process/thread) look like this:
for (i in 1:2) {
  for (j in 1:2) {
    # Do some stuff here
  }
}
Using R's new package parallel, on a quad core machine, can I do the following?
innerLoop <- function(j) {
  # Do some stuff here
}
outerLoop <- function(i) {
  result <- parLapply(cluster, 1:2, innerLoop)
}
final.result <- parLapply(cluster, 1:2, outerLoop)
Is this possible with the parallel package that comes with R-2.14.0?
Yes, you can do that. For the first level of parallelization, use distributed memory technology (such as makeCluster() from the snow package), and for the second level, use shared memory technology (mclapply() from the multicore package).
Here is a simple code example:
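library(parallel)
cl <- makeCluster(2)
inner <- function(x) {
  pid <- Sys.getpid()
  name <- Sys.info()["nodename"]
  str <- paste("This is R running on", name, "with PID", pid, "!")
  return(str)
}
outer <- function(x, cores, funceval) {
  # load parallel on the worker so that mclapply() is available there
  library(parallel)
  mclapply(1:cores, funceval, mc.cores = cores)
}
parLapply(cl, seq_along(cl), outer, 2, inner)
stopCluster(cl)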
In the output you should see a different PID for every inner task, and different machine names if the cluster nodes run on different machines.
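If forking is not an option (mclapply() does not fork on Windows), I believe the same two-level layout can also be built from socket clusters alone, by letting every outer worker start its own small inner cluster. A minimal sketch, where outer_task, inner_task and n_inner are names made up for this illustration:

```r
library(parallel)

# Inner task: just report which process ran it
inner_task <- function(j) Sys.getpid()

# Outer task: runs on an outer worker and starts its own inner cluster there
outer_task <- function(i, n_inner, inner_fun) {
  inner_cl <- parallel::makeCluster(n_inner)
  on.exit(parallel::stopCluster(inner_cl))  # always shut the inner cluster down
  inner_pids <- unlist(parallel::parLapply(inner_cl, seq_len(n_inner), inner_fun))
  list(outer_pid = Sys.getpid(), inner_pids = inner_pids)
}

cl <- makeCluster(2)
res <- parLapply(cl, 1:2, outer_task, n_inner = 2, inner_fun = inner_task)
stopCluster(cl)
str(res)
```

Each element of res holds one outer PID and the PIDs of its inner workers. Starting a cluster per task is slow, so this only pays off when the inner work is substantial.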
There are several packages in R to simplify running code in parallel, like foreach and future. Most of these have constructs which are like lapply or a for loop: they carry on until all the tasks have finished.
Is there a simple parallel version of Find? That is, I would like to run several tasks in parallel. I don't need all of them to finish, I just need to get the first one that finishes (maybe with a particular result). After that the other tasks can be killed, or left to finish on their own.
Conceptual code:
hunt_needle <- function (x, y) x %in% (y-1000):y
x <- sample.int(1000000, 1)
result <- parallel_find(seq(1000, 1000000, 1000), hunt_needle)
# should return the first value for which hunt_needle is true
You can use shared memory so that processes can communicate with one another.
For that, you can use package bigstatsr (disclaimer: I'm the author).
Choose a block size and do:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
library(foreach)
# Data example
cond <- logical(1e6)
cond[sample(length(cond), size = 1)] <- TRUE
ind.block <- bigstatsr:::CutBySize(length(cond), block.size = 1000)
cl <- parallel::makeCluster(nb_cores())
doParallel::registerDoParallel(cl)
# This value (in an on-disk matrix) is shared by processes
found_it <- FBM(1, 1, type = "integer", init = 0L)
res <- foreach(ic = sample(rows_along(ind.block)), .combine = 'c') %dopar% {
  # Skip this block if another process has already found the solution
  if (found_it[1]) return(NULL)
  ind <- bigstatsr:::seq2(ind.block[ic, ])
  find <- which(cond[ind])
  if (length(find)) {
    found_it[1] <- 1L
    return(ind[find])
  } else {
    return(NULL)
  }
}
parallel::stopCluster(cl)
# Verification
all.equal(res, which(cond))
Basically, once a solution is found, the remaining computations are no longer needed, and the other processes know this because a 1 has been written to found_it, which is shared between all processes.
As your question is not reproducible and I don't understand everything you need, you may have to adapt this solution a little bit.
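If you would rather not depend on shared state, a coarser variant of the same early-exit idea is possible with the base parallel package: test the candidates in small batches and stop after the first batch that contains a hit. This is a swapped-in technique, not what bigstatsr does, and parallel_find is a hypothetical helper, not an existing function. Note that hunt_needle takes the candidate y as its first argument here so that parLapply() can pass it.

```r
library(parallel)

# Hypothetical helper: test one candidate per worker at a time and
# return the first candidate for which `predicate` is TRUE.
parallel_find <- function(cl, candidates, predicate, ...) {
  if (length(candidates) == 0) return(NULL)
  batch_size <- length(cl)
  for (s in seq(1, length(candidates), by = batch_size)) {
    idx <- s:min(s + batch_size - 1, length(candidates))
    hits <- unlist(parLapply(cl, candidates[idx], predicate, ...))
    if (any(hits)) return(candidates[idx][which(hits)[1]])  # stop early
  }
  NULL  # nothing matched
}

hunt_needle <- function(y, x) x %in% (y - 1000):y

cl <- makeCluster(2)
needle <- 123456  # fixed here so the result is predictable
found <- parallel_find(cl, seq(1000, 1000000, 1000), hunt_needle, x = needle)
stopCluster(cl)
found
```

Unlike the shared flag, tasks inside one batch always run to completion, so the batch size (here one task per worker) bounds the wasted work.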
I have a generic chunking function that breaks big calls into smaller pieces and runs them in parallel.
chunk_it <- function(d, n, some_fun) {
  # run n chunks of d in parallel
  dat <- foreach(...) %dopar% {
    some_fun()
  }
}
I want to make it so that this generic chunking function can identify if it's being called by a process that's already in parallel (chunked, in my terminology).
chunked_highlevel <- function(d, n, some_fun) {
  # run n chunks of d in parallel
  chunk_it(lowerlevel_d, n) # do not chunk!
}
What I would like to happen here is that if I have chunked the process at a higher level, that it does not activate the chunking function at the lower level.
Is there a way to identify when you're already inside a parallel process?
So that we could write code like this:
chunk_it <- function(d, n, some_fun) {
  # run n chunks of d in parallel
  if (!already_parallel) {
    dat <- foreach(...) %dopar% {
      some_fun()
    }
  } else {
    dat <- some_fun()
  }
}
I don't think there's an official way of doing this. However, in general there should be code evident in the call stack which makes it obvious whether you're in parallel code. What I've got so far looks like this. It seems to work for doSNOW with either MPI or SOCK, but will probably need adjustment for other packages that implement %dopar%. It's also dependent on some internal details of snow which may be subject to change in future versions.
library(foreach)
library(doSNOW)
my_fn <- function(bit) {
  # TRUE if any call on the stack is snow's worker loop or %dopar%
  is_parallel <- any(unlist(lapply(sys.calls(), function(cal) {
    as.character(cal[[1]]) %in% c("slaveLoop", "%dopar%")
  })))
  is_parallel
}
foreach(x = 1:2) %do% my_fn(x)
# [[1]]
# [1] FALSE
# [[2]]
# [1] FALSE
cl <- makeCluster(2)
registerDoSNOW(cl)
foreach(x = 1:2) %dopar% my_fn(x)
# [[1]]
# [1] TRUE
# [[2]]
# [1] TRUE
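The detection itself needs no cluster to try out. Here is a minimal sketch of the same call-stack inspection in plain R. The names on_call_stack, task and run_chunked are hypothetical, and a real chunker would look for "%dopar%" or "slaveLoop" instead of run_chunked:

```r
# Hypothetical helper: TRUE if any call on the current stack is a call
# to a function whose name is listed in `fn_names`.
on_call_stack <- function(fn_names) {
  calls <- as.list(sys.calls())
  heads <- vapply(calls, function(cl) {
    head <- cl[[1]]
    # anonymous functions and pkg::fun calls have no plain symbol head
    if (is.symbol(head)) as.character(head) else ""
  }, character(1))
  any(heads %in% fn_names)
}

task <- function() on_call_stack("run_chunked")
run_chunked <- function() task()

direct <- task()         # task() called on its own
nested <- run_chunked()  # task() called through run_chunked()
c(direct = direct, nested = nested)
```

Matching on function names is fragile by nature: renaming the wrapper, or calling it as pkg::run_chunked(), defeats the check, which is the same caveat as for the snow internals mentioned above.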
The future package (I'm the author) has built in support for nested parallelism so that you do not have to worry about it as a developer while still giving the end user full power to control how and where parallelization is taking place.
Here's an example from one of the future vignettes:
library(future)
library(listenv)
x <- listenv()
for (ii in 1:3) {
  x[[ii]] %<-% {
    y <- listenv()
    for (jj in 1:3) {
      y[[jj]] %<-% { ii + jj/10 }
    }
    y
  }
}
unlist(x)
## [1] 1.1 1.2 1.3 2.1 2.2 2.3 3.1 3.2 3.3
Note how there are two layers of future assignments (%<-%). The default is to always process them sequentially unless the specification says otherwise. For instance, to process the outer loop of future assignments in parallel on your local machine, use:
plan(multiprocess)
This will cause x[[ii]] %<-% { ... } for ii = 1, 2, 3 to run in parallel, while the contained y[[jj]] %<-% { ... } will run sequentially. The equivalent fully explicit setting for this is:
plan(list(multiprocess, sequential))
Now, if you want to run the outer loop of futures (x[[ii]]) sequentially and the inner loop of futures (y[[jj]]) in parallel, you can specify:
plan(list(sequential, multiprocess))
before running the code.
BTW, the number of parallel processes used with multiprocess is future::availableCores(). Think of it as parallel::detectCores(), except that it also respects mc.cores, HPC cluster environments, etc. Importantly, future::availableCores() will return 1 if it's already running in parallel ("is a parallel child"). This means that if you do:
plan(list(multiprocess, multiprocess))
the inner layer of futures will actually only see a single core. You can think of this as a built-in automatic protection from creating a huge number of parallel processes by mistake through recursive parallelism.
You can force a different setting though (but not recommended). For instance, say you want the outer layer to run four parallel tasks at the same time, and each of those tasks to run two parallel tasks at the same time (on your local machine), then you can use:
plan(list(
  tweak(multiprocess, workers = 4L),
  tweak(multiprocess, workers = 2L)
))
That will run at most 4*2 = 8 parallel tasks at the same time (plus the master process).
If you have a set of machines available, you can do:
plan(list(tweak(cluster, workers = c("machine1", "machine2", "machine3")),
          multiprocess))
That will distribute the outer layer of futures (x[[ii]]) to those three machines, and the inner layer of futures (y[[jj]]) will run in parallel using all the available cores on those machines.
Note how the code doesn't change - only the settings (= plan() call). This is in the spirit of "write once, run wherever". There are many different future-strategy setups you can use; see the vignettes of the future package.
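To see the "code stays the same, only the plan changes" point in terms of process IDs, here is a minimal sketch. It assumes the future package is installed and uses multisession in place of multiprocess, which later releases of future deprecated. The function nested_pids is a made-up name for this illustration:

```r
library(future)

# Two layers of futures; nothing in here says how or where they run
nested_pids <- function() {
  outer <- lapply(1:2, function(i) future({
    inner <- lapply(1:2, function(j) future(Sys.getpid()))
    list(outer_pid = Sys.getpid(), inner_pids = unlist(value(inner)))
  }))
  value(outer)
}

# Outer layer on two background R sessions, inner layer sequential
plan(list(tweak(multisession, workers = 2), sequential))
res <- nested_pids()
plan(sequential)  # reset to the default
str(res)
```

With this plan, every inner PID should equal the PID of the outer future that created it, and the outer PIDs should differ from the PID of your main R session. Swapping the two layers in plan() changes that picture without touching nested_pids().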
Now, what if you wanna use foreach()? You can use the doFuture %dopar% adapter that works on top of the future framework. For example,
library(doFuture)
registerDoFuture()
some_fun <- function(j) {
  list(j = j, pid.j = Sys.getpid())
}
my_fun <- function(i) {
  y <- foreach(j = 1:3) %dopar% { some_fun(j = j) }
  list(i = i, pid.i = Sys.getpid(), y = y)
}
x <- foreach(i = 1:3) %dopar% { my_fun(i = i) }
Run the above and look at str(x) and its different PIDs for the different plan():s exemplified above. That'll illustrate what's going on.
Hope this helps
I have a process I want to run in parallel, but it fails due to some strange error. Now I am considering using the .combine function to recalculate the failing task on the master CPU. However, I don't know how to write such a function for .combine.
How should it be written?
I know how to write them, for example this answer provides an example, but it doesn't show how to handle failing tasks, or how to repeat a task on the master.
I would do something like:
foreach(i=1:100, .combine = function(x, y){tryCatch(?)}) %dopar% { fails_randomly(i) }
However, how do I use the input of that task in the .combine function (if it can be done)? Or should I return a flag or a list from inside the %dopar% body so that the task can be recalculated?
To execute tasks in the combine function, you need to include extra information in the result object returned by the body of the foreach loop. In this case, that would be an error flag and the value of i. There are many ways to do this, but here's an example:
comb <- function(results, x) {
  i <- x$i
  result <- x$result
  if (x$error) {
    cat(sprintf('master computing failed task %d\n', i))
    # Could call function repeatedly until it succeeds,
    # but that could hang the master
    result <- try(fails_randomly(i))
  }
  results[i] <- list(result) # guard against a NULL result
  results
}
r <- foreach(i=1:100, .combine='comb',
             .init=vector('list', 100)) %dopar% {
  tryCatch({
    list(error=FALSE, i=i, result=fails_randomly(i))
  },
  error=function(e) {
    list(error=TRUE, i=i, result=e)
  })
}
I'd be tempted to deal with this problem by executing the parallel loop repeatedly until all the tasks have been computed:
x <- rnorm(100)
results <- lapply(x, function(i) simpleError(''))
# Might want to put a limit on the number of retries
repeat {
  # indices of the tasks that still hold an error
  ix <- which(sapply(results, function(x) inherits(x, 'error')))
  if (length(ix) == 0)
    break
  cat(sprintf('computing tasks %s\n', paste(ix, collapse=',')))
  r <- foreach(i=x[ix], .errorhandling='pass') %dopar% {
    fails_randomly(i)
  }
  results[ix] <- r
}
Note that this solution uses the .errorhandling option which is very useful if errors can occur. For more information on this option, see the foreach man page.
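The "return an error flag plus the task input" idea does not depend on foreach. Here is a minimal sketch of it with the base parallel package, where the retry on the master happens after the parallel call rather than inside a combine function. flaky and safe_task are hypothetical names, and flaky fails on purpose on the workers so that the outcome is predictable:

```r
library(parallel)

master_pid <- Sys.getpid()

# Hypothetical task: fails on a worker for every third input,
# but succeeds when it is run by the master process.
flaky <- function(i, master_pid) {
  if (i %% 3 == 0 && Sys.getpid() != master_pid) stop("simulated worker failure")
  i^2
}

# Wrap the task so that a failure comes back as data instead of aborting the run
safe_task <- function(i, task, ...) {
  tryCatch(
    list(error = FALSE, i = i, result = task(i, ...)),
    error = function(e) list(error = TRUE, i = i, result = conditionMessage(e))
  )
}

cl <- makeCluster(2)
raw <- parLapply(cl, 1:9, safe_task, task = flaky, master_pid = master_pid)
stopCluster(cl)

failed <- vapply(raw, function(r) r$error, logical(1))
results <- lapply(raw, function(r) {
  if (r$error) flaky(r$i, master_pid) else r$result  # retry failed tasks on the master
})
unlist(results)
```

Because each result carries its own i, the master knows exactly which inputs to recompute, which is the same information the comb function above reads from x$i.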
Suppose we have a cluster created by the makeCluster function.
Does it need some kind of synchronization, such as barrier or semaphore synchronization? Or can I run code in parallel (for example, with parLapply) from different sections of code that are themselves running in parallel?
Thank you.
P.S. a little example (something like that).
func1 <- function(count, cl) {
  parSapply(cl, 1:count, function(i) i^count)
}
func2 <- function(cl1, cl2, count) {
  # export by name: func1 from the global environment, cl2 from this call
  clusterExport(cl1, c("func1", "cl2"), envir = environment())
  parLapply(cl1, 1:count, function(i) func1(i, cl2))
}
maxThreads <- 4
cl1 <- makeCluster(maxThreads)
cl2 <- makeCluster(maxThreads)
func2(cl1, cl2, maxThreads)
Problem Description:
I have a big matrix c, loaded in RAM. My goal is to have read-only access to it from parallel processes. However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.
Is there a way to properly create a shared memory, where all the processes may read from, without creating a local copy of all the data?
libs <- function(libraries) { # Installs missing libraries and then loads them
  for (lib in libraries) {
    if (!is.element(lib, .packages(all.available = TRUE))) {
      install.packages(lib)
    }
    library(lib, character.only = TRUE)
  }
}
libs(c("foreach", "doSNOW", "bigmemory")) # assumed package list
# create a matrix of approximately 1 GB
c <- matrix(runif(10000^2), 10000, 10000)
# convert it to a big.matrix
x <- as.big.matrix(c)
# get a description of the matrix
mdesc <- describe(x)
# Create the required connections
cl <- makeCluster(parallel::detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  # load bigmemory
  require(bigmemory)
  # attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  # dummy expression to test data acquisition
  c <- m[1, 1]
}
Memory usage increases a lot while foreach runs and is only freed when it ends.
I think the solution to the problem can be seen from a post by Steve Weston, the author of the foreach package. There he states:
The doParallel package will auto-export variables to the workers that are referenced in the foreach loop.
So I think the problem is that in your code your big matrix c is referenced in the assignment c<-m[1,1]. Just try xyz <- m[1,1] instead and see what happens.
Here is an example with a file-backed big.matrix:
library(bigmemory)
library(doSNOW) # also attaches foreach and snow
# create a matrix of approximately 1 GB
n <- 10000
m <- 10000
c <- matrix(runif(n*m),n,m)
# convert it to a big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# get a description of the matrix
mdesc <- describe(x)
# Create the required connections
cl <- makeCluster(parallel::detectCores())
registerDoSNOW(cl)
## 1) No referencing
out <- foreach(linID = 1:4, .combine=c, .packages = "bigmemory") %dopar% {
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i,j]
    }
  }
  return(0L)
}
## 2) Referencing
out <- foreach(linID = 1:4, .combine=c, .packages = "bigmemory") %dopar% {
  invisible(c) ## c is referenced and thus exported to workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i,j]
    }
  }
  return(0L)
}
stopCluster(cl)
Alternatively, if you are on Linux/Mac and you want copy-on-write (CoW) shared memory, use forks. First load all your data in the main process, and then launch worker processes (forks) with the general function mcparallel from the parallel package.
You can collect their results with mccollect or with the use of truly shared memory using the Rdsm library, like this:
library(parallel) #for mcparallel and mccollect
library(bigmemory) #for shared variables
size <- 1 #assumed value; the original does not define size
shared <- bigmemory::big.matrix(nrow = size, ncol = 1, type = 'double')
shared[1] <- 1 #Init shared memory with some number
job <- mcparallel({shared[1] <- 23}) #...change it in another forked process
shared[1,1] #...and confirm that it gets changed
# [1] 23
You can confirm that the value really gets updated in the background if you delay the write:
fn <- function() {
  Sys.sleep(1) #One second delay
  shared[1] <- 11
}
job <- mcparallel(fn())
shared[1] #Execute immediately after last command
# [1] 23
shared[1,1] #Execute after one second
# [1] 11
mccollect() #To destroy all forked processes (and possibly collect their output)
To control concurrency and avoid race conditions, use locks:
library(synchronicity) #for locks
m <- boost.mutex() #Let's create a mutex "m"
bad.incr <- function() { #This function doesn't protect the shared resource with locks
  a <- shared[1]; Sys.sleep(1); shared[1] <- a + 1
}
good.incr <- function() { #This one holds the mutex for the whole read-modify-write
  lock(m)
  a <- shared[1]; Sys.sleep(1); shared[1] <- a + 1
  unlock(m)
}
shared[1] <- 1
for (i in 1:5) job <- mcparallel(bad.incr())
shared[1] #You can verify that the value didn't get increased 5 times, due to race conditions
mccollect() #To clear all forked processes, not to get the values
shared[1] <- 1
for (i in 1:5) job <- mcparallel(good.incr())
shared[1] #As expected, eventually, after 5 seconds of waiting, you get 6
#[1] 6
I simplified dependencies a bit by exchanging Rdsm::mgrmakevar into bigmemory::big.matrix. mgrmakevar internally calls big.matrix anyway, and we don't need anything more.
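For the read-only case from the question, forks alone may be enough, with no bigmemory and no export step. A minimal sketch, assuming a Unix-like system (on Windows it falls back to a single process), with a much smaller matrix than in the question so that it runs quickly. The names big, col_blocks and block_sums are made up for this illustration:

```r
library(parallel)

# The data is created once, in the parent process
big <- matrix(runif(1e6), nrow = 1000)

# mclapply() forks only on Unix-like systems
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L

# Split the column indices into 4 blocks
col_blocks <- split(seq_len(ncol(big)), rep(1:4, length.out = ncol(big)))

# Forked children read `big` through copy-on-write pages;
# it is neither exported nor serialized to them
block_sums <- mclapply(col_blocks, function(cols) sum(big[, cols]),
                       mc.cores = n_cores)

total <- sum(unlist(block_sums))
all.equal(total, sum(big))
```

As long as the children only read big, the operating system does not need to copy its pages. Writing to it inside a child would trigger a private copy in that child, and the change would not be visible to the parent.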