Saving within a foreach loop (R package)?

I am using the foreach package to run a Monte Carlo simulation in parallel on an HPC cluster. I have no issue with foreach returning the results I want, but I am wondering if it is possible to periodically save the workspace. The reason I ask is that I have to specify a walltime (after which the job is terminated), so I risk either 1. setting the walltime too low and losing everything before the foreach loop has finished and saved my output, or 2. setting a conservative walltime and wasting my quota. Ideally, I'd like to be able to periodically save my output, in which case I would still have data if the job terminates.
For example:
results <- foreach(ii = 1:100, .combine = rbind) %dopar% {
  res1 <- ii^2
  res2 <- matrix(sample(1:9, 9, replace = F), nrow = 3, ncol = 3)
  if (ii %% 10 == 0) {
    save.image("temp_save.RData")
  }
  results <- list(res1, res2)
}
save.image("final_save.RData")
> load("~/final_save.RData")
> results
         [,1] [,2]
result.1 1    Integer,9
result.2 4    Integer,9
result.3 9    Integer,9
result.4 16   Integer,9
...
but 'temp_save' is empty, presumably because 'results' is only generated once the foreach loop has finished. Is there any way to access and save these data before the foreach loop has completed? Thanks for your help.

I had a very similar problem to this one and solved it using save(), as mentioned in the comments. Specifically, I like intermittently "saving" my progress when working with large datasets so I don't waste computing resources if something goes wrong (e.g., running out of walltime). I have personally found that save.image() can be quite unreliable when used in complex code, and especially in an HPC environment. My code is too long to replicate here, but this was the general (untested) method:
# Set up the parallel backend
library(doParallel)
library(foreach)
cl <- parallel::makeCluster(4)
doParallel::registerDoParallel(cl)

# Set up a scratch directory for your intermediate files
intermediate_directory <- 'INTER_DIR'
if (!dir.exists(intermediate_directory)) {
  dir.create(intermediate_directory)
}

# Run your parallel loop
foreach(i = 1:100, .export = ls(environment())) %dopar% {

  # Create a unique filename for each iteration of the parallel loop
  each_filename <- paste0('RESULT_', as.character(i), '.rda')
  each_filepath <- file.path(intermediate_directory, each_filename)

  # If the file already exists, skip to the next iteration
  if (file.exists(each_filepath)) {
    next
  }

  # Otherwise, run your code
  each_result <- jitter(i, factor = 10)

  # Save the result individually
  save(each_result, file = each_filepath)
  # OR, save the contents of an environment:
  save(list = ls(environment()), file = each_filepath)
}
The results can be re-loaded into a list form after the loop has finished, which is what foreach should return anyway. Something like:
fls <- list.files(intermediate_directory, pattern = '\\.rda$', full.names = TRUE)
result_list <- lapply(fls, function(x) get(load(x)))
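Note that list.files() returns filenames in lexicographic order (RESULT_1, RESULT_10, RESULT_100, RESULT_11, ...), so if the original iteration order matters you may want to reorder the list. A small sketch, assuming the RESULT_<i>.rda naming used above:
# Extract the iteration number embedded in each filename and reorder accordingly
iter <- as.integer(gsub('\\D', '', basename(fls)))
result_list <- result_list[order(iter)]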

Related

Keep track of "standard" loop nested in a parallelized loop [duplicate]

I have been trying to get some output displayed from a foreach loop in R. A reproducible example is:
library(doParallel)
library(foreach)

cl <- makeCluster(2)
registerDoParallel(cl)
ptm1 <- proc.time()
foreach(i = 1:50, .packages = c("MASS"), .combine = '+') %dopar% {
  ginv(matrix(rexp(1000000, rate = .001), ncol = 1000))
  if (i > 49) {
    cat("Time taken", proc.time() - ptm1)
  }
}
I expect the time taken to be displayed, but this does not display anything. Can you please suggest ways of capturing the messages in the foreach loop and displaying them at the end of the loop?
I'm not sure if there's a way to output to the screen, but you can easily output to a log file using the sink function, like so:
ptm1 <- proc.time()
foreach(i = 1:50, .packages = c("MASS"), .combine = '+') %dopar% {
  ginv(matrix(rexp(1000000, rate = .001), ncol = 1000))
  if (i > 49) {
    sink("Report.txt", append = TRUE)  # open sink file and add output
    cat("Time taken", proc.time() - ptm1)
  }
}
EDIT: As @Roland points out, this can be dangerous if you want to capture output from every iteration and not just the final one, because you don't want the workers to clobber each other. He links to a better alternative for this scenario in his comment.
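For completeness, here is my own sketch of that idea (not necessarily the alternative linked in that comment): redirect worker output when the cluster is created via the outfile argument of makeCluster, and/or write each iteration's messages to its own file so workers never overwrite each other. The log_dir name below is just an illustration.
library(doParallel)
library(foreach)

# outfile = "" sends the workers' stdout/stderr to the master's console;
# a filename would collect everything in a single log file instead
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

log_dir <- "worker_logs"   # hypothetical directory for per-iteration logs
if (!dir.exists(log_dir)) dir.create(log_dir)

res <- foreach(i = 1:50, .packages = "MASS", .combine = '+') %dopar% {
  out <- ginv(matrix(rexp(1000000, rate = .001), ncol = 1000))
  # each iteration writes to its own file, so there is no clobbering
  cat("Iteration", i, "finished at", format(Sys.time()), "\n",
      file = file.path(log_dir, paste0("iter_", i, ".txt")))
  out  # return the matrix so .combine = '+' still works
}
stopCluster(cl)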

With foreach parallel, when I run for more than 100 loops the list structure fails [duplicate]

I have found a feature/bug in the foreach package which I do not understand. Perhaps someone can explain this behaviour to me:
I created a loop with the foreach package (I use it together with multicore calculations, but here in a sequential example; the behaviour appears in both variants). This loop runs r times. In every run a list with c entries is returned, so I expect a list with r entries, each of which consists of c lists.
My code was the following:
library(foreach)

clusters <- 10
runs <- 100

temp <- foreach(r = 1:runs,
                .combine = 'list',
                .multicombine = TRUE) %do% {
  signal_all <- lapply(1:clusters, function(x) {
    return(1)
  })
  return(signal_all)
} ## end do
With this code everything works as expected: temp is a list with 100 entries, each of which is itself a list with 10 entries (the original post showed screenshots of the str() output, not reproduced here). But when increasing to runs <- 101, the expected list structure of temp is destroyed. However, when the line .combine = 'list' is commented out, everything works as expected again:
library(foreach)

clusters <- 10
runs <- 100

temp <- foreach(r = 1:runs,
                .multicombine = TRUE) %do% {
  signal_all <- lapply(1:clusters, function(x) {
    return(1)
  })
  return(signal_all)
} ## end do
Can someone explain this behaviour?
Thanks for any help!
Meanwhile I have found a solution.
The foreach function knows that some combine functions (e.g. c or cbind) take many arguments, and will call them with up to 100 arguments at a time (by default) in order to improve performance. With the argument .maxcombine you can set this limit manually:
library(foreach)

clusters <- 10
runs <- 101

temp <- foreach(r = 1:runs,
                .combine = 'list',
                .maxcombine = runs,
                .multicombine = TRUE) %do% {
  signal_all <- lapply(1:clusters, function(x) {
    return(1)
  })
  return(signal_all)
} ## end do
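If you want to verify the structure afterwards, a quick check (assuming the code above):
length(temp)           # 101: one entry per run
unique(lengths(temp))  # 10: each entry is a list with `clusters` elements
str(temp[[1]], max.level = 1)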

Parallelized `Find` loop in R

There are several packages in R to simplify running code in parallel, like foreach and future. Most of these have constructs which are like lapply or a for loop: they carry on until all the tasks have finished.
Is there a simple parallel version of Find? That is, I would like to run several tasks in parallel. I don't need all of them to finish, I just need to get the first one that finishes (maybe with a particular result). After that the other tasks can be killed, or left to finish on their own.
Conceptual code:
hunt_needle <- function (x, y) x %in% (y-1000):y
x <- sample.int(1000000, 1)
result <- parallel_find(seq(1000, 1000000, 1000), hunt_needle)
# should return the first value for which hunt_needle is true
You can use shared memory so that processes can communicate with one another.
For that, you can use package bigstatsr (disclaimer: I'm the author).
Choose a block size and do:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)

# Data example
cond <- logical(1e6)
cond[sample(length(cond), size = 1)] <- TRUE
ind.block <- bigstatsr:::CutBySize(length(cond), block.size = 1000)

cl <- parallel::makeCluster(nb_cores())
doParallel::registerDoParallel(cl)

# This value (in an on-disk matrix) is shared by processes
found_it <- FBM(1, 1, type = "integer", init = 0L)

library(foreach)
res <- foreach(ic = sample(rows_along(ind.block)), .combine = 'c') %dopar% {
  if (found_it[1]) return(NULL)
  ind <- bigstatsr:::seq2(ind.block[ic, ])
  find <- which(cond[ind])
  if (length(find)) {
    found_it[1] <- 1L
    return(ind[find[1]])
  } else {
    return(NULL)
  }
}
parallel::stopCluster(cl)

# Verification
all.equal(res, which(cond))
Basically, when a solution is found, the other processes no longer need to do their computations, and they know it because you put a 1 in found_it, which is shared between all processes.
As your question is not reproducible and I don't understand everything you need, you may have to adapt this solution a little bit.

Parallel process in chunks giving no performance benefits

I have a very large list (huge_list). A function (inner_fun) is called for each value of the list; inner_fun takes around 0.5 seconds and its output is a simple numeric vector of size 3. I am trying to parallelise this. After going through many articles, I read that it is better to divide the work into chunks when the parallelised function is very quick, so I divided it based on cores. But there is no time benefit. I am not able to understand the concept here; can anyone give a few insights? My major concern is whether I am doing something wrong with the code. I am not posting the exact code here, but I have tried to replicate it as closely as possible.
A few observations:
dummy_fun and dummy_fun2 take around 10 hrs with the number of parallel workers kept at 11.
With no parallelism, this takes around 20 hrs.
With parallel = 2, it completes in 15 hrs.
I am using a machine with 12 cores, 60 GB RAM, and Ubuntu.
Code to make cluster
library(doParallel)

no_of_clusters <- detectCores() - 1
cl <- makeCluster(no_of_clusters)
registerDoParallel(cl)
clusterExport(cl, varlist = c("arg1", "arg2", "inner_fun"))
Function without chunks
dummy_fun <- function(arg1, arg2, huge_list) {
  g <- foreach(i = 1:length(huge_list), .combine = rbind,
               .multicombine = TRUE) %dopar% {
    inner_fun(i, arg1, arg2, huge_list[i])
  }
  return(g)
}
Function with chunks
dummy_fun2 <- function(arg1, arg2, huge_list) {
  il <- 1:length(huge_list)
  il2 <- split(il, ceiling(seq_along(il) / (length(il) / (detectCores() - 1))))
  g <- foreach(i = il2, .combine = rbind, .multicombine = TRUE) %dopar% {
    ab1 <- lapply(i, function(li) {
      inner_fun(i, arg1, arg2, huge_list(i))
    })
    do.call(rbind, ab1)
  }
  return(g)
}
You got the chunks wrong. It's not about dividing the indices into chunks of length no_of_clusters, but rather about dividing them into no_of_clusters chunks.
Try this out:
dummy_fun2 <- function(arg1, arg2, huge_list, inner_fun, ncores) {
  cl <- parallel::makeCluster(ncores)
  doParallel::registerDoParallel(cl)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  L <- length(huge_list)
  inds <- split(seq_len(L), sort(rep_len(seq_len(ncores), L)))

  foreach(l = seq_along(inds), .combine = rbind) %dopar% {
    ab1 <- lapply(inds[[l]], function(i) {
      inner_fun(i, arg1, arg2, huge_list[i])
    })
    do.call(rbind, ab1)
  }
}
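A hypothetical usage example (the toy inner_fun and huge_list below are only stand-ins for your real objects):
library(foreach)

inner_fun <- function(i, arg1, arg2, x) c(i, arg1 * unlist(x), arg2)  # returns a numeric vector of size 3
huge_list <- as.list(runif(1000))

res <- dummy_fun2(arg1 = 2, arg2 = 3, huge_list = huge_list,
                  inner_fun = inner_fun, ncores = 4)
dim(res)  # one row per element of huge_list, 3 columns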
Further remarks:
it's often useless to use more than half of the cores you have on your computer.
the option .multicombine is automatically used with rbind, but .maxcombine is really important (you may need more than the default of 100). Here we use lapply for the sequential part, so this remark doesn't matter.
it's useless to export lots of variables when using foreach; it already exports what is necessary from the environment of dummy_fun2.
are you sure you want to use huge_list[i] (get a list of one element) rather than huge_list[[i]] (get the i-th element of the list)? See the small example below.
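To illustrate that last point:
x <- list(10, 20, 30)
x[2]    # a list of length one, containing 20
x[[2]]  # the element itself: 20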

Shared memory in parallel foreach in R

Problem Description:
I have a big matrix c loaded in RAM. My goal is to have read-only access to it from parallel processes. However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.
Is there a way to properly create shared memory that all the processes can read from, without creating a local copy of all the data?
Example:
# Installs missing libraries and then loads them
libs <- function(libraries) {
  for (lib in libraries) {
    if (!is.element(lib, .packages(all.available = TRUE))) {
      install.packages(lib)
    }
    library(lib, character.only = TRUE)
  }
}
libra <- list("foreach", "parallel", "doSNOW", "bigmemory")
libs(libra)

# Create a matrix of approximately 1 GB
c <- matrix(runif(10000^2), 10000, 10000)
# Convert it to a big.matrix
x <- as.big.matrix(c)
# Get a description of the matrix
mdesc <- describe(x)

# Create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)

out <- foreach(linID = 1:10, .combine = c) %dopar% {
  # Load bigmemory
  require(bigmemory)
  # Attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  # Dummy expression to test data acquisition
  c <- m[1, 1]
}
closeAllConnections()
closeAllConnections()
RAM usage: (the original post included a memory-monitoring screenshot, not reproduced here) the memory use increases a lot until foreach ends, and only then is it freed.
I think the solution to the problem can be seen from the post of Steve Weston, the author of the foreach package, here. There he states:
The doParallel package will auto-export variables to the workers that are referenced in the foreach loop.
So I think the problem is that, in your code, your big matrix c is referenced in the assignment c <- m[1,1]. Just try xyz <- m[1,1] instead and see what happens.
Here is an example with a file-backed big.matrix:
# Create a matrix of approximately 1 GB
n <- 10000
m <- 10000
c <- matrix(runif(n * m), n, m)
# Convert it to a file-backed big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# Get a description of the matrix
mdesc <- describe(x)

# Create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)

## 1) No referencing
out <- foreach(linID = 1:4, .combine = c) %dopar% {
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}

## 2) Referencing
out <- foreach(linID = 1:4, .combine = c) %dopar% {
  invisible(c) ## c is referenced and thus exported to the workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}
closeAllConnections()
Alternatively, if you are on Linux/Mac and you want copy-on-write (CoW) shared memory, use forks. First load all your data into the main process, and then launch worker processes (forks) with the general function mcparallel from the parallel package.
You can collect their results with mccollect, or use truly shared memory (here via bigmemory; see the Edit below), like this:
library(parallel)
library(bigmemory)  # for shared variables

size <- 1  # length of the shared vector (only one slot is used here)
shared <- bigmemory::big.matrix(nrow = size, ncol = 1, type = 'double')
shared[1] <- 1                        # init shared memory with some number
job <- mcparallel({shared[1] <- 23})  # ...change it in another forked process
shared[1, 1]                          # ...and confirm that it gets changed
# [1] 23
You can confirm that the value really gets updated in the background if you delay the write:
fn <- function() {
  Sys.sleep(1)  # one-second delay
  shared[1] <- 11
}
job <- mcparallel(fn())
shared[1]      # execute immediately after the last command
# [1] 23
shared[1, 1]   # execute after one second
# [1] 11
mccollect()    # to destroy all forked processes (and possibly collect their output)
To control concurrency and avoid race conditions, use locks:
library(synchronicity)  # for locks
m <- boost.mutex()      # let's create a mutex "m"

bad.incr <- function() {  # this function doesn't protect the shared resource with locks
  a <- shared[1]
  Sys.sleep(1)
  shared[1] <- a + 1
}
good.incr <- function() {
  lock(m)
  a <- shared[1]
  Sys.sleep(1)
  shared[1] <- a + 1
  unlock(m)
}

shared[1] <- 1
for (i in 1:5) job <- mcparallel(bad.incr())
shared[1]    # you can verify that the value didn't get increased 5 times, due to race conditions
mccollect()  # to clear all threads, not to get the values

shared[1] <- 1
for (i in 1:5) job <- mcparallel(good.incr())
shared[1]    # as expected, eventually (after 5 seconds of waiting) you get 6
# [1] 6
mccollect()
Edit:
I simplified the dependencies a bit by exchanging Rdsm::mgrmakevar for bigmemory::big.matrix. mgrmakevar internally calls big.matrix anyway, and we don't need anything more.
