Details of foreach + doMPI: Multiple foreach loops in sequence in the same script? - r

In R, I am using the package foreach with doMPI in a wrapper script run an external model many times in parallel on a cluster. Each MPI process gets one parameter point for which to execute the model.
However, to run this, there's also a bit of pre- and post-processing -- making some folders first, and aggregating the results at the end. This is also parallelisable, but not with the same number of jobs as the main model runs.
The way I've handled it is by using multiple subsequent foreach loops in the script. First one that makes the folders, then when that's ended, another to run the model. And this is where, despite consulting the documentation, I am a little green on how the doMPI package works in detail, and how MPI works more generally, I guess: Am I guaranteed that all MPI processes in loop 1 finish before any work is done in loop 2? This would be a necessity for the script logic. If not, are there any magic MPI commands I could use to enforce my desired behaviour? Does it make any sense to close and reopen the cluster, even? Or is that stupid? Like,
foreach (i1=1:N1) %dopar% {
loopy loop number 1
}
# Stop the MPI cluster and start it again:
closeCluster(cl)
cl = startMPIcluster()
registerDoMPI(cl)
foreach (i2=1:N2) %dopar% {
loopy loop number 2
}
Thanks!

Related

Julia parallel processing on PBS multiple nodes

I am looking for a way to run simple parallel processes (one function run multiple times with different arguments, no communication between process) across multiple nodes in a PBS cluster.
Currently I am able to run it on a single node setting the number of threads with an environment variable in the PBS script, and using a for loop with #thread.threads
I have found references to clustermanager.jl, but no clear working example on how to use it on PBS.
For example: does addprocs_pbs in the file take care also of the script part, or do I still need to run a pbs script as usual, and this function is called inside the julia file?
This is the code structure I am using now. Ideally, it would stay more or less the same but parallel process could run across multiple nodes.
using JLD
include("path/to/library/with/function.jl")
seed = 342;
n = 18; # number of simulations
changing_parameter = [1,2,3,4];
input_file = "some file"
CSV.read(string(input_files_folder,input_file));
# I should also parallelise this external for loop
# it currently runs 18 simulations per run, and saves the results each time
for P in changing_parameter
Random.seed!(seed);
seeds = rand(1:100000,n)
results = []
Threads.#threads for i = 1:n
push!(results,function(some_fixed_parameters, P=P, seed=seeds[i]);)
end
# get the results
# save the results
JLD.save(filename,to_save,compress=true)
end
For distributed computing you normally need to use multiprocessing rather than multi-threading (although it is OK to have multi-threaded parallel processes if you need).
Hence, what you need to do is to use the ClustersManagers library to use the cluster manager to allocate processes for your Julia cluster.
I have been using Julia with Cray clusters using SLURM so not exactly PBS, however I since your question remain unanswered here is my working code. You will use addprocs_pbs that looks to have a very similar structure.
using ClusterManagers
addprocs_slurm(36,job_name="jobname", account="some_acc_name", time="01:00:00", exename="/lustre/tetyda/home/pszufe/julia/usr/bin/julia")
Once you add the worker processes all what remains is to use the Distributed package to orchestrate your workload.

How to use nested parallelisation in R when the nested loop is contained within a library?

I'm using the caret library in R and attempting to produce multiple models simultaneously. However, since caret is also capable of parallelization things aren't working properly.
I'm aware that the correct format for nested foreach loops in R is along the lines of:
foreach(i=inputarray) %:%
foreach(j=secondarray) %dopar% {
# functions here
}
However, in this situation the closest I can come is something like this:
foreach(i=inputarray) %:% {
trainModel(use="modelName")
}
Perhaps unsurprisingly this doesn't work too well, as the outside iterator doesn't get passed in properly and the code doesn't run at all. Using %dopar% instead results in code that works, but each call to trainModel uses only one thread, as visible from task manager when longer models are running.
In terms of system information I'm running Win 10 with R 3.6
In case somebody else finds themselves in need of this, the best solution I found was to create a second thread cluster inside the first foreach() {} using registerDoSNOW(makeCluster(x)) to assign an individual number of threads to each loop. It has the added benefit of allowing you to give each loop a different number of resources for inequal Job sizes too, which is useful for my application. Of course, there's the slight detractor that the outside cluster declaration causes some overhead threads that don't do much and impact performance a little, but overall a decent solution nonetheless.
cl <- makeCluster(n)
registerDoParallel(cl)
foreach(i=inputarray) %dopar% {
library(doSNOW)
registerDoSNOW(makeCluster(x))
trainModel(...)
}

R foreach doParallel with 1 worker/thread

Well, I don't think anyone understood the question...
I have a dynamic script. Sometimes, it will iterate through a list of 10 things, and sometimes it will only iterate through 1 thing.
I want to use foreach to run the script in parallel when the items to iterate through is greater than 1. I simply want to use 1 core per item to iterate through. So, if there are 5 things, I will parallel across 5 threads.
My question is, what happens when the list to iterate through is 1?
Is it better to NOT run in parallel and have the machine maximize throughput? Or can I have my script assign 1 worker and it will run the same as if I had not told it to run in parallel at all?
So lets call the "the number of things you are iterating" iter which you can set dynamically for different processes
Scripting the parallelization might look something like this
if(length(iter)==1){
Result <- #some function
} else {
cl <- makeCluster(iter)
registerDoParallel(cl)
Result <- foreach(z=1:iter) %dopar% {
# some function
}
stopCluster(cl)
}
Here if iter is 1 it will not invoke parallelization otherwise it will assign cores dynamically according to the number of iter. Note that if you intend to embed this in a function, makeCluster and registerDoParallel cannot be called within a function, you have to set them outside a function.
Alternatively you register as many clusters as you have nodes, run the foreach dynamically and the unused clusters will just remain idle.
EDIT: It is better to run NOT to run in parallel if you have only one operation to iterate through. If only to avoid additional time incurred by makeCluster(), registerDoParallel() and stopCluster(). But the difference will be small compared to going parallel with one worker. Modified code above adding conditional to screen for the case of just one worker. Please provide feedback bellow if you need further assistance.

R: At What Depth Should the Parallel Call Be In

I have a nested *apply calls and I want to parallelize them. I have the option to parallelize on either the top call or the nested inner call. I believe that, in theory, the first one is supposed to be better, but my problem is that I have 4 cores but the outer-most object has 5 parts that are of very varying sizes. When I ran the first example, all 4 cores ran for about 10 minutes before 2 of them finished. At 1 hour the third one finished, and the 4th was the last one to finish at 1:45 having gotten the two largest processes.
What are the pro's and con's of each?
parLapply(cl, object, function(obj) lapply(obj, funct))
-- OR --
lapply(object, function(obj) parLapply(cl, obj, funct))
Additionally, is there is a way to manually distribute the load? That way I could separate the two large objects and put the two smallest together.
EDIT: Generally, what does CS theory state about this situation? Which is generally the best place for a parallel call (excluding peculiar circumstances like this)
parLapply groups your tasks so there is one group of tasks per cluster worker. That doesn't work well if you need load balancing, so I suggest that you try clusterApplyLB instead:
clusterApplyLB(cl, object, function(obj) lapply(obj, funct))
If you have 5 tasks and 4 workers, this will schedule tasks 1-4 on workers 1-4, and then it will schedule task 5 on the worker that finishes it's task first. That may work reasonably well, but it will work better if the last task is the shortest.
If instead you use:
lapply(object, function(obj) clusterApplyLB(cl, obj, funct))
it will execute 5 separate parallel jobs. That could be very inefficient if the tasks within those parallel jobs are small, plus you will waste resources for each of the 5 jobs that have load balancing problems. Thus, this approach doesn't usually work well.
You usually want to use the first case, but load balancing is often a serious problem when the number of tasks isn't much larger than the number of workers. If each call to funct takes a reasonable amount of time (at least a couple of seconds, for example), you could try unrolling the loop using the nesting operator from the foreach package:
r <-
foreach(obj=object) %:%
foreach(o=obj) %dopar% {
funct(o)
}
This turns all of the calls to funct into a single stream of tasks, but still returns the results in a list of lists.
You can find out more about using the foreach nesting operator in a vignette that I wrote called Nesting Foreach Loops.

R foreach that includes for loops weird behavior

I am trying to build a parallel foreach loop using DoMC but there are some odd behaviors going on. The code looks like this
for (file in files) {
do stuff
for (extra in extras) {
do some heavy stuff
}
}
When I load the DoMC or DoParallel, the loop starts utilizing one core but in the second loop it utilizes all 4 cores
When I switch the for loops to foreach %do% I get the exact same behavior.
If I use foreach for the outer loop and leave the inner as a for loop, the script becomes slow. It starts with 4jobs parallel and then they all stop and gradually CPU usage decreases.
What I want is to parallelize the top loop and not the inner second. Anyone knows what's going on? I have used foreach and doMC in the past and never had this issue before.
It looks like you have a few things going on, but there is not enough here to be sure:
If you are using this from RStudio it may not work well, that is a stated limitation of doMC. Try running it straight out of R 64 bit.
You need to require(doMC) or library(doMC) call the package, but you also need to register it with your machine or it will not work right
registerDoMC(4)
That 4 is telling it how many cores to run. If you say nothing it TRIES to use 1/2 of your core.
And you do not have complete code above, the appropriate format is:
foreach(file in files) %dopar% {
stuff to do
}
You must expressly tell it to do parallel processing using the %dopar% command.
if you want to use all cores in one area and not in others, then you need to set options to tell it how many cores for the separate parts of you function or code. But if you tell and outer loop to use 4 and an inner loop to use 2 it may be slower than setting it to 4 in the outer loop and letting it manage things itself. I am not 100% clear on how it accomplishes hand-offs, experiment to see.
To change the number of cores, just add this line:
options(cores=2)
I hope this helps!

Resources