Parallel computing with R in a SLURM cluster

I need to estimate a model with MCMC on a SLURM cluster (the system is CentOS). The estimation takes a very long time to finish.
Within each MCMC iteration, there is one step that takes particularly long. This step is an lapply loop (around 100,000 iterations, about 30 s to finish them all), so as far as I understand I should be able to use parallel computing to speed it up.
I tried several packages (doMC, doParallel, doSNOW) together with the foreach framework. The setup is:
parallel_cores <- 8

# doParallel
library(doParallel)
cl <- makeCluster(parallel_cores)
registerDoParallel(cl)

# doMC
library(doMC)
registerDoMC(parallel_cores)

# doSNOW (this was also slow, see below)
library(doSNOW)
cl <- makeCluster(parallel_cores)
registerDoSNOW(cl)

# foreach framework
# data is a list
data2 <- foreach(
  data_i = data,
  .packages = c("somePackage")
) %dopar% {
  data_i <- some_operation(data_i)
  list(beta = data_i$beta, sigma = data_i$sigma)
}
Using doMC, the time for this step can be reduced to about 9 s. However, since doMC uses forked (shared-memory) workers and I have a large array storing the estimation results, I quickly ran out of memory (i.e. slurmstepd: error: Exceeded job memory limit).
Using doParallel and doSNOW, the time for this step actually increased, to about 120 s, which seems ridiculous. The mysterious thing is that when I tested the code on both my Mac and my Windows machine, doParallel and doSNOW gave speed similar to doMC.
I'm stuck and not sure how to proceed. Any suggestions will be greatly appreciated!
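One pattern that often helps in this situation (a rough sketch, not a tested fix for this model): with PSOCK backends such as doParallel/doSNOW, each of the ~100,000 tiny tasks is serialized and shipped to a worker, so communication overhead can easily swamp the 30 s of actual work. Chunking the list so that each worker receives one coarse task keeps the number of round-trips equal to the number of cores; some_operation and data below are the names from the question.
library(doParallel)

parallel_cores <- 8
cl <- makeCluster(parallel_cores)
registerDoParallel(cl)

# Split the ~100,000-element list into one coarse chunk per worker,
# so each worker gets a single big task instead of thousands of tiny ones.
chunks <- split(data, cut(seq_along(data), parallel_cores, labels = FALSE))

data2 <- foreach(chunk = chunks, .packages = c("somePackage")) %dopar% {
  lapply(chunk, function(data_i) {
    data_i <- some_operation(data_i)
    list(beta = data_i$beta, sigma = data_i$sigma)
  })
}

# Flatten back to one element per original data_i
data2 <- unlist(data2, recursive = FALSE)

stopCluster(cl)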

Related

R language, keras, parallel computation

Although I am working on a 40-core computer, only a few of the cores are used when I fit models with Keras. I would like to know how to use all CPU cores for the computation. I tried to implement parallel computing in the following way, but it failed. Could someone point me to other methods?
library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
history <- model %>% fit(xtrain, ytrain,
                         epochs = 200, batch_size = 100, verbose = 1)
Tensorflow/Keras takes care of parallelism inside fit(), and manually forking the parent R process or managing a PSOCK cluster generally won't work. The {parallel} package that comes with R is not compatible with Tensorflow/Keras.
If it looks like Tensorflow/Keras is not using all your CPU cores with the default settings, you can adjust the thread pool size here: https://www.tensorflow.org/api_docs/python/tf/config/threading (but, in my experience, it's more likely that you're IO-limited, or the CPU is waiting on the GPU, and probably not that the thread pool size is too small).
If you're interested in distributed computing with Tensorflow, here is a good place to get started: https://www.tensorflow.org/api_docs/python/tf/distribute
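If you do want to experiment with the thread pool size from R, here is a minimal sketch (assuming the tensorflow R package and TF 2.x; the thread counts below are only illustrations, and the calls must run before TensorFlow initializes, i.e. before any model or tensor is created):
library(tensorflow)

# Configure TensorFlow's own thread pools; this has no effect once the runtime is initialized
tf$config$threading$set_intra_op_parallelism_threads(40L)  # threads used within a single op
tf$config$threading$set_inter_op_parallelism_threads(2L)   # ops executed concurrently

library(keras)
# build and fit() the model as usual afterwards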

foreach doparallel on GPU

I have this code for writing my results in parallel. I am using foreach and doParallel libraries in R.
output_location <- '/home/Desktop/pp/'

library(foreach)
library(doParallel)
library(data.table)

no_cores <- detectCores()
registerDoParallel(makeCluster(no_cores))

mcoptions <- list(preschedule = TRUE)  # only honoured by multicore (fork) backends

a <- Sys.time()
foreach(i = 1:100, .packages = c('foreach', 'doParallel'),
        .options.multicore = mcoptions) %dopar% {
  result <- my_function(arg1, arg2)
  write(result, file = paste0(output_location, "out", i, ".csv"))
  gc()
}
Now it uses 4 CPU cores, so the writing takes very little time with this code. But I want to run the foreach/doParallel loop on the GPU. Is there any method for processing the foreach doParallel loop on a GPU? gputools and gpuR are some GPU-supporting R packages, but they are mainly for mathematical computations like gpuMatMult(), gpuMatrix(), etc. I am looking for a way to run the loop itself on the GPU. Any help or guidance would be great.
Parallelization with foreach or similar tools works because you have multiple CPUs (or a CPU with multiple cores), which can process multiple tasks at once. A GPU also has multiple cores, but these are already used to process a single task in parallel. So if you want to parallelize further, you will need multiple GPUs.
However, keep in mind that GPUs are faster than CPUs only for certain types of applications. Matrix operations with large matrices being a prime example! See the performance section here for a recent comparison of one particular example. So it might make sense for you to consider if the GPU is the right tool for you.
In addition: File IO will always go via the CPU.
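If the expensive part of my_function is large matrix algebra, one option (a rough sketch assuming the gpuR package and an OpenCL-capable GPU; the 2000 x 2000 matrices are stand-ins for the real work) is to keep the loop itself on the CPU and offload only the matrix operations:
library(gpuR)

results <- vector("list", 100)
for (i in 1:100) {
  A <- matrix(rnorm(2000 * 2000), nrow = 2000)  # placeholder for the real per-iteration work
  B <- matrix(rnorm(2000 * 2000), nrow = 2000)

  gpuA <- vclMatrix(A, type = "float")  # copy data into GPU memory
  gpuB <- vclMatrix(B, type = "float")

  gpuC <- gpuA %*% gpuB                 # the multiplication runs on the GPU
  results[[i]] <- as.matrix(gpuC)       # copy the result back to the CPU
}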

Unlimiting the CPU usage from R

Is there any way to unlimit the CPU usage so my PC puts more effort into finishing a task more rapidly? At the moment the k-means algorithm is estimated to finish in about 10 days, which is something I would like to reduce.
By default R is single-threaded and runs on only one core of the CPU, which is a pity if you have a machine with 16 or 32 cores. By "unlimiting the CPU usage", I assume you're asking whether there's any way to have an R process (say, the k-means algorithm) take advantage of your full CPU power by running in parallel.
Many R packages and processes won't be helped by parallel processing, though, so the technical solution to your particular problem comes down to the package implementation you're using. Popular packages like caret do support parallelization where possible, although you may need to add an allowParallel=TRUE parameter. They work in conjunction with a library such as doMC to allow multi-core processing. In the following sample code, I have my machine use 8 cores through registerDoMC(8) and then set allowParallel=TRUE.
library(caret)
library(doMC)
registerDoMC(8)

system.time({
  ctrl_2 <- trainControl(method = "cv", number = 3, allowParallel = TRUE)
  fb_forest_2 <- train(classe ~ ., data = fb_train, method = "rf", trControl = ctrl_2)
})
Again, parallel processing doesn't always help: not every process can be parallelized. The documentation for foreach is a great read, so if you can afford the time, take a look at it. The specific code solution to your problem also depends on the library implementation you're using.
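For base kmeans() specifically, a common trick (a minimal sketch, assuming the long runtime comes from many random restarts; mydata and centers = 10 are placeholders) is to spread the nstart restarts across cores and keep the best fit:
library(parallel)

cl <- makeCluster(8)
clusterExport(cl, "mydata")  # 'mydata' stands for your data matrix

# 8 parallel jobs of 4 random starts each = 32 starts in total;
# keep the solution with the smallest total within-cluster sum of squares.
fits <- parLapply(cl, 1:8, function(i) {
  kmeans(mydata, centers = 10, nstart = 4, iter.max = 100)
})
stopCluster(cl)

best_fit <- fits[[which.min(sapply(fits, function(f) f$tot.withinss))]]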

Parallel / Multicore Processing in R for an Integer Program?

Are there any packages specifically to let R run faster via parallel computing? I have a very large integer program (IP) that needs to run for a while, so I was wondering if there is a specific package in R that could help me run it. Currently I have a function that returns the solution of an IP, and the primary line that R gets stuck on (for a very... very long time) is the call to lp(..., all.int = TRUE). My CPU usage is around 12.5% (8 cores) on my Windows computer, and I want it to be near 100%.
Edit: I tried using the doParallel package,
library('doParallel')
cl <- makeCluster(8)
registerDoParallel(cl)
But my CPU usage is still not at 100%. What else do I need to do? Is there a specific package that makes optimization problems run faster? Most parallel packages help with simulation, and foreach seems to work only for iterative structures/apply functions. I just want R to use all of my CPU.
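For context (a hedged sketch, not an answer from the original thread): registerDoParallel() only affects code that is written as a foreach/%dopar% loop, so a single lp(..., all.int = TRUE) call stays on one core no matter which backend is registered. Parallelism helps only if you have several independent IPs (e.g. scenarios) to solve; build_ip() below is a hypothetical helper returning the pieces lp() expects.
library(foreach)
library(doParallel)
library(lpSolve)

cl <- makeCluster(8)
registerDoParallel(cl)

# Solve several independent IP instances at once; each individual lp() call
# still runs single-threaded.
solutions <- foreach(s = 1:8, .packages = "lpSolve") %dopar% {
  ip <- build_ip(s)  # hypothetical: returns objective.in, const.mat, const.dir, const.rhs
  lp("max", ip$objective.in, ip$const.mat, ip$const.dir, ip$const.rhs, all.int = TRUE)
}

stopCluster(cl)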

parallel prediction with cforest/randomforest prediction (with doSNOW)

I'm trying to speed up the prediction of a test-dataset (n=35000) by splitting it up and letting R run on smaller chunks. The model has been generated with party::cforest.
However, I can't get R to calculate even the smallest parts when trying to use foreach with %dopar%.
My prediction function takes about 7 seconds for both
predict(fit,newdata=a[1:100,]) and foreach(i=1:10) %do% {predict(fit,newdata=a[1:10,])}.
But when I try to use %dopar% instead, R seems to freeze.
Shouldn't
foreach(i=1:10, .packages=c('party')) %dopar% {predict(fit,newdata=a[1:10,])}
be way faster? Or is the parallelization itself somehow slowing R down?
Test-running with another function (repeatedly calculating sqrt(3), as suggested here) has shown significant improvement, so %dopar% is working too.
Predictions with a randomForest behave similarly, with the difference that there even %do% over ten 1:10 predictions takes much more time than just predicting 1:100.
For randomForest I don't really care, though, because predicting all 35k rows is not a problem anyway.
Btw, is it only me, or does cforest take more time and RAM for everything? I'm only having trouble where randomForest works like a charm.
(running on Windows 7, x64, 8GB RAM, 4 cores/8 threads - using 6 nodes in doSNOW parallelization cluster)
The primary problem with your example is that foreach is automatically exporting the entire a data frame to each of the workers. Instead, try something like:
library(itertools)
foreach(1:10, suba = isplitRows(a, chunkSize = 10), .packages = 'party') %dopar% {
  predict(fit, newdata = suba)
}
The 1:10 is for test purposes, to limit the loop to only 10 iterations, as you're doing in your example.
This still requires that fit be exported to all of the workers, and it might be quite large. But as long as there are many more tasks than workers, and predict takes enough time relative to the time needed to send each chunk of test data, parallelizing the prediction can be worthwhile.
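To predict the full 35,000-row set rather than a 10-iteration test, the same pattern can be collected back into a single result (a sketch; the right .combine depends on what predict() returns for your cforest model):
library(foreach)
library(itertools)

preds <- foreach(suba = isplitRows(a, chunkSize = 1000),
                 .combine = c, .packages = 'party') %dopar% {
  predict(fit, newdata = suba)
}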
