In R, how does two-level parallel computing work?

Suppose that I want to run a bootstrap procedure 1000 times on each of 100 different simulated data sets.
At the top level, I can set up a foreach backend to distribute the 100 jobs to different CPUs. Then, at the lower level, the function boot from the R package boot can also invoke parallel computing if I specify its 'parallel' option.
The pseudocode may look like the following.
library(doParallel)
library(boot)
registerDoParallel(cores=4)
foreach(i=seq(100, 5, length.out = 100), .combine=cbind) %dopar% {
  sim.dat <- simulateData(i)
  boot.res <- boot(sim.dat, mean, R=1000, parallel = 'multicore', ...)
  ## then extract results and combine
  ...
}
I am curious to know how the parallel computing really works in this case.
Would the two levels of parallel computing work at the same time? How would they affect (interact with? interrupt? disable?) each other?
More generally, I guess more and more R functions now provide a parallel computing option, as boot does, for intensive simulation. In that situation, is there a need to specify the lower-level parallel option when the top level is already parallel? Or vice versa?
What are the pros and cons, if any, for this two-level parallel setup?
Thanks for any clarification.
EDIT:
I should have explained the problem more clearly. Actually, after boot.res is returned, additional calculations are done on it to get the final summary statistics. That means the whole computation is not just a set of mutually independent bootstrap procedures. In this case, using only the outer parallel loop would mess up the results. So, if I understand correctly, the best way would be to use a nested foreach parallel backend and leave the 'parallel' option of boot unset.
Anyone, please correct me if I am wrong. Regards.
END EDIT
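A minimal sketch of one reading of the EDIT above: the outer loop is parallelized with foreach, boot is called without its 'parallel' option (so it runs serially inside each worker), and the follow-up summary is computed in the same iteration. Here simulateData and mean_stat are hypothetical stand-ins, not functions from the question; three data sets and two workers are used only to keep the example small.

```r
library(doParallel)  # also attaches foreach and parallel
library(boot)

# Hypothetical stand-in for the question's simulateData()
simulateData <- function(n) rnorm(n)

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(d, idx) mean(d[idx])

cl <- makeCluster(2)          # assumption: 2 workers
registerDoParallel(cl)

res <- foreach(n = c(50, 100, 200), .combine = rbind,
               .packages = "boot") %dopar% {
  sim.dat <- simulateData(n)
  # 'parallel' is left at its default ("no"): only the outer level is parallel
  b <- boot(sim.dat, mean_stat, R = 1000)
  # follow-up calculation on the bootstrap result, done in the same worker
  c(n = n, estimate = b$t0, se = sd(b$t[, 1]))
}

stopCluster(cl)
```

Each row of res holds the summary for one simulated data set. Keeping the inner level serial avoids starting more processes than there are cores.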

Related

Looking for 'for loop' alternative in sparklyr

I'm trying to tune a model using sparklyr. Using for loops to tune the parameter does not parallelize the work as expected, and it takes a lot of time.
My Question:
Is there any alternative that I can use to parallelize the work?
id_wss <- NA
for (i in 2:8)
{
id_cluster <- ml_kmeans(id_ip4, centers = i, seed = 1234, features_col = colnames(id_ip4))
id_wss[i] <- id_cluster$cost
}
There is nothing specifically wrong with your code when it comes to concurrency:
The distributed and parallel part is model fitting process ml_kmeans(...). For loop doesn't affect that. Each model will be trained using resources available on your cluster as expected.
The outer loop is driver code. Under normal conditions we use standard single-threaded code (not that multithreading at this level is really an option in R) when working with Spark.
In general (Scala, Python, Java) it is possible to use separate threads to submit multiple Spark jobs at the same time, but in practice it requires a lot of tuning, and access to low level API. Even there it is rarely worth the fuss, unless you have significantly overpowered cluster at your disposal.
That being said, please keep in mind that if you compare Spark K-Means to local implementations on data that fits in memory, things will be relatively slow. Using randomized initialization might help speed things up:
ml_kmeans(id_ip4, centers = i, init_mode = "random",
seed = 1234, features_col = colnames(id_ip4))
On a side note, for algorithms that can be easily evaluated with one of the available evaluators (ml_binary_classification_evaluator, ml_multiclass_classification_evaluator, ml_regression_evaluator), you can use ml_cross_validator / ml_train_validation_split instead of manual loops (see, for example, How to train a ML model in sparklyr and predict new values on another dataframe?).

Using R Parallel with other R packages

I am working on a very time intensive analysis using the LQMM package in R. I set the model to start running on Thursday, it is now Monday, and is still running. I am confident in the model itself (tested as a standard MLM), and I am confident in my LQMM code (have run several other very similar LQMMs with the same dataset, and they all took over a day to run). But I'd really like to figure out how to make this run faster if possible using the parallel processing capabilities of the machines I have access to (note all are Microsoft Windows based).
I have read through several tutorials on using parallel, but I have yet to find one that shows how to use the parallel package in concert with other R packages. Am I overthinking this, or is it not possible?
Here is the code that I am running using the R package LQMM:
install.packages("lqmm")
library(lqmm)
g1.lqmm<-lqmm(y~x+IEP+pm+sd+IEPZ+IEP*x+IEP*pm+IEP*sd+IEP*IEPZ+x*pm+x*sd+x*IEPZ,random=~1+x+IEP+pm+sd+IEPZ, group=peers, tau=c(.1,.2,.3,.4,.5,.6,.7,.8,.9),na.action=na.omit,data=g1data)
The dataset has 122433 observations on 58 variables. All variables are z-scored or dummy coded.
The libraries you depend on will need to be loaded on all your nodes. The parallel package provides the function clusterEvalQ for this purpose. You might also need to export some of your data to the global environments of your subnodes; for this you can use the clusterExport function. Also view this page for more info on other relevant functions that might be useful to you.
In general, to speed up your application by using multiple cores, you will have to split your problem into multiple subpieces that can be processed in parallel on different cores. To achieve this in R, you will first need to create a cluster and assign a particular number of cores to it. Next, you will have to register the cluster, export the required variables to the nodes, and then load the necessary libraries on each of your subnodes. The exact way you set up your cluster and launch the nodes will depend on the type of sublibraries and functions that you use. As an example, your cluster setup might look like this when you choose the doParallel package (and most of the other parallelisation sublibraries/functions):
library(doParallel)
nrCores <- detectCores()
cl <- makeCluster(nrCores)
registerDoParallel(cl)
clusterExport(cl, c("g1data"), envir = environment())
clusterEvalQ(cl,library("lqmm"))
The cluster is now prepared. You can now assign subparts of the global task to each individual node in your cluster. In the general example below each node in your cluster will process subpart i of the global task. In the example we will use the foreach %dopar% functionality that is provided by the doParallel package:
The doParallel package provides a parallel backend for the
foreach/%dopar% function using the parallel package of R 2.14.0 and
later.
Subresults will automatically be added to the resultList. Finally, when all subprocesses are finished we merge the results:
resultList <- foreach(i = 1:nrCores) %dopar%
{
#process part i of your data.
}
stopCluster(cl)
#merge data..
Since your question was not specifically on how to split up your data I will let you figure out the details of this part for yourself. However, you can find a more detailed example using the doParallel package in my answer to this post.
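As a hedged illustration of the "process part i, then merge" pattern above, here is a minimal sketch on toy data. The data frame toy is a hypothetical stand-in for g1data, and two workers are assumed. It uses splitIndices from the parallel package to divide the rows into blocks. Note that this kind of split only works directly for quantities that combine across chunks (sums, counts); a single lqmm fit cannot be merged this way.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical toy data standing in for g1data
toy <- data.frame(x = rnorm(1000), y = rnorm(1000))

nrCores <- 2                   # assumption: 2 workers
cl <- makeCluster(nrCores)     # PSOCK cluster, which also works on Windows
registerDoParallel(cl)

# Divide the row numbers 1..nrow(toy) into nrCores contiguous blocks
chunks <- splitIndices(nrow(toy), nrCores)

# Each worker processes one block and returns a partial result
partial <- foreach(idx = chunks, .combine = rbind) %dopar% {
  part <- toy[idx, ]
  c(n = nrow(part), sum_x = sum(part$x))
}

stopCluster(cl)

# Merge step: combine the partial sums into the overall mean
mean_x <- sum(partial[, "sum_x"]) / sum(partial[, "n"])
```

The merged mean_x agrees with mean(toy$x) up to floating-point rounding, which is a quick way to check that the split and merge are consistent.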
It sounds like you want to use parallel computing to make a single call of the lqmm function execute more quickly. To do that, you either have to:
Split the one call of lqmm into multiple function calls;
Parallelize a loop inside lqmm.
Some functions can be split up into multiple smaller pieces by specifying a smaller iteration value. Examples include parallelizing randomForest over the ntree argument, or parallelizing kmeans over the nstart argument. Another common case is to split the input data into smaller pieces, operate on the pieces in parallel, and then combine the results. That is often done when the input data is a data frame or a matrix.
But many times in order to parallelize a function you have to modify it. It may actually be easier because you may not have to figure out how to split up the problem and combine the partial results. You may only need to convert an lapply call into a parallel lapply, or convert a for loop into a foreach loop. However, it's often time consuming to understand the code. It's also a good idea to profile the code so that your parallelization really speeds up the function call.
I suggest that you download the source distribution of the lqmm package and start reading the code. Try to understand its structure and get an idea of which loops could be executed in parallel. If you're lucky, you might figure out a way to split one call into multiple calls, but otherwise you'll have to rebuild a modified version of the package on your machine.
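To make the "split one call into multiple calls" idea concrete, here is a minimal sketch of parallelizing kmeans over the nstart argument, as mentioned above. The data matrix x, the number of workers, and the split of 20 random starts into 4 calls of 5 are all assumptions for illustration; this is not lqmm code.

```r
library(parallel)

# Hypothetical toy data: two well-separated groups in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 4), ncol = 2))

cl <- makeCluster(2)            # assumption: 2 workers
clusterSetRNGStream(cl, 123)    # independent random streams per worker
clusterExport(cl, "x")          # make the data visible on the workers

# Instead of one call with nstart = 20, run 4 calls with nstart = 5
fits <- parLapply(cl, 1:4, function(i) kmeans(x, centers = 2, nstart = 5))

stopCluster(cl)

# Combine step: keep the fit with the smallest total within-cluster sum of squares
wss  <- sapply(fits, function(f) f$tot.withinss)
best <- fits[[which.min(wss)]]
```

The combine step is what makes this split valid: kmeans with several starts simply keeps the best one, so picking the best of the partial fits gives an equivalent result.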

R `optim` returns different results if run in parallel

Is there any possible explanation for multiple optim instances with set starting values to return different results if run in parallel or one after another on a single core?
Basically, I do a rolling forecast with refitting of the model each time, so I can easily parallelize over the rolling windows, but the results are different if I do not parallelize...
Sadly, I don't have a simple reproducible example. I know that if I link to different BLAS then the results differ, so is there anything like different numerical precision / set of libraries used, that might cause this?

What R parallelization/HPC packages allow for parallelization within a loop?

Suppose I have a hierarchical Bayesian model with $V$ first-level nodes, where $V$ is very large, and I am going to do $S$ simulations. My thinking is that I could benefit by parallelizing the computation of each of those first-level nodes, and of course from running multiple chains in parallel. So I would have two for or *apply levels, one for the parallelization of the multiple chains, and one for the parallelization of the first-level node computations within an iteration for a particular chain. In what R packages, if any, is this possible? Thank you.
As requested, here is some high-level pseudo-code for something I'd want to do:
for node in top.cluster {
  for draw in simulation {
    draw population.level.variables from population.level.conditionals
    for node in bottom.cluster {
      draw random.effect[node] from random.effect.conditionals[node]
    }
  }
}
Does this make more sense?
In general, it is best to parallelize at the outermost level of the calculation as that avoids communication overhead as much as possible. Unless you tell us more specifics I don't see a point in parallelizing at two explicit levels of the code.
Here are some exceptions:
Of course, that is not (easily) possible if each iteration of your outer loop depends on the results of the previous one.
Another caveat is that you'd need to have sufficient memory for this high-level parallelization as (possibly) n copies of the data need to be held in RAM.
In R, you can do implicitly* parallelized matrix calculations by using a parallelized BLAS (I use OpenBLAS), which also doesn't need more memory. Depending on how much of your calculations are done by the BLAS, you may want to tweak the "outer" parallelization and the number of threads used by the BLAS.
* without any change to your code
Here's the high-performance computation task view, which gives you an overview of packages.
Personally, I mostly use snow + the parallelized BLAS.
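A minimal sketch of the advice above, parallelizing only at the outermost level: each chain runs on its own worker, and the inner loop over first-level nodes stays serial (here vectorized). The sampler run_chain is a hypothetical toy, not a real hierarchical model, and the parallel package is used in place of snow because it ships with R and offers the same kind of cluster functions.

```r
library(parallel)

# Hypothetical toy "chain": S draws, each summarizing V first-level nodes.
# The inner level is a plain vectorized call, not a second parallel layer.
run_chain <- function(chain_id, S = 100, V = 50) {
  pop_level <- numeric(S)
  for (s in seq_len(S)) {
    random_effect <- rnorm(V)           # inner level: all V nodes at once
    pop_level[s]  <- mean(random_effect)
  }
  pop_level
}

cl <- makeCluster(2)            # assumption: 2 workers
clusterSetRNGStream(cl, 42)     # independent random streams per worker

# Outer level: one task per chain
chains <- parLapply(cl, 1:4, run_chain)

stopCluster(cl)
```

Each element of chains is the vector of S draws from one chain. Communication happens only once per chain, which is the reason for preferring the outermost level.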

Using all cores for R MASS::stepAIC process

I've been struggling to perform this sort of analysis and posted on the stats site about whether I was taking things in the right direction, but as I've been investigating I've also found that my lovely beefy processor (linux OS, i7) is only actually using 1 of its cores. Turns out this is default behaviour, but I have a fairly large dataset and between 40 and 50 variables to select from.
A stepAIC function that checks various different models seems like the ideal sort of thing for parallelizing, but I'm a relative newb with R and I only have sketchy notions about parallel computing.
I've taken a look at the documentation for the packages parallel and snowfall, but these seem to have some built-in list functions for parallelisation, and I'm not sure how to morph stepAIC into a form that can be run in parallel using these packages.
Does anyone know 1) whether this is a feasible exercise, 2) how to do what I'm looking to do and can give me a sort of basic structure/list of keywords I'll need?
Thanks in advance,
Steph
I think that a process in which each step depends on the last (as in stepwise selection) is not trivial to run in parallel.
The simplest way to do something in parallel I know is:
library(doMC)
registerDoMC()
l <- foreach(i=1:X) %dopar% { fun(...) }
In my limited understanding of stepwise selection, one removes variables from a model (or adds them, going forward/backward) and measures the fit at each step. If removing a variable gives the best fit, you keep that model, for example. In a parallel foreach loop each iteration is blind to the others, so maybe you could write your own function to perform this task, as in
http://beckmw.wordpress.com/tag/stepwise-selection/
I looked at this code, and it seems to me that you could use parallel computing with the vif_func function...
I think you should also check code optimized for this task, as in the package leaps
http://cran.r-project.org/web/packages/leaps/index.html
hope this helps...
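A minimal sketch of the "write your own function" idea, assuming a simple forward selection by AIC: the steps run one after another, but within a step the candidate models are independent, so they can be fitted in parallel. This is not MASS::stepAIC itself; the built-in mtcars data, the response mpg, and the candidate list are assumptions for illustration.

```r
library(parallel)

candidates <- c("wt", "hp", "qsec", "drat")   # hypothetical candidate predictors
cl <- makeCluster(2)                          # assumption: 2 workers

selected    <- character(0)
current_aic <- AIC(lm(mpg ~ 1, data = mtcars))   # start from the null model

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break

  # Within one step: fit every "current model + one candidate" in parallel
  aics <- parSapply(cl, remaining, function(v, chosen) {
    f <- reformulate(c(chosen, v), response = "mpg")
    AIC(lm(f, data = mtcars))
  }, chosen = selected)

  # Between steps: sequential decision, since the next step depends on it
  if (min(aics) >= current_aic) break
  selected    <- c(selected, names(which.min(aics)))
  current_aic <- min(aics)
}

stopCluster(cl)
```

After the loop, selected holds the chosen predictors in the order they were added. With only a handful of small models per step, the overhead of the cluster can outweigh the gain; the pattern pays off when each fit is expensive.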
