What R parallelization/HPC packages allow for parallelization within a loop? - r

Suppose I have a hierarchical Bayesian model with $V$ first-level nodes, where $V$ is very large, and I am going to do $S$ simulations. My thinking is that I could benefit by parallelizing the computation of each of those first-level nodes, and of course from running multiple chains in parallel. So I would have two for or *apply levels: one for the parallelization of the multiple chains, and one for the parallelization of the first-level node computations within an iteration of a particular chain. In what R packages, if any, is this possible? Thank you.
As requested, here is some high-level pseudo-code for something I'd want to do:
for node in top.cluster {
    for draw in simulation {
        draw population.level.variables from population.level.conditionals
        for node.bottom in bottom.cluster {
            draw random.effect[node.bottom] from random.effect.conditionals[node.bottom]
        }
    }
}
Does this make more sense?

In general, it is best to parallelize at the outermost level of the calculation, as that minimizes communication overhead. Unless you tell us more specifics, I don't see a point in parallelizing at two explicit levels of the code.
Here are some caveats:
Of course, that is not (easily) possible if each iteration of your outer loop depends on the results of the previous one.
Another caveat is that you need sufficient memory for this high-level parallelization, as (possibly) n copies of the data need to be held in RAM.
In R, you can do implicitly* parallelized matrix calculations by using a parallelized BLAS (I use OpenBLAS), which also doesn't need more memory. Depending on how much of your calculations are done by the BLAS, you may want to tweak the "outer" parallelization and the number of threads used by the BLAS.
* without any change to your code
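If you use OpenBLAS, one way to balance the "outer" parallelization against the BLAS threads is the RhpcBLASctl package (an assumption here: that this package is installed; you can also set the OPENBLAS_NUM_THREADS environment variable before starting R). A minimal sketch:

```r
library(RhpcBLASctl)

# Give all cores to the outer parallelization while workers are running
blas_set_num_threads(1)
# ... run outer-level parallel code here ...

# Restore multi-threaded BLAS for serial, linear-algebra-heavy work
blas_set_num_threads(blas_get_num_procs())
```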
Here's the high-performance computing task view, which gives you an overview of packages.
Personally, I mostly use snow + the parallelized BLAS.
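As a minimal sketch of that outermost-level approach with the parallel package (the successor to snow), assuming a hypothetical run_chain() function that runs one full MCMC chain:

```r
library(parallel)

# Hypothetical function standing in for one full MCMC chain;
# here it just returns placeholder draws.
run_chain <- function(seed) {
  set.seed(seed)
  # ... the full Gibbs sampler for one chain would go here ...
  rnorm(10)  # placeholder for the chain's draws
}

cl <- makeCluster(4)                     # one worker per chain
chains <- parLapply(cl, 1:4, run_chain)  # each chain runs on its own core
stopCluster(cl)

length(chains)  # 4 chains, each a vector of draws
```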

Related

In R, how does two-level parallel computing work?

Suppose that I want to run a bootstrap procedure 1000 times on each of 100 different simulated data sets.
At the top level, I can set up a foreach backend to distribute the 100 jobs to different CPUs. Then at the lower level, using the function boot from the R package boot, I can also invoke parallel computing by specifying the 'parallel' option in the function.
The pseudo-code may look like the following.
library(doParallel)
registerDoParallel(cores = 4)
foreach(i = seq(100, 5, length.out = 100), .combine = cbind) %dopar% {
    sim.dat <- simulateData(i)
    boot.res <- boot(sim.dat, mean, R = 1000, parallel = 'multicore', ...)
    ## then extract results and combine
    ...
}
I am curious to know how the parallel computing really works in this case.
Would the two different levels of parallel computing work at the same time? How would they affect (interact with? interrupt? disable?) each other?
More generally, I guess there are now more and more R functions that, like boot, provide a parallel computing option for intensive simulation. In that situation, is there a need to specify the lower-level parallelism given that the top level is already parallelized? Or vice versa?
What are the pros and cons, if any, for this two-level parallel setup?
Thanks for any clarification.
EDIT:
I should have explained the problem more clearly. Actually, after boot.res is returned, additional calculations have to be done on it to finally get summary statistics. That means the whole computation is not just a set of mutually independent bootstrap procedures. In this case, using only the outer parallel loop would mess up the results. So if I understand correctly, the best way would be to use a nested foreach parallel backend but suppress the 'parallel' option of boot.
Please correct me if I am wrong. Regards.
END EDIT
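One hedged sketch of keeping all the parallelism at the outer foreach level while switching off boot's own parallelism, as discussed in the edit (simulateData is a hypothetical stand-in for the poster's data generator, and the statistic is wrapped so it accepts the index vector that boot passes):

```r
library(doParallel)
library(boot)
registerDoParallel(cores = 4)

# Hypothetical data generator standing in for the poster's simulateData()
simulateData <- function(n) rnorm(n)

# boot statistics must accept (data, indices); bare mean() does not
mean.fun <- function(d, idx) mean(d[idx])

res <- foreach(i = 1:8, .combine = cbind, .packages = "boot") %dopar% {
  sim.dat  <- simulateData(100)
  # parallel = "no": the outer %dopar% already occupies all workers
  boot.res <- boot(sim.dat, mean.fun, R = 200, parallel = "no")
  c(estimate = boot.res$t0, se = sd(boot.res$t))
}
dim(res)  # 2 x 8
```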

Parallelized multidimensional numerical integration in R?

I am working on a problem that needs a numerical integration of a bivariate function, where each evaluation of the function takes about 1 minute. Since numerical integration on a single core would evaluate the function thousands to tens of thousands of times, I would like to parallelize the calculation. Right now I am using the brute-force approach that calculates a naive grid of points and adds them up with appropriate area multipliers. This is definitely not efficient and I suspect any modern multidimensional numerical integration algorithm would be able to achieve the same precision with a lot fewer function evaluations. There are many packages in R that would calculate 2-d integration much more efficiently and accurately (e.g. R2Cuba), but I haven't found anything that can be easily parallelized on a cluster with SGE managed job queues. Since this is only a small part of a bigger research problem, I would like to see if this can be done with reasonable effort, before I try to parallelize one of the cubature-rule based methods in R by myself.
I have found that using a sparse grid achieves the best compromise between speed and accuracy in multi-dimensional integration, and it's easily parallelized on the cluster because it doesn't involve any sequential steps. It won't be as accurate as other sequentially adaptive integration algorithms but it is much better than the naive method because it provides a much sparser grid of points to calculate on each core.
The following R code deals with 2-dimensional integration, but can be easily modified for higher dimensions. The apply function towards the end can be easily parallelized on a cluster.
sg.int <- function(g, ..., lower, upper)
{
    require("SparseGrid")
    lower <- floor(lower)
    upper <- ceiling(upper)
    if (any(lower > upper)) stop("lower must be smaller than upper")
    gridss <- as.matrix(expand.grid(seq(lower[1], upper[1] - 1, by = 1),
                                    seq(lower[2], upper[2] - 1, by = 1)))
    sp.grid <- createIntegrationGrid('KPU', dimension = 2, k = 5)
    ## shift the unit-cube grid into each unit cell; sweep() adds the
    ## offset per column (a plain `vector + matrix` would recycle the
    ## offset down the columns, which is not what we want)
    nodes <- sweep(sp.grid$nodes, 2, gridss[1, ], "+")
    weights <- sp.grid$weights
    for (i in seq_len(nrow(gridss))[-1])
    {
        nodes <- rbind(nodes, sweep(sp.grid$nodes, 2, gridss[i, ], "+"))
        weights <- c(weights, sp.grid$weights)
    }
    gx.sp <- apply(nodes, 1, g, ...)
    val.sp <- gx.sp %*% weights
    val.sp
}
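Since the apply call at the end is embarrassingly parallel, a hedged sketch of a cluster version simply replaces it with parallel::parRapply (on a single Linux/macOS machine, mclapply over the rows would do the same job):

```r
library(parallel)
library(SparseGrid)

# Parallel variant of sg.int(): identical grid construction, but the
# function evaluations are spread over cluster workers.
sg.int.par <- function(g, ..., lower, upper, cl) {
  lower <- floor(lower); upper <- ceiling(upper)
  if (any(lower > upper)) stop("lower must be smaller than upper")
  gridss <- as.matrix(expand.grid(seq(lower[1], upper[1] - 1),
                                  seq(lower[2], upper[2] - 1)))
  sp.grid <- createIntegrationGrid("KPU", dimension = 2, k = 5)
  nodes <- do.call(rbind, lapply(seq_len(nrow(gridss)), function(i)
    sweep(sp.grid$nodes, 2, gridss[i, ], "+")))
  weights <- rep(sp.grid$weights, nrow(gridss))
  gx <- parRapply(cl, nodes, g, ...)  # parallel over rows of nodes
  sum(gx * weights)
}

cl <- makeCluster(2)
val <- sg.int.par(function(x) x[1]^2 + x[2]^2,
                  lower = c(0, 0), upper = c(1, 1), cl = cl)
stopCluster(cl)
val  # approximately 2/3, the integral of x^2 + y^2 over the unit square
```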

Parallel computing for TraMineR

I have a large dataset with more than 250,000 observations, and I would like to use the TraMineR package for my analysis. In particular, I would like to use the commands seqtree and seqdist, which work fine when I use, for example, a subsample of 10,000 observations. The limit my computer can manage is around 20,000 observations.
I would like to use all the observations and I do have access to a supercomputer that should be able to do just that. However, this doesn't help much, as the process runs on a single core only. Hence my question: is it possible to apply parallel computing techniques to the above-mentioned commands? Or are there other ways to speed up the process? Any help would be appreciated!
The internal seqdist function is written in C++ and has numerous optimizations. For this reason, if you want to parallelize seqdist, you need to do it in C++. The loop is located in the source file "distancefunctions.cpp"; you need to look at the two loops located around line 300 in the function "cstringdistance" (sorry, but all comments are in French). Unfortunately, the second important optimization is that memory is shared between all computations. For this reason, I think that parallelization would be very complicated.
Apart from selecting a sample, you should consider the following optimizations:
aggregation of identical sequences (see here: Problem with big data (?) during computation of sequence distances using TraMineR )
If relevant, you can try to reduce the time granularity. Distance computation time is highly dependent on sequence length (quadratic, O(n^2)). See https://stats.stackexchange.com/questions/43601/modifying-the-time-granularity-of-a-state-sequence
Reducing time granularity may also increase the number of identical sequences, and hence, the impact of optimization one.
There is a hidden option in seqdist to use an optimized version of the optimal matching algorithm. It is still in the testing phase (that's why it is hidden), but it should replace the current algorithm in a future version. To use it, set method="OMopt" instead of method="OM". Depending on your sequences, it may reduce computation time.
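The aggregation of identical sequences mentioned in the first point can be sketched with the WeightedCluster package (an assumption here: that wcAggregateCases and its aggIndex/disaggIndex fields are available in your version), using the mvad example data shipped with TraMineR:

```r
library(TraMineR)
library(WeightedCluster)

data(mvad)  # example dataset shipped with TraMineR

# Collapse identical sequences: seqdist only needs the unique ones
agg <- wcAggregateCases(mvad[, 17:86])
uniq.seq <- seqdef(mvad[agg$aggIndex, 17:86])
d.uniq <- seqdist(uniq.seq, method = "OM", indel = 1, sm = "CONSTANT")

# Expand the distance matrix back to all cases when needed
d.full <- d.uniq[agg$disaggIndex, agg$disaggIndex]
```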

Using all cores for R MASS::stepAIC process

I've been struggling to perform this sort of analysis and posted on the stats site about whether I was taking things in the right direction, but as I've been investigating I've also found that my lovely beefy processor (Linux OS, i7) is only actually using 1 of its cores. Turns out this is the default behaviour, but I have a fairly large dataset and between 40 and 50 variables to select from.
A stepAIC function that is checking various different models seems like the ideal sort of thing for parallelizing, but I'm a relative newb with R and I only have sketchy notions about parallel computing.
I've taken a look at the documentation for the packages parallel and snowfall, but these seem to have some built-in list functions for parallelisation and I'm not sure how to morph stepAIC into a form that can be run in parallel using these packages.
Does anyone know 1) whether this is a feasible exercise, 2) how to do what I'm looking to do and can give me a sort of basic structure/list of keywords I'll need?
Thanks in advance,
Steph
I think that a process in which each step depends on the previous one (as in stepwise selection) is not trivial to do in parallel.
The simplest way to do something in parallel I know is:
library(doMC)
registerDoMC()
l <- foreach(i=1:X) %dopar% { fun(...) }
In my limited understanding of stepwise selection, one removes (or adds, forward/backward) variables from a model and measures the fit at each step. If the model fits better after removing a variable, you keep that model, for example. In a foreach parallel loop each step is blind to the other steps, but maybe you could write your own function to perform this task, as in
http://beckmw.wordpress.com/tag/stepwise-selection/
I looked at this code, and it seems to me that you could use parallel computing with the vif_func function...
I think you should also check optimized code for this task, as in the package leaps:
http://cran.r-project.org/web/packages/leaps/index.html
hope this helps...
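Building on the point above: although the steps themselves are sequential, the candidate models *within* one step are independent, so each step can be parallelized. A minimal sketch of one forward-selection step with parallel::mclapply (forking, so Linux/macOS; all names here are illustrative, not from MASS):

```r
library(parallel)

# One forward-selection step: evaluate each candidate variable in parallel,
# return the one that lowers AIC the most (or NULL if none does).
forward_step <- function(data, response, current, candidates, cores = 2) {
  base.aic <- AIC(lm(reformulate(c("1", current), response), data))
  aics <- unlist(mclapply(candidates, function(v) {
    AIC(lm(reformulate(c(current, v), response), data))
  }, mc.cores = cores))
  if (min(aics) < base.aic) candidates[which.min(aics)] else NULL
}

best <- forward_step(mtcars, "mpg", current = character(0),
                     candidates = c("wt", "hp", "qsec"))
best  # "wt" gives the lowest single-variable AIC for mpg in mtcars
```

The outer stepwise loop then stays serial: call forward_step repeatedly, appending the returned variable to `current` until it returns NULL.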

How to let R use all the cores of the computer?

I have read that R uses only a single CPU. How can I let R use all the available cores to run statistical algorithms?
Yes, for starters, see the High Performance Computing Task View on CRAN. This lists details of packages that can be used in support of parallel computing on a single machine.
From R version 2.14.0, there is inbuilt support for parallel computing via the parallel package, which includes slightly modified versions of the existing snow and multicore packages. The parallel package has a vignette that you should read. You can view it using:
vignette(package="parallel", topic = "parallel")
There are other ways to exploit your multiple cores, for example via use of a multi-threaded BLAS for linear algebra computations.
Whether any of this will speed up the "statistics calculations" you want to do will depend on what those "statistics calculations" are. Spawning off multiple threads or workers entails an overhead cost to set them up, manage them and collect the results. Some operations see a benefit (some large, some small) of using multiple cores/threads, others are slowed down because of this extra overhead.
In short, do not expect to get an n times decrease in your compute time by using n cores instead of just 1.
If you happen to do few* iterations of the same thing (or run the same code for a few* different parameters), the easiest way to go is to run several copies of R -- the OS will allocate the work to different cores.
In the opposite case, go and learn how to use real parallel extensions.
* For the sake of this answer, few means less than or equal to the number of cores.
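For the common case of repeating the same computation for different parameters, the bundled parallel package mentioned above already covers it. A minimal sketch (mclapply forks, so on Windows you would use makeCluster()/parLapply() instead):

```r
library(parallel)

n.cores <- detectCores()

# Same function, different parameters, one task per worker
results <- mclapply(1:8, function(k) mean(rnorm(1e5, mean = k)),
                    mc.cores = min(8, n.cores))
round(unlist(results))  # approximately 1 2 3 4 5 6 7 8
```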
