Understanding the differences between mclapply and parLapply in R

I've recently started using parallel techniques in R for a project and have my program working on Linux systems using mclapply from the parallel package. However, I've hit a roadblock with my understanding of parLapply for Windows.
Using mclapply I can set the number of cores, iterations, and pass that to an existing function in my workspace.
mclapply(1:8, function(z) adder(z, 100), mc.cores=4)
I don't seem to be able to achieve the same on Windows using parLapply. As I understand it, I need to pass all the variables through using clusterExport() and pass the actual function I want to apply as an argument.
Is this correct or is there something similar to the mclapply function that's applicable to Windows?

The beauty of mclapply is that the worker processes are all created as clones of the master right at the point that mclapply is called, so you don't have to worry about reproducing your environment on each of the cluster workers. Unfortunately, that isn't possible on Windows.
When using parLapply, you generally have to perform the following additional steps:
Create a PSOCK cluster
Register the cluster if desired
Load necessary packages on the cluster workers
Export necessary data and functions to the global environment of the cluster workers
Also, when you're done, it's good practice to shut down the PSOCK cluster using stopCluster.
Here's a translation of your example to parLapply:
library(parallel)
cl <- makePSOCKcluster(4)
setDefaultCluster(cl)
adder <- function(a, b) a + b
clusterExport(NULL, c('adder'))
parLapply(NULL, 1:8, function(z) adder(z, 100))
If your adder function requires a package, you'll have to load that package on each of the workers before calling it with parLapply. You can do that quite easily with clusterEvalQ:
clusterEvalQ(NULL, library(MASS))
Note that the NULL first argument to clusterExport, clusterEvalQ and parLapply indicates that they should use the cluster object registered via setDefaultCluster. That can be very useful if your program uses mclapply in many different functions, so that you don't have to pass the cluster object to every function that needs it when converting your program to use parLapply.
Of course, adder may call other functions in your global environment which call other functions, etc. In that case, you'll have to export them as well and load any packages that they need. Also note that if any variables that you've exported change during the course of your program, you will have to export them again in order to update them on the cluster workers. That isn't necessary with mclapply, because it forks fresh copies of the master process every time it is called.
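For example, here is a minimal sketch of re-exporting a changed variable, continuing the example above (the variable name offset is made up for illustration):
offset <- 100
clusterExport(NULL, 'offset')
parLapply(NULL, 1:8, function(z) adder(z, offset))
# If 'offset' changes later, export it again before the next call,
# otherwise the workers still see the old value:
offset <- 200
clusterExport(NULL, 'offset')
parLapply(NULL, 1:8, function(z) adder(z, offset))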

mclapply is simpler to use, and relies on the operating system's fork() functionality to achieve parallelization. However, since Windows does not have fork(), mclapply can only run serially there (mc.cores must be 1 on Windows), effectively behaving like a standard lapply with no parallelization.
parLapply is a different beast. It will create a cluster of processes, which could even reside on different machines on your network, and they communicate via TCP/IP in order to pass the tasks and results between each other.
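For example, here is a rough sketch of a PSOCK cluster that spans two machines (the hostname node1.example.com is hypothetical; the remote machine must be reachable, typically via SSH, and have R installed):
library(parallel)
hosts <- c("localhost", "localhost", "node1.example.com", "node1.example.com")
cl <- makePSOCKcluster(hosts)
parLapply(cl, 1:8, function(z) z^2)
stopCluster(cl)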
The problem in your code is that you didn't realize the first parameter to parLapply should be a "cluster" object. The simplest example of using parLapply on a single machine that I can think of is this:
library(parallel)
# Spawn child processes using fork() on the local machine
cl <- makeForkCluster(getOption("cl.cores", 2))
# Use parLapply to calculate lengths of 1000 strings
text = rep("Hello, world!", 1000)
len = parLapply(cl, text, nchar)
# Kill child processes since they are no longer needed
stopCluster(cl)
Using parLapply with a cluster created using makeForkCluster as above is functionally equivalent to calling mclapply, so it will also not work on Windows. :) Take a look at the other ways you can create a cluster with makeCluster and makePSOCKcluster in the documentation and check out what works best for your requirements.
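For comparison, here is a minimal sketch of the same string-length example with a PSOCK cluster, which does work on Windows:
library(parallel)
# Spawn worker processes that communicate over sockets (works on Windows)
cl <- makePSOCKcluster(getOption("cl.cores", 2))
text <- rep("Hello, world!", 1000)
len <- parLapply(cl, text, nchar)
stopCluster(cl)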

This is how I would use parLapply for parallel computations:
library(parallel)
##
test_func0 <- function(i) {
  lfactorial(i) + q + d
}
##
x <- 1:100
q <- 20
d <- 25
##
t1 <- Sys.time()
##
clust <- makeCluster(4)
## clusterExport copies these objects from the specified environment to the workers of clust
clusterExport(clust, c("test_func0", "q", "d"), envir = environment())
a <- do.call(c, parLapply(clust, x, function(i) {
  test_func0(i)
}))
print(Sys.time() - t1)
##
stopCluster(clust)
Please keep in mind that makeCluster creates a PSOCK cluster by default, so it is effectively the same as calling makePSOCKcluster here; the expensive step is exporting large objects to the workers with clusterExport, not the choice of constructor.
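A quick sketch to confirm this, using only the parallel package:
library(parallel)
cl1 <- makeCluster(4)    # default type is "PSOCK"
class(cl1)               # "SOCKcluster" "cluster"
stopCluster(cl1)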

Related

Is it possible to push/pull variables between two instances of R?

Suppose I have two instances of R running. Are there existing solutions to easily send variables/data from one instance to the other? Maybe even synchronize the values of a variable between the two instances?
For example, first the two instances (R1 & R2) would be connected somehow, then in R1:
> a <- 12
> push(a)
and at this point in R2:
> a
[1] 12
The keyword here is ease of use: make it as quick as possible (for the user) to interactively synchronize the value of certain variables. I would use this with Mathematica's RLink to work interactively in one R instance and push/pull data to/from Mathematica's instance.
I realize that the question might sound strange. The reason why I'm hopeful that something like this exists is that it would be useful for parallel or distributed computing as well (which is not my use case here).
Have a look at svSocket. From the package description (svSocket.pdf):
The SciViews svSocket package provides a stateful, multi-client and preemptive socket server. [...]
Although initially designed to serve GUI clients, the R socket server can also be used to exchange data between separate R processes.
This demo video is really worth it.
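A rough sketch of how this can look, assuming svSocket's startSocketServer() and evalServer() behave as described in that documentation (the port number is arbitrary; check ?evalServer for the exact calling convention):
## In the first R instance (R1), start the server:
library(svSocket)
startSocketServer(port = 8888)
## In the second R instance (R2), connect and exchange data:
library(svSocket)
con <- socketConnection(host = "localhost", port = 8888)
evalServer(con, a, 12)   # assumed usage: assign 12 to 'a' in R1 (push)
evalServer(con, a)       # assumed usage: evaluate 'a' in R1 and return it (pull)
close(con)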
This is a different approach to the push/pull model, but you can use the bigmemory package to create a matrix that exists in shared memory (or on disk) that can be accessed across multiple R sessions on the same machine:
R session 1
library(bigmemory)
m <- matrix(1:9, 3, 3)
m <- as.big.matrix(m, type="double", backingfile="m.bin", descriptorfile="m.desc")
m
# An object of class "big.matrix"
# Slot "address":
# <pointer: 0x7fba95004ee0>
R session 2
library(bigmemory)
m <- attach.big.matrix("m.desc")
# Now any changes you make to m will be reflected in both sessions!
This is also useful for parallel computing on matrices, since you're now only passing around a pointer to the matrix to each of the spawned R sessions, rather than the whole object.
Since we've created a file-backed big matrix, it also allows you to create and operate on matrices larger than memory!
Parallel example
library(bigmemory)
library(doMC) # Windows users will need to choose a different parallel backend
library(foreach)
registerDoMC(4) # number of cores (new R sessions to spawn) to run in parallel.
m <- matrix(rnorm(1000*1000), 1000)
as.big.matrix(m, type="double", backingfile="m.bin", descriptorfile="m.desc")
# Just to make sure we don't have any of these objects in memory when we spawn the
# parallel sessions
rm(m)
gc()
foreach(i = 1:4) %dopar% {
  m <- attach.big.matrix("m.desc")
  # do something!
}
I think Redis can help you achieve what you want.
You can use the R packages rredis and/or RcppRedis
On the first instance of R you can do
library(rredis)
redisConnect()
redisSet("a", 12)
[1] "OK"
Then on the second R instance you can then do
library(rredis)
redisConnect()
redisGet("a")
[1] 12

R: launch separate threads for function and wait for result

I have a function fA in R that calls the same function fB three times, like:
fA <- function(x){
  r1 <- fB(param1)
  r2 <- fB(param2)
  r3 <- fB(param3)
  return(c(r1, r2, r3))
}
The parameters of the fB calls are computed inside the fA function. But to go faster, how can I launch each fB call in the background and wait for the results (so that the three fB calls are executed in parallel)?
Thanks
Not too long ago, the parallel package was added to the R core. Have a look at functions like mclapply and parLapply for functions that mimic the behavior of lapply but execute in parallel. mclapply uses process forking, and parLapply uses clusters (e.g. a SOCK cluster). I would study the documentation of the parallel package to see what your specific situation requires.
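A minimal sketch of what that could look like with mclapply, reusing fB, param1, param2 and param3 from the question (on Windows you would create a cluster and use parLapply instead):
library(parallel)
fA <- function(x) {
  params <- list(param1, param2, param3)      # computed inside fA as before
  res <- mclapply(params, fB, mc.cores = 3)   # run the three fB calls in parallel
  unlist(res)                                 # equivalent to c(r1, r2, r3) if fB returns scalars
}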

Is it possible to do a common plot in parallel (with foreach)?

The problem in short: I have to visualise an optimisation process on a plot. On a pre-plotted device the new solutions (as points(x,y)) should appear as they become available. I have working sequential code using a nested loop, but it should be possible to make it parallel. I gave it a try with the foreach package (with the doParallel backend), but the spawned processes cannot plot on the parent process's device - obviously.
Is it possible to overcome this, maybe with another package or in some other way?
I don't think it's possible to have one process plot to another process's graphics device in R. Instead, the processes need to send messages to a designated process that does all of the graphics operations. That would be rather difficult using parallel or foreach/doParallel, but it's relatively easy using foreach with a parallel backend that supports processing worker results on-the-fly, such as doMPI or doRedis. In that case, you can plot the points in the combine function as they are computed by the workers. Here's a simple example:
library(doMPI)
nworkers <- 4
cl <- startMPIcluster(nworkers)
comm <- cl$comm # get communicator number
registerDoMPI(cl)
cap <- capabilities()
if (cap['aqua']) quartz() else if (cap['X11']) X11() else windows()
plot(integer(0), integer(0),
     main='Random points generated by doMPI workers',
     xlab='X', ylab='Y', xlim=c(1,100), ylim=c(1,100))
legend('topright', sprintf('worker %d', 1:nworkers), pch=1:nworkers)
# The argument "p" is a list of arguments to the function "points"
plotpoint <- function(x, p) {
  do.call('points', p)
  x
}
foreach(i=icount(100), .combine='plotpoint',
        .init=NULL, .inorder=FALSE) %dopar% {
  Sys.sleep(abs(rnorm(1, mean=3)))
  list(x=sample(100,1), y=sample(80,1), pch=mpi.comm.rank(comm))
}
text(50, 90, 'Finished')
Sys.sleep(10)
closeCluster(cl)
mpi.quit()
I'm throwing away the results since they are only used to call points, but you could modify plotpoint to save some part of the result in the first argument x. Just make sure that you modify the .init argument to the appropriate data structure.
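For example, a small sketch of such a modification (hypothetical; it plots each point and also keeps it in a growing list):
plotpoint <- function(acc, p) {
  do.call('points', p)
  c(acc, list(p))   # accumulate each worker result as it arrives
}
# and pass .init=list() to foreach so the accumulator starts as an empty list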
You might be able to try something like this:
foreach(i=1:10) %dopar% with(env.profile(.GlobalEnv), #Your Code#)
I'm not sure if this would work or not. You may be running into problems because of the sequential way that plotting with base R graphics works. You might also try creating a list and just adding the results of each iteration to the list during the %dopar% call. After it's finished, you should be able to just use lapply(point.list, points).
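A small sketch of that second idea (the random coordinates stand in for the optimiser's solutions; doParallel is just one possible backend):
library(foreach)
library(doParallel)
registerDoParallel(2)
# Each iteration returns the coordinates of one solution
point.list <- foreach(i = 1:10) %dopar% {
  list(x = runif(1, 1, 100), y = runif(1, 1, 100))
}
plot(integer(0), integer(0), xlim = c(1, 100), ylim = c(1, 100), xlab = 'X', ylab = 'Y')
invisible(lapply(point.list, points))   # points() accepts a list with x and y components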

Asynchronous command dispatch in interactive R

I'm wondering if this is possible to do (it probably isn't) using one of the parallel processing backends in R. I've tried a few google searches and come up with nothing.
The general problem I have at the moment:
I have some large objects that take about half an hour to load
I want to generate a series of plots on the data (takes a few minutes).
I want to go and do other things with the data while this happens (Not changing the underlying data though!)
Ideally I would be able to dispatch the command from the interactive session, and not have to wait for it to return (so I can go do other things while I wait for the plot to render). Is this possible, or is this a case of wishful thinking?
To expand on Dirk's answer, I suggest that you use the "snow" API in the parallel package. The mcparallel function would seem to be perfect for this (if you're not using Windows), but it doesn't work well for performing graphics operations due to its use of fork. The problem with the "snow" API is that it doesn't officially support asynchronous operations. However, it's rather easy to do if you don't mind cheating by using non-exported functions. If you look at the code for clusterCall, you can figure out how to submit tasks asynchronously:
> library(parallel)
> clusterCall
function (cl = NULL, fun, ...)
{
    cl <- defaultCluster(cl)
    for (i in seq_along(cl)) sendCall(cl[[i]], fun, list(...))
    checkForRemoteErrors(lapply(cl, recvResult))
}
So you just use sendCall to submit a task, and recvResult to wait for the result. Here's an example of that using the bigmemory package, as suggested by Dirk.
You can create a "big matrix" using functions such as big.matrix or as.big.matrix. You'll probably want to do that efficiently, but I'll just convert a matrix z using as.big.matrix:
library(bigmemory)
big <- as.big.matrix(z)
Now I'll create a cluster and connect each of the workers to big using describe and attach.big.matrix:
cl <- makePSOCKcluster(2)
worker.init <- function(descr) {
  library(bigmemory)
  big <<- attach.big.matrix(descr)
  X11() # use "quartz()" on a Mac; "windows()" on Windows
  NULL
}
clusterCall(cl, worker.init, describe(big))
This also opens a graphics window on each worker in addition to attaching to the big matrix.
To call persp on the first cluster worker, we use sendCall:
parallel:::sendCall(cl[[1]], function() {persp(big[]); NULL}, list())
This returns almost immediately, although it may take a while until the plot appears. At this point, you can submit tasks to the other cluster worker, or do something else that is completely unrelated. Just make sure that you read the result before submitting another task to the same worker:
r1 <- parallel:::recvResult(cl[[1]])
Of course, this is all very error prone and not at all pretty, but you could write some functions to make it easier. Just keep in mind that non-exported functions such as these can change with any new release of R.
Note that it is perfectly possible and legitimate to execute a task on a specific worker or set of workers by subsetting the cluster object. For example:
clusterEvalQ(cl[1], persp(big[]))
This will send the task to the first worker while the others do nothing. But of course, this is synchronous, so you can't do anything on the other cluster workers until this task finishes. The only way that I know to send the tasks asynchronously is to cheat.
R is, and will remain, single-threaded.
But you can share resources. One approach would be to load your big data in one session, assign it to a bigmemory object, and then share the 'handle' to that object with other R sessions on the same box. That should be reasonably easy on a decent Linux box with sufficient RAM (i.e. a low multiple of all your data needs).

How to avoid duplicating objects with foreach

I have a huge string vector and would like to do parallel computing using the foreach and doSNOW packages. I noticed that foreach makes copies of the vector for each process, thus exhausting system memory quickly. I tried to break the vector into smaller pieces in a list object, but still do not see any memory usage reduction. Does anyone have thoughts on this? Below is some demo code:
library(foreach)
library(doSNOW)
library(snow)
x <- rep('some string', 200000000)
# split x into smaller pieces in a list object
splits <- getsplits(x, mode = 'bysize', size = 1000000)
tt <- vector('list', length(splits$start))
for (i in 1:length(tt)) tt[[i]] <- x[splits$start[i]:splits$end[i]]
ret <- foreach(i = 1:length(splits$start), .export = c('somefun'), .combine = c) %dopar% somefun(tt[[i]])
The style of iterating that you're using generally works well with the doMC backend because the workers can effectively share tt by the magic of fork. But with doSNOW, tt will be auto-exported to the workers, using lots of memory even though they only actually need a fraction of it. The suggestion made by @Beasterfield to iterate directly over tt resolves that issue (see the one-line sketch below), but it's possible to be even more memory efficient through the use of iterators and an appropriate parallel backend.
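A minimal sketch of that suggestion, iterating over the list elements themselves so each task only carries one chunk:
ret <- foreach(chunk = tt, .export = 'somefun', .combine = c) %dopar% somefun(chunk)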
In cases like this, I use the isplitVector function from the itertools package. It splits a vector into a sequence of sub-vectors, allowing them to be processed in parallel without losing the benefits of vectorization. Unfortunately, with doSNOW, it will put these sub-vectors into a list in order to call the clusterApplyLB function in snow since clusterApplyLB doesn't support iterators. However, the doMPI and doRedis backends will not do that. They will send the sub-vectors to the workers right from the iterator, using almost half as much memory.
Here's a complete example using doMPI:
suppressMessages(library(doMPI))
library(itertools)
cl <- startMPIcluster()
registerDoMPI(cl)
n <- 20000000
chunkSize <- 1000000
x <- rep('some string', n)
somefun <- function(s) toupper(s)
ret <- foreach(s=isplitVector(x, chunkSize=chunkSize), .combine='c') %dopar% {
  somefun(s)
}
print(length(ret))
closeCluster(cl)
mpi.quit()
When I run this on my MacBook Pro with 4 GB of memory
$ time mpirun -n 5 R --slave -f split.R
it takes about 16 seconds.
You have to be careful with the number of workers that you create on the same machine, although decreasing the value of chunkSize may allow you to start more.
You can decrease your memory usage even more if you're able to use an iterator that doesn't require all of the strings to be in memory at the same time. For example, if the strings are in a file named 'strings.txt', you can use s=ireadLines('strings.txt', n=chunkSize).
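A small sketch of that variant, assuming 'strings.txt' holds one string per line and a parallel backend has already been registered (as in the doMPI example above):
library(foreach)
library(iterators)   # provides ireadLines()
chunkSize <- 1000000
somefun <- function(s) toupper(s)
ret <- foreach(s = ireadLines('strings.txt', n = chunkSize), .combine = 'c') %dopar% {
  somefun(s)
}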
