Make .combine function scalable - r

I am trying to use foreach and am having problems making the .combine function scalable. For example, here is a simple combine function
MyComb <- function(part1,part2){
xs <- c(part1$x,part2$x)
ys <- c(part1$y,part2$y)
return(list(xs,ys))
}
When I use this function to combine the results of a foreach loop with anything other than 2 iterations, the result is wrong. For example, this works:
x = foreach(i=1:2,.combine=MyComb) %dopar% list("x"=i*2,"y"=i*3)
But not this:
x = foreach(i=1:3,.combine=MyComb) %dopar% list("x"=i*2,"y"=i*3)
Is there a way to generalize the combine function to make it scalable to n iterations?

Your .combine function must either take two pieces and return something that "looks" like a piece (so that it can be passed back in as an argument), or take many arguments and put all of them together at once (with the same restriction). So at the very least, MyComb must return a list with components x and y, which is what each iteration of your %dopar% loop returns.
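To see why the original MyComb breaks once a third piece arrives, the pairwise fold can be simulated by hand in base R, without any parallel backend. This is only a sketch of what happens (the exact order in which foreach calls the combine function may differ), and `pieces`, `step1`, and `step2` are names made up for the illustration:

```r
# Original combine function from the question: it returns an UNNAMED list.
MyComb <- function(part1, part2) {
  xs <- c(part1$x, part2$x)
  ys <- c(part1$y, part2$y)
  return(list(xs, ys))
}

# Stand-ins for the three results the loop body would produce.
pieces <- lapply(1:3, function(i) list(x = i * 2, y = i * 3))

# First combine: fine, because both inputs have components x and y.
step1 <- MyComb(pieces[[1]], pieces[[2]])
names(step1)   # NULL: the names x and y have been lost

# Second combine: step1$x and step1$y are NULL, so the earlier values vanish.
step2 <- MyComb(step1, pieces[[3]])
str(step2)
```

Because `step1` has no `x` or `y` component, only the values from the third piece survive the second call.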
A couple of ways to do this:
MyComb1 <- function(part1, part2) {
list(x=c(part1$x, part2$x), y=c(part1$y, part2$y))
}
x = foreach(i=1:3,.combine=MyComb1) %dopar% list("x"=i*2,"y"=i*3)
This version takes only two pieces at a time.
MyComb2 <- function(...) {
dots = list(...)
ret <- lapply(names(dots[[1]]), function(e) {
unlist(sapply(dots, '[[', e))
})
names(ret) <- names(dots[[1]])
ret
}
s = foreach(i=1:3,.combine=MyComb2) %dopar% list("x"=i*2,"y"=i*3)
x = foreach(i=1:3,.combine=MyComb2, .multicombine=TRUE) %dopar% list("x"=i*2,"y"=i*3)
This one can take multiple pieces at a time and combine them. It is more general (but more complex).
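A shorter variant of the same idea can be sketched with base R's `Map`. `MyComb3` is a hypothetical name, and the sketch assumes that every piece lists its components in the same order, since `Map` matches elements by position rather than by name:

```r
# Hypothetical alternative: concatenate the i-th component of every piece.
# Names are taken from the first argument.
MyComb3 <- function(...) Map(c, ...)

p1 <- list(x = 2, y = 3)
p2 <- list(x = 4, y = 6)
p3 <- list(x = 6, y = 9)

# All pieces at once (what .multicombine=TRUE would do).
all_at_once <- MyComb3(p1, p2, p3)

# Two at a time (what the default pairwise combining would do).
pairwise <- MyComb3(MyComb3(p1, p2), p3)

str(all_at_once)
```

Since the function accepts any number of arguments and returns something shaped like a piece, it should behave the same with or without `.multicombine=TRUE`.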

Related

Foreach .combine Function to combine lists in R

the following is a parallel loop I am trying to run in R:
cl <- makeCluster(30,type="SOCK")
registerDoSNOW(cl)
results <- foreach (i = 1:30, .combine='bindlist', .multicombine=TRUE) %dopar% {
test <- i
test <- as.list(test)
list(test)
}
stopCluster(cl)
The output of my code is always a list and I want to combine the list into one large list. Thus I wrote the following .combine function:
bindlist <- function(x,y,...){
append(list(x),list(y),list(...))
}
As I am doing multiple runs and the number of variables change I tried to use .... However it does not work. How can I rewrite the .combine function so it can work with changing numbers of variables?
Have you considered using 'c'?
results <- foreach (i = 1:4, .combine='c', .multicombine=TRUE) %dopar% {
test <- i
test <- as.list(test)
list(test)
}
If this adds an additional unwanted 'level' to your results, you could use 'unlist' to remove that level.
unlist(results, recursive = FALSE)
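The effect of the extra level can be checked on a small run. The sketch below assumes the foreach package is installed and uses `%do%` so that it runs sequentially without a cluster; `flat` is a name introduced for the illustration:

```r
library(foreach)

# Each iteration returns list(list(i)), as in the question.
results <- foreach(i = 1:4, .combine = 'c', .multicombine = TRUE) %do% {
  test <- i
  test <- as.list(test)
  list(test)
}

# 'c' concatenates the outer lists, so each element is still a list of length 1.
str(results)

# Removing one level gives a flat list with one element per iteration.
flat <- unlist(results, recursive = FALSE)
str(flat)
```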

How to output two vectors that are iteratively filled using foreach?

I am trying to translate a for loop to a loop using foreach.
I have tried several output methods, playing with the .combine argument, but I cannot output the two vectors that I create by first initializing them to hold 1e4 zeros and then refilling each entry at each iteration.
In particular, I cannot recover the vectors that are created in this way:
Va = numeric(1e4)
Vb = numeric(1e4)
result = foreach(j = 1:1e4, .multicombine=TRUE) %dopar%
{
... rest of the code ...
Va[j] = sample(4,1)
Vb[j] = sample(5,1)
list(retSLSP, retBH)
}
Note that j is the loop variable in the foreach loop. Note also that the computations I showed are not the actual computations I have in my code, but are equivalent for the purposes of the example.
You can use shared memory that all workers can access.
library(bigmemory)
V <- big.matrix(1e4, 2)
desc <- describe(V)
result = foreach(j = 1:1e4, .multicombine=TRUE) %dopar%
{
V <- bigmemory::attach.big.matrix(desc)
... rest of the code ...
V[j, 1] = sample(4,1)
V[j, 2] = sample(5,1)
list(retSLSP, retBH)
}
Va <- V[, 1]
Vb <- V[, 2]
rm(V, desc)
That said, it would be better to parallelize over blocks of iterations than over every single iteration of the loop.
An example: https://stackoverflow.com/a/45196081/6103040
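One possible reading of "parallelize by blocks" is sketched below. It is not the code from the linked answer: it returns the values from each block instead of writing to shared memory, and `n`, `n_blocks`, `blocks`, and `res` are hypothetical names. It assumes the foreach package is installed and uses `%do%` so that it runs without a registered backend; with a backend, `%dopar%` would take its place.

```r
library(foreach)

set.seed(42)
n <- 20          # stands in for 1e4
n_blocks <- 4    # hypothetical choice, e.g. roughly one block per worker

# Split the indices 1..n into contiguous blocks.
blocks <- split(seq_len(n), cut(seq_len(n), n_blocks, labels = FALSE))

# Each task fills in a whole block and returns it as a two-column matrix;
# rbind stacks the blocks back together in order.
res <- foreach(idx = blocks, .combine = 'rbind') %do% {
  cbind(Va = sample(4, length(idx), replace = TRUE),
        Vb = sample(5, length(idx), replace = TRUE))
}

Va <- res[, "Va"]
Vb <- res[, "Vb"]
```

With only a few large tasks, the per-task overhead is paid `n_blocks` times rather than `n` times.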

Building a function for .combine in foreach

I have a process I want to run in parallel, but it fails due to some strange error. Now I am considering using the combine step to recalculate the failing task on the master CPU. However, I don't know how to write such a function for .combine.
How should it be written?
I know how to write combine functions (this answer provides an example), but it doesn't show how to handle failing tasks or how to repeat a task on the master.
I would do something like:
foreach(i=1:100, .combine = function(x, y){tryCatch(?)}) %dopar% {
long_process_which_fails_randomly(i)
}
However, how do I use the input of that task in the .combine function (if that is even possible)? Or should the body of the %dopar% loop return a flag or a list so that the task can be recalculated?
To execute tasks in the combine function, you need to include extra information in the result object returned by the body of the foreach loop. In this case, that would be an error flag and the value of i. There are many ways to do this, but here's an example:
comb <- function(results, x) {
i <- x$i
result <- x$result
if (x$error) {
cat(sprintf('master computing failed task %d\n', i))
# Could call function repeatedly until it succeeds,
# but that could hang the master
result <- try(fails_randomly(i))
}
results[i] <- list(result) # guard against a NULL result
results
}
r <- foreach(i=1:100, .combine='comb',
.init=vector('list', 100)) %dopar% {
tryCatch({
list(error=FALSE, i=i, result=fails_randomly(i))
},
error=function(e) {
list(error=TRUE, i=i, result=e)
})
}
I'd be tempted to deal with this problem by executing the parallel loop repeatedly until all the tasks have been computed:
x <- rnorm(100)
results <- lapply(x, function(i) simpleError(''))
# Might want to put a limit on the number of retries
repeat {
ix <- which(sapply(results, function(x) inherits(x, 'error')))
if (length(ix) == 0)
break
cat(sprintf('computing tasks %s\n', paste(ix, collapse=',')))
r <- foreach(i=x[ix], .errorhandling='pass') %dopar% {
fails_randomly(i)
}
results[ix] <- r
}
Note that this solution uses the .errorhandling option which is very useful if errors can occur. For more information on this option, see the foreach man page.
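The retry loop can be tried out with a stand-in for the failing function. In the sketch below, `fails_randomly` is a made-up function that fails about 30% of the time, `max_tries` is a hypothetical retry limit, and `%do%` is used so that it runs without a registered backend (assuming the foreach package is installed):

```r
library(foreach)

set.seed(1)

# Hypothetical stand-in: fails roughly 30% of the time, otherwise returns i^2.
fails_randomly <- function(i) {
  if (runif(1) < 0.3) stop("random failure")
  i^2
}

x <- 1:20
results <- lapply(x, function(i) simpleError(''))  # mark every task as pending
max_tries <- 50                                    # hypothetical retry limit

for (attempt in seq_len(max_tries)) {
  # Indices of tasks that have not produced a value yet.
  ix <- which(sapply(results, function(r) inherits(r, 'error')))
  if (length(ix) == 0)
    break
  # 'pass' puts the error object into the result list instead of aborting.
  r <- foreach(i = x[ix], .errorhandling = 'pass') %do% {
    fails_randomly(i)
  }
  results[ix] <- r
}

still_failed <- sum(sapply(results, function(r) inherits(r, 'error')))
```

Using a bounded `for` loop instead of `repeat` is one way to add the retry limit mentioned in the comment above.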

Saving multiple outputs of foreach dopar loop

I would like to know if/how it would be possible to return multiple outputs as part of foreach dopar loop.
Let's take a very simplistic example. Let's suppose I would like to do 2 operations as part of the foreach loop, and would like to return or save the results of both operations for each value of i.
For only one output to return, it would be as simple as:
library(foreach)
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
oper1 <- foreach(i=1:100000) %dopar% {
i+2
}
oper1 would be a list with 100000 elements, each element is the result of the operation i+2 for each value of i.
Suppose now I would like to return or save the results of two different operations separately, e.g. i+2 and i+3. I tried the following:
oper1 = list()
oper2 <- foreach(i=1:100000) %dopar% {
oper1[[i]] = i+2
return(i+3)
}
hoping that the results of i+2 will be saved in the list oper1, and that the results of the second operation i+3 will be returned by foreach. However, nothing gets populated in the list oper1! In this case, only the result of i+3 gets returned from the loop.
Is there any way of returning or saving both outputs in two separate lists?
Don't try to use side-effects with foreach or any other parallel program package. Instead, return all of the values from the body of the foreach loop in a list. If you want your final result to be a list of two lists rather than a list of 100,000 lists, then specify a combine function that transposes the results:
comb <- function(x, ...) {
lapply(seq_along(x),
function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}
oper <- foreach(i=1:10, .combine='comb', .multicombine=TRUE,
.init=list(list(), list())) %dopar% {
list(i+2, i+3)
}
oper1 <- oper[[1]]
oper2 <- oper[[2]]
Note that this combine function requires the use of the .init argument to set the value of x for the first invocation of the combine function.
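If a custom combine function feels like too much machinery, the same split can be done after the loop has finished. The sketch below uses `lapply` as a stand-in for the default foreach result (a list with one element per iteration); `res` is a name introduced for the illustration:

```r
# Stand-in for: res <- foreach(i=1:10) %dopar% list(i+2, i+3)
res <- lapply(1:10, function(i) list(i + 2, i + 3))

# Pull out the first and second component of every iteration's result.
oper1 <- lapply(res, `[[`, 1)
oper2 <- lapply(res, `[[`, 2)

str(oper1[1:3])
```

This keeps all 10 small lists in memory until the loop ends, which is the trade-off compared with transposing inside the combine function.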
I prefer to use a class to hold multiple results for a %dopar% loop.
This example spins up 3 cores, calculates multiple results on each core, then returns the list of results to the calling thread.
Tested under RStudio, Windows 10, and R v3.3.2.
library(foreach)
library(doParallel)
# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $result1 and $result2.
# For a great tutorial on S3 classes, see:
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
multiResultClass <- function(result1=NULL,result2=NULL)
{
me <- list(
result1 = result1,
result2 = result2
)
## Set the name for the class
class(me) <- append(class(me),"multiResultClass")
return(me)
}
cl <- makeCluster(3)
registerDoParallel(cl)
oper <- foreach(i=1:10) %dopar% {
result <- multiResultClass()
result$result1 <- i+1
result$result2 <- i+2
return(result)
}
stopCluster(cl)
oper1 <- oper[[1]]$result1
oper2 <- oper[[1]]$result2
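The two lines above read the results of the first iteration only. To collect a property from every iteration, something like the following sketch could be used. It rebuilds `oper` with `lapply` as a stand-in for the `%dopar%` loop so that it runs without a cluster, and it assumes both results are single numbers:

```r
multiResultClass <- function(result1 = NULL, result2 = NULL) {
  me <- list(result1 = result1, result2 = result2)
  class(me) <- append(class(me), "multiResultClass")
  me
}

# Stand-in for the foreach loop: one multiResultClass object per iteration.
oper <- lapply(1:10, function(i) {
  result <- multiResultClass()
  result$result1 <- i + 1
  result$result2 <- i + 2
  result
})

# Gather each property across all iterations into a numeric vector.
oper1 <- vapply(oper, function(r) r$result1, numeric(1))
oper2 <- vapply(oper, function(r) r$result2, numeric(1))
```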
This toy example shows how to return multiple results from a %dopar% loop.
This example:
Spins up 3 cores.
Renders a graph on each core.
Returns the graph and an attached message.
Prints each graph and its attached message.
I found this really useful to speed up using Rmarkdown to print 1,800 graphs into a PDF document.
Tested under Windows 10, RStudio, and R v3.3.2.
R code:
# Demo of returning multiple results from a %dopar% loop.
library(foreach)
library(doParallel)
library(ggplot2)
cl <- makeCluster(3)
registerDoParallel(cl)
# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $resultPlot and $resultMessage.
# For a great tutorial on S3 classes, see:
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
plotAndMessage <- function(resultPlot=NULL,resultMessage="?")
{
me <- list(
resultPlot = resultPlot,
resultMessage = resultMessage
)
# Set the name for the class
class(me) <- append(class(me),"plotAndMessage")
return(me)
}
oper <- foreach(i=1:5, .packages=c("ggplot2")) %dopar% {
x <- c(i:(i+2))
y <- c(i:(i+2))
df <- data.frame(x,y)
p <- ggplot(df, aes(x,y))
p <- p + geom_point()
message <- paste("Hello, world! i=",i,"\n",sep="")
result <- plotAndMessage()
result$resultPlot <- p
result$resultMessage <- message
return(result)
}
# Print resultant plots and messages. Despite running on multiple cores,
# 'foreach' guarantees that the plots arrive back in the original order.
foreach(i=1:5) %do% {
# Print message attached to plot.
cat(oper[[i]]$resultMessage)
# Print plot.
print(oper[[i]]$resultPlot)
}
stopCluster(cl)

Next with Revolution R's foreach package?

I've looked through much of the documentation and done a fair amount of Googling, but can't find an answer to the following question: Is there a way to induce 'next-like' functionality in a parallel foreach loop using the foreach package?
Specifically, I'd like to do something like (this doesn't work with next but does without):
foreach(i = 1:10, .combine = "c") %dopar% {
n <- i + floor(runif(1, 0, 9))
if (n %% 3) {next}
n
}
I realize I can nest my brackets, but if I want to have a few next conditions over a long loop this very quickly becomes a syntax nightmare.
Is there an easy workaround here (either next-like functionality or a different way of approaching the problem)?
You could put your code in a function and call return. It's not clear from your example what you want returned when n %% 3 is nonzero, so I'll return NA.
funi <- function(i) {
n <- i + floor(runif(1, 0, 9))
if (n %% 3) return(NA)
n
}
foreach(i = 1:10, .combine = "c") %dopar% { funi(i) }
Although it seems strange, you can use a return in the body of a foreach loop, without the need for an auxiliary function (as demonstrated by @Aaron):
r <- foreach(i = 1:10, .combine='c') %dopar% {
n <- i + floor(runif(1, 0, 9))
if (n %% 3) return(NULL)
n
}
A NULL is returned in this example since it is filtered out by the c function, which can be useful.
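The filtering is a property of `c` itself and can be checked in plain R. The sketch below contrasts it with `list`, which keeps NULL entries; the variable names are made up for the illustration:

```r
# c() silently drops NULL arguments ...
combined <- c(NULL, 3, NULL, 6)
length(combined)   # two values remain

# ... whereas a list keeps them as elements.
kept <- list(NULL, 3, NULL, 6)
length(kept)       # four elements, two of them NULL
```

This is why the trick works with `.combine='c'` but not when the results are returned as a plain list.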
Also, although it doesn't work well for your example, the when function can take the place of next at times, and is useful for preventing the computation from taking place at all:
r <- foreach(i=1:5, .combine='c') %:%
foreach(j=1:5, .combine='c') %:%
when (i != j) %dopar% {
10 * i + j
}
The inner expression is only evaluated 20 times, not 25. This is particularly useful with nested foreach loops, since when has access to all of the upstream iterator values.
Update
If you want to filter out NULLs when returning the results in a list, you need to write your own combine function. Here's a complete example that demonstrates a combine function that works like the default combine function but includes a filtering mechanism:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
filteredlist <- function(a, ...) {
values <- list(...)
c(a, values[! sapply(values, is.null)])
}
r <- foreach(i=1:200, .combine='filteredlist', .init=list(),
.multicombine=TRUE) %dopar% {
# filter out odd values of i
if (i %% 2) return(NULL)
i
}
Note that this code works correctly when there are more than 100 task results (100 is the default value of the .maxcombine option).
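The combine function can also be exercised directly, without a cluster, to see how it behaves when results arrive in more than one batch. In the sketch below the two calls mimic two successive batches; the batch contents and the names `batch1` and `batch2` are invented for the illustration:

```r
filteredlist <- function(a, ...) {
  values <- list(...)
  c(a, values[! sapply(values, is.null)])
}

# First batch: 'a' is the .init value, the rest are task results.
batch1 <- filteredlist(list(), NULL, 2, NULL, 4)

# Second batch: 'a' is the accumulated list from the first call.
batch2 <- filteredlist(batch1, NULL, 6)

str(batch2)
```

Only the new values are filtered; the first argument is treated as the accumulated result and passed through unchanged, which is what lets the function be called repeatedly.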
