How to include an object in a foreach function in R

The first code chunk below complains that it cannot find the object "M". The second chunk does the same work, but not wrapped inside a function, and behaves as expected.
This is just a toy example and obviously reproduces rowSums, but any pointers on how the first function could find M?
## First Chunk
library(doParallel)
M <- matrix(rnorm(100), 10, 10)
myFun <- function(x){
  cl <- makeCluster(4)
  registerDoParallel(cl)
  res <- foreach(i=1:nrow(M), .combine='c') %dopar% {
    sum(M[i,]) + x
  }
  stopCluster(cl)
}
myFun(0)
## Second works fine
x <- 0
cl <- makeCluster(4)
registerDoParallel(cl)
res <- foreach(i=1:nrow(M), .combine='c') %dopar% {
  sum(M[i,]) + x
}
stopCluster(cl)
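A likely explanation, offered as a sketch rather than a definitive diagnosis: foreach automatically exports to the workers only the variables it finds in the environment where foreach is called. Inside myFun that is the function's own environment, which holds x but not the global M. Passing M as an argument makes it local to the function; adding .export = "M" to the foreach call is an alternative. The two-argument signature, the two-worker cluster, and the on.exit cleanup below are illustrative choices, not part of the original code.

```r
library(doParallel)

M <- matrix(rnorm(100), 10, 10)

# Assumption: M is passed in explicitly so that it lives in the
# function's environment, where foreach looks for variables to export.
myFun <- function(x, M) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)  # always release the workers
  registerDoParallel(cl)
  # The value of the foreach call is the function's return value.
  foreach(i = seq_len(nrow(M)), .combine = "c") %dopar% {
    sum(M[i, ]) + x
  }
}

res <- myFun(0, M)
```

Because the foreach call is the last expression in the function, myFun now also returns the result, which the original version did not.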

Related

Stop criterion in foreach

I need to stop the parallel loop when a condition is met, e.g., when res < 1. An MWE is given by the code below:
library(foreach)
library(doParallel)
I <- 1000
L <- 1000
res <- Inf
cores <- detectCores()
cluster <- makeCluster(cores)
registerDoParallel(cluster)
out <- foreach(l = 1:I, .packages = "cec2013") %dopar% {
  for(i in 1:I){
    res <- 100/i
  }
  out <- res
  out
}
out
My attempt at a solution is:
out <- foreach(l = 1:I, .packages = "cec2013") %:% when(res < 1) %dopar% {
  for(i in 1:I){
    res <- 100/i
  }
  out <- res
  out
}
out
But out is returned as an empty list.
This happens because when(res < 1) is evaluated in the calling environment, not in the environment inside the loop body. You defined res <- Inf there, which is never less than 1, so every iteration is filtered out.
Also, the body of your foreach loop always returns 0.1: the inner for loop always runs through i = 1000, so res ends at 100/1000 = 0.1. It's unclear what you're trying to do, or when you expect the loop to stop.
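To illustrate how when behaves, here is a minimal sketch in which the condition refers to the iteration variable itself, so it can differ from one iteration to the next. It uses %do% so that it runs without a cluster, and the expression 100 / i is only a stand-in for whatever the real computation is. Note that when filters iterations; it does not stop the loop early.

```r
library(foreach)

# The when() condition is checked once per iteration, using the
# iteration variable i; only iterations with 100 / i < 1 are evaluated.
out <- foreach(i = 1:1000, .combine = c) %:% when(100 / i < 1) %do% {
  100 / i
}

length(out)  # number of iterations that passed the filter
```

Only the iterations with i > 100 pass the filter, so the result holds 900 values, all below 1.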

R: Parallel execution nested within a sequential loop with dependency

Let's say I have two functions f1 and f2. f2 is designed to take the output of f1 as an argument, and f1 is designed to take its own output to update it. Before the loop starts, output from f1 is initialized. Then within each iteration, f2 takes the previous output from f1 and executes, then f1 executes to update its own output. Two vectors will gather the sequential output from f1 and f2 respectively. The following code is a simple working example:
f1 <- function(x) return(x + pi)
f2 <- function(x) return(log(x))
f1.result <- res1 <- f1(1)
f2.result <- NULL
for(i in 2:100) { ## Need to parallelize these two lines ##
  res2 <- f2(res1); f2.result <- c(f2.result, res2)
  res1 <- f1(res1); f1.result <- c(f1.result, res1)
}
I am looking to parallelize the two executions inside the loop, i.e., to have them run at the same time. How do I achieve this in R? I am familiar with the basics of foreach but can't figure this out. Thanks.
OK I think I figured this out. It's actually pretty simple. I use the doParallel package:
f1 <- function(x) return(x + pi)
f2 <- function(x) return(log(x))
f1.result <- res1 <- f1(1)
f2.result <- NULL
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
getDoParWorkers()
for(j in 2:100) {
  res <- foreach(i = 1:2, .combine = c) %dopar% {
    if(i==1) res <- f1(res1)
    else res <- f2(res1)
  }
  res1 <- res[1]; f1.result <- c(f1.result, res1)
  res2 <- res[2]; f2.result <- c(f2.result, res2)
}
stopCluster(cl)
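An alternative sketch, under the assumption that f2 never feeds back into f1 (true for the toy functions above): the only real dependency is the f1 chain. That chain can be computed sequentially first, and the f2 calls, which are then independent of one another, can be spread over the workers in a single foreach. This avoids launching a tiny two-task foreach on every iteration. The two-worker cluster is an illustrative choice.

```r
library(doParallel)

f1 <- function(x) x + pi
f2 <- function(x) log(x)

# Step 1: the f1 chain depends on its own previous value, so it
# stays sequential.
f1.result <- numeric(100)
f1.result[1] <- f1(1)
for (i in 2:100) {
  f1.result[i] <- f1(f1.result[i - 1])
}

# Step 2: f2 is applied to each of the first 99 f1 outputs; these
# calls do not depend on each other and can run in parallel.
cl <- makeCluster(2)
registerDoParallel(cl)
f2.result <- foreach(v = f1.result[-100], .combine = c) %dopar% f2(v)
stopCluster(cl)
```

For functions as cheap as these the parallel overhead will dominate; the split only pays off when f2 is expensive.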

How to use foreach to calculate each element in the upper triangular matrix?

I want to calculate each element in the upper triangular matrix using the foreach function
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
tempdata <- matrix(0, nrow = 10, ncol = 10)
tempdata2 <- matrix(0, nrow = 10, ncol = 10)
foreach (i = 1:9, .combine='rbind') %do% {
  for (j in (i+1):10) {
    tempdata[i, j] <- i+j
    tempdata2[i, j] <- i*j
  }
}
It works when I use %do%, but when I use %dopar% I get nothing back.
What am I doing wrong? Thank you. Any suggestion will be appreciated.
You can't modify variables defined outside of the foreach loop and expect that data to be sent back to the master process. for loops allow that kind of side effect, but it doesn't work in parallel computing unless the workers are threads within the same process, and that isn't supported by any of the R parallel processing packages because R is single threaded.
Instead, you need to return a value from the body of the foreach loop and combine those values to get the desired result. In your case, you compute two values per iteration of the foreach loop, so you have to bundle them into a list, which means you need a more complicated combine function. Here's one way to do it:
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
comb <- function(...) {
  mapply(rbind, ..., SIMPLIFY=FALSE)
}
r <- foreach(i=1:9, .combine='comb', .multicombine=TRUE) %dopar% {
  tmp <- double(10)
  tmp2 <- double(10)
  for(j in (i+1):10) {
    tmp[j] <- i+j
    tmp2[j] <- i*j
  }
  list(tmp, tmp2)
}
tempdata <- r[[1]]
tempdata2 <- r[[2]]
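Note that r[[1]] and r[[2]] have one row per foreach iteration, i.e. 9 rows rather than the 10 of the original tempdata. The following sketch uses %do% so that it runs without a cluster, pads the missing last row with zeros, and checks the result against outer(). The padding step and the expected matrices are illustrative additions, not part of the answer above.

```r
library(foreach)

# Combine function: row-bind the first elements together and the
# second elements together.
comb <- function(...) {
  mapply(rbind, ..., SIMPLIFY = FALSE)
}

r <- foreach(i = 1:9, .combine = comb, .multicombine = TRUE) %do% {
  tmp <- double(10)
  tmp2 <- double(10)
  for (j in (i + 1):10) {
    tmp[j] <- i + j
    tmp2[j] <- i * j
  }
  list(tmp, tmp2)
}

# Row 10 has no entries above the diagonal, so append a row of zeros.
tempdata <- rbind(r[[1]], 0)
tempdata2 <- rbind(r[[2]], 0)

# Reference result: keep only the strict upper triangle of outer().
expected <- outer(1:10, 1:10, "+")
expected[!upper.tri(expected)] <- 0
expected2 <- outer(1:10, 1:10, "*")
expected2[!upper.tri(expected2)] <- 0
stopifnot(all(tempdata == expected), all(tempdata2 == expected2))
```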

R Foreach Iterator - Walkforward

How can I create a "walkforward" iterator using the iterators package? How can an iterator be created where each nextElem returns a fixed moving window?
For example, let's say we have a 10x10 matrix. Each iterator element should be a group of rows: the first element is rows 1:5, the second is 2:6, then 3:7, 4:8, etc.
How can I turn x into a walkforward iterator:
x <- matrix(1:100, 10)
EDIT: To be clear, I would like to use the resulting iterator in a parallel foreach loop.
foreach(i = iter(x), .combine=rbind) %dopar% myFun(i)
You could use an iterator that returns overlapping sub-matrices as you describe, but that would use much more memory than is required. It would be better to use an iterator that returns the indices of those sub-matrices. Here's one way to do that:
iwalk <- function(n, m) {
  if (m > n)
    stop('m > n')
  it <- icount(n - m + 1)
  nextEl <- function() {
    i <- nextElem(it)
    c(i, i + m - 1)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
This function uses the icount function from the iterators package so that I don't have to worry about details such as throwing the "StopIteration" exception, for example. That's a technique that I describe in the "Writing Custom Iterators" vignette.
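To see what the iterator yields, here is a small sketch that drains iwalk sequentially with %do% for n = 10 rows and a window of m = 5. The iwalk definition is repeated from above so that the block runs on its own; the variable name windows is just for illustration.

```r
library(foreach)
library(iterators)

# Same iwalk as above: yields c(first, last) row indices of each window.
iwalk <- function(n, m) {
  if (m > n)
    stop('m > n')
  it <- icount(n - m + 1)
  nextEl <- function() {
    i <- nextElem(it)
    c(i, i + m - 1)
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}

# Collect every window; foreach stops when icount signals StopIteration.
windows <- foreach(ix = iwalk(10, 5)) %do% ix

# One row per window: start index, end index.
print(do.call(rbind, windows))
```

This gives six windows, starting at rows 1 through 6 and each spanning five rows.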
If you were using the doMC parallel backend, you could use this iterator as follows:
library(doMC)
nworkers <- 3
registerDoMC(nworkers)
x <- matrix(1:100, 10)
m <- 5
r1 <- foreach(ix=iwalk(nrow(x), m)) %dopar% {
  x[ix[1]:ix[2],, drop=FALSE]
}
This works nicely with doMC since each of the workers inherits the matrix x. However, if you're using doParallel with a cluster object or the doMPI backend, it would be nice to avoid exporting the entire matrix x to each of the workers. In that case, I would create an iterator function to send the overlapping sub-matrices of x to each of the workers, and then use iwalk to iterate over those sub-matrices:
ioverlap <- function(x, m, chunks) {
  if (m > nrow(x))
    stop('m > nrow(x)')
  i <- 1
  it <- idiv(nrow(x) - m + 1, chunks=chunks)
  nextEl <- function() {
    ntasks <- nextElem(it)
    ifirst <- i
    ilast <- i + ntasks + m - 2
    i <<- i + ntasks
    x[ifirst:ilast,, drop=FALSE]
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
library(doParallel)
nworkers <- 3
cl <- makePSOCKcluster(nworkers)
registerDoParallel(cl)
x <- matrix(1:100, 10)
m <- 5
r2 <- foreach(y=ioverlap(x, m, nworkers), .combine='c',
              .packages=c('foreach', 'iterators')) %dopar% {
  foreach(iy=iwalk(nrow(y), m)) %do% {
    y[iy[1]:iy[2],, drop=FALSE]
  }
}
In this case I'm using iwalk on the workers, not the master, which is why the iterators package must be loaded by each of the workers.

R foreach %dopar% : export results to main R process

%dopar% forks the main R process into several independent sub-processes. Is there a way to make these sub-processes communicate with the main R process, so that data can be 'recovered'?
require(foreach)
require(doMC)
registerDoMC()
options(cores = 2 )
a <- c(0,0)
foreach(i = 1:2) %do% {
  a[i] <- i
}
print(a) # returns 1 2
a <- c(0,0)
foreach(i = 1:2) %dopar% {
  a[i] <- i
}
print(a) # returns 0 0
Thanks!
You should read the foreach documentation:
The foreach and %do%/%dopar% operators provide a looping construct
that can be viewed as a hybrid of the standard for loop and lapply
function. It looks similar to the for loop, and it evaluates an
expression, rather than a function (as in lapply), but its purpose is
to return a value (a list, by default), rather than to cause
side-effects.
Try this:
a <- foreach(i = 1:2) %dopar% {
  i
}
print(unlist(a))
If you want your result to be a data frame, you could do:
library(data.table)
result <- foreach(i = 1:2) %dopar% {
  i
}
result.df <- rbindlist(Map(as.data.frame, result))
Thanks to Karl, I now understand the purpose of '.combine'
a <- foreach(i = 1:2, .combine=c) %dopar% {
  return(i)
}
print(a) # returns 1 2
