R Foreach Iterator - Walkforward

How can I create a "walkforward" iterator using the iterators package? How can an iterator be created where each nextElem returns a fixed moving window?
For example, let's say we have a 10x10 matrix. Each iterator element should be a group of rows: the first element is rows 1:5, the second is rows 2:6, then 3:7, 4:8, and so on.
How can I turn x into a walkforward iterator:
x <- matrix(1:100, 10)
EDIT: To be clear, I would like to use the resulting iterator in a parallel foreach loop.
foreach(i = iter(x), .combine=rbind) %dopar% myFun(i)

You could use an iterator that returns overlapping sub-matrices as you describe, but that would use much more memory than is required. It would be better to use an iterator that returns the indices of those sub-matrices. Here's one way to do that:
iwalk <- function(n, m) {
  if (m > n)
    stop('m > n')
  it <- icount(n - m + 1)
  nextEl <- function() {
    i <- nextElem(it)
    c(i, i + m - 1)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
This function uses the icount function from the iterators package, so I don't have to worry about details such as throwing the "StopIteration" exception. That's a technique that I describe in the "Writing Custom Iterators" vignette.
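For comparison, here is a minimal sketch of what the same iterator might look like without icount, signalling the end of iteration by hand. The name iwalk_manual is made up for this illustration, and the sketch assumes the iterators convention that a nextElem function stops with the message 'StopIteration' once it is exhausted.

```r
library(iterators)

# Hypothetical hand-rolled variant of iwalk(); not part of the original answer
iwalk_manual <- function(n, m) {
  if (m > n)
    stop('m > n')
  i <- 0                      # index of the window returned most recently
  last <- n - m + 1           # total number of windows
  nextEl <- function() {
    if (i >= last)
      stop('StopIteration', call. = FALSE)   # tell the caller we are done
    i <<- i + 1
    c(i, i + m - 1)           # first and last row of the window
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}

it <- iwalk_manual(10, 5)
nextElem(it)   # first window: rows 1 to 5
nextElem(it)   # second window: rows 2 to 6
```

Delegating to icount removes the counter and the explicit stop call, which is why the shorter version is usually preferable.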
If you were using the doMC parallel backend, you could use this iterator as follows:
library(doMC)
nworkers <- 3
registerDoMC(nworkers)
x <- matrix(1:100, 10)
m <- 5
r1 <- foreach(ix=iwalk(nrow(x), m)) %dopar% {
  x[ix[1]:ix[2],, drop=FALSE]
}
This works nicely with doMC since each of the workers inherits the matrix x. However, if you're using doParallel with a cluster object or the doMPI backend, it would be nice to avoid exporting the entire matrix x to each of the workers. In that case, I would create an iterator function to send the overlapping sub-matrices of x to each of the workers, and then use iwalk to iterate over those sub-matrices:
ioverlap <- function(x, m, chunks) {
  if (m > nrow(x))
    stop('m > nrow(x)')
  i <- 1
  it <- idiv(nrow(x) - m + 1, chunks=chunks)
  nextEl <- function() {
    ntasks <- nextElem(it)
    ifirst <- i
    ilast <- i + ntasks + m - 2
    i <<- i + ntasks
    x[ifirst:ilast,, drop=FALSE]
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
library(doParallel)
nworkers <- 3
cl <- makePSOCKcluster(nworkers)
registerDoParallel(cl)
x <- matrix(1:100, 10)
m <- 5
r2 <- foreach(y=ioverlap(x, m, nworkers), .combine='c',
              .packages=c('foreach', 'iterators')) %dopar% {
  foreach(iy=iwalk(nrow(y), m)) %do% {
    y[iy[1]:iy[2],, drop=FALSE]
  }
}
In this case I'm using iwalk on the workers, not the master, which is why the iterators package must be loaded by each of the workers.
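To see which rows each worker receives, the index arithmetic from ioverlap can be replayed on its own. This is only an illustrative sketch: the variable names and the ranges matrix are made up here, and the end of the idiv iterator is detected by treating any error from nextElem as exhaustion.

```r
library(iterators)

n <- 10        # rows in the matrix
m <- 5         # window length
chunks <- 3    # number of workers

it <- idiv(n - m + 1, chunks = chunks)   # split the 6 windows into 3 pieces
first <- 1
ranges <- NULL
repeat {
  ntasks <- tryCatch(nextElem(it), error = function(e) NULL)
  if (is.null(ntasks)) break
  last <- first + ntasks + m - 2         # same formula as ilast in ioverlap
  ranges <- rbind(ranges, c(first = first, last = last, windows = ntasks))
  first <- first + ntasks
}
ranges
# Each worker gets 2 windows: rows 1:6, 3:8 and 5:10
```

Each chunk is only m - 1 rows longer than the number of windows it carries, so the amount of data sent to a worker shrinks as the number of workers grows.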

Related

foreach (parallel) for matrix operation in R

I am trying to convert the following for loop to foreach to take advantage of parallel processing.
dt = data.frame(t(data.frame(a=sample(1:10,10), b=sample(1:10,10), c=sample(1:10,10), d=sample(1:10,10))))
X = as.matrix(dt)
c = ncol(X)
itemnames=names(dt)
sm=matrix(0,c,c)
colnames(sm)=itemnames
row.names(sm)=itemnames
for (j in 1:c){
  ind=setdiff(1:c,j)
  print(ind)
  print(j)
  sm[j,ind]=sign(X[j]-X[ind])
  print(sm[j,ind])
}
cvec = 1:c
r = foreach(d = cvec, .combine = rbind) %dopar% {
  ind = setdiff(1:10,d)
  sm[d,ind]=sign(X[d]-X[ind])
}
With the for loop I get a 10x10 matrix in which the sign function above replaces the off-diagonal elements and the diagonal elements are 0.
But with foreach I get a 10x9 matrix: the diagonal elements are missing and everything else is the same.
Please help me get the same output as the for loop. Thanks in advance.
I am not sure what you are trying to achieve here, since you are only using the first ten elements of your matrix. This can be done without any loops:
sign(outer(X[1:10], X[1:10], FUN = "-"))
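A quick check on made-up data that the one-liner agrees with the explicit loop. The vector v is an assumption standing in for X[1:10]; everything else follows the loop from the question.

```r
set.seed(1)
v <- sample(1:10, 10)          # stand-in for X[1:10]
n <- length(v)

# Loop version: fill the off-diagonal entries row by row
sm <- matrix(0, n, n)
for (j in seq_len(n)) {
  ind <- setdiff(seq_len(n), j)
  sm[j, ind] <- sign(v[j] - v[ind])
}

# Vectorised version: all pairwise differences at once
fast <- sign(outer(v, v, FUN = "-"))

isTRUE(all.equal(sm, fast))    # the diagonal is 0 in both, since v[j] - v[j] is 0
```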
In addition, I am not sure that parallel processing will be faster for this kind of problem, even assuming that the real case is much bigger. But if you want to use foreach, you should not assign to the global sm within the loop and instead return a suitable vector in the end:
foreach(d = cvec, .combine = rbind) %dopar% {
  ind <- setdiff(cvec,d)
  res <- rep(0, 10)
  res[ind] <- sign(X[d]-X[ind])
  res
}
If you want to assign to a matrix in parallel, you'll need a shared matrix:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
sm <- FBM(c, c)
library(foreach)
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)
r = foreach(d = cvec, .combine = c) %dopar% {
  ind = setdiff(1:10,d)
  sm[d,ind]=sign(X[d]-X[ind])
  NULL
}
parallel::stopCluster(cl)
sm[]

How to construct in R a parallel version of nested for loop to compute values for a square matrix where the function is dependent on i and j?

I have a function that takes i and j as parameters and returns a single value, and I currently have a nested loop designed to compute a value for each entry in a square matrix. In essence, each individual value can be computed in parallel. Is there a way I can apply lapply in this situation? The resulting matrix must be N x N and the function is dependent on i and j. Thanks
for (i in 1:matrixRowLength) {
  for (j in 1:matrixColLength) {
    result_matrix[i, j] <- fun(i, j)  # fun(i, j) returns a single value
  }
}
The foreach package has a nesting operator that can be useful when parallelizing nested for loops. Here's an example:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
matrixRowLength <- 5
matrixColLength <- 5
fun <- function(i, j) 10 * i + j
result_matrix.1 <-
  foreach(j=1:matrixColLength, .combine='cbind') %:%
    foreach(i=1:matrixRowLength, .combine='c') %dopar% {
      fun(i, j)
    }
Note that I reversed the order of the loops so that the matrix is computed column by column. This is generally preferable since matrices in R are stored in column-major order.
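A small base R illustration of the column-major point, using a toy matrix and the same fun(i, j) = 10 * i + j as above. The names m, cols and res are chosen only for this sketch.

```r
m <- matrix(1:6, nrow = 2)
m[3]       # third element in storage order
m[1, 2]    # the same element: storage runs down column 1, then column 2

# Each inner loop produces one column; cbind() glues the columns together
cols <- lapply(1:3, function(j) 10 * (1:2) + j)
res <- do.call(cbind, cols)
res[2, 3]  # 10 * 2 + 3
```

Computing a column at a time means each task produces a contiguous piece of the final matrix.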
The nesting operator is useful if you have large tasks and at least one of the loops may have a small number of iterations. But in many cases, it's safer to only parallelize the outer loop:
result_matrix.2 <-
  foreach(j=1:matrixColLength, .combine='cbind') %dopar% {
    x <- double(matrixRowLength)
    for (i in 1:matrixRowLength) {
      x[i] <- fun(i, j)
    }
    x
  }
Note that it can also be useful to use chunking in the outer loop to decrease the amount of post processing performed by the master process. Unfortunately, this technique is a bit more tricky:
library(itertools)
nw <- getDoParWorkers()
result_matrix.3 <-
  foreach(jglobals=isplitIndices(matrixColLength, chunks=nw),
          .combine='cbind') %dopar% {
    localColLength <- length(jglobals)
    m <- matrix(0, nrow=matrixRowLength, ncol=localColLength)
    for (j in 1:localColLength) {
      for (i in 1:matrixRowLength) {
        m[i,j] <- fun(i, jglobals[j])
      }
    }
    m
  }
In my experience, this method often gives the best performance.
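The chunking logic can be tried out sequentially before involving a cluster. The sketch below is an assumption-laden stand-in: it uses splitIndices from the parallel package in place of isplitIndices, and lapply plus cbind in place of the foreach loop and its .combine argument.

```r
library(parallel)   # for splitIndices(); ships with R

matrixRowLength <- 5
matrixColLength <- 5
fun <- function(i, j) 10 * i + j

# One vector of global column indices per (pretend) worker
chunks <- splitIndices(matrixColLength, 3)

# What each worker would compute for its chunk of columns
blocks <- lapply(chunks, function(jglobals) {
  m <- matrix(0, nrow = matrixRowLength, ncol = length(jglobals))
  for (j in seq_along(jglobals)) {
    for (i in seq_len(matrixRowLength)) {
      m[i, j] <- fun(i, jglobals[j])   # local column j, global column jglobals[j]
    }
  }
  m
})

result <- do.call(cbind, blocks)       # plays the role of .combine='cbind'
expected <- outer(seq_len(matrixRowLength), seq_len(matrixColLength), fun)
all(result == expected)
```

The master then combines a few blocks rather than one vector per column, which is where the reduced post-processing comes from.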
Thanks for an interesting question / use case. Here's a solution using the future package (I'm the author):
First, define (*):
future_array_call <- function(dim, FUN, ..., simplify = TRUE) {
  args <- list(...)
  idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
  idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
  # future_lapply() is provided by the future.apply package in current releases
  y <- future.apply::future_lapply(idxs, FUN = function(idx_list) {
    do.call(FUN, args = c(idx_list, args))
  })
  if (simplify) y <- simplify2array(y)
  dim(y) <- dim
  y
}
This function does not make any assumptions on what data type your function returns, but with the default simplify = TRUE it will try to simplify the returned data type iff possible (similar to how sapply() works).
Then with your matrix dimensions (**):
matrixRowLength <- 5
matrixColLength <- 5
dim <- c(matrixRowLength, matrixColLength)
and function:
slow_fun <- function(i, j, ..., a = 1.0) {
  Sys.sleep(0.1)
  a * i + j
}
you can calculate slow_fun(i, j, a = 10) for all elements as:
y <- future_array_call(dim, FUN = slow_fun, a = 10)
To do it in parallel on your local machine, use:
library("future")
plan(multisession)  # multiprocess is deprecated in recent versions of future
y <- future_array_call(dim, FUN = slow_fun, a = 10)
On a cluster of machines (for which you have SSH access with SSH-key authentication), use:
library("future")
plan(cluster, workers = c("machine1", "machine2"))
y <- future_array_call(dim, FUN = slow_fun, a = 10)
Footnotes:
(*) If you wonder how it works, just replace the future_lapply() statement with a regular lapply().
(**) future_array_call(dim, FUN) should work for any length(dim), not just for two (= matrices).
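Following the first footnote, here is a sequential sketch with lapply() swapped in, which makes the index handling easy to inspect. The name array_call and the small 2 x 3 example are made up for illustration; the body otherwise mirrors the function above.

```r
# Sequential stand-in for future_array_call(); hypothetical name
array_call <- function(dim, FUN, ..., simplify = TRUE) {
  args <- list(...)
  # One row per array element, holding its (i, j, ...) subscripts
  idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
  # Turn each row into a list so it can be spliced into the call
  idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
  y <- lapply(idxs, FUN = function(idx_list) {
    do.call(FUN, args = c(idx_list, args))
  })
  if (simplify) y <- simplify2array(y)
  dim(y) <- dim
  y
}

f <- function(i, j, ..., a = 1.0) a * i + j
y <- array_call(c(2, 3), FUN = f, a = 10)
y[2, 3]   # f(2, 3, a = 10), i.e. 10 * 2 + 3
```

Because arrayInd enumerates the subscripts in column-major order, assigning dim(y) afterwards puts every value back in its own cell.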

How to use foreach to calculate each element in the upper triangular matrix?

I want to calculate each element in the upper triangular matrix using the foreach function
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
tempdata <- matrix(0, nrow = 10, ncol = 10)
tempdata2 <- matrix(0, nrow = 10, ncol = 10)
foreach (i = 1:9, .combine='rbind') %do% {
  for (j in (i+1):10) {
    tempdata[i, j] <- i+j;
    tempdata2[i, j] <- i*j
  }
}
It works when I use %do%, but when I use %dopar% I get nothing back.
What am I doing wrong? Thank you. Any suggestions will be appreciated.
You can't modify variables defined outside of the foreach loop and expect that data to be sent back to the master process. for loops allow that kind of side effect, but it doesn't work in parallel computing unless the workers are threads within the same process, and that isn't supported by any of the R parallel processing packages because R is single threaded.
Instead, you need to return a value from the body of the foreach loop and combine those values to get the desired result. In your case, you compute two values per iteration of the foreach loop, so you have to bundle them into a list, which means you need a more complicated combine function. Here's one way to do it:
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
comb <- function(...) {
  mapply(rbind, ..., SIMPLIFY=FALSE)
}
r <- foreach(i=1:9, .combine='comb', .multicombine=TRUE) %dopar% {
  tmp <- double(10)
  tmp2 <- double(10)
  for(j in (i+1):10) {
    tmp[j] <- i+j
    tmp2[j] <- i*j
  }
  list(tmp, tmp2)
}
tempdata <- r[[1]]
tempdata2 <- r[[2]]
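To see what the combine function does, it can be called directly on a few hand-made results. The three res lists below are invented values standing in for the per-iteration lists returned by the loop body.

```r
comb <- function(...) {
  # Stack the first elements together, then the second elements, and so on
  mapply(rbind, ..., SIMPLIFY = FALSE)
}

# Results of three hypothetical iterations, each a list of two vectors
res1 <- list(c(1, 2), c(10, 20))
res2 <- list(c(3, 4), c(30, 40))
res3 <- list(c(5, 6), c(50, 60))

combined <- comb(res1, res2, res3)
combined[[1]]   # 3 x 2 matrix built from the first vectors
combined[[2]]   # 3 x 2 matrix built from the second vectors

# An already combined result can absorb further iterations
more <- comb(combined, list(c(7, 8), c(70, 80)))
dim(more[[1]])
```

Note that with i running over 1:9 the combined matrices have 9 rows; if a 10 x 10 result is needed, a final row of zeros can be appended with rbind.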

Updated: Parallel computing using R results in "attempt to replicate an object of type 'closure'"

I have set up a Metropolis-Hastings algorithm, and now I am trying to run the algorithm using parallel computing. I have set up a single-chain function
library(parallel)
library(foreach)
library(mvtnorm)
library(doParallel)
n<-100
mX <- 1:n
vY <- rnorm(n)
chains <- 4
iter <- n
p <- 2
#Loglikelihood
post <- function(y, theta) dmvnorm(t(y), rep(0,length(y)), theta[1]*exp(- abs(matrix(rep(mX,n),n) - matrix(rep(mX,each=n),n))/theta[2]),log=TRUE)
geninits <- function() list(theta = runif(p, 0, 1))
dist <- 0.01
jump <- function(x, dist) exp(log(x) + rmvnorm(1,rep(0,p),diag(rep(dist,p))))
MCsingle <- function(){ # This is part of a larger function, so no input are needed
  inits <- geninits()
  theta.post <- matrix(NA,nrow=p,ncol=iter)
  for (i in 1:p) theta.post[i,1] <- inits$theta[i]
  for (t in 2:iter){
    theta_star <- c(jump(theta.post[, t-1],dist))
    pstar <- post(vY, theta = theta_star) # post is the loglikelihood using dmvnorm.
    pprev <- post(vY, theta = theta.post[,t-1])
    r <- min(exp(pstar - pprev) , 1)
    accept <- rbinom(1, 1, prob = r)
    if (accept == 1){
      theta.post[, t] <- theta_star
    } else {
      theta.post[, t] <- theta.post[, t-1]
    }
  }
  return(theta.post)
}
which returns a p x iter matrix, with p parameters and iter iterations.
cl<-makeCluster(4)
registerDoParallel(cl)
posterior <- foreach(c = 1:chains) %dopar% {
  MCsingle()
}
UPDATE: When I tried to simplify the problem the code suddenly seemed to work. Even though I purposely tried to make errors, the code ran perfectly and the results were as wanted. So for others with similar problems unfortunately I cannot give an answer.
A follow-up question:
My initial goal was to build up an entire function, such that
MCmulti <- function(mX,vY,iter,chains){
  posterior <- foreach(c = 1:chains) %dopar% {
    MCsingle()
  }
  return(posterior)
}
but the foreach loop does not seem to find all the required functions, and fails with:
Error in FUN() : task 1 failed - "could not find function "geninits""
Can anybody explain how to use custom functions inside a foreach loop? Should I pass the function in as MCmulti <- function(FUN,...) FUN() and call MCmulti(MCsingle,...)?
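One possible way to make helper functions visible to the workers is to copy them there explicitly before running the loop. The sketch below is not taken from the thread: make_inits, run_chain and run_chains are hypothetical stand-ins for geninits, MCsingle and MCmulti, and it assumes a doParallel cluster backend. foreach also has .export and .packages arguments that can serve a similar purpose, for example to load mvtnorm on each worker.

```r
library(doParallel)

# Hypothetical stand-ins for geninits() and MCsingle()
make_inits <- function() runif(2)
run_chain <- function() cumsum(make_inits())   # calls another custom function

# Stand-in for MCmulti(): the foreach loop lives inside a function
run_chains <- function(chains) {
  foreach(k = seq_len(chains)) %dopar% {
    run_chain()
  }
}

cl <- makeCluster(2)
registerDoParallel(cl)

# Copy the functions into the global environment of every worker
clusterExport(cl, c("make_inits", "run_chain"))

posterior <- run_chains(4)
stopCluster(cl)

length(posterior)   # one result per chain
```

Global data that the functions rely on (such as p, iter or vY in the question) would need to be exported in the same way.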

Parallelize an R Script

The problem with my R script is that it takes too much time and the main solution that I consider is to parallelize it. I don't know where to start.
My code looks like this:
n<- nrow (aa)
output <- matrix (0, n, n)
akl<- function (dii){
  ddi<- as.matrix (dii)
  m<- rowMeans(ddi)
  M<- mean(ddi)
  r<- sweep (ddi, 1, m)
  b<- sweep (r, 2, m)
  return (b + M)
}
for (i in 1:n)
{
  A<- akl(dist(aa[i,]))
  dVarX <- sqrt(mean (A * A))
  for (j in i:n)
  {
    B<- akl(dist(aa[j,]))
    V <- sqrt (dVarX * (sqrt(mean(B * B))))
    output[i,j] <- (sqrt(mean(A * B))) / V
  }
}
I would like to parallelize on different cpus. How can I do that?
I saw the SNOW package, is it suitable for my purpose?
Thank you for suggestions,
Gab
There are two ways I can think of to make your code run faster:
First: As @Dwin was saying (with a small twist), you could precompute akl (yes, not necessarily dist, but the whole of akl).
# a random square matrix
aa <- matrix(runif(100), ncol=10)
n <- nrow(aa)
output <- matrix (0, n, n)
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(m) # mean(ddi) == mean(m)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  return(b + M)
}
# precompute akl here
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
})
# Now, apply your function, but index the list instead of computing every time
for (i in 1:n) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  for (j in i:n) {
    B <- akl.list[[j]]
    V <- sqrt (dVarX * (sqrt(mean(B * B))))
    output[i,j] <- (sqrt(mean(A * B))) / V
  }
}
This should already make your code run faster than before on larger matrices, since the original recomputes akl every time in the inner loop.
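A small sanity check, on made-up data, that looking the matrices up in a precomputed list gives the same output as recomputing them. This sketch uses base lapply instead of llply, and the helper fill, which takes a function returning the akl matrix for a given row, exists only for this illustration.

```r
set.seed(42)
aa <- matrix(runif(36), ncol = 6)   # made-up input data
n <- nrow(aa)

akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(m)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  b + M
}

# Fill the upper triangle, fetching the akl matrix for a row via get_akl()
fill <- function(get_akl) {
  out <- matrix(0, n, n)
  for (i in 1:n) {
    A <- get_akl(i)
    dVarX <- sqrt(mean(A * A))
    for (j in i:n) {
      B <- get_akl(j)
      V <- sqrt(dVarX * sqrt(mean(B * B)))
      out[i, j] <- sqrt(mean(A * B)) / V
    }
  }
  out
}

akl.list <- lapply(seq_len(n), function(i) akl(dist(aa[i, ])))

recomputed  <- fill(function(i) akl(dist(aa[i, ])))   # original approach
precomputed <- fill(function(i) akl.list[[i]])        # list lookup

isTRUE(all.equal(recomputed, precomputed))
```

The precomputed version calls akl n times in total, while the original calls it once per (i, j) pair in the upper triangle, in addition to once per row.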
Second: In addition to that, you can get it faster by parallelising as follows:
# now, the parallelisation you require can be achieved as follows
# with the help of `plyr` and `doMC`.
# First step of parallelisation is to compute akl in parallel
require(plyr)
require(doMC)
registerDoMC(10) # 10 Cores/CPUs
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
}, .parallel = TRUE)
# then, you could write your for-loop using plyr again as follows
output <- laply(1:n, function(i) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  t <- laply(i:n, function(j) {
    B <- akl.list[[j]]
    V <- sqrt(dVarX * (sqrt(mean(B*B))))
    sqrt(mean(A * B))/V
  })
  c(rep(0, n-length(t)), t)
}, .parallel = TRUE)
Note that I have added .parallel = TRUE only on the outer loop. This is because you assign 10 processors to the outer loop. If you added it to both the outer and inner loops, the total number of processes would be 10 * 10 = 100. Please take care with this.
