I need to stop the parallel loop with a condition, e.g., when res < 1. A minimal working example (MWE) is given by the code below:
library(foreach)
library(doParallel)
I <- 1000
L <- 1000
res <- Inf
cores <- detectCores()
cluster <- makeCluster(cores)
registerDoParallel(cluster)
out <- foreach(l = 1:I, .packages = "cec2013") %dopar% {
  for (i in 1:I) {
    res <- 100/i
  }
  out <- res
  out
}
out
My attempt at a solution is given by:
out <- foreach(l = 1:I, .packages = "cec2013") %:% when(res < 1) %dopar% {
  for (i in 1:I) {
    res <- 100/i
  }
  out <- res
  out
}
out
But out returns an empty list.
The reason this happens is that when(res < 1) evaluates res in the calling environment, not in the environment inside the loop. You defined res <- Inf there, which is never less than 1, so no iterations are selected.
Also, your foreach loop always returns 0.1: the inner loop always runs to i = 1000, so res ends up as 100/1000 = 0.1. It's unclear what you're trying to do or when you expect the loop to stop.
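For reference, here is a minimal sketch (my own example, not from the question) of how when() is meant to be used: it can only filter on values known before an iteration runs, such as the iteration variable itself.
library(foreach)
library(doParallel)
registerDoParallel(2)
# Only iterations whose loop variable satisfies the condition are evaluated.
out <- foreach(l = 1:10, .combine = c) %:% when(l %% 2 == 0) %dopar% {
  100 / l
}
out  # one result for each of l = 2, 4, 6, 8, 10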
I am trying to convert the following for loop to foreach to take advantage of parallel processing.
dt = data.frame(t(data.frame(a=sample(1:10,10), b=sample(1:10,10), c=sample(1:10,10), d=sample(1:10,10))))
X = as.matrix(dt)
c = ncol(X)
itemnames=names(dt)
sm=matrix(0,c,c)
colnames(sm)=itemnames
row.names(sm)=itemnames
for (j in 1:c){
  ind = setdiff(1:c, j)
  print(ind)
  print(j)
  sm[j, ind] = sign(X[j] - X[ind])
  print(sm[j, ind])
}
cvec = 1:c
r = foreach(d = cvec, .combine = rbind) %dopar% {
  ind = setdiff(1:10, d)
  sm[d, ind] = sign(X[d] - X[ind])
}
With the for loop I get a 10x10 matrix in which the sign function fills the off-diagonal elements and the diagonal elements are 0.
But with foreach I get a 10x9 matrix: the diagonal elements are missing and everything else is the same.
Please help me get the same output as the for loop. Thanks in advance.
I am not sure what you are trying to achieve here, since you are only using the first ten elements of your matrix. This can be done without any loops:
sign(outer(X[1:10], X[1:10], FUN = "-"))
In addition, I am not sure that parallel processing will be faster for this kind of problem, even assuming that the real case is much bigger. But if you want to use foreach, you should not assign to the global sm within the loop; instead, return a suitable vector at the end:
foreach(d = cvec, .combine = rbind) %dopar% {
  ind <- setdiff(cvec, d)
  res <- rep(0, 10)
  res[ind] <- sign(X[d] - X[ind])
  res
}
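If you capture the combined result (here called sm_par, a name introduced only for illustration), you can restore the row and column labels so it lines up with the original sm:
sm_par <- foreach(d = cvec, .combine = rbind) %dopar% {
  ind <- setdiff(cvec, d)
  res <- rep(0, 10)
  res[ind] <- sign(X[d] - X[ind])
  res
}
dimnames(sm_par) <- list(itemnames, itemnames)
sm_par  # should match the sm filled in by the sequential for loop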
If you want to assign to a matrix in parallel, you'll need a shared matrix:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
sm <- FBM(c, c)
library(foreach)
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)
r = foreach(d = cvec, .combine = c) %dopar% {
  ind = setdiff(1:10, d)
  sm[d, ind] = sign(X[d] - X[ind])
  NULL
}
parallel::stopCluster(cl)
sm[]
I want to calculate each element of an upper triangular matrix using the foreach function:
library(foreach)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
tempdata <- matrix(0, nrow = 10, ncol = 10)
tempdata2 <- matrix(0, nrow = 10, ncol = 10)
foreach (i = 1:9, .combine='rbind') %do% {
  for (j in (i+1):10) {
    tempdata[i, j] <- i+j
    tempdata2[i, j] <- i*j
  }
}
It works when I use %do%, but when I use %dopar% I get nothing.
What am I doing wrong? Thank you, any suggestion will be appreciated.
You can't modify variables defined outside of the foreach loop and expect that data to be sent back to the master process. for loops allow that kind of side effect, but it doesn't work in parallel computing unless the workers are threads within the same process, and that isn't supported by any of the R parallel processing packages because R is single threaded.
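To illustrate this point (an example added here, not part of the original answer): assignments made inside %dopar% happen on worker-side copies and never reach the master session.
library(doParallel)
registerDoParallel(2)
z <- 0
res <- foreach(i = 1:4) %dopar% {
  z <- z + i  # modifies a copy of z on the worker
  NULL
}
z  # still 0 in the master session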
Instead, you need to return a value from the body of the foreach loop and combine those values to get the desired result. In your case, you compute two values per iteration of the foreach loop, so you have to bundle them into a list, which means you need a more complicated combine function. Here's one way to do it:
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
comb <- function(...) {
  mapply(rbind, ..., SIMPLIFY=FALSE)
}
r <- foreach(i=1:9, .combine='comb', .multicombine=TRUE) %dopar% {
  tmp <- double(10)
  tmp2 <- double(10)
  for (j in (i+1):10) {
    tmp[j] <- i+j
    tmp2[j] <- i*j
  }
  list(tmp, tmp2)
}
tempdata <- r[[1]]
tempdata2 <- r[[2]]
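To see what the comb helper does, here is a toy call (my illustration, not from the original answer): mapply with rbind stacks the first elements of each task's list into one matrix and the second elements into another.
# comb() as defined in the answer above
comb(list(1:3, 10:12), list(4:6, 13:15))
# returns a list of two matrices:
#   rbind(1:3, 4:6)  and  rbind(10:12, 13:15)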
I have set up a Metropolis-Hastings algorithm, and now I am trying to run the algorithm using parallel computing. I have set up a single-chain function
library(parallel)
library(foreach)
library(mvtnorm)
library(doParallel)
n<-100
mX <- 1:n
vY <- rnorm(n)
chains <- 4
iter <- n
p <- 2
#Loglikelihood
post <- function(y, theta) dmvnorm(t(y), rep(0,length(y)), theta[1]*exp(- abs(matrix(rep(mX,n),n) - matrix(rep(mX,each=n),n))/theta[2]),log=TRUE)
geninits <- function() list(theta = runif(p, 0, 1))
dist <- 0.01
jump <- function(x, dist) exp(log(x) + rmvnorm(1,rep(0,p),diag(rep(dist,p))))
MCsingle <- function(){ # This is part of a larger function, so no inputs are needed
  inits <- geninits()
  theta.post <- matrix(NA, nrow=p, ncol=iter)
  for (i in 1:p) theta.post[i,1] <- inits$theta[i]
  for (t in 2:iter){
    theta_star <- c(jump(theta.post[, t-1], dist))
    pstar <- post(vY, theta = theta_star) # post is the loglikelihood using dmvnorm.
    pprev <- post(vY, theta = theta.post[, t-1])
    r <- min(exp(pstar - pprev), 1)
    accept <- rbinom(1, 1, prob = r)
    if (accept == 1){
      theta.post[, t] <- theta_star
    } else {
      theta.post[, t] <- theta.post[, t-1]
    }
  }
  return(theta.post)
}
This function returns a p x iter matrix, with p parameters and iter iterations.
cl<-makeCluster(4)
registerDoParallel(cl)
posterior <- foreach(c = 1:chains) %dopar% {
  MCsingle()
}
UPDATE: When I tried to simplify the problem, the code suddenly seemed to work. Even when I purposely tried to introduce errors, the code ran perfectly and the results were as expected. So for others with similar problems, unfortunately I cannot give an answer.
A follow-up question:
My initial purpose was to build an entire function, such that
MCmulti <- function(mX, vY, iter, chains){
  posterior <- foreach(c = 1:chains) %dopar% {
    MCsingle()
  }
  return(posterior)
}
but the foreach loop does not seem to find all the required functions:
Error in FUN() : task 1 failed - "could not find function "geninits""
Can anybody explain how to use custom functions inside a foreach loop? Should I pass the function in, as in MCmulti <- function(FUN, ...) FUN(), and call MCmulti(MCsingle, ...)?
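One common approach (a sketch added here, not part of the original thread) is to tell foreach explicitly which objects and packages the workers need, via its .export and .packages arguments:
# Sketch only; assumes MCsingle, geninits, post, jump, dist, mX, vY, iter
# and p exist in the global environment as defined above.
MCmulti <- function(chains) {
  foreach(c = 1:chains,
          .packages = "mvtnorm",
          .export = c("MCsingle", "geninits", "post", "jump",
                      "dist", "mX", "vY", "iter", "p")) %dopar% {
    MCsingle()
  }
}
posterior <- MCmulti(chains)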
I'm trying to use doMC with foreach and %dopar%. Here is the function:
doTheMath_MC <- function(st, end, nd) {
  print(getDoParWorkers())
  if (st > end) stop("end must be larger than st")
  # Helper function from stackoverflow.com/a/23158178/633251
  tr <- function(x, prec = 0) trunc(x * 10^prec) / 10^prec
  # Function to use with foreach
  fef <- function(i, j, num, trpi) {
    if (num[j] >= num[i]) return(NULL)
    val <- num[i]/num[j]
    if (!tr(val, nd) == trpi) return(NULL)
    return(c(i, j, tr(val, nd)))
  }
  # Here we go...
  nd <- nd - 1
  trpi <- tr(pi, nd)
  num <- st:end
  ni <- length(num)
  ans <- foreach(i = 1:ni, .combine = rbind) %:%
    foreach(j = 1:ni, .combine = rbind) %dopar% {
      fef(i, j, num, trpi)
    }
  cat("Done computing", paste("EST", st, end, nd+1, sep = "_"), "\n")
  if (is.null(ans)) return(NULL)
  ans <- as.matrix(na.omit(ans)) # probably not needed in MC version
  return(ans) # c("num", "den", "est", "eff")
}
I've previously set up the cores, and another function calls the function above (this info is posted below; I don't think it is the problem). getDoParWorkers() reports that 7 cores have been assigned as intended. The cat statement verifies that the two 'loops' are working correctly as far as output goes. However, only 1 core is used. Does anyone see why? Mac OS X 10.10.2 and R 3.2 (2015-03-15 r67992). Finally, using doParallel to control everything gives the same result.
The steps which set up everything:
mn <- 1
mx <- 10000
jmp <- 1000
mc <- TRUE
if (mc) {
  require("doMC")
  registerDoMC(7)
}
st <- seq(mn - 1, mx - jmp, jmp) + 1
end <- seq(mn - 1 + jmp, mx, jmp)
nd <- rep(1:15, each = mx/jmp) # watch the recycling
df <- data.frame(st = st, end = end, nd = nd)
for (i in 1:nrow(df)) {
  findEsts(df$st[i], df$end[i], df$nd[i], MC = mc)
}
Sorry to answer my own question! I changed the dopar handling so that only the outer loop is made parallel:
ans <- foreach(i = 1:ni, .combine = rbind) %dopar%
  for (j in 1:ni) {
    fef(i, j, num, trpi)
  }
Also, I was simply not asking for enough iterations. For testing, I had been using mx = 10000 and jmp = 1000 (see the original question). These were apparently not large enough to trigger parallel processing. Increasing each by a factor of 10 was necessary to get parallel processing going. Thanks to the commenters!
NOTE: While the code above activates the parallel processing, it does not return the answer correctly. That will be the subject of another question.
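For completeness, a sketch of a possible fix (my addition, not from the original post): a for loop evaluates to NULL, so each %dopar% task above returns NULL and nothing is collected. Returning the row-bound results of the inner iterations instead collects them:
# Assumes fef, ni, num and trpi are defined as in the question.
ans <- foreach(i = 1:ni, .combine = rbind) %dopar% {
  do.call(rbind, lapply(1:ni, function(j) fef(i, j, num, trpi)))
}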
How can I create a "walkforward" iterator using the iterators package? How can an iterator be created where each nextElem returns a fixed moving window?
For example, let's say we have a 10x10 matrix. Each iterator element should be a group of rows. The first element is rows 1:5, the second 2:6, then 3:7, 4:8, and so on.
How can I turn x into a walkforward iterator:
x <- matrix(1:100, 10)
EDIT: To be clear, I would like to use the resulting iterator in a parallel foreach loop.
foreach(i = iter(x), .combine=rbind) %dopar% myFun(i)
You could use an iterator that returns overlapping sub-matrices as you describe, but that would use much more memory than is required. It would be better to use an iterator that returns the indices of those sub-matrices. Here's one way to do that:
iwalk <- function(n, m) {
  if (m > n)
    stop('m > n')
  it <- icount(n - m + 1)
  nextEl <- function() {
    i <- nextElem(it)
    c(i, i + m - 1)
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
This function uses the icount function from the iterators package so that I don't have to worry about details such as throwing the "StopIteration" exception. That's a technique I describe in the "Writing Custom Iterators" vignette.
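For instance (a small check of my own), stepping through the iterator by hand shows the start/end index pairs it produces:
library(iterators)
it <- iwalk(10, 5)
nextElem(it)  # c(1, 5)
nextElem(it)  # c(2, 6)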
If you were using the doMC parallel backend, you could use this iterator as follows:
library(doMC)
nworkers <- 3
registerDoMC(nworkers)
x <- matrix(1:100, 10)
m <- 5
r1 <- foreach(ix=iwalk(nrow(x), m)) %dopar% {
  x[ix[1]:ix[2],, drop=FALSE]
}
This works nicely with doMC since each of the workers inherits the matrix x. However, if you're using doParallel with a cluster object or the doMPI backend, it would be nice to avoid exporting the entire matrix x to each of the workers. In that case, I would create an iterator function to send the overlapping sub-matrices of x to each of the workers, and then use iwalk to iterate over those sub-matrices:
ioverlap <- function(x, m, chunks) {
  if (m > nrow(x))
    stop('m > nrow(x)')
  i <- 1
  it <- idiv(nrow(x) - m + 1, chunks=chunks)
  nextEl <- function() {
    ntasks <- nextElem(it)
    ifirst <- i
    ilast <- i + ntasks + m - 2
    i <<- i + ntasks
    x[ifirst:ilast,, drop=FALSE]
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('abstractiter', 'iter')
  obj
}
library(doParallel)
nworkers <- 3
cl <- makePSOCKcluster(nworkers)
registerDoParallel(cl)
x <- matrix(1:100, 10)
m <- 5
r2 <- foreach(y=ioverlap(x, m, nworkers), .combine='c',
              .packages=c('foreach', 'iterators')) %dopar% {
  foreach(iy=iwalk(nrow(y), m)) %do% {
    y[iy[1]:iy[2],, drop=FALSE]
  }
}
In this case I'm using iwalk on the workers, not the master, which is why the iterators package must be loaded by each of the workers.
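As a quick sanity check (my addition, assuming both examples above were run in the same session), the two approaches should yield the same list of sub-matrices, and the PSOCK cluster can then be shut down:
identical(r1, r2)  # should be TRUE
parallel::stopCluster(cl)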