How can I write the equivalent of for (i in nums) with foreach? It seems that foreach allows something like i = 1:n, but in my case the numbers in nums are not sequential.
nums <- c(1,2,5,8)
prob <- foreach(i in nums, .combine = rbind, .packages = "randomForest") %dopar% {
#...
}
You don't use in with foreach(). You just use named parameters. Try
nums <- c(1,2,5,8)
prob <- foreach(i =nums, .combine = rbind, .packages = "randomForest") %dopar% {#...}
The parameters will accept a vector without problem. The 1:n syntax is just an easy way to create a vector of elements from 1 to n. But you can pass in your own vector directly.
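As a minimal sketch of the point above, the loop below iterates directly over the non-sequential vector. It uses %do% (sequential) so that it runs without a registered parallel backend, and squaring i is only a stand-in for the real loop body; with a backend registered, %dopar% can be swapped in.

```r
library(foreach)

nums <- c(1, 2, 5, 8)

# i takes the values 1, 2, 5, 8 in turn, not 1:4
squares <- foreach(i = nums, .combine = c) %do% {
  i^2  # placeholder for the real work
}

print(squares)  # 1 4 25 64
```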
Related
I am trying to convert the following for loop to foreach to take the advantage of parallel.
dt = data.frame(t(data.frame(a=sample(1:10,10), b=sample(1:10,10), c=sample(1:10,10), d=sample(1:10,10))))
X = as.matrix(dt)
c = ncol(X)
itemnames=names(dt)
sm=matrix(0,c,c)
colnames(sm)=itemnames
row.names(sm)=itemnames
for (j in 1:c){
ind=setdiff(1:c,j)
print(ind)
print(j)
sm[j,ind]=sign(X[j]-X[ind])
print(sm[j,ind])
}
cvec = 1:c
r = foreach(d = cvec, .combine = rbind) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
}
With the for loop I get a 10 x 10 matrix in which the sign function above replaces the off-diagonal elements and the diagonal elements stay 0.
But with foreach I get a 10 x 9 matrix: the diagonal elements are missing and everything else is the same.
Please help me get the same output as the for loop. Thanks in advance.
I am not sure what you are trying to achieve here, since you are only using the first ten elements of your matrix. This can be done without any loops:
sign(outer(X[1:10], X[1:10], FUN = "-"))
In addition, I am not sure that parallel processing will be faster for this kind of problem, even assuming that the real case is much bigger. But if you want to use foreach, you should not assign to the global sm within the loop and instead return a suitable vector in the end:
foreach(d = cvec, .combine = rbind) %dopar% {
ind <- setdiff(cvec,d)
res <- rep(0, 10)
res[ind] <- sign(X[d]-X[ind])
res
}
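One way to convince yourself that returning a full row per iteration gives the same matrix as the loop-free version is a small check like the one below. It is only a sketch: the vector x stands in for X[1:10], and %do% is used so that no parallel backend is needed.

```r
library(foreach)

set.seed(1)
x <- sample(1:10, 10)  # stand-in for X[1:10]
n <- length(x)

# Each iteration returns one complete row, with 0 left on the diagonal
by_rows <- foreach(d = seq_len(n), .combine = rbind) %do% {
  ind <- setdiff(seq_len(n), d)
  res <- rep(0, n)
  res[ind] <- sign(x[d] - x[ind])
  res
}

# Loop-free reference
vectorised <- sign(outer(x, x, FUN = "-"))

stopifnot(all(by_rows == vectorised))
```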
If you want to assign to a matrix in parallel, you'll need a shared matrix:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
sm <- FBM(c, c)
library(foreach)
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)
r = foreach(d = cvec, .combine = c) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
NULL
}
parallel::stopCluster(cl)
sm[]
I have a function that takes i and j as parameters and returns a single value, and I currently use a nested loop to compute a value for each entry of a square matrix. Since each value is independent of the others, they could all be computed in parallel. Is there a way I can apply lapply in this situation? The resulting matrix must be N x N, and the function depends on i and j. Thanks
for (i in 1:matrixRowLength) {
  for (j in 1:matrixColLength) {
    result_matrix[i, j] <- fun(i, j)  # fun is the function of i and j
  }
}
The foreach package has a nesting operator that can be useful when parallelizing nested for loops. Here's an example:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
matrixRowLength <- 5
matrixColLength <- 5
fun <- function(i, j) 10 * i + j
result_matrix.1 <-
foreach(j=1:matrixColLength, .combine='cbind') %:%
foreach(i=1:matrixRowLength, .combine='c') %dopar% {
fun(i, j)
}
Note that I reversed the order of the loops so that the matrix is computed column by column. This is generally preferable since matrices in R are stored in column-major order.
The nesting operator is useful if you have large tasks and at least one of the loops may have a small number of iterations. But in many cases, it's safer to only parallelize the outer loop:
result_matrix.2 <-
foreach(j=1:matrixColLength, .combine='cbind') %dopar% {
x <- double(matrixRowLength)
for (i in 1:matrixRowLength) {
x[i] <- fun(i, j)
}
x
}
Note that it can also be useful to use chunking in the outer loop to decrease the amount of post processing performed by the master process. Unfortunately, this technique is a bit more tricky:
library(itertools)
nw <- getDoParWorkers()
result_matrix.3 <-
foreach(jglobals=isplitIndices(matrixColLength, chunks=nw),
.combine='cbind') %dopar% {
localColLength <- length(jglobals)
m <- matrix(0, nrow=matrixRowLength, ncol=localColLength)
for (j in 1:localColLength) {
for (i in 1:matrixRowLength) {
m[i,j] <- fun(i, jglobals[j])
}
}
m
}
In my experience, this method often gives the best performance.
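Whichever variant you choose, it is worth checking the result against a sequential reference before timing anything. The sketch below does that for the nested form, using %do% so that it runs without a cluster; the dimensions are arbitrary, and outer() works as a reference only because this particular fun is vectorised.

```r
library(foreach)

fun <- function(i, j) 10 * i + j
n_row <- 3
n_col <- 4

# Outer loop over columns, inner loop over rows, as in the nested example
nested <- foreach(j = seq_len(n_col), .combine = "cbind") %:%
  foreach(i = seq_len(n_row), .combine = "c") %do% {
    fun(i, j)
  }

# Sequential reference: expected[i, j] is fun(i, j)
expected <- outer(seq_len(n_row), seq_len(n_col), fun)

stopifnot(identical(dim(nested), dim(expected)))
stopifnot(all(nested == expected))
```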
Thanks for an interesting question / use case. Here's a solution using the future package (I'm the author):
First, define (*):
future_array_call <- function(dim, FUN, ..., simplify = TRUE) {
args <- list(...)
idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
y <- future.apply::future_lapply(idxs, FUN = function(idx_list) {
do.call(FUN, args = c(idx_list, args))
})
if (simplify) y <- simplify2array(y)
dim(y) <- dim
y
}
This function does not make any assumptions on what data type your function returns, but with the default simplify = TRUE it will try to simplify the returned data type iff possible (similar to how sapply() works).
Then with your matrix dimensions (**):
matrixRowLength <- 5
matrixColLength <- 5
dim <- c(matrixRowLength, matrixColLength)
and function:
slow_fun <- function(i, j, ..., a = 1.0) {
Sys.sleep(0.1)
a * i + j
}
you can calculate slow_fun(i, j, a = 10) for all elements as:
y <- future_array_call(dim, FUN = slow_fun, a = 10)
To do it in parallel on your local machine, use:
library("future")
plan(multisession)
y <- future_array_call(dim, FUN = slow_fun, a = 10)
On a cluster of machines (for which you have SSH access with SSH-key authentication), use:
library("future")
plan(cluster, workers = c("machine1", "machine2"))
y <- future_array_call(dim, FUN = slow_fun, a = 10)
Footnotes:
(*) If you wonder how it works, just replace the future_lapply() call with a regular lapply().
(**) future_array_call(dim, FUN) should work for any length(dim), not just for two (= matrices).
I'm hoping to convert the second lapply function (# Make the new list) into a foreach loop, using the foreach package.
## Example data
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))
## Make the new list
res <- lapply(1:5, function(x) lapply(1:10, function(y) sapply(lst, function(z) z[[x]][[y]])))
I'm not sure if this is possible. I'm not concerned about lapply performing better than the foreach loops. For context, I'm re-organizing a list of lists of vectors in such a way:
new_thing[[5]][[10]][30] <- daily_by_security[[30]][[5]][10]
Thanks!
To figure out how to solve your problem, I looked at the foreach examples, and the second one does exactly what you are looking for:
library("foreach")
example(foreach)
# equivalent to lapply(1:3, sqrt)
foreach(i=1:3) %do% sqrt(i)
I then adapted this to your problem:
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))
resFE <- foreach(i = 1:5) %do%
lapply(1:10, function(y) sapply(lst, function(z) z[[i]][[y]]))
Edit: The OP was able to figure out a solution based upon my work. Here is the solution:
resFE <- foreach(i = 1:5, .packages = "foreach") %dopar%
{ foreach(m = 1:10) %dopar%
{ foreach(t = lst, .combine = c) %do%
{ t[[i]][[m]] } } }
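A quick way to check the re-organised list is to compare it with the original lapply result and with the index mapping given in the question. This sketch uses the single-foreach version with %do%, so no parallel backend is required; the seed is arbitrary.

```r
library(foreach)

set.seed(42)
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))

# Original lapply version
res <- lapply(1:5, function(x)
  lapply(1:10, function(y) sapply(lst, function(z) z[[x]][[y]])))

# foreach version: only the outermost lapply is replaced
resFE <- foreach(i = 1:5) %do% {
  lapply(1:10, function(y) sapply(lst, function(z) z[[i]][[y]]))
}

# new[[x]][[y]][z] should equal old[[z]][[x]][y]
stopifnot(resFE[[5]][[10]][30] == lst[[30]][[5]][10])
stopifnot(isTRUE(all.equal(unlist(res), unlist(resFE), check.attributes = FALSE)))
```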
%dopar% forks the main R process into several independent sub-processes. Is there a way to make these sub-processes communicate with the main R process, so that data can be 'recovered' ?
require(foreach)
require(doMC)
registerDoMC()
options(cores = 2 )
a <- c(0,0)
foreach(i = 1:2 ) %do% {
a[i] <- i
}
print(a) # returns 1 2
a <- c(0,0)
foreach(i = 1:2 ) %dopar% {
a[i] <- i
}
print(a) # returns 0 0
Thanks!
You should read the foreach documentation:
The foreach and %do%/%dopar% operators provide a looping construct
that can be viewed as a hybrid of the standard for loop and lapply
function. It looks similar to the for loop, and it evaluates an
expression, rather than a function (as in lapply), but its purpose is
to return a value (a list, by default), rather than to cause
side-effects.
Try this:
a <- foreach(i = 1:2 ) %dopar% {
i
}
print(unlist(a))
If you want your result to be a dataframe, you could do:
library(data.table)
result <- foreach(i = 1:2) %dopar% {
i
}
result.df <- rbindlist(Map(as.data.frame, result))
Thanks to Karl, I now understand the purpose of '.combine'
a <- foreach(i = 1:2 , .combine=c) %dopar% {
return(i)
}
print(a) # returns 1 2
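To make the role of .combine concrete, here is a small sketch comparing the default return value with a combined one. It uses %do% so that it runs without a backend; the same return-value rules apply to %dopar%, where assignments made inside the loop body are not visible in the main process.

```r
library(foreach)

# Default: one list element per iteration
as_list <- foreach(i = 1:3) %do% {
  i^2
}

# With .combine = c the per-iteration values are concatenated into a vector
as_vector <- foreach(i = 1:3, .combine = c) %do% {
  i^2
}

str(as_list)      # a list of 3 numbers
print(as_vector)  # 1 4 9
```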
I would like to transform the following nested for loop
first <- c(1, 2, 3)
second <- c(1, 2, 3)
dummy = matrix(double(), length(first), length(second))
c <- list()
c$sum <- dummy
c$times <- dummy
for (i in 1:length(first)) {
for (j in 1:length(second)) {
c$sum[i, j] <- first[i] + second[j]
c$times[i, j] <- first[i] * second[j]
}
}
c
into code using foreach and get the same list of matrices as a result. I tried many different things but the closest "result" is this:
x <- foreach(b = second, .combine = "cbind") %:% foreach(a = first, .combine = "c") %do% {
c <- list()
c$sum <- a+b
c$times <- a*b
out <- c
}
x
How to get this list of matrices right using foreach?
EDIT: One possibility is using a result and transform it after calling foreach:
res <- list()
res$sum <- x[rownames(x)=="sum", ]
rownames(res$sum) <- NULL
colnames(res$sum) <- NULL
res$times <- x[rownames(x)=="times", ]
rownames(res$times) <- NULL
colnames(res$times) <- NULL
res
How to "parametrize" foreach so there is no need to transform results?
You "just" have to provide the correct .combine function.
If you only have numbers, you can return an array rather than a list.
library(foreach)
library(abind)
first <- 1:3
second <- 4:5
x <-
foreach(b = second, .combine = function(...) abind(..., along=3)) %:%
foreach(a = first, .combine = rbind) %do% {
c( sum=a+b, times=a*b )
}
If you really need lists, writing the combining functions is much harder.
Instead, you can build a data.frame, and reshape it afterwards, if needed.
x <-
foreach(b = second, .combine = rbind) %:%
foreach(a = first, .combine = rbind) %do% {
data.frame(a=a, b=b, sum=a+b, times=a*b )
}
library(reshape2)
list(
sum = dcast(x, a ~ b, value.var="sum" )[,-1],
times = dcast(x, a ~ b, value.var="times")[,-1]
)
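If you would rather avoid the extra packages, the long data.frame can also be folded back into matrices with base R. The sketch below assumes the row order produced by the nested loop (a varies fastest within each b), which is what makes the column-major fill line up; to_matrix is a hypothetical helper name, not part of foreach.

```r
library(foreach)

first <- 1:3
second <- 4:5

# One row per (a, b) pair; for each b, a runs over all of first
long <- foreach(b = second, .combine = rbind) %:%
  foreach(a = first, .combine = rbind) %do% {
    data.frame(a = a, b = b, sum = a + b, times = a * b)
  }

# Hypothetical helper: relies on the row order described above
to_matrix <- function(col) {
  matrix(long[[col]], nrow = length(first), ncol = length(second))
}

result <- list(sum = to_matrix("sum"), times = to_matrix("times"))

# Cross-check against the loop-free equivalents
stopifnot(all(result$sum == outer(first, second, "+")))
stopifnot(all(result$times == outer(first, second, "*")))
```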