I am trying to parallelize a function across the 4 cores of my machine using parLapply.
My function contains two nested loops which are meant to fill in some empty columns of a predefined matrix M.
However, when I run the code below I get the following error:
2 nodes produced errors; first error: incorrect number of dimensions
Code:
require("parallel")
TheData<-list(E,T) # list of 2 matrices of different dimensions, T is longer and wider than E
myfunction <- function(TheData) {
for (k in 1:length(TheData[[1]][,1])) {
distance<-matrix(,nrow=length(TheData[[1]][,1]),ncol=1)
for (j in 1:length(TheData[[2]][,1])) {
distance[j]<-sqrt((as.numeric(TheData[[2]][j,1])-as.numeric(TheData[[1]][k,2]))^2+(as.numeric(TheData[[2]][j,2])-as.numeric(TheData[[1]][k,1]))^2)
}
index<-which(distance == min(distance))
M[k,4:9]<-c(as.numeric(TheData[[2]][index,1]),as.numeric(TheData[[2]][index,2]),as.numeric(TheData[[2]][index,3]),as.numeric(TheData[[2]][index,4]),as.numeric(TheData[[2]][index,5]),as.numeric(TheData[[2]][index,6]))
rm(distance)
gc()
}
}
n_cores <- 4
Cl = makeCluster(n_cores)
Results <- parLapplyLB(Cl, TheData, myfunction)
# I also tried: Results <- parLapply(Cl, TheData, myfunction)
In your example, parLapply is iterating over a list of matrices, and passing those matrices as the argument to "myfunction". However, "myfunction" seems to expect its argument to be a list of two matrices, and so an error occurs. I can reproduce that error with:
> E <- matrix(0, 4, 4)
> E[[1]][,1]
Error in E[[1]][, 1] : incorrect number of dimensions
I'm not sure what you're really trying to do, but with the current implementation of "myfunction", I would expect you to call parLapply with a list of lists containing two matrices, such as:
TheDataList <- list(list(A,B), list(C,D), list(E,F), list(G,H))
Passing this as the second argument to parLapply would result in "myfunction" being called four times, each time with a list containing two matrices.
But your example has another problem. It looks like you expect parLapply to modify the matrix "M" as a side-effect, but it can't. I think you should change "myfunction" to return a matrix. parLapply will return the matrices in a list which you can then bind together into the desired result.
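For instance, here is a minimal sketch of that "return a value, then bind" pattern with toy data; the function name nearest6, the chunking of E into EList, and the 6-column layout are my own placeholders for illustration, not your actual code:
# Hedged sketch: each task returns its rows instead of writing into a global M
library(parallel)
E <- matrix(rnorm(20), 10, 2)   # toy "E": 10 query points
T <- matrix(rnorm(30), 5, 6)    # toy "T": 5 candidate rows with 6 columns
nearest6 <- function(Echunk, Tmat) {
  out <- matrix(NA_real_, nrow(Echunk), 6)
  for (k in 1:nrow(Echunk)) {
    d <- sqrt((Tmat[,1] - Echunk[k,2])^2 + (Tmat[,2] - Echunk[k,1])^2)
    out[k, ] <- Tmat[which.min(d), 1:6]
  }
  out   # returned to the master instead of assigned into M
}
Cl <- makeCluster(2)
EList <- list(E[1:5, , drop = FALSE], E[6:10, , drop = FALSE])
Results <- parLapply(Cl, EList, nearest6, Tmat = T)
M <- do.call(rbind, Results)    # bind the returned pieces on the master
stopCluster(Cl)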
Update
From your comment, I now believe that you essentially want to parallelize "myfunction". Here's my attempt to do that:
library(parallel)
cl <- makeCluster(4)
myfunction <- function(Exy) {
  iM <- integer(nrow(Exy))
  for (k in 1:nrow(Exy)) {
    distance <- sqrt((Txy[,1] - Exy[k,2])^2 + (Txy[,2] - Exy[k,1])^2)
    iM[k] <- which.min(distance)
  }
  iM
}
# Random example data for testing
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)
# Only export the first two columns of T to the workers
Txy <- T[,1:2]
clusterExport(cl, c('Txy'))
# Parallelize "myfunction" by calling it in parallel on block rows of "E".
ExyList <- parallel:::splitRows(E[,1:2], length(cl))
iM <- do.call('c', clusterApply(cl, ExyList, myfunction))
# Update "M" using data from "T" indexed by "iM"
M <- matrix(0, nrow(E), 9) # more fake data
for (k in seq_along(iM)) {
  M[k, 4:9] <- T[iM[k], 1:6]
}
print(M)
stopCluster(cl)
Notes:
I vectorized myfunction which should make it more efficient. Hopefully it's nearly correct.
I also modified myfunction to return a vector of indices into "T" to reduce the amount of data sent back to the master.
The splitRows function from the parallel package is used to split the first two columns of "E" into a list of submatrices.
splitRows isn't exported by parallel, so I used ':::'. If this offends you, then use the splitRows function from snow, which is exported, or build the chunks yourself as in the sketch after these notes.
The first two columns of "T" are exported to each of the workers since each task requires the entire first two columns.
clusterApply is used rather than parLapply since we need to iterate over submatrices of E.
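If you'd rather not rely on the unexported splitRows at all, here is a small alternative sketch (my own variation, not part of the answer above) that builds the row chunks with splitIndices, which parallel does export:
library(parallel)
cl <- makeCluster(4)
E <- matrix(rnorm(120), 10)                 # same style of fake "E" as above
idx <- splitIndices(nrow(E), length(cl))    # one vector of row indices per worker
ExyList <- lapply(idx, function(i) E[i, 1:2, drop = FALSE])
stopCluster(cl)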
Related
I tried to look for a duplicate question, and I know many people have asked about parLapply in R, so I apologize if I missed one that applies to my situation.
Problem: I have the following function that runs correctly in R, but when I try to run it in parallel using parLapply (I'm on a Windows machine) I get the error that the $ operator is invalid for atomic vectors. The error says that 3 nodes produced errors no matter how many nodes I set my cluster to; for example, I have 8 cores on my desktop, so I set the cluster to 7 nodes.
Here is example code showing where the problem is:
library(parallel)
library(doParallel)
library(arrangements)

# Function
perms <- function(inputs)
{
  x <- 0
  L <- 2^length(inputs$w)
  ip <- inputs$ip
  for (i in 1:L)
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}
#Inputs is a list of several other variables that are created before this
#function runs (w, t_obs and iperm), here is a reproducible example of them
#W is derived from my data, this is just an easy way to make a reproducible example
set.seed(1)
m <- 15
W <- matrix(runif(15,0,1))
iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5
inputs <- list(W,t_obs, iperm)
names(inputs) <- c("w", "t", "ip")
#If I run the function not in parallel
perms(inputs)
#It gives a value of 27322 for this example data
This runs exactly as it should; however, when I try the following to run it in parallel, I get an error:
# make the cluster
cor <- detectCores()
cl <- makeCluster(cor - 1, type = "SOCK")

# passing library and arguments
clusterExport(cl, c("inputs"))
clusterEvalQ(cl, {
  library(arrangements)
})
results <- parLapply(cl, inputs, perms)
I get the error:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: $ operator is invalid for atomic vectors
However, I've checked whether anything is an atomic vector using is.atomic(), and is.recursive(inputs) returns TRUE.
My question is why am I getting this error when I try to run this using parLapply when the function otherwise runs correctly, and is there a reason it says "3 nodes produced errors" even when I have 7 nodes?
It says "3 nodes" because, as you're passing it to parLapply, you are only activating three nodes. The first argument to parLapply should be a list of things, each element to pass to each node. In your case, your inputs is a list, correct, but it is being broken down, such that your three nodes are effectively seeing:
# node 1
perms(inputs[[1]]) # effectively inputs$w
# node 2
perms(inputs[[2]]) # effectively inputs$t
# node 3
perms(inputs[[3]]) # effectively inputs$ip
# nodes 4-7 idle
You could replicate this on the local host (not parallel) with:
lapply(inputs, perms)
and when you see it like that, perhaps it becomes a little more obvious what is being passed to your nodes. (If you want to see it further, do debug(perms), then run the lapply above, and see what the inputs argument inside that function call looks like.)
To get this to work once on one node (I think not what you're trying to do), you could do
parLapply(cl, list(inputs), perms)
But that's only going to run one instance on one node. Perhaps you would prefer to do something like:
parLapply(cl, replicate(7, inputs, simplify=FALSE), perms)
I'm adding an answer in case anyone with a similar problem comes across this. @r2evans answered my original question, which led to the realization that even fixing the above problems would not get me the desired result (see the comments on his answer).
Problem: Using the package arrangements to generate a large number of combinations and apply a function to them. This becomes very time consuming as the number of combinations gets huge. What we need to do is split the combinations into chunks according to the number of cores that will be used to run in parallel, and then have each node do the calculations only on its specific chunk of the combinations.
Solution:
cor <- detectCores() - 1
cl <- makeCluster(cor, type = "SOCK")

set.seed(1)
m <- 15
W <- matrix(runif(15, 0, 1))
#iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5

chunk_list <- list()
for (i in 1:cor)
{
  chunk_list[i] <- i
}
chunk_size <- floor((2^m)/cor)
chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size*(cor - 1))

inputs_list <- Map(list, t = list(t_obs), w = list(W), chunk_list = chunk_list, chunk_size = list(chunk_size))
#inputs <- list(W, t_obs, iperm)
#names(inputs) <- c("w", "t", "ip", "chunk_it")

perms <- function(inputs)
{
  x <- 0
  L <- 2^length(inputs$w)
  ip <- arrangements::ipermutations(0:1, m, replace = T)
  chunk_size <- floor((2^m)/cor)
  chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size*(cor - 1))
  if (inputs$chunk_list != 1)
  {
    ip$getnext(sum(chunk_size[1:(inputs$chunk_list - 1)]))
  }
  for (i in 1:chunk_size[inputs$chunk_list])
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}

clusterExport(cl, c("inputs_list", "m", "cor"))
clusterEvalQ(cl, {
  library(arrangements)
})

system.time(results <- parLapply(cl, inputs_list, perms))
Reduce(`+`, results)
What I did was split the total number of combinations into chunks, i.e. the first 4681 (I have 7 nodes assigned to cor), then the next chunk, and so on, making sure I didn't miss any combinations. Then I changed my original function so that each node generates the permutations itself but skips ahead to the combination it should start calculating on: node 1 starts with the first combination, node 2 starts with combination 4682, and so on. I'm still working on optimizing this, because it's currently only about 4 times as fast as running it sequentially even though I'm using 7 cores. I think the skip in the permutation iterator will speed this up, but I haven't checked yet. Hopefully this is helpful to someone else; it cuts my estimated time to run a simulation (with m = 25, not 15) from about 10 days to about 2.5 days.
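As a side note, the chunk bookkeeping above could also be derived with parallel::splitIndices, which splits a sequence into roughly equal contiguous blocks; this is only a hedged sketch of that idea, not part of my tested solution:
library(parallel)
m <- 15
cor <- 7
chunks <- splitIndices(2^m, cor)        # contiguous blocks covering 1:(2^m)
chunk_size <- lengths(chunks)           # sizes per node, summing to 2^m
chunk_start <- sapply(chunks, min)      # first combination each node should handle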
You need to load dplyr on the worker nodes to solve this:
clusterEvalQ(clust, { library(dplyr) })
The above code should solve your issue.
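For context, here is a minimal self-contained sketch of that pattern; the data frame, the threshold, and the helper below are made-up placeholders, and the point is simply that packages must be attached and referenced globals exported on every worker:
library(parallel)
library(dplyr)
my_data <- data.frame(g = rep(1:2, 5), x = rnorm(10))   # hypothetical data
threshold <- 0                                          # hypothetical global used by the worker function
my_helper <- function(df) df %>% filter(x > threshold) %>% summarise(m = mean(x))
clust <- makeCluster(2)
clusterEvalQ(clust, { library(dplyr) })       # attach dplyr on every worker
clusterExport(clust, "threshold")             # ship globals the function refers to
res <- parLapply(clust, split(my_data, my_data$g), my_helper)
stopCluster(clust)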
I have the following R "apply" statement:
for (i in 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation))
{
  matrix_of_sums[,i] <-
    apply(simulation_results[, colnames(simulation_results) %in%
                               dataframe_stuff_that_needs_lookup_from_simulation[i,]], 1, sum)
}
So, I have the following data structures:
simulation_results: A matrix with column names that identify every possible piece of desired simulation lookup data for 2000 simulations (rows).
dataframe_stuff_that_needs_lookup_from_simulation: Contains, among other items, fields whose values match the column names in the simulation_results data structure.
matrix_of_sums: When the function is run, a 2000-row x 250,000-column (# of simulations x items being simulated) structure meant to hold the simulation results.
So, the apply function is looking up the data frame's column values for each of the 250,000 rows, computing the sums across the matching columns, and storing them in the matrix_of_sums structure.
Unfortunately, this processing takes a very long time. I have explored the use of rowSums as an alternative, and it has cut the processing time in half, but I would like to try multi-core processing to see if that cuts processing time even more. Can someone help me convert the code above from "apply" to "lapply"?
Thanks!
With base R parallel, try
library(parallel)
cl <- makeCluster(detectCores())
# objects referenced inside the worker function must be exported to the PSOCK workers
clusterExport(cl, c("simulation_results", "dataframe_stuff_that_needs_lookup_from_simulation"))
matrix_of_sums <- parLapply(cl, 1:nrow(dataframe_stuff_that_needs_lookup_from_simulation), function(i)
  rowSums(simulation_results[, colnames(simulation_results) %in%
                               dataframe_stuff_that_needs_lookup_from_simulation[i,]]))
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)
You could also try foreach %dopar%
library(doParallel) # will load parallel, foreach, and iterators
cl <- makeCluster(detectCores())
registerDoParallel(cl)
matrix_of_sums <- foreach(i = 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation)) %dopar% {
  rowSums(simulation_results[, colnames(simulation_results) %in%
                               dataframe_stuff_that_needs_lookup_from_simulation[i,]])
}
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)
I wasn't quite sure how you wanted your output at the end, but it looks like you're doing a cbind of each result. Let me know if you're expecting something else however.
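One small extra thought (mine, not tested on your data): with roughly 250,000 result columns, binding everything in a single do.call is much cheaper than Reduce, which cbinds one column at a time and re-copies the growing matrix:
# toy stand-in for the list returned by parLapply / foreach above
matrix_of_sums <- lapply(1:5, function(i) rnorm(2000))
ans <- do.call("cbind", matrix_of_sums)   # one bind instead of many incremental ones
dim(ans)                                  # 2000 x 5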
Without really having any applicable or sample data to go off of, the process would look like this:
Create a holding matrix (matrix_of_sums)
loop by row through the variable table (dataframe_stuff_that_needs_lookup_from_simulation)
find matching indices within the simulation model (simulation_results)
bind the rowSums into the holding matrix (matrix_of_sums)
I recreated a sample set which is meaningless and produces identical results but should work for your data
# Holding matrix which will be our end-goal
msums <- matrix(nrow = 2000, ncol = 0)
# Loop in parallel and collect the per-row results; assigning into msums with the
# global assignment operator `<<-` from inside mclapply would happen in the forked
# child processes and be lost, so the returned list is bound together afterwards
res <- parallel::mclapply(1:nrow(ts_df), function(i) {
  # Store the row to its own variable for ease
  d <- ts_df[i, ]
  rowSums(
    sim_df[, which(colnames(sim_df) %in% colnames(d))]
  )
}, mc.cores = parallel::detectCores(), mc.allow.recursive = TRUE)
msums <- cbind(msums, do.call(cbind, res))
I do maximum-likelihood optimization with R's optim procedure (using BFGS).
Each time the target function is evaluated, a couple of matrix-vector multiplications take place in which the same design matrix is postmultiplied by the vector of changing parameters.
If I simply use the parallel package to distribute the job among cores, the distribution time within each iteration basically kills the computation time, such that the parallel version even takes longer than the ordinary matrix-vector product.
What I would like to do is distribute the pieces of the matrix among the cores once and then perform the multiplication on those pieces, since the matrix will not change between iterations.
Basically, I do not want the same object to be distributed in every iteration.
What I've done so far is:
library(parallel)
nc <- detectCores()
cl <- makeCluster(rep("localhost", nc))

matprod.par <- function(cl, A, B) {
  idx <- splitIndices(nrow(A), length(cl))
  Alist <- lapply(idx, function(ii) A[ii, , drop = FALSE])
  ans <- clusterApply(cl, Alist, get("%*%"), B)
  do.call(rbind, ans)
}
Here, the clusterApply function distributes the pieces of A, i.e., Alist, among the cores. Is there a way to distribute Alist among the cores once, then perform the multiplication on the distributed pieces and put them back together via clusterApply?
Edit
I compared the clusterCall approach of Steve Weston with a simple foreach %dopar% approach:
matprod.par1 <- function(Alist, B, nc) {
  foreach(i = 1:nc, .combine = rbind) %dopar% {
    Alist[[i]] %*% B
  }
}
and the clusterExport approach, which, unlike the clusterCall approach, copies everything to every worker.
matprod.par2 <- function(obj1, obj2, nc) {
  do.call(rbind, sfClusterApplyLB(1:nc, function(i) eval(as.name(obj1))[[i]] %*% eval(as.name(obj2))))
}
I ran the matrix multiplication of two 1000x1000 matrices 100 times on an 8-core cluster with 30 GB of RAM.
The setup is:
nr <- 1000
A <- matrix(round(rnorm(nr^2),1),nr=nr)
B <- t(A) + 4
ordinary <- A %*% B
This is what I see (benchmark plot; the y-axis shows the time in seconds):
Clearly the sfCluster approach performs best, but it is not feasible if the matrices are very large (which in my case they are). So is it wise to go for the clusterCall approach even though the foreach one does better here? I'm not sure, but the foreach one does not copy everything to every worker, right?
I would use clusterApply to distribute submatrices of A to the workers, and then use clusterCall to perform operations repeatedly on those submatrices. For example:
A <- matrix(rnorm(16), 4)
idx <- splitIndices(nrow(A), length(cl))
Alist <- lapply(idx, function(ii) A[ii,,drop=FALSE])
clusterApply(cl, Alist, function(a) { subA <<- a; NULL })
matprod.par <- function(cl, B) {
  do.call(rbind, clusterCall(cl, function(b) subA %*% b, B))
}
AB <- matprod.par(cl, matrix(1:16, 4))
AC <- matprod.par(cl, matrix(rnorm(16), 4))
Unlike clusterExport, clusterApply can export a different value of the variable "subA" to each of the cluster workers.
The clusterCall function is very useful in this context since it allows you to iterate over data that has already been distributed to the workers, and you can still pass "B" along with each of the tasks.
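As a quick sanity check (not in the original answer), you can ask each worker what it is holding after the clusterApply step above:
clusterCall(cl, function() dim(subA))   # one dim() per worker's "subA" block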
Update
First, I should note that my example makes three assumptions:
The real problem involves more computations than a simple matrix multiply, since that is much better done sequentially on the master;
The "matprod.par" function will be executed multiple times making it worthwhile to pre-distribute the matrix "A" since it will be reused;
The "B" matrix is different on each call to "matprod.par", so no reuse is possible.
The "foreach" example that you present doesn't pre-distribute "A", so it can't reuse "A" if it is called multiple times. Both the "foreach" and "sfClusterApplyLB" examples copy all of "A" to all workers, which uses more memory, as you point out.
If you aren't going to call "matprod.par" multiple times, you could use:
matprod.par <- function(Alist, B) {
  foreach(a = Alist, .combine = rbind) %dopar% {
    a %*% B
  }
}
which avoids copying all of "A" to all the workers. This is equivalent to:
matprod.par <- function(cl, Alist, B) {
  do.call(rbind, clusterApply(cl, Alist, get('%*%'), B))
}
which will run somewhat faster than the foreach version since it has less overhead.
If you are going to call matprod.par many times, the pre-distribution time becomes insignificant, and the "clusterCall" example will be faster since none of "A" is sent to the workers again. Where the cross-over occurs depends on the number of times matprod.par is called, the size of "A", and the number of workers.
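To see roughly where that cross-over lands for your own sizes, here is a hedged timing sketch; I've renamed the two variants to avoid the name clash, and it assumes the cluster, "Alist", and the per-worker "subA" from the setup above:
matprod.pre  <- function(cl, B) do.call(rbind, clusterCall(cl, function(b) subA %*% b, B))
matprod.send <- function(cl, Alist, B) do.call(rbind, clusterApply(cl, Alist, get("%*%"), B))
B <- matrix(rnorm(16), 4)
system.time(for (i in 1:100) matprod.pre(cl, B))          # "A" is never resent
system.time(for (i in 1:100) matprod.send(cl, Alist, B))  # chunks of "A" sent on every call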
I want to ask for some help on writing a combine function for foreach(). Consider the function below:
library(mvtnorm)
library(doMC)
mySimFunc <- function() {
  myNum <- runif(1)
  myVec <- rnorm(10)
  myMat <- rmvnorm(5, rep(0, 3), diag(3))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}
Now I'd like to run the code above 1000 times using foreach() %dopar%, and in each iteration I'd like to:
return myNum as is
get the average of myVec and return it
get the colMeans() of myMat and return it.
I'd like foreach() %dopar% to return a final list including:
a vector of length 1000 containing the 1000 myNum values, one per iteration
a vector of length 1000 containing the average of myVec from each iteration
a matrix with 1000 rows, where each row is the colMeans of myMat from that iteration
My Ideal solution
My ideal solution is to find a way to make foreach() act exactly like for, so that I can simply define:
myNumRslt <- NULL
myVecRslt <- NULL
myMatRslt <- NULL
# and then simply aggregate result of each iteration to the variables above as:
foreach(i = 1:1000) %dopar% {
  rslt <- mySimFunc()
  myNumRslt <- c(myNumRslt, rslt$myNum)
  myVecRslt <- c(myVecRslt, mean(rslt$myVec))
  myMatRslt.tmp <- colMeans(rslt$myMat)
  myMatRslt <- rbind(myMatRslt, myMatRslt.tmp)
}
BUT, unfortunately it seems that it's not possible to do that with foreach(), so I think the only solution is to write a combine function that does something similar to the result aggregation above.
Challenge
1) How could I write a combine function that returns what I explained above?
2) When we do %dopar% (say, using the doMC package), does doMC distribute each iteration to a CPU core, or does it go further and divide each iteration into smaller pieces and distribute those?
3) Is there any better (more efficient) way than using doMC and foreach()?
Ideas
In this question Brian mentioned a brilliant way to deal with lists containing numeric values. In my case, I have numeric values as well as vectors and matrices, and I don't know how to extend Brian's idea to my case.
Thanks very much for your help.
Edit
Cleaned up, generalizable solution using .combine:
# modify the function to include aggregation
mySimFunc2 <- function() {
  myNum <- runif(1)
  myVec <- mean(rnorm(10))
  myMat <- colMeans(rmvnorm(5, rep(0, 3), diag(3)))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}
# .combine function
MyComb1 <- function(...) {
  lst <- list(...)
  vec      <- sapply(1:length(lst), function(i) lst[[i]][[1]])
  vecavg   <- sapply(1:length(lst), function(i) lst[[i]][[2]])
  colmeans <- t(sapply(1:length(lst), function(i) lst[[i]][[3]]))
  final <- list(vec, vecavg, colmeans)
  names(final) <- c("vec", "vecavg", "colmeans")
  return(final)
}
library(doParallel)
cl <- makeCluster(3) # set cores
registerDoParallel(cl)

foreach(i = 1:1000, .export = c("mySimFunc2", "MyComb1"), .combine = MyComb1,
        .multicombine = TRUE, .maxcombine = 1000, .packages = c("mvtnorm")) %dopar% { mySimFunc2() }
You should now have a list output containing the desired three objects, which I've named vec, vecavg, and colmeans, respectively. Note that you must set .maxcombine to the number of iterations if there are more than 100 iterations.
As a side note, it does not make sense to parallelize for this example task, although I'm guessing the real task may be more complex.
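If writing a .combine function ever feels heavy, another option (a hedged sketch of my own, not part of the answer above) is to collect the raw per-iteration lists and reshape them afterwards on the master:
raw <- foreach(i = 1:1000, .export = "mySimFunc2", .packages = "mvtnorm") %dopar% mySimFunc2()
vec      <- sapply(raw, `[[`, "myNum")     # length-1000 vector
vecavg   <- sapply(raw, `[[`, "myVec")     # length-1000 vector of means
colmeans <- t(sapply(raw, `[[`, "myMat"))  # 1000 x 3 matrix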
I have two objects of equal length (one is a list produced by parsing JSON, and the other is a slice of a multi-dimensional array), e.g.:
library(rjson)
library(foreach)
library(iterators)   # provides iter()

iter1 <- iter( fromJSON(file = jsonfilename)$someJSONarray )
iter2 <- iter( myarr, by = "row" )
I need to be able to do the following:
out=foreach(x=zipiter(iter1,iter2),combine=list) %do%
{
#Do stuff with elements from both iterators accessed by e.g. x[[1]] and x[[2]]
}
Is there any standard way of doing it (like in Python/C++ with boost Zip iterator)?
There is an izip function in the itertools package that does what you describe:
library(itertools)
out <- foreach(x=izip(iter1, iter2)) %do% {
# x[[1]] contains a value from iter1
# x[[2]] contains a value from iter2
}
But I prefer to specify multiple loop variables to foreach:
out <- foreach(x=iter1, y=iter2) %do% {
# x contains a value from iter1
# y contains a value from iter2
}
Both solutions step through the values from the two iterators in lockstep. If you want the two values together in a list, then izip is more convenient.
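A small self-contained check of both forms (toy iterators of my own, assuming the iterators and itertools packages are installed):
library(foreach)
library(iterators)
library(itertools)
foreach(x = izip(iter(1:3), iter(c("a", "b", "c")))) %do% paste(x[[1]], x[[2]])
foreach(n = iter(1:3), s = iter(c("a", "b", "c"))) %do% paste(n, s)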
This might not be exactly what you need, but hopefully it might at least get you on the right track:
library(foreach)
X = 1:10
Y = 11:20
out = foreach(x=t(data.frame(X, Y))) %do% {x[1,]*x[2,]}
Yes, this is weird, but it properly pairs up the data. If you pass in a dataframe, foreach iterates over the columns instead of the rows (this isn't entirely surprising since lapply does the same thing). So transposing the dataframe results in iteration over the rows, but then the x object is a column vector, so we need to index the rows instead of the columns like we'd expect.
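A slightly less surprising variant of the same trick (hedged; it leans on the iterators package's row-wise data frame iterator rather than the transpose) would be:
library(foreach)
library(iterators)
X <- 1:10
Y <- 11:20
# each x is a one-row data frame, so the pairing is explicit
out <- foreach(x = iter(data.frame(X, Y), by = "row"), .combine = c) %do% { x$X * x$Y }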
After Steve Weston's answer there is nothing really to add; I'll just post my own working version of the zip iterator so that people learning how to live with iterators in R can benefit. The iterator is named izipiter to avoid a name clash with the existing izip iterator.
library(foreach)
library(iterators)   # for nextElem()

izipiter <- function(iters)
{
  nextEl <- function()
  {
    tryCatch(
    {
      foreach(it = iters, .combine = list) %do%
        nextElem(it)
    }, error = function(e) { stop("StopIteration") })
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c('izipiter', 'abstractiter', 'iter')
  obj
}
And this is how to use it:
it1 <- iter(c(3, 5, 15))
it2 <- iter(list(x = "a", y = "b", z = c("one", "two", "three")))
myit <- izipiter(iters = list(it1, it2))

foreach(it = myit) %do%
  print(it)