Need help with a combine function in a parallel simulation study using doMC

I want to ask for some help on writing a combine function for foreach(). Consider the function below:
library(mvtnorm)
library(doMC)
mySimFunc <- function() {
  myNum <- runif(1)
  myVec <- rnorm(10)
  myMat <- rmvnorm(5, rep(0, 3), diag(3))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}
Now I'd like to run the code above 1000 times using foreach() %dopar%, and in each iteration I'd like to:
return myNum as is
take the average of myVec and return it
take the colMeans() of myMat and return it.
I'd like foreach() %dopar% to return a final list including:
a vector of length 1000 containing the myNum from each iteration
a vector of length 1000 containing the average of myVec from each iteration
a matrix with 1000 rows, where each row contains the colMeans of myMat from that iteration
My ideal solution
My ideal solution is to find a way to make foreach() act exactly like for, so that I can simply define:
myNumRslt <- NULL
myVecRslt <- NULL
myMatRslt <- NULL
# and then simply aggregate result of each iteration to the variables above as:
foreach(i = 1:1000) %dopar% {
  rslt <- mySimFunc()
  myNumRslt <- c(myNumRslt, rslt$myNum)
  myVecRslt <- c(myVecRslt, mean(rslt$myVec))
  myMatRslt.tmp <- colMeans(rslt$myMat)
  myMatRslt <- rbind(myMatRslt, myMatRslt.tmp)
}
But unfortunately it seems that this is not possible with foreach(), so I think the only solution is to write a combine function that performs an aggregation similar to the one above.
Challenge
1) How could I write a combine function that returns what I explained above?
2) When we use %dopar% (say, with the doMC package), does doMC distribute each iteration to a CPU core, or does it go further and split each iteration into smaller pieces that it then distributes?
3) Is there a better (more efficient) way than using doMC and foreach()?
Ideas
In this question, Brian mentioned a brilliant way to deal with lists containing numeric values. In my case I have numeric values as well as vectors and matrices, and I don't know how to extend Brian's idea to cover them.
Thanks very much for your help.

Edit
Cleaned up, generalizable solution using .combine:
# modify the function to do the aggregation inside each iteration
mySimFunc2 <- function() {
  myNum <- runif(1)
  myVec <- mean(rnorm(10))
  myMat <- colMeans(rmvnorm(5, rep(0, 3), diag(3)))
  myListRslt <- list("myNum" = myNum, "myVec" = myVec, "myMat" = myMat)
  return(myListRslt)
}
# .combine function
MyComb1 <- function(...) {
  lst <- list(...)
  vec <- sapply(1:length(lst), function(i) lst[[i]][[1]])
  vecavg <- sapply(1:length(lst), function(i) lst[[i]][[2]])
  colmeans <- t(sapply(1:length(lst), function(i) lst[[i]][[3]]))
  final <- list(vec, vecavg, colmeans)
  names(final) <- c("vec", "vecavg", "colmeans")
  return(final)
}
library(doParallel)
cl <- makeCluster(3)  # set the number of cores
registerDoParallel(cl)
simRslt <- foreach(i = 1:1000, .export = c("mySimFunc2", "MyComb1"), .combine = MyComb1,
                   .multicombine = TRUE, .maxcombine = 1000, .packages = "mvtnorm") %dopar% {
  mySimFunc2()
}
stopCluster(cl)
simRslt should now be a list containing the desired three objects, which I've named vec, vecavg, and colmeans. Note that you must set .maxcombine to the number of iterations whenever there are more than 100 iterations.
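For reference, a quick way to inspect the combined result returned above:
str(simRslt)
length(simRslt$vec)     # 1000 myNum values, one per iteration
length(simRslt$vecavg)  # 1000 means of myVec
dim(simRslt$colmeans)   # 1000 x 3 matrix of colMeans of myMat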
As a side note, it does not make sense to parallelize this example task, although I'm guessing the real task is more complex.

Related

Call function on its own output, N times

I want to run a function N times, with its input being the output it produced in the previous iteration. Here's a manual example (with N = 3):
fun <- function(data) {
  x <- data$x
  y <- data$y
  new_x <- x + y
  new_y <- x * y
  list(x = new_x, y = new_y)
}
# Initialise:
data <- list(x = 2, y = 3)
# Run N times:
data <- fun(data)
data <- fun(data)
data <- fun(data)
Is there a simple/fast way to do this, without using slow loops?
Is there a simple/fast way to do this
Yes, this is a trivial loop:
N <- 3
for (i in 1:N) {
  data <- fun(data)
}
without using slow loops?
This is not slow.
Loops in R are slower than vectorized operations. However, since each iteration depends on the previous result, this cannot be vectorized. With R's JIT compilation, a for loop will likely be faster than common ways in R to hide loops, like *apply functions. And anyway, it's difficult to make most of the *apply functions update their inputs for successive iterations, as is needed here. (JIT compilation has been enabled by default for many years now.)
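If you prefer not to write the loop out, one base-R alternative that still feeds each result into the next call is Reduce(). Using fun, data, and N from the question, a minimal sketch (no faster than the for loop, just more compact):
# Reduce() calls the anonymous function N times, threading the accumulated
# result through; the second argument (the iteration index) is ignored
data <- Reduce(function(d, i) fun(d), seq_len(N), init = data)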

Dealing with multidimensional output in parallel programming

I am currently working on a program to evaluate the out-of-sample performance of several forecasting models on simulated data. For those who are familiar with finance, it works exactly like backtesting a trading strategy, except that I would evaluate forecasts and not transactions.
Some of the objects I currently manipulate with for loops for this type of task are 7-dimensional arrays (the dimensions stand for Monte Carlo replications, data generating processes, forecast horizons, 3 dimensions for model parameter selection, and one dimension for all the periods covered in the out-of-sample analysis). Obviously, it is painfully slow, so parallel computing has become a must for me.
My problem is: how do I keep track of more than 2 dimensions in R? Let me show what I mean using for loops and only 3 dimensions:
x <- array(dim = c(2, 2, 2))
for (i in 1:2) {
  for (j in 1:2) {
    for (k in 1:2) {
      x[i, j, k] <- i + j + k
    }
  }
}
If I use something like 'foreach', I am very annoyed by the fact that, to my knowledge, available combining functionalities will return lists, matrices or vectors -- but not arbitrarily large multidimensional arrays. For instance:
library(doParallel)
library(foreach)
# Get the number of cores to use
no_cores <- max(1, detectCores()-1)
# Make cluster object using no_cores
cl <- makeCluster(no_cores)
# Initialize cluster for parallel computing
registerDoParallel(cl)
x <- foreach(i = 1:2, .combine = rbind) %:%
  foreach(j = 1:2, .combine = cbind) %:%
  foreach(k = 1:2, .combine = c) %dopar% {
    i + j + k
  }
Here, I basically combine results into vectors, then matrices and, finally, I pile up matrices by rows. Another option would be to use lists, or pile matrices through columns, but you can imagine the mess when you have 7 dimensions and millions of iterations to track.
I suppose I could also write my own combine function and get the kind of output I want, but I suspect that I am not the first person to encounter this problem. Either there is a way to do exactly what I want, or someone here can point out a way to think differently about storing my results. It wouldn't be surprising if I were taking an absurdly inefficient path toward solving this problem -- I am an economist, not a data scientist, after all!
Any help would be greatly appreciated. Thanks in advance.
There is one solution that I finally stumbled upon tonight. I can create an appropriate combine function along the dimension of my choice using the abind() function from the 'abind' package:
library(abind)
library(doParallel)  # also loads foreach and parallel
# Get the number of cores to use
no_cores <- max(1, detectCores() - 1)
# Make cluster object using no_cores
cl <- makeCluster(no_cores)
# Initialize cluster for parallel computing
registerDoParallel(cl)
mbind <- function(...) abind(..., along = 3)
x <- foreach(i = 1:2, .combine = mbind) %:%
  foreach(j = 1:2, .combine = cbind) %:%
  foreach(k = 1:2, .combine = c) %dopar% {
    i + j + k
  }
I would still like to see whether someone has other ways of doing this, however. There may be many ways to do it and I am new to R, but this solution is one workable possibility.
Here is what I would do, and what I already use in one of my packages, bigstatsr.
Take only one dimension and cut it into no_cores blocks. That dimension should have sufficiently many iterations (e.g. 20 for 4 cores). For each block, construct the corresponding part of the array and store it in a temporary file. Then, use the contents of these files to fill the whole array. By doing so, you only fill preallocated objects, which should be faster and easier.
Example:
x.all <- array(dim=c(20,2,2))
no_cores <- 3
tmpfile <- tempfile()
range.parts <- bigstatsr:::CutBySize(nrow(x.all), nb = no_cores)
library(foreach)
cl <- parallel::makeCluster(no_cores)
doParallel::registerDoParallel(cl)
foreach(ic = 1:no_cores) %dopar% {
  ind <- bigstatsr:::seq2(range.parts[ic, ])
  x <- array(dim = c(length(ind), 2, 2))
  for (i in seq_along(ind)) {
    for (j in 1:2) {
      for (k in 1:2) {
        x[i, j, k] <- ind[i] + j + k
      }
    }
  }
  saveRDS(x, file = paste0(tmpfile, "_", ic, ".rds"))
}
parallel::stopCluster(cl)
for (ic in 1:no_cores) {
  ind <- bigstatsr:::seq2(range.parts[ic, ])
  x.all[ind, , ] <- readRDS(paste0(tmpfile, "_", ic, ".rds"))
}
print(x.all)
Instead of writing files, you could also directly return the no_cores parts of the array from foreach and combine them with the right abind() call, as sketched below.
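A minimal sketch of that variant, reusing no_cores, range.parts, and x.all from above (x.all2 is just a new name for the recombined result; the abind() combine runs on the master process, so the workers do not need the abind package):
library(abind)
cl <- parallel::makeCluster(no_cores)
doParallel::registerDoParallel(cl)
x.all2 <- foreach(ic = 1:no_cores, .combine = function(...) abind(..., along = 1)) %dopar% {
  ind <- bigstatsr:::seq2(range.parts[ic, ])
  x <- array(dim = c(length(ind), 2, 2))
  for (i in seq_along(ind)) {
    for (j in 1:2) {
      for (k in 1:2) {
        x[i, j, k] <- ind[i] + j + k
      }
    }
  }
  x  # each worker returns its block; abind() stacks the blocks along the first dimension
}
parallel::stopCluster(cl)
all(x.all == x.all2)  # should be TRUE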

Convert R apply statement to lapply for parallel processing

I have the following R "apply" statement:
for (i in 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation)) {
  matrix_of_sums[, i] <-
    apply(simulation_results[, colnames(simulation_results) %in%
            dataframe_stuff_that_needs_lookup_from_simulation[i, ]], 1, sum)
}
So, I have the following data structures:
simulation_results: A matrix with column names that identify every possible piece of desired simulation lookup data for 2000 simulations (rows).
dataframe_stuff_that_needs_lookup_from_simulation: Contains, among other items, fields whose values match the column names in the simulation_results data structure.
matrix_of_sums: When the function is run, a 2000 row x 250,000 column (number of simulations x items being simulated) structure meant to hold the simulation results.
So, for each of the 250,000 rows of the lookup data frame, the apply statement selects the matching simulation columns, computes their row-wise sums, and stores the result in the matrix_of_sums data structure.
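Purely for illustration, a tiny toy version of these structures might look like the following (the sizes and the item/column names are made up; the real data is far larger):
set.seed(1)
# 2000 simulations (rows) x 10 possible lookup items (columns)
simulation_results <- matrix(rnorm(2000 * 10), nrow = 2000,
                             dimnames = list(NULL, paste0("item", 1:10)))
# each row names the items whose simulated values should be summed
lookup_items <- t(replicate(5, sample(colnames(simulation_results), 3)))
dataframe_stuff_that_needs_lookup_from_simulation <-
  as.data.frame(lookup_items, stringsAsFactors = FALSE)
# one column of sums per row of the lookup data frame
matrix_of_sums <- matrix(nrow = nrow(simulation_results),
                         ncol = nrow(dataframe_stuff_that_needs_lookup_from_simulation))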
Unfortunately, this processing takes a very long time. I have explored the use of rowSums() as an alternative, and it has cut the processing time in half, but I would like to try multi-core processing to see if that cuts processing time even more. Can someone help me convert the code above from apply to lapply?
Thanks!
With base R parallel, try
library(parallel)
cl <- makeCluster(detectCores())
# the worker processes need their own copies of the objects used inside the function
clusterExport(cl, c("simulation_results",
                    "dataframe_stuff_that_needs_lookup_from_simulation"))
matrix_of_sums <- parLapply(cl, 1:nrow(dataframe_stuff_that_needs_lookup_from_simulation),
  function(i)
    rowSums(simulation_results[, colnames(simulation_results) %in%
              dataframe_stuff_that_needs_lookup_from_simulation[i, ]]))
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)
You could also try foreach %dopar%
library(doParallel)  # will load parallel, foreach, and iterators
cl <- makeCluster(detectCores())
registerDoParallel(cl)
matrix_of_sums <- foreach(i = 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation)) %dopar% {
  rowSums(simulation_results[, colnames(simulation_results) %in%
            dataframe_stuff_that_needs_lookup_from_simulation[i, ]])
}
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)
I wasn't quite sure how you wanted your output at the end, but it looks like you're doing a cbind of each result. Let me know if you're expecting something else however.
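One small note on the final combine step: when matrix_of_sums has very many elements, binding everything in a single call is usually faster than pairwise cbind-ing via Reduce, e.g.:
# bind all column vectors at once instead of growing the matrix pairwise
ans <- do.call(cbind, matrix_of_sums)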
Without really having any applicable or sample data to go off of, the process would look like this:
Create a holding matrix (matrix_of_sums)
loop by row through the variable table (dataframe_stuff_that_needs_lookup_from_simulation)
find the matching indices within the simulation model (simulation_results)
bind the rowSums into the holding matrix (matrix_of_sums)
I recreated a sample set which is meaningless and produces identical results but should work for your data
# Holding matrix which will be our end-goal
msums <- matrix(nrow = 2000, ncol = 0)
# Loop in parallel; each worker returns its rowSums rather than assigning with
# `<<-`, since assignments made inside forked mclapply workers are not visible
# to the parent process
sums_list <- parallel::mclapply(1:nrow(ts_df), function(i) {
  # Store the row to its own variable for ease
  d <- ts_df[i, ]
  rowSums(sim_df[, which(colnames(sim_df) %in% colnames(d))])
}, mc.cores = parallel::detectCores(), mc.allow.recursive = TRUE)
# cbind the results into the holding matrix
msums <- cbind(msums, do.call(cbind, sums_list))

Using large numbers of permutations in parallel: combining iterpc and foreach

iterpc begins each loop from the same point. This has created an amusing, though frustrating, issue, illustrated below:
####Load Packages:
library("doParallel")
library("foreach")
library("iterpc")
####Define variables:
n<-2
precision<-0.1
support<-matrix(seq(0+precision,1-precision,by=precision), ncol=1)
nodes<-2 #preparing for multicore.
cl<-makeCluster(nodes)
####Prep iterations
I<-iterpc(table(support),n, ordered=TRUE,replace=FALSE)
steps<-((factorial(length(support)) / factorial(length(support)-n)))/n
####Run loop to get the combined values:
registerDoParallel(cl)
support_n <- foreach(m = 1:n, .packages = "iterpc", .combine = 'cbind') %dopar% {
  t(getnext(I, steps))
} # ????
Which returns
support_n
I was hoping that this would run each of the sets in parallel, with one half of the permutations assigned to each node. However, it only does the first half of the permutations... twice. ([,1] is equal to [,37].) How do I get it to return all of the permutations and combine them in parallel?
Assume there will be an arbitrarily large number of permutations, so memory management and speed are nontrivial.
Previous research: All possible permutations for large n
Just for anyone who comes here by searching "foreach iterpc R", as I did.
The approach marked as the accepted answer does not really differ much from
result <- foreach(a = 1:10) %dopar% {
  a
}
because a = getnext(I, d = (2*steps)) will simply return the first 2*steps combinations, and the foreach package will then iterate in parallel over those combinations.
When you have a very large number of combinations returned by iterpc (which is what it is built for), you cannot in fact use such an approach.
In that case, the only thing I believe one can do is to write an iterator wrapper around the iterpc object.
# register parallel backend
library(doParallel)
registerDoParallel(cores = 3)
# create iterpc object
library(iterpc)
combinations <- iterpc(4, 2)
library(iterators)
iterpc_iterator <- function(iterpc_object, iteration_length) {
  # one's own nextElem() function, because iterpc returns NULL on a finished
  # iteration at the subsequent getnext() invocation, not 'StopIteration'
  nextEl <- function() {
    if (iteration_length > 0)
      iteration_length <<- iteration_length - 1
    else
      stop('StopIteration')
    getnext(iterpc_object)
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c('irep', 'abstractiter', 'iter')
  obj
}
it <- iterpc_iterator(combinations, getlength(combinations))
library(foreach)
result <- foreach(i = it) %dopar% {
  i
}
You can simply use iterpc::iter_wrapper.
The relevant line from your example:
support_n <- foreach(a = iter_wrapper(I), .combine = 'cbind') %dopar% a
After further investigation I believe the following does in fact execute the command in parallel.
registerDoParallel(cl)
system.time(
support_n<-foreach(a=getnext(I,d=(2*steps)),.combine='cbind') %dopar% a
)
support_n<-t(support_n)
Thank you for your assistance.

incorrect number of dimensions error using parLapply

I am trying to parallelize a function across the 4 cores of my machine using parLapply.
My function contains two nested loops which are meant to fill in some empty columns of a predefined matrix M.
However, when I run the code below I obtain the following error
2 nodes produced errors; first error: incorrect number of dimensions
Code:
require("parallel")
TheData<-list(E,T) # list of 2 matrices of different dimensions, T is longer and wider than E
myfunction <- function(TheData) {
for (k in 1:length(TheData[[1]][,1])) {
distance<-matrix(,nrow=length(TheData[[1]][,1]),ncol=1)
for (j in 1:length(TheData[[2]][,1])) {
distance[j]<-sqrt((as.numeric(TheData[[2]][j,1])-as.numeric(TheData[[1]][k,2]))^2+(as.numeric(TheData[[2]][j,2])-as.numeric(TheData[[1]][k,1]))^2)
}
index<-which(distance == min(distance))
M[k,4:9]<-c(as.numeric(TheData[[2]][index,1]),as.numeric(TheData[[2]][index,2]),as.numeric(TheData[[2]][index,3]),as.numeric(TheData[[2]][index,4]),as.numeric(TheData[[2]][index,5]),as.numeric(TheData[[2]][index,6]))
rm(distance)
gc()
}
}
n_cores <- 4
Cl = makeCluster(n_cores)
Results <- parLapplyLB(Cl, TheData, myfunction)
# I also tried: Results <- parLapply(Cl, TheData, myfunction)
In your example, parLapply is iterating over a list of matrices, and passing those matrices as the argument to "myfunction". However, "myfunction" seems to expect its argument to be a list of two matrices, and so an error occurs. I can reproduce that error with:
> E <- matrix(0, 4, 4)
> E[[1]][,1]
Error in E[[1]][, 1] : incorrect number of dimensions
I'm not sure what you're really trying to do, but with the current implementation of "myfunction", I would expect you to call parLapply with a list of lists containing two matrices, such as:
TheDataList <- list(list(A,B), list(C,D), list(E,F), list(G,H))
Passing this as the second argument to parLapply would result in "myfunction" being called four times, each time with a list containing two matrices.
But your example has another problem. It looks like you expect parLapply to modify the matrix "M" as a side-effect, but it can't. I think you should change "myfunction" to return a matrix. parLapply will return the matrices in a list which you can then bind together into the desired result.
Update
From your comment, I now believe that you essentially want to parallelize "myfunction". Here's my attempt to do that:
library(parallel)
cl <- makeCluster(4)
myfunction <- function(Exy) {
  iM <- integer(nrow(Exy))
  for (k in 1:nrow(Exy)) {
    distance <- sqrt((Txy[, 1] - Exy[k, 2])^2 + (Txy[, 2] - Exy[k, 1])^2)
    iM[k] <- which.min(distance)
  }
  iM
}
# Random example data for testing
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)
# Only export the first two columns of T to the workers
Txy <- T[, 1:2]
clusterExport(cl, c('Txy'))
# Parallelize "myfunction" by calling it in parallel on block rows of "E"
ExyList <- parallel:::splitRows(E[, 1:2], length(cl))
iM <- do.call('c', clusterApply(cl, ExyList, myfunction))
# Update "M" using data from "T" indexed by "iM"
M <- matrix(0, nrow(T), 9)  # more fake data
for (k in seq_along(iM)) {
  M[k, 4:9] <- T[iM[k], 1:6]
}
print(M)
stopCluster(cl)
Notes:
I vectorized myfunction which should make it more efficient. Hopefully it's nearly correct.
I also modified myfunction to return a vector of indices into "T" to reduce the amount of data sent back to the master.
The splitRows function from the parallel package is used to split the first two columns of "E" into a list of submatrices.
splitRows isn't exported by parallel, so I used ':::'. If this offends you, then use the splitRows function from snow which is exported.
The first two columns of "T" are exported to each of the workers since each task requires the entire first two columns.
clusterApply is used rather than parLapply since we need to iterate over submatrices of E.
