R Parallel Programming with Two Loops and Storing Results

I have a function with two nested loops, and the results are two lists of data.
The structure looks like this:
f <- function() {
  for (i in 1:50) {
    for (j in 1:100) {
      # do something, get results a and b
      # collect a and b into their lists
    }
    # use the series of a and b to calculate two parameters, A and B
    # put A and B into their lists
    listA <- append(listA, A)
    # or: listB <- cbind(listB, B)   # I don't know which one is better
  }
  # plot the figure using listA and listB
  # save the results
  results <- data.frame(listA, listB)
  # write results to CSV
}
The simulation needs to run 5000 times, and each run takes at least one minute.
I want to run this whole function in parallel. I tried lapply, but it only works well with one loop; when I use it the results are not consistent and I cannot produce the plot, i.e. I cannot get the results.
I have also found that some parallel code does not work on Windows and some does not work on Mac, which confuses me.
Each step in the loop is independent, so one alternative I thought of is simply to split the jobs and run them simultaneously, but I need the results in a consistent order (the order they should be in).
To use the data for the plots later I need to save the results, and I am having trouble both with that and with the parallelization itself.
The way I save the results looks like a mess.
For example, what I want is:
A B
0 0
0.1 1
1.2 4
3 9
6 12
... ...
but what I got is:
V1
0 0 0.1 1 1.2 4 3 9 6 12 ... ...
I don't know how to save two columns of data from the parallel code.

I like using the foreach package for tasks like this (check the documentation). foreach works like a for loop, but it can run on a cluster: each iteration is executed separately and the results are combined afterwards. I made a small example with the structure you are using, which you can adapt to your task.
library(foreach)
library(doParallel)

# number of cluster workers; I chose 4
cl <- makeCluster(4)
registerDoParallel(cl)

# replace z = 1:10 with your range; .combine declares how the per-iteration
# results are combined afterwards, and .inorder makes sure they are combined
# in the right order (TRUE is the default)
df <- foreach(z = 1:10, .combine = rbind, .inorder = TRUE) %dopar% {
  list_a <- list()
  list_b <- list()
  for (i in 1:50) {
    for (j in 1:100) {
      # some random task you are doing
      a <- i
      b <- 50 - i
    }
    # collect the per-i results
    list_a <- cbind(list_a, a)
    list_b <- cbind(list_b, b)
  }
  # make sure you return the values, otherwise they don't get combined by foreach
  return(do.call(rbind, Map(data.frame, A = list_a, B = list_b)))
}

# the combined result can be turned into a data frame easily
df <- as.data.frame(df)
View(df)
stopCluster(cl)
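On the Windows vs. Mac confusion: PSOCK clusters (what makeCluster creates by default) work on both platforms, while fork-based approaches such as mclapply or registerDoParallel(cores = n) are Unix-only. Below is a minimal sketch of a setup that runs on either platform; the worker count and overall structure are my assumptions, not part of the original answer:
library(doParallel)

n_workers <- max(1, parallel::detectCores() - 1)

if (.Platform$OS.type == "windows") {
  # PSOCK workers: separate R processes, works on Windows, Mac and Linux
  cl <- makeCluster(n_workers, type = "PSOCK")
  registerDoParallel(cl)
} else {
  # fork-based workers: cheaper to start, but only available on Unix-like systems
  registerDoParallel(cores = n_workers)
  cl <- NULL
}

# ... run foreach(...) %dopar% { ... } here, as in the example above ...

if (!is.null(cl)) stopCluster(cl)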

Related

Way to parallelize QCA minimize() in R?

I'm using the R package QCA (https://cran.r-project.org/web/packages/QCA/index.html) for Qualitative Comparative Analysis. I want to be able to try out many different combinations, which is taking a very long time. On my faster CPU, trying all the options that I am interested in takes a little over 24 hours. R seems to be using only one of the cores available on my CPU and requires relatively little memory (just under 100MB). I am hoping someone has a good idea on how to speed up this process, perhaps through parallelization?
Here's what I'm doing:
1. Loading my data set (data), which is a CSV file with the outcome condition and all the options for my causal conditions. The causal conditions are in 4 groups A, B, C, and D. There are approx. 200 observations, i.e., rows in the data set.
2. Starting a log file with sink().
3. Creating a series of nested loops to generate each combination of causal conditions I want to examine.
4. Running the minimize() function within the nested loops. Specifically, this looks like this:
for (a in causal_condition_group_A) {
  for (b in causal_condition_group_B) {
    for (c in causal_condition_group_C) {
      for (d in causal_condition_group_D) {
        minimize(data, outcome = my_outcome,
                 conditions = paste0(a, ", ", b, ", ", c, ", ", d), ...)
      }
    }
  }
}
The minimize function's conditions argument essentially takes a character vector as input and this is all my nested loops are creating. For example, a random conditions argument might read:
conditions = "causal_condition_A_87, causal_condition_B_2, causal_condition_C_42, causal_condition_D_219"
I tried several different things in an attempt to parallelize this approach, but so far I have not been successful. I tried experimenting with both parSapply and foreach %dopar%, but I am running into various problems. I either can't get the actual parallelization process to work properly or - and this is in some of my toy experiments - I am having trouble logging all the output, which is essential.
Please let me know if I can provide additional information to help clear things up! Thanks for your help!
EDIT:
I was able to create a working foreach() loop based on #HenrikB's advice, but I'm running into a different problem now.
Here's my test solution so far. It includes one less nested loop than I want in the final solution, but that's not important for now:
# SET QCA OUTCOME CONDITION
outcome = "c_outcome"
# LOAD LIBRARIES
library(doParallel)
library(QCA)
# CREATE CLUSTER FOR PARALLELIZATION
cores <- detectCores()
cl <- makeCluster(cores[1]-1, type = "PSOCK", outfile="")
registerDoParallel(cl)
# LOAD AND SET UP DATA
outcomecond <- read.csv("outcomes.csv", header=TRUE, row.names="ID")
causalcond <- read.csv("causal_conditions.csv", header=TRUE, row.names="ID")
data <- cbind(outcomecond[outcome], causalcond)
temp <- data[!is.na(data[outcome]), ] #keep only rows where outcome is not NA
# EXPORT CURRENT DATASET TO CLUSTERS
clusterExport(cl, "temp")
# CREATE CAUSAL CONDITION LISTS
causal_condition_group_A <- colnames(causalcond[, 1:99])
causal_condition_group_B <- colnames(causalcond[, 100:141])
causal_condition_group_C <- colnames(causalcond[, 142:183])
# EXPORT LIBRARIES TO CLUSTERS
clusterEvalQ(cl, library(doParallel))
clusterEvalQ(cl, library(QCA))
# START TIMER
start.time <- Sys.time()
# THREE NESTED FOREACH LOOPS (ONE FOR EACH CAUSAL CONDITION GROUP)
x <-
  foreach(c = causal_condition_group_C, .combine = 'cbind') %:%
    foreach(b = causal_condition_group_B, .combine = 'cbind') %:%
      foreach(a = causal_condition_group_A, .combine = 'cbind') %dopar% {
        tryCatch({
          minimize(temp, outcome = outcome, conditions = paste0(a, ",", b, ",", c),
                   n.cut = 1, incl.cut = 0.400, include = "?", details = TRUE,
                   use.letters = TRUE)
        }, error = function(e) { cat("ERROR :", conditionMessage(e), "\n") })
      }
# END TIMER
end.time <- Sys.time()
# PROGRAM RUNNING TIME:
print(end.time - start.time)
# END CLUSTER
stopCluster(cl)
When I run this as a sequential %do% loop, I need something like 3 GB of memory to store x. However, when I run it in parallel with %dopar%, memory usage rises dramatically. Here's a screenshot from shortly before I gave up:
[Screenshot of Task Manager showing memory usage]
Does someone know why %dopar% is using so much more memory, and is there a way to avoid this?
Could I, for example, write x to file once in a while and purge memory after doing so (see the sketch after the listing below)? x is a list of dimension 14 by the number of iterations in the foreach() loops. Here is what one of the "columns" in x looks like:
result.98
tt List,11
options List,10
negatives Numeric,3
initials "~A*B"
PIchart TRUE
primes Integer,2
solution List,1
essential "~A*B"
inputcases "205,245,253,306,425,468,490,511,514,585,587,657,684,696,739,740,784,796"
pims List,1
IC List,4
numbers Numeric,4
SA List,1
call Expression
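One way to keep memory flat, sketching the write-to-file idea raised above (this is my illustration, not part of the original post; out_dir, the file-naming scheme, and the flag returned per iteration are assumptions): have each foreach iteration save its full minimize() result to disk and return only a tiny placeholder, so nothing large is ever accumulated in x.
library(doParallel)

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

out_dir <- "qca_results"                     # assumed output directory
dir.create(out_dir, showWarnings = FALSE)

# one "A,B,C" conditions string per combination, as in the question's paste0() call
combos <- apply(expand.grid(causal_condition_group_A,
                            causal_condition_group_B,
                            causal_condition_group_C,
                            stringsAsFactors = FALSE),
                1, paste, collapse = ",")

# temp, outcome and the causal condition groups come from the setup above;
# add them to .export if your workers cannot find them
done <- foreach(i = seq_along(combos), .combine = c, .packages = "QCA") %dopar% {
  res <- tryCatch(
    minimize(temp, outcome = outcome, conditions = combos[i],
             n.cut = 1, incl.cut = 0.400, include = "?", details = TRUE,
             use.letters = TRUE),
    error = function(e) conditionMessage(e)
  )
  saveRDS(res, file.path(out_dir, paste0("result_", i, ".rds")))  # full object goes to disk
  TRUE                                                            # only a small flag is returned
}

stopCluster(cl)
# later: readRDS() the individual files as needed instead of holding them all in memory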

R: Error in parLapply - $ invalid for atomic vectors only occurs running in parallel

I tried to look for a duplicate question and I know many people have asked about parLapply in R so I apologize if I missed one that is applicable to my situation.
Problem: I have the following function that runs correctly in R, but when I try to run it in parallel using parLapply (I'm on a Windows machine) I get the error that the $ operator is invalid for atomic vectors. The error says that 3 nodes produced errors no matter how many nodes I set my cluster to; for example, I have 8 cores on my desktop, so I set the cluster to 7 nodes.
Here is example code showing where the problem is:
library(parallel)
library(doParallel)
library(arrangements)

# Function
perms <- function(inputs)
{
  x <- 0
  L <- 2^length(inputs$w)
  ip <- inputs$ip
  for (i in 1:L)
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}
#Inputs is a list of several other variables that are created before this
#function runs (w, t_obs and iperm), here is a reproducible example of them
#W is derived from my data, this is just an easy way to make a reproducible example
set.seed(1)
m <- 15
W <- matrix(runif(15,0,1))
iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5
inputs <- list(W,t_obs, iperm)
names(inputs) <- c("w", "t", "ip")
#If I run the function not in parallel
perms(inputs)
#It gives a value of 27322 for this example data
This runs exactly as it should; however, when I try the following to run it in parallel, I get an error:
# make the cluster
cor <- detectCores()
cl <- makeCluster(cor - 1, type = "SOCK")
# passing library and arguments
clusterExport(cl, c("inputs"))
clusterEvalQ(cl, {
  library(arrangements)
})
results <- parLapply(cl, inputs, perms)
I get the error:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: $ operator is invalid for atomic vectors
However, I've checked whether anything is an atomic vector using is.atomic(), and is.recursive(inputs) returns TRUE.
My question is: why am I getting this error when I run this with parLapply when the function otherwise runs correctly, and why does it say "3 nodes produced errors" even though I have 7 nodes?
It says "3 nodes" because, as you're passing it to parLapply, you are only activating three nodes. The first argument to parLapply should be a list of things, each element to pass to each node. In your case, your inputs is a list, correct, but it is being broken down, such that your three nodes are effectively seeing:
# node 1
perms(inputs[[1]]) # effectively inputs$w
# node 2
perms(inputs[[2]]) # effectively inputs$t
# node 3
perms(inputs[[3]]) # effectively inputs$ip
# nodes 4-7 idle
You could replicate this on the local host (not parallel) with:
lapply(inputs, perms)
and when you see it like that, perhaps it becomes a little more obvious what is being passed to your nodes. (If you want to dig further, do debug(perms), then run the lapply above and see what inputs looks like inside that function call.)
To get this to work once on one node (I think not what you're trying to do), you could do
parLapply(cl, list(inputs), perms)
But that's only going to run one instance on one node. Perhaps you would prefer to do something like:
parLapply(cl, replicate(7, inputs, simplify=FALSE), perms)
I'm adding an answer in case anyone with a similar problem comes across this. #r2evans answered my original question, which led to the realization that even fixing the above problems would not get me the desired result (see the comments to his answer).
Problem: using the package arrangements to generate a large number of combinations and apply a function to them becomes very time consuming as the number of combinations gets huge. What we need to do is split the combinations into chunks depending on the number of cores you will be using, and then do the calculations in each node only on that node's chunk of the combinations.
Solution:
cor <- detectCores() - 1
cl  <- makeCluster(cor, type = "SOCK")

set.seed(1)
m     <- 15
W     <- matrix(runif(15, 0, 1))
#iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5

chunk_list <- list()
for (i in 1:cor)
{
  chunk_list[i] <- i
}
chunk_size <- floor((2^m) / cor)
chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size * (cor - 1))

inputs_list <- Map(list, t = list(t_obs), w = list(W),
                   chunk_list = chunk_list, chunk_size = list(chunk_size))
#inputs <- list(W, t_obs, iperm)
#names(inputs) <- c("w", "t", "ip", "chunk_it")

perms <- function(inputs)
{
  x  <- 0
  L  <- 2^length(inputs$w)
  ip <- arrangements::ipermutations(0:1, m, replace = T)
  chunk_size <- floor((2^m) / cor)
  chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size * (cor - 1))
  if (inputs$chunk_list != 1)
  {
    # skip ahead to the start of this node's chunk
    ip$getnext(sum(chunk_size[1:(inputs$chunk_list - 1)]))
  }
  for (i in 1:chunk_size[inputs$chunk_list])
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}

clusterExport(cl, c("inputs_list", "m", "cor"))
clusterEvalQ(cl, {
  library(arrangements)
})

system.time(results <- parLapply(cl, inputs_list, perms))
Reduce(`+`, results)
What I did was split the total number of combinations into chunks, i.e. the first 4681 (I have 7 nodes assigned to cor), the second 4681, and so on, making sure I didn't miss any combinations. Then I changed my original function so that each node generates the permutations itself but skips ahead to the combination it should start calculating from, so node 1 starts with the first combination, node 2 starts with the 4682nd, and so on. I'm still working on optimizing this, because it's currently only about 4 times as fast as running it sequentially even though I'm using 7 cores. I think the skip option in the permutation iterator will speed this up, but I haven't checked yet. Hopefully this is helpful to someone else; it cuts the estimated time to run a simulation (with m = 25, not 15) from about 10 days to about 2.5 days.
A side note for anyone hitting missing-package errors on the workers: if the function you run on the nodes depends on a package (dplyr, for example), you need to load it on the nodes as well:
clusterEvalQ(cl, { library(dplyr) })
That should resolve errors caused by the package not being attached on the workers.

parallel programming for a function taking two arguments from a list of data frames using R

I have a function called DTW in a similarity measures package. It takes two matrices or data frames as its arguments and returns the dynamic time warping distance. Those data frames contain the longitudes and latitudes of a trajectory.
My program looks like this, and all the data frames (df1, df2, df3, and so on) are available:
distance <- function(arg1, arg2) {
  DTW(arg1, arg2)
}

for (i in 1:length(LIST)) {
  for (j in 1:length(LIST)) {
    a <- get(paste0("df", i))
    b <- get(paste0("df", j))
    ddist[i, j] <- distance(a, b)
    print(ddist)
  }
}
I am filling a matrix ddist with the values returned by the distance function. The program works fine; I want to make it faster using parallel functions like parApply or parLapply.
Here is a simple method to give you an idea of how to make it parallel:
k <- length(LIST)
ddist <- matrix(0, k, k)

library("doParallel")
cl <- makeCluster(4, outfile = '')
registerDoParallel(cl)

for (i in 1:k) {
  a <- get(paste0("df", i))
  ddist[i, ] <- foreach(j = 1:k, .combine = 'cbind', .export = paste0("df", 1:k)) %dopar% {
    b <- get(paste0("df", j))
    distance(a, b)
  }
}
stopCluster(cl)
Having said that, some things to evaluate:
- Only use parallel execution if the distance function takes more than about 2 seconds per call.
- df1, df2, etc. may not be a good idea; store each data frame as df[[1]], df[[2]], etc. instead. That is better than using the get function (see the sketch after this list).
- If k is very large, the time spent transferring the exported df1, df2, etc. to the workers can be considerable, so experiment to find the performance sweet spot.
- Look at data.table, which supports in-place editing; using it instead of ddist might be faster.
- If this code is called from within a function, you might also need to export the function it uses, e.g. .export = c("distance", paste0("df", 1:k)).
- Change the "4" in makeCluster to choose the number of cores you want; as a rule of thumb, use detectCores() - 1.
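To illustrate the list-based suggestion above, here is a minimal sketch (df_list and the .packages value are my assumptions; substitute whatever package actually provides DTW):
library(doParallel)

# df_list is assumed to hold the trajectories as a plain list: list(df1, df2, ...)
k  <- length(df_list)
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

# distance and df_list are picked up from the calling environment;
# add .export = c("distance", "df_list") if your workers cannot find them
ddist <- foreach(i = 1:k, .combine = rbind,
                 .packages = "SimilarityMeasures") %dopar% {  # package name is an assumption
  # each worker computes one full row of the distance matrix
  sapply(1:k, function(j) distance(df_list[[i]], df_list[[j]]))
}

stopCluster(cl)
dim(ddist)  # k x k matrix of pairwise DTW distances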

Optimizing nested foreach dopar in R

I'd like input on how my code below is structured. I would like to know whether it needs to be organized differently to execute faster. Specifically, whether I need to use foreach and %dopar% differently in the nested loops. Currently, the inner loop is the bulk of the work (ddply with between 1 and 8 breakdown variables, each of which has 10-200 levels), and that's what I have running in parallel. I left out the code details for simplicity.
Any ideas? My code, as organized below, does work, but it takes a few hours on a 6-core, 41gb machine. The dataset isn't that large (< 20k records).
for (m in 1:length(Predictors)) {  # has up to three elements in the vector
  # construct the dataframe based on the specified predictor
  # subset the original dataframe based on the breakdown variables, outcome, predictor and covariates
  for (l in 1:nrow(pairwisematrixReduced)) {  # this has 1-6 rows; subset based on correct comparison groups
    # some code here
    cl <- makeCluster(detectCores())
    registerDoParallel(cl)
    for (i in 1:nrow(subsetting_table)) {  # this table has about 50 rows
      # this uses the columns specified by k in the glm; the prior columns will be used as breakdown variables
      # up to 10 covariates
      result[[length(result) + 1]] <- foreach(k = 11:17, .packages = c('plyr', 'reshape2', 'fastmatch')) %dopar% {
        ddply(
          df,
          b,  # vector of breakdown variables
          function(x) {
            # run a GLM and manipulate the output
          },
          .parallel = TRUE)  # close ddply
      }  # close k loop -- set of covariates
    }  # close i loop -- subsetting table
  }  # close l loop -- group combinations
}  # close m loop -- this is the pairwise predictor matrix
stopCluster(cl)

result <- unlist(result, recursive = FALSE)
tmp2 <- do.call(rbind.fill, result)
Copied out of vignette("nested")
3 Using %:% with %dopar%
When parallelizing nested for loops, there is always a question of which loop to parallelize. The standard advice is...
You are also using foreach %dopar% along with ddply and .parallel=TRUE. With a six-core processor (and presumably hyper-threading), the foreach block would start 12 environments, and then ddply would start 12 environments within each of those, for 144 simultaneous environments. The foreach should be changed to %do% to be consistent with your question's text of running the inner loop in parallel. Or, to make it cleaner, change both to foreach and use %dopar% for one loop and %:% for the other.
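For illustration, here is a minimal, self-contained sketch of the %:% pattern described above (the loop ranges and the body are placeholders I made up, not the question's actual model code):
library(foreach)
library(doParallel)

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

# %:% merges both loops into a single stream of (i, k) tasks, so only one
# level of parallelism exists and the workers are not oversubscribed
res <- foreach(i = 1:6, .combine = rbind) %:%
  foreach(k = 1:4, .combine = rbind) %dopar% {
    # stand-in for the per-combination model fit; keep ddply's .parallel = FALSE here
    data.frame(i = i, k = k, value = i * k)
  }

stopCluster(cl)
head(res)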

Make nested loops more efficient?

I'm analyzing large sets of data using the following script:
M <- c_alignment

c_check <- function(x) {
  if (x == c_1) {
    1
  } else {
    0
  }
}

both_c_check <- function(x) {
  if (x[res_1] == c_1 && x[res_2] == c_1) {
    1
  } else {
    0
  }
}

variance_function <- function(x, y) {
  sqrt(x * (1 - x)) * sqrt(y * (1 - y))
}

frames_total <- nrow(M)
cols <- ncol(M)
c_vector <- apply(M, 2, max)
freq_vector <- matrix(nrow = sum(c_vector))
co_freq_matrix <- matrix(nrow = sum(c_vector), ncol = sum(c_vector))
insertion <- 0
res_1_insertion <- 0

for (res_1 in 1:cols) {
  for (c_1 in 1:c_vector[res_1]) {
    res_1_insertion <- res_1_insertion + 1
    insertion <- insertion + 1
    res_1_subset <- sapply(M[, res_1], c_check)
    freq_vector[insertion] <- sum(res_1_subset) / frames_total
    res_2_insertion <- 0
    for (res_2 in 1:cols) {
      if (is.na(co_freq_matrix[res_1_insertion, res_2_insertion + 1])) {
        for (c_2 in 1:max(c_vector[res_2])) {
          res_2_insertion <- res_2_insertion + 1
          both_res_subset <- apply(M, 1, both_c_check)
          co_freq_matrix[res_1_insertion, res_2_insertion] <- sum(both_res_subset) / frames_total
          co_freq_matrix[res_2_insertion, res_1_insertion] <- sum(both_res_subset) / frames_total
        }
      }
    }
  }
}

covariance_matrix <- (co_freq_matrix - crossprod(t(freq_vector)))
variance_matrix <- matrix(outer(freq_vector, freq_vector, variance_function), ncol = length(freq_vector))
correlation_coefficient_matrix <- covariance_matrix / variance_matrix
A model input would be something like this:
1 2 1 4 3
1 3 4 2 1
2 3 3 3 1
1 1 2 1 2
2 3 4 4 2
What I'm calculating is the binomial covariance of each state found in M[,i] with each state found in M[,j]. Each row is the state observed for that trial, and I want to see how the states of the columns co-vary.
Clarification: I'm finding the covariance of two multinomial distributions, but I'm doing it through binomial comparisons.
The input is a 4200 x 510 matrix, and the c value for each column is about 15 on average. I know for loops are terribly slow in R, but I'm not sure how I can use the apply function here. If anyone has a suggestion as to properly using apply here, I'd really appreciate it. Right now the script takes several hours. Thanks!
I thought of writing a comment, but I have too much to say.
First of all, if you think apply goes faster, look at "Is R's apply family more than syntactic sugar?". It might be, but it's far from guaranteed.
Next, please don't grow matrices as you move through your code; that slows it down incredibly. Preallocate the matrix and fill it up; that can increase your code's speed more than tenfold. You're growing different vectors and matrices throughout your code, and that's insane (forgive me the strong language).
Then, look at the help page of ?subset and the warning given there:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
Always. Use. Indices.
Further, you recalculate the same values over and over again. freq_res_2, for example, is calculated for every res_2 and state_2 as many times as you have combinations of res_1 and state_1. That's just a waste of resources. Move out of your loops whatever you don't need to recalculate, and save it in matrices you can simply access again.
Heck, now that I'm at it: please use vectorized functions. Think again and see what you can drag out of the loops. This is what I see as the core of your calculation:
cov <- (freq_both - (freq_res_1)*(freq_res_2)) /
(sqrt(freq_res_1*(1-freq_res_1))*sqrt(freq_res_2*(1-freq_res_2)))
As I see it, you can construct matrices freq_both, freq_res_1 and freq_res_2 and use them as input for that one line. That will give you the whole covariance matrix (don't call it cov; cov is a function). Exit loops. Enter fast code.
Given the fact I have no clue what's in c_alignment, I'm not going to rewrite your code for you, but you definitely should get rid of the C way of thinking and start thinking R.
Let this be a start: The R Inferno
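For illustration only, here is a generic sketch of the vectorized pattern described above on made-up toy inputs (it is not the poster's actual computation, since c_alignment is unknown; freq_res stands for per-state frequencies and freq_both for joint frequencies):
set.seed(1)
n_states  <- 10
freq_res  <- runif(n_states, 0.1, 0.9)                   # toy per-state frequencies
freq_both <- outer(freq_res, freq_res) +                 # toy joint frequencies
             matrix(rnorm(n_states^2, 0, 0.01), n_states)

covariance  <- freq_both - outer(freq_res, freq_res)     # freq_both - p_i * p_j for all pairs at once
std_dev     <- sqrt(freq_res * (1 - freq_res))
correlation <- covariance / outer(std_dev, std_dev)      # whole matrix in one line, no loops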
It's not really the 4-way nested loops but the way your code grows memory on each iteration, at the places where vectors and matrices are extended inside the loops. Standard advice in R (and Matlab and Python) in situations like this is to allocate in advance and then fill in. That's what the apply functions do: they allocate a list as long as the known number of results, assign each result to its slot, and then merge all the results together at the end. In your case you could just allocate a matrix of the correct size in advance and assign into it at those points (roughly speaking). That should be as fast as the apply family, and you might find it easier to code.
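A tiny illustration of the preallocate-then-fill pattern (toy example, not the poster's data):
n   <- 1000
out <- matrix(NA_real_, nrow = n, ncol = 2)  # allocate the full result once, up front
for (i in 1:n) {
  out[i, ] <- c(i, i^2)                      # fill in place; no rbind/cbind growth
}
head(out)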
