R - cpv (trotter package) and %dopar%

I'd like to know whether the cpv function within the trotter package works with %dopar%? I'm getting the following error:
task 1 failed - "object of type 'S4' is not subsettable"
Here's a small example:
library(doParallel)
library(trotter)
registerDoParallel(cores = 2)
x <- letters
combos <- cpv(2, 1:4)
print(combos)
num_combos <- length(combos)
results_list <- foreach(combo_num=1:num_combos) %dopar% { # many iterations
y <- x[combos[combo_num]]
# time consuming stuff follows that involves using y
}
Replacing %dopar% with %do% (or simply using a for loop) makes it work fine.

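Depending on the cluster type, you need to explicitly name the packages the workers must load via the .packages argument. With a PSOCK cluster each worker is a fresh R session that has not loaded trotter, so the S4 object returned by cpv cannot be subset there. The following should work: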
library(doParallel)
library(trotter)
cl <- makePSOCKcluster(2)
registerDoParallel(cl=cl)
x <- letters
combos <- cpv(2, 1:4)
num_combos <- length(combos)
rl <- foreach(combo_num=1:num_combos, .packages="trotter") %dopar% {
x[combos[combo_num]]
}
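If the full set of combinations fits in memory, another option is to avoid shipping the S4 object to the workers at all. The sketch below swaps cpv for base R's combn (a substitution, not what the answer above uses), which returns a plain list that any worker can index without loading trotter. The two-worker cluster is an arbitrary choice for illustration.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

x <- letters
# combn() with simplify = FALSE returns an ordinary list of index vectors
combo_list <- combn(4, 2, simplify = FALSE)

# Iterate over the list itself; each worker receives one plain integer vector
rl <- foreach(idx = combo_list) %dopar% {
  x[idx]
}

stopCluster(cl)
```

For very large inputs this gives up the memory savings that cpv provides, so the .packages approach remains the better fit there.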

Related

How to make parallel processing in R with empty result?

I'd like to do parallel processing in R using the packages 'doParallel' and 'foreach'. The idea is to run the computations in parallel without returning any results. From what I've found, the 'foreach' operator always returns some kind of result, which takes up RAM. So I need help getting an empty result from a parallel loop.
# 1. Packages
library(doParallel)
library(foreach)
# 2. Create and run app cluster
cluster_app <- makeCluster(detectCores())
registerDoParallel(cluster_app)
# 3. Loop with result
list_i <- foreach(i = 1:100) %dopar% {
print(i)
}
# 4. List is not empty
list_i
# 5. How make loop with empty 'list_i' ?
# TODO: make 'list' equal NULL or NA
# 6. Stop app cluster
stopCluster(cluster_app)
Here is the solution I found:
# 1. Packages
library(doParallel)
library(foreach)
# 2. Create and run app cluster
cluster_app <- makeCluster(detectCores())
registerDoParallel(cluster_app)
# 3. Loop with result
list_i <- foreach(i = 1:100) %dopar% {
print(i)
}
list_i
# 4. Mock data processing
mock_data <- function(x) {
data.frame(matrix(NA, nrow = x, ncol = x))
}
# 5. How to make the loop return an empty result
list_i <- foreach(i = 1:10, .combine = 'c') %dopar% {
# 1. Calculations (use the loop variable i; x is not defined here)
mock_data(i)
# 2. Result
NULL
}
# The result is a single NULL (not a data set)
list_i
# 6. Stop app cluster
stopCluster(cluster_app)
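A common pattern when only side effects matter is to have each task write its output somewhere (here, one file per task) and return NULL, so that combining with c leaves nothing in memory. This is a minimal sketch; the temporary directory and the file naming are assumptions made for illustration.

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

out_dir <- tempfile("results_")  # hypothetical output location
dir.create(out_dir)

res <- foreach(i = 1:10, .combine = "c") %dopar% {
  # Do the work and persist it on disk instead of returning it
  saveRDS(matrix(NA, nrow = i, ncol = i),
          file.path(out_dir, sprintf("task_%02d.rds", i)))
  NULL  # c(NULL, NULL, ...) collapses to NULL
}

stopCluster(cl)

is.null(res)                 # the combined result is empty
length(list.files(out_dir))  # one file per task
```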

R - parallelisation error, checkCluster(cl) - not a valid cluster

This code gives me an error: Error in checkCluster(cl): not a valid cluster
library(parallel)
numWorkers <-8
cl <-makeCluster(numWorkers, type="PSOCK")
res.mat <- parLapply(1:10, function(x) my.fun(x))
stopCluster(cl)
Without parallelisation attempts this works totally fine:
res.mat <- lapply(1:10, function(x) my.fun(x))
And this example works very well too:
workerFunc <- function(n){return(n^2)}
library(parallel)
numWorkers <-8
cl <-makeCluster(numWorkers, type ="PSOCK")
res <- parLapply(cl, 1:100, workerFunc)
stopCluster(cl)
print(unlist(res))
How can I solve my problem?
I found, for example, that
class(cl)
[1] "SOCKcluster" "cluster"
and cl is:
cl
socket cluster with 8 nodes on host ‘localhost’
library(parallel)
numWorkers <- 8
cl <-makeCluster(numWorkers, type="PSOCK")
res.mat <- parLapply(cl,1:10, function(x) my.fun(x))
stopCluster(cl)
Just to be excessively specific, the problem with
res.mat <- parLapply(1:10, function(x) my.fun(x))
is not necessarily the order of the arguments, but that the argument cl is not specified.
cl <-makeCluster(numWorkers, type ="PSOCK")
res.mat <- parLapply(X = 1:10, # the argument is named X (upper case)
fun = function(x) my.fun(x),
cl = cl
)
should work, because all required arguments are specified. Alternatively, ?parLapply indicates that parLapply uses the default cluster if cl is not specified. A default cluster can be set using parallel::setDefaultCluster(), which then allows parLapply to revert to default behaviour when cl is not included in the user input.
cl <-makeCluster(numWorkers, type ="PSOCK")
parallel::setDefaultCluster(cl)
res.mat <- parallel::parLapply(X = 1:10, # cl defaults to NULL, so the default cluster is used
fun = function(x) my.fun(x)
)
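Both variants can be checked end to end with a toy function. In this sketch my.fun is a stand-in that squares its input (the question does not show the real one), and it is passed to parLapply directly so that the workers receive the function itself rather than a wrapper that has to look it up by name.

```r
library(parallel)

my.fun <- function(x) x^2  # hypothetical stand-in for the real function

cl <- makeCluster(2, type = "PSOCK")

# Variant 1: pass the cluster explicitly as the first argument
res1 <- parLapply(cl, 1:10, my.fun)

# Variant 2: register a default cluster, then omit cl
setDefaultCluster(cl)
res2 <- parLapply(X = 1:10, fun = my.fun)

setDefaultCluster(NULL)  # unregister before shutting down
stopCluster(cl)

identical(res1, res2)
```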

Assign function output with assign

I am using
library(foreach)
library(doSNOW)
And I have a function mystoploss(data,n=14)
I then call it like this (I only want to loop over n=14 for now):
registerDoSNOW(makeCluster(4, type = "SOCK"))
foreach(i = 14) %dopar% {assign(paste("Performance",i,sep=""),
mystoploss(data=mydata,n=i))}
I then try to find Performance14 from above, but it is not assigned.
Is there some way to assign so the output will be in Performance14?
And if I use
foreach(i = 14) %dopar% {assign(paste("Performance",i,sep=""),
mystoploss(data=mydata,n=i),envir = .GlobalEnv)}
I get this error:
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
worker initialization failed: Error in as.name
Best Regards
This is because the assign operations happen in the worker processes. The values of the variables are sent back (see your R session console), but not under the names you assigned. You need to capture these values and assign the names again on the master. See this related question.
The following alternative may help: assign the output of foreach to an intermediate variable, then assign its elements to your desired variables in the current 'master process' environment.
PerformanceAll <- foreach(i = 1:14,.combine="c") %dopar% { mystoploss(data=mydata,n=i) } #pick .combine appropriately
for(i in 1:14){ assign(paste("Performance",i,sep=""), PerformanceAll[i]) }
Here is the full example I tried:
library(foreach)
library(doSNOW)
mystoploss <- function(data=1,n=1){
return(runif(data)) #some operation, returns a scalar
}
mydata <- 1
registerDoSNOW(makeCluster(4, type = "SOCK"))
PerformanceAll <- foreach(i = 1:14,.combine="c") %dopar% { mystoploss(data=mydata,n=i) }#pick .combine appropriately
for(i in 1:14){ assign(paste("Performance",i,sep=""), PerformanceAll[i]) }
Edit: If the output of mystoploss is a list, then do the following changes:
mystoploss <- function(data=1,n=1){#Example
return(list(a=runif(data),b=1)) #some operation, return a list
}
PerformanceAll <- foreach(i = 1:14) %dopar% { mystoploss(data=mydata,n=i) }#remove .combine
for(i in 1:14){ assign(paste("Performance",i,sep=""), PerformanceAll[[i]]) } #double brackets
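Rather than creating fourteen separate variables with assign, it is often easier to keep the results in one named list and look them up by name. The sketch below uses doParallel instead of doSNOW and a deterministic toy mystoploss; both are assumptions for illustration.

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical stand-in: the real mystoploss is not shown in the question
mystoploss <- function(data, n) sum(data) * n
mydata <- c(1, 2, 3)

n_values <- 1:14
perf <- foreach(i = n_values) %dopar% {
  mystoploss(data = mydata, n = i)
}
names(perf) <- paste0("Performance", n_values)

stopCluster(cl)

perf$Performance14  # or perf[["Performance14"]]

# If separate variables are really needed, copy the list into an environment
list2env(perf, envir = environment())
```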

nlsBoot and foreach %dopar%: scoping issues

I would like to do bootstrap of residuals for nls fits in a loop. I use nlsBoot and in order to decrease computation time I would like to do that in parallel (on a Windows 7 system at the moment). Here is some code, which reproduces my problem:
#function for fitting
Falge2000 <- function(GP2000,alpha,PAR) {
(GP2000*alpha*PAR)/(GP2000+alpha*PAR-GP2000/2000*PAR)
}
#some data
PAR <- 10:1600
GPP <- Falge2000(-450,-0.73,PAR) + rnorm(length(PAR),sd=0.0001)
df1 <- data.frame(PAR,GPP)
#nls fit
mod <- nls(GPP~Falge2000(GP2000,alpha,PAR),start=list(GP2000=-450,alpha=-0.73),data=df1, upper=c(0,0),algorithm="port")
#bootstrap of residuals
library(nlstools)
summary(nlsBoot(mod,niter=5))
#works
#now do it several times
#and in parallel
library(foreach)
library(doParallel)
cl <- makeCluster(1)
registerDoParallel(cl)
ttt <- foreach(1:5, .packages='nlstools',.export="df1") %dopar% {
res <- nlsBoot(mod,niter=5)
summary(res)
}
#Error in { :
#task 1 failed - "Procedure aborted: the fit only converged in 1 % during bootstrapping"
stopCluster(cl)
I suspect this is an issue with environments. After looking at the code of nlsBoot, the problem seems to arise from the use of an anonymous function in a lapply call:
l1 <- lapply(1:niter, function(i) {
data2[, var1] <- fitted1 + sample(scale(resid1, scale = FALSE),
replace = TRUE)
nls2 <- try(update(nls, start = as.list(coef(nls)), data = data2),
silent = TRUE)
if (inherits(nls2, "nls"))
return(list(coef = coef(nls2), rse = summary(nls2)$sigma))
})
if (sum(sapply(l1, is.null)) > niter/2)
stop(paste("Procedure aborted: the fit only converged in",
round(sum(sapply(l1, is.null))/niter), "% during bootstrapping"))
Is there a way to use nlsBoot in a parallel loop? Or do I need to modify the function? (I could try to use a for loop instead of lapply.)
By moving the creation of the mod object into the %dopar% loop, it looks like everything works OK. Also, this automatically exports the df1 object, so you can remove the .export argument.
ttt <- foreach(1:5, .packages='nlstools') %dopar% {
mod <- nls(GPP~Falge2000(GP2000,alpha,PAR),start=list(GP2000=-450,alpha=-0.73),data=df1, upper=c(0,0),algorithm="port")
res <- nlsBoot(mod,niter=5)
capture.output(summary(res))
}
However, you might need to work out what you want returned. Using capture.output was just to see if things were working, since summary(res) seemed to only return NULL.

Foreach throws error with %dopar% but executes successfully with %do%

I am trying to convert the following code into parallel using foreach and %dopar%.
library(doSNOW)
library(foreach)
cl<- makeCluster(4, type = "SOCK")
registerDoSNOW(cl)
min_subid <- c()
max_subid <- c()
p_typ <- c()
p_nm <- c()
st_tm<-c()
end_tm <- c()
supp <- c()
chart_type <- c()
foreach(j =1:noOfPhases) %dopar%
{
start_time <-phases[j, colnames(phases)=="StartTime"]
end_time <-phases[j, colnames(phases)=="StopTime"]
phase_type <-phases[j, colnames(phases)=="Phase_Type_Id"]
phase_name <-phases[j, colnames(phases)=="Phase_Name"]
suppress <-phases[j, colnames(phases)=="Suppression_Time"]
chart_typ <-phases[j, colnames(phases)=="chartType"]
conft<-(masterData$Time.Subgroup>=start_time & masterData$Time.Subgroup<=end_time)
masterData[which(conft), colnames(masterData)=="Phase_Type"]<-phase_type
masterData[which(conft), colnames(masterData)=="Phase_Name"]<-phase_name
min_subid <- rbind(min_subid, min(which(conft)))
max_subid <- rbind(max_subid, max(which(conft)))
p_typ <- rbind( p_typ, masterData$Phase_Type[min(which(conft))])
p_nm <- rbind( p_nm, masterData$Phase_Name[min(which(conft))])
st_tm <- rbind( st_tm, as.character(start_time))
end_tm <- rbind( end_tm, as.character(end_time))
supp <- rbind(supp,as.character(suppress))
chart_type <- rbind(chart_type,as.character(chart_typ))
phase_info <- data.frame(Subgrp_No_Start=min_subid, Subgrp_No_End=max_subid, Phase_Type=p_typ,
Phase_Name=p_nm, Start_Time=st_tm, Stop_Time=end_tm,
Suppression_Time=supp,ChartType=chart_type)
}
phase_output<-merge(phase_info, phases, by.x=c("Start_Time",
"Stop_Time","ChartType"), by.y=c("StartTime", "StopTime","chartType"))
The above code executes successfully when %do% is used instead of %dopar%. Can anyone help me understand why I get the following error when it runs in parallel (%dopar%) but not when it runs sequentially (%do%)?
Error in merge(phase_info, phases, by.x = c("Start_Time", "Stop_Time", :
object 'phase_info' not found
The solution is really simple, but I start off with an explanation of what is happening when you execute the code to explain the error.
In your foreach block, one data frame (phase_info) is created for each value of j, and foreach returns them together in a list. However, since your assignment phase_info <- data.frame(...) is located inside the foreach rather than outside, the list is not stored anywhere and gets discarded. The cause of the confusion is that with %do% all the data frames are created sequentially on the master node, whereas with %dopar% they are created in parallel on the worker nodes. The subsequent merge command is executed on the master node, which causes an error with %dopar% because phase_info does not exist in the master's workspace. Also note that when using %do% as above, each iteration of foreach overwrites the result of the previous one (i.e. you get only the result of the last iteration).
This minor change fixes it:
phase_info <- foreach(...) %dopar% {
...
data.frame(Subgrp_No_Start=min_subid, Subgrp_No_End=max_subid, Phase_Type=p_typ,
Phase_Name=p_nm, Start_Time=st_tm, Stop_Time=end_tm,
Suppression_Time=supp,ChartType=chart_type)
# No need to give it a name as it will be returned and the name forgotten
}
phase_output <- merge(phase_info, ...)
As I mentioned above, phase_info will now be a list where each element is a data frame. I am just guessing now but you probably want to execute the merge elementwise then, like this:
phase_output <- lapply(phase_info, merge, phases, by.x=c("Start_Time",
"Stop_Time","ChartType"), by.y=c("StartTime", "StopTime","chartType"))
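If a single data frame is what the merge needs, .combine = rbind stacks the per-iteration rows on the master before merging. This is a minimal sketch, with small made-up phases and masterData tables standing in for the question's data and only a subset of its columns.

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical stand-ins for the question's data
phases <- data.frame(
  StartTime = c(1, 6),
  StopTime = c(5, 10),
  Phase_Name = c("baseline", "trial"),
  stringsAsFactors = FALSE
)
masterData <- data.frame(Time.Subgroup = 1:10)

# Each iteration returns a one-row data frame; rbind stacks them in order
phase_info <- foreach(j = seq_len(nrow(phases)), .combine = rbind) %dopar% {
  in_phase <- which(masterData$Time.Subgroup >= phases$StartTime[j] &
                    masterData$Time.Subgroup <= phases$StopTime[j])
  data.frame(
    Subgrp_No_Start = min(in_phase),
    Subgrp_No_End = max(in_phase),
    Start_Time = phases$StartTime[j],
    Stop_Time = phases$StopTime[j]
  )
}

stopCluster(cl)

phase_output <- merge(phase_info, phases,
                      by.x = c("Start_Time", "Stop_Time"),
                      by.y = c("StartTime", "StopTime"))
```

Note that assignments to masterData inside the loop body would only change the workers' copies, so any columns to be updated need to be returned from the loop and applied on the master.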
