doSNOW and foreach loop (R) on a cluster?

I am using a cluster to run a foreach loop in parallel with doSNOW. The loop works on my desktop, but I receive this error when running on the cluster:
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: local ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
The loop is rather large, so I have provided only a very basic sample here (I do not believe the error is in the loop itself, since it works on the desktop).
library(sp)
library(raster)
library(fields)
library(tidyr)
library(dplyr)
library(sphereplot)
library(dismo)
library(doSNOW)
library(parallel)
cores <- (detectCores()/2)/2
print(cores)
cl <- makeCluster(cores, type = "SOCK", outfile = "")
registerDoSNOW(cl)
FossilClimCoV <- foreach(i = 0:5, .combine = "rbind",
                         .packages = c("dplyr", "dismo", "sp", "raster",
                                       "fields", "tidyr", "sphereplot",
                                       "doSNOW", "parallel")) %dopar% {
  print(i)
  FossilTemp <- Fossils %>% dplyr::filter(Age == i)
  if (nrow(FossilTemp) > 0) {
    # BULK removed for ease
    return(FossilTemp1)
  }
}
I'm not sure how to fix this error, and I don't understand why the loop runs on my desktop but not on the cluster.
EDIT 1
I have now resolved this error by switching from a doSNOW backend to doParallel.
library(doParallel)
registerDoParallel(cores=3)
*foreach loop*
However, I now have a new error:
Calls: %dopar% -> <Anonymous>
Execution halted
If I change .errorhandling to "remove", the foreach loop always returns an empty vector.
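For reference, here is a minimal sketch of the doParallel setup (the loop body is a stand-in for my real one). With .errorhandling = "pass", a failing task returns its error object instead of being dropped, which at least shows what went wrong on the workers:
library(doParallel)
registerDoParallel(cores = 3)
res <- foreach(i = 0:5, .errorhandling = "pass") %dopar% {
  if (i == 3) stop("example failure")  # hypothetical failure for illustration
  i^2
}
print(res)  # error objects appear in place of the failed tasks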

Related

How to fix C function R_nc4_get_vara_double returned error in ncdf4 parallel processing in R

I want to download NetCDF (.nc) data through OPeNDAP from remote storage. I use a parallel backend with a foreach %dopar% loop as follows:
# INPUTS
inputs=commandArgs(trailingOnly = T)
interimpath=as.character(inputs[1])
gcm=as.character(inputs[2])
period=as.character(inputs[3])
var=as.character(inputs[4])
source='MACAV2'
cat('\n\n EXTRACTING DATA FOR',var, gcm, period, '\n\n')
# CHANGING LIBRARY PATHS
.libPaths("/storage/home/htn5098/local_lib/R40") # local library for packages
setwd('/storage/work/h/htn5098/DataAnalysis')
source('./src/Rcodes/CWD_function_package.R') # Calling the function Rscript
# CALLING PACKAGES
library(foreach)
library(doParallel)
library(parallel)
library(filematrix)
# REGISTERING CORES FOR PARALLEL PROCESSING
no_cores <- detectCores()
cl <- makeCluster(no_cores)
registerDoParallel(cl)
invisible(clusterEvalQ(cl,.libPaths("/storage/home/htn5098/local_lib/R40"))) # Really have to import library paths into the workers
invisible(clusterEvalQ(cl, c(library(ncdf4))))
# EXTRACTING DATA FROM THE .NC FILES TO MATRIX FORM
url <- readLines('./data/external/MACAV2_OPENDAP_allvar_allgcm_allperiod.txt')
links <- grep(x = url,pattern = paste0('.*',var,'.*',gcm,'_.*',period), value = T)
start=c(659,93,1) # lon, lat, time
count=c(527,307,-1)
spfile <- read.csv('./data/external/SERC_MACAV2_Elev.csv',header = T)
grids <- sort(unique(spfile$Grid))
clusterExport(cl,list('ncarray2matrix','start','count','grids')) #exporting data into clusters for parallel processing
cat('\nChecking when downloading all grids\n')
# k <- foreach(x = links,.packages = c('ncdf4')) %dopar% {
# nc <- nc_open(x)
# nc.var=ncvar_get(nc,varid=names(nc$var),start=start,count=count)
# return(nc.var)
# nc_close(nc)
# }
k <- foreach(x = links, .packages = c('ncdf4'), .errorhandling = 'pass') %dopar% {
  nc <- nc_open(x)
  print(nc)
  nc.var <- ncvar_get(nc, varid = names(nc$var),
                      start = c(659, 93, 1), count = c(527, 307, -1))
  nc_close(nc)
  Sys.sleep(10)  # throttle the requests; must come before return() to ever run
  return(dim(nc.var))
}
# k <- parSapply(cl,links,function(x) {
# nc <- nc_open(x)
# nc.var=ncvar_get(nc,varid=names(nc$var),start=start,count=count)
# nc_close(nc)
# return(nc.var)
# })
print(k)
However, I keep getting this error:
<simpleError in ncvar_get_inner(ncid2use, varid2use, nc$var[[li]]$missval, addOffset, scaleFact, start = start, count = count, verbose = verbose, signedbyte = signedbyte, collapse_degen = collapse_degen): C function R_nc4_get_vara_double returned error>
What could be the reason for this problem? Can you recommend a solution for this that is time-efficient (I have to repeat this for about 20 files)?
Thank you.
I had the same error in my code. The problem was not the code itself; it was one of the files that I wanted to read. Something was wrong with it, so R couldn't open it. I identified the file, downloaded it again, and the same code worked perfectly.
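A minimal sketch of how one might find the bad file, assuming the links vector from the question: open each file sequentially and collect those that fail.
library(ncdf4)
bad <- character(0)
for (x in links) {
  ok <- tryCatch({ nc <- nc_open(x); nc_close(nc); TRUE },
                 error = function(e) FALSE)
  if (!ok) bad <- c(bad, x)
}
print(bad)  # re-download these files, then rerun the parallel loop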
I also encountered the same error. For me, restarting the R session did the trick.

Error running Rmpi when doing parallel computing

I'm trying to run a parallel computation in R with the lines below:
library(parallel)
library(snow)
library(snowFT)
library(VGAM)
library(dplyr)
library(Rmpi)
nCores <- detectCores() - 1
cl <- makeCluster(nCores)
Then R returns an error:
Error in Rmpi::mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, : Internal MPI error!, error stack: MPI_Comm_spawn(cmd="C:/R/R-40~1.2/bin/x64/Rscript.exe", argv=0x00000223DB137530, maxprocs=11, MPI_INFO_NULL, root=0, MPI_COMM_SELF, intercomm=0x00000223DCFCD998, errors=0x00000223DA9FC9E8) failed Internal MPI error! FAILspawn not supported without process manager
3. Rmpi::mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, intercomm = intercomm)
2. makeMPIcluster(spec, ...)
1. makeCluster(nCores)
I've tried to install MPICH2 on Windows from here, but the final cmd command mpiexec -validate always returns FAIL.
Could you please elaborate on how to solve this issue?
The problem is that makeCluster() is defined by more than one of the loaded packages, so the wrong version can be picked up. As such, I use parallel::makeCluster(nCores) to solve the issue.
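A minimal sketch of that fix; with snow and Rmpi loaded, qualifying the call ensures the parallel package's SOCK-based cluster is created rather than an MPI-backed one that needs a process manager:
library(parallel)
nCores <- parallel::detectCores() - 1
cl <- parallel::makeCluster(nCores)        # PSOCK cluster; no MPI spawn involved
parallel::clusterEvalQ(cl, Sys.getpid())   # quick sanity check of the workers
parallel::stopCluster(cl)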

Error in serialize(data, node$con) : error writing to connection

I'm currently trying to run some code that implements parallel processing, but I'm running into this error:
Error: cannot allocate vector of size 2.1 Gb
Execution halted
Error in serialize(data, node$con) : error writing to connection
Calls: %dopar% ... postNode -> sendData -> sendData.SOCKnode -> serialize
Execution halted
Warning message:
system call failed: Cannot allocate memory
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode ->
unserialize Execution halted
I can't seem to figure out why there's a memory problem. If I take the code out of the foreach loop or change the foreach to a for loop, it works perfectly fine, so I don't think it has to do with the contents of the code itself, but rather something about the parallelization. Also, it seems to throw the error pretty soon after the code starts executing. Any ideas why this might be happening? Here's a look at my code:
list_storer <- list()
list_storer <- foreach(bt = 2:bootreps, .combine = list, .multicombine = TRUE) %dopar% {
  ur <- sample.int(nrow(dailydatyr), nrow(dailydatyr), replace = TRUE)
  ddyr_boot <- dailydatyr[ur, ]
  weightvar <- ddyr_boot[, c('ymd1_IssueD', 'MatD_ymd2')]
  weightvar <- abs(weightvar)
  x <- DM[ur, ]
  y <- log(ddyr_boot$dirtyprice2 / ddyr_boot$dirtyprice1)
  weightings <- rep(1, nrow(ddyr_boot))
  weightings <- weightings / (ddyr_boot$datenum2 - ddyr_boot$datenum1)
  treg <- repeatsales(y, x, maxdailyreturn, weightings, weightvar)
  zbtcol <- 0
  cnst <- NULL
  if (!is.null(dums)) {
    zbtcol <- length(treg) - ncol(x)
    cnst <- paste("tbs(", dums, ")_", middleyr, sep = "")
    if (!is.null(interactVar)) {
      ninteract <- (length(treg) - ncol(x) - length(dums)) / length(dums)
      interact <- unlist(lapply(cnst, function(xla) paste(xla, "*c", 1:ninteract, sep = "")))
      cnst <- c(cnst, interact)
    }
  }
  tregtotal <- tregtotal + !is.na(treg)
  treg[is.na(treg)] <- 0
  list_storer[[length(list_storer) + 1]] <- treg
}
stopImplicitCluster()
Parallelisation as done by foreach is a space vs. time trade-off: we get faster execution at the expense of higher memory usage. The reason for the higher memory usage is that several R processes are started, and each of them needs its own memory to hold the data necessary for the calculation. Currently foreach is using an implicit PSOCK cluster. One way to solve this is to make the cluster creation explicit, using a lower number of processes. How low depends on the amount of memory you have and on the memory requirements of each job:
n <- parallel::detectCores()/2 # experiment!
cl <- parallel::makeCluster(n)
doParallel::registerDoParallel(cl)
<foreach>
parallel::stopCluster(cl)
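If lowering the worker count is not enough, you can also keep large objects out of the automatic export. A minimal sketch using foreach's .noexport argument (big_unused_object is a hypothetical name):
library(doParallel)
cl <- parallel::makeCluster(2)
registerDoParallel(cl)
# big_unused_object stands for anything visible in the enclosing environment
# that the loop body does not need; excluding it keeps it off the workers
res <- foreach(i = 1:4, .combine = c, .noexport = "big_unused_object") %dopar% {
  i^2
}
parallel::stopCluster(cl)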

Error in parallel process using doSNOW

Error in { : task 1 failed - "invalid connection"
Why do I get this error every time I try to use all 4 cores for a parallel process?
Here is the example code:
library(doSNOW)
NumberOfCluster <- 4
cl <- makeCluster(NumberOfCluster)
registerDoSNOW(cl)
fl <- file(file.choose(), "r")  # file.choose() locates a .tsv file of size
                                # 8 GB (RAM is 4 GB)
foreach(i = 1:3) %dopar% {
  View(name_fil <- read.delim(fl, nrows = 1000000, header = TRUE))
}
You're getting an error because file objects can't be exported to the workers. Instead, you could export the name of the file and open that file on each of the workers:
fname <- file.choose()
foreach(i = 1:3) %dopar% {
  fl <- file(fname, "r")
  View(name_fil <- read.delim(fl, nrows = 1000000, header = TRUE))
}
You may run into problems using the View function next, but this should solve the "invalid connection" error.
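If View does cause trouble, a minimal sketch of the same idea that returns the chunks to the master instead (View needs an interactive session, which the workers do not have):
fname <- file.choose()
chunks <- foreach(i = 1:3) %dopar% {
  fl <- file(fname, "r")
  on.exit(close(fl))
  read.delim(fl, nrows = 1000000, header = TRUE)  # returned to the master
}
View(chunks[[1]])  # inspect on the master, where View() is available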

using package snow's parRapply: argument missing error

I want to find documents whose similarity to other documents is larger than a given value (0.1), by cutting the documents into blocks.
library(tm)
data("crude")
sample.dtm <- DocumentTermMatrix(
  crude, control = list(
    weighting = function(x) weightTfIdf(x, normalize = FALSE),
    stopwords = TRUE
  )
)
step = 5
n = nrow(sample.dtm)
block = n %/% step
start = (c(1:block)-1)*step+1
end = start+step-1
j = unlist(lapply(1:(block-1),function(x) rep(((x+1):block),times=1)))
i = unlist(lapply(1:block,function(x) rep(x,times=(block-x))))
ij <- cbind(i,j)
library(skmeans)
getdocs <- function(k){
  ci <- c(start[k[[1]]]:end[k[[1]]])
  cj <- c(start[k[[2]]]:end[k[[2]]])
  combi <- sample.dtm[ci]
  combj <- sample.dtm[cj]
  rownames(combi) <- ci
  rownames(combj) <- cj
  comb <- c(combi, combj)
  sim <- 1 - skmeans_xdist(comb)
  cat("Block", k[[1]], "with Block", k[[2]], "\n")
  flush.console()
  tri.sim <- upper.tri(sim, diag = FALSE)
  results <- tri.sim & sim > 0.1
  docs <- apply(results, 1, function(x) length(x[x == TRUE]))
  docnames <- names(docs)[docs > 0]
  gc()
  return(docnames)
}
It works well when using apply:
system.time(rmdocs<-apply(ij,1,getdocs))
When using parRapply:
library(snow)
library(skmeans)
cl <- makeCluster(2)
clusterExport(cl, list("getdocs", "sample.dtm", "start", "end"))
system.time(rmdocs <- parRapply(cl, ij, getdocs))
Error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: attempt to set 'rownames' on an object with no dimensions
Timing stopped at: 0.01 0 0.04
It seems that sample.dtm couldn't be used in parRapply. I'm confused. Can anyone help me? Thanks!
In addition to exporting objects, you need to load the necessary packages on the cluster workers. In your case, the result of not doing so is that there isn't a dimnames method defined for "DocumentTermMatrix" objects, causing rownames<- to fail.
You can load packages on the cluster workers with the clusterEvalQ function:
clusterEvalQ(cl, { library(tm); library(skmeans) })
After doing that, rownames(combi)<-ci will work correctly.
Also, if you want to see the output from cat, you should use the makeCluster outfile argument:
cl <- makeCluster(2, outfile='')
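Putting it together, a minimal sketch of the full setup using the question's objects:
library(snow)
cl <- makeCluster(2, outfile = '')                    # show worker output
clusterEvalQ(cl, { library(tm); library(skmeans) })   # load packages on workers
clusterExport(cl, list("getdocs", "sample.dtm", "start", "end"))
system.time(rmdocs <- parRapply(cl, ij, getdocs))
stopCluster(cl)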
