doParallel issue with inline function on Windows 7 (works on Linux)

I am using R 3.0.1 on both Windows 7 and Linux (SUSE Server 11 (x86_64)). The following example code produces an error on Windows but not on Linux. All of the packages used are up to date on both machines.
The Windows error is:
Error in { : task 1 failed - "NULL value passed as symbol address"
If I change %dopar% to %do%, the code runs on Windows without errors. My initial guess was that this was some Windows configuration issue, so I tried reinstalling Rcpp and R, but that did not help. The error seems to be related to scoping: if I define and compile the function cFunc inside f1, then %dopar% works, but, as expected, it is very slow, since the compiler is called once per task.
Does anyone have some insights on why the error happens or suggestions on how to fix it?
library(inline)

sigFunc <- signature(x = "numeric", size_x = "numeric")
code <- '
  double tot = 0;
  for (int k = 0; k < INTEGER(size_x)[0]; k++) {
    tot += REAL(x)[k];
  }
  return ScalarReal(tot);
'
cFunc <- cxxfunction(sigFunc, code)

f1 <- function() {
  x <- rnorm(100)
  a <- cFunc(x = x, size_x = as.integer(length(x)))
  return(a)
}
library(foreach)
library(doParallel)
registerDoParallel()

# this produces an error on Windows but not on Linux
res <- foreach(counter = 1:100) %dopar% { f1() }

# this works on both Windows and Linux
res <- foreach(counter = 1:100) %do% { f1() }
# The following is not a practical solution, but compiling cFunc inside f1
# makes %dopar% work on Windows, although it is very slow:
f1 <- function() {
  library(inline)
  sigFunc <- signature(x = "numeric", size_x = "numeric")
  code <- '
    double tot = 0;
    for (int k = 0; k < INTEGER(size_x)[0]; k++) {
      tot += REAL(x)[k];
    }
    return ScalarReal(tot);
  '
  cFunc <- cxxfunction(sigFunc, code)
  x <- rnorm(100)
  a <- cFunc(x = x, size_x = as.integer(length(x)))
  return(a)
}
# this now works on Windows but is very slow
res <- foreach(counter = 1:100) %dopar% { f1() }
Thanks!
Gustavo

The error message "NULL value passed as symbol address" is unusual, and isn't due to the function not being exported to the workers. The cFunc function just doesn't work after being serialized, sent to a worker, and unserialized. It also doesn't work when it's loaded from a saved workspace, which results in the same error message. That doesn't surprise me much, and it may be a documented behavior of the inline package.
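For instance, a minimal sketch of that saved-workspace failure (assuming cFunc was compiled as in the question):
# Save the compiled function, then load it in a fresh R session: the
# pointer to the compiled code is lost, giving the same error.
save(cFunc, file = "cFunc.RData")
# --- in a new R session ---
load("cFunc.RData")
cFunc(x = rnorm(5), size_x = 5L)
# Error: NULL value passed as symbol address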
As you've demonstrated, you can work around the problem by creating cFunc on the workers. To do that efficiently, it should be done only once per worker. With the doParallel backend, I would define a worker initialization function and execute it on each of the workers using the clusterCall function:
worker.init <- function() {
  library(inline)
  sigFunc <- signature(x = "numeric", size_x = "numeric")
  code <- '
    double tot = 0;
    for (int k = 0; k < INTEGER(size_x)[0]; k++) {
      tot += REAL(x)[k];
    }
    return ScalarReal(tot);
  '
  assign('cFunc', cxxfunction(sigFunc, code), .GlobalEnv)
  NULL
}
f1 <- function() {
  x <- rnorm(100)
  a <- cFunc(x = x, size_x = as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
cl <- makePSOCKcluster(3)
clusterCall(cl, worker.init)
registerDoParallel(cl)
res <- foreach(counter = 1:100) %dopar% f1()
Note that you must create the PSOCK cluster object explicitly in order to call clusterCall.
The reason your example worked on Linux is that registerDoParallel, called without arguments, uses the fork-based mclapply function there, while on Windows it creates a cluster object and uses the clusterApplyLB function. With mclapply the workers inherit the master's workspace via fork, so functions and variables aren't serialized and sent to the workers, and the error doesn't occur.
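To confirm this (a sketch, not from the original answer): forcing a PSOCK cluster on Linux should reproduce the Windows error, because the workers then receive cFunc by serialization rather than by fork:
library(doParallel)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)
# Expect the same "NULL value passed as symbol address" error as on Windows
res <- foreach(counter = 1:100) %dopar% f1()
stopCluster(cl)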
It would be nice if doParallel included support for initializing the workers without the need for using clusterCall, but it doesn't yet.

The easiest workaround I can think of would be:
1) Write your code in a separate source file, say cFunc.c,
2) Compile it with R CMD SHLIB,
3) dyn.load the resulting shared library within your foreach call.
For example,
cFunc.c
=======
#include <R.h>
#include <Rinternals.h>

SEXP cFunc(SEXP x, SEXP size_x) {
  double tot = 0;
  for (int k = 0; k < INTEGER(size_x)[0]; ++k) {
    tot += REAL(x)[k];
  }
  return ScalarReal(tot);
}
and
library(foreach)
library(doParallel)
registerDoParallel()

x <- as.numeric(1:100)
size_x <- as.integer(length(x))
res <- foreach(counter = 1:100) %dopar% {
  dyn.load("cFunc.dll")
  .Call("cFunc", x, size_x)
}
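The compile step itself is not shown above; a sketch (run from a shell, assuming cFunc.c is in the working directory), plus a platform-portable load:
# In a shell, not in R:
#   R CMD SHLIB cFunc.c
# This produces cFunc.dll on Windows and cFunc.so on Linux, so a portable
# dyn.load call can use .Platform$dynlib.ext:
dyn.load(paste0("cFunc", .Platform$dynlib.ext))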
Alternatively (and probably better), consider building an actual package that exports this function.

Related

optimParallel can not find Rcpp function [duplicate]


R foreach stop iteration at i

I am using the R package foreach.
When a bug occurs inside a foreach block, it is hard to reproduce and hard to debug.
Take the following script as an example.
I want to stop at i = 4 to check what is wrong. However, it stops at i = 10.
Any solution?
library(foreach)
library(iterators)  # provides icount()

foreach(i = icount(10)) %do% {
  if (i == 4) {
    e <- simpleError("test error")
    stop(e)
  }
}
One option to handle this is with a browser() inside a tryCatch as in:
foreach(i = icount(10)) %do% {
  tryCatch(
    if (i == 4) {
      e <- simpleError("test error")
      stop(e)
    },
    error = function(e) browser()
  )
}
This opens a browser in the environment at the time of the error, which allows you to inspect any objects and/or debug your code.
Your console will then look like the following, and you can ask for the value of i:
Browse[1]> i
[1] 4
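As a side note, a sketch of an alternative (not part of the answer above): foreach's .errorhandling argument can capture the error object itself for post-mortem inspection instead of stopping.
library(foreach)
library(iterators)
# With .errorhandling = "pass", the error object is returned in place of
# the failing iteration's result instead of aborting the loop.
res <- foreach(i = icount(10), .errorhandling = "pass") %do% {
  if (i == 4) stop("test error")
  i
}
res[[4]]  # contains the simpleError raised at i = 4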

Unable to start Julia connection on port 1023: all connections are in use

I am trying to run a Julia function from R using the XRJulia package. Below is my code snippet.
## start
library(XRJulia)
prevInterface <- XR::getInterface()
if (is.null(prevInterface)) {
  ev <- RJulia(.makeNew = TRUE)
} else {
  ev <- RJulia(.makeNew = FALSE)
}
juliaAddToPath(directory = '/home/.julia/lib/v0.6/', package = NULL, evaluator = ev)
runjl <- juliaEval('function sum(a, b)
  c = a + b;
  return c
end
')
runjl_function <- JuliaFunction(runjl)
sum_result <- runjl_function(1, 5)
XR::rmInterface(XR::getInterface())
## end
This code works fine. But occasionally, when I run it multiple times, I get the error:
Unable to start Julia connection on port 1023: all connections
are in use.
How do I close all Julia connections, and what is the systematic way to do this? Please suggest.
There is a ServerQuit() function in RJuliaConnect.R:
https://github.com/johnmchambers/XRJulia/blob/master/R/RJuliaConnect.R
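A minimal sketch of using it (an assumption on my part: ev is the evaluator created above, and the method is invoked on the evaluator object, per the linked source):
# Shut the Julia server down when finished so its connection is released.
ev$ServerQuit()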

unrelated nested foreach with an outer %dopar% and an inner %do%

I am running tasks locally in parallel using %dopar% from the foreach package, with a cluster created via the doSNOW package (on a Windows machine at the moment). I have done this many times before and it works fine, until I place an unrelated foreach loop using %do% (i.e. non-parallel) inside it. Then R gives me the following error (with traceback):
Error in { : task 1 failed - "could not find function "%do%""
3 stop(simpleError(msg, call = expr))
2 e$fun(obj, substitute(ex), parent.frame(), e$data)
1 foreach(rc = 1:5) %dopar% {
    aRandomCounter = -1
    if (1 > 0) {
      for (batchi in 1:20) { ...
Here is some code that replicates the problem on my machine:
require(foreach)
require(doSNOW)
cl <- makeCluster(5)
registerDoSNOW(cl)

for (stepi in 1:10) {  # normal outer for
  # the time-consuming stuff in parallel (not looking to actually retrieve any data)
  foreach(rc = 1:5) %dopar% {
    aRandomCounter = -1
    if (1 > 0) {
      for (batchi in 1:20) {
        anObjectIwantToCreate <- foreach(qrc = 1:100, .combine = c) %do% {
          return(runif(1))  # not efficient; a placeholder to reproduce the issue
        }
        aRandomCounter = aRandomCounter + sum(anObjectIwantToCreate > 0.5)
      }
    }
    return(aRandomCounter)
  }
}
stopCluster(cl)
Replacing the inner foreach with a simple for or an (l/s)apply is one solution (see the sketch below). But is there a way to make this work with the inner foreach, and why does the error occur in the first place?
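(The sapply replacement I mean would be a sketch like this:)
# Instead of the inner foreach %do%:
anObjectIwantToCreate <- sapply(1:100, function(qrc) runif(1))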
Of course, I got it to work as soon as I posted it (sorry; I will leave this up in case someone else has the same issue). It is a scoping issue: I knew you had to load any external packages within the %dopar% block, but what I did not realize is that this includes the foreach package itself. Here is the solution:
require(foreach)
require(doSNOW)
cl <- makeCluster(5)
registerDoSNOW(cl)

for (stepi in 1:10) {  # normal outer for
  # the time-consuming stuff in parallel (not looking to actually retrieve any data)
  foreach(rc = 1:5) %dopar% {
    require(foreach)  ### the solution
    aRandomCounter = -1
    if (1 > 0) {
      for (batchi in 1:20) {
        anObjectIwantToCreate <- foreach(qrc = 1:100, .combine = c) %do% {
          return(runif(1))
        }
        aRandomCounter = aRandomCounter + sum(anObjectIwantToCreate > 0.5)
      }
    }
    return(aRandomCounter)
  }
}
stopCluster(cl)
I know this is an outdated question, but here is a hint for those who cannot get nested foreach loops to work.
If you parallelize the outer loop, putting a %do% inside a %dopar%, you need to include .packages = c("doSNOW") in the arguments of the outer loop (%dopar%); otherwise you will run into a "doSNOW not found" error.
Generally, people just parallelize the inner loop (%dopar% in %:%), which can be slow for a huge amount of data (waiting for combinations of inner loops).
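For reference, a sketch of that %:% nesting form applied to a simplified version of the loops above:
library(foreach)
library(doSNOW)
cl <- makeCluster(5)
registerDoSNOW(cl)
# %:% merges the two loops into a single stream of tasks, so only the
# innermost body needs %dopar%.
res <- foreach(rc = 1:5, .combine = rbind) %:%
  foreach(qrc = 1:100, .combine = c) %dopar% {
    runif(1)
  }
stopCluster(cl)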

Using Rcpp within parallel code via snow to make a cluster

I've written a function in Rcpp and compiled it with inline. Now I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when run in parallel:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: NULL value passed as symbol address
And here is some code:
## Load and compile
library(inline)
library(Rcpp)
library(snow)

src1 <- '
  Rcpp::NumericMatrix xbem(xbe);
  int nrows = xbem.nrow();
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < nrows; i++) {
    xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
  }
  return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g = "numeric"), body = src1, plugin = "Rcpp")

## Single process
A <- matrix(rnorm(400), 20, 20)
funCPP1(A, 0.5)

## Parallel
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, 'funCPP1')
clusterCall(cl, funCPP1, A, 0.5)
Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically loadable shared library. Where does that library sit? In R's temp directory.
So you tried the right thing by shipping the R front-end that calls the shared library to the other process (which has another temp directory!), but that does not get the dll/so file there.
Hence the advice is to create a local package, install it, and have both snow processes load and call it.
(And as always: better-quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)
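A sketch of that package route (mypkg is a hypothetical package assumed to export funCPP1):
library(snow)
cl <- makeCluster(2, type = "SOCK")
A <- matrix(rnorm(400), 20, 20)
# Each worker loads the compiled code from its installed library rather
# than from the master's temp directory.
clusterEvalQ(cl, library(mypkg))  # hypothetical package
clusterCall(cl, function(x, g) mypkg::funCPP1(x, g), A, 0.5)
stopCluster(cl)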
Old question, but I stumbled across it while looking through the top Rcpp tags, so maybe this answer will still be of use.
I think Dirk's answer is proper when the code you've written is fully debugged and does what you want, but it can be a hassle to write a new package for such a small piece of code as in the example. What you can do instead is export the code block and a "helper" function that compiles the source code, then run the helper. That makes the compiled function available on every node; a second helper then retrieves and calls it by name. For instance:
# snow must still be installed, but this functionality is now in "parallel",
# which ships with base R.
library(parallel)

# Keep your source as an object
src1 <- '
  Rcpp::NumericMatrix xbem(xbe);
  int nrows = xbem.nrow();
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < nrows; i++) {
    xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
  }
  return xbem;
'
# Save the signature
sig <- signature(xbe = "numeric", g = "numeric")

# Make a function that compiles the source, then assigns the compiled
# function to the global environment
c.inline <- function(name, sig, src){
  library(Rcpp)
  funCXX <- inline::cxxfunction(sig = sig, body = src, plugin = "Rcpp")
  assign(name, funCXX, envir = .GlobalEnv)
}

# And the function which retrieves and calls this newly-compiled function
c.namecall <- function(name, ...){
  funCXX <- get(name)
  funCXX(...)
}

# Keep your example matrix
A <- matrix(rnorm(400), 20, 20)

# What are we calling the compiled function?
fxname <- "TestCXX"

## Parallel
cl <- makeCluster(2, type = "PSOCK")

# Export all the pieces
clusterExport(cl, c("src1", "c.inline", "A", "fxname"))

# Call the compiler function on every node
clusterCall(cl, c.inline, name = fxname, sig = sig, src = src1)

# Notice how the function, now named "TestCXX", is available in the global
# environment of every node
clusterCall(cl, ls, envir = .GlobalEnv)

# Call the function through our wrapper
clusterCall(cl, c.namecall, name = fxname, A, 0.5)
# Works in my testing
I've written a package, ctools (shameless self-promotion), which wraps up a lot of the functionality in the parallel and Rhpc packages for cluster computing, both with PSOCK and MPI. I already have a function called c.sourceCpp which calls Rcpp::sourceCpp on every node in much the same way as above. I'm going to add a c.inlineCpp which does the above, now that I see the usefulness of it.
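The per-node sourceCpp idea looks roughly like this (a sketch; my_functions.cpp is a hypothetical file that must be readable from every worker):
library(parallel)
cl <- makeCluster(2, type = "PSOCK")
# Compile on each node; by default sourceCpp exposes the resulting
# functions in that node's global environment.
clusterEvalQ(cl, Rcpp::sourceCpp("my_functions.cpp"))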
Edit:
In light of Coatless' comments, Rcpp::cppFunction() in fact negates the need for the c.inline helper here, though c.namecall is still needed.
src2 <- '
  NumericMatrix TestCpp(NumericMatrix xbe, NumericVector g) {
    NumericMatrix xbem(xbe);
    int nrows = xbem.nrow();
    // g arrives as a length-one vector; its first element is the multiplier
    for (int i = 1; i < nrows; i++) {
      xbem(i,_) = xbem(i-1,_) * g[0] + xbem(i,_);
    }
    return xbem;
  }
'
clusterCall(cl, Rcpp::cppFunction, code = src2, env = .GlobalEnv)
# Call the function through our wrapper
clusterCall(cl, c.namecall, name = "TestCpp", A, 0.5)
I resolved it by sourcing, on each cluster node, an R file with the wanted C inline function:
clusterEvalQ(cl, {
  library(inline)
  invisible(source("your_C_func.R"))
})
And your file your_C_func.R should contain the C function definition:
c_func <- cfunction(...)
