Parallelize own package in R

As recommended in other posts I wrote my own package in R to parallelize functions I wrote with Rcpp. I can load the package and everything works, but when I'm using optimParallel, I get the message:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object '_EffES_profileLLcpp' not found
Here is what I'm doing:
library(optimParallel)
library(EffES) # EffES is my own package
cl <- makeCluster(detectCores()-1)
clusterEvalQ(cl, library(EffES))
clusterEvalQ(cl, library(optimParallel))
setDefaultCluster(cl = cl)
w.es <- optimParallel(par=rep(0.001,3), profileLLcpp, y=y.test, x=x.test, lower = rep(0.001,3), method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object '_EffES_profileLLcpp' not found
What am I doing wrong?

Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
As detailed in this post, optimParallel() has to use a trick so that there are no restrictions on the argument names that can be passed through the ... argument. Currently, this implies that the function passed to optimParallel() has to be defined in the .GlobalEnv so that compiled code is found properly.
Hence, a workaround could be to define the function in the .GlobalEnv:
library(optimParallel)
library(EffES) # EffES is your own package
cl <- makeCluster(detectCores()-1)
clusterEvalQ(cl, library(EffES))
setDefaultCluster(cl=cl)
f <- function(par, y, x) {
profileLLcpp(par=par, x=x, y=y)
}
optimParallel(par=rep(0.001,3), f=f, y=y.test, x=x.test,
lower = rep(0.001,3), method = "L-BFGS-B")$par
Suggestions to improve the code of optimParallel() are welcome. I opened a corresponding question here.

You have to spread the object '_EffES_profileLLcpp' to each core of your cluster. You can do this using clusterExport, in your case:
clusterExport(cl,'_EffES_profileLLcpp')
Repeat this step for every object that needs to be used in parallel (or just check which object shows up in the error log and export it using clusterExport).
Hope this helps

Related

Attempting to save intermediate states when running rmh yields error

I am trying to simulate a multitype point process, saving the intermediate states every 1000 steps in rmhcontrol. However, I can't simulate whenever I specify nsave. As an example, whenever I run the code block below, I get the error:
Error in factor(Cmprop, levels = Ctypes) : object 'Cmprop' not found
The code is:
library(spatstat)
library(optimbase)
num_marks <- length(unique(marks(amacrine)))
iradii <- .1*ones(nrow=num_marks,ncol=num_marks)
MSH1 <- MultiStraussHard(iradii=iradii)
x <- ppm(amacrine, trend =~polynom(x,y,3), interaction=MSH1)
control <- rmhcontrol(nsave=1e3)
rmh(x,control=control)
Thanks for the help!
This is a bug in spatstat versions 1.62-1 and 1.62-2.
It has already been fixed in the current development version 1.62-2.006 which you can download from the GitHub repository for spatstat. The next public release on CRAN will be at the end of January 2020.
Please note: the code in the original question generates an error because ones has formal arguments nx, ny rather than nrow, ncol. The following code tests the bug:
library(spatstat)
nm <- length(levels(marks(amacrine)))
ir <- matrix(0.1, nm, nm)
MSH1 <- MultiStraussHard(iradii=ir)
fit <- ppm(amacrine ~ polynom(x,y,3), MSH1)
rmh(fit, nsave=1e3, verbose=FALSE)

optimParallel in Package of the same name cannot find C_dnorm function

I want to optimize a function from a package in R using optimParallel. Until now I have only optimized functions that I wrote in my own environment, and that worked. But functions from any package don't work and I get an error. I checked with .libPaths() whether the paths are the same on each node, and I used Sys.info() to check for any differences. Here is an example (which is not meaningful, but it should show my problem):
library(optimParallel)
.libPaths()
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
cl <- makeCluster(2) #also tried to set "master" to my IP
clusterEvalQ(cl, .libPaths())
[[1]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
[[2]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
setDefaultCluster(cl)
optimParallel(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'C_dnorm' not found
#for comparison
optim(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
[1] -5.263924
What am I doing wrong?
Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
A workaround is to wrap dnorm() into a function defined in the .GlobalEnv.
library("optimParallel")
cl <- makeCluster(2)
setDefaultCluster(cl)
f <- function(x, mean) dnorm(x, mean=mean)
optimParallel(par=0, f, mean=1, method="L-BFGS-B")$par
[1] -5.263924
A more difficult task is to explain why the problem occurs:
optimParallel() uses parallel::parLapply() to evaluate f.
parLapply() has the arguments cl, X, fun.
If we used parLapply() without pre-processing the arguments passed via ... of optimParallel(), f could not have arguments named cl, X, fun, because this would cause errors like:
Error in lapply(X = x, FUN = f, ...) (from #2) :
formal argument "X" matched by multiple actual arguments
Simply speaking, optimParallel() avoids this error by removing all arguments from f, putting them into an environment and evaluating f in that environment.
One problem of that approach occurs when f is defined in another R package and links to compiled code. That case is illustrated in the question above.
Suggestions for better approaches to handle the issue are welcome. I opened a corresponding question here. As long as there is no better solution, one can use the workaround illustrated above.
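To make the name clash concrete, here is a minimal sketch that uses plain lapply() rather than a cluster, since the error quoted above comes from that kind of lapply() call. The function f and its argument X are made up for illustration, and the closure at the end is only one generic way to avoid the clash; it is not meant to show what optimParallel() does internally.

```r
# Hypothetical objective whose second argument happens to be called X,
# the same name as the data argument of lapply().
f <- function(par, X) sum((par - X)^2)

# Forwarding X = 10 to f collides with lapply's own X argument,
# so R refuses to match the call.
msg <- tryCatch(
  lapply(X = list(1, 2), FUN = f, X = 10),
  error = function(e) conditionMessage(e)
)
print(msg)

# One generic way around it: fix the extra argument in a closure first,
# so nothing named X has to travel through lapply's ... argument.
g <- function(par) f(par, X = 10)
res <- lapply(X = list(1, 2), FUN = g)
unlist(res)
```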
Reasoning that your error message indicated that the parallel processes were not getting adequate information, I looked at the examples in the documentation of the optimParallel package. The first one defines a helper function which will carry an environment with it, but it otherwise resembles yours in some respects.
library(optimParallel)
set.seed(123); x <- rnorm(n=1000, mean=1, sd=2)
negll <- function(par, x) -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
o1 <- optimParallel(par=c(0, 1), fn=negll, x=x, method="L-BFGS-B", lower=c(-Inf, 0.0001))
o1$par
#[1] 1.032256 1.982398
That example also differs from yours in that it uses data to estimate the parameters. I'm not sure what your result means, whereas I do understand the values returned by the modification of that example that I posted here. The minimum log-likelihood for that particular data (not completely reproducible since I forgot to set a seed) is at a mean of 1.126 and an sd of 2.007.
For an example of how to create a situation where the environment of a non-base package gets carried to the workers, see this prior answer: parallel::clusterExport how to pass nested functions from global environment?

What causes this weird behaviour in the randomForest.partialPlot function?

I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete.) It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w){
partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
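The same steps work for any S3 generic. As a small sketch, here they are applied to t.test from the stats package, chosen only because it ships with R (it is not part of the question):

```r
# List the methods of a generic; non-visible methods are marked with an asterisk.
methods("t.test")

# Two ways to get at a method that is not exported from its namespace.
src1 <- stats:::t.test.default
src2 <- getS3method("t.test", "default")

# Both return the same function; typing src1 at the console prints its source.
identical(src1, src2)
```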
Now, as for the other questions, the function can be written with do.call() and you can pass w without a wrapper.
f <- function(w) {
do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
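To illustrate why do.call() helps, here is a toy function that treats a bare symbol as a literal name. pick_column and f_wrap are hypothetical names made up for this sketch; this is not the actual code of partialPlot(), just the general pattern of non-standard evaluation.

```r
# Hypothetical function using non-standard evaluation:
# a bare symbol is taken literally as the column name,
# anything else is evaluated in the caller's frame.
pick_column <- function(data, x.var) {
  expr <- substitute(x.var)
  name <- if (is.name(expr)) deparse(expr) else eval(expr, parent.frame())
  name %in% names(data)
}

x1 <- "Sepal.Length"

# Direct call: the function sees the symbol x1 and looks for a column "x1".
direct <- pick_column(iris, x1)

# do.call() evaluates the argument list first, so the function receives
# the string itself instead of the symbol w.
f_wrap <- function(w) do.call(pick_column, list(data = iris, x.var = w))
wrapped <- f_wrap(x1)

c(direct = direct, wrapped = wrapped)
```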
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.

R - Parallelization in EasyABC. Error: ... could not find function

I am trying to run the ABC_sequential() function from the package EasyABC in parallel in R. But I am getting the error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: could not find function "f"
I think this is because ABC_sequential() is ultimately calling parLapplyLB() (https://github.com/cran/EasyABC/blob/master/R/EasyABC-internal.R) and I have to export the functions using clusterExport()? (parSapply not finding objects in global environment)
Because the function calls makeCluster() within it, it seems like I may have to modify the package to add clusterExport(cl, "f")? However, as I am fairly new, I haven't looked into modifying packages for my needs (and I suspect it may be more complicated than adding one line of code). I am wondering if there is a better/easier workaround for getting my function onto the parallel nodes. Below is a simplified reproducible example based on the parallel example given in the R help for ABC_sequential:
library(EasyABC)
f <- function(x){
x = x^2
}
toy_model_parallel <- function(x){
set.seed(x[1])
2 * x[2] + f(2) + rnorm(1,0,0.1)
}
sum_stat_obs <- 6.5
pacc <- .4
toy_prior <- list(c("unif",0,1)) # a uniform prior distribution between 0 and 1
# this line of code gives the checkForRemoteErrors(val) error
ABC_Lenormand <- ABC_sequential(method="Lenormand", model=toy_model_parallel, prior=toy_prior, nb_simul=20, summary_stat_target=sum_stat_obs, p_acc_min=pacc, use_seed=TRUE, n_cluster=2)
Any advice is greatly appreciated.
You could define any necessary auxiliary functions inside the model function. In this case:
toy_model_parallel <- function(x){
f <- function(x){
x = x^2
}
set.seed(x[1])
2 * x[2] + f(2) + rnorm(1,0,0.1)
}
It looks like you need to do any worker initialization at the beginning of this function. So if your function needs to call functions from another package, you'd also need to load that package at the beginning of the model function.
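As a rough check of this idea outside EasyABC, the sketch below sends a self-contained model function to two workers with the parallel package directly. The names toy_model and helper are made up for illustration, and I am assuming the workers behave like the ones EasyABC creates internally.

```r
library(parallel)

# The helper lives inside the model function, so it is shipped to the
# workers together with the function body.
toy_model <- function(x) {
  helper <- function(z) z^2
  helper(x) + 1
}

cl <- makeCluster(2)
res <- parLapply(cl, 1:3, toy_model)
stopCluster(cl)

unlist(res)
```

If the model needed another package, a library() call at the top of toy_model would load it on each worker in the same way.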
I suggest that you send an email to the package developers to see if they have a better solution to this problem. If they don't, you might request that they add support for a user specified cluster object.

Locked package namespace

I am using the "BMA" package in R 3.1.0, and get an error when running one of the functions in the package, iBMA.glm. When running the example in the package documentation:
## Not run:
############ iBMA.glm
library("MASS")
library("BMA")
data(birthwt)
y<- birthwt$lo
x<- data.frame(birthwt[,-1])
x$race<- as.factor(x$race)
x$ht<- (x$ht>=1)+0
x<- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl<- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)
### add 41 columns of noise
noise<- matrix(rnorm(41*nrow(x)), ncol=41)
colnames(noise)<- paste('noise', 1:41, sep='')
x<- cbind(x, noise)
iBMA.glm.out<- iBMA.glm( x, y, glm.family="binomial",
factor.type=FALSE, verbose = TRUE,
thresProbne0 = 5 )
summary(iBMA.glm.out)
I get the error:
Error in registerNames(names, package, ".__global__", add) :
The namespace for package "BMA" is locked; no changes in the global variables list may be made.
I get the error in RStudio running R 3.1.0 on Ubuntu.
On Windows 7, from both RStudio and the R console, I get a similar error:
Error in utils::globalVariables(c("nastyHack_glm.family", "nastyHack_x.df")) :
The namespace for package "BMA" is locked; no changes in the global variables list may be made.
I also get the same error when running my own data in the function. I'm not clear on what this error means and how to work around the error to be actually able to use the function. Any advice would be appreciated!
