R - Parallelization in EasyABC. Error: ... could not find function

I am trying to run the ABC_sequential() function from the package EasyABC in parallel in R. But I am getting the error:
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: could not find function "f"
I think this is because ABC_sequential() ultimately calls parLapplyLB() (https://github.com/cran/EasyABC/blob/master/R/EasyABC-internal.R), so I would have to export my functions to the workers with clusterExport() (see "parSapply not finding objects in global environment").
Because ABC_sequential() calls makeCluster() internally, it seems I may have to modify the package to add clusterExport(cl, "f"). However, as I am fairly new to R, I haven't looked into modifying packages for my needs (and I suspect it may be more complicated than adding one line of code). Is there a better or easier workaround for getting my function onto the parallel nodes? Below is a simplified reproducible example based on the parallel example given in the R help for ABC_sequential:
library(EasyABC)
f <- function(x){
  x = x^2
}
toy_model_parallel <- function(x){
  set.seed(x[1])
  2 * x[2] + f(2) + rnorm(1,0,0.1)
}
sum_stat_obs <- 6.5
pacc <- .4
toy_prior <- list(c("unif",0,1)) # a uniform prior distribution between 0 and 1
# this line of code gives the checkForRemoteErrors(val) error
ABC_Lenormand <- ABC_sequential(method="Lenormand", model=toy_model_parallel, prior=toy_prior, nb_simul=20, summary_stat_target=sum_stat_obs, p_acc_min=pacc, use_seed=TRUE, n_cluster=2)
Any advice is greatly appreciated.

You could define any necessary auxiliary functions inside the model function. In this case:
toy_model_parallel <- function(x){
  f <- function(x){
    x = x^2
  }
  set.seed(x[1])
  2 * x[2] + f(2) + rnorm(1,0,0.1)
}
It looks like you need to do any worker initialization at the beginning of this function. So if your function needs to call functions from another package, you'd also need to load that package at the beginning of the model function.
I suggest that you send an email to the package developers to see if they have a better solution to this problem. If they don't, you might request that they add support for a user-specified cluster object.
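The cause of the error can be reproduced with the parallel package alone. The sketch below is not EasyABC's internal code; it only assumes, as the question does, that the model function is sent to socket workers whose global environments do not contain f. The names model_global and model_self_contained are made up for illustration, and the code is meant to be run at the top level of a session.

```r
library(parallel)

# Helper defined only in the master session's global environment.
f <- function(x) x^2

# Relies on 'f' being visible on the worker.
model_global <- function(x) 2 * x + f(2)

# Carries its helper with it, as suggested in the answer above.
model_self_contained <- function(x) {
  f <- function(x) x^2
  2 * x + f(2)
}

cl <- makeCluster(2)

# Workers start with an empty global environment, so 'f' is not found there.
err <- tryCatch(parLapply(cl, 1:2, model_global),
                error = function(e) conditionMessage(e))

# Works without any export because the helper lives inside the function.
ok_nested <- parLapply(cl, 1:2, model_self_contained)

# Also works once 'f' is exported, which is only possible when you own 'cl'.
clusterExport(cl, "f", envir = environment())
ok_exported <- parLapply(cl, 1:2, model_global)

stopCluster(cl)
```

Since ABC_sequential() creates its cluster internally, the export route is not available from user code, which is why nesting the helper is the practical workaround here.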

Related

Is there a way to make an R function return its internal variable?

I am new to R. I am currently trying to implement an instrumental-variable regression using the sysid R package. I chose this package because it can predict my instrument.
I found a suitable method (the function "iv") to solve my problem, but the function does not return the "Predicted Instrument" as one of its return values. I am very interested in that predicted variable. Is there any way to get it returned?
I already tried to create a clone of this function, but it depends on many other functions from the sysid package, so that failed. I also tried to use the "source" command to load the modified function into my R code, but then the rest of the package's functions are no longer linked to it. Please suggest a way to get the predicted instrument. The source code is available here:
https://rdrr.io/cran/sysid/src/R/iv.R
iv4 <- function(z,order=c(0,1,0)){
  na <- order[1]; nb <- order[2]
  # Steps 1-2
  mod_iv <- iv(z,order)
  # Step 3
  w <- resid(mod_iv)
  mod_ar <- ar(w,aic = FALSE,order.max =na+nb)
  Lhat <- signal::Arma(b=c(1,-mod_ar$ar),a=1)
  # Step 4
  x2 <- matrix(sim(mod_iv$sys,inputData(z)))
  ivcompute(z,x2,order,Lhat)
}
I want the predicted instrument, Lhat, to be returned. I also welcome suggestions for any other package or regression method that can do the same (predict the instrument).
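One possible approach, sketched here with a base-R stand-in rather than sysid itself: copy the function, change its last line so that it returns a list holding both the result and the intermediate object, and set the copy's environment to the package namespace so that non-exported helpers stay visible. For iv4 this would presumably mean returning something like list(model = ivcompute(z, x2, order, Lhat), Lhat = Lhat) and using asNamespace("sysid"), but that is an untested assumption. The name t_test_with_internals is hypothetical.

```r
# Modified copy of a routine that calls the non-exported stats:::t.test.default.
t_test_with_internals <- function(x) {
  centred <- x - mean(x)          # intermediate value we also want back
  fit <- t.test.default(centred)  # only visible from inside the stats namespace
  list(fit = fit, centred = centred)
}

# Give the copy the same enclosing environment as the package's own functions.
# Without this, the call to t.test.default() would normally not be found.
environment(t_test_with_internals) <- asNamespace("stats")

out <- t_test_with_internals(c(1, 2, 3, 4, 5))
names(out)
```

The same idea avoids source()-ing a detached copy: the function body is yours, but name lookup happens inside the package.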

Parallelize own package in R

As recommended in other posts I wrote my own package in R to parallelize functions I wrote with Rcpp. I can load the package and everything works, but when I'm using optimParallel, I get the message:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object '_EffES_profileLLcpp' not found
Here is what I'm doing:
library(optimParallel)
library(EffES) # EffES is my own package
cl <- makeCluster(detectCores()-1)
clusterEvalQ(cl, library(EffES))
clusterEvalQ(cl, library(optimParallel))
setDefaultCluster(cl = cl)
w.es <- optimParallel(par=rep(0.001,3), profileLLcpp, y=y.test, x=x.test, lower = rep(0.001,3), method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: object '_EffES_profileLLcpp' not found
What am I doing wrong?
Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
As detailed in this post, optimParallel() needs to use a trick in order to place no restrictions on the argument names that can be passed through the ... argument. Currently, this implies that the function passed to optimParallel() has to be defined in the .GlobalEnv in order to find compiled code properly.
Hence, a workaround could be to define the function in the .GlobalEnv:
library(optimParallel)
library(EffES) # EffES is your own package
cl <- makeCluster(detectCores()-1)
clusterEvalQ(cl, library(EffES))
setDefaultCluster(cl=cl)
f <- function(par, y, x) {
  profileLLcpp(par=par, x=x, y=y)
}
optimParallel(par=rep(0.001,3), f=f, y=y.test, x=x.test,
              lower = rep(0.001,3), method = "L-BFGS-B")$par
Suggestions to improve the code of optimParallel() are welcome. I opened a corresponding question here.
You have to export the object '_EffES_profileLLcpp' to each worker of your cluster. You can do this using clusterExport, in your case:
clusterExport(cl,'_EffES_profileLLcpp')
Repeat this step for every object that needs to be used in parallel (or just check which object shows up in the error log and export it using clusterExport).
Hope this helps

optimParallel in Package of the same name cannot find C_dnorm function

I want to optimize a function from a package in R using optimParallel. Until now I have only optimized functions that I wrote in my own environment, and that worked. But functions from any package don't work and I get an error. I checked with .libPaths() that the paths are the same on each node and I used Sys.info() to check for any differences. Here is an example (which is not meaningful, but it should show my problem):
library(optimParallel)
.libPaths()
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
cl <- makeCluster(2) #also tried to set "master" to my IP
clusterEvalQ(cl, .libPaths())
[[1]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
[[2]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"
setDefaultCluster(cl)
optimParallel(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) :
2 nodes produced errors; first error: object 'C_dnorm' not found
#for comparison
optim(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
[1] -5.263924
What am I doing wrong?
Edit: The problem is solved in optimParallel version 0.7-4
The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel
For older versions:
A workaround is to wrap dnorm() into a function defined in the .GlobalEnv.
library("optimParallel")
cl <- makeCluster(2)
setDefaultCluster(cl)
f <- function(x, mean) dnorm(x, mean=mean)
optimParallel(par=0, f, mean=1, method="L-BFGS-B")$par
[1] -5.263924
A more difficult task is to explain why the problem occurs:
optimParallel() uses parallel::parLapply() to evaluate f.
parLapply() has the arguments cl, X, fun.
If we used parLapply() without pre-processing the arguments passed via ... of optimParallel(), f could not have arguments named cl, X, fun, because this would cause errors like:
Error in lapply(X = x, FUN = f, ...) (from #2) :
formal argument "X" matched by multiple actual arguments
Simply speaking, optimParallel() avoids this error by removing all arguments from f, putting them into an environment and evaluating f in that environment.
One problem of that approach occurs when f is defined in another R package and links to compiled code. That case is illustrated in the question above.
Suggestions for better approaches to handle the issue are welcome. I opened a corresponding question here. As long as there is no better solution, one can use the workaround illustrated above.
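The argument-name clash described above can be reproduced in base R with lapply(), which also has a formal argument called X. This sketch is not how optimParallel() handles the problem internally; the closure shown at the end is just one generic way to keep user arguments out of the apply function's own argument list.

```r
# Objective whose second argument happens to be called 'X', like lapply()'s.
f <- function(par, X) par + X

# Forwarding X through '...' collides with lapply()'s own formal argument 'X'.
clash <- tryCatch(lapply(X = 1:3, FUN = f, X = 10),
                  error = function(e) conditionMessage(e))

# Binding the extra argument in a closure keeps it out of lapply()'s argument list.
wrapped <- lapply(1:3, function(p) f(p, X = 10))
```

Here clash holds the "matched by multiple actual arguments" message, while wrapped holds the three evaluated values.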
Reasoning that your error message indicated that the parallel processes were not getting adequate information, I looked at the examples in the documentation of the optimParallel package. The first one defines a helper function which will carry an environment with it, but it otherwise resembles yours in some respects.
library(optimParallel)
set.seed(123); x <- rnorm(n=1000, mean=1, sd=2)
negll <- function(par, x) -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
o1 <- optimParallel(par=c(0, 1), fn=negll, x=x, method="L-BFGS-B", lower=c(-Inf, 0.0001))
o1$par
#[1] 1.032256 1.982398
That example also differs from yours in that it uses data to estimate the parameters. I'm not sure what your result means, whereas I do understand the values returned by the modification of that example that I posted here. The minimum of the negative log-likelihood for that particular data (not completely reproducible since I forgot to set a seed) is at a mean of 1.126 and an sd of 2.007.
For an example of how to create a situation where the environment of a non-base package gets carried to the workers, see this prior answer: parallel::clusterExport how to pass nested functions from global environment?

Weird behaviour of the car::boxCox() function when wrapped in a homemade function

I'm trying to wrap the car::boxCox function in a homemade function so I can mapply it over a list of datasets. I'm using the boxCox function from the car package and not the one from MASS because I want to use family="yjPower". My problem is weird: either there is something fundamental I don't understand or it is some kind of bug. Here is a reproducible example:
library(car)
le.mod <- function(val.gold,val.bad){
  donn <- data.frame(val.gold,val.bad)
  res.lm <- lm(val.gold ~ val.bad, data=donn)
  bcres <- boxCox(res.lm, family="yjPower", plotit=FALSE)
  lambda <- bcres$x[which.max(bcres$y)]
  donn$val.bad.t <- donn$val.bad^lambda
  res.lm <- lm(val.gold ~ val.bad.t, data=donn)
  list(res.lm=res.lm, lambda = lambda)
}
xx <- runif(1000,1,100)
xxt1 <- xx^0.6 + runif(1000,1,10)
yy <- 2*xx + 10 + rnorm(1000,0,2)
le.mod(yy,xxt1)
This gives me the error message:
## Error in is.data.frame(data) : object 'donn' not found
I pin-pointed the problem to the line:
bcres <- boxCox(res.lm, family="yjPower", plotit=FALSE)
boxCox is supposed to be able to take an object of class lm; it just doesn't find the associated data that was created two lines earlier.
It works well outside of the function le.mod(). It's probably a problem related to environments: the boxCox function looks for "donn" in the global environment without finding it and, for a reason I don't know, does not look for it in the function's own environment.
Does anybody have an idea how to fix this, or can someone explain what I don't understand here? I've been racking my brain over this problem for days and I can't get it working.
Thanks
I've found the answer (!), but I can't understand the reason for the behaviour, so if somebody has an explanation, don't hesitate to post it.
The solution is to add y=TRUE to the second line of the function:
res.lm <- lm(val.gold ~ val.bad, data=donn,y=TRUE)
For some reason, this allows it to get through.
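A possible explanation, not verified against the car source: if boxCox() refits a model that lacks the stored response by calling update(), the refit re-evaluates the original lm() call from inside boxCox(), where donn is not visible. Fitting with y=TRUE would then skip the refit altogether. The base-R sketch below reproduces that mechanism; refit_if_needed and fit_inside are hypothetical stand-ins, not car functions.

```r
# Hypothetical stand-in for a function that refits when the response was not stored.
refit_if_needed <- function(object) {
  if (is.null(object$y)) {
    # update() re-evaluates the stored call lm(..., data = local_df) from here,
    # where 'local_df' does not exist.
    object <- update(object, y = TRUE)
  }
  object
}

fit_inside <- function(resp, pred, keep_y = FALSE) {
  local_df <- data.frame(resp, pred)   # exists only inside this function
  model <- lm(resp ~ pred, data = local_df, y = keep_y)
  refit_if_needed(model)
}

set.seed(1)
xs <- runif(50)
ys <- 2 * xs + rnorm(50)

# Default fit: the response is not stored, so the refit is attempted and fails.
failed <- tryCatch(fit_inside(ys, xs), error = function(e) conditionMessage(e))

# With the response stored, no refit is needed and the call succeeds.
worked <- fit_inside(ys, xs, keep_y = TRUE)
```

If this is indeed what happens, the lookup never reaches the frame of le.mod() because update() evaluates the call in the frame it was called from, not in the frame where the model was first fitted.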

What causes this weird behaviour in the randomForest.partialPlot function?

I am using the randomForest package (v. 4.6-7) in R 2.15.2. I cannot find the source code for the partialPlot function and am trying to figure out exactly what it does (the help file seems to be incomplete.) It is supposed to take the name of a variable x.var as an argument:
library(randomForest)
data(iris)
rf <- randomForest(Species ~., data=iris)
x1 <- "Sepal.Length"
partialPlot(x=rf, pred.data=iris, x.var=x1)
# Error in `[.data.frame`(pred.data, , xname) : undefined columns selected
partialPlot(x=rf, pred.data=iris, x.var=as.character(x1))
# works!
typeof(x1)
# [1] "character"
x1 == as.character(x1)
# TRUE
# Now if I try to wrap it in a function...
f <- function(w){
  partialPlot(x=rf, pred.data=iris, x.var=as.character(w))
}
f(x1)
# Error in as.character(w) : 'w' is missing
Questions:
1) Where can I find the source code for partialPlot?
2) How is it possible to write a function which takes a string x1 as an argument where x1 == as.character(x1), but the function throws an error when as.character is not applied to x1?
3) Why does it fail when I wrap it inside a function? Is partialPlot messing with environments somehow?
Tips/ things to try that might be helpful for solving such questions by myself in future would also be very welcome!
The source code for partialPlot() is found by entering
randomForest:::partialPlot.randomForest
into the console. I found this by first running
methods(partialPlot)
because entering partialPlot only tells me that it uses a method. From the methods call we see that there is one method, and the asterisk next to it tells us that it is a non-exported function. To view the source code of a non-exported function, we use the triple-colon operator :::. So it goes
package:::generic.method
Where package is the package, generic is the generic function (here it's partialPlot), and method is the method (here it's the randomForest method).
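The same lookup steps can be tried on a generic from the stats package, which avoids needing randomForest installed. The choice of t.test is only for illustration; getS3method() and getAnywhere() from utils are alternatives to the triple-colon operator.

```r
# List the methods of a generic; when printed, an asterisk marks non-exported ones.
m <- methods(t.test)

# Three ways to get at a non-exported S3 method.
via_colons <- stats:::t.test.default
via_getS3  <- getS3method("t.test", "default")
via_any    <- getAnywhere("t.test.default")

# Deparsing (or simply printing) the function shows its source code.
head(deparse(via_getS3))
```

getAnywhere() is handy when you do not know which package a method lives in, since it searches all loaded namespaces.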
Now, as for the other questions, the function can be written with do.call() and you can pass w without a wrapper.
f <- function(w) {
  do.call("partialPlot", list(x = rf, pred.data = iris, x.var = w))
}
f(x1)
This works on my machine. It's not so much environments as it is evaluation. Many plotting functions use some non-standard evaluation, which can be handled most of the time with this do.call() construct.
But note that outside the function you can also use eval() on x1.
partialPlot(x = rf, pred.data = iris, x.var = eval(x1))
I don't really see a reason to check for the presence of as.character() inside the function. If you can leave a comment we can go from there if you need more info. I'm not familiar enough with this package yet to go any further.
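The three symptoms in the question are what one would expect from a function that captures x.var unevaluated with substitute(). The sketch below is a simplified stand-in, not the randomForest source (which can be inspected as described above); pick_column is a hypothetical name, and the unused formal argument w is an assumption suggested by the "'w' is missing" error.

```r
# Hypothetical stand-in that captures its argument unevaluated.
pick_column <- function(data, x.var, w) {
  x.var <- substitute(x.var)
  xname <- if (is.character(x.var)) {
    x.var                      # literal string, or a value spliced in by do.call()
  } else if (is.name(x.var)) {
    deparse(x.var)             # bare symbol: its *name* is used, not its value
  } else {
    eval(x.var)                # any other call is evaluated from inside this function
  }
  data[, xname]
}

x1 <- "Sepal.Length"

# Bare symbol: the function looks for a column literally called "x1".
msg_symbol <- tryCatch(pick_column(iris, x1), error = function(e) conditionMessage(e))

# A call such as as.character(x1) is evaluated, so the right column is found.
ok_call <- pick_column(iris, as.character(x1))

# Inside a wrapper, 'w' in the captured call resolves to pick_column's own
# (missing) argument 'w' rather than to the wrapper's argument.
f_bad <- function(w) pick_column(iris, as.character(w))
msg_wrapped <- tryCatch(f_bad(x1), error = function(e) conditionMessage(e))

# do.call() passes the already-evaluated string, so nothing is left to capture.
f_good <- function(w) do.call("pick_column", list(data = iris, x.var = w))
ok_docall <- f_good(x1)
```

Under these assumptions, renaming the wrapper's argument to something other than w would only move the problem, whereas do.call() removes it because the value is evaluated before the function sees it.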
