foreach R: Calling functions in my own package - r

I'm in the process of writing an R package. One of my functions takes another function and some other data-related arguments and runs a %dopar% loop using the foreach package. This foreach-using function is used inside one of the main functions of my package.
When I call the main function from another file, after having loaded my package, I get the message
Error in { : task 1 failed - "could not find function "some_function"
where some_function is some function from my package. I get this message, with the missing function varying, when I set the .export argument in the call to foreach to any of the following:
ls(as.environment("package:mypackagename"))
ls(.GlobalEnv)
ls(environment())
ls(parent.env(environment()))
ls(parent.env(parent.env(environment())))
And even concatenations of the above. I also tried passing my package name to the .packages argument, which only yields the error message
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : worker initialization failed: there is no package called ‘mypackagename’
I feel like I have tried just about everything, and I really need this piece of code to work. I should note that it does work if I use %do% instead of %dopar%. What am I doing wrong?
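The usual diagnosis for this combination of errors is that the parallel workers cannot load the package because it is not installed in a library they can see, for example when it is only loaded into the current session with devtools::load_all(). A minimal sketch of the standard setup, assuming the package has actually been installed (e.g. via R CMD INSTALL or devtools::install()) and reusing the placeholder names mypackagename and some_function from the question:

library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each worker attaches the (installed) package itself, so all of its
# functions are visible inside the loop without any .export.
res <- foreach(i = 1:10, .packages = "mypackagename") %dopar% {
  some_function(i)  # hypothetical exported function from the package
}

parallel::stopCluster(cl)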

Related

Parsing error in MonteCarlo::MonteCarlo function in R

I am trying to run a power analysis using a MonteCarlo approach in R.
I have created a function of two parameters that outputs a boolean (tested manually for all relevant values of the parameters). I have also run baby examples of the MonteCarlo function to make sure that I understand it and that it works well.
Yet when I try to run the real thing, I get the following error message:
Error in parse(text = all_funcs_found[i]) : <text>:1:1: unexpected '::'
1: ::
I read through the source code of the MonteCarlo function (which I found here) and found
# loop through non-primitive functions used in func and check from which package they are
for (i in 1:length(all_funcs_found)) {
  if (environmentName(environment(eval(parse(text = all_funcs_found[i])))) %in% env_names) {
    packages <- c(packages,
                  env_names[which(env_names == environmentName(environment(eval(parse(text = all_funcs_found[i])))))])
  }
}
which doesn't really make sense to me - why should there be a problem there?
Thank you for any ideas.
I found the answer: the function I wrote was calling a function from a specific package in the form packagename::functionname.
This works fine if you use the function once manually, but it makes MonteCarlo break.
I solved the problem by first loading the relevant package and then removing the 'packagename::' part from the definition of the main function. MonteCarlo then runs just fine.
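A hypothetical before/after illustration of that fix, with MASS::mvrnorm standing in for whatever function was qualified with :: in the original code:

# Breaks MonteCarlo: its parser chokes on the '::' inside the function body.
my_sim <- function(n, rho) {
  Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
  x <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
  list(reject = cor.test(x[, 1], x[, 2])$p.value < 0.05)
}

# Works: attach the package first and drop the 'MASS::' prefix.
library(MASS)
my_sim <- function(n, rho) {
  Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
  x <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
  list(reject = cor.test(x[, 1], x[, 2])$p.value < 0.05)
}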

R package build, reason for "object 'xxx' not found"

I'm attempting to build an R package from code that works outside a package. It is my first try, and it is rather complex: nested functions that end up doing parallel processing using doMPI and foreach. I'm using RStudio 1.01.43 on Ubuntu 16.04. The package builds and works OK. But when I try to run the top-level function, which calls the next one down, it throws an error:
Error in { : task 6 failed - "object 'RunOys' not found"
I'm setting the boolean variable RunOys = TRUE manually before calling the top-level function; when execution gets down to the function where this variable is used in an ifelse statement, it fails. Before I call the top-level function I check globalenv() and
> RunOys
[1] TRUE
In the foreach parallel code I have this statement, which works fine until it is compiled into an R package:
FinalCalcs <- function(...) {
  results <- data.frame(foreach::`%dopar%`(
    foreach::`%:%`(
      foreach::foreach(j = 1:NumSim, .combine = acomb,
                       .options.mpi = opts1),
      foreach::foreach(i = 1:PopSize, .combine = rbind,
                       .options.mpi = opts2,
                       .export = c(ls(globalenv())),
                       .packages = c("zoo", "msm", "FAdist", "qmra"))),
    {
which should export all of the objects in globalenv() to each slave.
I can't understand why some variables seem to get passed and not others. Do I need to specify it explicitly as a @param in the file for the function where it is called?
With foreach, it is best to have all the needed variables present in the same environment where foreach is called. So basically, I always use foreach inside a function and pass all the variables that the foreach loop needs to this function.
Act as if foreach couldn't see past its calling function; then you won't need to export anything. For functions, use package::function (as in packages, so that you don't need to @import packages).
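A minimal sketch of that pattern, reusing the names NumSim and RunOys from the question (doParallel is used here for illustration instead of doMPI):

library(foreach)
library(doParallel)

FinalCalcs <- function(NumSim, RunOys) {
  foreach(j = seq_len(NumSim), .combine = rbind) %dopar% {
    # RunOys is a local variable of FinalCalcs, the function that calls
    # foreach, so the workers receive it automatically; no .export needed.
    data.frame(sim = j, used_oys = RunOys)
  }
}

cl <- parallel::makeCluster(2)
registerDoParallel(cl)
FinalCalcs(NumSim = 4, RunOys = TRUE)
parallel::stopCluster(cl)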

R / foreach-package: how to pass customized function to clusters?

This post describes how to pass packages to all clusters:
Function not found in R doParallel 'foreach' - Error in { : task 1 failed - "could not find function "raster""
However, I would like to pass functions to the clusters that are not part of an R package (because I use them in the foreach loop). How can I do this?
Edit:
My only idea was to run another foreach loop over 1:(number of clusters) before the actual foreach loop and define the function on each cluster. But this is a) not very elegant and b) it doesn't work (the function still cannot be found).
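One common approach, sketched here with a hypothetical function myfun and the doParallel backend, is to name the function in foreach's .export argument so that it is shipped to each worker:

library(doParallel)

myfun <- function(x) x^2  # custom function, not part of any package

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

res <- foreach(i = 1:4, .combine = c, .export = "myfun") %dopar% {
  myfun(i)
}

parallel::stopCluster(cl)
res
#> [1]  1  4  9 16

An alternative with the same effect is parallel::clusterExport(cl, "myfun"), which copies the function into the global environment of every worker before the loop runs.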

Namespace fails to unload in R

I am having trouble unloading the namespace for a package I created in R. Every time I try to do so I get the following error:
Error in .mergeMethodsTable(generic, mtable, get(tname, envir = env), :
trying to get slot "defined" from an object of a basic class ("environment") with no
slots
Calls: unloadNamespace ... -> .updateMethodsInTable -> .mergeMethodsTable
Here are the results of a call to traceback() after the above error occurs.
4: .mergeMethodsTable(generic, mtable, get(tname, envir = env), attach)
3: .updateMethodsInTable(fdef, where, attach)
2: methods:::cacheMetaData(ns, FALSE, ns)
1: unloadNamespace("coleXcms")
I honestly have tried everything I can think of, but to no avail. I'm pretty new to R so I was hoping someone might be able to help me.
Also, I don't know if this will be useful, but here is my package's unloading hook. (The name of my package is coleXcms.)
.onUnload <- function(libpath) {
  mzR:::rampCloseAll()
  library.dynam.unload("coleXcms", libpath)
}
The function I have used with success is unloadNamespace. It appears that library.dynam.unload is designed to remove the DLLs, but it's not clear to me that it will remove the rest of a package.
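A sketch of the usual unload sequence, with the package name from the question:

# If the package is attached, detach it and unload its namespace in one go:
detach("package:coleXcms", unload = TRUE)
# If only the namespace is loaded (the package was never attached):
unloadNamespace("coleXcms")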

How to use plyr mdply with failsafe execution in parallel

I have to run an analysis on multiple datasets. I use plyr (mdply) with the doSNOW package to use multiple cores.
Sometimes the analysis code will fail, raising an error and stopping execution. I want the analysis to continue for the other datasets. How can I achieve that?
Solution 1: coding so that all errors are caught, which is not feasible.
Solution 2: A failsafe plyr wrapper to run the function in parallel that returns all valid results, and indicates where something went wrong.
I implemented the second solution (see answer below). The tricky part was that I wanted a single function call to accomplish the failsafe-and-return-a-data.frame feature.
How I went about constructing the function:
The actual function call is wrapped with tryCatch. It is called from within a callfailsafe function, which in turn is necessary to pass the individual function name (simple) and the respective parameters in (...) to the whole procedure.
Maybe I made it overly complicated... but it works.
Be sure that your simple function does not rely on any globally defined functions or parameters, as these will not be loaded when used with .parallel = TRUE and doSNOW.
Here is my test dataset: there are 100 tasks, and for each one a function simple will be called. However, sometimes the function fails. I typically use it on tasks that autonomously load many .rdata files, do extensive processing, save some output and finally return a data.frame object.
library(plyr)
library(doSNOW)

N <- 100
multiargtab <- data.frame(ID = 1:N, A = round(runif(N, 0, 1)), B = round(runif(N, 0, 1)))

simple <- function(ID, A, B) {  # a function that will sometimes fail
  if (B == 0) rm(B)
  data.frame(A = A, B = B, AB = A / B, ID = ID)
}
The signature of the calling function is:
res2=mdply.anyfun.parallel.failsafe(multiargtab,simple)
The function mdply.anyfun.parallel.failsafe takes a data.frame and a function name myfunction (as character) as parameters. myfunction is then called for every row of the data.frame, with all column values passed as parameters, just like in the original mdply. In addition to the original mdply functionality, the function does not stop when a task fails but continues with the other tasks. The error message of a failed task is returned in the column "error".
library(doSNOW)
library(plyr)

mdply.anyfun.parallel.failsafe <- function(multiargtab, myfunction) {
  cl <- makeCluster(4)
  registerDoSNOW(cl)

  callfailsafe <- function(...) {
    r <- tryCatch.W.E(FUN(...))
    val <- r$value[[1]]
    if (!"simpleError" %in% class(val)) {
      return(val)
    } else {
      return(data.frame(..., error = as.character(val)))
    }
  }

  tryCatch.W.E <- function(expr) {
    # Run expr and return its result; if an error occurs, return the error
    # object instead, so the calling function never fails.
    W <- NULL
    w.handler <- function(w) {  # warning handler
      W <<- w
      invokeRestart("muffleWarning")
    }
    list(value = list(withCallingHandlers(tryCatch(expr, error = function(e) e),
                                          warning = w.handler)),
         warning = W)
  }

  FUN <- match.fun(myfunction)
  res <- mdply(multiargtab, callfailsafe, .parallel = TRUE)
  stopCluster(cl)
  res
}
Testing the function:
res2=mdply.anyfun.parallel.failsafe(multiargtab,simple)
This generally works fine. I only get some strange errors when multiargtab is of type data.table:
Error in data.table(..., key = key(..1)) :
Item 1 has no length. Provide at least one item
I circumvented the error by casting with as.data.frame, although it would be interesting to know why data.table does not work.
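A hypothetical illustration of that workaround, reusing multiargtab and simple from above:

library(data.table)
multiargtab_dt <- as.data.table(multiargtab)
# Coerce back to a plain data.frame before handing it to mdply:
res3 <- mdply.anyfun.parallel.failsafe(as.data.frame(multiargtab_dt), simple)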
