R package build, reason for "object 'xxx' not found"

I'm attempting to build an R package from code that works outside a package. This is my first try, and the code is rather complex: nested functions that end up doing parallel processing using doMPI and foreach. I'm using RStudio 1.0.143 on Ubuntu 16.04. The package builds fine, but when I try to run the top-level function, which calls the next one down, it throws an error:
Error in { : task 6 failed - "object 'RunOys' not found"
I'm setting the logical variable RunOys <- TRUE manually before calling the top-level function. When execution gets down to the function where this variable is used in an ifelse() statement, it fails. Before I call the top-level function I check the global environment and get:
> RunOys
[1] TRUE
In the foreach parallel code I have this statement, which works fine until it is compiled into an R package:
FinalCalcs <- function (...) {
  results <- data.frame(foreach::`%dopar%`(
    foreach::`%:%`(
      foreach::foreach(j = 1:NumSim, .combine = acomb,
                       .options.mpi = opts1),
      foreach::foreach(i = 1:PopSize, .combine = rbind,
                       .options.mpi = opts2,
                       .export = c(ls(globalenv())),
                       .packages = c("zoo", "msm", "FAdist", "qmra"))),
    {
which should export all of the objects in globalenv() to each slave.
I can't understand why some variables seem to get passed and not others. Do I need to specify the variable explicitly with an @param tag in the roxygen comments for the function where it is used?

With foreach, it is best to have all the needed variables present in the environment where foreach is called. So basically, I always use foreach inside a function and pass all the variables the foreach loop needs as arguments to that function.
Act as if foreach couldn't see past its calling function; then you won't need to export anything. For functions from other packages, use package::function notation (as you would inside a package) so that you don't need to @import packages.
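A minimal sketch of that pattern (the names run_sims, NumSim, and PopSize are illustrative, not from the package above):

```r
library(foreach)
library(doParallel)

# Every variable the loop body needs is a function argument, so it lives
# in the environment where foreach() is called and nothing needs .export.
run_sims <- function(NumSim, PopSize) {
  foreach(j = 1:NumSim, .combine = rbind) %dopar% {
    data.frame(sim = j, mean = mean(stats::rnorm(PopSize)))
  }
}

cl <- parallel::makeCluster(2)
registerDoParallel(cl)
res <- run_sims(NumSim = 4, PopSize = 10)
parallel::stopCluster(cl)
```

Because run_sims receives NumSim and PopSize directly, the workers resolve them through the loop body's enclosing environment with no global-environment lookup involved.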

Related

using rstudioapi in devtools tests

I'm making a package which contains a function that calls rstudioapi::jobRunScript(), and I would like to be able to write tests for this function that can be run normally by devtools::test(). The package is only intended for use during interactive RStudio sessions.
Here's a minimal reprex:
After calling usethis::create_package() to initialize my package, and then usethis::use_r("rstudio") to create R/rstudio.R, I put:
foo_rstudio <- function(...) {
  script.file <- tempfile()
  write("print('hello')", file = script.file)
  rstudioapi::jobRunScript(
    path = script.file,
    name = "foo",
    importEnv = FALSE,
    exportEnv = "R_GlobalEnv"
  )
}
I then call use_test() to make an accompanying test file, in which I put:
test_that("foo works", {
  foo_rstudio()
})
I then run devtools::test() and get an error.
I think I understand the basic problem here: devtools runs a separate R session for the tests, and that session doesn't have access to RStudio. I see here that rstudioapi can work inside child R sessions, but seemingly only those "normally launched by RStudio."
I'd really like to use devtools to test my function as I develop it. I suppose I could modify my function to accept an argument from the test code that simply runs the job in the R session itself, or in some other kind of child R process, instead of as an RStudio job. But then I'm not actually testing the normal intended functionality, and if there's an issue specific to the rstudioapi::jobRunScript() call that could occur during normal use, my tests wouldn't pick it up.
Is there a way to initialize an RStudio process from within a devtools::test() session, or some other solution here?
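One partial workaround (a sketch only, and it skips rather than exercises the RStudio-specific path; it assumes the foo_rstudio() from the question above is defined) is to guard the test so it only runs when the RStudio API is actually reachable:

```r
# tests/testthat/test-rstudio.R
test_that("foo works", {
  # devtools::test() runs in a child session with no RStudio front end,
  # so rstudioapi::isAvailable() returns FALSE there and the test skips
  # instead of failing.
  testthat::skip_if_not(rstudioapi::isAvailable(), "RStudio not running")
  foo_rstudio()
})
```

This keeps devtools::test() green outside RStudio while still exercising the real jobRunScript() call when the tests are run from an interactive RStudio session.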

error loading package inside foreach when using testthat

I am trying to debug an issue with unit tests under testthat. The code runs fine when run manually, but when running test(), the workers inside the foreach don't seem to have access to the package I am testing or the functions inside it. The code is quite complex, so I don't have a small working example, but here is the outline of the structure:
unit test in tests/testthat:
test_that("dataset runs successful", {
expect_snapshot_output(myFunc(dataset, params))
})
MyFunc calls another function, and inside that function it creates workers to run some code:
final_out <- foreach(i = 1:nrow(data),
                     .combine = c,
                     .export = c("func1", "func2", "params"),
                     .packages = c("fields", "dplyr")) %dopar% {
  output <- func1(stuff)
  more <- func2(stuff)
  out <- rbind(output, more)
  return(out)
}
The workers don't seem to have access to func1, func2, etc.
I tried adding the name of the package under test to the .packages argument in this line, but that doesn't work either.
Any ideas?
As I mentioned, this is only an issue when running the unit tests, and I suspect it is somehow related to how the package under test is being loaded.
When the workers are started they do not have the full set of packages a normal session has; pass the names of all packages on the search path of the local session in which the tests are running to the .packages argument.
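A sketch of that suggestion (the loop body is a stand-in; it assumes a parallel backend is registered, otherwise foreach warns and runs sequentially):

```r
library(foreach)

# (.packages()) returns the names of every package attached in the
# current session, e.g. c("foreach", "testthat", "stats", ...), so the
# workers attach the same set the test-running session has.
attached <- (.packages())

final_out <- foreach(i = 1:4, .combine = c,
                     .packages = attached) %dopar% {
  i^2
}
```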

foreach (R): suppress Messages from packages loaded from global environment

I am loading several packages from the global environment in my foreach call using .packages = (.packages()). However, I cannot find a way to suppress the package startup messages. Since the packages are loaded on every assigned core, this list gets rather long.
I already tried wrapping the standard calls like suppressMessages() around the function call and around the .packages argument, without success.
foreach(i = x, .packages = (.packages()))
I am using the foreach call within a generic function so it needs to adapt to whatever packages are loaded a priori by the user.
I could just use an apply call inside the foreach call with all the packages loaded in the global environment, but I assume foreach needs them to be listed in its .packages argument?
If there is a better way in general how to do this, let me know.
I have a lame semi-answer: when you create the cluster you can specify outfile = '/dev/null' to silence all output from the worker nodes. The problem is that this also prevents you from printing anything else from your nodes...
As a workaround, I am silencing nodes as described, but using a progress bar to give the user at least some information, though undetailed.
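A sketch of that workaround (on Windows, "nul" may be needed in place of "/dev/null"; the loop body is illustrative):

```r
library(doParallel)

# outfile is set at cluster creation time and redirects all worker
# output, including package startup messages; outfile = "" would send
# it to the console instead.
cl <- parallel::makeCluster(2, outfile = "/dev/null")
registerDoParallel(cl)

res <- foreach(i = 1:2, .packages = (.packages())) %dopar% i

parallel::stopCluster(cl)
```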
This is also a lame answer and more of a workaround. If your function is in a separate R script, then instead of using .packages() you can do:
old <- options(warn = -1)
suppressPackageStartupMessages(library(dplyr))
options(old)
inside your function file when you load your libraries. This silences warnings while the packages attach and then restores the previous setting. It would be great if there were an option for this.

foreach R: Calling functions in my own package

I'm in the process of writing an R package. One of my functions takes another function and some other data-related arguments and runs a %dopar% loop using the foreach package. This foreach-using function is used inside the one of the main functions of my package.
When I call the main function from another file, after having loaded my package, I get the message
Error in { : task 1 failed - "could not find function "some_function""
where some_function is some function from my package. I get this message, with a varying missing function, when I set the .export argument in the call to foreach to any of the following:
ls(as.environment("package:mypackagename"))
ls(.GlobalEnv)
ls(environment())
ls(parent.env(environment()))
ls(parent.env(parent.env(environment())))
And even concatenations of the above. I also tried passing my package name to the .packages argument, which only yields the error message
Error in e$fun(obj, substitute(ex), parent.frame(), e$data) : worker initialization failed: there is no package called ‘mypackagename’
I feel like I have tried just about everything, and I really need this piece of code to work. I should note that it does work if I use %do% instead of %dopar%. What am I doing wrong?

could not find function inside foreach loop

I'm trying to use foreach to do multicore computing in R.
A <- function(...) {
  foreach(i = 1:10) %dopar% {
    B()
  }
}
Then I call function A in the console. The problem is that inside B I'm calling a function Posdef that is defined in another script file, which I source. I had to put Posdef in the .export argument of foreach: .export = c("Posdef"). However, I get the following error:
Error in { : task 3 failed - "could not find function "Posdef""
Why can't R find this defined function?
So I can reproduce this, for the curious:
require(doSNOW)
registerDoSNOW(makeCluster(5, type = "SOCK"))
getDoParWorkers()
getDoParName()
getDoParVersion()

fib <- function(n) {
  if (n <= 1) { return(1) }
  return(fib(n - 1) + fib(n - 2))
}

my.matrix <- matrix(runif(2500, 10, 50), nrow = 50)

calcLotsaFibs <- function() {
  result <- foreach(row.num = 1:nrow(my.matrix),
                    .export = c("fib", "my.matrix")) %dopar% {
    return(Vectorize(fib)(my.matrix[row.num, ]))
  }
  return(result)
}
lotsa.fibs <- calcLotsaFibs()
I have been able to get around this by putting the function in another file and sourcing that file in the body of the foreach loop. You could also, obviously, move the function definition into the body of the foreach itself.
[EDIT -- I had previously suggested that perhaps .export doesn't work properly with function names, but was corrected below.]
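A sketch of that workaround (square is an illustrative stand-in for Posdef; the commented source() line shows the separate-file variant, with "posdef.R" being a hypothetical file name):

```r
library(foreach)

result <- foreach(i = 1:5, .combine = c) %dopar% {
  # Define the helper on the worker itself instead of exporting it.
  # Sourcing a file here works the same way: source("posdef.R")
  square <- function(x) x * x
  square(i)
}
```

Because the function is created inside the loop body, its enclosing environment is the worker-side evaluation environment, so the scoping problem described below never arises.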
The short answer is that this was a bug in parallel backends such as doSNOW, doParallel and doMPI, but it has since been fixed.
The slightly longer answer is that foreach exports functions to the workers using a special "export" environment, not the global environment. That used to cause problems for functions that were created in the global environment, because the "export" environment wasn't in their scope, even though they were now defined in that same "export" environment. Thus, they couldn't see any other functions or variables defined in the "export" environment, such as "Posdef" in your case.
The doSNOW, doParallel and doMPI backends now change the associated environment from the global environment to the "export" environment for functions exported via ".export", which seems to have resolved these issues.
A quick fix for problems with foreach %dopar% is to reinstall these packages:
install.packages("doSNOW")
install.packages("doParallel")
install.packages("doMPI")
It worked in my case.