foreach (R): suppress messages from packages loaded from the global environment

I am passing the packages attached in the global environment to my foreach call using .packages = (.packages()). However, I could not find a way to suppress the package startup messages. Because the packages are loaded on every assigned core, the list of messages gets rather long.
I already tried wrapping the usual calls such as suppressMessages() around the function call and around the .packages argument, without success.
foreach(i = x, .packages = (.packages()))
I am using the foreach call within a generic function so it needs to adapt to whatever packages are loaded a priori by the user.
I could load all the packages attached in the global environment with an apply call inside the foreach body, but I assume foreach needs them listed in its .packages argument?
If there is a better way in general how to do this, let me know.

I have a lame semi-answer: when you create the cluster you can specify outfile = '/dev/null' to silence all output from worker nodes. The problem is, this prevents you from printing anything else from your nodes...
As a workaround, I am silencing nodes as described, but using a progress bar to give the user at least some information, though undetailed.

This is also a lame answer and more of a workaround. If your function is in a separate R script, then instead of using .packages() you can do:
old <- options(warn = -1)  # silence warnings, remembering the previous setting
suppressPackageStartupMessages(library(dplyr))
options(old)  # restore the previous warning level
inside your function file where you load your libraries. This shuts off warnings while the packages load and restores the previous setting afterwards. It would be great if there was an option for this.
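The same suppressPackageStartupMessages() idea can also be applied on the workers themselves, before the loop runs. The following is only a sketch, using the base parallel package; the two-worker cluster and the variable names are assumptions for illustration. If the cluster is afterwards registered as the foreach backend, the packages should already be attached on each worker, so the .packages argument may not be needed at all (I have not checked this against every backend).

```r
library(parallel)

# Packages attached in the calling session (whatever the user loaded).
pkgs <- .packages()

# Hypothetical two-worker cluster; adjust the worker count as needed.
cl <- makeCluster(2)

# Attach the same packages on every worker, discarding startup messages.
# Each worker returns a named logical vector: TRUE where loading succeeded.
loaded <- clusterCall(cl, function(pkgs) {
  suppressPackageStartupMessages(
    vapply(pkgs, require, logical(1),
           character.only = TRUE, quietly = TRUE)
  )
}, pkgs)

stopCluster(cl)
```

The returned list can be inspected to see whether any package failed to load on a worker.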

Related

using rstudioapi in devtools tests

I'm making a package which contains a function that calls rstudioapi::jobRunScript(), and I would like to be able to write tests for this function that can be run normally by devtools::test(). The package is only intended for use during interactive RStudio sessions.
Here's a minimal reprex:
After calling usethis::create_package() to initialize my package, and then usethis::use_r("rstudio") to create R/rstudio.R, I put:
foo_rstudio <- function(...) {
  script.file <- tempfile()
  write("print('hello')", file = script.file)
  rstudioapi::jobRunScript(
    path = script.file,
    name = "foo",
    importEnv = FALSE,
    exportEnv = "R_GlobalEnv"
  )
}
I then call use_test() to make an accompanying test file, in which I put:
test_that("foo works", {
  foo_rstudio()
})
I then run devtools::test() and get:
I think I understand the basic problem here: devtools runs a separate R session for the tests, and that session doesn't have access to RStudio. I see here that rstudioapi can work inside child R sessions, but seemingly only those "normally launched by RStudio."
I'd really like to use devtools to test my function as I develop it. I suppose I could modify my function to accept an argument passed from the test code which will simply run the job in the R session itself or in some other kind of child R process, instead of an RStudio job, but then I'm not actually testing the normal intended functionality, and if there's an issue which is specific to the rstudioapi::jobRunScript() call and which could occur during normal use, then my tests wouldn't be able to pick it up.
Is there a way to initialize an RStudio process from within a devtools::test() session, or some other solution here?
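For reference, the fallback described above (letting the test pass in something other than an RStudio job) could look roughly like the sketch below. The runner argument and fake_runner are hypothetical names introduced here, not part of rstudioapi. As the question already says, this only exercises the code around the job call, not jobRunScript() itself.

```r
# `runner` is a hypothetical argument: by default the real RStudio job
# launcher is used, but a test can substitute any function that accepts
# the same arguments. The default is only evaluated when it is needed.
foo_rstudio <- function(..., runner = rstudioapi::jobRunScript) {
  script.file <- tempfile(fileext = ".R")
  writeLines("print('hello')", script.file)
  runner(
    path = script.file,
    name = "foo",
    importEnv = FALSE,
    exportEnv = "R_GlobalEnv"
  )
}

# Stand-in used outside RStudio: run the script in a throwaway
# environment of the current session instead of launching a job.
fake_runner <- function(path, ...) {
  source(path, local = new.env())
  invisible(path)
}

out <- capture.output(res <- foo_rstudio(runner = fake_runner))
```

Inside a test file, the last line would sit in a test_that() block with an expectation on out.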

RStudio: Statement to clear memory [duplicate]

I was hoping to make a global function that would clear my workspace and dump my memory. I called my function "cleaner" and want it to execute the following code:
remove(list = ls())
gc()
I tried making the function in the global environment but when I run it, the console just prints the text of the function. In my function file to be sourced:
cleaner <- function(){
  remove(list = ls())
  gc()
  #I tried adding return(invisible()) to see if that helped but no luck
}
cleaner()
Even when I make the function in the script I want it to run (cutting out any potential errors with sourcing), the storage dump seems to work, but it still doesn't clear the workspace.
Two thoughts about this: Your code does not delete all objects; to also remove the hidden ones, use
rm(list = ls(all.names = TRUE))
There is also the command gctorture(), which provokes garbage collection on (nearly) every memory allocation (as the man page says). It's intended for R developers to ferret out memory protection bugs:
cleaner <- function(){
  # Turn it on
  gctorture(TRUE)
  # Clear workspace
  rm(list = ls(all.names = TRUE, envir = sys.frame(-1)),
     envir = sys.frame(-1))
  # Turn it off (important or it gets very slow)
  gctorture(FALSE)
}
If this procedure is used within a function, there is the following problem: since the function has its own stack frame, only the objects within this stack frame are deleted. They still exist outside. Therefore, it must be specified separately with sys.frame(-1) that the higher-level stack frame should be considered instead. The variables are then deleted only in the frame of whatever calls cleaner(); cleaner's own local variables disappear when the function exits.
But this also means that the function may only be called from the top level in order to clear the workspace correctly (you can use sys.frames(), which lists all higher-level stack frames, to build something that also avoids this problem if really necessary).
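An alternative to hard-coding sys.frame(-1) is to take the target environment as an argument. This is only a sketch of that idea, not the answer's own method: the envir argument is an assumption added here, and it defaults to parent.frame(), which is the global environment when the function is called from the top level. The demonstration uses a scratch environment so that running it does not wipe a real workspace.

```r
# `envir` is a hypothetical argument naming the environment to clear.
cleaner <- function(envir = parent.frame()) {
  # Remove everything in the target environment, hidden objects included.
  rm(list = ls(envir = envir, all.names = TRUE), envir = envir)
  # Run the garbage collector without printing its summary table.
  invisible(gc())
}

# Demonstration on a scratch environment.
scratch <- new.env()
assign("a", 1, envir = scratch)
assign(".hidden", 2, envir = scratch)
cleaner(scratch)
remaining <- ls(scratch, all.names = TRUE)
```

After the call, remaining is an empty character vector, while objects in the calling environment are untouched.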

R package build, reason for "object 'xxx' not found"

I'm attempting to build an R package from code that works outside a package. This is my first try and the code is rather complex: nested functions that end up doing parallel processing using doMPI and foreach. I'm also using RStudio 1.01.43 on Ubuntu 16.04. The package builds OK. Then, when I try to run the top-level function, which calls the next one, it throws an error:
Error in { : task 6 failed - "object 'RunOys' not found"
I'm setting the boolean variable RunOys=TRUE manually before calling the top-level function. When execution gets down to the function where this variable is used in an ifelse statement, it fails. Before I call the top-level function I check the globalenv() and
> RunOys
[1] TRUE
In the foreach parallel code I have this statement, which works fine until compiled into an R package:
FinalCalcs <- function (...) {
  results <- data.frame ( foreach::`%dopar%`(
    foreach::`%:%`(foreach::foreach(j = 1:NumSim, .combine = acomb,
                                    .options.mpi=opts1),
                   foreach::foreach (i = 1:PopSize, .combine=rbind,
                                     .options.mpi=opts2,
                                     .export = c(ls(globalenv())),
                                     .packages = c("zoo", "msm", "FAdist", "qmra"))),
    {
which should export all of the objects in globalenv() to each worker.
I can't understand why some variables seem to get passed and not others. Do I need to specify it explicitly as a @param in the file for the function where it is used?
With foreach, it is best to have all the needed variables present in the same environment where foreach is called. So basically, I always use foreach inside a function and pass all the variables that are needed in the foreach to this function.
Act as if foreach couldn't see past its calling function. You won't need to export anything. For functions, use package::function (as in packages, so that you don't need to @import packages).
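A minimal sketch of this advice, assuming the foreach and doParallel packages are installed. The question uses doMPI; doParallel is swapped in here only to keep the example short. run_sims and its body are hypothetical; only the names NumSim and RunOys come from the question. Everything the loop body needs arrives as a function argument, so no .export is specified.

```r
library(foreach)
library(doParallel)

# Hypothetical wrapper: both values the loop needs are passed in,
# so they live in the environment where foreach() is called.
run_sims <- function(NumSim, RunOys) {
  foreach(j = seq_len(NumSim), .combine = c) %dopar% {
    if (RunOys) j * 2 else j
  }
}

cl <- parallel::makeCluster(2)
registerDoParallel(cl)
res <- run_sims(NumSim = 3, RunOys = TRUE)
parallel::stopCluster(cl)
```

Because RunOys is an argument of run_sims rather than a global, the call does not depend on what happens to exist in the user's workspace.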

How to export many variables and functions from global environment to foreach loop?

How can I export the global environment for the beginning of each parallel simulation in foreach? The following code is part of a function that is called to run the simulations.
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)
sim.result.list <- foreach(r = 1:simulations,
                           .combine = list,
                           .multicombine = TRUE) %dopar% {
  #...tons of calculations using many variables...
  list(vals1,
       vals2,
       vals3)
}
stopCluster(cluztrr)
Is it necessary to use .export with a character vector of every variable and function that I use? Would that be slow in execution time?
If the foreach loop is in the global environment, variables should be exported automatically. If not, you can use .export = ls(globalenv()) (or .GlobalEnv).
For functions from other packages, you just need to use the syntax package::function.
The "If [...] in the global environment, ..." part of F. Privé's reply is very important here. The foreach framework will only identify global variables in that case. It will not do so if the foreach() call is made within a function.
However, if you use the doFuture backend (disclaimer: I'm the author):
library("doFuture")
registerDoFuture()
plan(cluster, workers = cl)
global variables that are needed will be automatically identified and exported (which is then done by the future framework and not the foreach framework). Now, if you rely on this, and don't explicitly specify .export, then your code will only work with doFuture and none of the other backends. That's a decision you need to make as a developer.
Also, automatic exporting of globals is neat, but be careful that you know how much is exported; exporting too many too large objects can be quite costly and introduce lots of overhead in your parallel code.
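One rough way to see how much a blanket export such as ls(globalenv()) would ship to each worker is to list object sizes first. This is only a sketch in base R; export_sizes is a hypothetical helper, and object.size() gives an estimate of memory use rather than the exact serialized size.

```r
# Hypothetical helper: estimated size in bytes of every object in an
# environment, largest first.
export_sizes <- function(envir = globalenv()) {
  nms <- ls(envir)
  sizes <- vapply(
    nms,
    function(n) as.numeric(object.size(get(n, envir = envir))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Demonstration on a scratch environment instead of the real workspace.
demo <- new.env()
assign("small", 1:10, envir = demo)
assign("big", numeric(1e6), envir = demo)
sizes <- export_sizes(demo)
```

Calling export_sizes() with no arguments reports on the global environment, which is what .export = ls(globalenv()) would send.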

R functions that execute functions

I'm trying to break out common lines of code used in a fairly large R script into encapsulated functions...however, they don't seem to be running the intended code when called. I feel like I'm missing some conceptual piece of how R works, or functional programming in general.
Examples:
Here's a piece of code I'd like to call to clear the workspace -
clearWorkSpace <- function() {
  rm(list= ls(all=TRUE))
}
As noted, the code inside of the function executes as expected, however if the parent function is called, the environment is not cleared.
Again, here's a function intended to load all dependency files -
loadDependencies <- function() {
  dep_files <- list.files(path="./dependencies")
  for (file in dep_files) {
    file_path <- paste0("./dependencies/",file)
    source(file_path,local=TRUE)
  }
}
If possible, it'd be great to be able to encapsulate code into easy to read functions. Thanks for your help in advance.
What you are calling workspace is more properly referred to as the global environment.
Functions execute in their own environments. This is, for example, why you don't see the variables defined inside a function in your global environment. It is also how a function knows to use a variable named x defined in the function body rather than some x you might happen to have in your global environment.
Most functions don't modify the external environments, which is good! It's the functional programming paradigm. Functions that do modify environments, such as rm and source, usually take arguments so that you can be explicit about which environment is modified. If you look at ?rm you'll see an envir argument, and that argument is most of what its Details section describes. source has a local argument:
local - TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
You explicitly set local = TRUE when you call source, which explicitly tells source to only modify the local (inside the function) environment, so of course your global environment is untouched!
To make your functions work as I assume you want them to, you could modify clearWorkSpace like this:
clearWorkSpace <- function() {
  rm(list= ls(all=TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
}
And for loadDependencies, simply delete the local = TRUE (or, more explicitly, set local = FALSE or local = .GlobalEnv). Though you could rewrite it in a more R-like way:
loadDependencies = function() {
  invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source))
}
For both of these (especially with the simplified dependency running above) I'd question whether you really need these wrapped up in functions. Might be better to just get in the habit of restarting R when you resume work on a project and keeping invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source)) at the top of your script...
For more reading on environments, there is the Environments section of Advanced R. Notably, there are several ways to specify environments that might be useful for different use cases rather than hard-coding the global environment.
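As one illustration of not hard-coding the global environment, the dependency loader can take the target environment as an argument. This is a sketch only: load_dependencies, its path and envir arguments, and the helper.R file are hypothetical names made up for the demonstration, which writes to a temporary directory.

```r
# Hypothetical variant of loadDependencies(): the caller chooses where
# the sourced definitions end up (the global environment by default).
load_dependencies <- function(path, envir = globalenv()) {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) {
    # `local` accepts an environment in which the file is evaluated.
    source(f, local = envir)
  }
  invisible(files)
}

# Demonstration with a temporary dependency directory.
dep_dir <- tempfile("dependencies")
dir.create(dep_dir)
writeLines("helper <- function(x) x + 1", file.path(dep_dir, "helper.R"))

target <- new.env()
load_dependencies(dep_dir, envir = target)
result <- target$helper(1)
```

Here helper ends up in target and nowhere else, so the effect of the function is visible from its arguments.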
In theory you just need to do something like:
rm(list = ls(all = TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
That is, you set the environment explicitly (though here it may be better to use the pos argument). But this will also delete the clearWorkSpace function itself, since it is defined in the global environment, so any later call to it will fail.
Personally, I never use rm within a function or a local call. My understanding is that rm is intended to be called from the console to clear the workspace.
