parallel::clusterExport - non exported library/package function - r

similar to this question
But how do I clusterExport a package's non-exported function to a cluster? For some reason this passed my tests before submitting to CRAN but isn't working in production. Obviously, I want to fix and resubmit to CRAN.
library(imputeMulti)
library(parallel)
imputeMulti:::count_compare # function to be exported
nnodes <- 2L
cl <- parallel::makeCluster(nnodes)
parallel::clusterExport(cl, varlist= c("count_compare")) # fails -- but initially passed tests
parallel::clusterExport(cl, varlist= c("count_compare"), envir= as.environment("package:imputeMulti")) # also fails
I'm using cluster export to avoid the CRAN/R CMD check note "use of ::: in package". Obviously, I could export count_compare, but that's not a desirable choice.
Any help appreciated!
adding tests information:
devtools::test("imputeMulti", "count_levels")
Loading imputeMulti
Testing imputeMulti
int- count_levels works: ...............................
DONE ===========================================================================================================================================

You can use an equivalent call to clusterCall to do this.
parallel::clusterCall(cl, assign, "count_compare", count_compare, envir = .GlobalEnv)
See the definition of clusterExport to verify this is doing the same thing.

Based on my tests and work, I do not see away to use parallel::clusterExport on a non-exported library function.
The following works, but results in 1 R CMD check note:
R CMD check results
0 errors | 0 warnings | 1 note
checking dependencies in R code ... NOTE
There are ::: calls to the package's namespace in its code. A package
almost never needs to use ::: for its own objects:
'count_compare'
count_compare <- imputeMulti:::count_compare
parallel::clusterExport(cl, varlist= c("count_compare"), envir= 1)
Perhaps one of the developers of library(testthat) can provide a solution / update regarding the problem with the testthat framework not catching this. Based off of Hadley's R-Journal article (pg7 using journal numbering), I'm guessing this has to do with how environments are used for testing. But, that's just a guess.
Note: Hadley has confirmed this is the reason for the testthat miss via email correspondence.

Related

Difference between environment etc. in testthat::test_check vs testthat::test_dir

This is somewhat deep R testing question, and as such, I'm not sure if general Stack Overflow is the right place for it, or if there's an R specific forum that would be better.
Any pointers on that are welcome.
The scenario is: I have package that is using testthat and has some tests in tests/testthat and (for reasons that are important but, to be honest, I don't totally understand) there are some other tests in inst/validation that need to be run as well, as part of a validation script (i.e. the script that this post is about).
I was running test_check(pkg) in my tests folder and it was working fine, but I wasn't getting the extra tests (which makes sense). So then I switched to the following:
test_dirs <- c("tests/testthat", "inst/validation")
for (.t in test_dirs) {
test_dir(.t)
}
Now a bunch of my tests are failing because they can't find some of the constants, etc. that are part of my package! (see note at the bottom for more details...)
So I dig in to the source code and find that test_check() actually calls testthat:::test_package_dir under the hood. Note the ::: this is an unexported function, so I don't really just want to call it in my own code.
testthat:::test_package_dir in turn calls the following, before calling test_dir() itself:
env <- test_pkg_env(package)
withr::local_options(list(topLevelEnvironment = env))
withr::local_envvar(list(
TESTTHAT_PKG = package,
TESTTHAT_DIR = maybe_root_dir(test_path)
))
test_dir(...
Sooooo... it seems like test_check() essentially just does some things to load the package environment (note test_pkg_env is also unexported) and then calls test_dir().
So I guess my question is: why? I've actually noticed this before with test_file() not working because it doesn't have everything in the package environment. Why do these functions not load the package environment like the other testing functions do?
Or really, my question is: is there a way to make them load it? And specifically in my case, is there a way to do what I'm trying to do (run tests in a few different directories) and have it load the package environment?
I notice this in the test_dirs docs:
env -- Environment in which to execute the tests. Expert use only.
which is set to test_env() by default. I have a feeling this is my answer, but I can't figure out how to get the package environment without basically copy/pasting a bunch of code out of functions that are hidden in :::. Perhaps I don't qualify as an "expert"...
Thanks for any insight and/or solutions!
note at the bottom:
Specifically my issue is that I have some "constants" in my aaa.R that are mostly just hard-coded strings or lists like:
SUMMARY_NAME <- "summary"
SUMMARY_COUNT <- "sum_count"
SUMMARY_PATH <- "sum_path"
SUM_REQ_COLS <- list(
list(name = SUMMARY_NAME, type = "character"),
list(name = SUMMARY_COUNT, type = "numeric"),
list(name = SUMMARY_PATH, type = "character"),
)
These are things that I use for checking S3 classes and other purposes so that I don't have hard-coded strings all over my code. The point is: I use some of these in my tests, which works fine for test_check() and devtools::check() and devtools::test() but dies when I try to use test_dir() or test_file() because they can't be found, presumably because the package environment isn't loaded.

R testthat and devtools: why does a minimal unit test break my package?

I'm working on an R package for sparse matrix handling. It kinda works; here's a minimal example to set the stage for my question.
devtools::install_github("ekernf01/MatrixLazyEval", ref = "eef5593ad")
library(Matrix)
library(MatrixLazyEval)
data(CAex)
M = rbind(CAex, CAex)
M = matrix(stats::rnorm(prod(dim(M))), nrow = nrow(M))
M_lazy = AsLazyMatrix( M )
svd_lazy = RandomSVDLazyMatrix(M_lazy)
But, when I run even a minimal unit test, it breaks the package permanently (I have to restart my R session or reinstall the package). The immediate cause is that R can't find some S4 methods from packages I depend on (e.g. for matrix transpose t or colSums from the Matrix package). I run the unit test like this:
devtools::test(filter = "minimal")
svd_lazy = RandomSVDLazyMatrix(M_lazy)
Here's the contents of the test files.
> cat tests/testthat.R
library(testthat)
testthat::test_check("MatrixLazyEval")
> cat tests/testthat/testthat_minimal.R
context("minimal")
Why does this happen? Maybe this is naive, but the unit test shouldn't even do anything.
Edit
Possibly related:
r - data.table and testthat package
https://github.com/r-lib/devtools/issues/192
R data.table breaks in exported functions
You need to import all the generics you're using in your package namespace:
#' #importFrom Matrix t tcrossprod colSums rowMeans
NULL
This will fix the issue that you're observing and you'll be able the tests multiple times in the same session.
Also this will allow other packages that import Matrix::t to consistently use your custom methods. Currently, since you're calling setMethod() in your package, you're creating a new t() generic local to your namespace whenever Matrix is not attached to the search path at load-time (this is why it worked the first time you ran the tests). This prevents other packages using Matrix::t() to access your methods. Importing Matrix::t() explicitly will fix this because you'll never create a local generic for t().

running all examples in r package

I am developing a package in Rstudio. Many of my examples need updating so I am going through each one. The only way to check the examples is by running devtools::check() but of course this runs all the checks and it takes a while.
Is there a way of just running the examples so I don't have to wait?
Try the following code to run all examples
devtools::run_examples()
You can also do this without devtools, admittedly it's a bit more circuitous.
package = "rgl"
# this gives a key-value mapping of the various `\alias{}`es
# in each Rd file to that file's canonical name
aliases <- readRDS(system.file("help", "aliases.rds", package=package))
# or sapply(unique(aliases), example, package=package, character.only=TRUE),
# but I think the for loop is superior in this case.
for (topic in unique(aliases)) example(topic, package=package, character.only = TRUE)

R: Patching a package function and reloading base libraries

Occasionally one wants to patch a function in a package, without recompiling the whole package.
For example, in Emacs ESS, the function install.packages() might get stuck if tcltk is not loaded. One might want to patch install.packages() in order to require tcltk before installation and unload it after the package setup.
A temp() patched version of install.packages() might be:
## Get original args without ending NULL
temp=rev(rev(deparse(args(install.packages)))[-1])
temp=paste(paste(temp, collapse="\n"),
## Add code to load tcltk
"{",
" wasloaded= 'package:tcltk' %in% search()",
" require(tcltk)",
## Add orginal body without braces
paste(rev(rev(deparse(body(install.packages))[-1])[-1]), collapse="\n"),
## Unload tcltk if it was not loaded before by user
" if(!wasloaded) detach('package:tcltk', unload=TRUE)",
"}\n",
sep="\n")
## Eval patched function
temp=eval(parse(text=temp))
# temp
Now we want to replace the original install.packages() and perhaps insert the code in Rprofile.
To this end it is worth nothing that:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... install.packages() source follows (quite lengthy)
That is, the function is stored inside the package/namespace of utils. This environment is sealed and therefore install.packages() should be unlocked before being replaced:
## Override original function
unlockBinding("install.packages", as.environment("package:utils"))
assign("install.packages", temp, envir=as.environment("package:utils"))
unlockBinding("install.packages", asNamespace("utils"))
assign("install.packages", temp, envir=asNamespace("utils"))
rm(temp)
Using getAnywhere() again, we get:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... the *new* install.packages() source follows
It seems that the patched function is placed in the right place.
Unfortunately, running it gives:
Error in install.packages(xxxxx) :
could not find function "getDependencies"
getDependencies() is a function inside the same utils package, but not exported; therefore it is not accessible outside its namespace.
Despite the output of getAnywhere("install.packages"), the patched install.packages() is still misplaced.
The problem is that we need to reload the utils library to obtain the desired effect, which also requires unloading other libraries importing it.
detach("package:stats", unload=TRUE)
detach("package:graphics", unload=TRUE)
detach("package:grDevices", unload=TRUE)
detach("package:utils", unload=TRUE)
library(utils)
install.packages() works now.
Of course, we need to reload the other libraries too. Given the dependencies, using
library(stats)
should reload everything. But there is a problem when reloading the graphics library, at least on Windows:
library(graphics)
# Error in FUN(X[[i]], ...) :
# no such symbol C_contour in package path/to/library/graphics/libs/x64/graphics.dll
Which is the correct way of (re)loading the graphics library?
Patching functions in packages is a low-level operation that should be avoided, because it may break internal assumptions of the execution environment and lead to unpredictable behavior/crashes. If there is a problem with tck/ESS (I didn't try to repeat that) perhaps it should be fixed or there may be a workaround. Particularly changing locked bindings is something to avoid.
If you really wanted to run some code at the start/end of say install.packages, you can use trace. It will do some of the low-level operations mentioned in the question, but the good part is you don't have to worry about fixing this whenever some new internals of R change.
trace(install.packages,
tracer=quote(cat("Starting install.packages\n")),
exit=quote(cat("Ending install packages.\n"))
)
Replace tracer and exit accordingly - maybe exit is not needed anyway, maybe you don't need to unload the package. Still, trace is a very useful tool for debugging.
I am not sure if that will solve your problem - if it would work with ESS - but in general you can also wrap install.packages in a function you define say in your workspace:
install.packages <- function(...) {
cat("Entry.\n")
on.exit(cat("Exit.\n"))
utils::install.packages(...)
}
This is the cleanest option indeed.

Why does using "<<-" in a function in the global workspace work, but not in a package?

I'm creating a package using devtools and roxygen2 (in RStudio), however after I've built the package my function no longer works as intended. Yet, if I load the function's .R file and run the function from there in RStudio, it works perfectly. I've created another package using this method before and it worked fine (13 functions all working as intended from my other package), yet I cant seem to get this new one to work.
To start creating the package I start with:
library("devtools")
devtools::install_github("klutometis/roxygen")
library(roxygen2)
setwd("my parent directory")
create("triale")
All is working fine so far. So I put my .R file containing my function in the R folder under the triale folder. The .R file looks like this:
#' Trial Z Function
#'
#' This function counts the values in the columns
#' #param x is the number
#' #keywords x
#' #export
#' #examples
#' trialz()
trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
Again if I were to just run the code from the .R file, and test the function it works fine. But to continue, next I enter:
setwd("./triale")
document()
The triale documentation is updated, triale is loaded, and the NAMESPACE and trialz.Rd are both written so that trialz.Rd is under the man folder, and NAMESPACE is under the triale folder as intended. Next I install triale:
setwd("..")
install("triale")
Which I know works because I get the following:
Installing triale
"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" --vanilla CMD INSTALL \
"C:/Users/grice/Documents/R/triale" \
--library="C:/Users/grice/Documents/R/win-library/3.1" --install-tests
* installing *source* package 'triale' ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (triale)
Reloading installed triale
Package is now built, so I do the following:
library("triale")
library("data.table")
Note whenever I load the package data.table I get the following error message:
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
However it doesnt seem to affect my function. So now its time to test my function from my package:
trialz(25)
This goes through, and I of course get a populated df1, and countit, but for whatever reason view is always empty (as in 0 obs. of 0 variables).
So I test my work using the dummy code below:
>trialy = function(x) {wid= c(25,x,25,25,25,1,1,1,1,1);
wc= c(rep("BYSTAR-1",10));
df2 <<- data.frame(wid, wc);
countitt <<- data.table(df2);
viewer <<- countitt[, .N, by = list(wid, wc)];
View(viewer)}
>trialy(25)
Even though this is the same exact code with just the names changed around it works. Dumbfounded I open trialz.R and copy the function from there and run it as below, and that works:
> trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
> trialz(25)
Since I've created a package before I know my method is solid (that package had 13 dif. functions, all of which worked). I just don't understand how a function can work fine as written, yet when I package it, the function no longer works.
Again here is where it stops working as intended when using my package:
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
And my end result should look something like this, if my package worked:
wid wc N
1 25 BYSTAR-1 5
2 1 BYSTAR-1 5
Can anyone explain why view is never populated after I package my function? I've tested it as much as I know how, and my results should be reproducible for anyone thats willing to try it for themselves.
Thanks, I appreciate any feedback.
Your problem here is that "<<-" does not create variables in the global environment but rather in the parent environment. (See help("<<-").)
The parent environment of a function is the environment in which it has been defined. In the case where you defined your function directly in your workspace, this parent environment actually is the same as your workspace environment (namely: .GlobalEnv), which is why your variables are assigned values as you expect them to. In the case where your function is packaged, however, the parent environment is the package environment and not the .GlobalEnv! This is why you do not see your variables being assigned values in your workspace.
Refer to the chapter on environments in Hadley's book and How R Searches and Finds Stuff for more details on environments in R.
Note that doing this would not be considered a proper debugging technique, to say the least. In general, you never want to use the "<<-" operator.
For options on debugging R code, see, e.g., this question. I, in particular, like the debugonce function very well. See ?debugonce.
I forgot one important part when editing my description file in that I for got to add
Imports: data.table
Also the NAMESPACE file needed to include the data.table package as an import as well, like so:
import(data.table)
export(Z)
export(AS) .... etc.
Doing this ensures that whenever a function within your package uses a function from another package, that (second) package is called up before your code is executed.

Resources