Passing values between functions in an R package - r

In an R package, let's say we have two functions. One is setting some parameters; the other one is using those parameters. How can I build such a pattern in R. It is similar to event-driven applications. But I am not sure if it is possible in R or not.
For example:
If we run set_param( a=10), whenever we run print_a.R, it prints 10, and incase of running set_param(a=20), it prints 20.
I need a solution without assigning value to the global environment because CRAN checks raise notes.

I suggest adding a variable to your package, as #MrFlick suggested.
For instance, in ./R/myoptions.R:
.myoptions <- new.env(parent = emptyenv())
getter <- function(k) {
.myoptions[[k]]
}
setter <- function(k, v) {
.myoptions[[k]] <- v
}
lister <- function() {
names(.myoptions)
}
Then other package functions can use this as a key/value store:
getter("optA")
# NULL
setter("optA", 99)
getter("optA")
# [1] 99
lister()
# [1] "optA"
and all the while, nothing is in the .GlobalEnv:
ls(all.names = TRUE)
# character(0)
Values can be as complex as you want.
Note that these are not exported, so if you want/need the user to have direct access to this, then you'll need to update NAMESPACE or, if using roxygen2, add #' #export before each function definition.
NB: I should add that a more canonical approach might be to use options(.) for these, so that users can preemptively control and have access to them., programmatically.

Related

Unit testing functions with global variables in R

Preamble: package structure
I have an R package that contains an R/globals.R file with the following content (simplified):
utils::globalVariables("COUNTS")
Then I have a function that simply uses this variable. For example, R/addx.R contains a function that adds a number to COUNTS
addx <- function(x) {
COUNTS + x
}
This is all fine when doing a devtools::check() on my package, there's no complaining about COUNTS being out of the scope of addx().
Problem: writing a unit test
However, say I also have a tests/testthtat/test-addx.R file with the following content:
test_that("addition works", expect_gte(fun(1), 1))
The content of the test doesn't really matter here, because when running devtools::test() I get an "object 'COUNTS' not found" error.
What am I missing? How can I correctly write this test (or setup my package).
What I've tried to solve the problem
Adding utils::globalVariables("COUNTS") to R/addx.R, either before, inside or after the function definition.
Adding utils::globalVariables("COUNTS") to tests/testthtat/test-addx.R in all places I could think of.
Manually initializing COUNTS (e.g., with COUNTS <- 0 or <<- 0) in all places of tests/testthtat/test-addx.R I could think of.
Reading some examples from other packages on GitHub that use a similar syntax (source).
I think you misunderstand what utils::globalVariables("COUNTS") does. It just declares that COUNTS is a global variable, so when the code analysis sees
addx <- function(x) {
COUNTS + x
}
it won't complain about the use of an undefined variable. However, it is up to you to actually create the variable, for example by an explicit
COUNTS <- 0
somewhere in your source. I think if you do that, you won't even need the utils::globalVariables("COUNTS") call, because the code analysis will see the global definition.
Where you would need it is when you're doing some nonstandard evaluation, so that it's not obvious where a variable comes from. Then you declare it as a global, and the code analysis won't worry about it. For example, you might get a warning about
subset(df, Col1 < 0)
because it appears to use a global variable named Col1, but of course that's fine, because the subset() function evaluates in a non-standard way, letting you include column names without writing df$Col.
#user2554330's answer is great for many things.
If I understand correctly, you have a COUNTS that needs to be updateable, so putting it in the package environment might be an issue.
One technique you can use is the use of local environments.
Two alternatives:
If it will always be referenced in one function, it might be easiest to change the function from
myfunc <- function(...) {
# do something
COUNTS <- COUNTS + 1
}
to
myfunc <- local({
COUNTS <- NA
function(...) {
# do something
COUNTS <<- COUNTS + 1
}
})
What this does is create a local environment "around" myfunc, so when it looks for COUNTS, it will be found immediately. Note that it reassigns using <<- instead of <-, since the latter would not update the different-environment-version of the variable.
You can actually access this COUNTS from another function in the package:
otherfunc <- function(...) {
COUNTScopy <- get("COUNTS", envir = environment(myfunc))
COUNTScopy <- COUNTScopy + 1
assign("COUNTS", COUNTScopy, envir = environment(myfunc))
}
(Feel free to name it COUNTS here as well, I used a different name to highlight that it doesn't matter.)
While the use of get and assign is a little inconvenient, it should only be required twice per function that needs to do this.
Note that the user can get to this if needed, but they'll need to use similar mechanisms. Perhaps that's a problem; in my packages where I need some form of persistence like this, I have used convenience getter/setter functions.
You can place an environment within your package, and then use it like a named list within your package functions:
E <- new.env(parent = emptyenv())
myfunc <- function(...) {
# do something
E$COUNTS <- E$COUNTS + 1
}
otherfunc <- function(...) {
E$COUNTS <- E$COUNTS + 1
}
We do not need the get/assign pair of functions, since E (a horrible name, chosen for its brevity) should be visible to all functions in your package. If you don't need the user to have access, then keep it unexported. If you want users to be able to access it, then exporting it via the normal package mechanisms should work.
Note that with both of these, if the user unloads and reloads the package, the COUNTS value will be lost/reset.
I'll list provide a third option, in case the user wants/needs direct access, or you don't want to do this type of value management within your package.
Make the user provide it at all times. For this, add an argument to every function that needs it, and have the user pass an environment. I recommend that because most arguments are passed by-value, but environments allow referential semantics (pass by-reference).
For instance, in your package:
myfunc <- function(..., countenv) {
stopifnot(is.environment(countenv))
# do something
countenv$COUNT <- countenv$COUNT + 1
}
otherfunc <- function(..., countenv) {
countenv$COUNT <- countenv$COUNT + 1
}
new_countenv <- function(init = 0) {
E <- new.env(parent = emptyenv())
E$COUNT <- init
E
}
where new_countenv is really just a convenience function.
The user would then use your package as:
mycount <- new_countenv()
myfunc(..., countenv = mycount)
otherfunc(..., countenv = mycount)

R Package Development - Global Variables vs Parameter [duplicate]

I'm developing a package in R. I have a bunch of functions, some of them need some global variables. How do I manage global variables in packages?
I've read something about environment, but I do not understand how it will work, of if this even is the way to go about the things.
You can use package local variables through an environment. These variables will be available to multiple functions in the package, but not (easily) accessible to the user and will not interfere with the users workspace. A quick and simple example is:
pkg.env <- new.env()
pkg.env$cur.val <- 0
pkg.env$times.changed <- 0
inc <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val + by
pkg.env$cur.val
}
dec <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val - by
pkg.env$cur.val
}
cur <- function(){
cat('the current value is', pkg.env$cur.val, 'and it has been changed',
pkg.env$times.changed, 'times\n')
}
inc()
inc()
inc(5)
dec()
dec(2)
inc()
cur()
You could set an option, eg
options("mypkg-myval"=3)
1+getOption("mypkg-myval")
[1] 4
In general global variables are evil. The underlying principle why they are evil is that you want to minimize the interconnections in your package. These interconnections often cause functions to have side-effects, i.e. it depends not only on the input arguments what the outcome is, but also on the value of some global variable. Especially when the number of functions grows, this can be hard to get right and hell to debug.
For global variables in R see this SO post.
Edit in response to your comment:
An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:
token_information = list(token1 = "087091287129387",
token2 = "UA2329723")
and require all functions that need this information to have it as an argument:
do_stuff = function(arg1, arg2, token)
do_stuff(arg1, arg2, token = token_information)
In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:
token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
I hope this makes things more clear.
The question is unclear:
Just one R process or several?
Just on one host, or across several machine?
Is there common file access among them or not?
In increasing order of complexity, I'd use a file, a SQLite backend via the RSQlite package or (my favourite :) the rredis package to set to / read from a Redis instance.
You could also create a list of tokens and add it to R/sysdata.rda with usethis::use_data(..., internal = TRUE). The data in this file is internal, but accessible by all functions. The only problem would arise if you only want some functions to access the tokens, which would be better served by:
the environment solution already proposed above; or
creating a hidden helper function that holds the tokens and returns them. Then just call this hidden function inside the functions that use the tokens, and (assuming it is a list) you can inject them to their environment with list2env(..., envir = environment()).
If you don't mind adding a dependency to your package, you can use an R6 object from the homonym package, as suggested in the comments to #greg-snow's answer.
R6 objects are actual environments with the possibility of adding public and private methods, are very lightweight and could be a good and more rigorous option to share package's global variables, without polluting the global environment.
Compared to #greg-snow's solution, it allows for a stricter control of your variables (you can add methods that check for types for example). The drawback can be the dependency and, of course, learning the R6 syntax.
library(R6)
MyPkgOptions = R6::R6Class(
"mypkg_options",
public = list(
get_option = function(x) private$.options[[x]]
),
active = list(
var1 = function(x){
if(missing(x)) private$.options[['var1']]
else stop("This is an environment parameter that cannot be changed")
}
,var2 = function(x){
if(missing(x)) private$.options[['var2']]
else stop("This is an environment parameter that cannot be changed")
}
),
private = list(
.options = list(
var1 = 1,
var2 = 2
)
)
)
# Create an instance
mypkg_options = MyPkgOptions$new()
# Fetch values from active fields
mypkg_options$var1
#> [1] 1
mypkg_options$var2
#> [1] 2
# Alternative way
mypkg_options$get_option("var1")
#> [1] 1
mypkg_options$get_option("var3")
#> NULL
# Variables are locked unless you add a method to change them
mypkg_options$var1 = 3
#> Error in (function (x) : This is an environment parameter that cannot be changed
Created on 2020-05-27 by the reprex package (v0.3.0)

Global variable in a package - which approach is more recommended?

I do understand that generally global variables are evil and I should avoid them, but if my package does need to have a global variable, which of these two approaches are better? And are there any other recommended approaches?
Using an environment visible to the package
pkgEnv <- new.env()
pkgEnv$sessionId <- "xyz123"
Using options
options("pkgEnv.sessionId" = "xyz123")
I know there are some other threads that ask about how to achieve global variables, but I haven't seen a discussion on which one is recommended
Some packages use hidden variables (variables that begin with a .), like .Random.seed and .Last.value do in base R. In your package you could do
e <- new.env()
assign(".sessionId", "xyz123", envir = e)
ls(e)
# character(0)
ls(e, all = TRUE)
# [1] ".sessionId"
But in your package you don't need to assign e. You can use a .onLoad() hook to assign the variable upon loading the package.
.onLoad <- function(libname, pkgname) {
assign(".sessionId", "xyz123", envir = parent.env(environment()))
}
See this question and its answers for some good explanation on package variables.
When most people say you should avoid 'global' variables, they mean you should not assign to the global environment (.GlobalEnv,GlobalEnv, or as.environment(1)) or that you should not pass information between internal functions by any method other than passing such data as the arguments of a function call.
Caching is another matter entirely. I often caching results that I don't want to re-calculate (memoization) or re-query. A pattern I use a lot in packages is the following:
myFunction <- local({
cache <- list() # or numeric(0) or whatever
function(x,y,z){
# calculate the index of the answer
# (most of the time this is a trivial calculation, often the identity function)
indx = answerIndex(x,y,z)
# check if the answer is stored in the cache
if(indx %in% names(cacheList))
# if so, return the answer
return(cacheList[indx])
[otherwise, do lots of calculations or data queries]
# store the answer
cahceList[indx] <<- answer
return(answer)
}
})
The call to local creates a new environment where I can store results using the scoping assignment operator <<- without having to worry about the fact that the package was already sealed, and the last expression (the function definition) is returned as the value of the call to local() and is bound to the name myFunction.

retrieve original version of package function even if over-assigned

Suppose I replace a function of a package, for example knitr:::sub_ext.
(Note: I'm particularly interested where it is an internal function, i.e. only accessible by ::: as opposed to ::, but the same answer may work for both).
library(knitr)
my.sub_ext <- function (x, ext) {
return("I'm in your package stealing your functions D:")
}
# replace knitr:::sub_ext with my.sub_ext
knitr <- asNamespace('knitr')
unlockBinding('sub_ext', knitr)
assign('sub_ext', my.sub_ext, knitr)
lockBinding('sub_ext', knitr)
Question: is there any way to retrieve the original knitr:::sub_ext after I've done this? Preferably without reloading the package?
(I know some people want to know why I would want to do this so here it is. Not required reading for the question). I've been patching some functions in packages like so (not actually the sub_ext function...):
original.sub_ext <- knitr:::sub_ext
new.sub_ext <- function (x, ext) {
# some extra code that does something first, e.g.
x <- do.something.with(x)
# now call the original knitr:::sub_ext
original.sub_ext(x, ext)
}
# now set knitr:::sub_ext to new.sub_ext like before.
I agree this is not in general a good idea (in most cases these are quick fixes until changes make their way into CRAN, or they are "feature requests" that would never be approved because they are somewhat case-specific).
The problem with the above is if I accidentally execute it twice (e.g. it's at the top of a script that I run twice without restarting R in between), on the second time original.sub_ext is actually the previous new.sub_ext as opposed to the real knitr:::sub_ext, so I get infinite recursion.
Since sub_ext is an internal function (I wouldn't call it directly, but functions from knitr like knit all call it internally), I can't hope to modify all the functions that call sub_ext to call new.sub_ext manually, hence the approach of replacing the definition in the package namespace.
When you do assign('sub_ext', my.sub_ext, knitr), you are irrevocably overwriting the value previously associated with sub_ext with the value of my.sub_ext. If you first stash the original value, though, it's not hard to reset it when you're done:
library(knitr)
knitr <- asNamespace("knitr")
## Store the original value of sub_ext
.sub_ext <- get("sub_ext", envir = knitr)
## Overwrite it with your own function
my.sub_ext <- function (x, ext) "I'm in your package stealing your functions D:"
assignInNamespace('sub_ext', my.sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "I'm in your package stealing your functions D:"
## Reset when you're done
assignInNamespace('sub_ext', .sub_ext, knitr)
knitr:::sub_ext("eg.csv", "pdf")
# [1] "eg.pdf"
Alternatively, as long as you are just adding lines of code to what's already there, you could add that code using trace(). What's nice about trace() is that, when you are done, you can use untrace() to revert the function's body to its original form:
trace(what = "mean.default",
tracer = quote({
a <- 1
b <- 2
x <- x*(a+b)
}),
at = 1)
mean(1:2)
# Tracing mean.default(1:2) step 1
# [1] 4.5
untrace("mean.default")
# Untracing function "mean.default" in package "base"
mean(1:2)
# [1] 1.5
Note that if the function you are tracing is in a namespace, you'll want to use trace()'s where argument, passing it the name of some other (exported) function that shares the to-be-traced function's namespace. So, to trace an unexported function in knitr's namespace, you could set where=knit

Global variables in packages in R

I'm developing a package in R. I have a bunch of functions, some of them need some global variables. How do I manage global variables in packages?
I've read something about environment, but I do not understand how it will work, of if this even is the way to go about the things.
You can use package local variables through an environment. These variables will be available to multiple functions in the package, but not (easily) accessible to the user and will not interfere with the users workspace. A quick and simple example is:
pkg.env <- new.env()
pkg.env$cur.val <- 0
pkg.env$times.changed <- 0
inc <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val + by
pkg.env$cur.val
}
dec <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val - by
pkg.env$cur.val
}
cur <- function(){
cat('the current value is', pkg.env$cur.val, 'and it has been changed',
pkg.env$times.changed, 'times\n')
}
inc()
inc()
inc(5)
dec()
dec(2)
inc()
cur()
You could set an option, eg
options("mypkg-myval"=3)
1+getOption("mypkg-myval")
[1] 4
In general global variables are evil. The underlying principle why they are evil is that you want to minimize the interconnections in your package. These interconnections often cause functions to have side-effects, i.e. it depends not only on the input arguments what the outcome is, but also on the value of some global variable. Especially when the number of functions grows, this can be hard to get right and hell to debug.
For global variables in R see this SO post.
Edit in response to your comment:
An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:
token_information = list(token1 = "087091287129387",
token2 = "UA2329723")
and require all functions that need this information to have it as an argument:
do_stuff = function(arg1, arg2, token)
do_stuff(arg1, arg2, token = token_information)
In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:
token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
I hope this makes things more clear.
The question is unclear:
Just one R process or several?
Just on one host, or across several machine?
Is there common file access among them or not?
In increasing order of complexity, I'd use a file, a SQLite backend via the RSQlite package or (my favourite :) the rredis package to set to / read from a Redis instance.
You could also create a list of tokens and add it to R/sysdata.rda with usethis::use_data(..., internal = TRUE). The data in this file is internal, but accessible by all functions. The only problem would arise if you only want some functions to access the tokens, which would be better served by:
the environment solution already proposed above; or
creating a hidden helper function that holds the tokens and returns them. Then just call this hidden function inside the functions that use the tokens, and (assuming it is a list) you can inject them to their environment with list2env(..., envir = environment()).
If you don't mind adding a dependency to your package, you can use an R6 object from the homonym package, as suggested in the comments to #greg-snow's answer.
R6 objects are actual environments with the possibility of adding public and private methods, are very lightweight and could be a good and more rigorous option to share package's global variables, without polluting the global environment.
Compared to #greg-snow's solution, it allows for a stricter control of your variables (you can add methods that check for types for example). The drawback can be the dependency and, of course, learning the R6 syntax.
library(R6)
MyPkgOptions = R6::R6Class(
"mypkg_options",
public = list(
get_option = function(x) private$.options[[x]]
),
active = list(
var1 = function(x){
if(missing(x)) private$.options[['var1']]
else stop("This is an environment parameter that cannot be changed")
}
,var2 = function(x){
if(missing(x)) private$.options[['var2']]
else stop("This is an environment parameter that cannot be changed")
}
),
private = list(
.options = list(
var1 = 1,
var2 = 2
)
)
)
# Create an instance
mypkg_options = MyPkgOptions$new()
# Fetch values from active fields
mypkg_options$var1
#> [1] 1
mypkg_options$var2
#> [1] 2
# Alternative way
mypkg_options$get_option("var1")
#> [1] 1
mypkg_options$get_option("var3")
#> NULL
# Variables are locked unless you add a method to change them
mypkg_options$var1 = 3
#> Error in (function (x) : This is an environment parameter that cannot be changed
Created on 2020-05-27 by the reprex package (v0.3.0)

Resources