Why/how do some packages define their functions in a nameless environment?

In my code, I needed to check which package a function is defined in (in my case it was exprs(): I needed it from Biobase, but it turned out to be overridden by rlang).
From this SO question, I thought I could simply use environmentName(environment(functionname)). But for exprs from Biobase that expression returned an empty string:
environmentName(environment(exprs))
# [1] ""
After checking the structure of environment(exprs), I noticed that it has a .Generic member which contains the package name as an attribute:
environment(exprs)$.Generic
# [1] "exprs"
# attr(,"package")
# [1] "Biobase"
So, for now I made this helper function:
pkgparent <- function(functionObj) {
  functionEnv <- environment(functionObj)
  envName <- environmentName(functionEnv)
  if (envName != "")
    return(envName)
  else
    return(attr(functionEnv$.Generic, 'package'))
}
It does the job and correctly returns the package name for the function once its package is loaded, for example:
pkgparent(exprs)
# Error in environment(functionObj) : object 'exprs' not found
library(Biobase)
pkgparent(exprs)
# [1] "Biobase"
library(rlang)
# The following object is masked from ‘package:Biobase’:
# exprs
pkgparent(exprs)
# [1] "rlang"
But I would still like to learn how it happens that some packages' functions are defined in an "unnamed" environment, while others look like <environment: namespace:packagename>.

What you’re seeing here is part of how S4 method dispatch works. In fact, .Generic is part of the R method dispatch mechanism.
The rlang package is a red herring, by the way: the issue presents itself purely due to Biobase’s use of S4.
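For instance, assuming Biobase is attached, methods::isGeneric() confirms that exprs is an S4 generic, which is why environment(exprs) is its (unnamed) dispatch environment rather than the Biobase namespace:
isGeneric("exprs")
# [1] TRUE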
But more generally, your resolution strategy might fail in other situations, because there are other (albeit rare) reasons why packages might define functions inside a separate environment. The usual reason is to define a closure over some variable.
For example, it’s generally impossible to modify variables defined inside a package at the namespace level, because the namespace gets locked when loaded. There are multiple ways to work around this. A simple way, if a package needs a stateful function, is to define this function inside an environment. For example, you could define a counter function that increases its count on each invocation as follows:
counter = local({
  current = 0L
  function () {
    current <<- current + 1L
    current
  }
})
local defines an environment in which the function is wrapped.
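For illustration, each call then increments the captured variable, and the wrapper environment created by local has no name, just like the S4 dispatch environment above:
counter()
# [1] 1
counter()
# [1] 2
environmentName(environment(counter))
# [1] ""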
To cope with this kind of situation, what you should do instead is iterate over parent environments until you find a namespace environment. But there's a simpler solution, because R already provides a function, topenv(), that finds the namespace (top-level) environment for a given environment by performing exactly that iteration:
pkgparent = function (fun) {
  nsenv = topenv(environment(fun))
  environmentName(nsenv)
}
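Assuming Biobase is installed and the counter from above is defined at the prompt, this version handles both the S4 dispatch case and ordinary closures (a function defined at the top level simply reports "R_GlobalEnv"; the same counter defined inside a package would report that package's namespace):
pkgparent(Biobase::exprs)
# [1] "Biobase"
pkgparent(counter)
# [1] "R_GlobalEnv"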

Related

Passing values between functions in an R package

In an R package, let's say we have two functions. One sets some parameters; the other one uses those parameters. How can I build such a pattern in R? It is similar to event-driven applications, but I am not sure whether it is possible in R or not.
For example:
If we run set_param(a=10), then whenever we run print_a.R, it prints 10, and in case of running set_param(a=20), it prints 20.
I need a solution without assigning value to the global environment because CRAN checks raise notes.
I suggest adding a variable to your package, as @MrFlick suggested.
For instance, in ./R/myoptions.R:
.myoptions <- new.env(parent = emptyenv())

getter <- function(k) {
  .myoptions[[k]]
}

setter <- function(k, v) {
  .myoptions[[k]] <- v
}

lister <- function() {
  names(.myoptions)
}
Then other package functions can use this as a key/value store:
getter("optA")
# NULL
setter("optA", 99)
getter("optA")
# [1] 99
lister()
# [1] "optA"
and all the while, nothing is in the .GlobalEnv:
ls(all.names = TRUE)
# character(0)
Values can be as complex as you want.
Note that these are not exported, so if you want/need the user to have direct access to this, then you'll need to update NAMESPACE or, if using roxygen2, add #' @export before each function definition.
NB: I should add that a more canonical approach might be to use options(.) for these, so that users can preemptively control and access them programmatically.
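A minimal sketch of that options()-based alternative (the helper names and the "mypkg." prefix are hypothetical, not part of the answer above):
get_opt <- function(k, default = NULL) {
  # look up a package-prefixed option, falling back to a default
  getOption(paste0("mypkg.", k), default)
}

set_opt <- function(k, v) {
  # store the value under a package-prefixed option name
  opts <- list(v)
  names(opts) <- paste0("mypkg.", k)
  options(opts)
}

set_opt("optA", 99)
get_opt("optA")
# [1] 99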

How to define an S3 generic with the same name as a primitive function?

I have a class myclass in an R package for which I would like to define a method as.raw, i.e. with the same name as the primitive function as.raw(). If constructor, generic and method are defined as follows...
new_obj <- function(n) structure(n, class = "myclass") # constructor
as.raw <- function(obj) UseMethod("as.raw") # generic
as.raw.myclass <- function(obj) obj + 1 # method (dummy example here)
... then R CMD check leads to:
Warning: declared S3 method 'as.raw.myclass' not found
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
If the generic is as_raw instead of as.raw, then there's no problem, so I assume this comes from the fact that the primitive function as.raw already exists. Is it possible to 'overload' as.raw by defining it as a generic, or would one necessarily need to use a different name?
Update: NAMESPACE contains
export("as.raw") # export the generic
S3method("as.raw", "myclass") # export the method
This seems somewhat related, but dimnames there is already a generic, so there is a solution (just don't define your own generic), whereas above it is unclear (to me) what the solution is.
The problem here appears to be that as.raw is a primitive function (is.primitive(as.raw) returns TRUE). The ?setGeneric help page says
A number of the basic R functions are specially implemented as primitive functions, to be evaluated directly in the underlying C code rather than by evaluating an R language definition. Most have implicit generics (see implicitGeneric), and become generic as soon as methods (including group methods) are defined on them.
And according to the ?InternalMethods help page, as.raw is one of these primitive generics. So in this case, you just need to export the S3 method, and make sure your function signature matches the signature of the existing primitive function.
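For instance, you can confirm the expected signature of the primitive (it takes x, not obj):
args(as.raw)
# function (x)
# NULL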
So if I have the following R code
new_obj <- function(n) structure(n, class = "myclass")
as.raw.myclass <- function(x) x + 1
and a NAMESPACE file of
S3method(as.raw,myclass)
export(new_obj)
Then this passes the package checks for me (on R 4.0.2). And I can run the code with
as.raw(new_obj(4))
# [1] 5
# attr(,"class")
# [1] "myclass"
So in this particular case, you need to leave the as.raw <- function(obj) UseMethod("as.raw") part out.

How to declare S3 method to default to loaded environment?

In a package, I would like to call an S3 method "compact" for object foobar.
There would therefore be a compact.foobar function in my package, along with the compact function itself:
compact = function(x, ...) {
  UseMethod("compact", x)
}
However, the latter would conflict with purrr::compact.
I could default the method to use purrr (compact.default = purrr::compact, or maybe compact.list = purrr::compact), but that would make little sense if the user does not have purrr loaded.
How can I default my method to the loaded version of compact in the user's environment? (So that it uses purrr::compact, any other declared compact function, or fails if no such function exists.)
Unfortunately S3 does not deal with this situation well. You have to search for suitable functions manually. The following works, more or less:
get_defined_function = function (name) {
  matches = getAnywhere(name)
  # Filter out invisible objects and duplicates
  objs = matches$objs[matches$visible & ! matches$dups]
  # Filter out non-function objects
  funs = objs[vapply(objs, is.function, logical(1L))]
  # Filter out functions defined in our own package
  envs = lapply(funs, environment)
  funs = funs[! vapply(envs, identical, logical(1L), topenv())]
  funs[1L][[1L]] # Returns `NULL` if no function exists
}
compact.default = function (...) {
  # Maybe add error handling for functions not found.
  get_defined_function('compact')(...)
}
This uses getAnywhere to find all objects named compact that R knows about. It then filters out those that are not visible because they're not inside attached packages, and those that are duplicates (this is probably redundant, but we do it anyway).
Next, it filters out anything that’s not a function. And finally it filters out the compact S3 generic that our own package defines. To do this, it compares each function’s environment to the package environment (given by topenv()).
This should work, but it has no rule for which function to prefer if there are multiple functions with the same name defined in different locations (it just picks an arbitrary one first), and it also doesn’t check whether the function signature matches (doing this is hard in R, since function calling and parameter matching is very flexible).

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
1. f() should not be visible in .GlobalEnv with ls(all=TRUE).
2. f() should be usable from .GlobalEnv.
3. f()'s internal function g() should be neither visible nor usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
Should I run now:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
In fact, the enclosing environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
[1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() are visible from .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep places ep in .GlobalEnv.
Cluttering the global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
From your feedback, you suggest building a package to get proper namespace services.
A package is not flexible enough for this: my helper functions are stored in a project subdirectory, say hlp, and sourced like source("hlp/util1.R").
This way, scripts can easily be mixed and updated on the fly on a per-project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.
Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' @export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
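For completeness, a minimal sketch of what such a module file might look like (the file name foo.r is an assumption):
# foo.r: a 'box' module in the project directory

#' @export
f = function() g()

# g is not marked with #' @export, so it stays private to the module
g = function() print('hello')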
On a more technical note, your code results in ep being inside the global environment because the assignment environment(ep$e$f)=ep creates a copy of ep inside your global environment. Once you’ve attached the environment, you can delete this object. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).
First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor-man's namespacing:
e <- local({
  g <- function() "hello"
  f <- function() g()
  environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.

Where and how to define a generic function, if multiple packages are used

I know there are related posts, but with insufficient answers, so please answer this question seriously.
There are two packages ("keithley" and "xantrex") which control two different hardware devices and are therefore independent of each other. Each of them must be initialised separately, so I wrote two methods
init.keithley(inst,...) # in keithley package
and
init.xantrex(inst,...) # in xantrex package
for the generic S3 function init(inst,...). I tried to declare the generic function in the keithley package and in the xantrex package, but then it gets masked once the latter is loaded, and the methods are not found any more.
What I tried is the .onAttach()-hook
.onAttach <- function(libname, pkgname)
{
  if (!exists("init"))
    eval(expression(init <- function(inst, ...) UseMethod("init")), envir = .GlobalEnv)
}
But with this it is NOT possible to evaluate the init() function within the package namespace. This can be verified with the option envir = environment(), which will not work. I also tried setGenericS3() and setGeneric(), always with the same result.
The "dirty" solution could be to define a third package and import it, but there must be a clean way to do this.
Where and how should I define the generic function?
Here is the solution:
As I understand it, an attached package has three environments (e.g. "package:Xantrex", "namespace:Xantrex" and "imports:Xantrex"); the different meanings of these are explained in detail in Advanced R.
Now, we have to test whether the generic function init() is already there, and if not, we have to initialize it in the right environment. The following code does that for us.
.onAttach <- function(libname, pkgname)
{
  if (!exists("init", mode = "function"))
    eval(expression(init <- function(inst, ...) UseMethod("init")), envir = as.environment("package:Xantrex"))
}
The .onAttach hook is necessary to guarantee that the different namespaces are initialized; in contrast, the .onLoad hook would be too early. Note that the expression is evaluated in the package:Xantrex environment, so the generic becomes visible in the search path.
Next to that, take care that your NAMESPACE file contains export(init.xantrex) and NOT S3method(init,xantrex). The latter will result in an error, because the generic for the method init.xantrex() is not present while building the package.
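A minimal NAMESPACE sketch along these lines (any further exports are of course up to the package):
# NAMESPACE for the Xantrex package (sketch)
export(init.xantrex)   # export the method as a regular function
# deliberately no S3method(init, xantrex): the generic does not exist at build time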
