How to overload S4 slot selector `@` to be a generic function - r

I am trying to turn the @ operator in R into a generic function for the S3 system.
Based on the chapter in Writing R Extensions on adding new generics, I tried implementing the generic for @ like so:
`@` <- function(object, name) UseMethod("@")
`@.default` <- function(object, name) base::`@`(object, name)
However, this doesn't seem to work, as it breaks @ for S4 objects. I am using the Matrix package as an example of an S4 instance:
Matrix::Matrix(1:4, nrow=2, ncol=2)@Dim
Error in @.default(Matrix::Matrix(1:4, nrow = 2, ncol = 2), Dim) :
no slot of name "name" for this object of class "dgeMatrix"
How can I implement a generic @ so that it dispatches correctly for S4 classes?
EDIT
Also interested in opinions about why it might not be a good idea?

R's documentation is somewhat confusing as to whether @ is already a generic or not: the help page for @ says it is, but it isn't listed on the internalGenerics page.
The @ operator has specific behaviour as well as (perhaps) being a generic. From the help page for @: "It is checked that object is an S4 object (see isS4), and it is an error to attempt to use @ on any other object." That would appear to rule out writing methods for S3 classes, though the documentation is unclear on whether this check happens before method dispatch (if there is any) or after (in which case it could be skipped if you supplied a specific method for some S3 class).
You can implement what you want by completely redefining what @ is, along the lines of the suggestion in the comments:
`@.default` <- function(e1,e2) slot(e1,substitute(e2))
but there are two reasons not to do this:
1) As soon as someone loads your package, it supersedes the normal @ function, so if people call it with other S4 objects, they are getting your version rather than the R base version.
2) This version is considerably less efficient than the internal one, and because of (1) you have just forced your users to use it (unless they use the cumbersome construction base::"@"(e1,e2)). Efficiency may not matter to your use case, but it may matter to your users' other code that uses S4.
Practically, a reasonable compromise might be to define your own binary operator %@%, and have the default method do what @ does. That is,
`%@%` <- function(e1,e2) slot(e1,substitute(e2))
setGeneric("%@%")
This is called in practice as follows:
> setClass("testClass",slots=c(a="character")) -> testClass
> x <- testClass(a="cheese")
> x %@% a
[1] "cheese"
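If S3-style dispatch is the goal, the same compromise can be sketched with UseMethod() instead of setGeneric(). This is only an illustration under assumptions: the "record" class and its method are hypothetical, and %@% is the custom operator introduced above, not something R provides.

```r
library(methods)

# S3 generic for a custom slot-like accessor; the base `@` is left untouched.
`%@%` <- function(e1, e2) UseMethod("%@%")

# Default: treat the right-hand side as an unevaluated slot name.
`%@%.default` <- function(e1, e2) slot(e1, as.character(substitute(e2)))

# Hypothetical S3 class "record": look the name up as a list element.
`%@%.record` <- function(e1, e2) e1[[as.character(substitute(e2))]]

setClass("testClass", slots = c(a = "character"))
x <- new("testClass", a = "cheese")
rec <- structure(list(a = 42), class = "record")

s4_value <- x %@% a    # falls through to the default method
s3_value <- rec %@% a  # dispatches to the record method
```

Because dispatch only evaluates the first argument, the right-hand side can stay an unquoted name in both methods.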

Related

How to define an S3 generic with the same name as a primitive function?

I have a class myclass in an R package for which I would like to define a method as.raw, so of the same name as the primitive function as.raw(). If constructor, generic and method are defined as follows...
new_obj <- function(n) structure(n, class = "myclass") # constructor
as.raw <- function(obj) UseMethod("as.raw") # generic
as.raw.myclass <- function(obj) obj + 1 # method (dummy example here)
... then R CMD check leads to:
Warning: declared S3 method 'as.raw.myclass' not found
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
If the generic is as_raw instead of as.raw, then there's no problem, so I assume this comes from the fact that the primitive function as.raw already exists. Is it possible to 'overload' as.raw by defining it as a generic (or would one necessarily need to use a different name?)?
Update: NAMESPACE contains
export("as.raw") # export the generic
S3method("as.raw", "myclass") # export the method
This seems somewhat related, but dimnames there is a generic and so there is a solution (just don't define your own generic), whereas above it is unclear (to me) what the solution is.
The problem here appears to be that as.raw is a primitive function (is.primitive(as.raw)). From the ?setGeneric help page, it says
A number of the basic R functions are specially implemented as primitive functions, to be evaluated directly in the underlying C code rather than by evaluating an R language definition. Most have implicit generics (see implicitGeneric), and become generic as soon as methods (including group methods) are defined on them.
And according to the ?InternalMethods help page, as.raw is one of these primitive generics. So in this case, you just need to export the S3method. And you want to make sure your function signature matches the signature of the existing primitive function.
So if I have the following R code
new_obj <- function(n) structure(n, class = "myclass")
as.raw.myclass <- function(x) x + 1
and a NAMESPACE file of
S3method(as.raw,myclass)
export(new_obj)
Then this passes the package checks for me (on R 4.0.2). And I can run the code with
as.raw(new_obj(4))
# [1] 5
# attr(,"class")
# [1] "myclass"
So in this particular case, you need to leave the as.raw <- function(obj) UseMethod("as.raw") part out.
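As a quick interactive check of the same idea outside a package, the sketch below defines only the method and relies on the primitive's internal dispatch. The method body is a dummy chosen for illustration (it returns raw bytes rather than the classed number used above).

```r
# as.raw is a primitive with internal dispatch, so no UseMethod() generic is needed.
stopifnot(is.primitive(as.raw))

new_obj <- function(n) structure(n, class = "myclass")

# The argument name x matches the primitive's own signature, as.raw(x).
as.raw.myclass <- function(x) as.raw(unclass(x) + 1)

shifted <- as.raw(new_obj(4))  # dispatches to as.raw.myclass
plain   <- as.raw(4)           # unclassed input still uses the primitive directly
```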

Why/how some packages define their functions in nameless environment?

In my code, I needed to check which package a function is defined in (in my case it was exprs(): I needed it from Biobase, but it turned out to be overridden by rlang).
From this SO question, I thought I could use simply environmentName(environment(functionname)). But for exprs from Biobase that expression returned empty string:
environmentName(environment(exprs))
# [1] ""
After checking the structure of environment(exprs) I noticed that it has .Generic member which contains package name as an attribute:
environment(exprs)$.Generic
# [1] "exprs"
# attr(,"package")
# [1] "Biobase"
So, for now I made this helper function:
pkgparent <- function(functionObj) {
  functionEnv <- environment(functionObj)
  envName <- environmentName(functionEnv)
  if (envName != "") {
    return(envName)
  } else {
    return(attr(functionEnv$.Generic, "package"))
  }
}
It does the job and correctly returns package name for the function if it is loaded, for example:
pkgparent(exprs)
# Error in environment(functionObj) : object 'exprs' not found
library(Biobase)
pkgparent(exprs)
# [1] "Biobase"
library(rlang)
# The following object is masked from ‘package:Biobase’:
# exprs
pkgparent(exprs)
# [1] "rlang"
But I would still like to learn why some packages' functions are defined in an "unnamed" environment, while others show up as <environment: namespace:packagename>.
What you’re seeing here is part of how S4 method dispatch works. In fact, .Generic is part of the R method dispatch mechanism.
The rlang package is a red herring, by the way: the issue presents itself purely due to Biobase’s use of S4.
But more generally, your resolution strategy might fail in other situations, because there are other (albeit rare) reasons why packages might define functions inside a separate environment. The usual reason is to define a closure over some variable.
For example, it’s generally impossible to modify variables defined inside a package at the namespace level, because the namespace gets locked when loaded. There are multiple ways to work around this. A simple way, if a package needs a stateful function, is to define this function inside an environment. For example, you could define a counter function that increases its count on each invocation as follows:
counter = local({
  current = 0L
  function () {
    current <<- current + 1L
    current
  }
})
local defines an environment in which the function is wrapped.
To cope with this kind of situation, what you should do instead is to iterate over parent environments until you find a namespace environment. But there’s a simpler solution, because R already provides a function to find a namespace environment for a given environment (by performing said iteration):
pkgparent = function (fun) {
  nsenv = topenv(environment(fun))
  environmentName(nsenv)
}
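To see the difference between the two lookups without installing Biobase, here is a small sketch that simulates a package-level closure. It assumes the stats namespace as a stand-in for "some package"; the environment `closure_env` is hypothetical and exists only for the demonstration.

```r
pkgparent <- function(fun) {
  environmentName(topenv(environment(fun)))
}

# Simulate a closure defined inside a package: an anonymous environment
# whose parent is the stats namespace (stats is only a stand-in here).
closure_env <- new.env(parent = asNamespace("stats"))
counter <- local({
  current <- 0L
  function() {
    current <<- current + 1L
    current
  }
}, envir = closure_env)

direct_name <- environmentName(environment(counter))  # the anonymous environment
owner_name  <- pkgparent(counter)                     # walks up to the namespace
plain_name  <- pkgparent(stats::sd)                   # ordinary namespace function
```

The direct lookup yields an empty string for the closure, while the topenv() version reports the enclosing namespace in both cases.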

Dispatch of `rbind` and `cbind` for a `data.frame`

Background
The dispatch mechanism of the R functions rbind() and cbind() is non-standard. I explored some possibilities of writing rbind.myclass() or cbind.myclass() functions when one of the arguments is a data.frame, but so far I do not have a satisfactory approach. This post concentrates on rbind, but the same holds for cbind.
Problem
Let us create an rbind.myclass() function that simply echoes when it has been called.
rbind.myclass <- function(...) "hello from rbind.myclass"
We create an object of class myclass, and the following calls to rbind all
properly dispatch to rbind.myclass()
a <- "abc"
class(a) <- "myclass"
rbind(a, a)
rbind(a, "d")
rbind(a, 1)
rbind(a, list())
rbind(a, matrix())
However, when one of the arguments is a data frame (this need not be the first one), rbind() will call base::rbind.data.frame() instead:
rbind(a, data.frame())
This behavior is a little surprising, but it is actually documented in the
dispatch section of rbind(). The advice given there is:
If you want to combine other objects with data frames,
it may be necessary to coerce them to data frames first.
In practice, this advice may be difficult to implement. Conversion to a data frame may remove essential class information. Moreover, a user who is unaware of the advice may be stuck with an error or an unexpected result after issuing the command rbind(a, x).
Approaches
Warn the user
A first possibility is to warn the user that the call to rbind(a, x) should not be made when x is a data frame. Instead, the user of package mypackage should make an explicit call to a hidden function:
mypackage:::rbind.myclass(a, x)
This can be done, but the user has to remember to make the explicit call when needed. Calling the hidden function is something of a last resort, and should not be regular policy.
Intercept rbind
Alternatively, I tried to shield the user by intercepting dispatch. My first try was to provide a local definition of base::rbind.data.frame():
rbind.data.frame <- function(...) "hello from my rbind.data.frame"
rbind(a, data.frame())
rm(rbind.data.frame)
This fails because rbind() is not fooled into calling rbind.data.frame from the .GlobalEnv, and calls the base version as usual.
Another strategy is to override rbind() by a local function, which was suggested in S3 dispatching of `rbind` and `cbind`.
rbind <- function (...) {
  # inherits() also handles objects with several classes or no class attribute
  if (inherits(list(...)[[1]], "myclass")) return(rbind.myclass(...))
  else return(base::rbind(...))
}
This works perfectly for dispatching to rbind.myclass(), so the user can now type rbind(a, x) for any type of object x.
rbind(a, data.frame())
The downside is that after library(mypackage) we get the message "The following objects are masked from ‘package:base’: rbind".
While technically everything works as expected, there should be better ways than a base function override.
Conclusion
None of the above alternatives is satisfactory. I have read about alternatives using S4 dispatch, but so far I have not located any implementations of the idea. Any help or pointers?
As you mention yourself, using S4 would be one good solution that works nicely. I have not investigated this recently with data frames, as I am much more interested in other generalized matrices in both of my long-standing CRAN packages, 'Matrix' ("recommended", i.e. part of every R distribution) and 'Rmpfr'.
Actually, there are even two different ways:
1) Rmpfr uses the new way to define methods for the '...' in rbind()/cbind().
This is well documented in ?dotsMethods (mnemonic: '...' = dots) and implemented in Rmpfr/R/array.R line 511 ff (e.g. https://r-forge.r-project.org/scm/viewvc.php/pkg/R/array.R?view=annotate&root=rmpfr)
2) Matrix uses the older approach of defining (S4) methods for rbind2() and cbind2(). If you read ?rbind, it mentions that, and when, rbind2/cbind2 are used. The idea there: "2" means you define S4 methods with a signature for two ("2") matrix-like objects, and rbind/cbind applies them recursively to two of its potentially many arguments at a time.
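A minimal sketch of the second approach follows. The "Interval" class is hypothetical and stands in for a matrix-like S4 class; the method is exercised by calling rbind2() directly, and ?rbind describes when rbind() itself falls through to such methods.

```r
library(methods)

# Hypothetical matrix-like class, used only for illustration.
setClass("Interval", representation(lo = "numeric", hi = "numeric"))

# rbind2() is an S4 generic in 'methods' with signature (x, y).
setMethod("rbind2", signature(x = "Interval", y = "Interval"),
  function(x, y, ...) {
    # Stack the two intervals as rows of an ordinary numeric matrix.
    base::rbind(c(lo = x@lo, hi = x@hi), c(lo = y@lo, hi = y@hi))
  })

i1 <- new("Interval", lo = 0, hi = 1)
i2 <- new("Interval", lo = 2, hi = 5)
stacked <- rbind2(i1, i2)  # called directly here
```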
The dotsMethods approach was suggested by Martin Maechler and is implemented in the Rmpfr package. We need to define a new generic, a class and a method using S4.
setGeneric("rbind", signature = "...")
mychar <- setClass("myclass", slots = c(x = "character"))
b <- mychar(x = "b")
rbind.myclass <- function(...) "hello from rbind.myclass"
setMethod("rbind", "myclass",
  function(..., deparse.level = 1) {
    args <- list(...)
    if (all(vapply(args, is.atomic, NA)))
      # all arguments are plain atomic vectors: use the base rbind
      return(base::rbind(..., deparse.level = deparse.level))
    else
      return(rbind.myclass(..., deparse.level = deparse.level))
  })
# these work as expected
rbind(b, "d")
rbind(b, b)
rbind(b, matrix())
# this fails in R 3.4.3
rbind(b, data.frame())
Error in rbind2(..1, r) :
no method for coercing this S4 class to a vector
I haven't been able to resolve the error. See
R: Shouldn't generic methods work internally within a package without it being attached?
for a related problem.
As this approach overrides rbind(), we get the warning The following objects are masked from 'package:base': rbind.
I don't think you're going to be able to come up with something completely satisfying. The best you can do is export rbind.myclass so that users can call it directly without doing mypackage:::rbind.myclass. You can call it something else if you want (dplyr calls its version bind_rows), but if you choose to do so, I'd use a name that evokes rbind, like rbind_myclass.
Even if you can get r-core to agree to change the dispatch behavior, so that rbind dispatches on its first argument, there are still going to be cases when users will want to rbind multiple objects together with a myclass object somewhere other than the first. How else can users dispatch to rbind.myclass(df, df, myclass)?
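One way the exported-helper idea could look is sketched below. The name rbind_myclass and the fallback to base::rbind() are assumptions made for illustration, not an established API; rbind.myclass is the dummy method from the question.

```r
# Dummy method from the question.
rbind.myclass <- function(...) "hello from rbind.myclass"

# Hypothetical exported helper: uses rbind.myclass() whenever any argument,
# in any position, inherits from "myclass"; otherwise defers to base::rbind().
rbind_myclass <- function(...) {
  args <- list(...)
  has_myclass <- any(vapply(args, inherits, logical(1), what = "myclass"))
  if (has_myclass) rbind.myclass(...) else base::rbind(...)
}

a <- structure("abc", class = "myclass")
with_df <- rbind_myclass(data.frame(x = 1), a)  # myclass object is not first
plain   <- rbind_myclass(1:2, 3:4)              # no myclass object involved
```

Since the helper has its own name, nothing in base is masked, and the position of the myclass object among the arguments no longer matters.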
The data.table solution seems dangerous; I would not be surprised if the CRAN maintainers put in a check and disallow this at some point.

type/origin of R's 'as' function

R's S3 OO system is centered around generic functions that call methods depending on the class of the object the generic function is being called on. The crux is that the generic function calls the appropriate method, as opposed to other OO programming languages in which the method is defined within the class.
For example, the mean function is a generic function.
isGeneric("mean")
methods(mean)
This will print
TRUE
[1] mean,ANY-method mean.Date mean.default mean.difftime
[5] mean.IDate* mean,Matrix-method mean.POSIXct mean.POSIXlt
[9] mean,sparseMatrix-method mean,sparseVector-method
see '?methods' for accessing help and source code
I was exploring R a bit and found the as function. I am confused by the fact that R says the function is not generic, but it still has methods.
isGeneric("as")
methods(as)
TRUE
[1] as.AAbin as.AAbin.character
[3] as.alignment as.allPerms
[5] as.array as.array.default
[7] as.binary as.bitsplits
[9] as.bitsplits.prop.part as.call
...
At the end there is a warning that says that as is not a generic.
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
function 'as' appears not to be S3 generic; found functions that look like S3 methods
Could someone explain to me what the as function is and how it is connected to as.list, as.data.frame etc.? R says that as.list is a generic (where I am tempted to get a bit mad at the inconsistencies within R, because I would expect as.list to be a method for a list object from the as generic function). Please help.
as is not an S3 generic, but notice that you got a TRUE. (I got a FALSE.) That means you have loaded a package that defines as as an S4 generic. S3 generics work via class dispatch that employs a *.default function and the UseMethod function. The FALSE I get means there is no method defined for a generic as that would get looked up. One arguable reason for the lack of a generic as is that calling such a function with only one data object would not specify a "coercion destination". That means the destination needs to be built into the function name.
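For contrast, this is roughly what an S3 generic looks like. The describe() function and its methods are hypothetical names used only to illustrate the UseMethod() and *.default pattern mentioned above.

```r
library(methods)

# A hypothetical S3 generic: the body does nothing but dispatch on class(x).
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) "something else"

# Method picked for numeric input via the implicit class.
describe.numeric <- function(x, ...) "a number"

num_result <- describe(3.14)
chr_result <- describe("text")

# isGeneric() only reports S4 generics, so it is FALSE for this S3 generic.
is_s4_generic <- isGeneric("describe")
```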
After declaring as to be Generic (note the capitalization, which is a hint that this applies to S4 features):
setGeneric("as") # note that I didn't really even need to define any methods
get('as')
#--- output----
standardGeneric for "as" defined from package "methods"
function (object, Class, strict = TRUE, ext = possibleExtends(thisClass,
Class))
standardGeneric("as")
<environment: 0x7fb1ba501740>
Methods may be defined for arguments: object, Class, strict, ext
Use showMethods("as") for currently available ones.
If I reboot R (and don't load any libraries that call setGeneric for 'as') I get:
get('as')
#--- output ---
function (object, Class, strict = TRUE, ext = possibleExtends(thisClass,
Class))
{
if (.identC(Class, "double"))
Class <- "numeric"
thisClass <- .class1(object)
if (.identC(thisClass, Class) || .identC(Class, "ANY"))
return(object)
where <- .classEnv(thisClass, mustFind = FALSE)
coerceFun <- getGeneric("coerce", where = where)
coerceMethods <- .getMethodsTable(coerceFun, environment(coerceFun),
inherited = TRUE)
asMethod <- .quickCoerceSelect(thisClass, Class, coerceFun,
coerceMethods, where)
.... trimmed the rest of the code
But you ask "why", always a dangerous question when discussing language design, of course. I've flipped through the last chapter of Statistical Models in S, which is the cited reference for most of the help pages that apply to S3 dispatch, and found no discussion of either coercion or the as function. There is an implicit definition of "S3 generic" requiring the use of UseMethod, but no mention of why as was left out of that strategy. I can think of two possibilities: it is to prevent any sort of inheritance ambiguity in the application of the coercion, or it is an efficiency decision.
I should probably add that there is an S4 setAs-function and that you can find all the S4-coercion functions with showMethods("coerce").
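A minimal sketch of how setAs() feeds into as() is shown here. The two temperature classes are hypothetical and exist only to give the coercion something to do.

```r
library(methods)

# Hypothetical classes for illustration.
setClass("Celsius", representation(degrees = "numeric"))
setClass("Fahrenheit", representation(degrees = "numeric"))

# Register a coercion; as() looks it up through the "coerce" generic.
setAs("Celsius", "Fahrenheit", function(from) {
  new("Fahrenheit", degrees = from@degrees * 9 / 5 + 32)
})

boiling_c <- new("Celsius", degrees = 100)
boiling_f <- as(boiling_c, "Fahrenheit")  # destination class is the second argument
```

Here the "coercion destination" is passed as the Class argument rather than being built into the function name, which is the S4 counterpart of the as.list-style naming used in S3.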

How to call a function that contains a comma in R?

When using S3 or S4 classes in R, it's common to set a class as generic my_generic and then use dots for each subtype my_generic.my_type. Lately, I've been seeing this pattern, but using commas instead of periods my_generic,my_type. The problem is that I can't use the help operator ? or enter the function name in the console because the comma is treated as an error. Is there a workaround? I've tried using backticks, but it doesn't work.
An example of this is the draw method in the ComplexHeatmap package:
methods(draw)
[1] draw.colorkey draw.details draw,HeatmapAnnotation-method
[4] draw,HeatmapList-method draw,Heatmap-method draw.key
draw,SingleAnnotation-method
Doing ?draw.colorkey works, but ?draw,HeatmapAnnotation-method doesn't.
First of all, it is terribly bad practice to call methods directly, especially with S4. The "functions with a comma" you're looking at, are actually S4 methods.
Help pages
To find the help page (if it exists), you can use quotation marks like this:
?"draw,Heatmap-method"
But success is not guaranteed. This heavily depends on whether the author of the package has separate help files for the methods, or used the correct aliases. In this particular case, you see that on the help page ?draw the author of the package added a couple of links to the specific methods.
Find all S4 methods
To get an overview of only the S4 methods, use showMethods instead of methods.
> library(ComplexHeatmap)
> showMethods("draw")
Function: draw (package ComplexHeatmap)
object="Heatmap"
object="HeatmapAnnotation"
object="HeatmapList"
object="SingleAnnotation"
See the internal code of a method
To get the actual method so you can see the internal code, use getMethod:
getMethod(draw, signature = "Heatmap")
Method Definition:
function (object, ...)
{...
}
.local(object, ...)
}
<environment: namespace:ComplexHeatmap>
Signatures:
object
target "Heatmap"
defined "Heatmap"
Use a specific S4 method (but don't really)
You can assign the result of that call and use that as a function:
mat = matrix(rnorm(80, 2), 8, 10)
mat = rbind(mat, matrix(rnorm(40, -2), 4, 10))
rownames(mat) = letters[1:12]
colnames(mat) = letters[1:10]
ht = Heatmap(mat)
myMethod <- getMethod(draw, signature = "Heatmap")
myMethod(ht)
But you shouldn't try to call a method directly. The result of that last call is the exact same as
draw(ht)
So you are better off using the generic function and letting dispatch do its work.
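The same point can be checked without ComplexHeatmap installed. The sketch below uses a hypothetical generic draw_shape() and class "Shape" to show that the method retrieved with getMethod() and the generic give the same result.

```r
library(methods)

# Hypothetical class and generic, standing in for Heatmap and draw().
setClass("Shape", representation(name = "character"))
setGeneric("draw_shape", function(object, ...) standardGeneric("draw_shape"))
setMethod("draw_shape", "Shape", function(object, ...) {
  paste("drawing", object@name)
})

sq <- new("Shape", name = "square")

# Retrieve the method object; it is listed as "draw_shape,Shape-method".
shape_method <- getMethod("draw_shape", signature = "Shape")

via_method  <- shape_method(sq)  # direct call, shown only for comparison
via_generic <- draw_shape(sq)    # preferred: let dispatch pick the method
```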

Resources