Dispatch of `rbind` and `cbind` for a `data.frame` - r

Background
The dispatch mechanism of the R functions rbind() and cbind() is non-standard. I explored some possibilities of writing rbind.myclass() or cbind.myclass() functions when one of the arguments is a data.frame, but so far I do not have a satisfactory approach. This post concentrates on rbind, but the same holds for cbind.
Problem
Let us create an rbind.myclass() function that simply echoes when it has been called.
rbind.myclass <- function(...) "hello from rbind.myclass"
We create an object of class myclass, and the following calls to rbind all
properly dispatch to rbind.myclass()
a <- "abc"
class(a) <- "myclass"
rbind(a, a)
rbind(a, "d")
rbind(a, 1)
rbind(a, list())
rbind(a, matrix())
However, when one of the arguments (this need not be the first one), rbind() will call base::rbind.data.frame() instead:
rbind(a, data.frame())
This behavior is a little surprising, but it is actually documented in the
dispatch section of rbind(). The advice given there is:
If you want to combine other objects with data frames,
it may be necessary to coerce them to data frames first.
In practice, this advice may be difficult to implement. Conversion to a data frame may remove essential class information. Moreover, the user who might be unware of the advice may be stuck with an error or an unexpected result after issuing the command rbind(a, x).
Approaches
Warn the user
A first possibility is to warn the user that the call to rbind(a, x) should not be made when x is a data frame. Instead, the user of package mypackage should make an explicit call to a hidden function:
mypackage:::rbind.myclass(a, x)
This can be done, but the user has to remember to make the explicit call when needed. Calling the hidden function is something of a last resort, and should not be regular policy.
Intercept rbind
Alternatively, I tried to shield the user by intercepting dispatch. My first try was to provide a local definition of base::rbind.data.frame():
rbind.data.frame <- function(...) "hello from my rbind.data.frame"
rbind(a, data.frame())
rm(rbind.data.frame)
This fails as rbind() is not fooled in calling rbind.data.frame from the .GlobalEnv, and calls the base version as usual.
Another strategy is to override rbind() by a local function, which was suggested in S3 dispatching of `rbind` and `cbind`.
rbind <- function (...) {
if (attr(list(...)[[1]], "class") == "myclass") return(rbind.myclass(...))
else return(base::rbind(...))
}
This works perfectly for dispatching to rbind.myclass(), so the user can now type rbind(a, x) for any type of object x.
rbind(a, data.frame())
The downside is that after library(mypackage) we get the message The following objects are masked from ‘package:base’: rbind .
While technically everything works as expected, there should be better ways than a base function override.
Conclusion
None of the above alternatives is satisfactory. I have read about alternatives using S4 dispatch, but so far I have not located any implementations of the idea. Any help or pointers?

As you mention yourself, using S4 would be one good solution that works nicely. I have not investigated recently, with data frames as I am much more interested in other generalized matrices, in both of my long time CRAN packages 'Matrix' (="recommended", i.e. part of every R distribution) and in 'Rmpfr'.
Actually even two different ways:
1) Rmpfr uses the new way to define methods for the '...' in rbind()/cbind().
this is well documented in ?dotsMethods (mnemonic: '...' = dots) and implemented in Rmpfr/R/array.R line 511 ff (e.g. https://r-forge.r-project.org/scm/viewvc.php/pkg/R/array.R?view=annotate&root=rmpfr)
2) Matrix uses the older approach by defining (S4) methods for rbind2() and cbind2(): If you read ?rbind it does mention that and when rbind2/cbind2 are used. The idea there: "2" means you define S4 methods with a signature for two ("2") matrix-like objects and rbind/cbind uses them for two of its potentially many arguments recursively.

The dotsMethod approach was suggested by Martin Maechler and implemented in the Rmpfr package. We need to define a new generic, class and a method using S4.
setGeneric("rbind", signature = "...")
mychar <- setClass("myclass", slots = c(x = "character"))
b <- mychar(x = "b")
rbind.myclass <- function(...) "hello from rbind.myclass"
setMethod("rbind", "myclass",
function(..., deparse.level = 1) {
args <- list(...)
if(all(vapply(args, is.atomic, NA)))
return( base::cbind(..., deparse.level = deparse.level) )
else
return( rbind.myclass(..., deparse.level = deparse.level))
})
# these work as expected
rbind(b, "d")
rbind(b, b)
rbind(b, matrix())
# this fails in R 3.4.3
rbind(b, data.frame())
Error in rbind2(..1, r) :
no method for coercing this S4 class to a vector
I haven't been able to resolve the error. See
R: Shouldn't generic methods work internally within a package without it being attached?
for a related problem.
As this approach overrides rbind(), we get the warning The following objects are masked from 'package:base': rbind.

I don't think you're going to be able to come up with something completely satisfying. The best you can do is export rbind.myclass so that users can call it directly without doing mypackage:::rbind.myclass. You can call it something else if you want (dplyr calls its version bind_rows), but if you choose to do so, I'd use a name that evokes rbind, like rbind_myclass.
Even if you can get r-core to agree to change the dispatch behavior, so that rbind dispatches on its first argument, there are still going to be cases when users will want to rbind multiple objects together with a myclass object somewhere other than the first. How else can users dispatch to rbind.myclass(df, df, myclass)?
The data.table solution seems dangerous; I would not be surprised if the CRAN maintainers put in a check and disallow this at some point.

Related

How to overload S4 slot selector `#` to be a generic function

I am trying to turn the # operator in R into a generic function for the S3 system.
Based on the chapter in Writing R extensions: adding new generic I tried implementing the generic for # like so:
`#` <- function(object, name) UseMethod("#")
`#.default` <- function(object, name) base::`#`(object, name)
However this doesn't seem to work as it breaks the # for the S4 methods. I am using Matrix package as an example of S4 instance:
Matrix::Matrix(1:4, nrow=2, ncol=2)#Dim
Error in #.default(Matrix::Matrix(1:4, nrow = 2, ncol = 2), Dim) :
no slot of name "name" for this object of class "dgeMatrix"
How to implement a generic # so it correctly dispatches in the case of S4 classes?
EDIT
Also interested in opinions about why it might not be a good idea?
R's documentation is somewhat confusing as to whether # is already a generic or not: the help page for # says it is, but it isn't listed on the internalGenerics page.
The # operator has specific behaviour as well as (perhaps) being a generic. From the help page for #: "It is checked that object is an S4 object (see isS4), and it is an error to attempt to use # on any other object." That would appear to rule out writing methods for S3 classes, though the documentation is unclear if this check happens before method dispatch (if there is any) or after (whence it could be skipped if you supplied a specific method for some S3 class).
You can implement what you want by completely redefining what # is, along the line of the suggestion in comments:
`#.default` <- function(e1,e2) slot(e1,substitute(e2))
but there are two reasons not to do this:
1) As soon as someone loads your package, it supersedes the normal # function, so if people call it with other S4 objects, they are getting your version rather than the R base version.
2) This version is considerably less efficient than the internal one, and because of (1) you have just forced your users to use it (unless they use the cumbersome construction base::"#"(e1,e2)). Efficiency may not matter to your use case, but it may matter to your users' other code that uses S4.
Practically, a reasonable compromise might be to define your own binary operator %#%, and have the default method call #. That is,
`%#%` <- function(e1,e2) slot(e1,substitute(e2))
setGeneric("%#%")
This is called in practice as follows:
> setClass("testClass",slots=c(a="character")) -> testClass
> x <- testClass(a="cheese")
> x %#% a
[1] "cheese"

How to call a function that contains a comma in R?

When using S3 or S4 classes in R, it's common to set a class as generic my_generic and then use dots for each subtype my_generic.my_type. Lately, I've been seeing this pattern, but using commas instead of periods my_generic,my_type. The problem is that I can't use the help operator ? or enter the function name in the console because the comma is treated as an error. Is there a workaround? I've tried using backticks, but it doesn't work.
An example of this is the draw method in the ComplexHeatmap package:
methods(draw)
[1] draw.colorkey draw.details draw,HeatmapAnnotation-method
[4] draw,HeatmapList-method draw,Heatmap-method draw.key
draw,SingleAnnotation-method
Doing ?draw.colorkey works, but ?draw,HeatmapAnnotation-method doesn't.
First of all, it is terribly bad practice to call methods directly, especially with S4. The "functions with a comma" you're looking at, are actually S4 methods.
Help pages
To find the help page (if it exists), you can use quotation marks like this:
?"draw,Heatmap-method"
But success is not guaranteed. This heavily depends on whether the author of the package has separate help files for the methods, or used the correct aliases. In this particular case, you see that on the help page ?draw the author of the package added a couple of links to the specific methods.
Find all S4 methods
To get an idea about all the S4 methods alone , use showMethods instead of methods.
> library(ComplexHeatmap)
> showMethods("draw")
Function: draw (package ComplexHeatmap)
object="Heatmap"
object="HeatmapAnnotation"
object="HeatmapList"
object="SingleAnnotation"
See the internal code of a method
To get the actual method so you can see the internal code, use getMethod:
getMethod(draw, signature = "Heatmap")
Method Definition:
function (object, ...)
{...
}
.local(object, ...)
}
<environment: namespace:ComplexHeatmap>
Signatures:
object
target "Heatmap"
defined "Heatmap"
Use a specific S4 method (but don't really)
You can assign the result of that call and use that as a function:
mat = matrix(rnorm(80, 2), 8, 10)
mat = rbind(mat, matrix(rnorm(40, -2), 4, 10))
rownames(mat) = letters[1:12]
colnames(mat) = letters[1:10]
ht = Heatmap(mat)
myMethod <- getMethod(draw, signature = "Heatmap")
myMethod(ht)
But you shouldn't try to call a method directly. The result of that last call is the exact same as
draw(ht)
So you better use the generic function and let the dispatching do its work.

Parsing ellipsis arguments

I'm trying to write a wrapper for some ggplot2 graphs and am trying to use ellipsis' to make the function flexible. I want to save the user (n = 1 me!) from having to explicitly pass the axis titles so thought it might be possible to parse the ... arguments and set the axis titles appropriately. I've read in several Stackoverflow threads (e.g. 1, e.g. 2 or even the R Documentation) that ellipsis arguments can be converted to a list using args <- list(...) so have knocked up a simplified example...
test <- function(...){
args <- list(...)
is.list(args) %>% print()
if(grepl('a', args)){
title <- 'A'
}
else if(grepl('b', args)){
title <- 'B'
}
return(title)
}
Testing the function I get what I expect when supplying a single a as an argument...
> test(a)
[1] TRUE
[1] "A"
But when I try passing other arguments including multiple one via ellipsis I don't understand what is happening. One non-a argument
> test(b)
Error in test(b) (from #2) : object 'b' not found
...then the first argument as a with secondary ones...
> test(a, c, d)
Error in test(a, c, d) (from #2) : object 'd' not found
...or non a at first but something further down the line which should match....
> test(c, b, d)
Error in test(c, b, d) (from #2) : object 'b' not found
The problem is cropping up at args <- list(...) because the logical test to see if args is a list isn't printed, but this doesn't fit with what I've read list(...) does (which is turn the ellipsis arguments into a list). I expect I may need to use something like args <- list(...) %>% unlist() in order to convert the list into a vector which can then be used as an argument in grepl() (and have actually tried it but as far as I can tell the error is occurring before getting to the if()) but I don't understand whats going on and would be grateful for any explanations.
EDIT :
In light of comments it looks like this is a problem of my own creation as I'm mixing Standard and Non-Standard Evaluation. I had been trying to write a wrapper to ggplot2 and have been using the NSE Vignette and lazyeeval vignette to learn/guide me (as well as various threads here on SO), but was faltering when trying to pick out specific variables from the ellipsis (...) to pass to the ggplot call I was making.
Downside is work want results and don't afford much time for learning/improving our coding practices so I'll switch to using Standard Evaulation and have another stab at properly understanding Non-Standard Evaluation in the future.

I cannot find unique function from data.table package [duplicate]

help(unique) shows that unique function is present in two packages - base and data.table. I would like to use this function from data.table package. I thought that the following syntax - data <- data.table::unique(data) indicates the package to be used. But I get the following error -
'unique' is not an exported object from 'namespace:data.table'
But data <- unique(data) works well.
What is wrong here?
The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:
library(data.table)
irisDT <- data.table(iris)
unique(irisDT)
We use base::unique, which is exported, and it dispatches data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (or does it need to).
As eddi points out, base::unique dispatches based on the class of the object called. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will mostly likely result in the next method getting called unless your object is actually a data.table.
There are actually two infix operators in R that pull functions from particular package namespaces. You used :: but there is also a ::: that retrieves "unexported" functions. The unique-function is actually a family of functions and its behavior will depend on both the class of its argument and the particular packages that have been loaded. The R term of this is "generic". Try:
data <- data.table:::unique(data) # assuming 'data' is a data.table
The other tool that lets you peek behind the curtain that the lack of "exportation" is creating is the getAnywhere-function. It lets you see the code at the console:
> unique.data.table
Error: object 'unique.data.table' not found
> getAnywhere(unique.data.table)
A single object matching ‘unique.data.table’ was found
It was found in the following places
registered S3 method for unique from namespace data.table
namespace:data.table
with value
function (x, incomparables = FALSE, fromLast = FALSE, by = key(x),
...)
{
if (!cedta())
return(NextMethod("unique"))
dups <- duplicated.data.table(x, incomparables, fromLast,
by, ...)
.Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x)))
}
<bytecode: 0x2ff645950>
<environment: namespace:data.table>

`data.table::unique` errors: is not an exported object from namespace

help(unique) shows that unique function is present in two packages - base and data.table. I would like to use this function from data.table package. I thought that the following syntax - data <- data.table::unique(data) indicates the package to be used. But I get the following error -
'unique' is not an exported object from 'namespace:data.table'
But data <- unique(data) works well.
What is wrong here?
The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:
library(data.table)
irisDT <- data.table(iris)
unique(irisDT)
We use base::unique, which is exported, and it dispatches data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (or does it need to).
As eddi points out, base::unique dispatches based on the class of the object called. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will mostly likely result in the next method getting called unless your object is actually a data.table.
There are actually two infix operators in R that pull functions from particular package namespaces. You used :: but there is also a ::: that retrieves "unexported" functions. The unique-function is actually a family of functions and its behavior will depend on both the class of its argument and the particular packages that have been loaded. The R term of this is "generic". Try:
data <- data.table:::unique(data) # assuming 'data' is a data.table
The other tool that lets you peek behind the curtain that the lack of "exportation" is creating is the getAnywhere-function. It lets you see the code at the console:
> unique.data.table
Error: object 'unique.data.table' not found
> getAnywhere(unique.data.table)
A single object matching ‘unique.data.table’ was found
It was found in the following places
registered S3 method for unique from namespace data.table
namespace:data.table
with value
function (x, incomparables = FALSE, fromLast = FALSE, by = key(x),
...)
{
if (!cedta())
return(NextMethod("unique"))
dups <- duplicated.data.table(x, incomparables, fromLast,
by, ...)
.Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x)))
}
<bytecode: 0x2ff645950>
<environment: namespace:data.table>

Resources