I would like to define some S4 generics dispatching on the ... argument such that the more specialized methods call the inherited method through callNextMethod(). However, as illustrated by the MWE, this fails with the following error.
# sample function which returns the number of its arguments
f <- function(...) length(list(...))
setGeneric("f")
## [1] "f"
setMethod("f", "character", function(...){ print("character"); callNextMethod() })
## [1] "f"
f(1, 2, 3)
## [1] 3
f("a", "b", "c")
## [1] "character"
## Error in callNextMethod(): a call to callNextMethod() appears in a call to '.Method', but the call does not seem to come from either a generic function or another 'callNextMethod'
This behavior doesn't seem right to me, but maybe I'm missing something here. I would expect the failing callNextMethod() to dispatch to the inherited default method function(...) length(list(...)) effectively returning:
## [1] "character"
## [1] 3
Any thoughts on this?
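For comparison, here is a minimal sketch (a hypothetical generic g, dispatching on the named formal argument x rather than on ...) where callNextMethod() does reach the inherited default method as expected:
# sketch: same pattern, but dispatching on a named formal argument
g <- function(x, ...) length(list(x, ...))
setGeneric("g")
setMethod("g", "character", function(x, ...){ print("character"); callNextMethod() })
g("a", "b", "c")
## [1] "character"
## [1] 3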
Update
Additionally, I've found the following difference in behavior between S4 methods dispatching on formal arguments and ones dispatching on .... Consider the following example where switching the signature from x to ... changes the way objects are resolved.
f = function(x, ..., a = b) {
b = "missing 'a'"
cat(a)
}
f()
## missing 'a'
f(a = 1)
## 1
setGeneric("f", signature = "x")
f()
## missing 'a'
setGeneric("f", signature = "...")
f()
## Error in cat(a) : object 'b' not found
According to ?dotsMethods the dispatch on ... is implemented differently, but as suggested in the last sentence, this shouldn't cause any differences in behavior compared to regular generics. However, the above findings seem to prove the opposite.
Methods dispatching on “...” were introduced in version 2.8.0 of R. The initial implementation of the corresponding selection and dispatch is in an R function, for flexibility while the new mechanism is being studied. In this implementation, a local version of standardGeneric is inserted in the generic function's environment. The local version selects a method according to the criteria above and calls that method, from the environment of the generic function. This is slightly different from the action taken by the C implementation when “...” is not involved. Aside from the extra computing time required, the method is evaluated in a true function call, as opposed to the special context constructed by the C version (which cannot be exactly replicated in R code.) However, situations in which different computational results would be obtained have not been encountered so far, and seem very unlikely.
Related
In the S4 setGeneric function documentation, the difference between the def and useAsDefault parameters is poorly explained: they seem to do the same thing, and the explanation of how they differ is confusing and lacks context. Could anybody provide a practical example where the two arguments behave differently?
Some context
For better or worse, the behaviour of setGeneric differs in quite subtle ways depending on its arguments. It is possible that some of these quirks are not intentional. The help page accessed by ?setGeneric and even the reference book Extending R (Chambers, 2016) follow the Pareto principle by spelling out the most frequent use cases and only sketching the less frequent ones, in particular not expounding on useAsDefault, whose use is intended mainly for nonstandard generic functions.
S4 generic functions are designated by the formal class genericFunction, which has three direct subclasses: standardGeneric, nonstandardGenericFunction, and groupGenericFunction. To give context to the behaviour of setGeneric, we should distinguish between standard and nonstandard generic functions. (Group generic functions are defined by setGroupGeneric, not setGeneric, so we ignore them here.)
Standard generic functions only perform method dispatch: arguments are passed "as is" to the dispatched method, and the return value of the method is returned "as is" by the generic. Nonstandard generic functions perform method dispatch while allowing for pre-processing of arguments and post-processing of return values.
Hence if we define a generic function with setGeneric("zzz", def=), then in the standard case body(zzz) is just the call standardGeneric("zzz"), whereas in the nonstandard case it is a call containing the call standardGeneric("zzz"), typically of the form:
{
## optionally do some stuff with the arguments
val <- standardGeneric("zzz")
## optionally do some stuff with 'val', then return
}
To answer your question, let's restrict attention to calls to setGeneric of the form
setGeneric("zzz", def=, useAsDefault=)
where def is a function and useAsDefault is either a function or missing, and let's further assume that there is no existing function zzz.
Then setGeneric behaves in one of three ways, depending on body(def):
Case 1. If body(def) does not contain the call standardGeneric("zzz"), then setGeneric constructs a standard generic function and assigns it to the symbol zzz in the calling environment. It associates with zzz a default method (of formal class derivedDefaultMethod), which is retrieved as zzz@default. The default method is useAsDefault if not missing and def otherwise.
setGeneric("zzz", def = function(x, ...) 1 + x, useAsDefault = function(x, ...) x)
## [1] "zzz"
zzz
## standardGeneric for "zzz" defined from package ".GlobalEnv"
##
## function (x, ...)
## standardGeneric("zzz")
## <environment: 0x14f048860>
## Methods may be defined for arguments: x
## Use showMethods(zzz) for currently available ones.
zzz@default
## Method Definition (Class "derivedDefaultMethod"):
##
## function (x, ...)
## x
##
## Signatures:
## x
## target "ANY"
## defined "ANY"
zzz(0)
## [1] 0
Case 2. If body(def) is precisely the call standardGeneric("zzz"), then, as in Case 1, setGeneric constructs and assigns to zzz a standard generic function. However, here the default method is determined entirely by useAsDefault. If useAsDefault is missing, then zzz gets no default method.
setGeneric("zzz", def = function(x, ...) standardGeneric("zzz"))
## [1] "zzz"
zzz
## standardGeneric for "zzz" defined from package ".GlobalEnv"
##
## function (x, ...)
## standardGeneric("zzz")
## <environment: 0x112010f60>
## Methods may be defined for arguments: x
## Use showMethods(zzz) for currently available ones.
zzz@default
## NULL
zzz(0)
## Error in (function (classes, fdef, mtable) :
## unable to find an inherited method for function 'zzz' for signature '"numeric"'
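For completeness, a quick sketch of Case 2 when useAsDefault is supplied: the generic is still standard, and the default method now comes from useAsDefault.
setGeneric("zzz", def = function(x, ...) standardGeneric("zzz"), useAsDefault = function(x, ...) x)
## [1] "zzz"
zzz(0)
## [1] 0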
Case 3. If body(def) is not precisely the call standardGeneric("zzz") but does contain it, then setGeneric constructs and assigns to zzz a nonstandard generic function. As in Case 2, useAsDefault specifies the default method. However, here the generic function inherits its body from def.
setGeneric("zzz", def = function(x, ...) 1 + standardGeneric("zzz"), useAsDefault = function(x, ...) x)
## [1] "zzz"
zzz
## nonstandardGenericFunction for "zzz" defined from package ".GlobalEnv"
##
## function (x, ...)
## 1 + standardGeneric("zzz")
## <environment: 0x123fd5410>
## Methods may be defined for arguments: x
## Use showMethods(zzz) for currently available ones.
zzz@default
## Method Definition (Class "derivedDefaultMethod"):
##
## function (x, ...)
## x
##
## Signatures:
## x
## target "ANY"
## defined "ANY"
zzz(0)
## [1] 1
I am (probably) NOT referring to the "all other variables" meaning like var1~. here.
I was pointed to plyr once again and looked into mlply and wondered why parameters are defined with a leading dot, like this:
function (.data, .fun = NULL, ..., .expand = TRUE, .progress = "none",
.parallel = FALSE)
{
if (is.matrix(.data) & !is.list(.data))
.data <- .matrix_to_df(.data)
f <- splat(.fun)
alply(.data = .data, .margins = 1, .fun = f, ..., .expand = .expand,
.progress = .progress, .parallel = .parallel)
}
<environment: namespace:plyr>
What's the use of that? Is it just personal preference, naming convention or more? Often R is so functional that I miss a trick that's long been done before.
A dot in a function name can mean any of the following:
nothing at all
a separator between the method and the class in S3 methods
a way to hide the function name
Possible meanings
1. Nothing at all
The dot in data.frame doesn't separate data from frame, other than visually.
2. Separation of methods and classes in S3 methods
plot is one example of an S3 generic function. Thus plot.lm and plot.glm are the underlying method definitions that are used when calling plot(lm(...)) or plot(glm(...)).
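A small sketch of this mapping (using the built-in cars data; getS3method() simply retrieves the registered method):
fit <- lm(dist ~ speed, data = cars)
class(fit)
## [1] "lm"
# plot(fit) dispatches on this class, so the unexported method plot.lm is called
identical(getS3method("plot", "lm"), stats:::plot.lm)
## [1] TRUE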
3. To hide internal functions
When writing packages, it is sometimes useful to use leading dots in function names because these functions are then somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this convention.
In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls() to show these variables, use ls(all.names = TRUE). Note that a leading dot does not change the variable's scope; it only keeps the variable out of the default ls() listing. For example:
x <- 3
.x <- 4
ls()
[1] "x"
ls(all.names=TRUE)
[1] ".x" "x"
x
[1] 3
.x
[1] 4
4. Other possible reasons
In Hadley's plyr package, he uses the convention of leading dots in argument names. This is a mechanism to try to ensure that, when names are resolved, they resolve to the user's variables rather than to the function's internal variables.
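A small sketch (hypothetical wrapper apply_fun and helper count_rows, not taken from plyr) of how this helps: an argument named data supplied by the user cannot partially match the formal .data, so it is passed on through ... instead of being captured by the wrapper.
apply_fun <- function(.data, .fun, ...) .fun(.data, ...)
count_rows <- function(x, data) nrow(data)
apply_fun(1:3, count_rows, data = mtcars)
## [1] 32
# had the wrapper's first formal been called 'data', 'data = mtcars' would have
# matched it and never reached count_rows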
Complications
This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.
For example, to convert a data.frame to a list you use as.list():
as.list(iris)
In this case as.list is an S3 generic function, and you are passing a data.frame to it. Thus the S3 method that gets called is as.list.data.frame:
> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>
And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:
> library(data.table)
> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*
Non-visible functions are asterisked
> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>
At the start of a name it works like the UNIX filename convention to keep objects hidden by default.
ls()
character(0)
.a <- 1
ls()
character(0)
ls(all.names = TRUE)
[1] ".a"
It can be just a character with no special meaning; it does nothing more than any other allowed character in a name.
my.var <- 1
my_var <- 1
myVar <- 1
It's used for S3 method dispatch. So, if I define a simple class "myClass" and create objects with that class attribute, then generic functions such as print() will automatically dispatch to my specific print method.
myvar <- 1
print(myvar)
class(myvar) <- c("myClass", class(myvar))
print.myClass <- function(x, ...) {
print(paste("a special message for myClass objects, this one has length", length(x)))
return(invisible(NULL))
}
print(myvar)
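Calling print(myvar) now dispatches to print.myClass, so the last line should print something like:
[1] "a special message for myClass objects, this one has length 1"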
There is an ambiguity in the S3 naming scheme, since you cannot tell from a function's name alone whether a dot marks an S3 method or is simply part of the name. But it's a very simple mechanism that is very powerful.
There's a lot more to each of these three aspects, and you should not take my examples as good practice, but they are the basic differences.
If a user defines a function .doSomething and does not bother to write full roxygen documentation for its parameters, this will not generate errors when the package is built and checked.
I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is of a pure base type, is.object() returns FALSE for it. So it should not be an object.
So what's the real meaning about 'Everything that exists is an object' in R?
The function is.object seems only to check whether the object has a "class" attribute, so it does not have the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
NULL
is.object(x)
[1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
$class
[1] "my_class"
is.object(x)
[1] TRUE
Now, trying to answer your real question about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is best understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take this expression, for example: mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other. So you can change its elements just as you would do with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
You can change its default arguments just like a simple object, like a list, too:
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So, taking the parenthesis as an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
Which is not a good idea but illustrates the point. So I guess this is how I would sum it up: Everything that exists in R is an object because they are data which can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of this object which gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects
For x to be an object means that it has a class; thus class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
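A small sketch of the stored-versus-computed distinction (the class name myClass is just an arbitrary example):
y <- 1:3
is.object(y); class(y)   # no class attribute stored; the class is computed from the type
## [1] FALSE
## [1] "integer"
class(y) <- "myClass"    # now a class attribute is stored with the value
is.object(y); class(y)
## [1] TRUE
## [1] "myClass"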
Functions
That all computation occurs through functions refers to the fact that even things you might not expect to be functions are actually functions. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3
I'm trying to figure out how NextMethod() works. The most detailed explanation I have found of the S3 class system is in Chambers & Hastie (eds.), Statistical Models in S (1993, Chapman & Hall); however, I find the part concerning NextMethod invocation a little obscure. Following are the relevant paragraphs I'm trying to make sense of (pp. 268-269).
Turning now to methods invoked as a result of a call to
NextMethod(), these behave as if they had been called from the
previous method with a special call. The arguments in the call to the
inherited method are the same in number, order, and actual argument
names as those in the call to the current method (and, therefore, in
the call to the generic). The expressions for the arguments, however,
are the names of the corresponding formal arguments of the current
method. Suppose, for example, that the expression print(ratings) has
invoked the method print.ordered(). When this method invokes
NextMethod(), this is equivalent to a call to print.factor() of
the form print.factor(x), where x is here the x in the frame of
print.ordered(). If several arguments match the formal argument
"...", those arguments are represented in the call to the inherited
method by special names "..1", "..2", etc. The evaluator recognizes
these names and treats them appropriately (see page 476 for an
example).
This rather subtle definition exists to ensure that the semantics of
function calls in S carry over as cleanly as possible to the use of
methods (compare Becker, Chambers and Wilks's The New S Language,
page 354). In particular:
Arguments are passed down from the current method to the inherited method with their current values at the time NextMethod() is called.
Lazy evaluation continues in effect; unevaluated arguments stay unevaluated.
Missing arguments remain missing in the inherited method.
Arguments passed through the "..." formal argument arrive with the correct argument name.
Objects in the frame that do not correspond to actual arguments in the call will not be passed to the inherited method.
The inheritance process is essentially transparent so far as the
arguments go.
Two points that I find confusing are:
What is "the current method" and what is "the previous method"?
What is the difference between "The arguments in the call to the inherited method", "The expressions for the arguments" and "the names of the corresponding formal arguments of the current method"?
Generally speaking, if anyone could restate the description given in the above paragraphs more lucidly, I'd appreciate it.
It's hard to go through all of this post, but I think this small example can help demystify NextMethod dispatching.
I create an object whose class attribute contains two classes (for inheritance), 'first' and 'second'.
x <- 1
attr(x,'class') <- c('first','second')
Then I create a generic function Cate to print my object:
Cate <- function(x, ...) UseMethod('Cate')
I define a Cate method for each class.
Cate.first <- function(x,...){
print(match.call())
print(paste('first:',x))
print('---------------------')
NextMethod() ## This will call Cate.second
}
Cate.second <- function(x,y){
print(match.call())
print(paste('second:',x,y))
}
Now you can check how Cate dispatches using this example:
Cate(x,1:3)
Cate.first(x = x, 1:3)
[1] "first: 1"
[1] "---------------------"
Cate.second(x = x, y = 1:3)
[1] "second: 1 1" "second: 1 2" "second: 1 3"
For Cate.second, the previous method is Cate.first.
Arguments x and y are passed down from the current method to the inherited method with their current values at the time NextMethod() is called.
Argument y, passed through the "..." formal argument, arrives with the correct argument name: Cate.second(x = x, y = 1:3).
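A related sketch (redefining Cate.second, purely for illustration): missing arguments also remain missing in the inherited method.
Cate.second <- function(x, y){
print(missing(y))
print(paste('second:', x))
}
Cate(x)
Cate.first(x = x)
[1] "first: 1"
[1] "---------------------"
[1] TRUE
[1] "second: 1"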
Consider this example where generic function f is called and it invokes f.ordered and then, using NextMethod, f.ordered invokes f.factor:
f <- function(x) UseMethod("f") # generic
f.ordered <- function(x) { x <- x[-1]; NextMethod() }
f.factor <- function(x) x # inherited method
x <- ordered(c("a", "b", "c"))
class(x)
## [1] "ordered" "factor"
f(x)
## [1] b c
## Levels: a < b < c
Now consider the original text:
Turning now to methods invoked as a result of a call to NextMethod(),
these behave as if they had been called from the previous method with
a special call.
Here f calls f.ordered, which calls f.factor, so the method "invoked as a result of a call to NextMethod" is f.factor and the previous method is f.ordered.
The arguments in the call to the inherited method are the same in
number, order, and actual argument names as those in the call to the
current method (and, therefore, in the call to the generic). The
expressions for the arguments, however, are the names of the
corresponding formal arguments of the current method. Suppose, for
example, that the expression print(ratings) has invoked the method
print.ordered(). When this method invokes NextMethod(), this is
equivalent to a call to print.factor() of the form print.factor(x),
where x is here the x in the frame of print.ordered()
Now we switch perspectives: we are sitting in f.ordered, so now f.ordered is the current method and f.factor is the inherited method.
At the point where f.ordered invokes NextMethod(), a special call is constructed to call f.factor. Its arguments are the same as those passed to f.ordered (and hence to the generic f), except that they refer to the versions of the arguments in f.ordered, which makes a difference here because f.ordered changes the argument before invoking f.factor.
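Finally, a small sketch (a hypothetical generic h) of the point about arguments passed through "...": they are forwarded to the inherited method with their current values and with their names intact.
h <- function(x, ...) UseMethod("h")
h.ordered <- function(x, ...) NextMethod()
h.factor  <- function(x, ...) list(...)
h(ordered("a"), extra = 1, 2)
## $extra
## [1] 1
##
## [[2]]
## [1] 2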