`UseMethod()` vs `inherits()` to determine an object's class in R - r

If I need to treat R objects in different ways according to their class, I can either use if and else within a single function:
foo <- function (x) {
if (inherits(x, 'list')) {
# Foo the list
} else if (inherits(x, 'numeric')) {
# Foo the numeric
} else {
# Throw an error
}
}
Or I can define a method:
foo <- function (x) UseMethod('foo')
foo.list <- function (x) {
# Foo the list
}
foo.numeric <- function (x) {
# Foo the numeric
}
What are the advantages to each approach? Are there performance implications?

OK, there is some background to be covered to answer this question (in my view)...
Within R, the class of an object is explicit in situations where you have user-defined object structures or an object such as a factor vector or data frame where other attributes play an important part in the handling of the object itself—for example, level labels of a factor vector, or variable names in a data frame, are modifiable attributes that play a primary role in accessing the observations of each object.
Note, however, that elementary R objects such as vectors, matrices, and arrays, are implicitly classed, which means the class is not identified with the attributes function. Whether implicit or explicit, the class of a given object can always be retrieved using the attribute-specific function class.
When a generic function foo is applied to an object with class attribute c("first", "second"), the system searches for a function called foo.first and, if it finds it, applies it to the object. If no such function is found, a function called foo.second is tried. If no class name produces a suitable function, the function foo.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
The function class prints the vector of names of classes an object inherits from.
class <- sets the classes an object inherits from.
inherits() indicates whether its first argument inherits from any of the classes specified in the what argument. Method dispatch takes place based on the class of the first argument to the generic function. If which is TRUE then an integer vector of the same length as what is returned. Each element indicates the position in the class(x) matched by the element of what; zero indicates no match. If which is FALSE then TRUE is returned by inherits if any of the names in what match with any class.
All but inherits() are primitive functions.
Considerations
OK, so let us now consider your examples in reverse order...
foo <- function (x) UseMethod('foo')
foo.list <- function (x) {
# Foo the list
}
foo.numeric <- function (x) {
# Foo the numeric
}
now if we use the function methods()
methods(foo)
[1] foo.list foo.numeric
see '?methods' for accessing help and source code
> getS3method('foo','list')
function (x) {
# Foo the list
}
thus we have a class foo and two associated methods foo.list and foo.numeric. Thus, we now know that class foo, has methods to support list and numeric operations.
OK, now let's consider your first example...
function (x) {
if (inherits(x, 'list')) {
# Foo the list
print(paste0("List: ", x))
} else if (inherits(x, 'numeric')) {
# Foo the numeric
print(paste0("Numeric: ", x))
} else {
# Throw an error
print(paste0("Unhandled - Sorry!"))
}
}
the problem is that this is not an s3 class, it is an R function. If you run methods() against foo it returns "no methods found"
> methods(foo)
no methods found
> getS3method('foo','list')
Error in getS3method("foo", "list") : no function 'foo' could be found
so what is happening in the second example? The inherits() operation is matching the class of the parameter. inherits() -> Method dispatch takes place based on the class of the first argument to the generic function.
So your first example is simply looking up the class of the function argument x, no S3 class is created or exists.
What are the advantages to each approach? Are there performance implications?
OK, I am biased here but an object’s class is one of the most useful attributes for describing an entity in R. Every object you create is identified, either implicitly or explicitly, with at least one class. R is an object-oriented programming language, meaning entities are stored as objects and have methods that act upon them.
So the second approach is the way to go in my opinion. Why? Because you are truly using the language construct as intended. The first approach where you use inherits() explicitly feels like a hack. Readability is key to comprehension from my personal perspective, thus I worry that a person reading the first example might be led to ask the question "Why did they (the programmer) take said approach, what am I missing?". My concern then is that complexity is to be avoided as it can impede code comprehension. Thus, keep it simple is advantageous to code comprehension.
In reference to code performance, an if-else parser is generally going to be faster than an object lookup model though a lookup model is not equivalent to a class mapping process so I feel the performance question is tricky to answer in this context. Why? The two approaches are different.
I hope the above points you in the right direction. Stay safe, good karma flying your way.
A couple of Book recommendations here:
R Inferno by Patrick Burns
Advanced R by Hadley Wickham
R for Everyone: Advanced Analytics and Graphics

Related

What is the recommended pattern to design a function that can implement multiple algorithms for S4 class in R?

I have an object of class S4 "MyOb" and a generic function "MyFun". I would like to implement multiple different algorithms for MyFun to process MyOb and be able to select the algorithm I want by specifying the "type" in the generic function. Type would be an argument of MyFun and would be just a character (string): "Algo1", "Algo2"...
However, each algorithm would require different arguments. I have started has indicated in the code below but then I am not sure how to continue, should I have a switch in the setMethod function that redirect to other separate functions ?
setGeneric("MyFun", function(x, type, ...) standardGeneric("MyFun"))
setMethod("MyFun", c("MyOb", "character"), function(x, type, ...){
switch()??? #to Algo1, Algo2, ....
})
Algo1<-function(x, M, N){ #blabla }
Algo2<-function(x, F, G, H){ #blablabla }
Ideally, I like to end up with something like the function baseline in the baseline R package, with
MyFun.Algo1, MyFun.Algo2 being the different function and MyFun the generic one...
I have been looking for this type of pattern but could not find any tutorial...
Any hint, advice, recommendation would be appreciated!
Thank you very much!
Firstly, you probably want to only have x as the signature of your function (you don't want a different method based on the class of type, for example). So you should start with
setGeneric("MyFun", function(x, type, ...) standardGeneric("MyFun"), signature="x")
(you don't even have to have type amongst the arguments to the generic if you don't want to — it depends whether it would be used for other classes of input.)
If you need different other arguments for later algorithms, that is fine. The ... sorts this out for you. So if you have your two algorithms
Algo1<-function(x, M, N){ #blabla }
Algo2<-function(x, F, G, H){ #blablabla }
then these will get called correctly when you call
MyFun(x,type="Algo1",M=1,N=2) ## dispatched Algo1 with M=1 and N=2
MyFun(x,type="Algo2",F=3,G=4,H=-2.7) ## dispatches Algo2 with F, G and H
The recommended way to write the MyFun method is as follows (you were right with your intuition to use switch):
setMethod("MyFun",signature(x="MyObj"), function(x,type=c("Algo1","Algo2"),...){
type <- match.arg(type)
switch(type,
Algo1=Algo1(x,...),
Algo2=Algo2(x,...),
stop("unknown algorithm")
)
})
It would probably be wise to make sure that Algo1 and Algo2 do some argument checking to make sure they are receiving the arguments they expect. This is good programming practice in general, but perhaps more important here.
If you haven't come across match.arg before, it's the recommended way of ensuring an argument matches one of a defined set of values. It uses the default argument as the list of allowed values.

Read the inputs of a user-defined function in Julia?

I am defining a function that will take a new function from a user and do some "maths". I am clearly blank about this user defined function because I don't know what his/he inputs to his function are. I would need to know these user inputs later in the "maths" that my function is doing.
This is my function, for example,
function userfunction(x,y)
return x+y
end
Just for example, I am clearly unaware of the inputs x and y in the above user-defined function. I use this function to do some maths in my function but I need to be aware that x and y are the inputs of the userfunction.
function myfun(userfunction::Function)
....do so maths...
#but some maths here need to know the inputs of "userfunction"
end
Is there a function in Julia that can read the inputs of another function? Like for instance, read the inputs of "userfunction" and return an array?
I'm not sure exactly what you're looking for, but if you want to know what kinds of arguments your userfunction takes, you can get that.
First, remember that userfunction can have multiple method, each with different arguments (yay, multiple dispatch!). To get an array of them all, do:
meths = collect(methods(div))
Then, for each method, you can look up its signature:
signature = meths[1].sig, which will give you a type Tuple{typeof(div), Int, Int}, for example.
Then, discard the first element of the signature to get the arguments:
arguments = Tuple(signature.parameters[2:end]), to get the argument types: (Int, Int).
That's pretty complicated, and perhaps there's an easier way.

Lazy evaluation for S4 method arguments

I'm implementing an S4 class that contains a data.table, and attempting to implement [ subsetting of the object (as described here) such that it also subsets the data.table. For example (defining just i subsetting):
library(data.table)
.SuperDataTable <- setClass("SuperDataTable", representation(dt="data.table"))
setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
function(x, i, j, ..., drop=TRUE)
{
initialize(x, dt=x#dt[i])
})
d = data.table(a=1:4, b=rep(c("x", "y"), each=2))
s = new("SuperDataTable", dt=d)
At this point, subsetting with a numeric vector (s[1:2]) works as desired (it subsets the data.table in the slot). However, I'd like to add the ability to subset using an expression. This works for the data.table itself:
s#dt[b == "x"]
# a b
# 1: 1 x
# 2: 2 x
But not for the S4 [ method:
s[b == "x"]
# Error: object 'b' not found
The problem appears to be that arguments in the signature of the S4 method are not evaluated using R's traditional lazy evaluation- see here:
All arguments in the signature of the generic function will be
evaluated when the function is called, rather than using the
traditional lazy evaluation rules of S. Therefore, it's important to
exclude from the signature any arguments that need to be dealt with
symbolically (such as the first argument to function substitute).
This explains why it doesn't work, but not how one can implement this kind of subsetting, since i and j are included in the signature of the generic. Is there any way to have the i argument not be evaluated immediately?
You may be out of luck on this one. From the R developer notes,
Arguments appearing in the signature of the generic will be evaluated as soon as the generic function
is called; therefore, any arguments that need to take advantage of lazy evaluation must not be in
the signature. These are typically arguments treated literally, often via the substitute() function.
For example, if one wanted to turn substitute() itself into a generic, the first argument, expr,
would not be in the signature since it must not be evaluated but rather treated as a literal.
Furthermore, due to method caching,
All the arguments in the full signature are evaluated as described above, not just the active
ones. Otherwise, in special circumstances the behavior of the function could change for one
method when another method was cached, definitely undesirable.
I would follow the example from the data.table package writers and use an S3 object (see line 304 of R/data.table.R in their source code). Your S3 object can still create and manipulate an S4 object underneath to maintain the semi-static typing feature.
We can't get extraordinarily clever:
‘[’ is a primitive function; methods can be defined, but the generic function is implicit, and cannot be changed.
Defining both an S3 and S4 method will dispatch the S3 method, which makes it seem like we should be able to route around the S4 call and dispatch it manually, but unfortunately the argument evaluation still occurs! You can get close by borrowing plyr::., which would give you syntax like:
s <- new('SuperDataTable', dt = as.data.table(iris))
s[.(Sepal.Length > 4), 2]
Not ideal, but closer than anything else.

If function(x) can work, why would we need function()?

I understand how "function(x)" works, but what is the role of "function()" here?
z <- function() {
y <- 2
function(x) {
x + y
}
}
function is a keyword which is part of the creation of a function (in the programming sense that Gilles describes in his answer). The other parts are the argument list (in parentheses) and the function body (in braces).
In your example, z is a function which takes no arguments. It returns a function which takes 1 argument (named x) (since R returns the last evaluated statement as the return value by default). That function returns its argument x plus 2.
When z is called (with no arguments: z()) it assigns 2 to y (inside the functions variable scope, an additional concept that I'm not going to get into). Then it creates a function (without a name) which takes a single argument named x, which, when itself called, returns its argument x plus 2. That anonymous function is returned from the call to z and, presumably, stored so that it can be called later.
See https://github.com/hadley/devtools/wiki/Functions and https://github.com/hadley/devtools/wiki/Functionals for more discussion on passing around functions as objects.
The word “function” means somewhat different things in mathematics and in programming. In mathematics, a function is a correspondence between each possible value of the parameters and a result. In programming, a function is a sequence of instructions to compute the result from the parameters.
In mathematics, a function with no argument is a constant. In programming, this is not the case, because functions can have side effects, such as printing something. So you will encounter many functions with no arguments in programs.
Hre the function function(x) { x + y } depends on the variable y. There are no side effects, so this function is very much like the mathematical function defined by $f(x) = x + y$. However, this definition is only complete for a given value of y. The previous instruction sets y to 2, so
function() {
y <- 2
function(x) {
x + y
}
}
is equivalent to
function () {
function(x) {
x + 2
}
}
in the sense that both definitions produce the same results when applied to the same value. They are, however, computed in slightly different ways.
That function is given the name z. When you call z (with no argument, so you write z()), this builds the function function (x) { x + 2 }, or something equivalent: z() is a function of one argument that adds 2 to its argument. So you can write something like z()(3) — the result is 5.
This is obviously a toy example. As you progress in your lectures, you'll see progressively more complex examples where such function building is mixed with other features to achieve useful things.
With some help I've picked out a few examples of functions without formal arguments to help you understand why they could be useful.
Functions which have side-effects
plot.new() for instance, initializes a graphics device.
Want to update the console buffer? flush.console() has your back.
Functions which have a narrow purpose
This is probably the majority of the cases.
Want to know the date/time? Call date().
Want to know the version of R? Call getRversion().

How to specify FUN used in by( ) or related apply( ) functions

In a by() function, I will use cor (correlation) to be the FUN there. However, I'd like to setup use="complete.obs" too.
I don't know how to pass this argument in the FUN = cor part.
For example,
by(data, INDICES=list(data$Age), FUN=cor)
probably
by(data, INDICES=list(data$Age), FUN=cor, use = "complete.obs")
will work.
the arguments to by are passed to FUN.
If you start looking around at various R help files for functions like by, you may start to notice a curious 'argument' popping up over and over again: .... You're going to see an ellipsis listed along with all the other arguments to a function.
This is actually an argument itself. It will collect any other arguments you pass and hand them off to subsequent functions called later. The documentation will usually tell you what function these arguments will be handed to.
In this case, in ?by we see this:
... further arguments to FUN.
This means that any other arguments you pass to by that don't match the ones listed will be handed off to the function you pass to FUN.
Another common instance can be found in plot, where the documentation only lists two specific arguments, x and y. Then there's the ... which gathers up anything else you pass to plot and hands it off to methods or to par to set graphical parameter settings.
So in #kohske's example, use = "complete.obs" will be automatically passed on the cor, since it doesn't match any of the other arguments for by.
#kohske and #joran give equivalent answers showing built in features of by (which are also present in apply and the entire plyr family) for passing additional arguments to the supplied function since this is a common application/problem. #Tomas also shows another way to specify an anonymous function which is just a function that calls the "real" function with certain parameters fixed. Fixing parameters to a function call (to effectively make a function with fewer arguments) is a common approach, especially in functional approaches to programming; in that context it is called currying or partial application.
library("functional")
by(data, INDICES=list(data$Age), FUN=Curry(cor, use = "complete.obs"))
This approach can be used when one function does not use ... to "pass along" arguments, and you want to indicate the only reason that an anonymous function is needed is to specify certain arguments.
In general, you have 2 possibilities:
1) specify the arguments in the calling function (tapply() or by() in this case). This also works even if the key argument to fun() is not the first one:
fun <- function(arg1, arg2, arg3) { ... } # just to see how fun() looks like
tapply(var1, var2, fun, arg1 = something, arg3 = something2)
# arg2 will be filled by tapply
2) you may write your wrapper function (sometimes this is needed):
tapply(var1, var2, function (x) { fun(something, x, something2) })

Resources