I have seen %||% used within the Seurat package (e.g. line 1662) and was wondering what this expression means.
You can define custom operators in R. Their names can be pretty much arbitrary, but they need to be delimited by %…%.
%||% is such an operator. It wasn't predefined in base R before version 4.4.0 (which added it), but you can define it yourself, and Seurat did that, in R/utilities.R.
Its definition is, however, a common one and can be found in many packages, not just Seurat. Its semantics are effectively this:
`%||%` <- function (lhs, rhs) {
  if (is.null(lhs)) rhs else lhs
}
That is: use the first operand, unless that is NULL. In that case, use the second operand.
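A quick sketch of how the operator behaves. The definition is repeated so the snippet runs on its own (on R 4.4.0 or later it simply masks the identical base version). Because R evaluates arguments lazily, the right-hand side is only evaluated when it is actually needed. The opts list and its fields are made up for illustration.

```r
# NULL-coalescing operator: lhs unless it is NULL, otherwise rhs
`%||%` <- function(lhs, rhs) {
  if (is.null(lhs)) rhs else lhs
}

opts <- list(verbose = TRUE)            # hypothetical options list
verbose <- opts$verbose %||% FALSE      # TRUE: the element exists
seed    <- opts$seed %||% 42            # 42: opts$seed is NULL

# rhs is a lazy promise, so it is never evaluated when lhs is not NULL
first <- 1 %||% stop("never evaluated")

# Only NULL triggers the fallback; NA and FALSE are kept as they are
NA %||% 0                               # NA
```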
If I need to treat R objects in different ways according to their class, I can either use if and else within a single function:
foo <- function (x) {
  if (inherits(x, 'list')) {
    # Foo the list
  } else if (inherits(x, 'numeric')) {
    # Foo the numeric
  } else {
    # Throw an error
  }
}
Or I can define a method:
foo <- function (x) UseMethod('foo')
foo.list <- function (x) {
  # Foo the list
}
foo.numeric <- function (x) {
  # Foo the numeric
}
What are the advantages to each approach? Are there performance implications?
OK, there is some background to be covered to answer this question (in my view)...
Within R, the class of an object is explicit in situations where you have user-defined object structures or an object such as a factor vector or data frame where other attributes play an important part in the handling of the object itself—for example, level labels of a factor vector, or variable names in a data frame, are modifiable attributes that play a primary role in accessing the observations of each object.
Note, however, that elementary R objects such as vectors, matrices, and arrays are implicitly classed, which means the class does not appear in the output of attributes(). Whether implicit or explicit, the class of a given object can always be retrieved with the function class().
When a generic function foo is applied to an object with class attribute c("first", "second"), the system searches for a function called foo.first and, if it finds it, applies it to the object. If no such function is found, a function called foo.second is tried. If no class name produces a suitable function, the function foo.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
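The dispatch order described above can be seen in a small sketch; the generic describe() and the class names "first" and "second" are invented for the example.

```r
describe <- function(x) UseMethod("describe")
describe.second  <- function(x) "describe.second"
describe.default <- function(x) "describe.default"

obj <- structure(list(), class = c("first", "second"))
describe(obj)   # "describe.second": there is no describe.first, so the next class is tried
describe(1:3)   # "describe.default": no method exists for the implicit classes of 1:3
```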
Method dispatch takes place based on the class of the first argument to the generic function. The relevant helper functions are:
The function class returns the vector of names of classes an object inherits from.
class<- (used as class(x) <- value) sets the classes an object inherits from.
inherits(x, what, which = FALSE) indicates whether its first argument inherits from any of the classes specified in the what argument. If which is TRUE, an integer vector of the same length as what is returned; each element gives the position in class(x) matched by the corresponding element of what, with zero indicating no match. If which is FALSE, inherits returns TRUE if any of the names in what match any class.
All but inherits() are primitive functions.
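The following snippet exercises class(), class<- and inherits() as described above; the class names are again made up.

```r
x <- 1:3
class(x)         # "integer": an implicit class
attributes(x)    # NULL: no class attribute is stored

class(x) <- c("first", "second")                 # class<- sets an explicit class
class(x)                                         # "first" "second"
inherits(x, "second")                            # TRUE
inherits(x, c("second", "third"), which = TRUE)  # 2 0
```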
Considerations
OK, so let us now consider your examples in reverse order...
foo <- function (x) UseMethod('foo')
foo.list <- function (x) {
  # Foo the list
}
foo.numeric <- function (x) {
  # Foo the numeric
}
Now, if we use the function methods():
> methods(foo)
[1] foo.list    foo.numeric
see '?methods' for accessing help and source code
> getS3method('foo','list')
function (x) {
  # Foo the list
}
So we have a generic function foo with two associated methods, foo.list and foo.numeric; that is, foo knows how to handle objects of class list and numeric.
OK, now let's consider your first example...
foo <- function (x) {
  if (inherits(x, 'list')) {
    # Foo the list
    print(paste0("List: ", x))
  } else if (inherits(x, 'numeric')) {
    # Foo the numeric
    print(paste0("Numeric: ", x))
  } else {
    # Unhandled class
    print("Unhandled - Sorry!")
  }
}
The difference is that this foo is an ordinary R function, not an S3 generic, so there are no methods to find. If you run methods() against foo, it returns "no methods found":
> methods(foo)
no methods found
> getS3method('foo','list')
Error in getS3method("foo", "list") : no function 'foo' could be found
So what is happening in the first example? inherits() simply tests whether the class of the argument x matches the given name, and the if/else chain picks a branch. No method dispatch takes place, and no S3 generic or methods are created.
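One practical consequence of the first example only testing class(x): it does not see implicit classes the way S3 dispatch does. For an integer vector, class(1L) is just "integer", so inherits(1L, "numeric") is FALSE, while UseMethod() also tries the implicit class "numeric". A sketch with the hypothetical names foo_if and foo_s3:

```r
foo_if <- function(x) {
  if (inherits(x, "list")) "list"
  else if (inherits(x, "numeric")) "numeric"
  else "unhandled"
}

foo_s3 <- function(x) UseMethod("foo_s3")
foo_s3.list    <- function(x) "list"
foo_s3.numeric <- function(x) "numeric"
foo_s3.default <- function(x) "unhandled"

foo_if(1.5)  # "numeric"
foo_s3(1.5)  # "numeric"
foo_if(1L)   # "unhandled": class(1L) is "integer", so inherits(1L, "numeric") is FALSE
foo_s3(1L)   # "numeric": dispatch also tries the implicit class "numeric"
```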
What are the advantages to each approach? Are there performance implications?
OK, I am biased here but an object’s class is one of the most useful attributes for describing an entity in R. Every object you create is identified, either implicitly or explicitly, with at least one class. R is an object-oriented programming language, meaning entities are stored as objects and have methods that act upon them.
So the second approach is the way to go, in my opinion. Why? Because it uses the language construct as intended, whereas calling inherits() explicitly feels like a hack. Readability is key to comprehension, and I worry that a person reading the first example might ask "Why did the programmer take this approach? What am I missing?" Unnecessary complexity impedes comprehension, so keeping it simple pays off. The method-based version is also open to extension: support for another class (say, foo.data.frame) can be added later, even from another package, without editing foo itself.
In terms of performance, an if/else chain avoids the method lookup that UseMethod() performs, so it may be marginally faster; however, S3 dispatch overhead is usually small, and the two approaches do different things (a class test versus method dispatch), so a direct performance comparison is tricky.
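If you want to measure this on your own machine, a rough sketch like the following times many calls to each style. g_if and g_s3 are hypothetical stand-ins for the two versions of foo; timings vary by machine and R version, so no numbers are shown here. For serious benchmarking, a dedicated package such as microbenchmark or bench is a better tool.

```r
g_if <- function(x) {
  if (inherits(x, "list")) "list"
  else if (inherits(x, "numeric")) "numeric"
  else "unhandled"
}
g_s3 <- function(x) UseMethod("g_s3")
g_s3.list    <- function(x) "list"
g_s3.numeric <- function(x) "numeric"
g_s3.default <- function(x) "unhandled"

n <- 1e5
t_if <- system.time(for (k in seq_len(n)) g_if(1.5))   # class test in one function
t_s3 <- system.time(for (k in seq_len(n)) g_s3(1.5))   # S3 dispatch
print(c(if_else = t_if[["elapsed"]], s3 = t_s3[["elapsed"]]))
```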
I hope the above points you in the right direction. Stay safe, good karma flying your way.
A couple of book recommendations:
The R Inferno by Patrick Burns
Advanced R by Hadley Wickham
R for Everyone: Advanced Analytics and Graphics by Jared P. Lander
I have an S4 object of class "MyOb" and a generic function "MyFun". I would like to implement several different algorithms for MyFun to process MyOb and be able to select the algorithm I want by specifying the "type" in the call to the generic. type would be an argument of MyFun and would be just a character string: "Algo1", "Algo2", ...
However, each algorithm requires different arguments. I have started as shown in the code below, but I am not sure how to continue: should I have a switch in the setMethod function that redirects to other separate functions?
setGeneric("MyFun", function(x, type, ...) standardGeneric("MyFun"))
setMethod("MyFun", c("MyOb", "character"), function(x, type, ...){
switch()??? #to Algo1, Algo2, ....
})
Algo1 <- function(x, M, N) { NULL }      # blabla
Algo2 <- function(x, F, G, H) { NULL }   # blablabla
Ideally, I'd like to end up with something like the function baseline in the baseline R package, with MyFun.Algo1, MyFun.Algo2 being the different functions and MyFun the generic one...
I have been looking for this type of pattern but could not find any tutorial...
Any hint, advice, recommendation would be appreciated!
Thank you very much!
Firstly, you probably want to only have x as the signature of your function (you don't want a different method based on the class of type, for example). So you should start with
setGeneric("MyFun", function(x, type, ...) standardGeneric("MyFun"), signature="x")
(you don't even have to have type amongst the arguments to the generic if you don't want to — it depends whether it would be used for other classes of input.)
If later algorithms need different arguments, that is fine: the ... takes care of this for you. So if you have your two algorithms
Algo1 <- function(x, M, N) { NULL }      # blabla
Algo2 <- function(x, F, G, H) { NULL }   # blablabla
then these will get called correctly when you call
MyFun(x, type="Algo1", M=1, N=2)           ## dispatches Algo1 with M=1 and N=2
MyFun(x, type="Algo2", F=3, G=4, H=-2.7)   ## dispatches Algo2 with F, G and H
The recommended way to write the MyFun method is as follows (you were right with your intuition to use switch):
setMethod("MyFun", signature(x="MyOb"), function(x, type=c("Algo1","Algo2"), ...) {
  type <- match.arg(type)
  switch(type,
    Algo1 = Algo1(x, ...),
    Algo2 = Algo2(x, ...),
    stop("unknown algorithm")
  )
})
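Putting the pieces together, here is a minimal runnable sketch. The slot name values and the bodies of Algo1 and Algo2 are invented placeholders (the question only gives their argument lists); they do some simple argument checking and compute something trivial so the dispatch can be observed. The stop() fallback is kept as a guard even though match.arg() already rejects unknown names.

```r
library(methods)

# Hypothetical stand-in for the asker's class
setClass("MyOb", slots = c(values = "numeric"))

# Placeholder algorithms: each checks and uses its own extra arguments
Algo1 <- function(x, M, N) {
  stopifnot(is.numeric(M), is.numeric(N))
  sum(x@values) * M + N
}
Algo2 <- function(x, F, G, H) {
  stopifnot(is.numeric(F), is.numeric(G), is.numeric(H))
  mean(x@values) * F - G + H
}

# Dispatch on x only; type and the algorithm arguments are passed through
setGeneric("MyFun", function(x, type, ...) standardGeneric("MyFun"),
           signature = "x")

setMethod("MyFun", signature(x = "MyOb"),
  function(x, type = c("Algo1", "Algo2"), ...) {
    type <- match.arg(type)          # validate against the allowed names
    switch(type,
      Algo1 = Algo1(x, ...),
      Algo2 = Algo2(x, ...),
      stop("unknown algorithm")      # unreachable after match.arg(), kept as a guard
    )
  })

obj <- new("MyOb", values = c(1, 2, 3))
MyFun(obj, type = "Algo1", M = 2, N = 1)           # 6 * 2 + 1 = 13
MyFun(obj, type = "Algo2", F = 3, G = 4, H = -2)   # 2 * 3 - 4 - 2 = 0
```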
It would probably be wise to make sure that Algo1 and Algo2 do some argument checking to make sure they are receiving the arguments they expect. This is good programming practice in general, but perhaps more important here.
If you haven't come across match.arg before, it's the recommended way of ensuring an argument matches one of a defined set of values. It uses the default argument as the list of allowed values.
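A tiny illustration of match.arg() on its own; pick() and its choices are hypothetical.

```r
# The default vector doubles as the list of allowed values
pick <- function(method = c("fast", "accurate", "robust")) {
  match.arg(method)
}

pick()             # "fast": with no argument, the first choice is returned
pick("accurate")   # "accurate"
pick("rob")        # "robust": unique partial matches are accepted
# pick("slow")     # error: not one of the allowed values
```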
Consider the following R code:
y1 <- dataset %>% dplyr::filter(W == 1)
This works, but there seems to be some magic here. Usually, when we have an expression like foo(bar), we should be able to do this:
baz <- bar
foo(baz)
However, in the presented code snippet, we cannot evaluate W == 1 outside of dplyr::filter()! W is not a defined variable.
What's going on?
dplyr uses a concept called Non-standard Evaluation (NSE) to make columns from the data frame argument accessible to its functions without quoting or using dataframe$column syntax. Basically:
[Non-standard evaluation] is a catch-all term that means they don't follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way. [1]
In this case, the custom evaluation captures the argument(s) given to dplyr::filter() and evaluates them with the columns of the data frame in scope, so W refers to dataset$W. You can't take W and use it elsewhere because that scope exists only inside the function call.
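The core idea can be imitated in base R with substitute() and eval(): capture the unevaluated expression, then evaluate it with the data frame as the environment so that column names resolve to columns. my_filter() is a hypothetical toy, not how dplyr is actually implemented (dplyr builds on rlang's tidy evaluation).

```r
# Hypothetical toy version of a data-masking filter
my_filter <- function(df, cond) {
  expr <- substitute(cond)                                 # the unevaluated expression, e.g. W == 1
  keep <- eval(expr, envir = df, enclos = parent.frame())  # columns first, then the caller's variables
  df[!is.na(keep) & keep, , drop = FALSE]
}

dataset <- data.frame(W = c(1, 0, 1), v = c("a", "b", "c"))
my_filter(dataset, W == 1)   # rows 1 and 3
exists("W")                  # FALSE in a fresh session: W only lives inside the data frame
```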
NSE makes a trade-off: functions that evaluate their arguments in a custom scope are convenient interactively but less safe, or even unusable, when you are programming, i.e. writing functions that call other functions:
This is an example of the general tension between functions that are designed for interactive use and functions that are safe to program with. A function that uses substitute() might reduce typing, but it can be difficult to call from another function. [2]
For example, if you wanted to write a function which would use the same code, but swap out W == 1 for W == 0 (or some completely different filter), NSE would make that more difficult to accomplish.
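Base R's subset() uses the same kind of NSE, so it can illustrate the problem without extra packages. A naive wrapper that passes its condition straight through fails: subset() only sees the symbol cond, and forcing it evaluates W == 1 in the caller's environment, where W does not exist. wrapper and wrapper2 are hypothetical names. In dplyr itself, the tidy-evaluation answer is to "embrace" the argument, e.g. function(data, cond) dplyr::filter(data, {{ cond }}).

```r
df <- data.frame(W = c(1, 0, 1), v = 1:3)
subset(df, W == 1)            # works: subset() evaluates W == 1 inside df

# Naive wrapper: subset() sees only the symbol `cond`; forcing it evaluates
# W == 1 in the global environment, where W is not defined
wrapper <- function(data, cond) subset(data, cond)
res <- try(wrapper(df, W == 1), silent = TRUE)
inherits(res, "try-error")    # TRUE

# One base-R fix: capture the expression yourself and evaluate it in the data
wrapper2 <- function(data, cond) {
  keep <- eval(substitute(cond), data, parent.frame())
  data[!is.na(keep) & keep, , drop = FALSE]
}
wrapper2(df, W == 1)          # rows 1 and 3
```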
In 2017 the tidyverse started to build a solution to this in tidy evaluation.
I'm implementing an S4 class that contains a data.table, and attempting to implement [ subsetting of the object (as described here) such that it also subsets the data.table. For example (defining just i subsetting):
library(data.table)
.SuperDataTable <- setClass("SuperDataTable", representation(dt="data.table"))
setMethod("[", c("SuperDataTable", "ANY", "missing", "ANY"),
function(x, i, j, ..., drop=TRUE)
{
initialize(x, dt=x@dt[i])
})
d = data.table(a=1:4, b=rep(c("x", "y"), each=2))
s = new("SuperDataTable", dt=d)
At this point, subsetting with a numeric vector (s[1:2]) works as desired (it subsets the data.table in the slot). However, I'd like to add the ability to subset using an expression. This works for the data.table itself:
s@dt[b == "x"]
# a b
# 1: 1 x
# 2: 2 x
But not for the S4 [ method:
s[b == "x"]
# Error: object 'b' not found
The problem appears to be that arguments in the signature of an S4 method are not evaluated using R's traditional lazy evaluation; see here:
All arguments in the signature of the generic function will be evaluated when the function is called, rather than using the traditional lazy evaluation rules of S. Therefore, it's important to exclude from the signature any arguments that need to be dealt with symbolically (such as the first argument to function substitute).
This explains why it doesn't work, but not how one can implement this kind of subsetting, since i and j are included in the signature of the generic. Is there any way to have the i argument not be evaluated immediately?
You may be out of luck on this one. From the R developer notes,
Arguments appearing in the signature of the generic will be evaluated as soon as the generic function is called; therefore, any arguments that need to take advantage of lazy evaluation must not be in the signature. These are typically arguments treated literally, often via the substitute() function. For example, if one wanted to turn substitute() itself into a generic, the first argument, expr, would not be in the signature since it must not be evaluated but rather treated as a literal.
Furthermore, due to method caching,
All the arguments in the full signature are evaluated as described above, not just the active ones. Otherwise, in special circumstances the behavior of the function could change for one method when another method was cached, definitely undesirable.
I would follow the example from the data.table package writers and use an S3 object (see line 304 of R/data.table.R in their source code). Your S3 object can still create and manipulate an S4 object underneath to maintain the semi-static typing feature.
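A minimal sketch of why the S3 route works: when the primitive [ dispatches to an S3 method, the index arguments arrive unevaluated, so the method can substitute() them and evaluate them against the columns. To keep the sketch dependency-free, it wraps a plain data.frame in a hypothetical S3 class superdf instead of holding a data.table inside an S4 object.

```r
# Hypothetical S3 class wrapping a data.frame (stand-in for a data.table)
superdf <- function(df) structure(list(df = df), class = "superdf")

`[.superdf` <- function(x, i) {
  e <- substitute(i)                                      # still unevaluated here
  rows <- eval(e, envir = x$df, enclos = parent.frame())  # columns first, then the caller
  superdf(x$df[rows, , drop = FALSE])
}

s <- superdf(data.frame(a = 1:4, b = rep(c("x", "y"), each = 2)))
s[b == "x"]$df   # expression evaluated against the columns
s[1:2]$df        # plain indices still work
```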
We can't get extraordinarily clever:
‘[’ is a primitive function; methods can be defined, but the generic function is implicit, and cannot be changed.
Defining both an S3 and S4 method will dispatch the S3 method, which makes it seem like we should be able to route around the S4 call and dispatch it manually, but unfortunately the argument evaluation still occurs! You can get close by borrowing plyr::., which would give you syntax like:
s <- new('SuperDataTable', dt = as.data.table(iris))
s[.(Sepal.Length > 4), 2]
Not ideal, but closer than anything else.
In a call to by(), I use cor (correlation) as FUN. However, I'd also like to set use="complete.obs".
I don't know how to pass this argument through the FUN = cor part.
For example,
by(data, INDICES=list(data$Age), FUN=cor)
Probably
by(data, INDICES=list(data$Age), FUN=cor, use = "complete.obs")
will work:
extra arguments to by() that don't match its own are passed on to FUN.
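A self-contained check with a small made-up data set containing one NA in y. Only the columns x and y are passed to by() here, since including Age, which is constant within each group, would give cor() a zero standard deviation and NA results.

```r
data <- data.frame(
  Age = rep(c(30, 40), each = 5),
  x   = c(1, 2, 3, 4, 5,  2, 4, 6, 8, 10),
  y   = c(2, 4, NA, 8, 10, 1, 2, 3, 4, 5)
)

# use = "complete.obs" is not an argument of by(), so by() forwards it to cor()
res <- by(data[, c("x", "y")], INDICES = list(data$Age), FUN = cor,
          use = "complete.obs")
res
```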
If you start looking around at various R help files for functions like by, you may start to notice a curious 'argument' popping up over and over again: .... You're going to see an ellipsis listed along with all the other arguments to a function.
This is actually an argument itself. It will collect any other arguments you pass and hand them off to subsequent functions called later. The documentation will usually tell you what function these arguments will be handed to.
In this case, in ?by we see this:
... further arguments to FUN.
This means that any other arguments you pass to by that don't match the ones listed will be handed off to the function you pass to FUN.
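The same mechanism is easy to use in your own functions. In this sketch, apply_to_groups() is a hypothetical helper: anything that doesn't match x, groups or FUN lands in ... and is forwarded to FUN.

```r
apply_to_groups <- function(x, groups, FUN, ...) {
  lapply(split(x, groups), FUN, ...)   # ... is handed on to FUN for each group
}

apply_to_groups(c(1, NA, 3, 4), c("a", "a", "b", "b"), mean, na.rm = TRUE)
# $a is 1 and $b is 3.5; without na.rm = TRUE, $a would be NA
```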
Another common instance can be found in plot, where the documentation only lists two specific arguments, x and y. Then there's the ... which gathers up anything else you pass to plot and hands it off to methods or to par to set graphical parameter settings.
So in @kohske's example, use = "complete.obs" is automatically passed on to cor, since it doesn't match any of the other arguments of by.
@kohske and @joran give equivalent answers showing built-in features of by (which are also present in apply and the entire plyr family) for passing additional arguments to the supplied function, since this is a common application/problem. @Tomas also shows another way: specify an anonymous function that just calls the "real" function with certain parameters fixed. Fixing parameters of a function call (to effectively make a function with fewer arguments) is a common approach, especially in functional programming, where it is called currying or partial application.
library("functional")
by(data, INDICES=list(data$Age), FUN=Curry(cor, use = "complete.obs"))
This approach can be used when one function does not use ... to "pass along" arguments, and you want to indicate the only reason that an anonymous function is needed is to specify certain arguments.
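If you'd rather not add a package dependency, partial application is a few lines of base R. partial() below is a hypothetical helper in the spirit of functional::Curry (purrr::partial is similar); the resulting cor_complete could be passed as FUN = cor_complete to by().

```r
partial <- function(f, ...) {
  fixed <- list(...)                               # arguments fixed now
  function(...) do.call(f, c(list(...), fixed))    # remaining arguments supplied later
}

cor_complete <- partial(cor, use = "complete.obs")

x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)
cor(x, y)           # NA
cor_complete(x, y)  # 1
```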
In general, you have 2 possibilities:
1) specify the arguments in the calling function (tapply() or by() in this case). This also works even if the key argument to fun() is not the first one:
fun <- function(arg1, arg2, arg3) { ... } # just to show what fun() looks like
tapply(var1, var2, fun, arg1 = something, arg3 = something2)
# arg2 will be filled by tapply
2) you can write your own wrapper function (sometimes this is needed):
tapply(var1, var2, function (x) { fun(something, x, something2) })
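Both possibilities side by side, with a concrete (hypothetical) fun() whose data argument is the second one; var1 and var2 are made-up inputs.

```r
fun <- function(arg1, arg2, arg3) arg1 * sum(arg2) + arg3

var1 <- c(1, 2, 3, 4)
var2 <- c("a", "a", "b", "b")

# 1) Name the other arguments; each group's values fill the remaining arg2
r1 <- tapply(var1, var2, fun, arg1 = 10, arg3 = 1)

# 2) Anonymous wrapper function
r2 <- tapply(var1, var2, function(x) fun(10, x, 1))

r1   # a: 10 * 3 + 1 = 31, b: 10 * 7 + 1 = 71
```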