Method dispatch when mixing S3 and S4 in R

I'd like to understand the steps R goes through to find the appropriate function when mixing S3 and S4. Here's an example:
set.seed(1)
d <- data.frame(a=rep(c('a', 'b'), each=15),
                b=rep(c('x', 'y', 'z'), times=5),
                y=rnorm(30))
m <- lme4::lmer(y ~ b + (1|a), data=d)
l <- lsmeans::lsmeans(m, 'b')
multcomp::cld(l)
I don't fully understand what happens when the final line gets executed.
multcomp::cld prints UseMethod("cld"), so S3 method dispatch.
isS4(l) shows that l is an S4 class object.
It seems that, despite calling an S3 generic, the S3 dispatch system is completely ignored. Creating a function print.lsmobj <- function(obj) print('S3') (since class(l) is lsmobj) and running cld(l) does not print "S3".
showMethods(lsmobj) or showMethods(ref.grid) (the super class), do not list anything that resembles a cld function.
Using debugonce(multcomp::cld) shows that the function that is called eventually is cld.ref.grid from lsmeans.
I was wondering, however, how to realise that cld.ref.grid will eventually be called without any "tricks" like debugonce. That is, what are the steps R performs to get to cld.ref.grid.

For methods() to list an S3 method, the corresponding generic has to be available. Here, I write a simple foo method for merMod objects:
> library(lme4)
> foo.merMod = function(object, ...) { "foo" }
> showMethods(class = "merMod")
Function ".DollarNames":
<not an S4 generic function>
Function "complete":
<not an S4 generic function>
Function "formals<-":
<not an S4 generic function>
Function "functions":
<not an S4 generic function>
Function: getL (package lme4)
x="merMod"
Function "prompt":
<not an S4 generic function>
Function: show (package methods)
object="merMod"
> methods(class = "merMod")
[1] anova as.function coef confint cooks.distance
[6] deviance df.residual drop1 extractAIC family
[11] fitted fixef formula getL getME
[16] hatvalues influence isGLMM isLMM isNLMM
[21] isREML logLik model.frame model.matrix ngrps
[26] nobs plot predict print profile
[31] ranef refit refitML rePCA residuals
[36] rstudent show sigma simulate summary
[41] terms update VarCorr vcov weights
Neither list includes foo. But if we define the generic, then it shows up in methods() results:
> foo = function(object, ...) UseMethod("foo")
> methods(class = "merMod")
[1] anova as.function coef confint cooks.distance
[6] deviance df.residual drop1 extractAIC family
[11] fitted fixef foo formula getL
[16] getME hatvalues influence isGLMM isLMM
[21] isNLMM isREML logLik model.frame model.matrix
[26] ngrps nobs plot predict print
[31] profile ranef refit refitML rePCA
[36] residuals rstudent show sigma simulate
[41] summary terms update VarCorr vcov
[46] weights
Now the list includes foo.
Similarly, in your example, methods() will reveal the existence of cld if you do library(multcomp), because that is where the generic for cld sits.
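To predict the chosen method without debugonce, one option is to walk the object's class vector and ask for each candidate method in turn. The sketch below assumes a hypothetical helper named find_s3_method (not part of any package) and uses an lm fit so that it runs with base R only. It ignores implicit classes such as "matrix" or "integer", so treat it as an approximation of what UseMethod does rather than a faithful reimplementation.

```r
# Hypothetical helper (not part of any package): report which S3 method
# UseMethod() would be expected to pick for `object`.
find_s3_method <- function(generic, object) {
  # S4 instances are dispatched on their S4 inheritance chain,
  # ordinary objects on class().
  classes <- if (isS4(object)) methods::is(object) else class(object)
  for (cl in c(classes, "default")) {
    if (!is.null(utils::getS3method(generic, cl, optional = TRUE))) {
      return(paste(generic, cl, sep = "."))
    }
  }
  NULL  # no S3 method found
}

fit <- lm(dist ~ speed, data = cars)
summary_method <- find_s3_method("summary", fit)  # "summary.lm"
aic_method <- find_s3_method("AIC", fit)          # "AIC.default"
```

With multcomp and lsmeans attached, find_s3_method("cld", l) would be expected to point at the ref.grid method, since lsmobj extends ref.grid.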

The older R documentation (pre-2016) used to contain more details than the current documentation, but roughly speaking the process is as follows, in descending order of priority:
1) if the function is a standard S4 generic and any of the arguments in the signature are S4 (according to isS4), then the best S4 method is chosen according to the usual rules.
2) if the function is a nonstandard S4 generic then its body is executed, which at some point then calls S4 dispatch itself.
3) if the function is an S3 generic function then S3 dispatch takes place on the first argument (except for internal generic binary operators).
4) if the function isn't a generic at all, then it is evaluated in the usual way with lazy evaluation for all its arguments.
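Step 3 is the one that applies to the cld example: an S3 generic called on an S4 object still performs S3 dispatch, but it walks the object's S4 inheritance chain. The following is a minimal sketch with made-up names (Base, Derived, describe are all hypothetical), not the lsmeans code itself.

```r
library(methods)

# Hypothetical S4 classes: Derived extends Base, like lsmobj extends ref.grid.
setClass("Base", representation(x = "numeric"))
setClass("Derived", contains = "Base")

# An ordinary S3 generic and a method for the *parent* class only.
describe <- function(object, ...) UseMethod("describe")
describe.Base <- function(object, ...) "describe.Base called"

obj <- new("Derived", x = 1)
is_s4 <- isS4(obj)       # TRUE
chain <- is(obj)         # "Derived" "Base"
result <- describe(obj)  # S3 dispatch falls through to describe.Base
```

No describe.Derived exists, so UseMethod moves on to the next class in the chain, which is how a call on an lsmobj can end up in a ref.grid method.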
Note the following from the help page for setGeneric:
"Functions that dispatch S3 methods by calling UseMethod are ordinary functions, not objects from the "genericFunction" class. They are made generic like any other function, but some special considerations apply to ensure that S4 and S3 method dispatch is consistent (see Methods_for_S3)."

Related

What is the purpose of stepfun in ecdf()?

R> ecdf
function (x)
{
    x <- sort(x)
    n <- length(x)
    if (n < 1)
        stop("'x' must have 1 or more non-missing values")
    vals <- unique(x)
    rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n,
        method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
    class(rval) <- c("ecdf", "stepfun", class(rval))
    assign("nobs", n, envir = environment(rval))
    attr(rval, "call") <- sys.call()
    rval
}
The above is the code of ecdf(). I see that the return value is assigned with class stepfun. I don't understand what it is for. Given approxfun() does linear interpolation, why is stepfun needed? What is the purpose of adding stepfun to the class?
In the call to approxfun() in ecdf(), the method is "constant", which means it isn't doing linear interpolation, it's generating a step function.
To find out what the class on the result affects, you can use the methods() function:
methods(class="stepfun")
#> [1] knots lines plot print summary
#> see '?methods' for accessing help and source code
Created on 2022-02-12 by the reprex package (v2.0.1.9000)
So potentially if you call any of the knots(), lines(), plot(), print() or summary() functions on the result you'll get behaviour tailored to step functions. However, the class of the result is computed as c("ecdf", "stepfun", class(rval)), so there might be "ecdf" methods that override the "stepfun" methods:
methods(class="ecdf")
#> [1] plot print quantile summary
#> see '?methods' for accessing help and source code
Yes, the plot(), print() and summary() functions will call the "ecdf" methods in preference to the "stepfun" methods. That still leaves knots() and lines(), and conceivably the others could call NextMethod() to get to the "stepfun" methods.
One clarification: the method argument to approxfun() is just the name of an argument; in the discussion above, "methods" was used to refer to one of the ways R does object oriented programming, using methods and classes.
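A small check of the points above, using a made-up four-value sample: the result carries both classes, it behaves as a step function between data points, and knots() reaches the "stepfun" method because no "ecdf" method for it is listed.

```r
Fn <- ecdf(c(3, 1, 2, 2))   # toy sample, chosen only for illustration

cls <- class(Fn)            # "ecdf" "stepfun" "function"
kn <- knots(Fn)             # 1 2 3, the unique sorted data values
at_2 <- Fn(2)               # 0.75: three of four values are <= 2
between <- Fn(2.5)          # still 0.75: constant until the next jump
```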

find ALL applicable methods in R

For looking up methods for a particular class, there is methods() in R, e.g.
> methods(class="lm")
[1] add1 alias anova case.names coerce
[6] confint cooks.distance deviance dfbeta dfbetas
[11] drop1 dummy.coef effects extractAIC family
[16] formula hatvalues influence initialize kappa
[21] labels logLik model.frame model.matrix nobs
[26] plot predict print proj qr
[31] residuals rstandard rstudent show simulate
[36] slotsFromS3 summary variable.names vcov
Unfortunately this does not list all applicable methods: for instance AIC is missing in the above list, and I guess there are yet many more. From the AIC documentation, it can be concluded that it is applicable because it asks for a logLik method, but this can not be concluded from the output of methods.
Is there some way to find out which methods an object accepts?
Meanwhile I have figured out why AIC cannot be found as a method of lm, even though it is applicable. Instead of deleting my question, I here provide an answer in case others might find it helpful, too.
AIC is not an S4 "generic" method, but uses the S3 object mechanism which is based on naming conventions (resolved by UseMethod):
> # not an S4 method:
> isGeneric("AIC")
[1] FALSE
> # S3 methods that looks up with UseMethod():
> AIC
function (object, ..., k = 2)
UseMethod("AIC")
> methods(AIC)
[1] AIC.default* AIC.logLik*
This means that AIC calls AIC.logLik if called on an object of class logLik, but in all other cases it calls AIC.default, which tries to call logLik on the given object:
# internal method (*) => ':::' operator required to list code
> stats:::AIC.default
function (object, ..., k = 2)
{
    ll <- if (isNamespaceLoaded("stats4"))
        stats4::logLik
    else logLik
    if (!missing(...)) {
        # [...]
    }
    else {
        lls <- ll(object)
        -2 * as.numeric(lls) + k * attr(lls, "df")
    }
}
As this is hidden in the implementation of AIC.default and is not apparent from the naming convention, it is impossible for methods(class="lm") to figure out that AIC is a method of lm.
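The same indirection can be seen with a toy class. In the sketch below, "toyfit" and its fields are invented for illustration; the only method defined is logLik.toyfit, yet AIC() works on the object because AIC.default calls logLik internally.

```r
# Hypothetical class "toyfit": a list holding a log-likelihood and its df.
fit <- structure(list(loglik = -10, df = 2, n = 20), class = "toyfit")

# The only method we provide; there is no AIC.toyfit.
logLik.toyfit <- function(object, ...) {
  structure(object$loglik, df = object$df, nobs = object$n, class = "logLik")
}

aic <- AIC(fit)  # -2 * (-10) + 2 * 2 = 24, computed by AIC.default
```

methods(class = "toyfit") would list logLik but not AIC, even though AIC is applicable.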

Methods of summary() function

I'm using summary() on the output of the mle() function from stats4; its output belongs to class mle. I would like to find out how summary() estimates the standard deviation of the coefficients returned by mle(), but I do not see summary.mle in the list printed by methods(summary). Why can't I find a summary.mle() function?
(I guess the proper function is summary.mlm(), but I'm not sure that and don't know why it would be mlm, instead of mle)
What you are looking for is the S4 counterpart of what summary.mle would be if it were an S3 method. S3 methods are created and dispatched using the generic_function_name.class_of_first_argument naming mechanism, whereas S4 methods are dispatched on their argument "signature", which allows second and later arguments to be considered. Here is how to get showMethods to display the code that runs when an S4 method is called. This is an instance where only the first argument is used in the signature. You can pass any of the object signatures that appear in the abbreviated output as the classes argument, and it is the includeDefs flag that prompts display of the code:
showMethods("summary",classes="mle", includeDefs=TRUE)
#---(output to console)----
Function: summary (package base)
object="mle"
function (object, ...)
{
    cmat <- cbind(Estimate = object@coef, `Std. Error` = sqrt(diag(object@vcov)))
    m2logL <- 2 * object@min
    new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
}
As shown in
>library(stats4)
>showMethods("summary")
Function: summary (package base)
object="ANY"
object="mle"
summary is dispatched the S4 way here. I don't know how to view the code from within R, so I searched the source of stats4 directly.
In stats4/R/mle.R, there is:
setMethod("summary", "mle", function(object, ...){
    cmat <- cbind(Estimate = object@coef,
                  `Std. Error` = sqrt(diag(object@vcov)))
    m2logL <- 2*object@min
    new("summary.mle", call = object@call, coef = cmat, m2logL = m2logL)
})
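So it creates an S4 object of class summary.mle, with the standard errors taken as the square roots of the diagonal of the vcov slot. You should be able to trace the rest of the code yourself from here.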
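The method can also be retrieved from within R using getMethod(). The sketch below fits a toy normal-mean model (the data and the nll function are invented for illustration) and checks that the reported standard error matches sqrt(diag(vcov)), as the method body suggests.

```r
library(stats4)

# Retrieve the S4 method directly; printing `m` shows its source.
has_method <- existsMethod("summary", "mle")
m <- getMethod("summary", "mle")

# Toy example: estimate the mean of a normal sample with known sd = 1.
set.seed(1)
x <- rnorm(50, mean = 3)
nll <- function(mu = 1) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
fit <- mle(nll)

s <- summary(fit)
se_from_summary <- unname(s@coef[, "Std. Error"])
se_from_vcov <- unname(sqrt(diag(fit@vcov)))
```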

Random Forest in R

If x is a Random Forest in R, for example,
x <- cforest (y~ a+b+c, data = football),
what does x[[9]] mean?
You can't subset this object, so in some sense, x[[9]] is nothing, it is not accessible as such.
x is an object of the S4 class "RandomForest". This class is documented on the help page ?'RandomForest-class', where the slots of the object are named and described. You can also get the slot names via slotNames():
library("party")
foo <- cforest(ME ~ ., data = mammoexp, control = cforest_unbiased(ntree = 50))
> slotNames(foo)
[1] "ensemble" "where" "weights"
[4] "initweights" "data" "responses"
[7] "cond_distr_response" "predict_response" "prediction_weights"
[10] "get_where" "update"
If by x[[9]] you meant the 9th slot, then that is prediction_weights, and ?'RandomForest-class' tells us that this is
‘prediction_weights’: a function for extracting weights from
terminal nodes.
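The same behaviour can be reproduced without the party package. The sketch below uses an invented S4 class, ToyForest, to show that [[ fails on a plain S4 object while slots can be reached by name, or by position via slotNames().

```r
library(methods)

# Hypothetical stand-in for a RandomForest-like S4 class.
setClass("ToyForest",
         representation(ensemble = "list", weights = "numeric",
                        update = "function"))
tf <- new("ToyForest", ensemble = list(), weights = c(1, 1),
          update = function(w) w)

# [[ is not defined for this class, so positional subsetting is an error.
subset_attempt <- try(tf[[2]], silent = TRUE)

slots <- slotNames(tf)            # "ensemble" "weights" "update"
second <- slot(tf, slots[2])      # the 2nd slot, looked up by its name
same <- identical(second, tf@weights)
```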

Extract formula from model in R

I'm building a function for many model types which needs to extract the formula used to make the model. Is there a flexible way to do this? For example:
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)
What I need to do is extract the formula object "equation" once the function has been passed the model.
You could get what you wanted by:
model$call
# lm(formula = equation)
To see how I found that, inspect the structure of the object:
str(model)
Since you passed the formula as a variable (equation) from the calling environment, the call only records its name, so you might then need to evaluate that element of the call:
eval(model$call[[2]])
# z ~ x + y
@JPMac offered a more compact method: formula(model). It's also worth looking at the mechanism used by the formula.lm function. The function named formula is generic, and you can use methods(formula) to see what S3 methods have been defined. Since the formula.lm method has an asterisk at its end, you need to wrap it in getAnywhere:
> getAnywhere(formula.lm)
A single object matching ‘formula.lm’ was found
It was found in the following places
registered S3 method for formula from namespace stats
namespace:stats
with value
function (x, ...)
{
    form <- x$formula
    if (!is.null(form)) {
        form <- formula(x$terms)
        environment(form) <- environment(x$formula)
        form
    }
    else formula(x$terms)
}
<bytecode: 0x36ff26158>
<environment: namespace:stats>
So it is using "$" to look for the list item named "formula" rather than pulling the formula from the call. Either way the result is built from formula(x$terms); when a $formula item exists, only its environment is carried over. If the $formula item is missing (which it is in your case), the method simply returns formula(x$terms), which I suspect dispatches to another formula method that appears to do little more than adjust the attributes and environment of the terms object.
As noted, model$call will get you the call that created the lm object, but if that call contains an object itself as the model formula, you get the object name, not the formula.
The evaluated object, i.e. the formula itself, can be accessed in model$terms (along with a bunch of auxiliary information on how it was treated). This should work regardless of the details of the call to lm.
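The difference between the recorded call and the evaluated formula can be checked side by side. This sketch reuses the variable names from the question; the seed is added only to make it reproducible.

```r
set.seed(42)
x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
equation <- z ~ x + y
model <- lm(equation)

# The call stores the *name* of the variable that held the formula.
call_arg <- deparse(model$call$formula)    # "equation"

# formula() and the terms component give the evaluated formula.
from_generic <- deparse(formula(model))         # "z ~ x + y"
from_terms <- deparse(formula(model$terms))     # "z ~ x + y"
```

Because formula() is generic, the same call is a reasonable first attempt for other model classes too, although whether a method exists depends on the package that defines the class.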

Resources