slot assignment: `@` vs `slot()` vs `setReplaceMethod()`

I am writing my first R package and I am trying to figure out the best way to assign values to a slot in an S4 object, keeping in mind that end-users shouldn't have to fuss with the details of the S4 class structures being used. Which of the following is best?
Accessing the slot directly using object@MySlot <- value:
I understand that this is bad practice (e.g., this Q&A).
Using slot(object, "MySlot") <- value:
The R help says there is no checking when getting the values, but there is checking when setting (assuming check hasn't been set to FALSE). This sounds reasonable to me, and it strikes me as a nice way to do it because I don't have to code my own get/set methods as below.
Using custom methods with setReplaceMethod():
How does this approach compare with the second option above? It's more work to produce the necessary get/set methods, but it lets me be explicitly sure that the values being written to the slots are valid for that slot type.
setGeneric("MySlot", function(object) {
standardGeneric("MySlot")
})
setMethod("MySlot",
signature = "MyClass",
definition = function(object) {
return(object#MySlot)
})
setGeneric("MySlot<-",
function(object, value) {
standardGeneric("MySlot<-")
})
setReplaceMethod("MySlot",
signature="MyClass",
function(object, value) {
object#MySlot<- value
validObject(object) # could add other checks
return(object)
})
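For concreteness, here is a minimal sketch of how the three options behave side by side, assuming the accessors above are defined; the class definition and its validity rule are made up for illustration:

# Hypothetical class: one numeric slot plus a validity rule
setClass("MyClass",
  representation(MySlot = "numeric"),
  validity = function(object) {
    if (length(object@MySlot) == 1) TRUE else "MySlot must have length 1"
  })

obj <- new("MyClass", MySlot = 1)

obj@MySlot <- 2           # option 1: direct access, meant for internal code
slot(obj, "MySlot") <- 3  # option 2: checks the value against the slot's
                          #           declared class ("numeric")
MySlot(obj) <- 4          # option 3: the replace method above also runs
                          #           validObject(), so the validity rule
                          #           is enforced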

By definition, "not having to fuss with the details of the S4 class structure" means end-users should not have to know about your slots. As such, any wrappers you write as in options 2 and 3 are more for internal consistency checks. More important, I think, is to use unit tests to check the edge cases where you suspect your integrity checks could fail.
As you have pointed out, #1 can be ruled out fairly easily, and should only be used in internal methods. Whether you encourage #2 or implement #3 depends on the contents of the variable and personal taste, but I would encourage the latter. For example, if you have a logical flag, you can use #2, but more descriptive would be enableFoo(). That being said, if you have to consider developer time (which in real life is almost always true) you should think briefly about the trade-off between relegating mutators to slot<- for members that probably won't be accessed frequently (e.g., by less than 1% of users), versus implementing custom mutators for everything as in #3.
Finally, since all three of R's OOP systems are essentially syntactic sugar that isn't really respected semantically by the language (only S4 can claim some exception), it is easy to forget the fundamental idea behind object-oriented programming as implemented in other languages: any code executing outside of your methods should not know about the object's members. You are providing an external interface to the world that is and should be a black box.

Changing method dispatch in Common Lisp

I'm trying to simulate something akin to Haskell's typeclasses with Common Lisp's CLOS. That is, I'd like to be able to dispatch a method on an object's "typeclasses" instead of its superclasses.
I have a metaclass defined for classes which have and implement typeclasses (which are just other classes). Those classes (the ones that implement typeclasses) have a slot containing the list of the typeclasses they implement.
I'd like to be able to define methods for a typeclass, and then be able to dispatch that method on objects whose class implement that typeclass. And I'd like to be able to add and remove typeclasses dynamically.
I figure I could probably do this by changing the method dispatch algorithm, though that doesn't seem too simple.
Is anybody comfortable enough with CLOS and the MOP to give me some suggestions?
Thanks.
Edit: My question might be specified as: how do I implement compute-applicable-methods-using-classes and compute-applicable-methods for a "custom" generic-function class such that, if some of the specializers of a generic function's methods are typeclasses (classes whose metaclass is the typeclass class), then the corresponding argument's class must implement the typeclass (which simply means having the typeclass stored in a slot of the argument's class) for the method to be applicable?
From what I understand from the documentation, when a generic function is called, compute-discriminating-function is first called, which will first attempt to obtain applicable methods through compute-applicable-methods-using-classes, and if unsuccessful, will try the same with compute-applicable-methods.
While my definition of compute-applicable-methods-using-classes seems to work, the generic function fails to dispatch an applicable function. So the problem must be in compute-discriminating-function or compute-effective-method.
See code.
This is not easily achievable in Common Lisp.
In Common Lisp, operations (generic functions) are separate from types (classes), i.e. they're not "owned" by types. Their dispatch is done at runtime, with the possibility of adding, modifying and removing methods at runtime as well.
Usually, errors from missing methods are signaled only at runtime. The compiler has no way to know if a generic function is being "well" used or not.
The idiomatic way in Common Lisp is to use generic functions and describe their requirements; in other words, the closest thing to an interface in Common Lisp is a set of generic functions and a marker mixin class. More usually, though, only a protocol is specified, along with its dependencies on other protocols. See, for instance, the CLIM specification.
As for type classes, they are a key feature that keeps Haskell not only fully type-safe but also very extensible in that aspect. Otherwise, either the type system would be too strict, or the lack of expressiveness would lead to type-unsafe situations, at least from the compiler's point of view. Note that Haskell doesn't keep, or doesn't have to keep, object types at runtime; it does all type inference at compile time, much in contrast with idiomatic Common Lisp.
To have something similar to type classes in Common Lisp at runtime, you have a few choices.
Should you choose to support type classes with their rules, I suggest you use the meta-object protocol (a sketch follows these steps):
Define a new generic function meta-class (i.e. one which inherits from standard-generic-function)
Specialize compute-applicable-methods-using-classes to return false as a second value, because classes in Common Lisp are represented solely by their name; they're not "parameterizable" or "constrainable"
Specialize compute-applicable-methods to inspect the argument's meta-classes for types or rules, dispatch accordingly and possibly memoize results
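A minimal sketch of those three steps, assuming the closer-mop portability layer, only class (not eql) specializers, and two hypothetical helpers, typeclass-p and implements-typeclass-p, that inspect the metaclass and the typeclass slot; a complete version would also sort the applicable methods by specificity:

(defclass typeclass-generic-function (standard-generic-function) ()
  (:metaclass closer-mop:funcallable-standard-class))

(defmethod closer-mop:compute-applicable-methods-using-classes
    ((gf typeclass-generic-function) classes)
  (declare (ignore classes))
  ;; Return NIL with a false second value: classes alone cannot decide
  ;; applicability here, so CLOS must not memoize the result and falls
  ;; back to COMPUTE-APPLICABLE-METHODS on the actual arguments.
  (values nil nil))

(defmethod compute-applicable-methods
    ((gf typeclass-generic-function) arguments)
  ;; A method is applicable when each specializer is either an ordinary
  ;; superclass of the argument's class or a typeclass implemented by it.
  (remove-if-not
   (lambda (method)
     (every (lambda (spec arg)
              (let ((class (class-of arg)))
                (or (subtypep class spec)
                    (and (typeclass-p spec)                ; hypothetical
                         (implements-typeclass-p class spec)))))
            (closer-mop:method-specializers method)
            arguments))
   (closer-mop:generic-function-methods gf)))

;; Usage sketch:
;; (defgeneric frob (x)
;;   (:generic-function-class typeclass-generic-function))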
Should you choose to only have parameterizable types (e.g. templates, generics), an existing option is the Lisp Interface Library, where you pass around an object that implements a particular strategy using a protocol. However, I see this mostly as an implementation of the strategy pattern, or an explicit inversion of control, rather than actual parameterizable types.
For actual parameterizable types, you could define abstract unparameterized classes from which you'd intern concrete instances with funny names, e.g. lib1:collection<lib2:object>, where collection is the abstract class defined in the lib1 package, and lib2:object is simply part of the interned name of the concrete class.
The benefit of this last approach is that you could use these classes and names anywhere in CLOS.
The main disadvantage is that you must still generate concrete classes, so you'd probably have your own defmethod-like macro that would expand into code that uses a find-class-like function which knows how to do this. That breaks a significant part of the benefit I just mentioned, unless you follow the discipline of defining every concrete class in your own library before using it as a specializer.
Another disadvantage is that without further non-trivial plumbing, this is too static, not really generic, as it doesn't take into account that e.g. lib1:collection<lib2:subobject> could be a subclass of lib1:collection<lib2:object> or vice-versa. More generally, it doesn't take into account what is known in computer science as covariance and contravariance.
But you could implement it: lib:collection<in out> could represent the abstract class with one contravariant argument and one covariant argument. The hard part would be generating and maintaining the relationships between concrete classes, if at all possible.
In general, a compile-time approach would be more appropriate at the Lisp implementation level. Such Lisp would most probably not be a Common Lisp. One thing you could do is to have a Lisp-like syntax for Haskell. The full meta-circle of it would be to make it totally type-safe at the macro-expansion level, e.g. generating compile-time type errors for macros themselves instead of only for the code they generate.
EDIT: After your question's edit, I must say that compute-applicable-methods-using-classes must return nil as a second value whenever there is a type class specializer in a method. You can call-next-method otherwise.
This is different than there being a type class specializer in an applicable method. Remember that CLOS doesn't know anything about type classes, so by returning something from c-a-m-u-c with a true second value, you're saying it's OK to memoize (cache) given the class alone.
You must really specialize compute-applicable-methods for proper type class dispatching. If there is opportunity for memoization (caching), you must do so yourself here.
I believe you'll need to override compute-applicable-methods and/or compute-applicable-methods-using-classes which compute the list of methods that will be needed to implement a generic function call. You'll then likely need to override compute-effective-method which combines that list and a few other things into a function which can be called at runtime to perform the method call.
I really recommend reading The Art of the Metaobject Protocol (as was already mentioned) which goes into great detail about this. To summarize, however, assume you have a method foo defined on some classes (the classes need not be related in any way). Evaluating the lisp code (foo obj) calls the function returned by compute-effective-method which examines the arguments in order to determine which methods to call, and then calls them. The purpose of compute-effective-method is to eliminate as much of the run-time cost of this as is possible, by compiling the type tests into a case statement or other conditional. The Lisp runtime thus does not have to query for the list of all methods each time you make a method call, but only when you add, remove or change a method implementation. Usually all of that is done once at load time and then saved into your lisp image for even better performance while still allowing you to change these things without stopping the system.

Localizing global variables

When using the Extended Program Check, I get the following warning:
Do not declare fields and field symbols (variable name) globally.
This is from declaring global data before the selection screen. The obvious solution is that they should be declared locally in a subroutine.
If I decide to do this, the data will now be out of scope for the other subroutines, so I would end up creating something to the effect of a main() function from C or Java. This sounds like a good idea - however, events such as INITIALIZATION are not allowed to be inside of subroutines, meaning that it forces a break in scope.
Observe the sample program below:
REPORT Z_EXAMPLE.
SELECTION-SCREEN BEGIN OF BLOCK upload WITH FRAME TITLE text-H01.
PARAMETERS: p_infile TYPE rlgrap-filename LOWER CASE OBLIGATORY.
SELECTION-SCREEN END OF BLOCK upload.
AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_infile.
PERFORM main1 CHANGING p_infile.
INITIALIZATION.
PERFORM main2.
TOP-OF-PAGE.
PERFORM main3.
...
main1, main2, and main3 cannot, to my knowledge, pass any data to one another without global declarations. If the data is parsed from the uploaded file p_infile in main1, it cannot be accessed in main2 or main3. Aside from omitting events altogether, is there any way to abide by the warning but still let data be passed across events?
There are a variety of techniques - I prefer to code almost everything except for the basic selection screen handling in a separate controller class. The report simply defers to that class and calls its methods. Other than that - it's just a warning that you can ignore if you know what you're doing. Writing a program without any global variable at all will certainly not be practical - however, you should think at least twice before using global variables or attributes in a place where a method parameter would be more appropriate.
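A minimal sketch of that controller-class approach (class and method names invented for illustration): the parsed data lives inside the controller instance, so the only remaining global is the object reference itself:

REPORT z_example.

CLASS lcl_controller DEFINITION.
  PUBLIC SECTION.
    METHODS:
      parse_file IMPORTING iv_filename TYPE rlgrap-filename,
      show_top_of_page.
  PRIVATE SECTION.
    DATA mt_contents TYPE STANDARD TABLE OF string. " state shared via the instance
ENDCLASS.

CLASS lcl_controller IMPLEMENTATION.
  METHOD parse_file.
    " ... upload IV_FILENAME and fill MT_CONTENTS ...
  ENDMETHOD.
  METHOD show_top_of_page.
    " ... output based on MT_CONTENTS ...
  ENDMETHOD.
ENDCLASS.

PARAMETERS p_infile TYPE rlgrap-filename LOWER CASE OBLIGATORY.

DATA go_controller TYPE REF TO lcl_controller. " the one remaining global

INITIALIZATION.
  CREATE OBJECT go_controller.

AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_infile.
  go_controller->parse_file( p_infile ).

TOP-OF-PAGE.
  go_controller->show_top_of_page( ).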
As @vwegert so rightly said, it's almost impossible to write an ABAP program that doesn't have at least a few global variables (the selection screen and events enforce that, unfortunately).
One approach is to use a controller class, another is to have a main subroutine and have it call other subroutines as required, passing values as required. I tend to favour the latter approach in a lot of cases, if only because it's easier to split the subroutines into logical groupings in separate includes (doing so with classes can sometimes be a little ugly). It really is a matter of approach though, but the key thing is reducing global variables to a minimum - unfortunately too few ABAP developers that I've encountered care about such issues.
Update
@Christian has reminded me that as of ABAP AS 7.02, subroutines are considered obsolete:
Subroutines should no longer be created in new programs for the following reasons:
The parameter interface has clear weaknesses when compared with the parameter interface of methods, such as:
positional parameters instead of keyword parameters
no genuine input parameters in pass by reference
typing is optional
no optional parameters
Every subroutine implicitly belongs to the public interface of its program. Generally this is not desirable.
Calling subroutines externally is critical with regard to the assignment of the container program to a program group in the internal session. This assignment cannot generally be defined as static.
Those are all valid points and I think in light of that, using classes for modularisation is definitely the preferred approach (and from a purely aesthetic point of view, they also "fit" better with the syntax enhancements in 7.02 and later).

Rationale behind Ada encapsulation of dynamically dispatching operations (primitives)

In Ada, primitive operations of a type T can only be defined in the package where T is defined. For example, if a Vehicles package defines Car and Bike tagged records, both inheriting a common Vehicle abstract tagged type, then all operations that can dispatch on the class-wide Vehicle'Class type must be defined in this Vehicles package.
Let's say that you do not want to add primitive operations: you do not have permission to edit the source file, or you do not want to clutter the package with unrelated features.
Then, you cannot define operations in other packages that implicitly dispatch on the type Vehicle'Class.
For example, you may want to serialize vehicles (define a Vehicles_XML package with a To_Xml dispatching function) or display them as UI elements (define a Vehicles_GTK package with Get_Label, Get_Icon, ... dispatching functions), etc.
The only way to perform dynamic dispatch is to write the code explicitly; for example, inside Vehicle_XML:
if V in Car'Class then
   return Car_XML (Car (V));
elsif V in Bike'Class then
   return Bike_XML (Bike (V));
else
   raise Constraint_Error
     with "Vehicle_XML is only defined for Car and Bike.";
end if;
(And a Visitor pattern defined in Vehicles and used elsewhere would work, of course, but that still requires the same kind of explicit dispatching code. edit in fact, no, but there is still some boilerplate code to write)
My question is then:
is there a reason why operations dynamically dispatching on T are restricted to be defined in the defining package of T?
Is this intentional? Is there some historical reasons behind this?
Thanks
EDIT:
Thanks for the current answers: basically, it seems that it is a matter of language implementation (freezing rules/virtual tables).
I agree that compilers are developed incrementally over time and that not all features fit nicely into an existing tool.
As such, isolating dispatching operations in a unique package seems to be a decision guided more by existing implementations than by language design. Other languages outside of the C++/Java family provide dynamic dispatch without such a requirement (e.g. OCaml, Lisp (CLOS)); if that matters, those are also compiled languages, or more precisely, languages for which compilers exist.
When I asked this question, I wanted to know if there were more fundamental reasons, at the language specification level, behind this part of the Ada specification (otherwise, does it really mean that the specification assumes/enforces a particular implementation of dynamic dispatch?).
Ideally, I am looking for an authoritative source, like a rationale or guideline section in Reference Manuals, or any kind of archived discussion about this specific part of the language.
I can think of several reasons:
(1) Your example has Car and Bike defined in the same package, both derived from Vehicles. However, that's not the "normal" use case, in my experience; it's more common to define each derived type in its own package. (Which I think is close to how "classes" are used in other compiled languages.) And note also that it's not uncommon to define new derived types afterwards. That's one of the whole points of object-oriented programming, to facilitate reuse; and it's a good thing if, when designing a new feature, you can find some existing type that you can derive from, and reuse its features.
So suppose you have your Vehicles package that defines Vehicle, Car, and Bike. Now in some other package V2, you want to define a new dispatching operation on a Vehicle. For this to work, you have to provide the overriding operations for Car and Bike, with their bodies; and assuming you are not allowed to modify Vehicles, the language designers have to decide where the bodies of the new operation have to be. Presumably, you'd have to write them in V2. (One consequence is that the body you write in V2 would not have access to the private part of Vehicles, and therefore it couldn't access implementation details of Car or Bike; so you could only write the body of that operation in terms of already-defined operations.) So then the question is: does V2 need to provide operations for all types that are derived from Vehicle? What about types derived from Vehicle that don't become part of the final program (maybe they're derived to be used in someone else's project)? What about types derived from Vehicle that haven't yet been defined (see the preceding paragraph)? In theory, I suppose this could be made to work by checking everything at link time. However, that would be a major paradigm change for the language. It's not something that could be done easily. (It's pretty common, by the way, for programmers to think "it would be nice to add feature X to a language, and it shouldn't be too hard because X is simple to talk about", without realizing just what a vast impact such a "simple" feature would have.)
(2) A practical reason has to do with how dispatching is implemented. Typically, it's done with a vector of procedure/function pointers. (I don't know for sure what the exact implementation is in all cases, but I think this is basically the case for every Ada compiler as well as for C++ and Java compilers, and probably C#.) What this means is that when you define a tagged type (or a class, in other languages), the compiler will set up a vector of pointers, and based on how many operations are defined for the type, say N, it will reserve slots 1..N in the vector for the addresses of the subprograms. If a type is derived from that type and defines overriding subprograms, the derived type gets its own vector, where slots 1..N will be pointers to the actual overriding subprograms. Then, when calling a dispatching subprogram, a program can look up the address in some known slot index assigned to that subprogram, and it will jump to the correct address depending on the object's actual type. If a derived type defines new primitive subprograms, new slots are assigned N+1..N2, and types derived from that could define new subprograms that get slots N2+1..N3, and so on.
Adding new dispatching subprograms to Vehicle would interfere with this. Since new types have been derived from Vehicle, you can't insert a new area into the vector after N, because code has already been generated that assumes the slots starting at N+1 have been assigned to new operations defined for derived types. And since we may not know all the types that have been derived from Vehicle, and we don't know what other types will be derived from Vehicle in the future and how many new operations will be defined for them, it's hard to pick some other location in the vector that could be used for the new operations. Again, this could be done if all of the slot assignment were deferred until link time, but that would be a major paradigm change.
To be honest, I can think of other ways to make this work, by adding new operations not in the "main" dispatch vector but in an auxiliary one; dispatching would probably require a search for the correct vector (perhaps using an ID assigned to the package that defines the new operations). Also, adding interface types to Ada 2005 has already complicated the simple vector implementation somewhat. But I do think this (i.e. it doesn't fit into the model) is one reason why the ability to add new dispatching operations like you suggest isn't present in Ada (or in any other compiled language that I know of).
Without having checked the rationale for Ada 95 (where tagged types were introduced), I am pretty sure the freezing rules for tagged types are derived from the simple requirement that all objects in T'Class should have all the dispatching operations of type T.
To fulfill that requirement, you have to freeze the type and say that no more dispatching operations can be added to type T once you:
Derive a type from T, or
Are at the end of the package specification where T was declared.
If you didn't do that, you could have a type derived from type T (i.e. in T'Class) which hadn't inherited all the dispatching operations of type T. If you passed an object of that type as a T'Class parameter to a subprogram which knew of one more dispatching operation on type T, a call to that operation would have to fail; we wouldn't want that to happen.
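A small sketch of the freezing rule (names invented for illustration): deriving from a tagged type freezes it, after which the compiler rejects new primitive operations:

package Vehicles is
   type Vehicle is abstract tagged null record;

   procedure Describe (V : Vehicle);
   --  Legal: Vehicle is not frozen yet, so Describe becomes a
   --  primitive (dispatching) operation. (Body omitted in this sketch.)

   type Car is new Vehicle with null record;
   --  This derivation freezes Vehicle.

   --  procedure Refuel (V : Vehicle);
   --  Illegal if uncommented: Vehicle is already frozen, so no new
   --  primitive operations may be added, and Car could not inherit it.
end Vehicles;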
Answering your extended question:
Ada comes with a Reference Manual (the ISO standard), a Rationale, and an Annotated Reference Manual, and a large part of the discussions behind these documents is public as well.
For Ada 2012 see http://www.adaic.org/ada-resources/standards/ada12/
Tagged types (dynamic dispatching) was introduced in Ada 95. The documents related to that version of the standard can be found at http://www.adaic.org/ada-resources/standards/ada-95-documents/

Using generic functions of R, when and why?

I'm developing a major upgrade to the R package, and as part of the changes I want to start using S3 methods so I can use the generic plot, summary and print functions. But I'm not totally sure I understand why and when to use generic functions in general.
For example, I currently have a function called logLikSSM, which computes the log-likelihood of a state space model. Instead of using this function, I could make a function logLik.SSM or something like that, as there is a generic function logLik in R. The benefit of this would be that logLik is shorter to write than logLikSSM, but is there really any other point in doing so?
Similarly, there is a generic function called simulate in the stats package, so in theory I could use that instead of simulateSSM. But the description of the simulate function says that it is used to "Simulate Responses", whereas my function actually simulates the hidden states, so it really doesn't fit the description of the simulate function. So in this case I probably shouldn't use the generic function, right?
I apologize if this question is too vague for here.
The advantages of creating methods for generics from the core of R include:
Ease of Use. Users of your package already familiar with those generics will have less to remember, making your package easier to use. They might even be able to do a certain amount without reading the documentation. If you come up with your own names, they must discover and remember new names, which is an added cognitive burden.
Leverage Existing Functionality. Any other functions that make use of the generics you create methods for can then automatically use yours as well; otherwise, they would have to be changed. For example, AIC uses logLik.
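For instance, a minimal sketch of the logLik case from the question (the SSM class and its loglik/npar fields are assumptions for illustration):

# S3 method for the existing stats::logLik generic
logLik.SSM <- function(object, ...) {
  val <- object$loglik            # hypothetical stored log-likelihood
  attr(val, "df") <- object$npar  # number of parameters, expected by logLik
  class(val) <- "logLik"
  val
}

# Because AIC() is built on logLik(), it now works with SSM objects:
fit <- structure(list(loglik = -123.4, npar = 3), class = "SSM")
AIC(fit)
#> [1] 252.8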
A disadvantage is that the generic involves the extra level of dispatch and if logLik is in the inner loop of an optimization there could be an impact (although possibly not material). In that case you could check the performance of calling the generic vs. calling the method directly and use the latter if it makes a significant difference.
If your function has a completely different purpose than the generic in the core of R, a method might be more confusing than helpful, so in that case you might not create a method but keep your own function name.
You might want to read the zoo Design manual (see link to zoo Design under Vignettes near the bottom of that page) which discusses the design ideas that went into the zoo package. These include the idea being discussed here.
EDIT: Added disadvantages.
Good question.
I'll split your Question into two parts; here's the first one:
[I]s there really any other point in [making functions generic]?
Well, this pattern is usually invoked when the developer doesn't know the object class of every object a user might pass in to the method under consideration.
Because of this uncertainty, this design pattern (which is called overloading in many other languages) is invoked; it requires R to evaluate the object's class, then dispatch the object to the appropriate method for that type.
The second part of your Question: [i]n this case I shouldn't use [the generic function] right?
To give you an answer useful beyond the detail of your question, consider what happens to the original function when you call setGeneric, passing that function in.
The original function body is replaced with code that performs a top-level dispatch based on the type of the object passed in; the original body slides down one level and becomes the default method that the top-level (generic) function dispatches to.
showMethods() will let you see all of those methods which are called by the newly created dispatch function (generic function).
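A quick way to see this in action (the function name follows the question; the body is a stub, and the exact showMethods() output may vary by R version):

logLikSSM <- function(model) 0  # stub standing in for the real function
setGeneric("logLikSSM")         # the stub slides down to become the default method
showMethods("logLikSSM")
#> Function: logLikSSM (package .GlobalEnv)
#> model="ANY"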
And now for one huge disadvantage:
Ease of MISUse:
Users of your package already familiar with those generics might do a certain amount without reading the documentation.
And therein lies the fallacy that components, reusable objects, services, etc are an easy panacea for all software challenges.
And why the overwhelming majority of software is buggy, bloated, and operates inconsistently with little hope of tech support being able to diagnose your problem.
There WAS a reason for static linking and small executables back in the day. But this generation of code-now, get-paid-now, debug-later-if-ever (before the layoffs/IPO come) has no memory of the days when code actually worked very reliably and installation/integration didn't require $200/hr Big 4 consultants or hackers who spend a week trying to get some "simple" open source product installed and productively running.
But if you want to continue the tradition of writing ever shorter function/method names, be my guest.

What does "S3 methods" mean in R?

Since I am fairly new to R, I do not know what S3 methods and objects are. I found that there are S3 and S4 object systems, and some recommend using S3 over S4 if possible (see Google's R Style Guide at http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html). However, I do not know the exact definition of S3 methods/objects.
Update: As of 2019, Google's R Style Guide hyperlink is now here.
Most of the relevant information can be found by looking at ?S3 or ?UseMethod, but in a nutshell:
S3 refers to a scheme of method dispatching. If you've used R for a while, you'll notice that there are print, predict and summary methods for a lot of different kinds of objects.
In S3, this works by:
setting the class of objects of interest (e.g., the return value of a call to glm has class glm);
providing a method with the general name (e.g. print), then a dot, and then the class name (e.g. print.glm).
Some preparation has to have been done to this general name (print) for this to work, but if you're simply looking to conform to existing method names, you don't need this (see the help I referred to earlier if you do).
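Put together in a minimal sketch (the class and field names are made up for illustration):

fit <- list(coef = c(1.2, -0.7))
class(fit) <- "mymodel"              # set the class of the object

print.mymodel <- function(x, ...) {  # general name, dot, class name
  cat("mymodel with coefficients:", x$coef, "\n")
  invisible(x)
}

print(fit)  # dispatches to print.mymodel (autoprinting fit does the same)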
To the eye of the beholder, and particularly the user of your newly created funky model-fitting package, it is much more convenient to be able to type predict(myfit, type="class") than predict.mykindoffit(myfit, type="class").
There is quite a bit more to it, but this should get you started. There are quite a few disadvantages to this way of dispatching methods based upon an attribute (class) of objects (and C purists probably lie awake at night in horror of it), but for a lot of situations, it works decently. With the current version of R, newer ways have been implemented (S4 and reference classes), but most people still (only) use S3.
To get you started with S3, look at the code for the median function. Typing median at the command prompt reveals that it has one line in its body, namely
UseMethod("median")
That means that it is an S3 method. In other words, you can have a different median function for different S3 classes. To list all the possible median methods, type
methods(median) #actually not that interesting.
In this case, there's only one method, the default, which is called for anything. You can see the code for that by typing
median.default
A much more interesting example is the print function, which has many different methods.
methods(print) #very exciting
Notice that some of the methods have *s next to their name. That means that they are hidden inside some package's namespace. Use find to find out which package they are in. For example
find("acf") #it's in the stats package
stats:::print.acf
From http://adv-r.had.co.nz/OO-essentials.html:
R’s three OO systems differ in how classes and methods are defined:

S3 implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++ and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g. canvas.drawRect("blue"). S3 is different. While computations are still carried out via methods, a special type of function called a generic function decides which method to call, e.g., drawRect(canvas, "blue"). S3 is a very casual system. It has no formal definition of classes.

S4 works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.

Reference classes, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. $ is used to separate objects and methods, so method calls look like canvas$drawRect("blue"). RC objects are also mutable: they don’t use R’s usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.

There’s also one other system that’s not quite OO, but it’s important to mention here: base types, the internal C-level types that underlie the other OO systems. Base types are mostly manipulated using C code, but they’re important to know about because they provide the building blocks for the other OO systems.
I came to this question mostly wondering where the names came from. It appears from this wikipedia article that the name refers to the version of the S Programming Language that R is based on. The method dispatching schemes described in the other answers come from S and are labelled appropriately according to version.
Try
methods(residuals)
which lists, among others, "residuals.lm" and "residuals.glm". This means when you have fitted a linear model, m, and type residuals(m), residuals.lm will be called. When you have fitted a generalized linear model, residuals.glm will be called.
It's kind of the C++ object model turned upside down. In C++, you define a base class having virtual functions, which are overridden by derived classes.
In R you define a virtual (aka generic) function and then you decide which classes will override this function (aka define a method). Note that the classes doing this do not need to be derived from one common super class.
I would not agree to generally prefer S3 over S4. S4 has more formalism (= more typing), and this may be too much for some applications. S4 classes, however, can be defined like a class or struct in C++. You can specify that an object of a certain class is made up of a string and two numbers, for example:
setClass("myClass", representation(label = "character", x = "numeric", y = "numeric"))
Methods that are called with an object of that class can rely on the object having those members. That's very different from S3 classes, which are just a list of a bunch of elements.
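A short sketch of such a method, building on the class defined above (the generic name describe is invented for illustration):

setGeneric("describe", function(object) standardGeneric("describe"))

setMethod("describe", "myClass", function(object) {
  # The formal class definition guarantees these slots exist with the
  # declared types
  cat(object@label, "at (", object@x, ",", object@y, ")\n")
})

describe(new("myClass", label = "point", x = 1, y = 2))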
With S3 and S4, you call a member function by fun(object, args) and not by object$fun(args). If you are looking for something like the latter, have a look at the proto package.
Here is an updated fast rundown of the numerous R object systems according to "Advanced R, 2nd edition" (CRC Press, 2019) by Hadley Wickham (Chief Scientist at RStudio), which has a web representation here, based on the chapter about Object-Oriented Programming.
The first edition from 2015 has a web representation here, with the corresponding chapter on OO here.
Approaches to OO systems
Hadley defines the following to distinguish two distinct approaches to OO programming:
Functional OOP: methods (callable code pieces) belong to generic functions (not to be confused with Java/C# generic methods). Think of the methods as being located in a global lookup table. The method to execute is found by the runtime system based on the name of the function and the type (or object class) of one or more arguments passed to that function (this is called "method dispatch"). Syntax-wise, method calls may look like ordinary function calls: myfunc(object, arg1, arg2). This call would lead the runtime to look for the method associated to the pair ("myfunc", typeof(object)) or possibly ("myfunc", typeof(object), typeof(arg1), typeof(arg2)) if the language supports that. In R's S3, the full name of the generic function gives the (function-name, class) pair. For example: mean.Date is the method to compute the mean of Dates. Try methods("mean") to list the generic methods with function name mean. The Functional OOP approach is found for example in the OO pioneer Smalltalk, the Common Lisp Object System and Julia. Hadley notes that "Compared to R, Julia’s implementation is fully developed and extremely performant."
Encapsulated OOP: methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (fields) and behaviour (methods). Think of the method as being located in a lookup table attached to the object or the object's class description. The runtime looks the method up based on method name and possibly the type of one or more arguments. This is the approach found in "popular" OO languages like C++, Java, C#.
In both cases, if inheritance is supported (it probably is), the runtime may traverse the class hierarchy upwards until it has found a match for the call lookup key.
How to find out what system an R object belongs to
library(sloop) # formerly, "pryr"
otype(mtcars)
#> [1] "S3"
The R object systems
S3
Functional OOP approach.
Most important system according to Hadley.
Simplest, most common. First OO system used by R.
Comes with base R, used throughout base R.
Relies on conventions rather than enforced guarantees.
See Chambers, John M, and Trevor J Hastie. 1992. "Statistical Models in S." Wadsworth & Brooks/Cole Advanced Books & Software.
Details in "Advanced R, 2nd edition" here.
S4
Functional OOP approach.
Third most important system according to Hadley.
Rewrite of S3, therefore similar to S3, but more formal and more strict: it forces you to think carefully about program design. Suited for building large systems (e.g. the Bioconductor project).
Implemented in the base "methods" package.
See: Chambers, John M. 1998. "Programming with Data: A Guide to the S Language." Springer.
Details in "Advanced R, 2nd edition" here.
RC aka "Reference Classes"
Encapsulated OOP approach.
Comes with base R.
Based on S4.
RC objects are a special type of S4 object that is also "mutable", i.e. instead of using R's usual copy-on-modify semantics, they can be modified in place. Note that mutable state is hard to reason about and a source of ugly bugs, but can lead to more efficient code in certain applications.
R6
Encapsulated OOP approach.
Second most important system according to Hadley.
Can be found in the R6 package (load it with library(R6))
Similar to RC, but lighter & much faster: it does not depend on S4 or the methods package. Built on top of R environments. Also has:
public and private methods
active bindings (fields, that, when accessed, actually call a method)
class inheritance which works across packages
both class methods (code that belongs to the class and can access an instance via self, private, super) and member functions (functions assigned to fields, which are not methods, just functions)
Provides a standardised way to escape R's "copy-on-modify" semantics
See the package site: "R6: Encapsulated object-oriented programming for R".
Details in "Advanced R, 2nd edition" here.
Others
There are others, like R.oo (similar to RC), proto (prototype-based, think JavaScript) and Mutatr. However, "Advanced R" says:
Apart from R6, which is widely used, these systems are primarily of theoretical interest. They do have their strengths, but few R users know and understand them, so it is hard for others to read and contribute to your code.
Be sure to read the chapter on trade-offs in "Advanced R, 2nd edition", too.
