Can we combine S3 flexibility with S4 representation checking? - r

I'm looking for a method to validate S3 objects in my package Momocs.
Earlier versions of the package were written using S4; I then shifted back to S3 for the sake of flexibility, because users were more comfortable with S3, because I do not really need multiple inheritance, etc. The main cost of this change was losing S4 representation / validity checking.
My problem is this: how can we prevent someone from inadvertently invalidating an S3 object, for instance when extending existing methods or manipulating the object's structure?
I have already written a validate function but, so far, I only call it before crucial steps, typically those that convert an object from one class to another.
My question is:
Am I trying to have my cake and eat it (S3 flexibility and S4 representation checking)? In that case, would I need to call my validate function in every method of my package?
Or is there a smarter way on top of S3, something like "any time we do something to an object of a particular class, call a validate function on it"?

The easiest thing would be to write a validation function for each class and pass objects through it before S3 method dispatch or within each class's method. Here's an example with a simple validation function called check_example_class for an object of class "example_class":
check_example_class <- function(x) {
stopifnot(length(x) == 2)
stopifnot("a" %in% names(x))
stopifnot("b" %in% names(x))
stopifnot(is.numeric(x$a))
stopifnot(is.character(x$b))
NULL
}
print.example_class <- function(x, ...) {
check_example_class(x)
cat("Example class object where b =", x$b, "\n")
invisible(x)
}
# an object of the class
good <- structure(list(a = 1, b = "foo"), class = "example_class")
# an object that pretends to be of the class
bad <- structure(1, class = "example_class")
print(good) # works
## Example class object where b = foo
print(bad) # fails
## Error: length(x) == 2 is not TRUE
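One way to get closer to "validate on every call" without editing each method body is to wrap methods when they are defined. The following is only a minimal sketch: with_validation and the generic describe are hypothetical names introduced for illustration (they are not part of Momocs or base R), and the validator is a compact variant of check_example_class.

```r
# Hypothetical helper (not from any package): returns a new function that
# runs `validator` on its first argument before calling `method`.
with_validation <- function(validator, method) {
  force(validator)
  force(method)
  function(x, ...) {
    validator(x)
    method(x, ...)
  }
}

# Compact validator for the example class; stops with an error if invalid.
check_example_class <- function(x) {
  stopifnot(
    is.list(x),
    length(x) == 2,
    all(c("a", "b") %in% names(x)),
    is.numeric(x$a),
    is.character(x$b)
  )
  invisible(x)
}

# Hypothetical generic used only for this illustration.
describe <- function(x, ...) UseMethod("describe")

# The method body contains no validation code; the wrapper adds it.
describe.example_class <- with_validation(
  check_example_class,
  function(x, ...) paste0("a = ", x$a, ", b = ", x$b)
)

good <- structure(list(a = 1, b = "foo"), class = "example_class")
bad  <- structure(1, class = "example_class")

describe(good)                     # validated first, then the method runs
try(describe(bad), silent = TRUE)  # validation fails, method body never runs
```

This only protects calls that go through a wrapped method; direct manipulation of the object (for example with unclass or `$<-`) remains unchecked.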

Related

R: how to define class serializers?

I'm looking for the R equivalent of Python's __reduce__ for serialization and de-serialization of S3 classes - i.e. some method of manually specifying how to serialize and de-serialize objects which belong to a certain class.
Simple example:
Object creator:
make_obj <- function(a = 1) {
obj <- list(a = a, b = a + 1)
class(obj) <- "myClass"
return(obj)
}
Serializer and de-serializer:
serializer <- function(obj) return(as.character(obj$a))
deserializer <- function(s) {
a <- as.numeric(s)
return(make_obj(a))
}
I see R has functions like saveRDS and readRDS which accept an argument refhook for customized serialization, and can be used with those two functions as intended:
myObj <- make_obj(10)
saveRDS(myObj, "myObj.Rds", refhook = serializer)
newObj <- readRDS("myObj.Rds", refhook = deserializer)
But I'm looking for some way of making this automatic based on the object's class, so that (a) it would work with RStudio's save and restore session when those objects are in the environment, and so that (b) someone could just load a package and then use the internal R serialization functions without extra hassle.
I thought of defining a custom saveRDS.myClass and registering it as an S3 method, e.g.:
saveRDS.myClass <- function(obj, ...) {
s <- serializer(obj)
saveRDS(s, ...)
}
But this wouldn't work with RStudio's save session, and when calling readRDS it will not know that it should use the custom de-serialization function once it loads this object.
Is there any way of making these serialization and de-serialization functions be attached to an S3 class, so to say?

S4 class constructor and validation

Here is a short piece of code that creates an S4 class myclass and ensures that objects are only created if they satisfy a condition given by a parameter param:
setClass("myclass", slots = c(x = "numeric"))
# validity function
ValidmyClass <- function(object, param = 1)
{
  if (object@x == param) return(TRUE)
  else return("problem")
}
setValidity("myclass", ValidmyClass)
setMethod("initialize", "myclass", function(.Object, ...) {
  .Object <- callNextMethod()
  validObject(.Object, ...)
  .Object
})
This gives the following error message:
Error in substituteFunctionArgs(validity, "object", functionName = sprintf("validity method for class '%s'", :
trying to change the argument list of for validity method for class 'myclass' with 2 arguments to have arguments (object)
I understand the issue with the arguments, but I cannot find a way to solve it. The documentation for setValidity says that the argument method should be "validity method; that is, either NULL or a function of one argument (object)". As I understand it, this rules out functions with more than one argument.
Nevertheless, the idea behind this example is that I want to be able to test the construction of a myclass object based on the value of an external given parameter. If more conditions were to be added, I would like enough flexibility so only the function ValidmyClass needs to be updated, without necessarily adding more slots.
The validity function has to have exactly one argument, named object. When I need a one-argument function but really have more arguments or data to pass in, I often fall back on closures. Here, the implementation of your ValidmyClass changes so that it returns the actual validity function. The arguments of the enclosing function are then the additional values you are interested in.
setClass("myclass", slots = c(x = "numeric"))
# factory that returns the validity function
ValidmyClass <- function(param) {
  force(param)
  function(object) {
    if (object@x == param) TRUE
    else "problem"
  }
}
setValidity("myclass", ValidmyClass(1))
Also, the validity function is called automatically when the object is initialized, but not when the slot x is changed after the object has been created.
setMethod("initialize", "myclass", function(.Object, ...) {
  .Object <- callNextMethod()
  .Object
})
new("myclass", x = 2)  # fails: the validity function returns "problem"
new("myclass", x = 1)  # succeeds
For more information on closures, see Advanced R. Although I think this answers your question, I do not see how this implementation is actually helpful. When you define your class, you also fix the additional parameters that the validity function knows about. If you have several classes for which you can abstract the validity function, then I would use the closure. If you have one class whose parameters change at runtime, I would consider adding a slot to the class. If you do not want to alter the class definition repeatedly, you can add a slot of class list, in which you can then pass an arbitrary number of values to test against.
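The suggestion of keeping the reference value in a slot could look roughly like the sketch below. The class name myclass2, the slot name param, and its default of 1 are assumptions made for illustration. The sketch also illustrates the point that direct slot assignment does not re-run the validity function, so validObject() has to be called explicitly afterwards.

```r
library(methods)

# Assumed variant of the class: the reference value lives in a slot `param`.
setClass(
  "myclass2",
  slots = c(x = "numeric", param = "numeric"),
  prototype = prototype(param = 1),
  validity = function(object) {
    if (length(object@x) == 1L && isTRUE(object@x == object@param)) TRUE
    else "x must equal param"
  }
)

obj  <- new("myclass2", x = 1)             # valid: x equals the default param
obj2 <- new("myclass2", x = 2, param = 2)  # valid: param supplied per object

obj@x <- 5   # direct slot assignment: the validity function is not re-run

# An explicit check reports the problem; the message is captured here
# instead of stopping the script.
msg <- tryCatch(
  {
    validObject(obj)
    "valid"
  },
  error = function(e) conditionMessage(e)
)
```

With this layout the validity function stays a function of object only, and the value to test against can differ between objects without redefining the class.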

Using a method/function within a reference class method of the same name

When defining a new reference class in R, there are a number of boilerplate methods that are expected (by R conventions), such as length, show, etc. Once these are defined, they aggressively mask similarly named methods/functions when called from within the class's methods. As you cannot necessarily know the namespace of the foreign function, it is not always possible to use the package:: specifier.
Is there a way to tell a method to ignore its own methods unless called specifically using .self$?
Example:
tC <- setRefClass(
'testClass',
fields = list(data='list'),
methods = list(
length=function() {
length(data)
}
)
)
example <- tC(data=list(a=1, b=2, c=3))
example$length() # Will cause error as length is defined without arguments
Alternatively one could resort to defining S4 methods for the class instead (as reference classes are S4 classes under the hood), but this seems to be working against the reference class idea...
Edit:
To avoid focusing on instances where you know the class of the data in advance consider this example:
tC <- setRefClass(
  'testClass',
  fields = list(data='list'),
  methods = list(
    length=function() {
      length(data)
    },
    combineLengths = function(otherObject) {
      .self$length() + length(otherObject)
    }
  )
)
example <- tC(data=list(a=1, b=2, c=3))
example$combineLengths(rep(1, 3)) # Will cause error as length is defined without arguments
I am aware that it is possible to write your own dispatching to the correct method/function, but this seems like such a common situation that I thought it might already have been solved within the methods package (sort of the reverse of usingMethods()).
My question is thus, and I apologise if this wasn't clear before: is there a way to ignore the reference class methods and fields within the method definitions and rely solely on .self for accessing them, so that methods/functions defined outside the class are not masked?
The example is not very clear, and I don't see why you would be unable to know the namespace of your method. In any case, here are a few ways to work around the problem:
You can use a different name for the reference class method, for example Length with a capital "L".
You can dynamically find the namespace of the generic function.
For example:
methods = list(
.show =function(data) {
ns = sub(".*:","",getAnywhere("show")$where[1])
func = get("show",envir = getNamespace(ns))
func(data)
},
show=function() {
.show(data)
}
)
You can use the newer reference class system, R6.
For example:
library(R6)
tC6 <- R6Class('testClass',
  public = list(
    data = NA,
    initialize = function(data) {
      if (!missing(data)) self$data <- data
    },
    show = function() show(self$data)
  )
)
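A lighter variant of the dynamic lookup idea, sketched under the assumption that the outside function is visible from the global environment or the search path, is to look it up with get() starting at globalenv(). That lookup never visits the object's own environment, so the reference class method of the same name is skipped. The helper name outer_fun is hypothetical.

```r
library(methods)

# Hypothetical helper: find a function by name starting from the global
# environment, which bypasses methods stored in the object's environment.
outer_fun <- function(name) get(name, envir = globalenv(), mode = "function")

tC <- setRefClass(
  "testClass",
  fields = list(data = "list"),
  methods = list(
    length = function() {
      outer_fun("length")(data)   # resolves to base::length in this sketch
    },
    combineLengths = function(otherObject) {
      .self$length() + outer_fun("length")(otherObject)
    }
  )
)

example <- tC(data = list(a = 1, b = 2, c = 3))
example$length()                   # length of the data field
example$combineLengths(rep(1, 3))  # own length plus length of the argument
```

The class's own method is still reachable through .self$length(), while the bare lookup goes to whatever function of that name is attached outside the class.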

S3 style dispatching for S3 objects using formal method definitions

Related to this question, but slightly different and hopefully more clear.
I am looking for a clean way to formally register methods for both S4 and S3 classes, but without relying on the terrible S3-dot-naming-scheme for dispatching. An example:
setClass("foo");
setClass("bar");
setGeneric("test", function(x, ...){
standardGeneric("test");
});
setMethod("test", "bar", function(x, ...){
return("success (bar).");
});
obj1 <- 123;
class(obj1) <- "bar";
test(obj1);
This example shows how we can register a test method for S3 objects of class bar, without the need to name the function test.bar, which is great. However, the limitation is if we register methods this way, they will only be dispatched to the first S3 class of the object. E.g:
obj2 <- 123;
class(obj2) <- c("foo", "bar");
test(obj2);
This doesn't work, because S4 method dispatching will only try class foo and its superclasses. How could this example be extended so that it will automatically select the test method for bar when no appropriate method for foo was found? E.g. S3 style dispatching but without having to go back to naming everything test.foo and test.bar?
So, in summary: how can I create a generic function that uses formal method dispatch but, for S3 objects with multiple classes, also falls back on the second, third, etc. class of the object?
?setOldClass will give the answer:
setOldClass(c("foo", "bar"))
setGeneric("test", function(x, ...)standardGeneric("test"))
setMethod("test", "bar", function(x, ...)return("success (bar)."))
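As a quick check of this approach, the sketch below registers the S3 inheritance with setOldClass and then calls the generic on objects of both shapes. It assumes a fresh session in which foo and bar have not already been defined with setClass.

```r
library(methods)

# Register the S3 class vector so S4 dispatch knows that foo inherits from bar.
setOldClass(c("foo", "bar"))

setGeneric("test", function(x, ...) standardGeneric("test"))
setMethod("test", "bar", function(x, ...) "success (bar).")

obj1 <- structure(123, class = "bar")
obj2 <- structure(123, class = c("foo", "bar"))

res1 <- test(obj1)   # direct match on bar
res2 <- test(obj2)   # no method for foo, so the inherited bar method is used
```

Each distinct class vector that should dispatch this way needs its own setOldClass registration, which is the price for not using the dot-naming scheme.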
You could write a method that strips the first S3 class and dispatches again:
test = function(x, ...) UseMethod("test")
setGeneric("test")
.redispatch = function(x, ...)
{
if (is.object(x) && !isS4(x) && length(class(x)) != 1L) {
class(x) = class(x)[-1]
callGeneric(x, ...)
} else callNextMethod(x, ...)
}
setMethod(test, "ANY", .redispatch)
But I personally wouldn't mix S3 and S4 in this way.

Forcing specific data types as arguments to a function

I was just wondering if there was a way to force a function to only accept certain data types, without having to check for it within the function; or, is this not possible because R's type-checking is done at runtime (as opposed to those programming languages, such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
public int addone (int n) {
return n+1;
}
}
In R, a similar function might be
addone <- function(n)
{
return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, is the only way to do so to have a condition within the function, along the lines of
addone <- function(n)
{
if(is.vector(n) && length(n)==1)
{
return(n+1)
} else
{
return ("You must enter a single integer")
}
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context of R, since I can't think of a practical reason why one would want to create a class for a single value. Nonetheless, it is possible. As an added bonus, I demonstrate how the function addone can be used to add one to numeric vectors (trivial) and to character vectors (so A turns into B, etc.):
Start by creating a generic S3 function for addone, utilising the S3 dispatch mechanism UseMethod:
addone <- function(x){
UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
as.single <- function(x){
ret <- unlist(x)[1]
class(ret) <- "single"
ret
}
Now create methods to handle the various classes. The default method will be called unless a specific class is defined:
addone.default <- function(x) x + 1
addone.character <- function(x) rawToChar(as.raw(as.numeric(charToRaw(x)) + 1))
addone.single <- function(x) x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things, including the S3 object system.
The S3 system doesn't provide strict typing, and it can quite easily be abused. For stricter object orientation, have a look at S4 classes, reference classes, or the proto package for prototype-based programming.
You could write a wrapper like the following:
check.types = function(classes, func) {
n = as.name
params = formals(func)
param.names = lapply(names(params), n)
handler = function() { }
formals(handler) = params
checks = lapply(seq_along(param.names), function(I) {
as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
})
body(handler) = as.call(c(
list(n('{')),
checks,
list(as.call(list(n('<-'), n('.func'), func))),
list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
))
handler
}
assert.class = function(x, cls) {
stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
This is made somewhat inconvenient by R not having decorators. It is kind of hacky and it suffers from some serious problems:
1. You lose lazy evaluation, because you must evaluate an argument to determine its type.
2. You still can't check the types until call time; real static type checking lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful, because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably do this by transforming expressions, but I don't think you want to go there.
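The loss of lazy evaluation can be seen in a small sketch. The function names lazy and checked are made up for this illustration; the argument passed in is a call to stop(), so it raises an error only if it is actually evaluated.

```r
# Without a check, an unused argument is never evaluated (lazy evaluation).
lazy <- function(x) "x was never evaluated"

# With a type check, the argument is forced even though the rest of the
# body ignores it.
checked <- function(x) {
  stopifnot(is.numeric(x))
  "x was evaluated"
}

r1 <- lazy(stop("boom"))   # no error: the promise is never forced

r2 <- tryCatch(
  checked(stop("boom")),   # the check forces the promise, raising the error
  error = function(e) conditionMessage(e)
)
```

Here r1 holds the return value of lazy, while r2 holds the message of the error that surfaced as soon as the type check touched the argument.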
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
stopifnot(is.vector(n) && length(n)==1)
print(n)
}
It is so useful because it gives the user a fairly clear error message, naming the failed condition, when the condition is false.
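Applied to the addone example from the question, this could look like the sketch below. The choice of checks (numeric, length one) is an assumption about what "a single integer" should mean here.

```r
addone <- function(n) {
  # Conditions are checked in order; the first one that fails is named
  # in the error message.
  stopifnot(is.numeric(n), length(n) == 1L)
  n + 1
}

ok <- addone(1)

# Capture the error messages instead of stopping the script.
err_vec <- tryCatch(addone(1:5), error = function(e) conditionMessage(e))
err_chr <- tryCatch(addone("a"), error = function(e) conditionMessage(e))
```

Passing several conditions to a single stopifnot call, rather than joining them with &&, means the message points at the specific condition that failed.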

Resources