R: how to define class serializers? - r

I'm looking for the R equivalent of Python's __reduce__ for serialization and de-serialization of S3 classes - i.e. some method of manually specifying how to serialize and de-serialize objects which belong to a certain class.
Simple example:
Object creator:
make_obj <- function(a = 1) {
obj <- list(a = a, b = a + 1)
class(obj) <- "myClass"
return(obj)
}
Serializer and de-serializer:
serializer <- function(obj) return(as.character(obj$a))
deserializer <- function(s) {
a <- as.numeric(s)
return(make_obj(a))
}
I see R has functions like saveRDS and readRDS which accept an argument refhook for customized serialization, and can be used with those two functions as intended:
myObj <- make_obj(10)
saveRDS(myObj, "myObj.Rds", refhook = serializer)
newObj <- readRDS("myObj.Rds", refhook = deserializer)
But I'm looking for some way of making this automatic based on the object's class, so that (a) it would work with RStudio's save and restore session when those objects are in the environment, and so that (b) someone could just load a package and then use the internal R serialization functions without extra hassle.
I though of defining a custom saveRDS.myClass and registering it as an S3 method - e.g.:
saveRDS.myClass <- function(obj, ...) {
s <- serializer(obj)
saveRDS(s, ...)
}
But this wouldn't work with RStudio's save session, and when calling readRDS it will not know that it should use the custom de-serialization function once it loads this object.
Is there any way of making these serialization and de-serialization functions be attached to an S3 class, so to say?

Related

R Why do I have to assign a formal argument variable to itself in order for this function to work?

I have developed the following two functions:
save_sysdata <- function(...) {
data <- eval(substitute(alist(...)))
data <- purrr::map_chr(data, add_dot)
save(list = data, file = "sysdata.rda", compress = "bzip2", version = 2)
}
add_dot <- function(object) {
object <- object # Why is this required?
name <- paste0(".", deparse(substitute(object)))
# parent.frame(3) because evaluating in global (or caller function); 2 because assigning in save_sysdata.
assign(name, eval(object, envir = parent.frame(3)), envir = parent.frame(2))
return(name)
}
The purpose of this set of functions is to provide an object (x) and save it as a sysdata.rda file but as a hidden object. This requires adding a . to the object symbol (.x).
The set of functions as I have it works and accomplishes what I want. However, it requires a bit of code that I don't understand why it works or what it's doing. I'm not even sure how I came up with this particular line as a solution.
If I remove the line object <- object from the add_dot function, the whole thing fails to work. It actually just generates an empty sysdata.rda file.
Can anyone explain why this line is necessary and what it is doing?
And if you have a more efficient way of accomplishing this, please let me know. It was a fun exercise to figure this out myself but I'm sure there is a better way.
For a reprex, simply copy the above functions and run:
x <- "test"
save_sysdata(x)
Then load the sysdata.rda file into your global environment and type .x. You should return [1] "test".
Here's an alternative version
save_sysdata <- function(...) {
pnames <- sapply(match.call(expand.dots=FALSE)$..., deparse)
snames <- paste0(".", pnames)
senv <- setNames(list(...), snames)
save(list = snames, envir=list2env(senv), file = "sysdata.rda", compress = "bzip2", version = 2)
}
We dump the values into a named list and granbing the names of the parameter with match.call(). We add dots to the names and then turn that list into an environment that we can use with save.
The reason your version required object <- object is that function parameters are lazily evaluated. Since you never actually use the value of that object in your function without the assignment, it remains a promise and is never added tot he function environment. Sometimes you'll see force(object) instead which does the same thing.

Serialize closure without environment

I am serializing objects with the serialize function.
For example
serialize_object <- serialize(some_object, NULL)
Now I have an issue with closures. For example:
closure <- function(){
member <- NULL
list(init=function(val){member <<- val})
}
closure_serialized <- serialize(closure(), NULL)
This raw object closure_serialized is huge: some 200MB. I am quite sure that also the environment in which it is made is serialized. But I don't need its environment. I only need the closure and its contents.
Am I doing something wrong? Am I initializing or defining the closure in a wrong way? How can I make it only to serialize the closure and not the rest of the environment? Serializing closures from some packages do not have this effect, and I can not find the culprit.
This is mainly because the definition of the closure is within a function.
fn <- function(){
# make big variables
closure <- function(){
member <- NULL
list(init=function(val){member <<- val})
}
closure_serialized <- serialize(closure(), NULL)
}
# serialize will copy the environment within the function in closure_serialized
fn()
The serialize function will in that case copy the environment also. A "workaround" is to place the definition of the closure in the global environment.
closure <- function(){
member <- NULL
list(init=function(val){member <<- val})
}
fn <- function(){
# make big variables
closure_serialized <- serialize(closure(), NULL)
}
# serialize will not copy the global environment.
fn()
The serialize doesn't copy .GlobalEnv environment. See also here for a related topic.

Can we combine S3 flexibility with S4 representation checking?

I'm looking for a method to validate S3 objects in my package Momocs.
Earlier versions of the package were written using S4, then I shifted back to S3 for the sake of flexibility, because users were more into S3, because I do not really need multiple inheritance, etc.. The main cost of this change was actually losing S4 representation / validity checking.
My problem follows: how can we prevent one from inadvertently "unvalidate" an S3 object, for instance trying to extend existing methods or manipulating object structure?
I have already written some validate function but, so far, I only validate before crucial steps, typically those turning an object from a class into another.
My question is:
do I want to have my cake and eat it (S3 flexibility and S4 representation checking) ? In that case, I would need to add my validate function across all the methods of my package?
or is there a smarter way on top of S3, something like "any time we do something on an object of a particular class, call a validate function on it"?
The easiest thing would be to write a validation function for each class and pass objects through it before S3 method dispatch or within each class's method. Here's an example with a simple validation function called check_example_class for an object of class "example_class":
check_example_class <- function(x) {
stopifnot(length(x) == 2)
stopifnot("a" %in% names(x))
stopifnot("b" %in% names(x))
stopifnot(is.numeric(x$a))
stopifnot(is.character(x$b))
NULL
}
print.example_class <- function(x, ...) {
check_example_class(x)
cat("Example class object where b =", x$b, "\n")
invisible(x)
}
# an object of the class
good <- structure(list(a = 1, b = "foo"), class = "example_class")
# an object that pretends to be of the class
bad <- structure(1, class = "example_class")
print(good) # works
## Example class object where b = foo
print(bad) # fails
## Error: length(x) == 2 is not TRUE

Using a method/function within a reference class method of the same name

When defining a new reference class in R there is a bunch of boiler-plate methods that are expected (by R conventions), such as length, show etc. When these are defined they aggressively masks similar named methods/functions when called from within the class' methods. As you can not necessarily know the namespace of the foreign function it is not possible to use the package:: specifier.
Is there a way to tell a method to ignore its own methods unless called specifically using .self$?
Example:
tC <- setRefClass(
'testClass',
fields = list(data='list'),
methods = list(
length=function() {
length(data)
}
)
)
example <- tC(data=list(a=1, b=2, c=3))
example$length() # Will cause error as length is defined without arguments
Alternatively one could resort to defining S4 methods for the class instead (as reference classes are S4 classes under the hood), but this seems to be working against the reference class idea...
Edit:
To avoid focusing on instances where you know the class of the data in advance consider this example:
tC <- setRefClass(
'testClass',
fields = list(data='list'),
methods = list(
length=function() {
length(data)
},
combineLengths = function(otherObject) {
.self.length() + length(otherObject)
}
)
)
example <- tC(data=list(a=1, b=2, c=3))
example$combineLength(rep(1, 3)) # Will cause error as length is defined without arguments
I am aware that it is possible to write your own dispatching to the correct method/function, but this seems as such a common situation that I thought it might have already been solved within the methods package (sort of the reverse of usingMethods())
My question is thus, and I apologise if this wasn't clear before: Are there ways of ignoring there reference class methods and fields within the method definitions and solely rely on .self for accessing these, so that methods/functions defined outside the class are not masked?
The example is not very clear. I don't know for what reason you can't know the namespace of your method. Whatever, here a couple of methods to work around this problem:
You can use a different name for the reference class method Length with Capital "L" for example
You can find dynamically the namespace of the generic function.
For example:
methods = list(
.show =function(data) {
ns = sub(".*:","",getAnywhere("show")$where[1])
func = get("show",envir = getNamespace(ns))
func(data)
},
show=function() {
.show(data)
}
)
You can use the new reference class System R6.
For example:
tC6 <- R6Class('testClass',
public = list(
data=NA,
initialize = function(data) {
if (!missing(data)) self$data <- data
},
show=function() show(self$data)
)
)

S3 style dispatching for S3 objects using formal method definitions

Related to this question, but slightly different and hopefully more clear.
I am looking for a clean way to formally register methods for both S4 and S3 classes, but without relying on the terrible S3-dot-naming-scheme for dispatching. An example:
setClass("foo");
setClass("bar");
setGeneric("test", function(x, ...){
standardGeneric("test");
});
setMethod("test", "bar", function(x, ...){
return("success (bar).");
});
obj1 <- 123;
class(obj1) <- "bar";
test(obj1);
This example shows how we can register a test method for S3 objects of class bar, without the need to name the function test.bar, which is great. However, the limitation is if we register methods this way, they will only be dispatched to the first S3 class of the object. E.g:
obj2 <- 123;
class(obj2) <- c("foo", "bar");
test(obj2);
This doesn't work, because S4 method dispatching will only try class foo and its superclasses. How could this example be extended so that it will automatically select the test method for bar when no appropriate method for foo was found? E.g. S3 style dispatching but without having to go back to naming everything test.foo and test.bar?
So in summary: how to create a generic function that uses formal method dispatching, but in addition fall back on the second, third, etc class of an object for S3 objects with multiple classes.
?setOldClass will give the answer:
setOldClass(c("foo", "bar"))
setGeneric("test", function(x, ...)standardGeneric("test"))
setMethod("test", "bar", function(x, ...)return("success (bar)."))
You could write a method
test = function(x, ...) UseMethod("test")
setGeneric("test")
.redispatch = function(x, ...)
{
if (is.object(x) && !isS4(x) && length(class(x)) != 1L) {
class(x) = class(x)[-1]
callGeneric(x, ...)
} else callNextMethod(x, ...)
}
setMethod(test, "ANY", .redispatch)
But I personally wouldn't mix S3 and S4 in this way.

Resources