R methods for custom S3 class

I have an S3 class, student:
# a constructor function for the "student" class
student <- function(n, a, g) {
  # we can add our own integrity checks
  if (g > 4 || g < 0) stop("GPA must be between 0 and 4")
  value <- list(name = n, age = a, GPA = g)
  # class can be set using class() or attr() function
  attr(value, "class") <- "student"
  value
}
stud <- student("name", 10, 3.5)
Now I would like to create a method similar to stud.doubleGPA() which would double the GPA of the student. I know I can achieve this using
stud$GPA <- stud$GPA*2
stud$GPA # 7
However trying to define a function doesn't seem to work.
doubleGPA <- function(student) {
if(!class(student)=="student") stop("nope")
student$GPA <- student$GPA*2
}
doubleGPA(stud)
stud$GPA # 7 again (didn't work)
And replacing <- with <<- in the above function gives
Error in student$GPA <<- student$GPA * 2 :
object of type 'closure' is not subsettable
How can I define such a method so that it belongs to an S3 class and is therefore inherited by child classes?
Cheers

You are thinking of a different kind of object oriented programming than the S3 style, something more like C++ or Java. You can do that in R, just not in the S3 system.
In the S3 system, methods "belong to" generic functions, not to classes. Like most functions in R, generic functions don't modify their arguments; they calculate new values and return those. So you might define a generic function doubleGPA(), and have it work on the "student" class using
doubleGPA <- function(x) UseMethod("doubleGPA")
doubleGPA.student <- function(x) {
  x$GPA <- x$GPA*2
  x
}
and then use it as
stud <- student("name", 10, 3.5)
stud <- doubleGPA(stud)
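Regarding inheritance by children, here is a minimal sketch of how a child class would pick up the same method. The gradStudent class, its constructor, and the thesis field are hypothetical names introduced only for illustration; the dispatch itself relies on the class vector listing "student" after the child class.

```r
# Constructor and generic as defined above
student <- function(n, a, g) {
  if (g > 4 || g < 0) stop("GPA must be between 0 and 4")
  structure(list(name = n, age = a, GPA = g), class = "student")
}
doubleGPA <- function(x) UseMethod("doubleGPA")
doubleGPA.student <- function(x) {
  x$GPA <- x$GPA * 2
  x
}

# Hypothetical child class: prepend its name to the class vector
gradStudent <- function(n, a, g, thesis) {
  obj <- student(n, a, g)
  obj$thesis <- thesis
  class(obj) <- c("gradStudent", class(obj))
  obj
}

grad <- gradStudent("name", 25, 1.5, "S3 dispatch")
# No doubleGPA.gradStudent exists, so dispatch falls through to doubleGPA.student
grad <- doubleGPA(grad)
grad$GPA     # 3
class(grad)  # "gradStudent" "student"
```

The method returns the modified object with its class vector intact, so the result is still a gradStudent.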
If you actually want something more like C++ or Java, there are a couple of choices: "reference classes" from the methods package (see ?methods::setRefClass) and "R6 classes" from the R6 package. There are also several prototype-based styles in packages proto, ggplot2, R.oo, and there are probably more that I've forgotten to mention.
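As a rough sketch of the reference-class route, the following uses setRefClass from the methods package. The generator name Student and the field layout are assumptions chosen to mirror the question's list; the GPA range check from the original constructor is left out for brevity.

```r
library(methods)

# Reference class: objects are mutable, and methods belong to the class
Student <- setRefClass("Student",
  fields = list(name = "character", age = "numeric", GPA = "numeric"),
  methods = list(
    doubleGPA = function() {
      # <<- modifies the field of this object in place
      GPA <<- GPA * 2
      invisible(.self)
    }
  )
)

stud <- Student$new(name = "name", age = 10, GPA = 3.5)
stud$doubleGPA()
stud$GPA  # 7
```

Here stud$doubleGPA() changes the object itself, which is the behaviour the question was looking for, at the cost of leaving R's usual copy-on-modify semantics.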

Related

An imported package is an object of what kind? [duplicate]

I am (probably) NOT referring to the "all other variables" meaning like var1~. here.
I was pointed to plyr once again, looked into mlply, and wondered why parameters are defined with a leading dot like this:
function (.data, .fun = NULL, ..., .expand = TRUE, .progress = "none",
.parallel = FALSE)
{
if (is.matrix(.data) & !is.list(.data))
.data <- .matrix_to_df(.data)
f <- splat(.fun)
alply(.data = .data, .margins = 1, .fun = f, ..., .expand = .expand,
.progress = .progress, .parallel = .parallel)
}
<environment: namespace:plyr>
What's the use of that? Is it just personal preference, naming convention or more? Often R is so functional that I miss a trick that's long been done before.
A dot in a function name can mean any of the following:
nothing at all
a separator between method and class in S3 methods
to hide the function name
Possible meanings
1. Nothing at all
The dot in data.frame doesn't separate data from frame, other than visually.
2. Separation of methods and classes in S3 methods
plot is one example of an S3 generic function. Thus plot.lm and plot.glm are the underlying method definitions that are used when calling plot(lm(...)) or plot(glm(...))
3. To hide internal functions
When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.
In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). A leading dot does not change the scope of the variable; it only affects whether ls() lists it by default. For example:
x <- 3
.x <- 4
ls()
[1] "x"
ls(all.names=TRUE)
[1] ".x" "x"
x
[1] 3
.x
[1] 4
4. Other possible reasons
In Hadley's plyr package, he uses the convention of leading dots in argument names (such as .data and .fun). This is a mechanism to try to ensure that, when names are resolved, values passed through ... go to the user's function rather than being matched to the plyr function's own arguments.
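A small sketch of the clash this convention avoids. The wrappers apply_plain and apply_dotted and the user function scale_by are hypothetical and are not plyr code; they only imitate the argument layout.

```r
# Wrapper with ordinary argument names
apply_plain <- function(data, fun, ...) lapply(data, fun, ...)

# Wrapper with dotted argument names, in the style of plyr
apply_dotted <- function(.data, .fun, ...) lapply(.data, .fun, ...)

# A user function that happens to have its own argument called 'data'
scale_by <- function(x, data) x * data

# Here data = 10 is captured by the wrapper's own 'data' argument,
# so the remaining arguments are mismatched and the call fails
plain_result <- try(apply_plain(1:3, scale_by, data = 10), silent = TRUE)
inherits(plain_result, "try-error")  # TRUE

# With dotted names, data = 10 falls into ... and reaches scale_by
dotted_result <- apply_dotted(1:3, scale_by, data = 10)
unlist(dotted_result)  # 10 20 30
```

User functions rarely have arguments that start with a dot, so the dotted names keep the wrapper's arguments out of the way.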
Complications
This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.
For example, to convert a data.frame to a list you use as.list():
as.list(iris)
In this case as.list is an S3 generic function, and you are passing a data.frame to it. Thus the S3 method that gets called is as.list.data.frame:
> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>
And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:
> library(data.table)
> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*
Non-visible functions are asterisked
> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>
At the start of a name it works like the UNIX filename convention to keep objects hidden by default.
ls()
character(0)
.a <- 1
ls()
character(0)
ls(all.names = TRUE)
[1] ".a"
It can be just a token with no special meaning; it does nothing more than any other allowed token.
my.var <- 1
my_var <- 1
myVar <- 1
It's used for S3 method dispatch. So, if I define a simple class "myClass" and create objects with that class attribute, then generic functions such as print() will automatically dispatch to my specific print method.
myvar <- 1
print(myvar)
class(myvar) <- c("myClass", class(myvar))
print.myClass <- function(x, ...) {
print(paste("a special message for myClass objects, this one has length", length(x)))
return(invisible(NULL))
}
print(myvar)
There is an ambiguity in the syntax for S3, since you cannot tell from a function's name whether it is an S3 method or just a dot in the name. But, it's a very simple mechanism that is very powerful.
There's a lot more to each of these three aspects, and you should not take my examples as good practice, but they are the basic differences.
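Because the name alone is ambiguous, it can help to ask R directly. A short sketch using utils::isS3method(), which is available in recent versions of R (this check is an addition, not part of the answer above):

```r
# A genuine S3 method: generic 'print', class 'data.frame'
isS3method("print.data.frame")  # TRUE

# A genuine S3 method: generic 't', class 'data.frame'
isS3method("t.data.frame")      # TRUE

# Looks like a method of t() for class "test", but t.test is a function in its own right
isS3method("t.test")            # FALSE
```

methods(t) or methods(class = "data.frame") give a similar view from the generic's or the class's side.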
If a user defines a function .doSomething and is too lazy to write all the roxygen documentation for its parameters, this will not generate errors when building the package.

Using a dot before a function in R [duplicate]

How to find the default constructor methods for a class

The problem comes from experimenting with a package and finding that new(Class = 'ddmatrix', Data = X) and ddmatrix(Data = X) yield different results, where X is a matrix (one can think of class ddmatrix as a transformed matrix class).
Document
In the package, an S4 class ddmatrix is defined, along with a generic constructor function created by setGeneric(name = 'ddmatrix'). Further, the package defines setMethod('ddmatrix', signature = 'matrix', ...) as below:
setMethod("ddmatrix", signature(data="matrix"),
  function(data, nrow=1, ncol=1, byrow=FALSE, ...,
           bldim=.pbd_env$BLDIM, ICTXT=.pbd_env$ICTXT)
  {
    dim(data) <- NULL
    ret <- ddmatrix(data=data, nrow=nrow, ncol=ncol, byrow=byrow, bldim=bldim, ICTXT=ICTXT)
    return( ret )
  }
)
I am confused about how the function ddmatrix is used inside the setMethod('ddmatrix', signature = 'matrix') step above. Is this ddmatrix method the default method for the generic ddmatrix?
Meanwhile, when calling new('ddmatrix', Data = X), which method is called to build a new ddmatrix object from a matrix object? The new function is:
function (Class, ...)
{
ClassDef <- getClass(Class, where = topenv(parent.frame()))
value <- .Call(C_new_object, ClassDef)
initialize(value, ...)
}
Question
To explain the discrepancy between new('ddmatrix') and ddmatrix(), I think one way is to find the default constructor. The package also defines setMethod('ddmatrix', signature = 'vector', ...); is this the default one?
At some level this is up to the author. Many people view new() and @ or slot() (for slot access) as strictly for the package developer -- these expose the implementation details directly to the user -- and prefer to write constructors and accessors that place an interface on top of the implementation. This appears to be the case for the package that you are considering, where ddmatrix() is meant to be the user-oriented constructor.
The author appears to have implemented a facade pattern, where several different methods make relatively minor data transformations before calling another function / method to do the actual object construction. From what you show, it seems ddmatrix,matrix-method invokes ddmatrix,vector-method (because inside ddmatrix,matrix-method the function sets dim(data) <- NULL, turning the matrix into a vector, and then calls ddmatrix() which now dispatches to the vector method), and this constructs the object via new() at https://github.com/RBigData/pbdDMAT/blob/master/R/constructors.r#L191. A different package author could have adopted a different design, where several methods separately call new().
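The delegation described above can be imitated with a toy class. This is only a sketch of the pattern, not pbdDMAT's code: the class and generic name Flat, its slots, and the use of a numeric (rather than vector) signature are all assumptions made for illustration.

```r
library(methods)

# Toy class storing a matrix as a flat vector plus its dimensions
setClass("Flat", slots = c(data = "numeric", nrow = "integer", ncol = "integer"))

# User-facing constructor generic
setGeneric("Flat", function(data, ...) standardGeneric("Flat"))

# The vector method is the only place that calls new()
setMethod("Flat", "numeric",
  function(data, nrow = 1L, ncol = 1L, ...) {
    new("Flat", data = data, nrow = as.integer(nrow), ncol = as.integer(ncol))
  })

# The matrix method drops the dim attribute and re-dispatches to the vector method
setMethod("Flat", "matrix",
  function(data, ...) {
    d <- dim(data)
    dim(data) <- NULL
    Flat(data, nrow = d[1], ncol = d[2])
  })

obj <- Flat(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
obj@nrow  # 2
obj@ncol  # 3
```

Once dim(data) <- NULL has run, data is a plain vector, so the inner Flat() call selects a different method even though the function name is the same.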
The documentation often also helps, e.g., ?ddmatrix does not discuss direct object construction via new().
Here's a simpler example. I create a class "A", with a single slot containing a numeric vector
setClass("A", slots=c(x="numeric"))
Here I create a constructor, because I want the user to see the interface to the class, rather than its implementation
A = function(x=numeric())
new("A", x=x)
So far, A() and new("A") return an object with the same structure, e.g.,
> new("A")
An object of class "A"
Slot "x":
numeric(0)
> A()
An object of class "A"
Slot "x":
numeric(0)
Maybe as the developer of the "A" class, I want an uninitialized object of class 'A' to have 'NA' as the value of the slot x, so I modify
A = function(x = NA_real_)
new("A", x=x)
now a direct call to new() returns a different object from a call to A()
> new("A")
An object of class "A"
Slot "x":
numeric(0)
> A()
An object of class "A"
Slot "x":
[1] NA
Which one is 'correct'? Well, both are correct, but as the creator of the class I intend for the user to create an object of class "A" by calling the function A().
A typical reason for separating the interface (using A() to construct an object) from the implementation (using new() to construct an object) is because the implementation is not obvious to the user. This seems to be the case with the ddmatrix() function -- for reasons that only the package author needs to know about, it is convenient to store an R matrix as a vector with information about dimensions. I guess a simple equivalent might be
setClass("A", slots=c(data="numeric", nrow="integer", ncol="integer"))
A = function(m=matrix(0, 0, 0)) {
stopifnot(is(m, "matrix"))
new("A", data=as.vector(m), nrow=nrow(m), ncol=ncol(m))
}
for instance
> A(matrix(1:10, 5))
An object of class "A"
Slot "data":
[1] 1 2 3 4 5 6 7 8 9 10
Slot "nrow":
[1] 5
Slot "ncol":
[1] 2
Why does the author want to do this? It doesn't matter to us as users. Why can't we create the same object by calling m = matrix(1:10, 5); new("A", data=as.vector(m), nrow=nrow(m), ncol=ncol(m))? We could, but then if the author decided to change their implementation such that the offsets to the start of each row were to be stored, we'd have to understand what the author had done and update our code.

Extend S4 class, how to use superclass' constructor

In my package, I want to subclass a class TheBaseClass from a contributed package (so it is out of my reach). There is a function for creating objects of this class. Here is a minimal example for that code.
setClass("TheBaseClass", representation(a="numeric"))
initBase <- function() new("TheBaseClass", a=1) # in reality more complex
Now I simply want to use initBase as the constructor for my subclass, but I do not know how to set the new class:
setClass("MyInheritedClass", contains="TheBaseClass")
initInher <- function() {
res <- initBase()
class(res) <- "MyInheritedClass" # this does not work for S4
}
How can I alter the last line to make it work? Copying and pasting the initBase function is not an option, since it involves a .C call. I read about setIs, but this does not seem to be the right function here.
Any hint appreciated!
Perhaps this answer provides more extensive explanation. One pattern is to provide an instance of the base class as an unnamed argument to your class constructor
.MyInheritedClass <- setClass("MyInheritedClass", contains="TheBaseClass")
.MyInheritedClass(initBase())
(setClass returns a generator function, which is really no different from calling new but seems cleaner; I use . in front because generators are maybe a little too crude for "end users", e.g., there is no hint about what the arguments are supposed to be, just ...). This assumes that you have not written an initialize method for your class, or that your initialize method has been constructed in a way that is consistent with the contract of initialize,ANY-method. For example, with a slightly more complicated class:
.A <- setClass("A", contains="TheBaseClass",
    representation=representation(x="numeric"))
setMethod(initialize, "A",
    function(.Object, ..., x)
    {
        x <- log(x) # your class-specific initialization...
        callNextMethod(.Object, ..., x = x) # passed to parent constructor
    })
This pattern requires that the initialize method of the base class has been designed correctly. In action:
> .A(initBase(), x=1:2)
An object of class "A"
Slot "x":
[1] 0.0000000 0.6931472
Slot "a":
[1] 1
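Putting the first pattern together with the code from the question gives a sketch like the following. The initBase function here is the question's simplified stand-in, not the real contributed-package function.

```r
library(methods)

# Stand-ins for the contributed package's class and constructor
setClass("TheBaseClass", representation(a = "numeric"))
initBase <- function() new("TheBaseClass", a = 1)

# Subclass; setClass() returns a generator function
.MyInheritedClass <- setClass("MyInheritedClass", contains = "TheBaseClass")

# Pass the base-class instance as an unnamed argument: its slots are copied over
initInher <- function() .MyInheritedClass(initBase())

obj <- initInher()
is(obj, "MyInheritedClass")  # TRUE
is(obj, "TheBaseClass")      # TRUE
obj@a                        # 1
```

No code from initBase is duplicated; the subclass constructor only wraps the object it returns.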

What does the dot mean in R – personal preference, naming convention or more?
