I have a package which uses a data.frame based S4 class:
setClass(Class="foobar",
slots=c(a="character", b="character", c="character"),
contains="data.frame")
Works as intended. However, I observe weird warnings when combining with tidyverse:
library(tibble)  # provides as_tibble()
df <- data.frame(ID=1:5)
df2 <- new("foobar", df)
as_tibble(df2)
The last statement triggers a warning message:
Warning message:
In class(x) <- c(subclass, tibble_class) :
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
This is because tidyverse does not support S4 data frames. This can be circumvented in downstream code by using asS3(df). However, users of my package may be puzzled if they see these warnings. I am now faced with the following choices and I don't really know which would be the most reasonable and correct:
Keep the S4 model and hope that the users won't mind seeing this warning each time they pass my data frames into something else.
Use S3. However, I already have another S4 class defined in published versions of my package. I am afraid that I would break someone's code.
Mix S3 and S4. Is it even allowed?
Is there another solution I might be overlooking?
There is no brilliant solution to this which is entirely within your control.
The tidyverse package may call class<- on any data-frame-like object given to it, and as you have seen this will destroy the S4 nature of any object. This can't be worked around by (for instance) defining a method for coerce or calling setAs, as class<- doesn't use that mechanism. (class<- isn't generic either, you can't set a method for it.) The only way to make tidyverse support S4 is for tidyverse's author to alter the code to use as or similar, and it doesn't look like that is top of their to-do-list.
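To see that the warning comes from class<- itself rather than from anything tibble-specific, here is a minimal sketch that reproduces it with base R and the methods package only. The class definition is copied from the question; warn_msg, df3 and plain are illustrative names of mine, and S3Part() is just one way (alongside asS3()) to hand downstream code an ordinary data frame.

```r
library(methods)

setClass("foobar",
         slots = c(a = "character", b = "character", c = "character"),
         contains = "data.frame")

df2 <- new("foobar", data.frame(ID = 1:5))

# Assigning a multi-element S3 class vector (which, judging by the warning
# in the question, is what as_tibble() does) drops the S4 nature of the object.
warn_msg <- tryCatch(
  {
    class(df2) <- c("tbl_df", "tbl", "data.frame")
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(warn_msg)

# Stripping the S4 part explicitly before passing the object on avoids this.
df3 <- new("foobar", data.frame(ID = 1:5))
plain <- S3Part(df3, strictS3 = TRUE)
isS4(plain)
class(plain)
```

A package could wrap the last step in an exported helper so that users never have to think about it, but that only helps when they remember to call it.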
You are correct to be worried about dramatically altering the way your class works when you have released a version of your package already with an S4 class.
If:
your package is quite new and doesn't yet have many users;
you can do all you need to do with S3; and
you don't know of another package which has built new classes on top of yours
then it may be best to redefine it as S3, and include a message when your package is installed or loaded to say
thanks for installing myPackage v2. Code may be incompatible with v1.2 or earlier; see help(blah) for details
otherwise, stick with S4.
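If you do go the S3 route, the load-time notice could look roughly like the sketch below. The file name R/zzz.R is only a common convention, and the message text reuses the placeholders from above (myPackage, help(blah)); adapt both to your package.

```r
# Hypothetical R/zzz.R inside the package.
.onAttach <- function(libname, pkgname) {
  # packageStartupMessage() rather than message(), so that users can silence
  # it with suppressPackageStartupMessages(library(myPackage)).
  packageStartupMessage(
    "Thanks for installing ", pkgname, " v2. ",
    "Code may be incompatible with v1.2 or earlier; see help(blah) for details."
  )
}
```

R calls .onAttach() itself when the package is attached with library(), so nothing else needs to invoke it.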
You can't exactly mix S3 and S4 for class definitions (you can for method definitions). The closest you can come is setOldClass, which registers an S3 class as an S4 one (whereas you wanted the opposite). Still, that may help you achieve "you can do all you need to do with S3" above.
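A minimal sketch of what setOldClass buys you, assuming a hypothetical S3 class foobar_s3 built on data.frame (new_foobar_s3 and describe_rows are made-up names for illustration): the objects stay plain S3, so class<- in other packages has nothing to destroy, yet S4 generics can still dispatch on them.

```r
library(methods)

# Register the (hypothetical) S3 class so S4 code can refer to it.
setOldClass(c("foobar_s3", "data.frame"))

# An S3 constructor: a data frame with an extra class label.
new_foobar_s3 <- function(df) {
  structure(df, class = c("foobar_s3", "data.frame"))
}

# An S4 generic with a method that dispatches on the registered S3 class.
setGeneric("describe_rows", function(x) standardGeneric("describe_rows"))
setMethod("describe_rows", "foobar_s3", function(x) {
  paste("foobar_s3 with", nrow(x), "rows")
})

obj <- new_foobar_s3(data.frame(ID = 1:5))
describe_rows(obj)
isS4(obj)
```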
One other possibility is to define your own version of class<- which checks to see if an object of S4 class foobar is attempting to be coerced to S3 and calls the ordinary class<- if not. The cure is probably worse than the disease in this case; this will slow down all future S3 class conversions (since class<- is now an ordinary function call, not a primitive) but it should work in principle. Another reason that it is not recommended is that you are relying on no other package higher in the search path doing something similar (what if another package author had the same issue and wanted to do the same trick? Then the results would depend on which package was higher up the search path!)
I'm reading through a tutorial that is using the lme4 package and one of the input options to predict is re.form=NA.
m_lmer <- lmer(log(beakh) ~ log(wingl) + (1 | taxon), data = d)
d$predict_lmer_population <- predict(m_lmer, re.form = NA)
I want to get help for the predict call, but clearly doing ?predict is incorrect.
I then tried asking for the class of the model:
> class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
I then tried ?lmerMod which RStudio automagically changed to ?`lmerMod-class`. I get the addition of ` to the name because of the - "special character" but where did class come from?
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
After some searching in that help I found a link to predict.merMod
Further searching confirmed I could have done: methods('predict') and found the same method, although it is listed predict.merMod* for some reason (added * symbol).
In the end I feel like I would be able to find something similar much more quickly the next time but it still seems very hard to find good help for class methods in R. I'm not sure if this would work the same for S4 or R6 (from the documentation it seems predict.merMod is a S3 method)? It is not clear why the l was dropped from the class name (lmerMod to merMod) or why the -class suffix is needed when asking for help. I feel like I'm missing some extremely basic lesson on R documentation.
Throwing this "help in R" link in for reference that seems to omit class based methods help and also seems like it should just point to some official R documentation website rather than being such a long SO post ...
How to get help in R?
This is a very good question. There are a bunch of things going on here, having to do with (1) the difference between S3 and S4 classes and methods, (2) the underlying class structures in lme4.
I want to get help for the predict call, but clearly doing ?predict is incorrect.
?predict gets you help for the generic function which, as you've noticed, isn't useful. In general it's up to the package developers to decide whether their specialized version of a particular method (e.g., the predict() method for merMod objects) is sufficiently special (e.g., has different or unusual arguments) that it should be documented separately. (Writing R Extensions says "If it is necessary or desired to provide an explicit function declaration (in a \usage section) for [a ...] method ...") (emphasis added).
In general, if they're documented, docs for S3 methods will be available as ?function.class, S4 methods will be documented as ?"function,class-method" (which needs quotation marks since it has a hyphen in it).
The methods() function gives some clues about where to look: if the bbmle and lme4 packages are loaded, predict.merMod* and predict,mle2-method* both show up in the list (the stars mean the functions are hidden, i.e. you can use them by calling predict(my_fit), but the function definitions are not easily available).
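As a rough illustration of that lookup, here is a sketch that uses only the stats package (which ships with R) in place of lme4, on the assumption that lme4 may not be installed; the same calls should work with "merMod" once lme4 is loaded.

```r
# All predict() methods visible in the current session.
m <- methods("predict")
print(m)

# Retrieve one specific method; this works whether or not the method is
# exported from its namespace.
f <- getS3method("predict", "loess")
is.function(f)

# getAnywhere() also reports where the method lives.
getAnywhere("predict.loess")

# Help pages, if the package authors wrote them (not run here):
# ?predict.lm        # S3 convention: generic.class
# ?predict.merMod    # the lme4 page mentioned in the question
```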
I then tried asking for the class of the model:
class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
lmer() produces an object of class lmerMod, which is a subclass of the merMod class (the other subclass is glmerMod, for GLMMs).
I then tried ?lmerMod which RStudio automagically changed to ?lmerMod-class. I get the addition of ` to the name because of the - "special character" but where did class come from?
I don't know that much about RStudio: the "-class" part is specific to methods for S4 classes.
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
See above.
The most opaque part of all of this (IMO) is figuring out S4 class hierarchies - if you say methods(class = "lmerMod") you only get two results (getL and show), it's hard to figure out that you need to say methods(class = "merMod") to get most of the available methods (i.e., only a few methods are specific to lmerMod and glmerMod subclasses - most are more general).
According to this answer you can find the subclasses of merMod as follows:
library(lme4)
cls <- getClass("merMod")
names(cls@subclasses)
## [1] "lmerMod" "glmerMod" "nlmerMod"
How about the other direction?
cls <- getClass("lmerMod")
names(cls@contains)
## [1] "merMod"
(Don't ask me more questions, I really don't understand S4 classes all that well!)
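For what it's worth, the same pattern can be tried out on a toy hierarchy without lme4. Everything named toy* below is made up for illustration; the point is only that a method defined for the parent class is what gets used for an object of the subclass, which is why looking at the subclass alone shows so few methods.

```r
library(methods)

# A parent class with two subclasses, loosely mirroring merMod / lmerMod / glmerMod.
setClass("toyMod", slots = c(coef = "numeric"))
setClass("toyLinMod", contains = "toyMod")
setClass("toyGenMod", contains = "toyMod")

# A method defined once, for the parent class only.
setGeneric("toy_ncoef", function(object) standardGeneric("toy_ncoef"))
setMethod("toy_ncoef", "toyMod", function(object) length(object@coef))

fit <- new("toyLinMod", coef = c(1, 2))
toy_ncoef(fit)        # dispatches to the toyMod method
is(fit, "toyMod")

# Walking the hierarchy in both directions.
names(getClass("toyMod")@subclasses)
names(getClass("toyLinMod")@contains)
```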
You can find all the objects in a package with
objs <- mget(ls("package:base"), inherits = TRUE)
You can select the functions from these with
funs <- Filter(is.function, objs)
You can get a complete list of the dependencies of the listed functions in a package by applying codetools::findGlobals(), miniCRAN::makeDepGraph, pkgnet::CreatePackageReport (or others) to the function list. All of these functions either graph the resulting dependencies or return an object easily plottable with, e.g., igraph or DependenciesGraph.
Is there a comparable set of commands to find all the classes created by a package and the inheritance structure of those classes? I know that for most packages the resulting web of class inheritance would be relatively simple, but I think that in a few cases, such as ggplot2 and the survey package, the resulting web of class inheritance could be quite helpful.
I have found a package, classGraph, that creates directed acyclic graphs for S4 class structures, but I am more interested in the much more common S3 structures.
This seems brute-force and sloppy, but I suppose if I had a list of all the class attributes used by objects in the base packages, and all the class attributes of objects in a package, then any of the latter which is not among the former would be new classes created by the package or inherited from another non-base package.
This is slightly tricky since I am not aware of any formal definition of an S3 class. For R objects the S3 classes are governed by a very simple character vector of class names stored in the class attribute. Method dispatch is then done by matching element(s) of that attribute with a function name.
You could essentially do:
x <- 1:5
class(x) <- "MyMadeUpClass"
x
# [1] 1 2 3 4 5
# attr(,"class")
# [1] "MyMadeUpClass"
Does the above really define a class in the intuitive, formal understanding of the term?
You can create a print method for objects of this class like (silly example incoming):
print.MyMadeUpClass <- function(x, ...) {
print(sprintf("Pretty vector: %s", paste(x, collapse = ",")))
}
x
# [1] "Pretty vector: 1,2,3,4,5"
The important distinction here is that methods in S3
"belong to" (generic) functions, not classes
are chosen based on classes of the arguments provided to the function call
The point I am trying to make is that S3 does not really have formally defined inheritance (which I assume is what you are looking for), in contrast to S4, which implements this via the contains concept, so I am not really sure what you would like to see as a result.
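To make that concrete, here is a small sketch (the class names child and parent and the generic describe are invented for the example). What passes for inheritance in S3 is only the order of the strings in the class attribute: dispatch tries them left to right, and NextMethod() moves on to the next one.

```r
# A generic with one method per class name.
describe <- function(x, ...) UseMethod("describe")
describe.parent <- function(x, ...) "parent method"
describe.child <- function(x, ...) paste("child method, then", NextMethod())

# Nothing declares that "child" extends "parent"; the vector order does all the work.
obj <- structure(list(), class = c("child", "parent"))
describe(obj)
inherits(obj, "parent")

# Any other object can claim the same "ancestry" on the fly.
other <- structure(1:3, class = c("parent"))
describe(other)
```

So any tool that maps S3 "inheritance" for a package can only report the class vectors it happens to observe on actual objects, or the class names that appear in method names.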
A very good read on the topic is "Object-Oriented Programming, Functional Programming and R" by John M. Chambers: https://arxiv.org/pdf/1409.3531.pdf
Edit (after question edit) - the sloop package:
From an S3 perspective I think it makes a lot of sense to examine the structure of generics and methods. I found the sloop package to be a very useful tool for this: https://github.com/r-lib/sloop.
When I attached the package ffbase for the first time it alerted me that the functions %in% and table were being masked from the base package. I use both functions quite a lot, so I immediately investigated what this means and I am not really sure I understand what's going on here.
As far as I can tell, for table this means that a new method was added:
methods(table)
[1] table.default* table.ff
And for %in%, it's truly been overwritten so the default is the ff version, with base playing backup:
getAnywhere(`%in%`)
2 differing objects matching '%in%' were found in the following places
package:ffbase
package:base
namespace:base
namespace:ffbase
I have two questions now. The first is: if a new method is added to an S3 generic, then why would you need to warn about masking? In my mind, table isn't truly masked, because doesn't R just figure out what data type I have and dispatch the correct method?
And secondly, if you have actually overwritten a function then why does it still work if I do base functionality without specifying the right namespace?
x <- c(1, 23)
23 %in% x
[1] TRUE
I would have assumed I would have needed to use base::`%in%` to get this right?
I suppose this second question really boils down to this: I trust R when it comes to generic method dispatch, because the point of having a class is to provide some way to signal what method you're supposed to use. But if you have this system where package functions (not associated with a class) just get loaded in order of package load, then I don't understand how R knows when the first one it encounters isn't going to work?
Probably the most basic question on S4 classes imaginable here.
What is the simplest way to save an S4 class you have defined so that you can reuse it elsewhere? I have a project where I'm taking a number of very large datasets and compiling summary information from them into small S4 objects. Since I'll therefore be switching R sessions to create the summary object for each dataset, it'd be good to be able to load in the definition of the class from a saved object (or have it load automatically) rather than having to include the long definition of the object at the top of each script (which I assume is bad practice anyway because the code defining the object might become inconsistent).
So what's the syntax along the lines of saveclass("myClass"), loadclass("myclass") or am I just thinking about this in the wrong way?
setClass("track", representation(x="numeric", y="numeric"))
x <- new("track", x=1:4, y=5:8)
# save as binary
fn <- tempfile()
save(x, ascii=FALSE, file=fn)
rm(x)
load(fn)
x
# save as ASCII
save(x, ascii=TRUE, file=fn)
# ASCII text representation from which to regenerate the data
dput(x, file=fn)
y <- dget(fn)
The original source can be found here.
From the question, I think you really do want to include the class definition at the top of each script (although not literally; see below), rather than saving a binary representation of the class definition and loading that. The reason is the general one that binary representations are more fragile (subject to changes in software implementation) compared to simple text representations (for instance, in the not too distant past S4 objects were based on simple lists with a class attribute; more recently they have been built around an S4 'bit' set on the underlying C-level data representation).
Instead of copying and pasting the definition into each script, really the best practice is to include the class definition (and related methods) in an R package, and to load the package at the top of the script. It is not actually hard to write packages; an easy way to get started is to use RStudio to create a 'New Project' as an 'R package'. Use a version number in the package to keep track of the specific version of the class definition / methods you're using, and version control (svn or git, for instance) to make it easy to track the changes / explorations you make as your class matures. Share with your colleagues and eventually the larger R community to let others benefit from your hard work and insight!
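Short of a full package, a lighter stand-in for the same idea (one text definition, loaded by every script) is a shared file that each script source()s. This is only a sketch of mine and not a substitute for the package route recommended above; the file name track-class.R and the n_points generic are hypothetical, and a temporary directory is used just to keep the example self-contained.

```r
library(methods)

# Hypothetical shared definitions file. In real use it would live in your
# project, or better, in the R/ directory of a package.
def_file <- file.path(tempdir(), "track-class.R")
writeLines(c(
  'setClass("track", representation(x = "numeric", y = "numeric"))',
  'setGeneric("n_points", function(obj) standardGeneric("n_points"))',
  'setMethod("n_points", "track", function(obj) length(obj@x))'
), def_file)

# At the top of each analysis script:
source(def_file)

# The class and its methods are now available in this session.
tr <- new("track", x = 1:4, y = 5:8)
n_points(tr)
```

Moving those three definitions into a package later is mostly a matter of putting the same lines in a file under R/ and replacing source() with library().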
Is there a way to specify that a library should not throw warnings regarding name clashes and masked objects whenever it is attached? I imagine a solution would involve editing the description or one of the special functions such as .onAttach but I can't find anything solving this issue.
I ask because the warnings are unneeded. I have defined my own S3 class and the masked function is still called by the default method of the masking function:
median <- function(x, ...) UseMethod("median")
median.default <- stats::median.default
In the event that a user is using median on a typical R data structure such as a vector, the median method in my package will call the masked function automatically, so there is no real need for the user to be aware of the masking.
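A quick sketch of that fall-through behaviour, using the two definitions above plus a made-up class myclass and method median.myclass for illustration:

```r
# The masking generic and its default, as in the package.
median <- function(x, ...) UseMethod("median")
median.default <- stats::median.default

# Hypothetical method for the package's own S3 class.
median.myclass <- function(x, ...) "custom median for myclass"

# An ordinary vector falls through to the stats implementation.
median(c(1, 3, 10))

# An object of the package's class gets the package's method.
median(structure(list(), class = "myclass"))
```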
I'm not sure if your question is that you don't want the user to see the warnings, or that you don't want the warnings to occur.
If the former, you might be able to use shhh in the tfse library around your library call. Or, if it's just for yourself, you could set the warn.conflicts = FALSE argument when calling library().
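On the user side this might look like the sketch below. The splines package is used purely as a stand-in because it ships with R; substitute the package in question. As far as I know the masking notices are emitted as package startup messages, which is why the second form should silence them too, but treat that as an assumption to check.

```r
# Attach without reporting masked objects.
library(splines, warn.conflicts = FALSE)

# Alternatively, suppress everything emitted as a startup message while attaching.
suppressPackageStartupMessages(library(splines))

# To see afterwards which names are masked and by what:
masked <- conflicts(detail = TRUE)
str(masked)
```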
If the latter, it would be clearly more elegant to rewrite the offending method so it doesn't conflict in the namespace.