You can find all the objects in a package with
objs <- mget(ls("package:base"), inherits = TRUE)
You can select the functions from these with
funs <- Filter(is.function, objs)
You can get a complete list of the dependencies of the listed functions in a package by applying codetools::findGlobals(), miniCRAN::makeDepGraph(), pkgnet::CreatePackageReport() (or others) to the function list. All of these functions either graph the resulting dependencies or return an object easily plottable with, e.g., igraph or DependenciesGraph.
Is there a comparable set of commands to find all the classes created by a package and the inheritance structure of those classes? I know that for most packages the resulting web of class inheritance would be relatively simple, but I think that in a few cases, such as ggplot2 and the survey package, the resulting web of class inheritance could be quite helpful.
I have found a package, classGraph, that creates directed acyclic graphs for S4 class structures, but I am more interested in the much more common S3 structures.
This seems brute-force and sloppy, but I suppose if I had a list of all the class attributes used by objects in the base packages, and all the class attributes of objects in a package, then any of the latter which is not among the former would be new classes created by the package or inherited from another non-base package.
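The brute-force comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: classes_in() is a hypothetical helper, and the datasets package stands in for "a package" because it actually ships objects that carry class attributes. It also exposes the weakness of the idea: most packages export only functions, so classes that exist only on objects built at run time are never seen.

```r
library(datasets)

# Hypothetical helper: every class attribute value found on the objects
# that an attached package exposes on the search path.
classes_in <- function(pkg) {
  env  <- as.environment(paste0("package:", pkg))
  objs <- mget(ls(env), envir = env)
  sort(unique(unlist(lapply(objs, class))))
}

base_classes <- classes_in("base")
pkg_classes  <- classes_in("datasets")  # stand-in for the package of interest

# Classes seen on the package's objects but not on base objects
new_classes <- setdiff(pkg_classes, base_classes)
new_classes
```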
This is slightly tricky since I am not aware of any formal definition of an S3 class. For R objects the S3 classes are governed by a very simple character vector of class names stored in the class attribute. Method dispatch is then done by matching element(s) of that attribute with a function name.
You could essentially do:
x <- 1:5
class(x) <- "MyMadeUpClass"
x
# [1] 1 2 3 4 5
# attr(,"class")
# [1] "MyMadeUpClass"
Does the above really define a class in the intuitive, formal understanding of the term?
You can create a print method for objects of this class like (silly example incoming):
print.MyMadeUpClass <- function(x, ...) {
  print(sprintf("Pretty vector: %s", paste(x, collapse = ",")))
}
x
# [1] "Pretty vector: 1,2,3,4,5"
The important distinction here is that methods in S3:
- "belong to" (generic) functions, not classes
- are chosen based on the classes of the arguments provided to the function call
The point I am trying to make is that S3 does not really have formally defined inheritance (which I assume is what you are looking for), in contrast to S4, which implements this via the contains concept, so I am not really sure what you would like to see as a result.
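To make the informal nature of S3 inheritance concrete, here is a minimal sketch. The class names "child" and "parent" and the generic describe() are made up for illustration; the only "inheritance" is the order of names in the class vector.

```r
# Two made-up classes: "child" is listed before "parent" in the class vector.
x <- structure(1:5, class = c("child", "parent"))

# A hypothetical generic with a method for "parent" but none for "child".
describe <- function(obj, ...) UseMethod("describe")
describe.parent  <- function(obj, ...) "handled by describe.parent"
describe.default <- function(obj, ...) "handled by describe.default"

describe(x)            # no describe.child exists, so dispatch moves on to "parent"
inherits(x, "parent")  # TRUE
inherits(x, "child")   # TRUE
describe(1:5)          # a plain integer vector uses the default method
```

Nothing declares that "child" extends "parent"; another object could list the same two names in the opposite order.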
A very good read on the topic is "Object-Oriented Programming, Functional Programming and R" by John M. Chambers: https://arxiv.org/pdf/1409.3531.pdf
Edit (after question edit) - the sloop package:
From the S3 perspective I think it makes a lot of sense to examine the structure of generics and methods. I found the sloop package to be a very useful tool for this: https://github.com/r-lib/sloop.
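Without installing anything, base R's methods() already gives a rough view of that structure. The sketch below uses the "lm" class only because stats ships with every R installation; the "info" attribute it reads is described in ?methods.

```r
library(stats)
library(utils)

# All methods known for a class, with the generic each one belongs to.
m    <- methods(class = "lm")
info <- attr(m, "info")     # data frame with columns visible, from, generic, isS4

head(info)
sort(unique(info$generic))  # generics that have an "lm" method
```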
I'm reading through a tutorial that uses the lme4 package, and one of the input options to predict is re.form = NA.
m_lmer <- lmer(log(beakh) ~ log(wingl) + (1 | taxon), data = d)
d$predict_lmer_population <- predict(m_lmer, re.form = NA)
I want to get help for the predict call, but clearly doing ?predict is incorrect.
I then tried asking for the class of the model:
> class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
I then tried ?lmerMod which RStudio automagically changed to ?`lmerMod-class`. I get the addition of ` to the name because of the - "special character" but where did class come from?
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
After some searching in that help I found a link to predict.merMod
Further searching confirmed I could have done: methods('predict') and found the same method, although it is listed predict.merMod* for some reason (added * symbol).
In the end I feel like I would be able to find something similar much more quickly the next time but it still seems very hard to find good help for class methods in R. I'm not sure if this would work the same for S4 or R6 (from the documentation it seems predict.merMod is a S3 method)? It is not clear why the l was dropped from the class name (lmerMod to merMod) or why the -class suffix is needed when asking for help. I feel like I'm missing some extremely basic lesson on R documentation.
Throwing this "help in R" link in for reference that seems to omit class based methods help and also seems like it should just point to some official R documentation website rather than being such a long SO post ...
How to get help in R?
This is a very good question. There are a bunch of things going on here, having to do with (1) the difference between S3 and S4 classes and methods, and (2) the underlying class structures in lme4.
I want to get help for the predict call, but clearly doing ?predict is incorrect.
?predict gets you help for the generic function which, as you've noticed, isn't useful. In general it's up to the package developers to decide whether their specialized version of a particular method (e.g., the predict() method for merMod objects) is sufficiently special (e.g., has different or unusual arguments) that it should be documented separately. (Writing R Extensions says "If it is necessary or desired to provide an explicit function declaration (in a \usage section) for [a ...] method ...") (emphasis added).
In general, if they're documented, docs for S3 methods will be available as ?function.class, and docs for S4 methods as ?"function,class-method" (which needs quotation marks since it has a hyphen in it).
The methods() function gives some clues about where to look: if the bbmle and lme4 packages are loaded, predict.merMod* and predict,mle2-method* both show up in the list (the stars mean the functions are hidden, i.e. you can use them by calling predict(my_fit), but the function definitions are not easily available).
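As a small illustration of the starred entries, the sketch below retrieves a hidden method with getS3method() and getAnywhere(). It uses print.lm from stats rather than predict.merMod so that it runs without lme4; the same calls should apply to any registered S3 method.

```r
library(stats)
library(utils)

# print.lm is registered as an S3 method by stats but is not exported,
# so it is listed with a star by methods("print").
exists("print.lm")               # is it visible by name on the search path?
f <- getS3method("print", "lm")  # retrieve it through the S3 registry
is.function(f)
getAnywhere("print.lm")          # also reports where the function was found
```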
I then tried asking for the class of the model:
class(m_lmer)
[1] "lmerMod"
attr(,"package")
[1] "lme4"
lmer() produces an object of class lmerMod, which is a subclass of the merMod class (the other subclass is glmerMod, for GLMMs).
I then tried ?lmerMod which RStudio automagically changed to ?lmerMod-class. I get the addition of ` to the name because of the - "special character" but where did class come from?
I don't know that much about RStudio: the "-class" part is specific to methods for S4 classes.
The help then describes the "merMod" class, not "lmerMod". Why the name change (leading l dropped)?
See above.
The most opaque part of all of this (IMO) is figuring out S4 class hierarchies - if you say methods(class = "lmerMod") you only get two results (getL and show), it's hard to figure out that you need to say methods(class = "merMod") to get most of the available methods (i.e., only a few methods are specific to lmerMod and glmerMod subclasses - most are more general).
According to this answer you can find the subclasses of merMod as follows:
library(lme4)
cls <- getClass("merMod")
names(cls@subclasses)
## [1] "lmerMod" "glmerMod" "nlmerMod"
How about the other direction?
cls <- getClass("lmerMod")
names(cls@contains)
## [1] "merMod"
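The same two lookups can be tried without lme4 on a toy hierarchy. The classes "Animal" and "Dog" below are made up purely for illustration.

```r
library(methods)

# Two made-up S4 classes: a virtual parent and one concrete child.
setClass("Animal", representation("VIRTUAL"))
setClass("Dog", contains = "Animal", slots = c(name = "character"))

names(getClass("Animal")@subclasses)    # classes that extend Animal
names(getClass("Dog")@contains)         # classes that Dog extends
is(new("Dog", name = "Rex"), "Animal")  # an instance counts as its superclass
```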
(Don't ask me more questions, I really don't understand S4 classes all that well!)
Two R questions:
What is the difference between the type (returned by typeof) and the class (returned by class) of a variable? Is the difference similar to that in, say, C++ language?
What are possible types and classes of variables?
In R every "object" has a mode and a class. The former represents how an object is stored in memory (numeric, character, list and function) while the latter represents its abstract type. For example:
d <- data.frame(V1=c(1,2))
class(d)
# [1] "data.frame"
mode(d)
# [1] "list"
typeof(d)
# [1] "list"
As you can see, data frames are stored in memory as lists but they are wrapped into data.frame objects. The latter allows for usage of member functions as well as overloading functions such as print with a custom behavior.
typeof (and storage.mode) will usually give the same information as mode, but not always. Case in point:
typeof(c(1,2))
# [1] "double"
mode(c(1,2))
# [1] "numeric"
The reasoning behind this can be found here:
The R specific function typeof returns the type of an R object
Function mode gives information about the mode of an object in the sense of Becker, Chambers & Wilks (1988), and is more compatible with other implementations of the S language
The link that I posted above also contains a list of all native R basic types (vectors, lists etc.) and all compound objects (factors and data.frames) as well as some examples of how mode, typeof and class are related for each type.
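A quick way to see how the three functions relate is to tabulate them for a handful of objects. This is just a sketch; the list names (int, dbl, and so on) are labels chosen for illustration.

```r
# A few objects of different kinds, labelled for the table below.
objs <- list(int = 1L, dbl = 1, chr = "a", lgl = TRUE, lst = list(1),
             df = data.frame(a = 1), builtin = sum, closure = mean,
             sym = quote(x))

# One row per object, one column per function (first class only).
tab <- t(sapply(objs, function(o) {
  c(typeof = typeof(o), mode = mode(o), class = class(o)[1])
}))
tab
```

The rows for dbl, df, builtin and sym are the ones where the three columns disagree most visibly.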
type really refers to the different data structures available in R. This discussion in the R Language Definition manual may get you started on objects and types.
On the other hand, class means something else in R than what you may expect. From
the R Language Definition manual (that came with your version of R):
2.2.4 Classes
R has an elaborate class system, principally controlled via the class attribute. This attribute is a character vector containing the list of classes that an object inherits from. This forms the basis of the “generic methods” functionality in R.
This attribute can be accessed and manipulated virtually without restriction by users. There is no checking that an object actually contains the components that class methods expect. Thus, altering the class attribute should be done with caution, and when they are available specific creation and coercion functions should be preferred.
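The lack of checking is easy to demonstrate. In the sketch below an integer vector is labelled as an "lm" object; R accepts the label, and the problem only surfaces when a method that relies on the expected components is called.

```r
library(stats)

x <- 1:3
class(x) <- "lm"   # accepted without any validation
inherits(x, "lm")  # TRUE, although x is nothing like a fitted model

# print.lm expects a list with a `call` component, so printing fails.
res <- tryCatch(print(x), error = function(e) conditionMessage(e))
res                # the error message raised inside print.lm
```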
There are a number of tests which, applied to an object of a given class, produce information about that object. Consider objects of class "function". The functions is.primitive() or is.closure(), or (from rlang) is_primitive_eager() or is_primitive_lazy(), provide information about a function object. However, using methods(class = "function") (with rlang loaded) does not return any of these functions:
[1] as.data.frame as.list coerce coerce<- fortify head latex plot print tail .
Using extends(class1 = "function", maybe = TRUE, fullInfo = TRUE) shows two superclasses, "OptionalFunction" and "PossibleMethod".
Using completeClassDefinition(Class = "function", doExtends=TRUE) provides 23 subclasses. However, it appears to me (though I am not sure of this) that all or almost all of the super- and sub-classes from these two functions are specifically of S4 classes, which I generally do not use. One of these subclasses is "genericFunction", so I tried to apply it to a base R function which I knew to be generic. Although is(object=plot, class2 = "genericFunction") returns TRUE, and plot() antedates S4 classes, there is no "is.generic" test in base R, but there is an "isGeneric" test in the methods package, which suggests to me that plot() has been rewritten as an S4 object.
At any rate, there are a lot of obvious potential properties of functions, like whether they are generic, for which there are no is.<whatever> tests that I can find, and I would like to know if there are other ways I can search for them, e.g., in packages.
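For the specific property of being an S3 generic, one rough test is to look for a call to UseMethod() in the function body. The helper is_s3_generic() below is hypothetical and deliberately crude: it misses internal generics implemented as primitives (such as sum()) and can be fooled by functions that merely mention UseMethod. Recent versions of R also provide utils::isS3stdGeneric(), which may be a better starting point.

```r
library(stats)

# Hypothetical helper: TRUE when the function body mentions UseMethod().
# Primitives have no body, so they always give FALSE here.
is_s3_generic <- function(f) {
  is.function(f) && "UseMethod" %in% all.names(body(f))
}

is_s3_generic(print)
is_s3_generic(mean)
is_s3_generic(quote)

# Apply the test to every function attached by a package (stats here).
env      <- as.environment("package:stats")
funs     <- Filter(is.function, mget(ls(env), envir = env))
generics <- names(Filter(is_s3_generic, funs))
head(generics)
```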
A more generic way of asking this same question is whether there is any way of identifying functions that will accept objects of a specified class and not return an error or nonsense. If so, I could take a list of the functions in the recommended packages or in some specified package and test whether each returns a sensible response when handed a function. This is not exactly an answer (such a method would return TRUE for quote(), for example), but it would at least cut the problem down to size.
I've been developing an S4 class which is essentially a data.frame with a little bit of extra information. For the purposes of this question, the "extra" features of this class are irrelevant. What matters is that the class contains a data.frame object stored in one of its slots. (I put the data.frame in a slot, instead of naming it a superclass, because I find that S4 classes which contain data.frames simplify the data.frames to lists for some reason.)
Here's a basic example:
setClass('tmp_class', slots = c(df = 'data.frame'))
test_object <- new('tmp_class', df = data.frame(Num = 1:10, Let = letters[1:10]))
Now what I'd like to do is make it so that essentially any function applied to an object of this class is applied to the data.frame in slot @df. It's easy to write methods for specific functions to do this, like:
setMethod('dim', signature = c(x = 'tmp_class'), function(x) dim(x@df))
But I'm limited to only the functions I can think of, and any function invented by a user wouldn't work.
It is a simple matter to write a sort of wrapper/closure to modify a function to work on my class, like this:
tmp_classize <- function(func) {
  function(tmp, ...) { func(tmp@df, ...) }
}
So, rather than writing methods for, say, colnames() or ncol(), I could just run:
tmp_classize(colnames)(test_object)
or
tmp_classize(ncol)(test_object)
But what I'd like to do is somehow invoke my "tmp_classize" function on any function applied to my class, automatically. I can't figure out how to do it. I was thinking that if I could somehow call a "universal method" with an input signature of class "tmp_class", and then use sys.function() to grab the actual function being called, maybe I could make something work, but (A) there are recursion problems and (B) I don't know how to call such a "universal" method. It seems to me that the solution, if it exists at all, might necessitate non-standard evaluation, which I'd rather avoid, but might use if necessary.
Thanks!
P.S. I realize this undertaking may be unwise/poor programming technique, and I may never actually implement it in a package. Still I'm curious to know if it is possible.
P.P.S. I'd also be interested in the same idea applied to S3 classes!
In principle, what you could do is make a classUnion for your class and data.frame and write methods for your class that deal with all of the ways to read and write to data.frames such as $, [, dim(), <- and many more. Then when other functions seek to use your new class as a data.frame there will be methods for this to work. This is somewhat explained in John Chambers's "Software for Data Analysis" starting on page 375. That said, this system may be very difficult to implement.
A simpler system may be to just add an extra attribute to your data.frame with the extra info you need. For example:
x <- data.frame(a = 1:3, b = 4:6)
attr(x, "Info") <- "Extra info I need"
attributes(x)$Info
# [1] "Extra info I need"
This is not as elegant as an S4 class but will do everything a data.frame does. I suspect that someone who is familiar with S3 classes could improve on this idea quite a bit.
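One possible S3 refinement of the attribute idea is to put an extra class name in front of "data.frame" so that selected methods can be customised while everything else falls through. This is a sketch only: new_info_df() and the class name "info_df" are made up, and it makes no promise that the Info attribute survives every data.frame operation (subsetting or merging may drop it).

```r
# Hypothetical constructor: a data.frame with one extra attribute and an
# extra class placed in front of the existing classes.
new_info_df <- function(df, info) {
  stopifnot(is.data.frame(df))
  structure(df, Info = info, class = c("info_df", class(df)))
}

# Custom print method: show the extra info, then print as a data.frame.
print.info_df <- function(x, ...) {
  cat("Info:", attr(x, "Info"), "\n")
  NextMethod()
}

x <- new_info_df(data.frame(a = 1:3, b = 4:6), "Extra info I need")
nrow(x)      # ordinary data.frame functions still work
colnames(x)
x            # uses print.info_df
```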
The simplest solution is to have your class contain data.frame instead of having it as one of the slots. For example here is a data.frame with a timestamp:
setClass(
  "timestampedDF",
  slots = c(timestamp = "POSIXt"),
  contains = "data.frame"
)
Now all functions which work for a data.frame (such as head) will automatically work for timestampedDF objects. If you need to get at the "data frame part", then that is held in a hidden slot object@.Data.
I am trying to learn how to use R. I can use it to do basic things like reading in data and running a t-test. However, I am struggling to understand the way R is structured (I have a very mediocre Java background).
What I don't understand is the way the functions are classified.
For example in is.na(someVector), is is a class? Or for read.csv, is csv a method of the read class?
I need an easier way to learn the functions than simply memorizing them randomly. I like the idea of things belonging to other things. To me it seems like this gives a language a tree structure which makes learning more efficient.
Thank you
Sorry if this is an obvious question I am genuinely confused and have been reading/watching quite a few tutorials.
Your confusion is entirely understandable, since R mixes two conventions of using (1) . as a general-purpose word separator (as in is.na(), which.min(), update.formula(), data.frame() ...) and (2) . as an indicator of an S3 method, method.class (i.e. foo.bar() would be the "foo" method for objects with class attribute "bar"). This makes functions like summary.data.frame() (i.e., the summary method for objects with class data.frame) especially confusing.
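When a dotted name leaves you unsure which convention applies, utils::isS3method() can help tell them apart. A short sketch, using names that ship with base R and stats:

```r
library(stats)
library(utils)

isS3method("summary.data.frame")        # the summary method for data.frame
isS3method("t.data.frame")              # the t method for data.frame
isS3method("t.test")                    # an ordinary function, not t for class "test"
isS3method("as.data.frame.data.frame")  # the as.data.frame method for data.frame
```

Only the third call should return FALSE.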
As @thelatemail points out above, there are some other sets of functions that repeat the same prefix for a variety of different options (as in read.table(), read.delim(), read.fwf() ...), but these are entirely conventional, not specified anywhere in the formal language definition.
dotfuns <- apropos("[a-z]\\.[a-z]")
dotstart <- gsub("\\.[a-zA-Z]+","",dotfuns)
head(dotstart)
tt <- table(dotstart)
head(rev(sort(tt)),10)
##      as      is   print     Sys    file summary     dev  format     all     sys
##     118      51      32      18      17      16      16      15      14      13
(Some of these are actually S3 generics, some are not. For example, Sys.*(), dev.*(), and file.*() are not.)
Historically _ was used as a shortcut for the assignment operator <- (before = was available as a synonym), so it wasn't available as a word separator. I don't know offhand why camelCase wasn't adopted instead.
Confusingly, methods("is") returns is.na() among many others, but it is effectively just searching for functions whose names start with "is."; it warns that "function 'is' appears not to be generic".
Rasmus Bååth's presentation on naming conventions is informative and entertaining (if a little bit depressing).
extra credit: are there any dot-separated S3 method names, i.e. cases where a function name of the form x.y.z represents the x.y method for objects with class attribute z ?
answer (from Hadley Wickham in comments): as.data.frame.data.frame() wins. as.data.frame is an S3 generic (unlike, say, as.numeric), and as.data.frame.data.frame is its method for data.frame objects. Its purpose (from ?as.data.frame):
If a data frame is supplied, all classes preceding ‘"data.frame"’
are stripped, and the row names are changed if that argument is
supplied.