dollar suggestions method in S3 - r

I'm currently writing an R package and I have to choose between returning lists and returning an object with S3 attributes.
The good thing about lists is that they are very easy for beginners to use, because the dollar sign makes all the elements easy to find.
The bad thing is that it removes direct inheritance (I'd like to return a ts object with some additional information).
The alternative would be to define the dollar operator for my S3 class, as in this example:
object <- 1
class(object) <- "MyClass"
attr(object,"MyAttribute") <- "This is a secret"
`$.MyClass` <- function(x, name) attr(x, name)
object$MyAttribute
However, I have two questions about this:
Where do I set the dollar partial matching function so that the user sees "MyAttribute" as a valid choice in RStudio?
Besides, is that a fine practice to do so or should I keep on using simple lists
Thanks

I don’t think RStudio currently allows this kind of customisation. In other R terminals you could play with rcompgen to generate completions but IIRC RStudio does its own thing.
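For terminals that use R's built-in completion, the hook is the `.DollarNames()` generic in the utils package. The following is a minimal sketch, assuming the attribute-based design from the question; the `"^My"` pattern is only an illustrative input, and nothing here claims that RStudio consults this method.

```r
# Attribute-based object from the question
object <- structure(1, class = "MyClass", MyAttribute = "This is a secret")

# `$` method: look the name up among the object's attributes
`$.MyClass` <- function(x, name) attr(x, name, exact = TRUE)

# Completion candidates for `object$<TAB>` in R's built-in completion:
# every attribute name except "class", filtered by the typed pattern
.DollarNames.MyClass <- function(x, pattern = "") {
  candidates <- setdiff(names(attributes(x)), "class")
  grep(pattern, candidates, value = TRUE)
}

object$MyAttribute
utils::.DollarNames(object, "^My")
```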
That said, your question seems to be based on a false dichotomy:
Besides, is that a fine practice to do so or should I keep on using simple lists
You don’t need to choose either–or. In fact, it’s common to have lists with S3 classes, and it is not common to use attributes to store S3 information that is then accessed via $. Just make your class a list:
object = structure(
list(value = 1, MyAttribute = "This is a secret"),
class = "MyClass"
)
object$MyAttribute
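Since the question mentions returning a ts object with extra information, here is a hedged sketch of how a list-based class could carry both. The class name `MySeries`, the constructor `new_my_series()` and the element names are hypothetical, chosen only for illustration.

```r
# Hypothetical constructor: a list holding a ts plus extra information
new_my_series <- function(series, note) {
  stopifnot(stats::is.ts(series), is.character(note))
  structure(list(series = series, note = note), class = "MySeries")
}

# A print method keeps the console output tidy
print.MySeries <- function(x, ...) {
  cat("<MySeries> ", x$note, "\n", sep = "")
  print(x$series, ...)
  invisible(x)
}

obj <- new_my_series(stats::ts(1:4, start = 2020), "This is a secret")
obj$note    # plain list access, no custom `$` method needed
names(obj)  # the element names that `$` can reach
```

Because the object is an ordinary list underneath, `$` and `names()` behave as beginners expect, while the class attribute still allows methods such as `print.MySeries`.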

Related

R generic dispatching to attached environment

I have a bunch of functions and I'm trying to keep my workspace clean by defining them in an environment and attaching the environment. Some of the functions are S3 generics, and they don't seem to play well with this approach.
A minimum example of what I'm experiencing requires 4 files:
testfun.R
ttt.xxx <- function(object) print("x")
ttt <- function(object) UseMethod("ttt")
ttt2 <- function() {
yyy <- structure(1, class="xxx")
ttt(yyy)
}
In testfun.R I define an S3 generic ttt and a method ttt.xxx, I also define a function ttt2 calling the generic.
testenv.R
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
In testenv.R I source testfun.R to an environment, which I attach.
test1.R
source("testfun.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test1.R sources testfun.R to the global environment. Both ttt2 and a direct function call work.
test2.R
source("testenv.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test2.R uses the "attach" approach. ttt2 still works (and prints "x" to the console), but the direct function call fails:
Error in UseMethod("ttt") :
no applicable method for 'ttt' applied to an object of class "xxx"
However, calling ttt and ttt.xxx without arguments shows that they are known, ls(pos=2) shows they are on the search path, and sloop::s3_dispatch(ttt(xxx)) tells me it should work.
This question is related to Confusion about UseMethod search mechanism and the link therein https://blog.thatbuthow.com/how-r-searches-and-finds-stuff/, but I cannot get my head around what is going on: why is it not working, and how can I get this to work?
I've tried both RStudio and R in the shell.
UPDATE:
Based on the answers below I changed my testenv.R to:
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
if (is.null(.__S3MethodsTable__.))
.__S3MethodsTable__. <- new.env(parent = baseenv())
for (func in grep(".", ls(envir = test_env), fixed = TRUE, value = TRUE))
.__S3MethodsTable__.[[func]] <- test_env[[func]]
rm(test_env, func)
... and this works (I am only using "." as an S3 dispatching separator).
It’s a little-known fact that you must use .S3method() to define methods for S3 generics inside custom environments (outside of packages) [1]. The reason almost nobody knows this is that it is not necessary in the global environment; but it is necessary everywhere else since R version 3.6.
There’s virtually no documentation of this change, just a technical blog post by Kurt Hornik about some of the background. Note that the blog post says the change was made in R 3.5.0; however, the actual effect you are observing — that S3 methods are no longer searched in attached environments — only started happening with R 3.6.0; before that, it was somehow not active yet.
… except just using .S3method will not fix your code, since your calling environment is the global environment. I do not understand the precise reason why this doesn’t work, and I suspect it’s due to a subtle bug in R’s S3 method lookup. In fact, using getS3method('ttt', 'xxx') does work, even though that should have the same behaviour as actual S3 method lookup.
I have found that the only way to make this work is to add the following to testenv.R:
if (is.null(.__S3MethodsTable__.)) {
.__S3MethodsTable__. <- new.env(parent = baseenv())
}
.__S3MethodsTable__.$ttt.xxx <- ttt.xxx
… in other words: supply .GlobalEnv manually with an S3 methods lookup table. Unfortunately this relies on an undocumented S3 implementation detail that might theoretically change in the future.
Alternatively, it “just works” if you use ‘box’ modules instead of source. That is, you can replace the entirety of your testenv.R by the following:
box::use(./testfun[...])
This code treats testfun.R as a local module and loads it, attaching all exported names (via the attach declaration [...]).
[1] (and inside packages you need to use the equivalent S3method namespace declaration, though if you’re using ‘roxygen2’ then that’s taken care of for you)
First of all, my advice would be: don't try to reinvent R packages. They solve all the problems you say you are trying to solve, and others as well.
Secondly, I'll try to explain what went wrong in test2.R. It calls ttt on an xxx object, and ttt.xxx is on the search list, but is not found.
The problem is how the search for ttt.xxx happens. The search doesn't look for ttt.xxx in the search list, it looks for it in the environment from which ttt was called, then in an object called .__S3MethodsTable__.. I think there are two reasons for this:
First, it's a lot faster. It only needs to look in one or two places, and the table can be updated whenever a package is attached or detached, a relatively rare operation.
Second, it's more reliable. Each package has its own methods table, because two packages can use the same name for generics that have nothing to do with each other, or can use the same class names that are unrelated. So package code needs to be able to count on finding its own definitions first.
Since your call to ttt() happens at the top level, that's where R looks first for ttt.xxx(), but it's not there. Then it looks in the global .__S3MethodsTable__. (which is actually in the base environment), and it's not there either. So it fails.
There is a workaround that will make your code work. If you run
.__S3MethodsTable__. <- list2env(list(ttt.xxx = ttt.xxx))
as the last line of testenv.R, then you'll create a methods table in the global environment. (Normally there isn't one there, because that's user space, and R doesn't like putting things there unless the user asks for it.)
R will find that methods table, and will find the ttt.xxx method that it defines. I wouldn't be surprised if this breaks some other aspect of S3 dispatch, so I don't recommend doing it, but give it a try if you insist on reinventing the package system.
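The lookup described above (calling environment first, then a methods table) can be illustrated without touching `.__S3MethodsTable__.`. This is a minimal sketch that mirrors the question's functions but returns `"x"` instead of printing it; it uses `local()` in place of `source()` and does not attach anything.

```r
test_env <- new.env(parent = globalenv())

# Define the generic, its method and a wrapper inside test_env
local({
  ttt     <- function(object) UseMethod("ttt")
  ttt.xxx <- function(object) "x"
  ttt2    <- function() ttt(structure(1, class = "xxx"))
}, envir = test_env)

# Works: ttt() is called from inside ttt2(), whose enclosing
# environment is test_env, so ttt.xxx is visible from the call site
inside <- test_env$ttt2()

# Also works: the call itself is evaluated in test_env
via_eval <- evalq(ttt(structure(1, class = "xxx")), test_env)

# Called from outside test_env, the method is not visible from the call
# site; on R >= 3.6.0 the answers above lead one to expect an error here
direct <- tryCatch(
  test_env$ttt(structure(1, class = "xxx")),
  error = function(e) conditionMessage(e)
)
```

Evaluating calls inside the environment sidesteps the problem, but it is no substitute for the package mechanism recommended above.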

How to overload S4 slot selector `@` to be a generic function

I am trying to turn the @ operator in R into a generic function for the S3 system.
Based on the "Adding new generics" chapter of Writing R Extensions, I tried implementing the generic for @ like so:
`@` <- function(object, name) UseMethod("@")
`@.default` <- function(object, name) base::`@`(object, name)
However, this doesn't seem to work, as it breaks @ for S4 objects. I am using the Matrix package as an example of an S4 instance:
Matrix::Matrix(1:4, nrow=2, ncol=2)@Dim
Error in @.default(Matrix::Matrix(1:4, nrow = 2, ncol = 2), Dim) :
no slot of name "name" for this object of class "dgeMatrix"
How to implement a generic @ so it correctly dispatches in the case of S4 classes?
EDIT
Also interested in opinions about why it might not be a good idea?
R's documentation is somewhat confusing as to whether @ is already a generic or not: the help page for @ says it is, but it isn't listed on the internalGenerics page.
The @ operator has specific behaviour as well as (perhaps) being a generic. From the help page for @: "It is checked that object is an S4 object (see isS4), and it is an error to attempt to use @ on any other object." That would appear to rule out writing methods for S3 classes, though the documentation is unclear if this check happens before method dispatch (if there is any) or after (whence it could be skipped if you supplied a specific method for some S3 class).
You can implement what you want by completely redefining what @ is, along the lines of the suggestion in comments:
`@.default` <- function(e1,e2) slot(e1,substitute(e2))
but there are two reasons not to do this:
1) As soon as someone loads your package, it supersedes the normal @ function, so if people call it with other S4 objects, they are getting your version rather than the R base version.
2) This version is considerably less efficient than the internal one, and because of (1) you have just forced your users to use it (unless they use the cumbersome construction base::"@"(e1,e2)). Efficiency may not matter to your use case, but it may matter to your users' other code that uses S4.
Practically, a reasonable compromise might be to define your own binary operator %@%, and have the default method call @. That is,
`%@%` <- function(e1,e2) slot(e1,substitute(e2))
setGeneric("%@%")
This is called in practice as follows:
> setClass("testClass",slots=c(a="character")) -> testClass
> x <- testClass(a="cheese")
> x %@% a
[1] "cheese"
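As a hedged illustration of the same idea without `setGeneric()`, the sketch below compares the built-in operator with `slot()` and a plain helper operator. It reuses the `testClass` example; converting the symbol with `as.character()` is an added assumption, made only to keep the slot name an explicit string.

```r
library(methods)

setClass("testClass", slots = c(a = "character"))
x <- new("testClass", a = "cheese")

# Built-in operator: the slot name is fixed in the source code
fixed <- x@a

# slot(): the slot name can be computed at run time
slot_name <- "a"
computed <- slot(x, slot_name)

# Plain helper operator that leaves base `@` untouched;
# substitute() captures the unevaluated right-hand side
`%@%` <- function(e1, e2) slot(e1, as.character(substitute(e2)))
via_helper <- x %@% a
```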

How to suggest hints to Rstudio for auto completion for my code?

Normally if a is a data.frame then one can autocomplete the column names by doing a$ tab. The chunked package has a nice feature where if you run
a <- chunked::read_csv_chunkwise("some.csv")
then when you type a[ then tab then it will show a list of variables via autocompletion even though a is not a data.frame.
I was trying to replicate this for my own code but I couldn't find any relevant resources after googling for "rstudio autocompletion" and various other searches.
I note that class(a) returns
[1] "chunkwise" "tbl"
I had a look at all the functions that belong to the S3 class "chunked" and I note that it has a method called tbl_vars, so I thought maybe that's what Rstudio uses to do the autocomplete.
So to test it out I tried
library(dplyr)       # provides tbl_vars()
library(data.table)  # provides fread()
write.csv(data.frame(a = 1, b = 2), file = "test.csv", row.names = FALSE)
tbl_vars.test_auto_complete <- function(fs) {
  names(fread(fs$path))
}
test_auto_complete <- list(path = "test.csv")
class(test_auto_complete) <- "test_auto_complete"
tbl_vars(test_auto_complete)
[1] "a" "b"
But then when I type test_auto_complete tab the auto-complete doesn't show the variables that I want.
How can we give hints to Rstudio to make auto-completion work?
For objects that inherit from the tbl class, RStudio does indeed call tbl_vars() to populate completions. (This is an RStudio-specific autocompletion system feature.)
In your example, the object you're creating does not inherit from tbl, so this autocompletion pathway doesn't kick in; adding "tbl" to its class vector is a prerequisite.
However, this form of 'ad-hoc' S3 dispatch (where you define S3 methods directly as code like this) is not detected by RStudio, so you won't be able to verify this with test code like this. You'll have to explicitly define and register the S3 method in an R package.
Alternatively, you can try explicitly registering the S3 method with something like:
registerS3method("tbl_vars", "test_auto_complete", tbl_vars.test_auto_complete)
for inline testing.
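Putting the two points of this answer together, a sketch for inline testing might look as follows. The temporary file and the use of `read.csv()` are assumptions made to keep the example self-contained. Registration is attempted only if dplyr is installed, and whether RStudio then offers the completions is not something this sketch can confirm.

```r
# Small CSV to read the column names from (temporary file for illustration)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1, b = 2), file = csv_path, row.names = FALSE)

# The object must inherit from "tbl" for the tbl_vars() pathway to apply
test_auto_complete <- structure(
  list(path = csv_path),
  class = c("test_auto_complete", "tbl")
)

# Method returning the names that should be offered as completions
tbl_vars.test_auto_complete <- function(x) {
  names(utils::read.csv(x$path, nrows = 1))
}

# Explicit registration against dplyr's generic, if dplyr is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  registerS3method(
    "tbl_vars", "test_auto_complete", tbl_vars.test_auto_complete,
    envir = asNamespace("dplyr")
  )
}
```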

Use one Class of an Object with Multiple

This is a general question motivated by a specific event.
When an object holds multiple classes, each with different generic actions, how can I specify to use "this" class, rather than "that" class?
The example code here is bundled with geepack.
library(stargazer)
library(geepack)
data(dietox)
dietox$Cu <- as.factor(dietox$Cu)
mf <- formula(Weight~Cu*(Time+I(Time^2)+I(Time^3)))
gee0 <- glm(mf, data = dietox, family = poisson("identity")) # a wrong model
gee1 <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"),corstr="ar1")
class(gee0)
class(gee1)
summary(gee0)
summary(gee1)
stargazer(gee0, type = "text")
stargazer(gee1, type = "text")
I'd like to work with the "glm" class object, not the "geeglm" class object.
@Richard Scriven: I'd just like to pull the results out into a stargazer(...) report. Thanks for the clarifying question.
The class system that uses the class(foo) attribute is not strongly typed. The class vector is used by R to determine which methods to use when that object is passed to a generic like print. For example, if you were to call print(gee1), R would first search for a function called print.geeglm which, in this case, it would find in the package geepack, and R calls that function with the arguments supplied to print().
If R did not find a function called print.geeglm, it would then search for print.gee, then print.glm, then print.default.
So in short, gee1 does not contain 3 objects with different classes, it is a single object with a class vector that informs R where to look for generic methods.
To make things slightly more confusing, R has multiple type systems, and the class vector is used by the S3 type system. A Google search for "R s3 class" will get you lots more info on R's class system.
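The dispatch order described above can be sketched with a toy generic. The generic `describe()` and the classes `gee_like` and `glm_like` are hypothetical stand-ins for illustration, not the geepack classes themselves.

```r
# Toy generic with a method for each class and a default
describe <- function(x, ...) UseMethod("describe")
describe.default  <- function(x, ...) "default method"
describe.glm_like <- function(x, ...) "glm_like method"
describe.gee_like <- function(x, ...) "gee_like method"

# One object, one class vector; R tries the classes from left to right
obj <- structure(list(), class = c("gee_like", "glm_like"))
first_match <- describe(obj)

# Option 1: call the method for the other class directly
direct <- describe.glm_like(obj)

# Option 2: work on a copy whose class vector starts at the class you want
as_glm_only <- obj
class(as_glm_only) <- "glm_like"
relabelled <- describe(as_glm_only)
```

Either option forces the "other" method, but the method then treats the object as if it were a plain instance of that class, so the results are only meaningful if the two classes store their data compatibly.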

In R, how can one make a method of an S4 object that directly adjusts the values inside the slots of that object?

Is there a way to allow a method of an S4 object to directly adjust the values inside the slots of that object without copying the entire object into memory and having to re-write it to the parent environment at the end of the method? Right now I have an object that has slots where it keeps track of its own state. I call a method that advances it to the next state, but right now it seems like I have to assign() each value (or a copy of the object invoking the method) back to the parent environment. As a result, the object oriented code seems to be running a lot slower than code that simply adjusts the various state variables in a loop.
R has three object-oriented (OO) systems: S3, S4 and Reference Classes (where the latter were for a while referred to as R5, yet their official name is Reference Classes).
Reference Classes (or refclasses) are new in R 2.12. They fill a long standing need for mutable objects that had previously been filled by non-core packages like R.oo, proto and mutatr. While the core functionality is solid, reference classes are still under active development and some details will change. The most up-to-date documentation for Reference Classes can always be found in ?ReferenceClasses.
There are two main differences between reference classes and S3 and S4:
Refclass objects use message-passing OO
Refclass objects are mutable: the usual R copy on modify semantics do not apply.
These properties make this object system behave much more like Java and C#.
Read more here:
http://adv-r.had.co.nz/R5.html
http://www.inside-r.org/r-doc/methods/ReferenceClasses
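For the state-tracking use case in the question, a reference class might look like the minimal sketch below. The class name `StateMachine`, its field `state` and its method `advance()` are hypothetical.

```r
library(methods)

# Hypothetical reference class that tracks its own state
StateMachine <- setRefClass(
  "StateMachine",
  fields = list(state = "numeric"),
  methods = list(
    advance = function() {
      # `<<-` modifies the field of this object in place
      state <<- state + 1
      invisible(.self)
    }
  )
)

machine <- StateMachine$new(state = 0)
machine$advance()
machine$advance()

# Assignment does not copy: both names refer to the same object
alias <- machine
alias$advance()
machine$state
```

No value has to be assigned back to the calling environment, because the method changes the object it was called on.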
I asked this question on the R-list myself, and found a work-around to simulate a pass by reference, something in the style of:
eval(
  eval(
    substitute(
      expression(object@slot <<- value),
      env = parent.frame(1)
    )
  )
)
Far from the cleanest code around I'd say...
A suggestion coming from the R-help list, uses an environment to deal with these cases.
EDIT : tweaked code inserted.
setClass("MyClass", representation(.cache='environment',masterlist="list"))
setMethod("initialize", "MyClass",
  function(.Object, .cache=new.env()) {
    .Object@masterlist <- list()
    callNextMethod(.Object, .cache=.cache)
  })
sv <- function(object,name,value) {} # store value
setMethod("sv",signature=c("MyClass","character","vector"),
  function(object, name, value) {
    object@.cache$masterlist[[name]] <- value
  })
rv <- function(object,name) {} # retrieve value
setMethod("rv",signature=c("MyClass","character"),
  function(object, name) {
    return(object@.cache$masterlist[[name]])
  })
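The environment-in-a-slot idea can be reduced to a smaller, self-contained sketch. The class `Tracker`, the generic `advance()` and the variable `step` are hypothetical. A fresh `new.env()` is passed explicitly for each object on the assumption that instances should not share one environment.

```r
library(methods)

# S4 class whose only slot is an environment holding the mutable state
setClass("Tracker", representation(cache = "environment"))

setGeneric("advance", function(object) standardGeneric("advance"))
setMethod("advance", "Tracker", function(object) {
  # Environments are modified in place, so the caller sees the change
  # even though `object` itself is passed by value
  object@cache$step <- object@cache$step + 1
  invisible(object)
})

tr <- new("Tracker", cache = new.env())
tr@cache$step <- 0

advance(tr)
advance(tr)
tr@cache$step
```

Nothing is assigned back after calling `advance()`; the state lives in the environment that every copy of the object points to.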
As far as I know (and if I get you correctly), you have to recopy the whole object. You can't easily pass values by reference; they are always passed "by value". So once you have modified (a copy of) your object, you need to copy it back to your object.
John Chambers is pretty explicit about it in his book Software for Data Analysis. It's a way to avoid surprises or side effects.
I think there are some workarounds using environments, but I can't help with this.

Resources