Partial matching confusion when arguments passed through dots ('...') - r

I've been working on an R package that is just a REST API wrapper for a graph database. I have a function createNode that returns an object with class node and entity:
# Connect to the db.
graph = startGraph("http://localhost:7474/db/data/")
# Create two nodes in the db.
alice = createNode(graph, name = "Alice")
bob = createNode(graph, name = "Bob")
> class(alice)
[1] "node" "entity"
> class(bob)
[1] "node" "entity"
I have another function, createRel, that creates a relationship between two nodes in the database. It is specified as follows:
createRel = function(fromNode, type, toNode, ...) {
UseMethod("createRel")
}
createRel.default = function(fromNode, ...) {
stop("Invalid object. Must supply node object.")
}
createRel.node = function(fromNode, type, toNode, ...) {
params = list(...)
# Check if toNode is a node.
stopifnot("node" %in% class(toNode))
# Making REST API calls through RCurl and stuff.
}
The ... allows the user to add an arbitrary amount of properties to the relationship in the form key = value. For example,
rel = createRel(alice, "KNOWS", bob, since = 2000, through = "Work")
This creates an (Alice)-[KNOWS]->(Bob) relationship in the db, with the properties since and through and their respective values. However, if a user specifies properties with keys from or to in the ... argument, R gets confused about the classes of fromNode and toNode.
Specifying a property with key from creates confusion about the class of fromNode. It is using createRel.default:
> createRel(alice, "KNOWS", bob, from = "Work")
Error in createRel.default(alice, "KNOWS", bob, from = "Work") :
Invalid object. Must supply node object.
3 stop("Invalid object. Must supply node object.")
2 createRel.default(alice, "KNOWS", bob, from = "Work")
1 createRel(alice, "KNOWS", bob, from = "Work")
Similarly, if a user specifies a property with key to, there is confusion about the class of toNode, and stops at the stopifnot():
Error: "node" %in% class(toNode) is not TRUE
4 stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
3 stopifnot("node" %in% class(toNode))
2 createRel.node(alice, "KNOWS", bob, to = "Something")
1 createRel(alice, "KNOWS", bob, to = "Something")
I've found that explicitly setting the parameters in createRel works fine:
rel = createRel(fromNode = alice,
type = "KNOWS",
toNode = bob,
from = "Work",
to = "Something")
# OK
But I am wondering how I need to edit my createRel function so that the following syntax will work without error:
rel = createRel(alice, "KNOWS", bob, from = "Work", to = "Something")
# Errors galore.
The GitHub user who opened the issue mentioned it is most likely a conflict with setAs on dispatch, which has arguments called from and to. One solution is to get rid of ... and change createRel to the following:
createRel = function(fromNode, type, toNode, params = list()) {
UseMethod("createRel")
}
createRel.default = function(fromNode, ...) {
stop("Invalid object. Must supply node object.")
}
createRel.node = function(fromNode, type, toNode, params = list()) {
# Check if toNode is a node.
stopifnot("node" %in% class(toNode))
# Making REST API calls through RCurl and stuff.
}
But, I wanted to see if I had any other options before making this change.

Not really an answer, but...
The problem is that the user-provided argument 'from' is being (partially) matched to the formal argument 'fromNode'.
f = function(fromNode, ...) fromNode
f(1, from=2)
## [1] 2
The rules are outlined in section 4.3.2 of RShowDoc('R-lang'), where named arguments are exact matched, then partial matched, and then unnamed arguments are assigned by position.
It's hard to know how to enforce exact matching, other than using single-letter argument names! Actually, for a generic this might not be as trite as it sounds -- x is a pretty generic variable name. If 'from' and 'to' were common arguments to ... you could change the argument list to "fromNode, ..., from, to", check for missing(from) in the body of the function, and act accordingly; I don't think this would be pleasant, and the user would invariably provide an argument 'fro'.
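A minimal sketch of that "formals after ..." idea, assuming 'from' and 'to' are the only property names that collide. The function createRelSketch and the list it returns are hypothetical stand-ins, not the package's real createRel, and plain strings stand in for node objects:

```r
# Hypothetical variant: 'from' and 'to' are real formals placed after '...'.
createRelSketch <- function(fromNode, type, toNode, ..., from, to) {
  props <- list(...)
  # Exact names are matched before partial ones, so from= and to= land in
  # these formals rather than in fromNode/toNode; fold them back into the
  # property list.
  if (!missing(from)) props[["from"]] <- from
  if (!missing(to)) props[["to"]] <- to
  list(fromNode = fromNode, type = type, toNode = toNode, props = props)
}

res <- createRelSketch("alice", "KNOWS", "bob",
                       from = "Work", to = "Something", since = 2000)
str(res$props)
```

As the answer warns, this only covers the exact names: a call with fro = "Work" would still be partially matched to fromNode.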
While enforcing exact matching (and errors, via warn=2) by setting global options() might be helpful in debugging (though by then you'd probably know what you were looking for!) it doesn't help the package author who is trying to write code to work for users in general.
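For debugging, the relevant global option is warnPartialMatchArgs. The sketch below reuses the toy function f from above and captures the warning instead of letting it print; it is meant only to show what the option reports, not as something a package should set for its users:

```r
f <- function(fromNode, ...) fromNode

# options() returns the previous settings, so they can be restored afterwards.
old <- options(warnPartialMatchArgs = TRUE)
msg <- tryCatch(
  f(1, from = 2),                          # 'from' partially matches 'fromNode'
  warning = function(w) conditionMessage(w)
)
options(old)

msg
```

Combining this with options(warn = 2) turns the warning into an error, which is the "errors" variant mentioned above.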
It might be reasonable to ask on the R-devel mailing list whether it might be time for this behavior to be changed (on the 'several releases' time scale); partial matching probably dates as a 'convenience' from the days before tab completion.

Related

When does initialize check for object validity?

From Chambers' (excellent) Extending R (2016):
A validity method will be called automatically from the default method for initialize(). The recommended form of an initialize method ends with a callNextMethod() call, to ensure that subclass slots can be specified in a call to the generator for the class. If this convention is followed, initialization will end with a call to the default method, and the validity method will be called after all initialization has occurred.
I thought I understood, but the behavior I am getting does not seem to follow this convention.
setClass("A", slots = c(s1 = "numeric"))
setValidity("A", function(object) {
if (length(object@s1) > 5) {
return("s1 longer than 5")
}
TRUE
})
setMethod("initialize", "A", function(.Object, s1, ...) {
if (!missing(s1)) .Object@s1 <- s1 + 4
callNextMethod(.Object, ...)
})
A <- new("A", rep(1.0, 6))
A
# An object of class "A"
# Slot "s1":
# [1] 5 5 5 5 5 5
validObject(A)
# Error in validObject(A) : invalid class “A” object: s1 longer than 5
I expected the validity checking to be done by adding callNextMethod() to the end of the initialize method. Adding an explicit validObject(.Object) before callNextMethod() works, but I am clearly not understanding something here.
Obviously, I can also do all the same checks in the validity method, but ideally all of the validity checking would occur within setValidity so future edits live in one place.
Changing the initialize function slightly gives the desired result -- is there a reason to use one approach over the other? Chambers seems to prefer assigning slots directly with .Object@s1 <-, whereas I have seen the following method elsewhere (Gentleman & Hadley).
setMethod("initialize", "A", function(.Object, s1, ...) {
if (!missing(s1)) s1 <- s1 + 4
else s1 <- numeric()
callNextMethod(.Object, s1 = s1, ...)
})
Perhaps the best guide comes from initialize itself — if you inspect the code for the default method
getMethod("initialize",signature(.Object="ANY"))
then you see that it does indeed contain an explicit call to validObject at the end:
...
validObject(.Object)
}
.Object
}
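so if you define your own initialize method, the most similar thing you could do would be to call validObject() yourself at the end of your method, right before you call callNextMethod.
In your case, the first initialize method assigns the slot directly and then calls callNextMethod(.Object, ...) with nothing left in ..., and the default method only reaches validObject() when it has been given arguments to process (the call sits inside the block shown above). The object is therefore returned without the validity method ever running. Your second version passes s1 = s1 on to the default method, which assigns the slot and then checks the validity of the whole object (which requires the s1 slot to be no longer than 5 elements).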
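A small sketch of the second style, where the slot value is handed to callNextMethod() so that the default method both assigns it and validates the object. The class name "ValidDemo" is hypothetical, chosen so it does not clash with class "A" above:

```r
library(methods)

setClass("ValidDemo", slots = c(s1 = "numeric"))
setValidity("ValidDemo", function(object) {
  if (length(object@s1) > 5) "s1 longer than 5" else TRUE
})

setMethod("initialize", "ValidDemo", function(.Object, s1, ...) {
  # Transform the argument, then pass it on by name; the default method
  # fills the slot and runs the validity method afterwards.
  if (missing(s1)) s1 <- numeric() else s1 <- s1 + 4
  callNextMethod(.Object, s1 = s1, ...)
})

ok  <- new("ValidDemo", s1 = rep(1, 3))
err <- tryCatch(new("ValidDemo", s1 = rep(1, 6)),
                error = function(e) conditionMessage(e))
err
```

Here the validity rule stays in setValidity() only, which is what the question was after.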

How to use S4 object programming in R

What's wrong with my R script? I'm trying to use a vector of user-defined objects (here a vector of "Page" objects) within another user-defined object (here a "Book" object)
setClass("Page",
slots = c(PageNo = "numeric", #scalar
Contents = "character") #vector of strings
)
setClass("Book",
slots = c(Pages = "vector", # Something wrong here? vector of pages ? "Page" or vector" or "list"
Title = "character") #vector of strings
)
setGeneric(name="AddPage", def=function(aBook, pageNo){standardGeneric("AddPage")})
setMethod(f="AddPage", signature="Book",
definition=function(aBook, pageNo)
{
page1 = new("Page")
page1@PageNo = pageNo
aBook@Pages = c(aBook@Pages, page1) # Something wrong here?
}
)
book1 = new("Book")
book1@Title = "Sample Book"
book1
book1@Pages
AddPage(book1, 1)
AddPage(book1, 2)
book1@Pages
Remember that R does not use reference semantics, so AddPage(book1, 1) creates a copy of book1, and updates that. In the method you don't return the updated object, and book1 remains unchanged.
Update the method so that it returns the modified object
setMethod(f="AddPage", signature="Book",
definition=function(aBook, pageNo)
{
page1 = new("Page")
page1@PageNo = pageNo
aBook@Pages = c(aBook@Pages, page1)
aBook
}
)
and assign the return value to the old variable
book1 = AddPage(book1, 1)
But this is a very inefficient approach -- the line aBook@Pages = c(aBook@Pages, page1) makes a copy of all existing pages (on the right-hand side, to create a longer vector; this will scale with the square of the number of Pages added to the book) and then copies the entire Book (for the assignment). In addition, creating individual objects is expensive and does not exploit R's 'vectorization'. A first step is to think of the object 'Page' as instead 'Pages', where the object models the columns rather than the rows of a data frame. 'Book' then doesn't have a vector of Page objects, but a single Pages object. This also implies a different approach to creating your 'book'.
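One way the 'Pages' idea might look, as a rough sketch. The class name "BookSketch" and the generic addPages are hypothetical, and pages are added several at a time rather than one by one:

```r
library(methods)

# One object holds all pages as parallel vectors (columns, not rows).
setClass("Pages", slots = c(PageNo = "numeric", Contents = "character"))
setClass("BookSketch", slots = c(Pages = "Pages", Title = "character"))

setGeneric("addPages",
           function(aBook, pageNo, contents) standardGeneric("addPages"))
setMethod("addPages", "BookSketch", function(aBook, pageNo, contents) {
  aBook@Pages@PageNo   <- c(aBook@Pages@PageNo, pageNo)
  aBook@Pages@Contents <- c(aBook@Pages@Contents, contents)
  aBook   # return the modified copy; the caller must reassign it
})

book1 <- new("BookSketch", Title = "Sample Book")
book1 <- addPages(book1, c(1, 2, 3), c("first", "second", "third"))
book1@Pages@PageNo
```

Adding many pages in a single call keeps the number of copies small, which is the point of the vectorized layout.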

Validity checks for ReferenceClass

S4 classes allow you to define validity checks using validObject() or setValidity(). However, this does not appear to work for ReferenceClasses.
I have tried adding assert_that() or if (badness) stop(message) clauses to the $initialize() method of a ReferenceClass. However, when I simulate loading the package (using devtools::load_all()), it must try to create some prototype class because the initialize method executes and fails (because no fields have been set).
What am I doing wrong?
Implement a validity method on the reference class
A = setRefClass("A", fields=list(x="numeric", y="numeric"))
setValidity("A", function(object) {
if (length(object$x) != length(object$y)) {
"x, y lengths differ"
} else NULL
})
and invoke the validity method explicitly
> validObject(A())
[1] TRUE
> validObject(A(x=1:5, y=5:1))
[1] TRUE
> validObject(A(x=1:5, y=5:4))
Error in validObject(A(x = 1:5, y = 5:4)) :
invalid class "A" object: x, y lengths differ
Unfortunately, validObject() would need to be called explicitly as the penultimate line of an initialize method or constructor.
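A sketch of the constructor variant, assuming a plain wrapper function is acceptable. The class name "Pair" and the function newPair are hypothetical; the validity rule is the same as for class A above:

```r
library(methods)

Pair <- setRefClass("Pair", fields = list(x = "numeric", y = "numeric"))
setValidity("Pair", function(object) {
  if (length(object$x) != length(object$y)) "x, y lengths differ" else TRUE
})

# Constructor: build the object, validate it explicitly, then return it.
newPair <- function(...) {
  obj <- Pair$new(...)
  validObject(obj)
  obj
}

good <- newPair(x = c(1, 2, 3), y = c(3, 2, 1))
bad  <- tryCatch(newPair(x = c(1, 2, 3), y = c(3, 2)),
                 error = function(e) conditionMessage(e))
bad
```

Because the check lives in the wrapper rather than in $initialize(), it does not run when the class machinery creates an object with no arguments.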
Ok so you can do this in initialize. It should have the form:
initialize = function (...) {
# Called with no arguments (e.g. when a prototype object is built): skip the checks
if (!nargs()) return()
# Capture arguments in list
args <- list(...)
# If the field name is passed to the initialize function
# then check whether it is valid and assign it. Otherwise
# assign a zero length value (character if field_name has
# that type)
if (!is.null(args$field_name)) {
assert_that(check_field_name(args$field_name))
field_name <<- args$field_name
} else {
field_name <<- character()
}
# Make sure you callSuper as this will then assign other
# fields included in ... that weren't already specially
# processed like `field_name`
callSuper(...)
}
This is based on the strategy set out in the lme4 package.

Modify contents of object with "call by reference"

I am trying to modify the contents of an object defined by a self-written class with a function that takes two objects of this class and adds the contents.
setClass("test",representation(val="numeric"),prototype(val=1))
I know that R does not really work with "call by reference", but it can mimic that behaviour with a method like this one:
setGeneric("value<-", function(test,value) standardGeneric("value<-"))
setReplaceMethod("value",signature = c("test","numeric"),
definition=function(test,value) {
test@val <- value
test
})
foo = new("test") # foo@val is 1 per prototype
value(foo)<-2 # foo@val is now set to 2
Up to here, everything I did and got as a result is consistent with my research here on Stack Exchange,
Call by reference in R (using function to modify an object)
and with this code from a lecture (commented and written in German)
What I wish to achieve now is a similar result with the following method:
setGeneric("add<-", function(testA,testB) standardGeneric("add<-"))
setReplaceMethod("add",signature = c("test","test"),
definition=function(testA,testB) {
testA@val <- testA@val + testB@val
testA
})
bar = new("test")
add(foo)<-bar #should add the value slot of both objects and save the result to foo
Instead I get the following error:
Error in `add<-`(`*tmp*`, value = <S4 object of class "test">) :
unused argument (value = <S4 object of class "test">)
The function call works with:
"add<-"(foo,bar)
But this does not save the value into foo. Using
foo <- "add<-"(foo,bar)
#or using
setMethod("add",signature = c("test","test"), definition= #as above... )
foo <- add(foo,bar)
works but this is inconsistent with the modifying method value(foo)<-2
I have the feeling that I am missing something simple here.
Any help is very much appreciated!
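For <- (replacement) functions, the last argument must be named 'value'. R evaluates add(foo) <- bar as foo <- `add<-`(foo, value = bar), so the right-hand side is always passed by name as value; that is why your version with testB fails with the "unused argument (value = ...)" error.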
So in your case:
setGeneric("add<-", function(testA,value) standardGeneric("add<-"))
setReplaceMethod("add",signature = c("test","test"),
definition=function(testA,value) {
testA@val <- testA@val + value@val
testA
})
bar = new("test")
add(foo)<-bar
You may also use a Reference class if you want to avoid the traditional pass-by-value handling of arguments.
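A minimal sketch of that Reference class alternative. The class name "TestRef" and its add method are hypothetical, mirroring the S4 "test" class above:

```r
library(methods)

TestRef <- setRefClass("TestRef",
  fields = list(val = "numeric"),
  methods = list(
    add = function(other) {
      # <<- modifies the field of this object in place (reference semantics)
      val <<- val + other$val
      invisible(.self)
    }
  )
)

foo <- TestRef$new(val = 2)
bar <- TestRef$new(val = 1)
foo$add(bar)   # no reassignment of foo needed
foo$val
```

The trade-off is that every variable pointing at foo sees the change, which is exactly the behaviour that ordinary R values avoid.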

Avoiding consideration of enclosing frames when retrieving field value of a S4 Reference Class

I'm a huge fan of S4 Reference Classes as they allow for a hybrid programming style (functional/pass-by-value vs. oop/pass-by-reference; example) and thus increase flexibility dramatically.
However, I think I just came across an undesired behavior with respect to the way R scans through environments/frames when you ask it to retrieve a certain field value via method $field() (see help page). The problem is that R also seems to look in enclosing environments/frames if the desired field is not found in the actual local/target environment (which would be the environment making up the S4 Reference Class), i.e. it's just like running get(<objname>, inherits=TRUE) (see help page).
Actual question
In order to have R just look in the local/target environment, I was thinking something like $field(name="<fieldname>", inherits=FALSE) but $field() doesn't have a ... argument that would allow me to pass inherits=FALSE along to get() (which I'm guessing is called somewhere along the way). Is there a workaround to this?
Code Example
For those interested in more details: here's a little code example illustrating the behavior
setRefClass("A", fields=list(a="character"))
x <- getRefClass("A")$new(a="a")
There is a field a in class A, so it's found in the target environment and the value is returned:
> x$field("a")
[1] "a"
Things look different if we try to access a field that is not a field of the reference class but happens to have a name identical to that of some other object in the workspace/search path (in this case "lm"):
require("MASS")
> x$field("lm")
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
ret.x <- x
ret.y <- y
[omitted]
if (!qr)
z$qr <- NULL
z
}
<bytecode: 0x02e6b654>
<environment: namespace:stats>
Not really what I would expect at this point. IMHO an error or at least a warning would be much better. Or opening method $field() for arguments that can be passed along to other functions via .... I'm guessing somewhere along the way get() is called when calling $field(), so something like this could prevent the above behavior from occurring:
x$field("digest", inherits=FALSE)
Workaround: own proposal
This should do the trick, but maybe there's something more elegant that doesn't involve the specification of a new method on top of $field():
setRefClass("A", fields=list(a="character"),
methods=list(
myField=function(name, ...) {
# VALIDATE NAME //
if (!name %in% names(getRefClass(class(.self))$fields())) {
stop(paste0("Invalid field name: '", name, "'"))
}
# //
.self$field(name=name)
}
)
)
x <- getRefClass("A")$new(a="a")
> x$myField("a")
[1] "a"
> x$myField("lm")
Error in x$myField("lm") : Invalid field name: 'lm'
The default field() method can be replaced with your own. So adding an inherits argument to avoid the enclosing frames is simply a matter of grabbing the existing x$field definition and adding it...
setRefClass( Class="B",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
if( is.na( match( name, names( .refClassDef@fieldClasses ) ) ) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
assign(name, value, envir = .self)
}
}
)
)
Or you could have a nice error message with a little rearranging
setRefClass( Class="C",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( is.na( match( name, names( .refClassDef@fieldClasses ) ) ) &&
( !missing(value) || !inherits ) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
assign(name, value, envir = .self)
}
}
)
)
Since you can define your own methods to replace the defaults, pretty much any logic you want can be implemented for your reference classes. Perhaps field() could raise an error if the value is only found through inheritance and its mode is one of c("expression", "name", "symbol", "function"), and a warning if the name does not directly match one of the class's own field names?
