Validity checks for ReferenceClass

S4 classes allow you to define validity checks using validObject() or setValidity(). However, this does not appear to work for ReferenceClasses.
I have tried adding assert_that() or if (badness) stop(message) clauses to the $initialize() method of a ReferenceClass. However, when I simulate loading the package (using devtools::load_all()), it must try to create some prototype class because the initialize method executes and fails (because no fields have been set).
What am I doing wrong?

Implement a validity method on the reference class
A = setRefClass("A", fields=list(x="numeric", y="numeric"))
setValidity("A", function(object) {
if (length(object$x) != length(object$y)) {
"x, y lengths differ"
} else NULL
})
and invoke the validity method explicitly
> validObject(A())
[1] TRUE
> validObject(A(x=1:5, y=5:1))
[1] TRUE
> validObject(A(x=1:5, y=5:4))
Error in validObject(A(x = 1:5, y = 5:4)) :
invalid class "A" object: x, y lengths differ
Unfortunately, the validity method is not run automatically when a reference class object is constructed, so validObject() would need to be called explicitly, e.g. as the penultimate line of an initialize method or constructor.
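For instance, a minimal sketch of that pattern, repeating the class definition with an initialize method that validates .self once the fields have been assigned:
A <- setRefClass("A",
  fields = list(x = "numeric", y = "numeric"),
  methods = list(
    initialize = function(...) {
      callSuper(...)      # assign any fields passed in ...
      validObject(.self)  # run the validity method registered below
    }
  )
)
setValidity("A", function(object) {
  if (length(object$x) != length(object$y)) "x, y lengths differ" else TRUE
})
A(x = 1:5, y = 5:1)    # fine
# A(x = 1:5, y = 5:4)  # errors: invalid class "A" object: x, y lengths differ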

Ok so you can do this in initialize. It should have the form:
initialize = function(...) {
  # Return early when no arguments are supplied (e.g. for the default
  # object created during package loading), so the checks below don't fail
  if (!nargs()) return()
  # Capture arguments in a list
  args <- list(...)
  # If the field name is passed to the initialize function
  # then check whether it is valid and assign it. Otherwise
  # assign a zero-length value (character() if field_name has
  # that type)
  if (!is.null(args$field_name)) {
    assert_that(check_field_name(args$field_name))
    field_name <<- args$field_name
  } else {
    field_name <<- character()
  }
  # Make sure you callSuper() as this will then assign other
  # fields included in ... that weren't already specially
  # processed like `field_name`
  callSuper(...)
}
This is based on the strategy set out in the lme4 package.
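To make the pattern concrete, here is a minimal runnable sketch along those lines; the class B, its single character field field_name, and the check_field_name() helper are all illustrative names, and assert_that() comes from the assertthat package:
library(assertthat)
check_field_name <- function(x) is.character(x) && length(x) == 1
B <- setRefClass("B",
  fields = list(field_name = "character"),
  methods = list(
    initialize = function(...) {
      if (!nargs()) return()  # skip the checks for the default/prototype object
      args <- list(...)
      if (!is.null(args$field_name)) {
        assert_that(check_field_name(args$field_name))
        field_name <<- args$field_name
      } else {
        field_name <<- character()
      }
      callSuper(...)
    }
  )
)
B(field_name = "ok")   # passes the check
# B(field_name = 1:2)  # fails: check_field_name(args$field_name) is not TRUE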

Related

Handling quoted NULL arguments in rlang::ensym

I want to use quoted arguments in my function and I would like to allow the user to specify that they don't want to use the argument by setting it to NULL. However, rlang::ensym throws an error when it receives a NULL argument. Here is my code:
f <- function(var){
  rlang::ensym(var)
  return(var + 2)
}
# This works
variable = 2
f(variable)
# This throws an error
f(NULL)
The error message is:
Error: Only strings can be converted to symbols
I already tried adding an if-clause with is.null(var) before the expression with rlang::ensym, but of course, this doesn't work as the variable is not yet quoted at this time.
How can I check that the supplied quoted variable is NULL in order to handle it differently?
If you need to allow for NULL, it's more robust to use quosures first. Then you can inspect the quosure to see what's inside. For example
f <- function(var){
  var <- rlang::enquo(var)
  if (rlang::quo_is_null(var)) {
    var <- NULL
  } else if (rlang::quo_is_symbol(var)) {
    var <- rlang::get_expr(var)
  } else {
    stop(paste("Expected symbol but found", class(rlang::get_expr(var))))
  }
  return(var)
}
And that returns
f(variable)
# variable
f(NULL)
# NULL
f(x+1)
# Error in f(x + 1) : Expected symbol but found call
Or you can use whatever logic is appropriate for your actual requirements.
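If all you need is the symbol-or-NULL distinction, a shorter sketch using rlang::enexpr() also works (f2 is just an illustrative name):
f2 <- function(var) {
  expr <- rlang::enexpr(var)  # capture the unevaluated argument
  if (is.null(expr)) {
    NULL
  } else if (rlang::is_symbol(expr)) {
    expr
  } else {
    stop("Expected a symbol or NULL, got a ", class(expr))
  }
}
f2(variable)
# variable
f2(NULL)
# NULL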

When does initialize check for object validity?

From Chambers' (excellent) Extending R (2016):
A validity method will be called automatically from the default method for initialize(). The recommended form of an initialize method ends with a callNextMethod() call, to ensure that subclass slots can be specified in a call to the generator for the class. If this convention is followed, initialization will end with a call to the default method, and the validity method will be called after all initialization has occurred.
I thought I understood, but the behavior I am getting does not seem to follow this convention.
setClass("A", slots = c(s1 = "numeric"))
setValidity("A", function(object) {
if (length(object#s1) > 5) {
return("s1 longer than 5")
}
TRUE
})
setMethod("initialize", "A", function(.Object, s1, ...) {
if (!missing(s1)) .Object#s1 <- s1 + 4
callNextMethod(.Object, ...)
})
A <- new("A", rep(1.0, 6))
A
# An object of class "A"
# Slot "s1":
# [1] 5 5 5 5 5 5
validObject(A)
# Error in validObject(A) : invalid class “A” object: s1 longer than 5
I expected the validity checking to be done by adding callNextMethod() to the end of the initialize method. Adding an explicit validObject(.Object) before callNextMethod() works, but I am clearly not understanding something here.
Obviously, I can also do all the same checks in the validity method, but ideally all of the validity checking would occur within setValidity so future edits live in one place.
Changing the initialize function slightly gives the desired result -- is there a reason to use one approach over the other? Chambers seems to prefer using .Object@<- whereas I have seen the following method elsewhere (Gentleman & Hadley).
setMethod("initialize", "A", function(.Object, s1, ...) {
if (!missing(s1)) s1 + 4
else s1 <- numeric()
callNextMethod(.Object, s1 = s1, ...)
})
Perhaps the best guide comes from initialize itself — if you inspect the code for the default method
getMethod("initialize",signature(.Object="ANY"))
then you see that it does indeed contain an explicit call to validObject at the end:
...
validObject(.Object)
}
.Object
}
so if you define your own initialize method, the most similar thing you can do is call validObject() explicitly yourself at the end of your method (or, as you found, just before callNextMethod()).
In your case, the callNextMethod() call only ends up checking that the slot you assigned holds a valid numeric object (which it does), rather than checking the validity of the larger object (which requires the s1 slot to be no longer than 5 elements).
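One way to keep all of the checking in setValidity() while still failing at construction, sticking with the question's class A, is to assign the result of callNextMethod() and validate it explicitly before returning (a sketch, not the only option):
setMethod("initialize", "A", function(.Object, s1, ...) {
  if (!missing(s1)) .Object@s1 <- s1 + 4
  .Object <- callNextMethod(.Object, ...)
  validObject(.Object)  # explicit check, mirroring the default method
  .Object
})
new("A", rep(1.0, 6))
# Error in validObject(.Object) : invalid class "A" object: s1 longer than 5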

Using a closure to generate an R6 binding

I'm using active bindings in an R6 class to check values before assignment to fields. I thought I could use a closure to generate the bindings as below, but this doesn't work.
The binding isn't evaluated in the way I expect (at all?) because the error shows the closure's name argument. What am I missing?
library(R6)
library(pryr)
# pass a field name to create its binding
generate_binding <- function(name) {
  function(value) {
    if (!missing(value) && length(value) > 0) {
      private$name <- value
    }
    private$name
  }
}
bind_x = generate_binding(x_)
# created as intended:
unenclose(bind_x)
# function (value)
# {
# if (!missing(value) && length(value) > 0) {
# private$x_ <- value
# }
# private$x_
# }
MyClass <- R6::R6Class("MyClass",
private = list(
x_ = NULL
),
active = list(
x = bind_x
),
)
my_class_instance <- MyClass$new()
my_class_instance$x <- "foo"
# Error in private$name <- value :
# cannot add bindings to a locked environment
I think you’re misunderstanding how closures work. unenclose is a red herring here (as it doesn’t actually show you what the closure looks like). The closure contains the statement private$name <- value — it does not contain the statement private$x_ <- value.
The usual solution to this problem would be to rewrite the closure such that the unevaluated name argument is deparsed into its string representation, and then used to subset the private environment (private[[name]] <- value). However, this doesn’t work here since R6 active bindings strip closures of their enclosing environment.
This is where unenclose comes in then:
MyClass <- R6::R6Class("MyClass",
private = list(
x_ = NULL
),
active = list(
x = pryr::unenclose(bind_x)
),
)
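If you would rather not depend on pryr, another sketch is to splice the field name into the binding's body yourself, so the finished function never needs its enclosing environment (generate_binding2 and MyClass2 are illustrative names):
generate_binding2 <- function(name) {
  fn <- function(value) NULL
  # insert the field name as a literal string via bquote(), so the body
  # reads private[["x_"]] rather than relying on a captured variable
  body(fn) <- bquote({
    if (!missing(value) && length(value) > 0) {
      private[[.(name)]] <- value
    }
    private[[.(name)]]
  })
  fn
}
MyClass2 <- R6::R6Class("MyClass2",
  private = list(x_ = NULL),
  active = list(x = generate_binding2("x_"))
)
obj <- MyClass2$new()
obj$x <- "foo"
obj$x
# [1] "foo"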

Order of methods in R reference class and multiple files

There is one thing I really don't like about R reference classes: the order in which you write the methods matters. Suppose your class goes like this:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
bfunc(i)
}
)
myclass$methods(
bfunc = function(i) {
message("In bfunc, I just call cfunc...")
cfunc(i)
}
)
myclass$methods(
cfunc = function(i) {
message("In cfunc, I print out the sum of i, x and y...")
message(paste("i + x + y = ", i+x+y))
}
)
myclass$methods(
initialize = function(x, y) {
x <<- x
y <<- y
}
)
And then you start an instance, and call a method:
x = myclass(5, 6)
x$afunc(1)
You will get an error:
Error in x$afunc(1) : could not find function "bfunc"
I am interested in two things:
Is there a way to work around this nuisance?
Does this mean I can never split a really long class file into multiple files? (e.g. one file for each method.)
Calling bfunc(i) isn't going to invoke the method since it doesn't know what object it is operating on!
In your method definitions, .self is the object the method is being called on. So change your code to:
myclass$methods(
  afunc = function(i) {
    message("In afunc, I just call bfunc...")
    .self$bfunc(i)
  }
)
(and similarly for bfunc). Are you coming from C++ or some other language where calls to other methods are automatically resolved within the object's context?
Some languages make this more explicit, for example in Python a method with one argument like yours actually has two arguments when defined, and would be:
def afunc(self, i):
[code]
but called like:
x.afunc(1)
then within afunc there is the self variable which refers to x (although calling it self is a universal convention, it could be called anything).
In R, the .self is a little bit of magic sprinkled over reference classes. I don't think you could change it to .this even if you wanted.
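Applied to the example, the fix might look like the sketch below; once every cross-method call goes through .self, the order of the $methods() calls no longer matters, so they can also be split across separate source files:
myclass$methods(
  afunc = function(i) .self$bfunc(i),
  bfunc = function(i) .self$cfunc(i),
  cfunc = function(i) message("i + x + y = ", i + x + y)
)
x <- myclass(5, 6)
x$afunc(1)
# i + x + y = 12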

S4 method with a scalar (non-vector) return value

I want to define an S4 method that returns a scalar value. By scalar, I mean the opposite of a vector.
setGeneric("getScalar", function(value, ...)
standardGeneric("getScalar")
)
setMethod("getScalar",
signature(value = "ANY"),
def = function(value, ...) getScalar(value,...), ## call external function
valueClass = "atomic" ### atomic is false, what should I do ?
)
I can't overload the method by its return type; that is, I can't define several methods with the same signature but different return valueClass values: numeric, integer, character, ...
So how can I do this?
EDIT to give more context:
I think atomic is confusing here. By scalar I mean a numeric, boolean, or character value of length one. I will have 3 functions in my package:
dbGetQuery: returns a list/data.frame, i.e. some table rows
dbGetScalar: returns a scalar value, e.g. count(*), table_name, ...
dbGetNoQuery: returns nothing; used for update/insert actions
It is an extension to the DBI interface.
EDIT2
We can assume that a scalar is a vector of length 1, but I can't express this condition using S4. In C# or C, I would write:
double[] // vector
double // scalar
Maybe I should just change the name of my function.
One possibility is to check the return value after method dispatch
setGeneric("getScalar", function(x, ...) {
value <- standardGeneric("getScalar")
if (!is.atomic(value) || length(value) != 1L)
stop("not a scalar atomic vector")
value
})
setMethod(getScalar, "ANY", function(x, ...) x)
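With that in place, any method's result is checked after dispatch, for example:
getScalar(1)    # ok, returns 1
getScalar(1:2)  # error: not a scalar atomic vector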
Another possibility is to define a 'Scalar' class, with a validity check on the base class that enforces the constraint
.Scalar <- setClass("Scalar", contains="ANY", validity=function(object) {
  if (length(object) != 1L)
    "non-scalar object"
  else TRUE
}, prototype=NA)
or controlling scalar types more strongly with a small hierarchy based on a virtual class
setClass("Scalar", validity=function(object) {
if (length(object) != 1L)
"non-scalar object"
else TRUE
})
.ScalarInteger <- setClass("ScalarInteger",
contains=c("Scalar", "integer"),
prototype=prototype(NA_integer_))
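A quick usage sketch of that hierarchy, assuming the definitions above:
s <- .ScalarInteger(2L)             # a length-1 integer wrapped in the class
is(s, "Scalar")                     # TRUE
# validObject(.ScalarInteger(1:2))  # error: non-scalar object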
This is the approach taken in Bioconductor's Biobase package, with a mkScalar constructor.
