R: Overloading primitive, non-generic functions - r

I'd like to find the minimal hack to be able to say module::obj when module is not a package but a list or environment.
After some digging, I see the following works for the new use case, but breaks the native one:
module = structure(list(f = \(x) x + 1), class = "module_cls")
`::` = function(mod, key) UseMethod("::")
`::.default` = function(mod, key) .Primitive("::")
`::.module_cls` = function(mod, key) mod[[as.character(substitute(key))]]
module::f(1) # works!
base::sum(1, 1) # Error in base::sum : object 'base' not found
The problem seems to be either in the definition of the default method
or how anything that is not module_cls is depatched to default.

Related

How to use `foreach` and `%dopar%` with an `R6` class in R?

I ran into an issue trying to use %dopar% and foreach() together with an R6 class. Searching around, I could only find two resources related to this, an unanswered SO question and an open GitHub issue on the R6 repository.
In one comment (i.e., GitHub issue) an workaround is suggested by reassigning the parent_env of the class as SomeClass$parent_env <- environment(). I would like to understand what exactly does environment() refer to when this expression (i.e., SomeClass$parent_env <- environment()) is called within the %dopar% of foreach?
Here is a minimal reproducible example:
Work <- R6::R6Class("Work",
public = list(
values = NULL,
initialize = function() {
self$values <- "some values"
}
)
)
Now, the following Task class uses the Work class in the constructor.
Task <- R6::R6Class("Task",
private = list(
..work = NULL
),
public = list(
initialize = function(time) {
private$..work <- Work$new()
Sys.sleep(time)
}
),
active = list(
work = function() {
return(private$..work)
}
)
)
In the Factory class, the Task class is created and the foreach is implemented in ..m.thread().
Factory<- R6::R6Class("Factory",
private = list(
..warehouse = list(),
..amount = NULL,
..parallel = NULL,
..m.thread = function(object, ...) {
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
private$..warehouse <- foreach::foreach(1:private$..amount, .export = c("Work")) %dopar% {
# What exactly does `environment()` encapsulate in this context?
object$parent_env <- environment()
object$new(...)
}
parallel::stopCluster(cluster)
},
..s.thread = function(object, ...) {
for (i in 1:private$..amount) {
private$..warehouse[[i]] <- object$new(...)
}
},
..run = function(object, ...) {
if(private$..parallel) {
private$..m.thread(object, ...)
} else {
private$..s.thread(object, ...)
}
}
),
public = list(
initialize = function(object, ..., amount = 10, parallel = FALSE) {
private$..amount = amount
private$..parallel = parallel
private$..run(object, ...)
}
),
active = list(
warehouse = function() {
return(private$..warehouse)
}
)
)
Then, it is called as:
library(foreach)
x = Factory$new(Task, time = 2, amount = 10, parallel = TRUE)
Without the following line object$parent_env <- environment(), it throws an error (i.e., as mentioned in the other two links): Error in { : task 1 failed - "object 'Work' not found".
I would like to know, (1) what are some potential pitfalls when assigning the parent_env inside foreach and (2) why does it work in the first place?
Update 1:
I returned environment() from within foreach(), such that private$..warehouse captures those environments
using rlang::env_print() in a debug session (i.e., the browser() statement was placed right after foreach has ended execution) here is what they consist of:
Browse[1]> env_print(private$..warehouse[[1]])
# <environment: 000000001A8332F0>
# parent: <environment: global>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * ...: <...>
Browse[1]> env_print(environment())
# <environment: 000000001AC0F890>
# parent: <environment: 000000001AC20AF0>
# bindings:
# * private: <env>
# * cluster: <S3: SOCKcluster>
# * ...: <...>
Browse[1]> env_print(parent.env(environment()))
# <environment: 000000001AC20AF0>
# parent: <environment: global>
# bindings:
# * private: <env>
# * self: <S3: Factory>
Browse[1]> env_print(parent.env(parent.env(environment())))
# <environment: global>
# parent: <environment: package:rlang>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * .Random.seed: <int>
# * Factory: <S3: R6ClassGenerator>
# * Task: <S3: R6ClassGenerator>
Disclaimer: a lot of what I say here are educated guesses and inferences based on what I know,
I can't guarantee everything is 100% correct.
I think there can be many pitfalls,
and which one applies really depends on what you do.
I think your second question is more important,
because if you understand that,
you'll be able to evaluate some of the pitfalls by yourself.
The topic is rather complex,
but you can probably start by reading about R's lexical scoping.
In essence, R has a sort of hierarchy of environments,
and when R code is executed,
variables whose values are not found in the current environment
(which is what environment() returns)
are sought in the parent environments
(not to be confused with the caller environments).
Based on the GitHub issue you linked,
R6 generators save a "reference" to their parent environments,
and they expect that everything their classes may need can be found in said parent or somewhere along the environment hierarchy,
starting at that parent and going "up".
The reason the workaround you're using works is because you're replacing the generator's parent environment with the one in the current foreach call inside the parallel worker
(which may be a different R process, not necessarily a different thread),
and, given your .export specification probably exports necessary values,
R's lexical scoping can then search for missing values starting from the foreach call in the separate thread/process.
For the specific example you linked,
I found that a simpler way to make it work
(at least on my Linux machine)
is to do the following:
library(doParallel)
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
parallel::clusterExport(cluster, setdiff(ls(), "cluster"))
x = Factory$new(Task, time = 1, amount = 3)
but leaving the ..m.thread function as:
..m.thread = function(object, amount, ...) {
private$..warehouse <- foreach::foreach(1:amount) %dopar% {
object$new(...)
}
}
(and manually call stopCluster when done).
The clusterExport call should have semantics similar to*:
take everything from the main R process' global environment except cluster,
and make it available in each parallel worker's global environment.
That way, any code inside the foreach call can use the generators when lexical scoping reaches their respective global environments.
foreach can be clever and exports some variables automatically
(as shown in the GitHub issue),
but it has limitations,
and the hierarchy used during lexical scoping can get very messy.
*I say "similar to" because I don't know what exactly R does to distinguish (global) environments if forks are used,
but since that export is needed,
I assume they are indeed independent of each other.
PS: I'd use a call to on.exit(parallel::stopCluster(cluster)) if you create workers inside a function call,
that way you avoid leaving processes around until they are somehow stopped if an error occurs.

Using oop in R, unable to understand the concept

Its my first try to using OOP in R and it's difficult for me to understand the main concept. For example, what are these:
slot, setGeneric, representation
I was unable to find anything helpful by searching the internet. How do these work in R? For example, I have the following MATLAB class:
classdef windTurbine < handle
properties
NumOfBlades
blade#blade
sweptArea
end
methods
function obj = windTurbine(NumOfBlades,blade)
obj.NumOfBlades = NumOfBlades;
obj.blade = blade;
obj.sweptArea = CalcSweptArea(obj);
end
sweptArea = CalcSweptArea(obj)
end
How do I write this in R? How do I add calculations to the constructor? Make functions private? And mainly use the consept of OOP in R. An example would be helpfull, or a nice tutorial explanation
In addition to http://adv-r.had.co.nz/OO-essentials.html, which presents R objects as in base and recommended packages, you have also R6, which is much closer to what you are doing in Matlab. Your example translates like this:
# Need to install R6 first:
# install.packages("R6")
library(R6)
windTurbine <- R6Class("windTurbine",
public = list(
# Properties (fields)
NumOfBlades = integer(0),
blade = NULL, # Which kind of object is it?
sweptArea = numeric(0),
# Methods
initialize = function(NumOfBlades, blade) {
self$NumOfBlades <- as.integer(NumOfBlades)
self$blade <- blade
self$sweptArea <- self$CalcSweptArea()
},
CalcSweptArea = function() {
# < your code here>
# (Return a fake value, just for testing)
return(10)
}
))
wt <- windTurbine$new(NumOfBlades = 6, blade = 3)
wt$sweptArea
Look at ?R6Class(). You have also a private = argument for private fields or methods.

Partial matching confusion when arguments passed through dots ('...')

I've been working on an R package that is just a REST API wrapper for a graph database. I have a function createNode that returns an object with class node and entity:
# Connect to the db.
graph = startGraph("http://localhost:7474/db/data/")
# Create two nodes in the db.
alice = createNode(graph, name = "Alice")
bob = createNode(graph, name = "Bob")
> class(alice)
[1] "node" "entity"
> class(bob)
[1] "node" "entity"
I have another function, createRel, that creates a relationship between two nodes in the database. It is specified as follows:
createRel = function(fromNode, type, toNode, ...) {
UseMethod("createRel")
}
createRel.default = function(fromNode, ...) {
stop("Invalid object. Must supply node object.")
}
createRel.node = function(fromNode, type, toNode, ...) {
params = list(...)
# Check if toNode is a node.
stopifnot("node" %in% class(toNode))
# Making REST API calls through RCurl and stuff.
}
The ... allows the user to add an arbitrary amount of properties to the relationship in the form key = value. For example,
rel = createRel(alice, "KNOWS", bob, since = 2000, through = "Work")
This creates an (Alice)-[KNOWS]->(Bob) relationship in the db, with the properties since and through and their respective values. However, if a user specifies properties with keys from or to in the ... argument, R gets confused about the classes of fromNode and toNode.
Specifying a property with key from creates confusion about the class of fromNode. It is using createRel.default:
> createRel(alice, "KNOWS", bob, from = "Work")
Error in createRel.default(alice, "KNOWS", bob, from = "Work") :
Invalid object. Must supply node object.
3 stop("Invalid object. Must supply node object.")
2 createRel.default(alice, "KNOWS", bob, from = "Work")
1 createRel(alice, "KNOWS", bob, from = "Work")
Similarly, if a user specifies a property with key to, there is confusion about the class of toNode, and stops at the stopifnot():
Error: "node" %in% class(toNode) is not TRUE
4 stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
3 stopifnot("node" %in% class(toNode))
2 createRel.node(alice, "KNOWS", bob, to = "Something")
1 createRel(alice, "KNOWS", bob, to = "Something")
I've found that explicitly setting the parameters in createRel works fine:
rel = createRel(fromNode = alice,
type = "KNOWS",
toNode = bob,
from = "Work",
to = "Something")
# OK
But I am wondering how I need to edit my createRel function so that the following syntax will work without error:
rel = createRel(alice, "KNOWS", bob, from = "Work", to = "Something")
# Errors galore.
The GitHub user who opened the issue mentioned it is most likely a conflict with setAs on dispatch, which has arguments called from and to. One solution is to get rid of ... and change createRel to the following:
createRel = function(fromNode, type, toNode, params = list()) {
UseMethod("createRel")
}
createRel.default = function(fromNode, ...) {
stop("Invalid object. Must supply node object.")
}
createRel.node = function(fromNode, type, toNode, params = list()) {
# Check if toNode is a node.
stopifnot("node" %in% class(toNode))
# Making REST API calls through RCurl and stuff.
}
But, I wanted to see if I had any other options before making this change.
Not really an answer, but...
The problem is that the user-provided argument 'from' is being (partially) matched to the formal argument 'fromNode'.
f = function(fromNode, ...) fromNode
f(1, from=2)
## [1] 2
The rules are outlined in section 4.3.2 of RShowDoc('R-lang'), where named arguments are exact matched, then partial matched, and then unnamed arguments are assigned by position.
It's hard to know how to enforce exact matching, other than using single-letter argument names! Actually, for a generic this might not be as trite as it sounds -- x is a pretty generic variable name. If 'from' and 'to' were common arguments to ... you could change the argument list to "fromNode, , ..., from, to", check for missing(from) in the body of the function, and act accordingly; I don't think this would be pleasant, and the user would invariable provide an argument 'fro'.
While enforcing exact matching (and errors, via warn=2) by setting global options() might be helpful in debugging (though by then you'd probably know what you were looking for!) it doesn't help the package author who is trying to write code to work for users in general.
It might be reasonable to ask on the R-devel mailing list whether it might be time for this behavior to be changed (on the 'several releases' time scale); partial matching probably dates as a 'convenience' from the days before tab completion.

Order of methods in R reference class and multiple files

There is one thing I really don't like about R reference class: the order you write the methods matters. Suppose your class goes like this:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
bfunc(i)
}
)
myclass$methods(
bfunc = function(i) {
message("In bfunc, I just call cfunc...")
cfunc(i)
}
)
myclass$methods(
cfunc = function(i) {
message("In cfunc, I print out the sum of i, x and y...")
message(paste("i + x + y = ", i+x+y))
}
)
myclass$methods(
initialize = function(x, y) {
x <<- x
y <<- y
}
)
And then you start an instance, and call a method:
x = myclass(5, 6)
x$afunc(1)
You will get an error:
Error in x$afunc(1) : could not find function "bfunc"
I am interested in two things:
Is there a way to work around this nuisance?
Does this mean I can never split a really long class file into multiple files? (e.g. one file for each method.)
Calling bfunc(i) isn't going to invoke the method since it doesn't know what object it is operating on!
In your method definitions, .self is the object being methodded on (?). So change your code to:
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
.self$bfunc(i)
}
)
(and similarly for bfunc). Are you coming from C++ or some language where functions within methods are automatically invoked within the object's context?
Some languages make this more explicit, for example in Python a method with one argument like yours actually has two arguments when defined, and would be:
def afunc(self, i):
[code]
but called like:
x.afunc(1)
then within the afunc there is the self variable which referes to x (although calling it self is a universal convention, it could be called anything).
In R, the .self is a little bit of magic sprinkled over reference classes. I don't think you could change it to .this even if you wanted.

Modify contents of object with "call by reference"

I am trying to modify the contents of an object defined by a self-written class with a function that takes two objects of this class and adds the contents.
setClass("test",representation(val="numeric"),prototype(val=1))
I know that R not really works with "call by reference" but can mimic that behaviour with a method like this one:
setGeneric("value<-", function(test,value) standardGeneric("value<-"))
setReplaceMethod("value",signature = c("test","numeric"),
definition=function(test,value) {
test#val <- value
test
})
foo = new("test") #foo#val is 1 per prototype
value(foo)<-2 #foo#val is now set to 2
Until here, anything I did and got as result is consitent with my research here on stackexchange,
Call by reference in R (using function to modify an object)
and with this code from a lecture (commented and written in German)
What I wish to achieve now is a similar result with the following method:
setGeneric("add<-", function(testA,testB) standardGeneric("add<-"))
setReplaceMethod("add",signature = c("test","test"),
definition=function(testA,testB) {
testA#val <- testA#val + testB#val
testA
})
bar = new("test")
add(foo)<-bar #should add the value slot of both objects and save the result to foo
Instead I get the following error:
Error in `add<-`(`*tmp*`, value = <S4 object of class "test">) :
unused argument (value = <S4 object of class "test">)
The function call works with:
"add<-"(foo,bar)
But this does not save the value into foo. Using
foo <- "add<-"(foo,bar)
#or using
setMethod("add",signature = c("test","test"), definition= #as above... )
foo <- add(foo,bar)
works but this is inconsistent with the modifying method value(foo)<-2
I have the feeling that I am missing something simple here.
Any help is very much appreciated!
I do not remember why, but for <- functions, the last argument must be named 'value'.
So in your case:
setGeneric("add<-", function(testA,value) standardGeneric("add<-"))
setReplaceMethod("add",signature = c("test","test"),
definition=function(testA,value) {
testA#val <- testA#val + value#val
testA
})
bar = new("test")
add(foo)<-bar
You may also use a Reference class ig you want to avoid the traditional arguments as values thing.

Resources