Order of methods in R reference class and multiple files - r

There is one thing I really don't like about R reference class: the order you write the methods matters. Suppose your class goes like this:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
bfunc(i)
}
)
myclass$methods(
bfunc = function(i) {
message("In bfunc, I just call cfunc...")
cfunc(i)
}
)
myclass$methods(
cfunc = function(i) {
message("In cfunc, I print out the sum of i, x and y...")
message(paste("i + x + y = ", i+x+y))
}
)
myclass$methods(
initialize = function(x, y) {
x <<- x
y <<- y
}
)
And then you start an instance, and call a method:
x = myclass(5, 6)
x$afunc(1)
You will get an error:
Error in x$afunc(1) : could not find function "bfunc"
I am interested in two things:
Is there a way to work around this nuisance?
Does this mean I can never split a really long class file into multiple files? (e.g. one file for each method.)

Calling bfunc(i) isn't going to invoke the method since it doesn't know what object it is operating on!
In your method definitions, .self is the object being methodded on (?). So change your code to:
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
.self$bfunc(i)
}
)
(and similarly for bfunc). Are you coming from C++ or some language where functions within methods are automatically invoked within the object's context?
Some languages make this more explicit, for example in Python a method with one argument like yours actually has two arguments when defined, and would be:
def afunc(self, i):
[code]
but called like:
x.afunc(1)
then within the afunc there is the self variable which referes to x (although calling it self is a universal convention, it could be called anything).
In R, the .self is a little bit of magic sprinkled over reference classes. I don't think you could change it to .this even if you wanted.

Related

How to return event$data in rstudio/websocket

I am trying to extend websocket::Websocket with a method that sends some data and returns the message, so that I can assign it to an object. My question is pretty much identical to https://community.rstudio.com/t/capture-streaming-json-over-websocket/16986. Unfortunately, the user there never revealed how they solved it themselves. My idea was to have the onMessage method return the event$data, i.e. something like:
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello") # returns NULL
but after spending a good hour on the Websocket source code, I am still completely in the dark as to where exactly the callback happens, "R environment wise".
You need to use super assignment operator <<-. <<- is most useful in conjunction with closures to maintain state. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
my_websocket <- R6::R6Class("My websocket",
inherit = websocket::WebSocket,
public = list(
foo = function(x) {
msg <<- super$send(paste("x"))
return(msg)
} )
)
load_websocket <- function(){
ws <- my_websocket$new("ws://foo.local")
ws$onMessage(function(event) {
return(event$data)
})
return(ws)
}
my_ws <- load_websocket()
my_ws$foo("hello")

How to use `foreach` and `%dopar%` with an `R6` class in R?

I ran into an issue trying to use %dopar% and foreach() together with an R6 class. Searching around, I could only find two resources related to this, an unanswered SO question and an open GitHub issue on the R6 repository.
In one comment (i.e., GitHub issue) an workaround is suggested by reassigning the parent_env of the class as SomeClass$parent_env <- environment(). I would like to understand what exactly does environment() refer to when this expression (i.e., SomeClass$parent_env <- environment()) is called within the %dopar% of foreach?
Here is a minimal reproducible example:
Work <- R6::R6Class("Work",
public = list(
values = NULL,
initialize = function() {
self$values <- "some values"
}
)
)
Now, the following Task class uses the Work class in the constructor.
Task <- R6::R6Class("Task",
private = list(
..work = NULL
),
public = list(
initialize = function(time) {
private$..work <- Work$new()
Sys.sleep(time)
}
),
active = list(
work = function() {
return(private$..work)
}
)
)
In the Factory class, the Task class is created and the foreach is implemented in ..m.thread().
Factory<- R6::R6Class("Factory",
private = list(
..warehouse = list(),
..amount = NULL,
..parallel = NULL,
..m.thread = function(object, ...) {
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
private$..warehouse <- foreach::foreach(1:private$..amount, .export = c("Work")) %dopar% {
# What exactly does `environment()` encapsulate in this context?
object$parent_env <- environment()
object$new(...)
}
parallel::stopCluster(cluster)
},
..s.thread = function(object, ...) {
for (i in 1:private$..amount) {
private$..warehouse[[i]] <- object$new(...)
}
},
..run = function(object, ...) {
if(private$..parallel) {
private$..m.thread(object, ...)
} else {
private$..s.thread(object, ...)
}
}
),
public = list(
initialize = function(object, ..., amount = 10, parallel = FALSE) {
private$..amount = amount
private$..parallel = parallel
private$..run(object, ...)
}
),
active = list(
warehouse = function() {
return(private$..warehouse)
}
)
)
Then, it is called as:
library(foreach)
x = Factory$new(Task, time = 2, amount = 10, parallel = TRUE)
Without the following line object$parent_env <- environment(), it throws an error (i.e., as mentioned in the other two links): Error in { : task 1 failed - "object 'Work' not found".
I would like to know, (1) what are some potential pitfalls when assigning the parent_env inside foreach and (2) why does it work in the first place?
Update 1:
I returned environment() from within foreach(), such that private$..warehouse captures those environments
using rlang::env_print() in a debug session (i.e., the browser() statement was placed right after foreach has ended execution) here is what they consist of:
Browse[1]> env_print(private$..warehouse[[1]])
# <environment: 000000001A8332F0>
# parent: <environment: global>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * ...: <...>
Browse[1]> env_print(environment())
# <environment: 000000001AC0F890>
# parent: <environment: 000000001AC20AF0>
# bindings:
# * private: <env>
# * cluster: <S3: SOCKcluster>
# * ...: <...>
Browse[1]> env_print(parent.env(environment()))
# <environment: 000000001AC20AF0>
# parent: <environment: global>
# bindings:
# * private: <env>
# * self: <S3: Factory>
Browse[1]> env_print(parent.env(parent.env(environment())))
# <environment: global>
# parent: <environment: package:rlang>
# bindings:
# * Work: <S3: R6ClassGenerator>
# * .Random.seed: <int>
# * Factory: <S3: R6ClassGenerator>
# * Task: <S3: R6ClassGenerator>
Disclaimer: a lot of what I say here are educated guesses and inferences based on what I know,
I can't guarantee everything is 100% correct.
I think there can be many pitfalls,
and which one applies really depends on what you do.
I think your second question is more important,
because if you understand that,
you'll be able to evaluate some of the pitfalls by yourself.
The topic is rather complex,
but you can probably start by reading about R's lexical scoping.
In essence, R has a sort of hierarchy of environments,
and when R code is executed,
variables whose values are not found in the current environment
(which is what environment() returns)
are sought in the parent environments
(not to be confused with the caller environments).
Based on the GitHub issue you linked,
R6 generators save a "reference" to their parent environments,
and they expect that everything their classes may need can be found in said parent or somewhere along the environment hierarchy,
starting at that parent and going "up".
The reason the workaround you're using works is because you're replacing the generator's parent environment with the one in the current foreach call inside the parallel worker
(which may be a different R process, not necessarily a different thread),
and, given your .export specification probably exports necessary values,
R's lexical scoping can then search for missing values starting from the foreach call in the separate thread/process.
For the specific example you linked,
I found that a simpler way to make it work
(at least on my Linux machine)
is to do the following:
library(doParallel)
cluster <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cluster)
parallel::clusterExport(cluster, setdiff(ls(), "cluster"))
x = Factory$new(Task, time = 1, amount = 3)
but leaving the ..m.thread function as:
..m.thread = function(object, amount, ...) {
private$..warehouse <- foreach::foreach(1:amount) %dopar% {
object$new(...)
}
}
(and manually call stopCluster when done).
The clusterExport call should have semantics similar to*:
take everything from the main R process' global environment except cluster,
and make it available in each parallel worker's global environment.
That way, any code inside the foreach call can use the generators when lexical scoping reaches their respective global environments.
foreach can be clever and exports some variables automatically
(as shown in the GitHub issue),
but it has limitations,
and the hierarchy used during lexical scoping can get very messy.
*I say "similar to" because I don't know what exactly R does to distinguish (global) environments if forks are used,
but since that export is needed,
I assume they are indeed independent of each other.
PS: I'd use a call to on.exit(parallel::stopCluster(cluster)) if you create workers inside a function call,
that way you avoid leaving processes around until they are somehow stopped if an error occurs.

Call S4 function from within same object

I made this (simplified) S4 class:
setClass(
Class = "WOW",
representation = representation(
x = 'numeric',
y = 'numeric'
)
)
setGeneric("run", function(.Object, x, y, do)
standardGeneric("run") )
setGeneric("add", function(.Object, x, y, do)
standardGeneric("add") )
setMethod("add", signature("WOW"), function(.Object, x, y ) {
return(x + y)
})
setMethod("run", signature("WOW"), function(.Object, x, y, do) {
if (do == 'ADD') {
print('ADDING')
# --> RUN wow function
}
})
I basically want to user to call the run function and tell it what to do (via the do parameter) with the two input numbers (x and y). So i the user says "ADD" it should call the add function and return the results to the run function. I know this can be done in python and java via self or this but how can this be done in R?
I figured out it could work calling add(.Object, x, y) but I cannot find any reference stating that this is the accepted way of doing so.
(Obviously I could just add the number here in a single line of code so I don't need a function but my application is much more complex requiring a function to keep it neat)

Capture a function from parameter in NESTED function (closure function)

Consider a code snippet as follow:
f = function(y) function() y()
f(version)()
Error in f(version)() : could not find function "y"
P.s. It seems that the closure mechanism is quite different from C# Lambda. (?)
Q: How can I capture a function in the closure?
--EDIT--
Scenario: Actually, I would like to write a function factory, and I don't want to add parameter to the nested function.
Like this:
theme_factory = function(theme_fun)
{
function(device)
{
if (!is.onMac()) # Not Mac
{
(device == "RStudioGD") %?% theme_fun(): theme_fun(base_family="Heiti")
}
else
{
theme_fun(base_family="STHeiti")
}
}
}
And I defined two customized theme function for ggplot
theme_bw_rmd = theme_factory(theme_bw)
theme_grey_rmd = theme_factory(theme_grey)
Then I use them like:
function(device)
ggplot(data) + geom_point() something like that + theme_bw_rmd(device)
Thanks.
So the problem is with passing parameter? What about something like this:
alwaysaddone <- function(f) function(...) f(...)+1
biggersum <- alwaysaddone(sum)
sum(1:3)
# 6
biggersum(1:3)
# 7
You can use ... to "pass-through" any parameters you like.
Use eval(func, envir = list(... captured parameters)) or substitute(func, envir) to eval the captured function in a specific environment.

Avoiding consideration of enclosing frames when retrieving field value of a S4 Reference Class

I'm a huge fan of S4 Reference Classes as they allow for a hybrid programming style (functional/pass-by-value vs. oop/pass-by-reference; example) and thus increase flexibility dramatically.
However, I think I just came across an undesired behavior with respect to the way R scans through environments/frames when you ask it to retrieve a certain field value via method $field() (see help page). The problem is that R also seems to look in enclosing environments/frames if the desired field is not found in the actual local/target environment (which would be the environment making up the S4 Reference Class), i.e. it's just like running get(<objname>, inherits=TRUE) (see help page).
Actual question
In order to have R just look in the local/target environment, I was thinking something like $field(name="<fieldname>", inherits=FALSE) but $field() doesn't have a ... argument that would allow me to pass inherits=FALSE along to get() (which I'm guessing is called somewhere along the way). Is there a workaround to this?
Code Example
For those interested in more details: here's a little code example illustrating the behavior
setRefClass("A", fields=list(a="character"))
x <- getRefClass("A")$new(a="a")
There is a field a in class A, so it's found in the target environment and the value is returned:
> x$field("a")
[1] "a"
Things look differently if we try to access a field that is not a field of the reference class but happens to have a name identical to that of some other object in the workspace/searchpath (in this case "lm"):
require("MASS")
> x$field("lm")
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
{
ret.x <- x
ret.y <- y
[omitted]
if (!qr)
z$qr <- NULL
z
}
<bytecode: 0x02e6b654>
<environment: namespace:stats>
Not really what I would expect at this point. IMHO an error or at least a warning would be much better. Or opening method $field() for arguments that can be passed along to other functions via .... I'm guessing somewhere along the way get() is called when calling $field(), so something like this could prevent the above behavior from occurring:
x$field("digest", inherits=FALSE)
Workaround: own proposal
This should do the trick, but maybe there's something more elegant that doesn't involve the specification of a new method on top of $field():
setRefClass("A", fields=list(a="character"),
methods=list(
myField=function(name, ...) {
# VALIDATE NAME //
if (!name %in% names(getRefClass(class(.self))$fields())) {
stop(paste0("Invalid field name: '", name, "'"))
}
# //
.self$field(name=name)
}
)
)
x <- getRefClass("A")$new(a="a")
> x$myField("a")
[1] "a"
> x$myField("lm")
Error in x$myField("lm") : Invalid field name: 'lm'
The default field() method can be replaced with your own. So adding an inherits argument to avoid the enclosing frames is simply a matter of grabbing the existing x$field definition and adding it...
setRefClass( Class="B",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
assign(name, value, envir = .self)
}
}
),
)
Or you could have a nice error message with a little rearranging
setRefClass( Class="C",
fields= list( a="character" ),
methods= list(
field = function(name, value, inherits=TRUE ) {
if( is.na( match( name, names( .refClassDef#fieldClasses ) ) ) &&
( !missing(value) || inherits==FALSE) ) {
stop(gettextf("%s is not a field in this class", sQuote(name)), domain = NA)
}
if( missing(value) ) {
get( name, envir=.self, inherits=inherits )
} else {
assign(name, value, envir = .self)
}
}
),
)
Since you can define any of your own methods to replace the defaults pretty much any logic you want can be implemented for your refclasses. Perhaps an error if the variable is acquired using inheritance but the mode matches to c("expression", "name", "symbol", "function") and warning if it doesn't directly match the local refClass field names?

Resources