How to subclass a function in R? - r

I was wondering if it is possible to use Reference Classes to subclass a function in R. For instance, the following
> CustomFunction <- setRefClass("CustomFunction", contains = "function")
> foo <- CustomFunction()
> foo()
NULL
works OK (does not throw an error), but how can I customise the behaviour (i.e. other than returning NULL)? How can I define function arguments?
I also tried
> setMethod("(",
> signature(x = "CustomFunction"),
> function(...) {
> "Hello!" # A function that always returns "Hello!"
> }
> )
Error in genericForPrimitive(f) :
methods may not be defined for primitive function ‘(’ in this version of R
but that doesn't seem to work.
I was hoping that being able to subclass functions means that I could implement custom behaviour before and after function calls. E.g. to have functions that automatically logs the call expression each time it is called (for audit purposes), or to create functions that automatically throws an error if NULL is returned etc etc.

You don't need Reference Classes for this, you can just enclose the function of interest
logger <- function(f) {
force(f)
function(...) {
print("running function...")
f(...)
}
}
printhello <- function(name="Al") print(paste("hello", name))
printhello_logged <- logger(printhello)
printhello()
# [1] "hello Al"
printhello_logged("Zed")
# [1] "running function..."
# [1] "hello Zed"
If this is for auditing/testing type purposes, you might be interested in trace() which allows you to attach code to various parts of functions.

Related

How can I use get0 in my R package to search only within package namespace?

Let's say I have an internal function in my package, call it is_(), which can be thought of as a more generalized version if is(). In particular, it searches for specialized functions that are used to tell whether the supplied object is the given class. For example, is_() might be programmed as:
is_ <- function(obj, class) {
if (exists(paste0("is.", class))) {
get0(paste0("is.", class))(obj)
}
else inherits(obj, class)
}
That way, if I'm doing argument checking in my package, I can run something like
is_(x, "numeric_vector")
where I've defined
is.numeric_vector <- function(x) is.numeric(x) && is.null(dim(x))
within my package.
A problem arises when is.numeric_vector() is defined outside my package, e.g., by another package the user has loaded. In that case, exists() and get0() both find the function wherever they can, but I want to restrict the search to function defined in my package and included in my package's namespace (i.e., all imported packages). The envir and inherits arguments seem to get at what I want, but I don't know how to supply them to get the result I want. How can I restrict get0() to only search for its argument within my package's namespace?
The problem is that your package namespace will inherit from the base namespace which inherits from the global environment. For a more detailed explanation, see here: https://adv-r.hadley.nz/environments.html#namespaces. If you want more control over the symbol look up, you'll need to do the work yourself. You could include your own get function in your package
# These are private helpers that do not need to be exported from your package.
.pkgenv <- environment()
get1 <- function(name, env = .pkgenv) {
if (identical(env, emptyenv())) {
NULL
} else if (identical(env, globalenv())) {
# stop at global env
NULL
} else if (exists(name, envir=env, inherits = FALSE)) {
env[[name]]
} else {
# try parent
get1(name, parent.env(env))
}
}
This will recursively search for the symbol in environments but stops at the global environment. You could use it with your is_ function like
is_ <- function(obj, class) {
if (!is.null(fn <- get1(paste0("is.", class)))) {
fn(obj)
} else {
inherits(obj, class)
}
}
Here we just check for null rather than separately verifying the name and then retrieving the value. If get1 is something that will be called a bunch you might want to consider caching the result so you don't always have to walk the inheritance tree.

Defaults of function argument which name correspond to existing function name

Is there a way to prevent a promise already under evaluation error when
you want the name of a function argument to be the name of an existing function
and you want to set a default for this particular argument
and you want to be able to call the outer function with its defaults only (i.e. without the need to pass an explicit value to each argument)?
In my example below, while foo(1:5, bar) works, foo(1:5) throws such an error.
Of course I could go and change the argument name from bar to, say, bar_fun, but I would rather stick with the actual function's name if possible.
foo <- function(x, bar = bar) {
bar(x)
}
bar <- function(x) {
UseMethod("bar")
}
bar.default <- function(x) {
sum(x)
}
foo(1:5)
#> Error in foo(1:5): promise already under evaluation: recursive default argument reference or earlier problems?
foo(1:5, bar)
#> [1] 15
Motivation (first order)
The real-world use case is that bar() is actually settings(), a function which returns a list of settings. I'd like to version those settings. So there'd be e.g. methods like settings.v1, settings.v2, ..., and settings.default.
And I thought of using settings.default to define the "runtime version" of settings to use, so e.g.:
settings <- function(x) {
UseMethod("settings")
}
settings.v1 <- function(x) {
list(system = "dev")
}
settings.v2 <- function(x) {
list(system = "production")
}
settings.default <- function(x) {
settings.v2(
}
foo <- function(x, settings = settings) {
settings()
}
foo()
#> Error in foo(): promise already under evaluation: recursive default argument reference or earlier problems?
foo(settings = settings)
#> $system
#> [1] "production"
Since settings.default() calls the settings method I want to use, it'd be great if I could just call foo() with its defaults (which would then always pick up the settings.default() method).
Motivation (second order)
I'm experimenting with adhering more to principles of functional programming (see e.g. chapter from Advanced R or wikipedia link) and its distinction of pure and effectful/side-effecty functions.
Previously, I probably would have implemented settings via some sort of a global variable that thus each foo() had access to, so I could be lazy and not define it as an argument of function foo(), but foo() then would depend on things outside of its scope - which is a very bad thing in FP.
Now I want to at least state the dependency of foo() on my settings by handing it a function that returns the settings values - which is sort of my lazyness at least complying to some extend with top-level FP principles.
Of course the non-lazy (and arguably best) solution would be to carefully state all actual settings dependencies one by one as function arguments in foo(), e.g. foo(settings_system = settings()$system) ;-)
1) Try explicitly getting it from the parent:
foo <- function(x, bar = get("bar", 1)) {
bar(x)
}
2) Another possibility is to use an argument name such as bar.. The user can still write foo(1:15, bar = whatever), e.g. any of these three calls work:
foo <- function(x, bar. = bar) {
bar.(x)
}
foo(1:5)
foo(1:5, bar)
foo(1:5, bar = bar)

Resuming stopped evaluation in R with correct search path

I have a function within a loaded library that stops the evaluation on its arguments using the substitute function. This function then calls another within that same library, which calls another function from that library, and so forth, until several calls later when I want to evaluated that initial argument in the original environment in which it was provided. The problem I have is that the search path for functions in loaded libraries includes namespace::base before the global environment. For example, let's say that foo and bar are both functions in the library lib. As such, the environment in which they are defined is namespace::lib. Consider the following:
> require(lib)
> foo
function (x)
{
x <- substitute(x)
bar(x)
}
<environment: namespace:lib>
> bar
function (x)
{
eval(x)
}
<environment: namespace:lib>
> length = 2
> foo(length)
function (x) .Primitive("length")
Because bar is a function within a loaded library, it searches namespace::base first and finds the above. However, if bar was defined by the user in the interactive session, it would have returned 2. I am looking for a way to cause these functions to behave as if I never halted evaluation, in which case 2 would be returned regardless of the environment in which the functions are defined.
I can't simply use mget to evaluate length starting at .GlobalEnv because then the following would not work:
> baz = function()
+ {
+ length <- 3
+ foo(length)
+ }
> baz()
function (x) .Primitive("length")
I could instead add an extra argument to all involved functions that tracks how many frames ago the evaluation was halted. However, this is pretty messy and not ideal.
I could also call sys.function inside the last function, bar, and trace my way back through the previous calls and evaluate my argument in the environment above the function that halted the evaluation. For example, if I call sys.function(1) within the body of bar after calling foo(length) then I get the following:
function (x)
{
eval(x)
}
This is indeed identical to foo. I can then use eval with sys.frames. This seems more general but less than perfect. I would have to know which functions stop evaluation.
Does anyone have a more general solution?
Does adding a default enviroment to these function help with the problem?
lib<-new.env()
assign("foo", function(x, env=parent.frame()) {
x<-substitute(x);
bar(x, env)
}, envir=lib)
assign("bar", function(x, env=parent.frame()) {
eval(x, env)
}, envir=lib)
attach(lib)
length = 2
foo(length)
# [1] 2
baz <- function() {
length <- 3
foo(length)
}
baz()
# [1] 3
bar(expression(baz()))
# [1] 3
If not, perhaps you could make a clearer, reproducible example with function calls and your expected output. Otherwise it's unclear where you are having trouble.

Environment chaining in R

In my R development I need to wrap function primitives in proto objects so that a number of arguments can be automatically passed to the functions when the $perform() method of the object is invoked. The function invocation internally happens via do.call(). All is well, except when the function attempts to access variables from the closure within which it is defined. In that case, the function cannot resolve the names.
Here is the smallest example I have found that reproduces the behavior:
library(proto)
make_command <- function(operation) {
proto(
func = operation,
perform = function(., ...) {
func <- with(., func) # unbinds proto method
do.call(func, list(), envir=environment(operation))
}
)
}
test_case <- function() {
result <- 100
make_command(function() result)$perform()
}
# Will generate error:
# Error in function () : object 'result' not found
test_case()
I have a reproducible testthat test that also outputs a lot of diagnostic output. The diagnostic output has me stumped. By looking up the parent environment chain, my diagnostic code, which lives inside the function, finds and prints the very same variable the function fails to find. See this gist..
How can the environment for do.call be set up correctly?
This was the final answer after an offline discussion with the poster:
make_command <- function(operation) {
proto(perform = function(.) operation())
}
I think the issue here is clearer and easier to explore if you:
Replace the anonymous function within make_command() with a named one.
Make that function open a browser() (instead of trying to get result). That way you can look around to see where you are and what's going on.
Try this, which should clarify the cause of your problem:
test_case <- function() {
result <- 100
myFun <- function() browser()
make_command(myFun)$perform()
}
test_case()
## Then from within the browser:
#
parent.env(environment())
# <environment: 0x0d8de854>
# attr(,"class")
# [1] "proto" "environment"
get("result", parent.env(environment()))
# Error in get("result", parent.env(environment())) :
# object 'result' not found
#
parent.frame()
# <environment: 0x0d8ddfc0>
get("result", parent.frame()) ## (This works, so points towards a solution.)
# [1] 100
Here's the problem. Although you think you're evaluating myFun(), whose environment is the evaluation frame of test_case(), your call to do.call(func, ...) is really evaluating func(), whose environment is the proto environment within which it was defined. After looking for and not finding result in its own frame, the call to func() follows the rules of lexical scoping, and next looks in the proto environment. Neither it nor its parent environment contains an object named result, resulting in the error message you received.
If this doesn't immediately make sense, you can keep poking around within the browser. Here are a few further calls you might find helpful:
environment(get("myFun", parent.frame()))
ls(environment(get("myFun", parent.frame())))
environment(get("func", parent.env(environment())))
ls(environment(get("func", parent.env(environment()))))

Forcing specific data types as arguments to a function

I was just wondering if there was a way to force a function to only accept certain data types, without having to check for it within the function; or, is this not possible because R's type-checking is done at runtime (as opposed to those programming languages, such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
public int addone (int n) {
return n+1;
}
}
In R, a similar function might be
addone <- function(n)
{
return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, then is the only way to do to have a condition within the function, along the lines of
addone <- function(n)
{
if(is.vector(n) && length(n)==1)
{
return(n+1)
} else
{
return ("You must enter a single integer")
}
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context or R, since I can't think of a practical reason why one would want to create a class of a single value. Nonetheless, this is possible. As an added bonus, I demonstrate how the function addone can be used to add the value of one to numeric vectors (trivial) and character vectors (so A turns to B, etc.):
Start by creating a generic S3 method for addone, utlising the S3 despatch mechanism UseMethod:
addone <- function(x){
UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
as.single <- function(x){
ret <- unlist(x)[1]
class(ret) <- "single"
ret
}
Now create methods to handle the various classes. The default method will be called unless a specific class is defined:
addone.default <- function(x) x + 1
addone.character <- function(x)rawToChar(as.raw(as.numeric(charToRaw(x))+1))
addone.single <- function(x)x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things, including the S3 object system.
The S3 method doesn't provide strict typing. It can quite easily be abused. For stricter object orientation, have a look at S4 classes, reference based classesor the proto package for Prototype object-based programming.
You could write a wrapper like the following:
check.types = function(classes, func) {
n = as.name
params = formals(func)
param.names = lapply(names(params), n)
handler = function() { }
formals(handler) = params
checks = lapply(seq_along(param.names), function(I) {
as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
})
body(handler) = as.call(c(
list(n('{')),
checks,
list(as.call(list(n('<-'), n('.func'), func))),
list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
))
handler
}
assert.class = function(x, cls) {
stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
Made somewhat inconvenient by R not having decorators. This is kind of hacky
and it suffers from some serious problems:
You lose lazy evaluation, because you must evaluate an argument to determine
its type.
You still can't check the types until call time; real static type checking
lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful,
because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably
do this by transforming expressions, but I don't think you want to go there.
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
stopifnot(is.vector(n) && length(n)==1)
print(n)
}
The reason it is so useful is because it provides a pretty clear error message to the user if the condition is false.

Resources