Method initialisation in R reference classes - r

I've noticed some strange behaviour in R reference classes when trying to implement some optimisation algorithm. There seems to be some behind-the-scenes parsing magic involved in initialising methods in a particular which makes it difficult to work with anonymous functions.
Here's an example that illustrates the difficulty: I define a function to optimise (f_opt), a function that runs optim on it, and a reference class that has these two as methods. The odd behaviour will be clearer in the code
f_opt <- function(x) (t(x)%*%x)
do_optim_opt <- function(x) optim(x,f)
do_optim2_opt <- function(x)
{
f(x) #Pointless extra evaluation
optim(x,f)
}
optClass <- setRefClass("optClass",methods=list(do_optim=do_optim_opt,
do_optim2=do_optim2_opt,
f=f_opt))
oc <- optClass$new()
oc$do_optim(rep(0,2)) #Doesn't work: Error in function (par) : object 'f' not found
oc$do_optim2(rep(0,2)) #Works.
oc$do_optim(rep(0,2)) #Parsing magic has presumably happened, and now this works too.
Is it just me, or does this look like a bug to other people too?

This post in R-devel seems relevant, with workaround
do_optim_opt <- function(x, f) optim(x, .self$f)
Seems worth a post to R-devel.

Related

source code for grow function in randomForest R package

In source code of R randomForest package, I find the following code in grow.R. What's the purpose for UseMethod? Why does function grow not have function definition and just grow.default and grow.randomForest have definition? Is this something related to calling C function in the R package?
grow <- function(x, ...) UseMethod("grow")
grow.default <- function(x, ...)
stop("grow has not been implemented for this class of object")
grow.randomForest <- function(x, how.many, ...) {
y <- update(x, ntree=how.many)
combine(x, y)
}
Also, in the randomForest.R file, I only find the following code. There is randomForest.default.R file too. Why is there no randomForest.randomForest definition like function grow?
"randomForest" <-
function(x, ...)
UseMethod("randomForest")
What's the purpose for UseMethod? Why does function grow not have function definition and just grow.default and grow.randomForest have definition?
I'd suggest reading about S3 dispatch to understand the patterns you see. Advanced R has a chapter on S3. You can also see related questions here on Stack Overflow.
Is this something related to calling C function in the R package?
No.
Why is there no randomForest.randomForest definition like function grow?
This should make sense if you do the recommended reading above. S3 dispatch uses a pattern of function_name.class to call the correct version of the function (method) based on class of the input. You don't give a randomForest object as an input to the randomForest function, so there is no randomForest.randomForest method defined.
grow() does get called on randomForest objects, hence the grow.randomForest() method. Presumably the authors wanted grow() to error early if it gets called on inappropriate input, so the default for other classes is an immediate error, but they still keep dispatch flexible to work with other classes, enabling extensions of the package and nice play with other packages that may have their own grow() implementations.

Detect methods in other environments in R -- for testing in testthat

Is it possible to allow "UseMethod" to see class functions defined in other environments? See the below code for an example. I would like the h.logical to be detected too such that h(TRUE) would return "logical".
h <- function(x) {
UseMethod("h")
}
h.character <- function(x){ "char"}
h.numeric <- function(x) { "num" }
aa = list(h.logical=function(x){"logical"})
attach(aa)
h("a")
h(10)
h(TRUE)
This code now throws an error in the last line:
Error in UseMethod("h") : no applicable method for 'h' applied to an object of class "logical"
Solving the issue with this example suffices. If that is not possible, I would appreciate help solving the actual use case in another way.
The use case is as follows: I have a package with functions like h above, then I want to add a class to that function. This works fine just adding the new function to .Globalenv. The problem occurs when I want to test this using testthat as I am not allowed to write to .Globalenv within a test. Adding the new class function to some other environment within the test makes it detectable by methods(), however, UseMethod still does not see it and throws an error. Any ideas?
Can I use something else then UseMethod for this purpose or do some other hack while testing to mimic the actual usage?
Any help or pointers to how to handle this is highly appreciated!

Global Variables as Function Parameters

In an R project, we have a global dataframe df that is to be used inside a function my_func(). The dataframe will not be changed, but it will be used as a "read-only" table.
Can you please assist me, on, what is the best practice:
Include the dataframe in the parameters of the function, as in
my_func(df)
{
a <- df[1,2]
}
OR
Not include it in the parameters, just use it (read it) in the function body, as in
my_func()
{
a <- df[1,2]
}
In an ideal world, data enters a function as an argument and leaves it as a return value. That is a good principle. Besides it is prefereable for code reuse. Right now you may be conviced, that you will only ever call this code on df (bad name by the way, as there is a function calles df already in R and that can lead to terrible error messages).
The only exception from this rule, and the reason, why <<- exist(*), may rarely be performance.
However in the read-only case, there are no performance gains, as R does behave cleverly.
Will will need to install the microbenchmark package for the following code to run:
expl <- data.frame(a = rep("Hello world.", 1e8),
b = rep(1, 1e8))
fun1 <- function(dataframe) return(sum(dataframe$b))
fun2 <- function() return(sum(expl$b))
microbenchmark::microbenchmark(fun1(expl), fun2())
Try it and you will see, that there is no performance gain in fun2over fun1, even though the dataframe has considerable size.
Edit:
(*) as I have learned from Konrad Rudolph's comment below, <<- can be usefull, when giving data to the parent, not necessarily the global namespace. Very interesting read even if not strictly on topic here: http://adv-r.had.co.nz/Functional-programming.html#mutable-state

In R, do an operation temporarily using a setting such as working directory

I'm almost certain I've read somewhere how to do this. Instead of having to save the current option (say working directory) to a variable, change the w.d, do an operation, and then revert back to what it was, doing this inside a function akin to "with" relative to attach/detach. A solution just for working directory is what I need now, but there might be a more generic function that does that sort of things? Or ain't it?
So to illustrate... The way it is now:
curdir <- getwd()
setwd("../some/place")
# some operation
setwd(curdir)
The way it is in my wildest dreams:
with.dir("../some/place", # some operation)
I know I could write a function for this, I just have the impression there's something more readily available and generalizable to other parameters too.
Thanks
There is an idiom for this in some of R's base plotting functions
op <- par(no.readonly = TRUE)
# par(blah = stuff)
# plot(stuff)
par(op)
that is so unbelievably crude as to be fully portable to options() and setwd().
Fortunately it's also easy to implement a crude wrapper:
with_dir <- function(dir, expr) {
old_wd <- getwd()
setwd(dir)
result <- evalq(expr)
setwd(old_wd)
result
}
I'm no wizard with nonstandard evaluation so evalq could be unstable somehow. More on NSE in an old write-up by Lumley and also in Wickham's Advanced R, but it's dense stuff and I haven't wrapped my head around it all yet.
edit: as per Ben Bolker's comment, it's probably better to use on.exit for this:
with_dir <- function(dir, expr) {
old_wd <- getwd()
on.exit(setwd(old_wd))
setwd(dir)
evalq(expr)
}
From the R docs:
on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
What you're describing depends upon two things: detecting when you enter and leave a particular lexical scope, and defining a behavior to do on entrance and on exit. Python has these, called "Context Managers". This was a big deal when it was released, and many parts of Python's standard library now behave like context managers, and have to define the "enter" and "exit" behavior in explicitly, or by leveraging some clever inheritance scheme.
with.default
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
<bytecode: 0x07d02ccc>
<environment: namespace:base>
R's with function works sort of like a context manager, because it can pass scopes around easily. That said, this doesn't give you the "enter" and "exit" operations for free. Especially consider that the current working directory isn't an entry in the current scope, but a state of the R interpreter, which can only be queried or changed by function calls behind the .Internal shield.
You can easily define your own object types to have methods that are context manager-like for the with generic function, as well as writing and registering methods for other types you commonly use, but it is not part of the base R language.

Access (exporting) function from namespace

I know something similar has been asked before here on SO, but the solution given there doesn't seem to apply in my case.
I'm trying to follow convention in creating a package by referring to functions exported from other namespaces and avoiding use of require() within a function.
I'm basically trying to prevent a function taking too long to run. For example,
fun <- function(i){
require(R.utils)
setTimeLimit(elapsed=10, transient=TRUE) # prevent taking more than 10secs
return(i^i)
}
>fun(10)
Works fine, but if I try:
require(R.utils)
fun <- function(i){
R.utils:::setTimeLimit(elapsed=10, transient=TRUE) # prevent taking more than 10secs
return(i^i)
}
>fun(10)
I get:
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'setTimeLimit' not found
Changing ::: to :: doesn't change this behavior.
I'm open to any simpler methods to achieving the same objective.
Also is it really so bad to have require() calls inside a function?
Many thanks!
EDIT:
If import works then great, thanks. Still in development so wanted to make sure it would be OK.
EDIT:
Apologies, it's there in base. Not sure how I missed this; I was originally using R.utils::evalWithTimeout and must have assumed both were in the same package. *looks sheepish*
I'm just posting this to prevent the question from showing up as unanswered, but will be glad to accept another...
isTRUE("setTimeLimit" %in% ls(getNamespace("base"), all.names=TRUE))

Resources