How does local() differ from other approaches to closure in R? - r

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,
example <- local({
hidden.x <- "You can't see me!"
hidden.fn <- function(){
cat("\"hidden.fn()\"")
}
function(){
cat("You can see and call example()\n")
cat("but you can't see hidden.x\n")
cat("and you can't call ")
hidden.fn()
cat("\n")
}
})
which behaves as follows from the command prompt:
> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"
I've seen this kind of thing discussed in Static Variables in R where a different approach was employed.
What the pros and cons of these two methods?

Encapsulation
The advantage of this style of programming is that the hidden objects won't likely be overwritten by anything else so you can be more confident that they contain what you think. They won't be used by mistake since they can't readily be accessed. In the linked-to post in the question there is a global variable, count, which could be accessed and overwritten from anywhere so if we are debugging code and looking at count and see its changed we cannnot really be sure what part of the code has changed it. In contrast, in the example code of the question we have greater assurance that no other part of the code is involved.
Note that we actually can access the hidden function although its not that easy:
# run hidden.fn
environment(example)$hidden.fn()
Object Oriented Programming
Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:
library(proto)
p <- proto(x = "x",
fn = function(.) cat(' "fn()"\n '),
example = function(.) .$fn()
)
p$example() # prints "fn()"
proto does not hide x and fn but its not that easy to access them by mistake since you must use p$x and p$fn() to access them which is not really that different than being able to write e <- environment(example); e$hidden.fn()
EDIT:
The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.
ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch

local() can implement a singleton pattern -- e.g., the snow package uses this to track the single Rmpi instance that the user might create.
getMPIcluster <- NULL
setMPIcluster <- NULL
local({
cl <- NULL
getMPIcluster <<- function() cl
setMPIcluster <<- function(new) cl <<- new
})
local() might also be used to manage memory in a script, e.g., allocating large intermediate objects required to create a final object on the last line of the clause. The large intermediate objects are available for garbage collection when local returns.
Using a function to create a closure is a factory pattern -- the bank account example in the Introduction To R documentation, where each time open.account is invoked, a new account is created.
As #otsaw mentions, memoization might be implemented using local, e.g., to cache web sites in a crawler
library(XML)
crawler <- local({
seen <- new.env(parent=emptyenv())
.do_crawl <- function(url, base, pattern) {
if (!exists(url, seen)) {
message(url)
xml <- htmlTreeParse(url, useInternal=TRUE)
hrefs <- unlist(getNodeSet(xml, "//a/#href"))
urls <-
sprintf("%s%s", base, grep(pattern, hrefs, value=TRUE))
seen[[url]] <- length(urls)
for (url in urls)
.do_crawl(url, base, pattern)
}
}
.do_report <- function(url) {
urls <- as.list(seen)
data.frame(Url=names(urls), Links=unlist(unname(urls)),
stringsAsFactors=FALSE)
}
list(crawl=function(base, pattern="^/.*html$") {
.do_crawl(base, base, pattern)
}, report=.do_report)
})
crawler$crawl(favorite_url)
dim(crawler$report())
(the usual example of memoization, Fibonacci numbers, is not satisfying -- the range of numbers that don't overflow R's numeric representation is small , so one would probably use a look-up table of efficiently pre-calculated values). Interesting how crawler here is a singleton; could as easily have followed a factory pattern, so one crawler per base URL.

Related

Precise specification of function environments

I would like to have the option of specifying very precisely (exactly, if possible) the behavior of functions. For example,
foo <- function() {}
Env <- new.env(parent=emptyenv())
environment(foo) <- Env
foo()
# Error in { : could not find function "{"
Env$`{` <- base::`{`
environment(foo) <- Env
foo()
# NULL
I know I need to set the environment of foo, and that whatever foo depends upon is defined in that environment (or that environment's parent, or the parent's parent, etc.). But
what all must be specified, and
what can't be specified (i.e., in what cases am I stuck with default behavior)?
For example, in the following, why is there no error when the call to bar is attempted, when I leave " undefined (what kind of a thing is ", anyway? Or is it ""?).
bar <- function() {""}
Env <- new.env(parent=emptyenv())
Env$`{` <- base::`{`
environment(bar) <- Env
bar()
# [1] ""
My larger exercise is exploring ways to build very large systems with lots of dependencies, but no circular dependencies. I think I will need precise control over function environments in order to guarantee that there are no cycles. I am aiming for a ball of mud of sorts, where I can, given certain discipline, put all the R code I ever write into a single structure (which might not all be in memory, or even on local storage)—and, whenever I write new code, have the option of depending on some code I wrote in the past, provided that I have access to that ealier code.
Related posts include:
[distinct], [specify], and [environments-enclosures-frames].

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

Environment expansion in R

I have some problem I don't know how to solve.
Prehistory: I use R.NET for my calculation (need WPF application). So, I want to parallelize my app, and I created dynamic proxy for REngine class. It needs to serialize data to pass and receive data from-to REngine instance via TCP. Bad news - R.NET classes cannot be serialized. So, I have an idea to serialize R objects in R and pass R serialized data between processes.
So I have same script like this:
a <- 5;
b <- 10;
x <- a+b;
I need to wrap it like this:
wrapFunction <- function()
{
a <- 5;
b <- 10;
x <- a+b;
}
serializedResult <- serialize(wrapFunction());
I'll get serializedResult and pass it as byte array. Also I need to pass environments. But I won't get a, b, x in .GlobalEnv after these manipulations.
How is it possible to get all variables, defined in function body, in my .GlobalEnv?
I don't know names and count, I can't rewrite basic script, replacing "<-" by "<<-".
Other ways?
Thank you.
I'm not sure I fully understand your requirements. They seem to go against the functional language paradigm, which R tries to follow. The following might be helpful or not:
e <- new.env()
wrapFunction <- function(){
with(e, {
a <- 5;
b <- 10;
x <- a+b;
})
}
wrapFunction()
e$a
#[1] 5
You can of course use the .GlobalEnv instead of e, but at least in R that would be considered an even worse practice.

How to change the loop structure into sapply function?

I had changed the for loop into sapply function ,but failed .
I want to know why ?
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
for (i in 1:length(z)){
if (is.element(basename(z[i]),left))
{
append(w,values=z[i])->w;
setdiff(left,basename(z[i]))->left
}}
print(w)
list.files("c:/",pattern="mp3$",recursive=TRUE,full.names=TRUE)->z
c()->w
left<-basename(z)
sapply(z,function(y){
if (is.element(basename(y),left))
{ append(w,values=y)->w;
setdiff(left,basename(y))->left
}})
print(w)
my rule of selecting music is that if the basename(music) is the same ,then save only one full.name of music ,so unique can not be used directly.there are two concepts full.name and basename in file path which can confuse people here.
The problem you have here is that you want your function to have two side-effects. By side-effect, I mean modify objects that are outside its scope:w and left.
Curently, w and left are only modified within the function's scope, then they are eventually lost as the function call ends.
Instead, you want to modify w and left outside the function's environment. For that you can use <<- instead of <-:
sapply(z, function(y) {
if (is.element(basename(y),left)) {
w <<- append(w, values = y)
left <<- setdiff(left, basename(y))
}
})
Note that I have been saying "you want", "you can", but this is not what "you should" do. Functions with side-effects are really considered bad programming. Try reading about it.
Also, it is good to reserve the *apply tools to functions that can run their inputs independently. Here instead, you have an algorithm where the outcome of an iteration depends on the outcome of the previous ones. These are cases where you're better off using a for loop, unless you can rethink the algorithm in a framework that better suits *apply or can make use of functions that can handle such dependent situations: filter, unique, rle, etc.
For example, using unique, your code can be rewritten as:
base.names <- basename(z)
left <- unique(base.names)
w <- z[match(left, base.names)]
It also has the advantage that it is not building an object recursively, another no-no of your current code.

find all functions in a package that use a function

I would like to find all functions in a package that use a function. By functionB "using" functionA I mean that there exists a set of parameters such that functionA is called when functionB is given those parameters.
Also, it would be nice to be able to control the level at which the results are reported. For example, if I have the following:
outer_fn <- function(a,b,c) {
inner_fn <- function(a,b) {
my_arg <- function(a) {
a^2
}
my_arg(a)
}
inner_fn(a,b)
}
I might or might not care to have inner_fn reported. Probably in most cases not, but I think this might be difficult to do.
Can someone give me some direction on this?
Thanks
A small step to find uses of functions is to find where the function name is used. Here's a small example of how to do that:
findRefs <- function(pkg, fn) {
ns <- getNamespace(pkg)
found <- vapply(ls(ns, all.names=TRUE), function(n) {
f <- get(n, ns)
is.function(f) && fn %in% all.names(body(f))
}, logical(1))
names(found[found])
}
findRefs('stats', 'lm.fit')
#[1] "add1.lm" "aov" "drop1.lm" "lm" "promax"
...To go further you'd need to analyze the body to ensure it is a function call or the FUN argument to an apply-like function or the f argument to Map etc... - so in the general case, it is nearly impossible to find all legal references...
Then you should really also check that getting the name from that function's environment returns the same function you are looking for (it might use a different function with the same name)... This would actually handle your "inner function" case.
(Upgraded from a comment.) There is a very nice foodweb function in Mark Bravington's mvbutils package with a lot of this capability, including graphical representations of the resulting call graphs. This blog post gives a brief description.

Resources