Query of List of Functions - r

I have the following example code from a course on Coursera:
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}
However, I don't understand the significance of the last list() call. Can someone explain it? Thank you.

The final call to list() builds a new list object and returns it from makeVector() (because it's the last statement in the function).
The new list object is populated with four components: set, get, setmean, and getmean. The value of each of these four components is populated with a function that was defined dynamically during the execution of the makeVector() function.
All four functions are a little bit special in that they all end up sharing a pointer to the execution environment that was in effect while that particular invocation of makeVector() was executing. Because they variously read and write the variables x and m that were local to that invocation, they all end up sharing those variables as pseudo-private variables. This is a simple implementation of the functional pattern of object-oriented programming, where temporary local variables are captured by a new set of dynamically defined functions within a temporary scope. This pattern is seen in various languages, including R, Perl, and JavaScript.
Also note that the writability of the shared variables only works because the superassignment operator (<<-) was used to assign to them from the scope of the dynamic mutator functions (set() and setmean()). If the normal assignment operator (<-) had been used, then a new local variable would be temporarily created in the scope of the dynamic mutator function itself at the time it would be called, which would not affect the closure variables.
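A short usage sketch, assuming the makeVector() definition from the question (reindented here so the block runs on its own; v1 and v2 are illustrative names), shows that each call gets its own environment and that the four closures in one returned list share it:

```r
# makeVector() as in the question
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get, setmean = setmean, getmean = getmean)
}

v1 <- makeVector(c(1, 2, 3))
v2 <- makeVector(c(10, 20))

v1$setmean(mean(v1$get()))  # writes m in v1's environment only
v1$getmean()                # 2
v2$getmean()                # NULL: v2 has its own, separate m

v1$set(c(4, 5))             # replaces x and clears the cached mean
v1$getmean()                # NULL

# All four closures of v1 share one environment; v2's is a different one.
identical(environment(v1$get), environment(v1$setmean))  # TRUE
identical(environment(v1$get), environment(v2$get))      # FALSE
```

If set() used <- instead of <<-, v1$set(c(4, 5)) would only create a local x inside set() and v1$get() would still return c(1, 2, 3).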

Related

R: Where are formals for a function stored in memory?

When a function has been defined but has not yet been called, do the formals that do not have default values exist? If they do, do they exist in the execution environment, or in the environment where the function definition is located, or somewhere else?
If a function has been defined but not yet called, and a formal has been assigned a default value, does that value exist? If it does, in what environment does it exist? If the default expression evaluates to a constant, has the formal been assigned to that value, to be overwritten when the function is called if a value is supplied? If not, in what environment is that (fixed) default value located between the moment of definition and the time the function is called?
After the function has been called and actual or default values have been assigned to the formals, passed into the body, and if necessary scoped and/or evaluated, do the formals continue to exist? If so, in what environment do they then exist?
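Before a function is called, its formals are stored as part of the function object itself: formals() returns them as a pairlist of argument names and default expressions, and the defaults are kept unevaluated. The formals exist as variables only once the function is called: each call creates a new environment, which Hadley Wickham calls the execution environment in Advanced R, and binds each formal there to a promise for the supplied argument or its default expression (or to a missing-argument marker when there is neither). The memory locations of these objects can be inspected with pryr::address().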
As an example I'll use a modified version of code that I previously wrote to illustrate memory locations in the makeVector() function from the second programming assignment for the Johns Hopkins R Programming course on coursera.org.
makeVector <- function(x = 200) {
  library(pryr)
  message(paste("Address of x argument is:", address(x)))
  message(paste("Number of references to x is:", refs(x)))
  m <- NULL
  set <- function(y) {
    x <<- y
    message(paste("set() address of x is:", address(x)))
    message(paste("Number of references to x is:", refs(x)))
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}
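As coded above, makeVector() is a function that returns a list of closures sharing one execution environment. The returned list works like a simple object (it is not an S3 object, since it has no class attribute): we can access objects within that environment via getter and setter functions, the setters also being known as mutator methods.
We can create an instance by calling makeVector() and query the address and value of x with the following code.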
makeVector()$get()
...and the result:
> makeVector()$get()
Address of x argument is: 0x1103df4e0
Number of references to x is: 0
[1] 200
>
As we can see from the output, x does have a memory location, but there are no other objects that contain references to it. Also, x was set to its default value of a vector of length 1 with the value 200.
I provide a detailed walkthrough of the objects in the makeVector() environment in my answer to Caching the Mean of a Vector in R.
Regarding how long the formals exist in memory: they exist as long as the execution environment created for that call exists. The garbage collector reclaims objects that are no longer referenced, so if nothing that survives the call (such as the closures in the list returned by makeVector()) keeps a reference to that environment, it becomes eligible for garbage collection as soon as the call returns its result to the parent environment.
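Both points can be checked with base R alone, without pryr. In this sketch (f, g, keep, and h are hypothetical names), formals() reads the formals stored with the function object before any call, and a closure returned from a call keeps that call's execution environment, and therefore its formals, alive:

```r
# Formals live in the function object before any call.
f <- function(x = 200, y) NULL
names(formals(f))      # "x" "y": y is listed even though it has no default
formals(f)$x           # 200, stored as the default expression

# Default expressions are stored unevaluated.
g <- function(x = 1 + 1) NULL
is.call(formals(g)$x)  # TRUE: the call 1 + 1, not the value 2

# A closure returned by a call keeps that call's execution environment alive.
keep <- function(x = 200) function() x
h <- keep()
get("x", envir = environment(h))  # 200: the formal x outlives the call to keep()
```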

Call Arguments of Function inside Function / R language

I have a function:
func <- function (x)
{
arguments <- match.call()
return(arguments)
}
1) If I call my function with the value written directly in the call:
func("value")
I get:
func(x = "value")
2) If I call my function by passing a variable:
my_variable <-"value"
func(my_variable)
I get:
func(x = my_variable)
Why is the first and the second result different?
Can I somehow get in the second call "func(x = "value")"?
I think my problem is that the environment inside a function simply doesn't contain values if they were passed as variables; it contains only the names of the variables for later lookup. Is there a way to follow such a reference and get the value from inside a function?
In R, when you pass my_variable as formal argument x into a function, the value of my_variable will only be retrieved when the function tries to read x (if it does not use x, my_variable will not be read at all). The same applies when you pass more complicated arguments, such as func(x = compute_my_variable()) -- the call to compute_my_variable will take place when func tries to read x (this is referred to as lazy evaluation).
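A minimal sketch of lazy evaluation in base R (f and g are hypothetical functions): an argument that is never read is never evaluated, and an argument expression runs only when the function first reads it.

```r
f <- function(x) "x was never read"
f(stop("this is never evaluated"))   # no error: x is never forced

g <- function(x) {
  message("entering g")
  x                                  # forcing x evaluates the argument here
}
g({ message("evaluating the argument"); 42 })
# "entering g" is printed before "evaluating the argument"; the result is 42
```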
Given lazy evaluation, what you are trying to do is not well defined because of side effects - in which order would you like to evaluate the arguments? Which arguments would you like to evaluate at all? (note a function can just take an expression for its argument using substitute, but not evaluate it). As a side effect, compute_my_variable could modify something that would impact the result of another argument of func. This can happen even when you only passed variables and constants as arguments (function func could modify some of the variables that will be later read, or even reading a variable such as my_variable could trigger code that would modify some of the variables that will be read later, e.g. with active bindings or delayed assignment).
So, if all you want to do is to log how a function was called, you can use sys.call (or match.call but that indeed expands argument names, etc). If you wanted a more complete stacktrace, you can use e.g. traceback(1).
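The difference between sys.call() and match.call() shows up with a small hypothetical function h: sys.call() returns the call exactly as written, while match.call() fills in the full argument names.

```r
h <- function(value) {
  as_written <- sys.call()   # the call as the caller typed it
  matched <- match.call()    # the call with arguments matched to formals
  list(as_written = as_written, matched = matched)
}

r <- h(val = 1)     # "val" partially matches the formal "value"
r$as_written        # h(val = 1)
r$matched           # h(value = 1)

h(1)$matched        # h(value = 1): positional arguments get their names too
```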
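If for some reason you really wanted the values of all arguments, say as if they were all read in the order of match.call (the order in which they are declared), you can evaluate them with eval inside func, in the environment func was called from; this returns them as a list:
caller <- parent.frame()  # the environment func() was called from
lapply(as.list(match.call())[-1], eval, envir = caller)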
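If the goal is literally to get func(x = "value") back, one possible sketch (assuming func has only named formals and no ...) replaces each argument in the matched call with the value of the corresponding formal. Reading a formal forces its promise, so every argument does get evaluated, with the side-effect caveats discussed above; an argument whose value is NULL would be dropped from the call by [[<-.

```r
func <- function(x) {
  cl <- match.call()
  # Positions 2 onward hold the arguments; names(cl) gives their formal names.
  for (i in seq_along(cl)[-1]) {
    # get() forces the promise for that formal in func's own environment
    cl[[i]] <- get(names(cl)[i], envir = environment())
  }
  cl
}

my_variable <- "value"
func(my_variable)   # func(x = "value")
func("value")       # func(x = "value")
```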
Can't you simply use
return(paste0('func(x = "', x, '")'))

lapply() emptied list step by step while processing

First of all, excuse me for the bad title. I'm still so confused by this behavior that I wasn't able to describe it; however, I was able to reproduce it and break it down into a (goofy) example.
Could you please explain why other.list appears to be full of NULLs after calling lapply()?
some.list <- rep(list(rnorm(1)), 33)
other.list <- rep(list(), length = 33)
lapply(seq_along(some.list), function(i, other.list) {
  other.list[[i]] <- some.list[[i]]
  browser()
}, other.list)
I watched this in debugging mode in RStudio. For a given i, other.list[[i]] gets some.list[[i]] assigned, but it is NULL again in the next iteration. I really want to understand this behavior!
The reason is that the assignment takes place inside a function, and you've used the normal assignment operator <-, rather than the superassignment operator <<-. Inside a function, in other words while a function is executing, the normal assignment operator always assigns to a local variable in the evaluation environment that is created for that particular evaluation of the function (returned by calling environment() with no arguments from inside the function). In your code, other.list inside the function is also the function's own argument, so <- modifies that local copy. Thus, your global other.list variable, which is defined in the global environment (returned by globalenv()), is not touched by such an assignment. The superassignment operator, on the other hand, starts in the enclosing environment and follows the chain of enclosing environments (which can be walked recursively via parent.env()) until it finds a variable with the name on the LHS of the assignment, and then assigns to that. For a function defined at top level, as yours is, that chain passes through the global environment. If no such variable is found, the superassignment operator creates one in the global environment.
Thus, if you change <- to <<- in the assignment that takes place inside the function, you will be able to modify the global other.list variable.
See https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html.
Here is a little demo of these concepts. In each assignment, I assign the environment that actually contains the variable being assigned to:
oldGlobal <- environment() ## environment() is same as globalenv() in global scope
(function() {
  newLocal1 <- environment() ## creates a new local variable in this function evaluation's evaluation environment
  print(newLocal1) ## <environment: 0x6014cbca8> (different for every evaluation)
  oldGlobal <<- parent.env(environment()) ## target search hits oldGlobal in closure environment; RHS is same as globalenv()
  newGlobal1 <<- globalenv() ## target search fails; creates a new variable in the global environment
  (function() {
    newLocal2 <- environment() ## creates a new local variable in this function evaluation's evaluation environment
    print(newLocal2) ## <environment: 0x6014d2160> (different for every evaluation)
    newLocal1 <<- parent.env(environment()) ## target search hits the existing newLocal1 in closure environment
    print(newLocal1) ## same value that was already in newLocal1
    oldGlobal <<- parent.env(parent.env(environment())) ## target search hits oldGlobal two closure environments up in the chain; RHS is same as globalenv()
    newGlobal2 <<- globalenv() ## target search fails; creates a new variable in the global environment
  })()
})()
oldGlobal ## <environment: R_GlobalEnv>
newGlobal1 ## <environment: R_GlobalEnv>
newGlobal2 ## <environment: R_GlobalEnv>
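Applied to the question's code, a minimal sketch of the <<- fix looks like this. It drops the unused other.list argument and the browser() call, and uses vector("list", n) as an equivalent way to build a list of NULLs; for contrast, the same loop with <- leaves a second, hypothetical list named untouched unchanged:

```r
some.list  <- rep(list(rnorm(1)), 33)
other.list <- vector("list", 33)   # 33 NULLs, as in the question

# <<- starts searching in the enclosing (here global) environment,
# so it modifies the global other.list rather than a local copy.
invisible(lapply(seq_along(some.list), function(i) {
  other.list[[i]] <<- some.list[[i]]
}))

# With <-, each call modifies a local copy that is discarded afterwards.
untouched <- vector("list", 3)
invisible(lapply(1:3, function(i) {
  untouched[[i]] <- i
}))
all(vapply(untouched, is.null, logical(1)))   # TRUE
```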
I haven't run your code, but two observations:
I usually avoid putting browser() as the last line inside a function because that gets treated as the return value.
other.list does not get modified by your lapply. You need to understand the basics of environments: any bindings you make inside the function passed to lapply do not persist outside of it. That is by design; lapply is meant to be used without side effects, so you should only use its return value. You can use the <<- operator instead of <- (though I don't recommend that), or you can use the assign function instead. Or you can do it properly, the way lapply is meant to be used:
other.list <- lapply(seq_along(some.list), function(i) {
  some.list[[i]]
})
Note that it's generally recommended not to make assignments inside lapply that change variables outside of it. lapply is meant to apply a function to every element and return a list, and that list should be all that lapply is used for.

Mutating a variable in a closure [duplicate]

This question already has answers here:
Global and local variables in R
(3 answers)
I'm pretty new to R, but coming from Scheme, which is also lexically scoped and has closures, I would expect to be able to mutate outer variables in a closure.
E.g., in
foo <- function() {
  s <- 100
  add <- function() {
    s <- s + 1
  }
  add()
  s
}
cat(foo(), "\n") # prints 100 and not 101
I would expect foo() to return 101, but it actually returns 100:
$ Rscript foo.R
100
I know that Python has the global keyword to declare scope of variables (doesn't work with this example, though). Does R need something similar?
What am I doing wrong?
Update
Ah, is the problem that in add I am creating a new, local variable s that shadows the outer s? If so, how can I mutate s without creating a local variable?
Use the <<- operator for assignment in the add() function.
From ?"<<-":
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.
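Applied to the example in the question, the <<- version is a one-character change:

```r
foo <- function() {
  s <- 100
  add <- function() {
    s <<- s + 1   # rebinds the s found in foo()'s environment
  }
  add()
  s
}
cat(foo(), "\n")  # prints 101
```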
You can also use assign() and define the scope precisely with the envir argument. In this case it works the same way as <<- in your add() function, but makes your intention a little clearer:
foo <- function() {
  s <- 100
  add <- function() {
    assign("s", s + 1, envir = parent.frame())
  }
  add()
  s
}
cat(foo(), "\n")
Of course, the better way to do this kind of thing in R is to have your function return the variable (or variables) it modifies and explicitly reassign the result to the original variable:
foo <- function() {
  s <- 100
  add <- function(x) x + 1
  s <- add(s)
  s
}
cat(foo(), "\n")
Here is one more approach that can be a little safer than the assign or <<- approaches:
foo <- function() {
  e <- environment()
  s <- 100
  add <- function() {
    e$s <- e$s + 1
  }
  add()
  s
}
foo()
The <<- assignment can cause problems if you accidentally misspell your variable name: it will still do something, but not what you expect, and the source of the problem can be hard to find. The assign approach can be tricky if you later want to move your add function inside another function, or call it from another function, because parent.frame() then refers to a different environment. The best approach overall is to not have functions modify variables outside their own scope, and to have each function return anything that is important. But when that is not possible, the method above uses lexical scoping to access the environment e and then assigns into that environment, so it always assigns specifically into foo()'s environment, never above or below.
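A small sketch of the misspelling pitfall (counter and cuont are hypothetical names): because no variable called cuont exists anywhere up the chain, <<- silently creates a global one instead of raising an error.

```r
counter <- function() {
  count <- 0
  bump <- function() {
    cuont <<- count + 1   # typo: meant count
  }
  bump()
  count
}

counter()                              # 0: counter()'s count is unchanged
exists("cuont", envir = globalenv())   # TRUE: a stray global variable appeared
```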

How does local() differ from other approaches to closure in R?

Yesterday I learned from Bill Venables how local() can help create static functions and variables, e.g.,
example <- local({
  hidden.x <- "You can't see me!"
  hidden.fn <- function() {
    cat("\"hidden.fn()\"")
  }
  function() {
    cat("You can see and call example()\n")
    cat("but you can't see hidden.x\n")
    cat("and you can't call ")
    hidden.fn()
    cat("\n")
  }
})
which behaves as follows from the command prompt:
> ls()
[1] "example"
> example()
You can see and call example()
but you can't see hidden.x
and you can't call "hidden.fn()"
> hidden.x
Error: object 'hidden.x' not found
> hidden.fn()
Error: could not find function "hidden.fn"
I've seen this kind of thing discussed in Static Variables in R where a different approach was employed.
What are the pros and cons of these two methods?
Encapsulation
The advantage of this style of programming is that the hidden objects are unlikely to be overwritten by anything else, so you can be more confident that they contain what you think. They won't be used by mistake, since they can't readily be accessed. In the post linked in the question there is a global variable, count, which could be accessed and overwritten from anywhere, so if we are debugging code, look at count and see that it has changed, we cannot really be sure which part of the code changed it. In contrast, with the example code in the question we have greater assurance that no other part of the code is involved.
Note that we actually can access the hidden function, although it's not that easy:
# run hidden.fn
environment(example)$hidden.fn()
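To make the encapsulation point concrete, here is a sketch comparing a counter kept in a global variable named count (a hypothetical stand-in for the approach in the linked post, not its actual code) with one hidden by local(); next_id_global and next_id_local are illustrative names:

```r
# Approach 1: state in a global variable, which can be overwritten from anywhere
count <- 0
next_id_global <- function() { count <<- count + 1; count }

# Approach 2: state hidden in the environment created by local()
next_id_local <- local({
  count <- 0
  function() { count <<- count + 1; count }
})

next_id_global(); next_id_local()   # 1 and 1
count <- 100          # unrelated code happens to reuse the name "count"
next_id_global()      # 101: the global counter was clobbered
next_id_local()       # 2: the hidden counter is unaffected
```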
Object Oriented Programming
Also note that this is very close to object oriented programming where example and hidden.fn are methods and hidden.x is a property. We could do it like this to make it explicit:
library(proto)
p <- proto(x = "x",
           fn = function(.) cat(' "fn()"\n '),
           example = function(.) .$fn()
)
p$example() # prints "fn()"
proto does not hide x and fn, but it's not that easy to access them by mistake, since you must use p$x and p$fn() to access them, which is not really that different from being able to write e <- environment(example); e$hidden.fn()
EDIT:
The object oriented approach does add the possibility of inheritance, e.g. one could define a child of p which acts like p except that it overrides fn.
ch <- p$proto(fn = function(.) cat("Hello from ch\n")) # child
ch$example() # prints: Hello from ch
local() can implement a singleton pattern -- e.g., the snow package uses this to track the single Rmpi instance that the user might create.
getMPIcluster <- NULL
setMPIcluster <- NULL
local({
  cl <- NULL
  getMPIcluster <<- function() cl
  setMPIcluster <<- function(new) cl <<- new
})
local() might also be used to manage memory in a script, e.g., allocating large intermediate objects required to create a final object on the last line of the clause. The large intermediate objects are available for garbage collection when local returns.
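A minimal sketch of that idea (big and result are illustrative names):

```r
result <- local({
  big <- rnorm(1e6)                   # large intermediate object
  c(mean = mean(big), sd = sd(big))   # only this summary is returned
})
exists("big")   # FALSE: big lived only inside local() and can now be collected
result
```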
Using a function to create a closure is a factory pattern -- the bank account example in the Introduction To R documentation, where each time open.account is invoked, a new account is created.
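A hypothetical, simplified factory in that spirit (make_account is an illustrative name and not the manual's open.account() code): every call creates a fresh environment holding its own balance.

```r
make_account <- function(balance = 0) {
  list(
    deposit = function(x) {
      balance <<- balance + x
      invisible(balance)
    },
    withdraw = function(x) {
      if (x > balance) stop("insufficient funds")
      balance <<- balance - x
      invisible(balance)
    },
    balance = function() balance
  )
}

alice <- make_account(100)
bob   <- make_account()    # each call creates a new, independent balance
alice$deposit(50)
alice$withdraw(30)
alice$balance()   # 120
bob$balance()     # 0
```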
As @otsaw mentions, memoization might be implemented using local, e.g., to cache web sites in a crawler:
library(XML)
crawler <- local({
  seen <- new.env(parent = emptyenv())
  .do_crawl <- function(url, base, pattern) {
    if (!exists(url, seen)) {
      message(url)
      xml <- htmlTreeParse(url, useInternal = TRUE)
      hrefs <- unlist(getNodeSet(xml, "//a/@href"))
      urls <-
        sprintf("%s%s", base, grep(pattern, hrefs, value = TRUE))
      seen[[url]] <- length(urls)
      for (url in urls)
        .do_crawl(url, base, pattern)
    }
  }
  .do_report <- function(url) {
    urls <- as.list(seen)
    data.frame(Url = names(urls), Links = unlist(unname(urls)),
               stringsAsFactors = FALSE)
  }
  list(crawl = function(base, pattern = "^/.*html$") {
    .do_crawl(base, base, pattern)
  }, report = .do_report)
})
crawler$crawl(favorite_url)
dim(crawler$report())
(The usual example of memoization, Fibonacci numbers, is not satisfying: the range of numbers that don't overflow R's numeric representation is small, so one would probably use a look-up table of efficiently pre-calculated values.) Interestingly, crawler here is a singleton; it could just as easily have followed a factory pattern, with one crawler per base URL.
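For a self-contained illustration without the XML package, here is a sketch of local()-based memoization of a deliberately trivial function. slow_sqrt and n_calls are hypothetical names, and n_calls exists only to show when the cache is hit:

```r
n_calls <- 0   # instrumentation only: counts real computations

slow_sqrt <- local({
  cache <- new.env(parent = emptyenv())   # private to slow_sqrt()
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      n_calls <<- n_calls + 1
      assign(key, sqrt(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
})

slow_sqrt(16); slow_sqrt(16); slow_sqrt(25)
n_calls   # 2: the second slow_sqrt(16) came from the cache
```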

Resources