Note: This is separate from, though perhaps similar to, the deparse-substitute trick for obtaining the name of a passed argument.
Consider the following situation: I have some function to be called, and the return value
is to be assigned to some variable, say x.
Inside the function, how can I find out that the returned value is about to be assigned to the name x?
For example:
nameCapture <- function() {
# arbitrary code
captureVarName()
}
x <- nameCapture()
x
## should return some reference to the name "x"
What in R most closely approximates captureVarName() in the example?
My intuition was that there would be something in the call stack to do with assign(), where x would be an argument and could be extracted, but sys.call() yielded nothing of the sort. Does the assignment then occur internally, and if so, what is a sensible way to obtain something like captureVarName()?
My notion is that it would act in a similar manner to how the following works, though without the assign() function, using the <- operator instead:
nameCapture <- function() sys.call(1)[[2]]
assign("x", nameCapture())
x
# [1] "x"
I'm creating an S3 method for a generic defined in another package. An earlier method for the generic produces some console output that is not part of the function's return value; it is only printed to the console. I'd like to capture that output for use in my own method.
I tried using capture.output() on NextMethod(), but that just results in a bizarre error:
foo <- function(x, ...) UseMethod("foo")
foo.bar <- function(x, ...) cat(x, "\n")
foo.baz <- function(x, ...) capture.output(NextMethod())
foo(structure(1, class = "bar"))
#> 1
foo(structure(1, class = c("baz", "bar")))
#> Error: 'function' is not a function, but of type 8
Is this expected behaviour, a known limitation, or a bug? I couldn't find anything matching this error with a quick search.
How can I capture the output of the next S3 method in another S3 method?
This is... "expected behavior." I say that because I believe it's technically true, but there's probably no way for a user to expect it necessarily. If you don't care why it happens, but just want to see how to work around it, skip down to the heading "The Fix", because the following explanation of the error is a little involved.
What does 'function' is not a function, but of type 8 mean?
"Type 8" refers to SEXPTYPE number 8. From Section 1 of the R Internals manual:
What R users think of as variables or objects are symbols which are
bound to a value. The value can be thought of as either a SEXP (a
pointer), or the structure it points to, a SEXPREC...
Currently SEXPTYPEs 0:10 and 13:25 are in use....
no   SEXPTYPE     Description
...
3    CLOSXP       closures
...
8    BUILTINSXP   builtin functions
NextMethod() expects a CLOSXP, not a BUILTINSXP. We can see this if we look at the source code (around line 717) of do_nextmethod(), the C function underlying NextMethod():
SEXP attribute_hidden do_nextmethod(SEXP call, SEXP op, SEXP args, SEXP env)
{
// Some code omitted
if (TYPEOF(s) != CLOSXP){ /* R_LookupMethod looked for a function */
if (s == R_UnboundValue)
error(_("no calling generic was found: was a method called directly?"));
else
errorcall(R_NilValue,
_("'function' is not a function, but of type %d"),
TYPEOF(s));
}
So why did that happen here? This is where it gets tricky. I believe it's because by passing NextMethod() through capture.output(), it gets called using eval(), which is a built-in (see builtins()).
So how can we deal with this? Read on...
The Fix
We can simulate capture.output() with clever use of sink(), cat(), and tempfile():
foo.baz <- function(x, ...) {
# Create a temporary file to store the output
tmp <- tempfile("tmp.txt")
# start sink()
sink(tmp)
# call NextMethod() just for the purpose of capturing output
NextMethod()
# stop sink'ing
sink()
# store the output in an R object
y <- readLines(tmp)
# here we'll cat() the output to make sure it worked
cat("The output was:", y, "\n")
# destroy the temporary file
unlink(tmp)
# and call NextMethod for its actual execution
NextMethod()
}
foo(structure(1, class = c("baz", "bar")))
# The output was: 1
# 1
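If the next method can fail, the version above would leave console output diverted. A hedged variant, assuming the same foo generic and foo.bar method as in the question, uses on.exit() to undo the diversion and delete the file even on error, and returns the captured lines instead of printing them. This is only a sketch of one way to arrange the cleanup:

```r
foo <- function(x, ...) UseMethod("foo")
foo.bar <- function(x, ...) cat(x, "\n")

foo.baz <- function(x, ...) {
  tmp <- tempfile(fileext = ".txt")
  depth <- sink.number()          # diversions already active before we start
  sink(tmp)
  # Runs on normal exit and on error: undo our diversion, remove the file
  on.exit({
    while (sink.number() > depth) sink()
    unlink(tmp)
  }, add = TRUE)

  NextMethod()                    # called directly, not through another function
  sink()                          # stop diverting so later output is visible
  readLines(tmp)                  # captured console output, one element per line
}

out <- foo(structure(1, class = c("baz", "bar")))
```

Here out holds the text that foo.bar printed, so foo.baz can process it further before deciding what to show.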
I'm not sure if what you saw is documented or not: the documentation ?NextMethod makes clear that it isn't a regular function, but I didn't follow all the details to see if your usage would be allowed.
One way to do what you want would be
foo.baz <- function(x, ...) {class(x) <- class(x)[-1]; capture.output(foo(x, ...))}
This assumes that the method was called directly from a call to the generic; it won't work if there's a third level, and foo.baz was itself invoked by NextMethod().
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what is meant by the calling function. Is it the function that is invoked (as when, in an interactive session, you type the name of a function and hit Enter)? If so, how does the evaluation frame of the calling function differ from the evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below, f has the formal argument x, and the call in the last line of code supplies no actual arguments, so the default x = a is used.
That call returns 2 because a default argument is not evaluated until it is first used, and when it is used a is looked up within the function, where it has the value 2, not in the caller, where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand, actual arguments are evaluated in the caller. In the last line of code below, x is 1 because that is the value of b in the caller. Again, x is not evaluated until it is used, but even though b has been set to 2 inside the function by then, this has no effect on x: x is 1, not 2, so the call returns 1 + 2 = 3.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
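Both cases can be seen side by side in a single call. The following is a small sketch; the function name h and the value 10 are illustrative only:

```r
a <- 1

h <- function(x, y = a) {
  a <- 10            # local to h's evaluation frame
  c(x = x, y = y)    # both arguments are first used (and evaluated) here
}

# x is an actual argument: `a` is looked up in the caller, giving 1.
# y falls back to its default: `a` is looked up inside h, giving 10.
res <- h(a)
res
```

The same symbol a therefore yields two different values within one call, depending only on whether it arrived as an actual argument or as a default.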
Other
Although this covers the two cases in the quote, there is a third case: a variable that is referred to in a function but is not defined in it. In the code below, a is a free variable in g, since a is neither an argument of g nor otherwise defined in g. When gg (which equals g) is called, R first looks for a in the function itself and fails; the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears, and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
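One way to check this is to ask a function for its defining environment with environment(). The sketch below repeats the example and adds a hypothetical caller() that has its own a, to show that the caller's value is ignored:

```r
a <- 1
f <- function() {
  a <- 2
  function() a       # `a` is a free variable of the returned function
}
gg <- f()

defined_in  <- environment(gg)                 # the frame created by the call f()
value_there <- get("a", envir = defined_in)    # the `a` that gg will see

caller <- function() {
  a <- 3             # a third `a`, local to the caller
  gg()
}
from_caller <- caller()
```

Both value_there and from_caller are 2: the lookup follows where gg was defined, not where it was called from.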
I saw:
“To understand computations in R, two slogans are helpful:
• Everything that exists is an object.
• Everything that happens is a function call."
— John Chambers
But I just found:
a <- 2
is.object(a)
# FALSE
Actually, if a variable is of a pure base type, the result of is.object() is FALSE, which suggests it is not an object.
So what is the real meaning of 'Everything that exists is an object' in R?
The function is.object() seems only to check whether the object has a "class" attribute, so it does not have the same meaning as in the slogan.
For instance:
x <- 1
attributes(x) # it does not have a class attribute
# NULL
is.object(x)
# [1] FALSE
class(x) <- "my_class"
attributes(x) # now it has a class attribute
# $class
# [1] "my_class"
is.object(x)
# [1] TRUE
Now, trying to answer your real question, about the slogan, this is how I would put it. Everything that exists in R is an object in the sense that it is a kind of data structure that can be manipulated. I think this is better understood with functions and expressions, which are not usually thought of as data.
Taking a quote from Chambers (2008):
The central computation in R is a function call, defined by the
function object itself and the objects that are supplied as the
arguments. In the functional programming model, the result is defined
by another object, the value of the call. Hence the traditional motto
of the S language: everything is an object—the arguments, the value,
and in fact the function and the call itself: All of these are defined
as objects. Think of objects as collections of data of all kinds. The data contained and the way the data is organized depend on the class from which the object was generated.
Take, for example, the expression mean(rnorm(100), trim = 0.9). Until it is evaluated, it is an object very much like any other, so you can change its elements just as you would with a list. For instance:
call <- substitute(mean(rnorm(100), trim = 0.9))
call[[2]] <- substitute(rt(100,2 ))
call
mean(rt(100, 2), trim = 0.9)
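The list-like nature of a call can be explored a little further. This sketch uses quote() instead of substitute() (at top level both return the unevaluated call) and the illustrative name cl:

```r
cl <- quote(mean(rnorm(100), trim = 0.9))

cl_class  <- class(cl)                # a call object
fn_name   <- as.character(cl[[1]])    # first element: the function being called
arg_names <- names(cl)                # unnamed elements get an empty name

cl$trim <- 0.1                        # modify a named argument, as with a list
new_text <- deparse(cl)               # the call turned back into source text

set.seed(1)
val <- eval(cl)                       # only now is anything actually computed
```

Nothing is computed until eval() is called; before that, the call is just data that can be inspected and edited.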
Or take a function, like rnorm:
rnorm
function (n, mean = 0, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
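You can change its default arguments just as you would change a simple object like a list (this creates a modified copy of rnorm in your workspace; stats::rnorm itself is unchanged):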
formals(rnorm)[2] <- 100
rnorm
function (n, mean = 100, sd = 1)
.Call(C_rnorm, n, mean, sd)
<environment: namespace:stats>
Taking one more time from Chambers (2008):
The key concept is that expressions for evaluation are themselves
objects; in the traditional motto of the S language, everything is an
object. Evaluation consists of taking the object representing an
expression and returning the object that is the value of that
expression.
So going back to our call example, the call is an object which represents another object. When evaluated, it becomes that other object, which in this case is the numeric vector with one number: -0.008138572.
set.seed(1)
eval(call)
[1] -0.008138572
And that would take us to the second slogan, which you did not mention, but usually comes together with the first one: "Everything that happens is a function call".
Taking again from Chambers (2008), he actually qualifies this statement a little bit:
Nearly everything that happens in R results from a function call.
Therefore, basic programming centers on creating and refining
functions.
So what that means is that almost every transformation of data that happens in R is a function call. Even a simple thing, like a parenthesis, is a function in R.
So, taking the parenthesis as an example, you can actually redefine it to do things like this:
`(` <- function(x) x + 1
(1)
[1] 2
This is not a good idea, but it illustrates the point (rm("(") removes the override again). So I guess this is how I would sum it up: everything that exists in R is an object because it is data that can be manipulated. And (almost) everything that happens is a function call, which is an evaluation of such an object that gives you another object.
I love that quote.
In another (as of now unpublished) write-up, the author continues with
R has a uniform internal structure for representing all objects. The evaluation process keys off that structure, in a simple form that is essentially
composed of function calls, with objects as arguments and an object as the
value. Understanding the central role of objects and functions in R makes
use of the software more effective for any challenging application, even those where extending R is not the goal.
but then spends several hundred pages expanding on it. It will be a great read once finished.
Objects. For x to be an object means that it has a class; class(x) returns a class for every object. Even functions have a class, as do environments and other objects one might not expect:
class(sin)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
I would not pay too much attention to is.object. is.object(x) has a slightly different meaning than what we are using here -- it returns TRUE if x has a class name internally stored along with its value. If the class is stored then class(x) returns the stored value and if not then class(x) will compute it from the type. From a conceptual perspective it matters not how the class is stored internally (stored or computed) -- what matters is that in both cases x is still an object and still has a class.
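The difference between a stored and a computed class can be seen by looking at the "class" attribute directly. A small sketch, with illustrative variable names:

```r
x <- 1
stored_x   <- attr(x, "class")    # nothing stored
computed_x <- class(x)            # class is computed from the type
obj_x      <- is.object(x)

f <- factor(c("a", "b"))
stored_f <- attr(f, "class")      # class name stored with the value
obj_f    <- is.object(f)

g <- unclass(f)                   # drop the stored class, keep the data
obj_g   <- is.object(g)
class_g <- class(g)               # still has a class, now computed again
```

In every case class() returns something, which is the sense in which each of these values is an object; is.object() only reports whether that class is stored.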
Functions. That all computation occurs through functions refers to the fact that even things that you might not expect to be functions are actually functions. For example, when we write:
{ 1; 2 }
## [1] 2
if (pi > 0) 2 else 3
## [1] 2
1+2
## [1] 3
we are actually making invocations of the {, if and + functions:
`{`(1, 2)
## [1] 2
`if`(pi > 0, 2, 3)
## [1] 2
`+`(1, 2)
## [1] 3
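Because these are ordinary functions, they can also be passed to other functions. The following sketch shows a few such uses; the variable names are illustrative:

```r
# `[` used as the function applied to each list element: take the 2nd item
second <- sapply(list(1:3, 4:6), `[`, 2)

# `+` used as the combining function of a fold
total <- Reduce(`+`, 1:4)

# The syntactic form and the explicit call give the same value
same <- identical(`if`(TRUE, "yes", "no"), if (TRUE) "yes" else "no")

# Each of these names is bound to a function object
is_fun <- sapply(c("{", "if", "+", "(", "<-"),
                 function(nm) is.function(get(nm)))
```

This is a common idiom in practice, for example sapply(x, `[[`, "name") to pull one field out of every element of a list.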
I am learning R and came across some code as part of the practice assignment.
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
The documentation says:
The function, makeVector creates a special "vector", which is
really a list containing a function to
set the value of the vector
get the value of the vector
set the value of the mean
get the value of the mean
But I cannot understand how the function works, except that it assigns the mean value to the variable m in that particular environment.
m <- NULL begins by setting the mean to NULL as a placeholder for a future value.
set <- function(y) {x <<- y; m <<- NULL} defines a function to set the vector, x, to a new vector, y, and resets the mean, m, to NULL.
get <- function() x returns the vector, x.
setmean <- function(mean) m <<- mean sets the mean, m, to mean.
getmean <- function() m returns the mean, m.
list(set = set, get = get, setmean = setmean, getmean = getmean) returns the 'special vector' containing all of the functions just defined.
This answer is an excerpt from an article I originally wrote in 2016 as a Community Mentor in the Johns Hopkins University R Programming course: Demystifying makeVector().
Overall Design of makeVector() and cachemean()
The cachemean.R file contains two functions, makeVector() and cachemean(). The first function in the file, makeVector() creates an R object that stores a vector and its mean. The second function, cachemean() requires an argument that is returned by makeVector() in order to retrieve the mean from the cached value that is stored in the makeVector() object's environment.
What's going on in makeVector()?
The key concept to understand in makeVector() is that it builds a set of functions and returns the functions within a list to the parent environment. That is,
myVector <- makeVector(1:15)
results in an object, myVector, that contains four functions: set(), get(), setmean(), and getmean(). It also includes the two data objects, x and m.
Due to lexical scoping, myVector retains a reference to the entire environment of makeVector(), including any objects that are defined within makeVector() at design time (i.e., when it was coded). Viewing the environments as a hierarchy makes it clear what is accessible within myVector.
In that hierarchy, the global environment contains the makeVector() environment, and all other content is present in the makeVector() environment.
Since each function has its own environment in R, the hierarchy shows that the objects x and m are siblings of the four functions, get(), set(), getmean(), and setmean().
Once the function is run and an object of type makeVector() is instantiated (that is, created), the makeVector() environment referenced by myVector holds x (the vector 1:15), m (NULL), and the four functions.
Notice that the object x contains the vector 1:15, even though myVector$set() has not been executed. This is the case because the value 1:15 was passed as an argument into the makeVector() function. What explains this behavior?
When an R function returns an object that contains functions to its parent environment (as is the case with a call like myVector <- makeVector(1:15)), not only does myVector have access to the specific functions in its list, but it also retains access to the entire environment defined by makeVector(), including the original argument used to start the function.
Why is this the case? myVector contains pointers to functions that are within the makeVector() environment after the function ends, so these pointers prevent the memory consumed by makeVector() from being released by the garbage collector. Therefore, the entire makeVector() environment stays in memory, and myVector can access its functions as well as any data in that environment that is referenced in its functions.
This feature explains why x (the argument initialized on the original function call) is accessible by subsequent calls to functions on myVector such as myVector$get(), and it also explains why the code works without having to explicitly issue myVector$set() to set the value of x.
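The retained environment can be inspected directly. The sketch below repeats the makeVector() definition so it runs on its own, then uses environment() on one of the returned functions; the variable names env, contents and so on are illustrative:

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get, setmean = setmean, getmean = getmean)
}

myVector <- makeVector(1:15)

# Every function in the list was defined inside the same call frame
env      <- environment(myVector$get)
same_env <- identical(env, environment(myVector$setmean))

contents <- sort(ls(env))   # names living in the makeVector() environment
x_value  <- env$x           # the argument passed at creation time
m_value  <- env$m           # still NULL: no mean has been cached yet
```

Here contents lists the four functions together with x and m, which is the sibling relationship described above.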
makeVector() step by step
Now, let's break the behavior of the function down, step by step.
Step 1: Initialize objects
The first thing that occurs in the function is the initialization of two objects, x and m.
makeVector <- function(x = numeric()) {
m <- NULL
...
}
Notice that x is initialized as a function argument, so no further initialization is required within the function. m is set to NULL, initializing it as an object within the makeVector() environment to be used by later code in the function.
Furthermore, the formals part of the function declaration defines the default value of x as an empty numeric vector. Initialization of the vector with a default value is important because without a default value, data <- x$get() generates the following error message.
Error in x$get() : argument "x" is missing, with no default
Step 2: Define the "behaviors" or functions for objects of type makeVector()
After initializing the objects that store key information within makeVector(), the code provides four basic behaviors that are typical for data elements within an object-oriented program. They're called "getters and setters," and are more formally known as accessor and mutator methods. As one might expect, "getters" are program modules that retrieve (access) data within an object, and "setters" are program modules that set (mutate) the data values within an object.
First makeVector() defines the set() function. Most of the "magic" in makeVector() takes place in the set() function.
set <- function(y) {
x <<- y
m <<- NULL
}
set() takes an argument that is named as y. It is assumed that this value is a numeric vector, but is not stated directly in the function formals. For the purposes of the set() function, it doesn't matter whether this argument is called y, aVector or any object name other than x. Why? Since there is an x object already defined in the makeVector() environment, using the same object name would make the code more difficult to understand.
Within set() we use the <<- form of the assignment operator, which assigns the value on the right side of the operator to the object named on the left side, looking for that object in the parent (enclosing) environment instead of creating a new local variable.
When set() is executed, it does two things:
Assign the input argument to the x object in the parent environment, and
Assign the value of NULL to the m object in the parent environment. This line of code clears any value of m that had been cached by a prior execution of cachemean().
Therefore, if there is already a valid mean cached in m, whenever x is reset, the value of m cached in the memory of the object is cleared, forcing subsequent calls to cachemean() to recalculate the mean rather than retrieving the wrong value from cache.
Notice that the two lines of code in set() do exactly the same thing as the first two lines in the main function: set the value of x, and NULL the value of m.
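The difference between <- and <<- inside a nested function can be seen in a smaller setting. This is a sketch with hypothetical names (make_counter, local_only, increment); it is not part of the assignment code:

```r
make_counter <- function() {
  count <- 0
  list(
    # `<-` creates a new local `count`; the enclosing one is untouched
    local_only = function() { count <- count + 1; count },
    # `<<-` updates the `count` in the enclosing make_counter() environment
    increment  = function() { count <<- count + 1; count },
    get        = function() count
  )
}

counter <- make_counter()

a1 <- counter$local_only()   # returns 1, but changes nothing outside itself
g1 <- counter$get()          # enclosing count is still 0

counter$increment()
counter$increment()
g2 <- counter$get()          # enclosing count is now 2
```

set() and setmean() rely on the second behavior: without <<-, their assignments would vanish as soon as the function returned.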
Second, makeVector() defines the getter for the vector x.
get <- function() x
Again, this function takes advantage of the lexical scoping features in R. Since the symbol x is not defined within get(), R retrieves it from the parent environment of get(), that is, the makeVector() environment.
Third, makeVector() defines the setter for the mean m.
setmean <- function(mean) m <<- mean
Since m is defined in the parent environment and we need to access it after setmean() completes, the code uses the <<- form of the assignment operator to assign the input argument to the value of m in the parent environment.
Finally, makeVector() defines the getter for the mean m. Just like the getter for x, R takes advantage of lexical scoping to find the correct symbol m to retrieve its value.
getmean <- function() m
At this point we have getters and setters defined for both of the data objects within our makeVector() object.
Step 3: Create a new object by returning a list()
Here is the other part of the "magic" in the operations of the makeVector() function. The last section of code assigns each of these functions as an element within a list(), and returns it to the parent environment.
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
When the function ends, it returns a fully formed object of type makeVector() to be used by downstream R code. One other important subtlety about this code is that each element in the list is named. That is, each element in the list is created with an elementName = value syntax, as follows:
list(set = set, # gives the name 'set' to the set() function defined above
get = get, # gives the name 'get' to the get() function defined above
setmean = setmean, # gives the name 'setmean' to the setmean() function defined above
getmean = getmean) # gives the name 'getmean' to the getmean() function defined above
Naming the list elements is what allows us to use the $ form of the extract operator to access the functions by name rather than using the [[ form of the extract operator, as in myVector[[2]](), to get the contents of the vector.
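A small sketch of the same point, using a hypothetical list of functions called ops rather than makeVector() itself:

```r
# A named list of functions
ops <- list(double = function(x) 2 * x,
            square = function(x) x^2)

by_name   <- ops$square(3)        # extract by name with $
by_string <- ops[["square"]](3)   # extract by name with [[
by_pos    <- ops[[2]](3)          # extract by position: works, but is opaque

# The same functions without names: only positional access is possible
unnamed <- list(function(x) 2 * x, function(x) x^2)
missing_name <- is.null(unnamed$square)   # $ finds nothing to extract
```

All three extractions return the same function; the named forms simply make the calling code readable.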
Here it's important to note that the cachemean() function REQUIRES an input argument of type makeVector(). If one passes a regular vector to the function, as in
aResult <- cachemean(1:15)
the function call will fail with an error explaining that cachemean() was unable to access $getmean() on the input argument because $ does not work with atomic vectors. This is accurate, because an atomic vector is not a list, nor does it contain a $getmean() function, as illustrated below.
> aVector <- 1:10
> cachemean(aVector)
Error in x$getmean : $ operator is invalid for atomic vectors
Explaining cachemean()
Without cachemean(), the makeVector() function is incomplete. Why? As designed, cachemean() is required to populate or retrieve the mean from an object of type makeVector().
cachemean <- function(x, ...) {
...
Like makeVector(), cachemean() starts with a single argument, x, and an ellipsis that allows the caller to pass additional arguments into the function.
Next, the function attempts to retrieve a mean from the object passed in as the argument. First, it calls the getmean() function on the input object.
m <- x$getmean()
Then it checks to see whether the result is NULL. Since makeVector() sets the cached mean to NULL whenever a new vector is set into the object, if the value here is not equal to NULL, we have a valid, cached mean and can return it to the parent environment.
if(!is.null(m)) {
message("getting cached data")
return(m)
}
If the result of !is.null(m) is FALSE, cachemean() gets the vector from the input object, calculates its mean(), uses the setmean() function on the input object to cache the mean there, and then returns the value of the mean to the parent environment by evaluating m as the last expression in the function.
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
Note that cachemean() is the only place where the mean() function is executed, which is why makeVector() is incomplete without cachemean().
Putting the Pieces Together: How the functions work at runtime
Now that we've explained the design of each of these functions, here is an illustration of how they work when used in an R script.
aVector <- makeVector(1:10)
aVector$get() # retrieve the value of x
aVector$getmean() # retrieve the value of m, which should be NULL
aVector$set(30:50) # reset value with a new vector
cachemean(aVector) # notice mean calculated is mean of 30:50, not 1:10
aVector$getmean() # retrieve it directly, now that it has been cached
Conclusion: what makes cachemean() work?
To summarize, the lexical scoping assignment in R Programming takes advantage of lexical scoping and of the fact that a function returning a list() of functions also allows access to any other objects defined in the environment of the original function. In the specific instance of makeVector() this means that subsequent code can access the values of x or m through the use of getters and setters. This is how cachemean() is able to calculate and store the mean for the input argument if it is of type makeVector(). Because list elements in makeVector() are defined with names, we can access these functions with the $ form of the extract operator.
For additional commentary that explains how the assignment uses features of the S3 object system, please review makeCacheMatrix() as an Object.
Appendix A: What's the Point of this Assignment?
Once students get through the assignment, they frequently ask questions about its value and purpose. A good article explaining the value of lexical scoping in statistical computing is Lexical Scoping and Statistical Computing, written by Robert Gentleman and Ross Ihaka at the University of Auckland.
Appendix B: cachemean.R
Here is the entire listing for cachemean.R.
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
Appendix C: Frequently Asked Questions
Q: Why doesn't cachemean() return the cached value? My code looks like:
cachemean(makeVector(1:100))
cachemean(makeVector(1:100))
A: Code written this way creates two different objects of type makeVector(), so the two calls to cachemean() initialize the means of each instance, rather than caching and retrieving from a single instance. Another way of illustrating how the above code operates is as follows.
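A hedged sketch of that equivalence, followed by the single-instance pattern. The definitions from cachemean.R are repeated so the block runs on its own; the names v1, v2 and shared are illustrative and not part of the original article:

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get, setmean = setmean, getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

# What the two-line version really does: two unrelated objects,
# each with its own empty cache, so each call computes the mean.
v1 <- makeVector(1:100)
v2 <- makeVector(1:100)
r1 <- cachemean(v1)
r2 <- cachemean(v2)

# Keeping one object and reusing it lets the cache do its job.
shared <- makeVector(1:100)
first  <- cachemean(shared)   # computes the mean and stores it
second <- cachemean(shared)   # emits "getting cached data"
```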
Notice how the first call to cachemean() sets the cache, and the second call retrieves data from it.
Q: Why is set() never used in the code?
A: set() is included so that once an object of type makeVector() is created, its value can be changed without initializing another instance of the object. It is unnecessary the first time an object of type makeVector() is instantiated. Why? First, the value of x is set as a function argument, as in makeVector(1:30). Then, the first line of code in the function sets m <- NULL, simultaneously allocating memory for m and setting it to NULL. When a reference to this object is passed to the parent environment when the function ends, both x and m are available to be accessed by their respective get and set functions.
The following code illustrates the use of set().
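One possible illustration, assuming the cachemean.R definitions (repeated here so the block is self-contained); the variable names are illustrative:

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get, setmean = setmean, getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

v  <- makeVector(1:30)        # x is set through the argument; set() not needed
m1 <- cachemean(v)            # computes and caches the mean of 1:30

v$set(1:10)                   # replace the data in the same object
cleared <- is.null(v$getmean())   # set() also reset the cached mean

m2 <- cachemean(v)            # recomputed for the new data
```

Because set() clears m, the second cachemean() call recomputes the mean rather than returning the stale value for 1:30.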
Q: Why is x set with a default value in makeVector()?
A: Since x is an argument, the only place where one can set a default for it is in the formals. The type of error returned by cachemean() when a default value is not set,
Error in x$get() : argument "x" is missing, with no default
is undesirable. Our code should directly handle error conditions rather than relying on the underlying error handling in R.
It's perfectly valid to create an object of type makeVector() without populating its value during initialization. makeVector() includes a setter function so one can set its value after the object is created. However, the object must have valid data, a numeric vector, prior to executing cachemean().
Ideally, cachemean() would include logic to validate that x is not empty prior to calculating a mean. The default setting of x enables cachemean() to return NaN, which is a reasonable result.
References
Chi, Yau -- R-Tutor Named List Members, retrieved July 20, 2016.
Wickham, Hadley -- Advanced-R Functions, retrieved July 17, 2016.
Wickham, Hadley -- Advanced-R Scoping Issues, retrieved July 17, 2016.
I think a good way to understand this example is to try the following.
First, check that calling makeVector() gives you a list of four functions, and try each of them:
> mvec <- makeVector()
> x <- 1:4
> mvec$set(x)
> mvec$get()
[1] 1 2 3 4
> mvec$getmean()
NULL
> mvec$setmean(3.4)
> mvec$getmean()
[1] 3.4
3.4 is not the correct mean; I used it so you can check that setmean() stores whatever number you give it.
The second part of the assignment is the following:
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
This part of the code checks whether the mean of the vector of interest has already been stored. If it has, you don't need to calculate it again and can use the cached value.

Because I stored a wrong number for the mean, you can see that cachemean() returns the value I set earlier:
> cachemean(mvec)
getting cached data
[1] 3.4
You must pass the original mvec list used in the example.