Caching the mean of a Vector in R - r

I am learning R and came across some code as part of the practice assignment.
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
The documentation says:
The function, makeVector creates a special "vector", which is
really a list containing a function to
set the value of the vector
get the value of the vector
set the value of the mean
get the value of the mean
But I cannot understand how the function works, except that it assigns the mean value to the variable m in that particular environment.

m <- NULL begins by setting the mean to NULL as a placeholder for a future value
set <- function(y) {x <<- y; m <<- NULL} defines a function to set the vector, x, to a new vector, y, and resets the mean, m, to NULL
get <- function() x returns the vector, x
setmean <- function(mean) m <<- mean sets the mean, m, to mean
getmean <- function() m returns the mean, m
list(set = set, get = get, setmean = setmean, getmean = getmean) returns the 'special vector' containing all of the functions just defined

This answer is an excerpt from an article I originally wrote in 2016 as a Community Mentor in the Johns Hopkins University R Programming course: Demystifying makeVector().
Overall Design of makeVector() and cachemean()
The cachemean.R file contains two functions, makeVector() and cachemean(). The first function in the file, makeVector() creates an R object that stores a vector and its mean. The second function, cachemean() requires an argument that is returned by makeVector() in order to retrieve the mean from the cached value that is stored in the makeVector() object's environment.
What's going on in makeVector()?
The key concept to understand in makeVector() is that it builds a set of functions and returns the functions within a list to the parent environment. That is,
myVector <- makeVector(1:15)
results in an object, myVector, that contains four functions: set(), get(), setmean(), and getmean(). It also includes the two data objects, x and m.
Due to lexical scoping, the functions in myVector keep a reference to the entire environment created by the call to makeVector(), including any objects that are defined within makeVector() at design time (i.e., when it was coded).
Viewed as a hierarchy, the global environment contains the makeVector() environment. All other content (x, m, and the four functions) is present in the makeVector() environment.
Since each function has its own environment in R, the hierarchy illustrates that the objects x and m are siblings of the four functions, get(), set(), getmean(), and setmean().
Once the function is run and an object of type makeVector() is instantiated (that is, created), the environment behind myVector holds x, m, and the four functions.
Notice that the object x contains the vector 1:15, even though myVector$set() has not been executed. This is the case because the value 1:15 was passed as an argument into the makeVector() function. What explains this behavior?
When an R function returns an object that contains functions to its parent environment (as is the case with a call like myVector <- makeVector(1:15)), not only does myVector have access to the specific functions in its list, but it also retains access to the entire environment defined by makeVector(), including the original argument used to start the function.
Why is this the case? myVector contains pointers to functions that are within the makeVector() environment after the function ends, so these pointers prevent the memory consumed by makeVector() from being released by the garbage collector. Therefore, the entire makeVector() environment stays in memory, and myVector can access its functions as well as any data in that environment that is referenced in its functions.
This feature explains why x (the argument initialized on the original function call) is accessible by subsequent calls to functions on myVector such as myVector$get(), and it also explains why the code works without having to explicitly issue myVector$set() to set the value of x.
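One way to check this for yourself, as a minimal sketch: environment() returns the environment in which a function was defined, so calling it on any element of the list exposes the retained x and m. The object names myVector and env are only illustrative.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

myVector <- makeVector(1:15)

# All four functions share the one environment created by this call
env <- environment(myVector$get)
identical(env, environment(myVector$set))   # TRUE

# That environment holds the argument x, the cache m, and the four functions
ls(env)
get("x", envir = env)            # the vector 1:15, although set() was never called
is.null(get("m", envir = env))   # TRUE: nothing has been cached yet
```

A second call to makeVector() would create a separate environment with its own x and m.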
makeVector() step by step
Now, let's break the behavior of the function down, step by step.
Step 1: Initialize objects
The first thing that occurs in the function is the initialization of two objects, x and m.
makeVector <- function(x = numeric()) {
m <- NULL
...
}
Notice that x is initialized as a function argument, so no further initialization is required within the function. m is set to NULL, initializing it as an object within the makeVector() environment to be used by later code in the function.
Furthermore, the formals part of the function declaration defines the default value of x as an empty numeric vector. Initialization of the vector with a default value is important because without a default value, data <- x$get() generates the following error message.
Error in x$get() : argument "x" is missing, with no default
Step 2: Define the "behaviors" or functions for objects of type makeVector()
After initializing the objects that store key information within makeVector(), the code provides four basic behaviors that are typical for data elements within an object-oriented program. They're called "getters and setters," and are more formally known as accessor and mutator methods. As one might expect, "getters" are program modules that retrieve (access) data within an object, and "setters" are program modules that set (mutate) the data values within an object.
First makeVector() defines the set() function. Most of the "magic" in makeVector() takes place in the set() function.
set <- function(y) {
x <<- y
m <<- NULL
}
set() takes an argument that is named as y. It is assumed that this value is a numeric vector, but is not stated directly in the function formals. For the purposes of the set() function, it doesn't matter whether this argument is called y, aVector or any object name other than x. Why? Since there is an x object already defined in the makeVector() environment, using the same object name would make the code more difficult to understand.
Within set() we use the <<- form of the assignment operator, which assigns the value on the right side of the operator to an object in the parent environment named by the object on the left side of the operator.
When set() is executed, it does two things:
Assign the input argument to the x object in the parent environment, and
Assign the value of NULL to the m object in the parent environment. This line of code clears any value of m that had been cached by a prior execution of cachemean().
Therefore, if there is already a valid mean cached in m, whenever x is reset, the value of m cached in the memory of the object is cleared, forcing subsequent calls to cachemean() to recalculate the mean rather than retrieving the wrong value from cache.
Notice that the two lines of code in set() do exactly the same thing as the first two lines in the main function: set the value of x, and NULL the value of m.
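To see why <<- matters here, consider a minimal sketch of a hypothetical variant, makeVectorLocal(), that uses <- inside set() instead. This variant is not part of the assignment; it only shows what goes wrong without <<-.

```r
makeVectorLocal <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <- y      # creates a new x local to set(); it is discarded on return
    m <- NULL   # likewise a local m, not the cached one
    invisible(NULL)
  }
  get <- function() x
  list(set = set, get = get)
}

v <- makeVectorLocal(1:3)
v$set(10:12)
v$get()   # still 1 2 3: the x in the enclosing environment was never touched
```

With <<-, the same call to set() would replace the x that get() reads.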
Second, makeVector() defines the getter for the vector x.
get <- function() x
Again, this function takes advantage of the lexical scoping features in R. Since the symbol x is not defined within get(), R retrieves it from the parent environment of get(), which is the makeVector() environment.
Third, makeVector() defines the setter for the mean m.
setmean <- function(mean) m <<- mean
Since m is defined in the parent environment and we need to access it after setmean() completes, the code uses the <<- form of the assignment operator to assign the input argument to the value of m in the parent environment.
Finally, makeVector() defines the getter for the mean m. Just like the getter for x, R takes advantage of lexical scoping to find the correct symbol m to retrieve its value.
getmean <- function() m
At this point we have getters and setters defined for both of the data objects within our makeVector() object.
Step 3: Create a new object by returning a list()
Here is the other part of the "magic" in the operations of the makeVector() function. The last section of code assigns each of these functions as an element within a list(), and returns it to the parent environment.
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
When the function ends, it returns a fully formed object of type makeVector() to be used by downstream R code. One other important subtlety about this code is that each element in the list is named. That is, each element in the list is created with an elementName = value syntax, as follows:
list(set = set, # gives the name 'set' to the set() function defined above
get = get, # gives the name 'get' to the get() function defined above
setmean = setmean, # gives the name 'setmean' to the setmean() function defined above
getmean = getmean) # gives the name 'getmean' to the getmean() function defined above
Naming the list elements is what allows us to use the $ form of the extract operator to access the functions by name rather than using the [[ form of the extract operator, as in myVector[[2]](), to get the contents of the vector.
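A short sketch of the difference between the two forms of the extract operator, using the makeVector() listing from this answer (the object name myVector is just an example):

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

myVector <- makeVector(1:15)

names(myVector)        # "set" "get" "setmean" "getmean"

byName     <- myVector$get()        # access by name with $
byPosition <- myVector[[2]]()       # same function, found by position
byString   <- myVector[["get"]]()   # [[ also accepts the name as a string

identical(byName, byPosition)       # TRUE
```

Access by position breaks silently if the order of the list elements ever changes, which is one reason to prefer the named form.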
Here it's important to note that the cachemean() function REQUIRES an input argument of type makeVector(). If one passes a regular vector to the function, as in
aResult <- cachemean(1:15)
the function call will fail with an error explaining that cachemean() was unable to access $getmean() on the input argument because $ does not work with atomic vectors. This is accurate, because a primitive vector is not a list, nor does it contain a $getmean() function, as illustrated below.
> aVector <- 1:10
> cachemean(aVector)
Error in x$getmean : $ operator is invalid for atomic vectors
Explaining cachemean()
Without cachemean(), the makeVector() function is incomplete. Why? As designed, cachemean() is required to populate or retrieve the mean from an object of type makeVector().
cachemean <- function(x, ...) {
...
Like makeVector(), cachemean() starts with a single argument, x, and an ellipsis that allows the caller to pass additional arguments into the function.
Next, the function attempts to retrieve a mean from the object passed in as the argument. First, it calls the getmean() function on the input object.
m <- x$getmean()
Then it checks to see whether the result is NULL. Since makeVector() sets the cached mean to NULL whenever a new vector is set into the object, if the value here is not equal to NULL, we have a valid, cached mean and can return it to the parent environment
if(!is.null(m)) {
message("getting cached data")
return(m)
}
If the result of !is.null(m) is FALSE, cachemean() gets the vector from the input object, calculates a mean(), uses the setmean() function on the input object to set the mean in the input object, and then returns the value of the mean to the parent environment by evaluating m as the last expression in the function.
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
Note that cachemean() is the only place where the mean() function is executed, which is why makeVector() is incomplete without cachemean().
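As a small sketch of what the ellipsis is for: extra arguments such as na.rm are forwarded to mean() on the first call. The example vector is made up for illustration. Note that, as the code is written, a cache hit returns the stored value regardless of which extra arguments are passed later.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

v <- makeVector(c(1, NA, 3))
first  <- cachemean(v, na.rm = TRUE)  # na.rm is forwarded to mean(): result is 2
second <- cachemean(v)                # cache hit: still 2, mean() is not called again
```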
Putting the Pieces Together: How the functions work at runtime
Now that we've explained the design of each of these functions, here is an illustration of how they work when used in an R script.
aVector <- makeVector(1:10)
aVector$get() # retrieve the value of x
aVector$getmean() # retrieve the value of m, which should be NULL
aVector$set(30:50) # reset value with a new vector
cachemean(aVector) # notice mean calculated is mean of 30:50, not 1:10
aVector$getmean() # retrieve it directly, now that it has been cached
Conclusion: what makes cachemean() work?
To summarize, this assignment in R Programming takes advantage of lexical scoping and of the fact that functions returned inside a list() keep access to any other objects defined in the environment of the function that created them. In the specific instance of makeVector() this means that subsequent code can access the values of x or m through the use of getters and setters. This is how cachemean() is able to calculate and store the mean for the input argument if it is of type makeVector(). Because list elements in makeVector() are defined with names, we can access these functions with the $ form of the extract operator.
For additional commentary that explains how the assignment uses features of the S3 object system, please review makeCacheMatrix() as an Object.
Appendix A: What's the Point of this Assignment?
Once students get through the assignment, they frequently ask questions about its value and purpose. A good article explaining the value of lexical scoping in statistical computing is Lexical Scoping and Statistical Computing, written by Robert Gentleman and Ross Ihaka at the University of Auckland.
Appendix B: cachemean.R
Here is the entire listing for cachemean.R.
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
Appendix C: Frequently Asked Questions
Q: Why doesn't cachemean() return the cached value? My code looks like:
cachemean(makeVector(1:100))
cachemean(makeVector(1:100))
A: Code written this way creates two different objects of type makeVector(), so the two calls to cachemean() initialize the means of each instance, rather than caching and retrieving from a single instance. Another way of illustrating how the above code operates is as follows.
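A minimal sketch of the contrast, with assumed object names (first, second and v are not from the assignment): two separate makeVector() objects each start with an empty cache, while a single object that is reused keeps its cached mean.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

# Two separate objects: each call computes the mean from scratch
first  <- makeVector(1:100)
second <- makeVector(1:100)
cachemean(first)     # 50.5, computed
cachemean(second)    # 50.5, computed again; no message

# One object, used twice
v <- makeVector(1:100)
r1 <- cachemean(v)   # 50.5, computed and stored in v
r2 <- cachemean(v)   # "getting cached data", then 50.5
```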
Notice how the first call to cachemean() sets the cache, and the second call retrieves data from it.
Q: Why is set() never used in the code?
A: set() is included so that once an object of type makeVector() is created, its value can be changed without initializing another instance of the object. It is unnecessary the first time an object of type makeVector() is instantiated. Why? First, the value of x is set as a function argument, as in makeVector(1:30). Then, the first line of code in the function sets m <- NULL, simultaneously allocating memory for m and setting it to NULL. When a reference to this object is passed to the parent environment when the function ends, both x and m are available to be accessed by their respective get and set functions.
The following code illustrates the use of set().
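As a sketch, with v as an assumed object name: set() replaces the stored vector and clears the cached mean, so the next call to cachemean() recalculates.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

v <- makeVector(1:10)
before <- cachemean(v)   # 5.5, now cached in v

v$set(30:50)             # replace the data in the same object
cleared <- v$getmean()   # NULL: set() reset the cache

after <- cachemean(v)    # 40, recalculated from the new vector
```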
Q: Why is x set with a default value in makeVector()?
A: Since x is an argument, the only place where one can set a default for it is in the formals. The type of error returned by cachemean() when a default value is not set,
Error in x$get() : argument "x" is missing, with no default
is undesirable. Our code should directly handle error conditions rather than relying on the underlying error handling in R.
It's perfectly valid to create an object of type makeVector() without populating its value during initialization. makeVector() includes a setter function so one can set its value after the object is created. However, the object must have valid data, a numeric vector, prior to executing cachemean().
Ideally, cachemean() would include logic to validate that x is not empty prior to calculating a mean. The default setting of x enables cachemean() to return NaN, which is a reasonable result.
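A sketch of both points. cachemeanChecked() is a hypothetical wrapper written for this illustration, not part of the assignment; it shows one way such validation could look.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get,
       setmean = setmean,
       getmean = getmean)
}

cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)
  m
}

# With the default, an empty object yields NaN instead of an error
emptyResult <- cachemean(makeVector())   # NaN

# Hypothetical wrapper: refuse to compute a mean for an empty vector
cachemeanChecked <- function(x, ...) {
  if (length(x$get()) == 0) {
    stop("the makeVector object holds no data; call set() first")
  }
  cachemean(x, ...)
}

checked <- tryCatch(cachemeanChecked(makeVector()),
                    error = function(e) conditionMessage(e))
checked   # the error message from stop()
```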
References
Chi, Yau -- R-Tutor Named List Members, retrieved July 20, 2016.
Wickham, Hadley -- Advanced-R Functions, retrieved July 17, 2016.
Wickham, Hadley -- Advanced-R Scoping Issues, retrieved July 17, 2016.

I think one good way to understand this example is to try the following.
First, check that the list returned by makeVector gives you four different functions:
> mvec <- makeVector()
> x <- 1:4
> mvec$set(x)
> mvec$get()
[1] 1 2 3 4
> mvec$getmean()
NULL
> mvec$setmean(3.4)
> mvec$getmean()
[1] 3.4
3.4 is not the correct mean. I used that number so you can see that setmean() stores whatever number you give it.
The second part of the assignment is the following:
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
This part of the code checks whether the mean of the vector of interest has already been stored. If it has, you don't need to calculate it again and can use the cached value.

Since I stored a wrong number for the mean above, you can see that cachemean() returns the value that was already set:
> cachemean(mvec)
getting cached data
[1] 3.4
You must pass the original mvec list used in the example.

Related

How can I capture the name of a variable still to be assigned in R?

Note: This is separate from, though perhaps similar to, the
deparse-substitute trick
of attaining the name of a passed argument.
Consider the following situation: I have some function to be called, and the return value
is to be assigned to some variable, say x.
Inside the function, how can I capture that the name to be assigned to the
returned value is x, upon calling and assigning the function?
For example:
nameCapture <- function() {
# arbitrary code
captureVarName()
}
x <- nameCapture()
x
## should return some reference to the name "x"
What in R closest approximates captureVarName() referenced in the example?
My intuition was that there would be something in the call stack to do with
assign(), where x would be an argument and could be extracted, but
sys.call() yielded nothing of the sort; does it then occur internally, and if
so, what is a sensible way to attain something like captureVarName()?
My notion is that it would act in a similar manner to how the following works, though without the assign() function, using the <- operator instead:
nameCapture <- function() sys.call(1)[[2]]
assign("x", nameCapture())
x
# [1] "x"

R: Scoping, timing, and <<-

In the code below, I expect both f and the final a to return 3. But in fact they both return 2. Why is this? Hasn't 3 replaced 2 in the enclosing environment at the time the promise is evaluated?
a <- 1
f <- function(a){
a <<- 3
cat(a)
}
f(a <- 2)
a
Note that if I use an = instead of a <- in the call to f, the final a is 3 as expected, but f remains 2.
Let's walk through the code
a <- 1
assigns the value 1 to the name a in the global environment.
f <- function(a){...}
creates a function saved to the name f in the global environment.
f(a <- 2)
Now we are calling the function f with the expression a<-2 as a parameter. This expression is not evaluated immediately. It is passed as a promise. The global value of a remains 1.
Now we enter the body of the function f. The expression we've passed in is assigned to the local variable a in the function scope (still un-evaluated) and a in the global remains 1. The fact that they both involve the symbol a is irrelevant. There is no direct connection between the two a variables here.
a <<- 3
This assigns the value 3 to a in a parent scope via <<-, rather than in the local scope as <- would do. This means that the a referred to here is not the local a that now holds the parameter passed to the function. So this changes the value of a in the global scope to 3. And finally
cat(a)
Now we are finally using the value that was passed to the function since the a here refers to the a in the local function scope. This triggers the promise a <- 2 to be run in the calling scope (which happens to be the global scope). Thus the global value of a is set to 2. This assignment expression returns the right-hand-side value so "2" is displayed from cat().
The function exits and
a
shows the value of the a in the global environment, which is now 2. It held the value 3 only in the brief moment between the two expressions in f.
If you were to call
f( a=2 )
This is very different. Now we are not passing an expression to the function anymore, we are passing the value 2 to the named function parameter a. If you tried f(x=2) you would get an error that the function doesn't recognize the parameter named "x". There is no fancy lazy expression/promise evaluation in this scenario since 2 is a constant. This would leave the global value set to 3 after the function call. f(a <- 2) and f(a = a <- 2) would behave the same way.
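To make the timing visible, here is a minimal sketch of a hypothetical variant g() that calls force() on the argument before the <<- assignment. force() is a base R function that evaluates a promise immediately; g is only an illustrative name.

```r
a <- 1

g <- function(a) {
  force(a)   # evaluate the promise now: runs a <- 2 in the calling environment
  a <<- 3    # afterwards, overwrite the a in the enclosing environment
  a          # the local a still holds the value of the promise
}

res <- g(a <- 2)
res   # 2: the local value from the promise
a     # 3: the <<- assignment happened after the promise ran, so it wins
```

Compared with f, only the order of the two assignments to the outer a has changed, which is why the final a differs.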

R: what does the small function do when it is not called from anywhere?

Here is the setup.
Below are two functions that are used to create a special object that stores a numeric vector and caches its mean.
The first function, makeVector creates a special "vector", which is really a list containing a function to
set the value of the vector
get the value of the vector
set the value of the mean
get the value of the mean
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
The following function calculates the mean of the special "vector" created with the above function. However, it first checks to see if the mean has already been calculated. If so, it gets the mean from the cache and skips the computation. Otherwise, it calculates the mean of the data and sets the value of the mean in the cache via the setmean function.
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
Ok, I was thinking about these 2 functions for 2 days. I tried to understand how they work because I will need to write a similar function for a different purpose.
After all this thinking I could not understand one thing - why the heck we need set() function in makeVector and what the heck does it do?
Below is the description of what I managed to understand and how I understand the logic (taken from my comment on Coursera):
makeVector works without set() function?
set() is never called in cachemean. And no any function in makeVector is calling set() as well. What is really happening as I understand it:
cachemean calls getmean() function and assigns it to variable m. m here is local variable in cachemean scope
What getmean() does it returns m which is already from makeVector scope! In makeVector scope m is NULL at the moment of execution
cachemean checks if m is NULL or not. At the first call it is NULL so we skip if condition
Now we create a local variable data and assign it to a call of get() function. What get() does it returns the value of x which is in fact formal parameter of makeVector. In other words it returns a vector which we initially passed to makeVector function as an argument. After this we have that data is our numeric vector
Next step is rather simple. We reassign cachemean scope m to mean() of data. That is, we find the mean of the numeric vector we passed as an argument to makeVector.
Now comes the step which allows us to cache result! we call setmean() function and pass mean of our numeric vector - m - to it as an argument. If you look at makeVector you will see that what setmean() does is just assigns the argument passed to it to m BUT in makeVector scope!.
Finally cachemean returns m.
The interesting things start to happen when we call cachemean on the same vector once again.
The first line of cachemean is the key line here. We assign getmean() to m in cachemean scope. But what getmean() does in this case? If you look at the makeVector, getmean() just returns the value of m BUT from makeVector scope!. And the value of m in that scope is our mean of numeric vector.
After that it is simple. We check if m is NULL, it is obviously not. So we print a message and return m which is simply the mean we have found during the first call of cachemean.
Now comes the question which confuses me a lot. Why the heck we need to define set() function?
We do not call it from cachemean.
Moreover if you delete set() from makeVector, it still works and cachemean still works and returns cached value of mean of a numeric vector.
Moreover-moreover if I look closer to the set() function, this function does not really make sense. It assigns formal parameter y to parent environment variable x. But why we need to do this? If to get an initial numeric vector, then we do it by get().
And then it assigns NULL to m which also does not make sense. Just a line before we have done the same thing.
So can please someone explain me:
Do I understand the logic correct?
What set() function is doing and why do we need it at all?
Thanks a lot in advance!
P.S. I am new to R and I am stupid, but I want to understand, what I am missing here.
EDIT: Really sorry for long read. But everyone on this site is asking what I have already done/tried, so I tried to describe it :-)

Use of the <<- operator in R [duplicate]

I just finished reading about scoping in the R intro, and am very curious about the <<- assignment.
The manual showed one (very interesting) example for <<-, which I feel I understood. What I am still missing is the context of when this can be useful.
So what I would love to read from you are examples (or links to examples) on when the use of <<- can be interesting/useful. What might be the dangers of using it (it looks easy to lose track of), and any tips you might feel like sharing.
<<- is most useful in conjunction with closures to maintain state. Here's a section from a recent paper of mine:
A closure is a function written by another function. Closures are
so-called because they enclose the environment of the parent
function, and can access all variables and parameters in that
function. This is useful because it allows us to have two levels of
parameters. One level of parameters (the parent) controls how the
function works. The other level (the child) does the work. The
following example shows how we can use this idea to generate a family of
power functions. The parent function (power) creates child functions
(square and cube) that actually do the hard work.
power <- function(exponent) {
function(x) x ^ exponent
}
square <- power(2)
square(2) # -> [1] 4
square(4) # -> [1] 16
cube <- power(3)
cube(2) # -> [1] 8
cube(4) # -> [1] 64
The ability to manage variables at two levels also makes it possible to maintain the state across function invocations by allowing a function to modify variables in the environment of its parent. The key to managing variables at different levels is the double arrow assignment operator <<-. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
This makes it possible to maintain a counter that records how many times a function has been called, as the following example shows. Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
new_counter <- function() {
i <- 0
function() {
# do something useful, then ...
i <<- i + 1
i
}
}
The new function is a closure, and its environment is the enclosing environment. When the closures counter_one and counter_two are run, each one modifies the counter in its enclosing environment and then returns the current count.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one() # -> [1] 1
counter_one() # -> [1] 2
counter_two() # -> [1] 1
It helps to think of <<- as equivalent to assign (if you set the inherits parameter in that function to TRUE). The benefit of assign is that it allows you to specify more parameters (e.g. the environment), so I prefer to use assign over <<- in most cases.
Using <<- and assign(x, value, inherits=TRUE) means that "enclosing environments of the supplied environment are searched until the variable 'x' is encountered." In other words, it will keep going through the environments in order until it finds a variable with that name, and it will assign it to that. This can be within the scope of a function, or in the global environment.
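A minimal sketch of the two forms side by side, inside a closure so that nothing is written to the global environment. The names make_total, add_arrow and add_assign are made up for this illustration.

```r
make_total <- function() {
  total <- 0

  # Update the enclosing total with the double arrow
  add_arrow <- function(v) {
    total <<- total + v
    invisible(total)
  }

  # Same update written with assign() and inherits = TRUE
  add_assign <- function(v) {
    assign("total", total + v, inherits = TRUE)
    invisible(total)
  }

  list(add_arrow = add_arrow,
       add_assign = add_assign,
       value = function() total)
}

acc <- make_total()
acc$add_arrow(5)
acc$add_assign(10)
acc$value()   # 15: both calls updated the same enclosing variable
```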
In order to understand what these functions do, you need to also understand R environments (e.g. using search).
I regularly use these functions when I'm running a large simulation and I want to save intermediate results. This allows you to create the object outside the scope of the given function or apply loop. That's very helpful, especially if you have any concern about a large loop ending unexpectedly (e.g. a database disconnection), in which case you could lose everything in the process. This would be equivalent to writing your results out to a database or file during a long running process, except that it's storing the results within the R environment instead.
My primary warning with this: be careful because you're now working with global variables, especially when using <<-. That means that you can end up with situations where a function is using an object value from the environment, when you expected it to be using one that was supplied as a parameter. This is one of the main things that functional programming tries to avoid (see side effects). I avoid this problem by assigning my values to unique variable names (using paste with a set of unique parameters) that are never used within the function, but just used for caching and in case I need to recover later on (or do some meta-analysis on the intermediate results).
One place where I used <<- was in simple GUIs using tcl/tk. Some of the initial examples have it -- as you need to make a distinction between local and global variables for statefulness. See for example
library(tcltk)
demo(tkdensity)
which uses <<-. Otherwise I concur with Marek :) -- a Google search can help.
On this subject I'd like to point out that the <<- operator will behave strangely when applied (incorrectly) within a for loop (there may be other cases too). Given the following code:
fortest <- function() {
mySum <- 0
for (i in c(1, 2, 3)) {
mySum <<- mySum + i
}
mySum
}
you might expect that the function would return the expected sum, 6, but instead it returns 0, with a global variable mySum being created and assigned the value 3. I can't fully explain what is going on here but certainly the body of a for loop is not a new scope 'level'. Instead, it seems that R looks outside of the fortest function, can't find a mySum variable to assign to, so creates one and assigns the value 1, the first time through the loop. On subsequent iterations, the RHS in the assignment must be referring to the (unchanged) inner mySum variable whereas the LHS refers to the global variable. Therefore each iteration overwrites the value of the global variable to that iteration's value of i, hence it has the value 3 on exit from the function.
Hope this helps someone - this stumped me for a couple of hours today! (BTW, just replace <<- with <- and the function works as expected).
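A sketch of what is going on, wrapped in a hypothetical function scope_demo() so that nothing leaks into the global environment. The point is that <<- starts its search in the enclosing environment of the current function, while the right-hand side still reads the local variable.

```r
# The fix described above: plain <- updates the local mySum
fortest_fixed <- function() {
  mySum <- 0
  for (i in c(1, 2, 3)) {
    mySum <- mySum + i
  }
  mySum
}
fortest_fixed()   # 6

# Same situation as fortest(), but with an enclosing function instead of
# the global environment, so the effect of <<- is easy to observe
scope_demo <- function() {
  mySum <- 100                 # plays the role of the "outer" variable
  inner <- function() {
    mySum <- 0                 # local variable, read on the right-hand side
    for (i in c(1, 2, 3)) {
      mySum <<- mySum + i      # writes 0 + i to the OUTER mySum each time
    }
    mySum                      # the local value was never changed
  }
  c(inner = inner(), outer = mySum)
}
scope_demo()      # inner is 0, outer is 3
```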
# random walk: each call of the anonymous function updates x in f's environment via <<-
f <- function(n, x0) {x <- x0; replicate(n, (function(){x <<- x+rnorm(1)})())}
plot(f(1000,0),type="l")
The <<- operator can also be useful for Reference Classes when writing Reference Methods. For example:
myRFclass <- setRefClass(Class = "RF",
fields = list(A = "numeric",
B = "numeric",
C = function() A + B))
myRFclass$methods(show = function() cat("A =", A, "B =", B, "C =",C))
myRFclass$methods(changeA = function() A <<- A*B) # note the <<-
obj1 <- myRFclass(A = 2, B = 3)
obj1
# A = 2 B = 3 C = 5
obj1$changeA()
obj1
# A = 6 B = 3 C = 9
I use it in order to change inside map() an object in the global environment.
a = c(1,0,0,1,0,0,0,0)
Say I want to obtain a vector which is c(1,2,3,1,2,3,4,5), that is, if there is a 1, keep it, otherwise add 1 to the previous element until the next 1.
library(purrr) # map() comes from the purrr package
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
})
a
[1] 1 2 3 1 2 3 4 5

R: Passing parameters from a wrapper function to internal functions

I am not surprised that this function doesn't work, but I cannot quite understand why.
computeMeans <- function(data,dv,fun) {
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
computeMeans(df.basic,dprime,mean)
Where df.basic is a dataframe with factors method, hypothesis, etc, and several dependent variables (and I specify one with the dv parameter, dprime).
I have multiple dependent variables and several dataframes all of the same form, so I wanted to write this little function to keep things "simple". The error I get is:
Error in aggregate(dv, list(method = method, hypo = hypothesis,
pre.group = pre.group, :
object 'dprime' not found
But dprime does exist in df.basic, which is referenced with with(). Can anyone explain the problem? Thank you!
EDIT: This is the R programming language. http://www.r-project.org/
Although dprime exists in df.basic, when you call it at computeMeans it has no idea what you are referring to, unless you explicitly reference it.
computeMeans(df.basic,df.basic$dprime,mean)
will work.
Alternatively
computeMeans <- function(data,dv,fun) {
dv <- eval(substitute(dv), envir=data)
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
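To see the eval(substitute()) approach in isolation, here is a minimal sketch on a toy data frame. The names toy and groupMeans and the columns grp and score are made up for illustration and are not from the question.

```r
toy <- data.frame(
  grp   = c("a", "a", "b", "b"),
  score = c(1, 3, 5, 7)
)

groupMeans <- function(data, dv, fun) {
  # substitute() captures the unevaluated expression the caller typed for dv;
  # eval() then looks that name up among the columns of data
  dv <- eval(substitute(dv), envir = data)
  aggregate(dv, list(grp = data[["grp"]]), fun)
}

res <- groupMeans(toy, score, mean)   # score is passed unquoted
res   # one row per group: mean 2 for "a", mean 6 for "b"
```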
You might think that since dv is in the with(data, (.)) call, it gets evaluated within the environment of data. It does not.
When a function is called the arguments are matched and then each of
the formal arguments is bound to a promise. The expression that was
given for that formal argument and a pointer to the environment the
function was called from are stored in the promise.
Until that argument is accessed there is no value associated with the
promise. When the argument is accessed, the stored expression is
evaluated in the stored environment, and the result is returned. The
result is also saved by the promise.
source
A promise is therefore evaluated within the environment in which it was created (i.e., the environment where the function was called), regardless of the environment in which the promise is first accessed. Observe:
delayedAssign("x", y)
local({
y <- 10
x
})
Error in eval(expr, envir, enclos) : object 'y' not found
w <- 10
delayedAssign("z", w)
local({
w <- 11
z
})
[1] 10
Note that delayedAssign creates a promise. In the first example, x is assigned the value of y via a promise in the global environment, but y has not been defined in the global environment. x is called in an environment where y has been defined, yet calling x still results in an error indicating that y does not exist. This demonstrates that x is evaluated in the environment in which the promise was defined, not in its current environment.
In the second example, z is assigned the value of w via a promise in the global environment, and w is defined in the global environment. z is then called in an environment where w has been assigned a different value, yet z still returns the value of the w in the environment where the promise was created.
Passing in the dprime argument as a character string would allow you to sidestep any consideration of the involved scoping and evaluation rules discussed in @Michael's answer:
computeMeans <- function(data, dv, fun) {
x <- aggregate(data[[dv]],
list(
method = data[["method"]],
hypo = data[["hypothesis"]],
pre.group = data[["pre.group"]],
pre.smooth = data[["pre.smooth"]]
),
fun )
return(x)
}
computeMeans(df.basic, "dprime", mean)
