Weird mapply behaviour: what have I missed? - r

The following code does not work as I expected:
a <- list(0, 1)
b <- list(0, 1)
# return a linear function with slope `a` and intercept `b`.
f <- function(a, b) function(x) a*x + b
# create a list of functions with different parameters.
fs <- mapply(f, a, b)
# test
fs[[1]](3)
# [1] 4 # expected zero!
fs[[2]](3)
# [1] 4
Can anyone tell me why?
NB: I've found a workaround, so I'm not looking for a different way to achieve the desired result. But I'm curious as to why this particular approach didn't work.
Update:
As of R 3.2.0, this now works as expected:
a <- list(0, 1)
b <- list(0, 1)
f <- function(a, b) function(x) a*x + b
fs <- mapply(f, a, b)
# test
fs[[1]](3)
# [1] 0
fs[[2]](3)
# [1] 4

This is the result of lazy evaluation -- all arguments are passed down the call tree as promises to avoid unnecessary execution and remain in this suspended state till R is convinced that they are used.
In your code you just populate functions with a same promise to a and same promise to b; then they all got committed to a last pair of vales. As #Tommy already showed, the solution is to force commitment by "using" the value before the function gets defined.

[Update] My initial analysis was correct but the conclusions were wrong :) Let's get to the conclusions after the analysis.
Here's some code demonstrating the effects:
x <- lapply(1:3, function(x) sys.frame(sys.nframe()))
x[[1]] # An environment
x[[2]] # Another environment
x[[3]] # Yet nother environment
x[[1]]$x # 3!!! (should be 1)
x[[2]]$x # 3!! (should be 2)
x[[3]]$x # 3 as expected
# Accessing the variable within the function will "fix" the weird behavior:
x <- lapply(1:3, function(x) {x; sys.frame(sys.nframe())})
x[[1]]$x # 1
x[[2]]$x # 2
x[[3]]$x # 3
So the work-around in your case:
f <- function(a, b) { a;b; function(x) a*x + b }
Btw, as #James notes there is a force function that makes accessing a variable more explicit:
f <- function(a, b) { force(a);force(b); function(x) a*x + b }
Conclusions
Well, as #mbq and #hadley noted, this is due to lazy evaluation. It' easier to show with a simple for-loop:
fs <- list(); for(i in 1:2) fs[[i]] <- f(a[[i]], b[[i]])
The function f's x argument will not get the value of a[[i]] (which is 0), but the whole expression and the environment where a and i exist. When you access x, it gets evaluated and therefore uses the i at the time of evaluation. If the for-loop has moved on since the call to f, you get the "wrong" result...
Initially I said that this was due to a bug in *apply, which it isn't. ...but since I hate to be wrong, I can point out that *apply DOES have a bug (or perhaps more of an inconsistency) in these cases:
lapply(11:12, function(x) sys.call())
#[[1]]
#FUN(11:12[[1L]], ...)
#
#[[2]]
#FUN(11:12[[2L]], ...)
lapply(11:12, function(x) function() x)[[1]]() # 12
lapply(11:12, function(x) function() x)[[2]]() # 12
As you see above, the lapply code says it calls the function with 11:12[[1L]]. If you evaluate that "later" you should still get the value 11 - but you actually get 12!
This is probably due to the fact that lapply is implemented in C code for performance reasons and cheat a bit, so the expression that it shows is not the expression that gets evaluated - ergo, a bug...
QED

Related

Evaluate call that contains another call (call within call)

I have encountered a snippet of code where call contains another call. For example:
a <- 1
b <- 2
# First call
foo <- quote(a + a)
# Second call (call contains another call)
bar <- quote(foo ^ b)
We can evaluate calls with eval (eval(foo)), however eval(bar) won't work. This is expected as R tries to run "foo" ^ 2 (sees foo as non-numeric object).
How to evaluate such callception?
To answer this question it might be helpful to split it up in 3 sub problems
Locate any call within a call
For each call, evaluate the call (invisibly), or replace the call with the original call
Return the initial call.
For the answer to be complete, we need to locate any subsequently nested call within the call. In addition we would need to avoid the endless loop of bar <- quote(bar + 3).
As any call might have nested called eg:
a <- 3
zz <- quote(a + 3)
foo <- quote(zz^a)
bar <- quote(foo^zz)
we will have to make sure each stack is evaluated before evaluating the final call.
Following this line of thought, the following function will evaluate even complicated calls.
eval_throughout <- function(x, envir = NULL){
if(!is.call(x))
stop("X must be a call!")
if(isNullEnvir <- is.null(envir))
envir <- environment()
#At the first call decide the environment to evaluate each expression in (standard, global environment)
#Evaluate each part of the initial call, replace the call with its evaluated value
# If we encounter a call within the call, evaluate this throughout.
for(i in seq_along(x)){
new_xi <- tryCatch(eval(x[[i]], envir = envir),
error = function(e)
tryCatch(get(x[[i]],envir = envir),
error = function(e)
eval_throughout(x[[i]], envir)))
#Test for endless call stacks. (Avoiding primitives, and none call errors)
if(!is.primitive(new_xi) && is.call(new_xi) && any(grepl(deparse(x[[i]]), new_xi)))
stop("The call or subpart of the call is nesting itself (eg: x = x + 3). ")
#Overwrite the old value, either with the evaluated call,
if(!is.null(new_xi))
x[[i]] <-
if(is.call(new_xi)){
eval_throughout(new_xi, envir)
}else
new_xi
}
#Evaluate the final call
eval(x)
}
Showcase
So lets try a few examples. Initially I'll use the example in the question, with one additional slightly more complicated call.
a <- 1
b <- 2
c <- 3
foo <- quote(a + a)
bar <- quote(foo ^ b)
zz <- quote(bar + c)
Evaluating each of these gives the desired result:
>eval_throughout(foo)
2
>eval_throughout(bar)
4
>eval_throughout(zz)
7
This is not restricted to simple calls however. Lets extend it to a more interesting call.
massive_call <- quote({
set.seed(1)
a <- 2
dat <- data.frame(MASS::mvrnorm(n = 200, mu = c(3,7), Sigma = matrix(c(2,4,4,8), ncol = 2), empirical = TRUE))
names(dat) <- c("A","B")
fit <- lm(A~B, data = dat)
diff(coef(fit)) + 3 + foo^bar / (zz^bar)
})
Suprisingly enough this also works out just fine.
>eval_throughout(massive_call)
B
4
as when we try to evaluate only the segment that is actually necessary, we get the same result:
>set.seed(1)
>a <- 2
>dat <- data.frame(MASS::mvrnorm(n = 200, mu = c(3,7), Sigma = matrix(c(2,4,4,8), ncol = 2), empirical = TRUE))
>names(dat) <- c("A","B")
>fit <- lm(A~B, data = dat)
>diff(coef(fit)) + 3 + eval_throughout(quote(foo^bar / (zz^bar)))
B
4
Note that this is likely not the most efficient evaluating scheme. Initially the envir variable should be NULL, unless calls like dat <- x should be evaluated and saved in a specific environment.
Edit: Summary of currently provided answers and performance overview
This question have been given quite some attention since the additional reward was given, and many different answers have been proposed. In this section I'll give a short overview of the answers, their limitations and some of their benefits as well. Note all the answers currently provided are good options, but solve the problem to a differing degree, with different upsides and downsides. This section is thus not meant as a negative review for any of the answers, but a trial to leave an overview of the different methods.
The examples presented in above in my answer have been adopted by some of the other answers, while a few have been suggested in the comments of this answer which represented different aspects of the problem. I will use the examples in my answer as well as a few below, to try and illustrate the usefulness of the different methods suggested throughout this post. For completion the different examples are shown in code below. Thanks to #Moody_Mudskipper for the additional examples suggested in the comments below!
#Example 1-4:
a <- 1
b <- 2
c <- 3
foo <- quote(a + a)
bar <- quote(foo ^ b)
zz <- quote(bar + c)
massive_call <- quote({
set.seed(1)
a <- 2
dat <- data.frame(MASS::mvrnorm(n = 200, mu = c(3,7), Sigma = matrix(c(2,4,4,8), ncol = 2), empirical = TRUE))
names(dat) <- c("A","B")
fit <- lm(A~B, data = dat)
diff(coef(fit)) + 3 + foo^bar / (zz^bar)
})
#Example 5
baz <- 1
quz <- quote(if(TRUE) baz else stop())
#Example 6 (Endless recursion)
ball <- quote(ball + 3)
#Example 7 (x undefined)
zaz <- quote(x > 3)
Solution versatility
The solutions provided in the answers to the question, solve the problem to various extends. One question might be to which extend these solve the various tasks of evaluating the quoted expressions.
To test the versatility of the solutions, example 1 to 5 was evaluated using the raw function provided in each answer. Example 6 & 7 present different kind of problems, and will be treated seperately in a section below (Safety of Implementation). Note the oshka::expand returns an unevaluated expression, which was evaluated for after running the function call.
In the table below I've visualized the results from the versatility test. Each row is a seperate function in an answer to the question while each column marks an example. For each test the succes is marked as sucess, ERROR and failed for a succesfuly, early interrupted and failed evaluation respectively.
(Codes are availible at the end of the answer for reproducability.)
function bar foo massive_call quz zz
1: eval_throughout succes succes succes ERROR succes
2: evalception succes succes ERROR ERROR succes
3: fun succes succes ERROR succes succes
4: oshka::expand sucess sucess sucess sucess sucess
5: replace_with_eval sucess sucess ERROR ERROR ERROR
Interestingly the simpler calls bar, foo and zz are mostly handled by all but one answer. Only oshka::expand succesfuly evaluates every method. Only two methods succeed the massive_call and quz examples, while only oshka::expand craetes a succesfuly evaluating expression for the particularly nasty conditional statement.
One may however note that by design the any intermediate results are saved using the oshka::expand method, which should be kept in mind while used. This could however be simply fixed by evaluating the expression within function or child-environment to the global environment.
Another important note is the 5'th example represents a special problem with most of the answers. As each expression is evaluated individually in 3 out of 5 answers, the call to the stop function, simply breaks the call. Thus any quoted expression containing a call to stop shows a simply and especially devious example.
Efficiency comparison:
An alternative performance meassure often of concern is pure efficiency or speed. Even if certain methods failed, being aware of the methods limitations, can yield situations where a simpler method is better, due to the speed performance.
To compare the methods we need to assume that it is the case that we know the method is sufficient for our problems. For this reason and in order to compare the different methods a benchmarking test was performed using zz as the standard. This cuts out one method, for which no benchmarking has been performed. The results are shown below.
Unit: microseconds
expr min lq mean median uq max neval
eval_throughout 128.378 141.5935 170.06306 152.9205 190.3010 403.635 100
evalception 44.177 46.8200 55.83349 49.4635 57.5815 125.735 100
fun 75.894 88.5430 110.96032 98.7385 127.0565 260.909 100
oshka_expand 1638.325 1671.5515 2033.30476 1835.8000 1964.5545 5982.017 100
For the purposes of comparison, the median is a better estimate, as the garbage cleaner might taint certain results and thus the mean.
From the output a clear pattern is visible. The more advanced functions takes longer to evaluate.
Of the four functions oshka::expand is the slowest competitor, being a factor 12 slower than the closest competitor (1835.8 / 152.9 = 12), while evalception is the fastest being about twice as fast as fun (98.7 / 49.5 = 2) and three times faster than eval_throughout (damn!)
As such if speed is required, it seems the simplest method that will evaluate succesfuly is the way to go.
Safety of implementation
An important aspect of good implementations is their ability identify and handle devious input. For this aspect example 6 & 7 represent different problems, that could break implementations. Example 6 represents an endless recursion, which might break the R session. Example 7 represents the missing value problem.
Example 6 was run under the same condition. The results are shown below.
eval_throughout(ball) #Stops successfully
eval(oshka::expand(ball)) #Stops succesfully
fun(ball) #Stops succesfully
#Do not run below code! Endless recursion
evalception(ball)
Of the four answer, only evalception(bar) fails to detect the endless recursion, and crashes the R session, while the remaining succesfuly stops.
Note: i do not suggest running the latter example.
Example 7 was run under the same condition. The results are shown below.
eval_throughout(zaz) #fails
oshka::expand(zaz) #succesfully evaluates
fun(zaz) #fails
evalception(zaz) #fails
An important note is that any evaluation of example 7 will fail. Only oshka::expand succeeds, as it is designed to impute any existing value into the expression using the underlying environment. This especially useful feature lets one create complex calls and imputing any quoted expression to expand the expression, while the remaining answers (including my own) fail by design, as they evaluate the expression.
Final comments
So there you go. I hope the summary of the answers proves useful, showing the positives and possible negatives of each implementation. Each have their possible scenarios where they would outperform the remaining, while only one could be successfully used in all of the represented circumstances.
For versatility the oshka::expand is the clear winner, while if speed is preferred one would have to evaluate if the answers could be used for the situation at hand. Great speed improvements is achievable by going with the simpler answers, while they represent different risks possibly crashing the R session. Unlike my earlier summary, the reader is left to decide for themselves which implementation would work best for their specific problem.
Code for reproducing the summary
Note this code is not cleaned, simply put together for the summary. In addition it does not contain the examples or function, only their evaluations.
require(data.table)
require(oshka)
evals <- function(fun, quotedstuff, output_val, epsilon = sqrt(.Machine$double.eps)){
fun <- if(fun != "oshka::expand"){
get(fun, env = globalenv())
}else
oshka::expand
quotedstuff <- get(quotedstuff, env = globalenv())
output <- tryCatch(ifelse(fun(quotedstuff) - output_val < epsilon, "succes", "failed"),
error = function(e){
return("ERROR")
})
output
}
call_table <- data.table(CJ(example = c("foo",
"bar",
"zz",
"massive_call",
"quz"),
`function` = c("eval_throughout",
"fun",
"evalception",
"replace_with_eval",
"oshka::expand")))
call_table[, incalls := paste0(`function`,"(",example,")")]
call_table[, output_val := switch(example, "foo" = 2, "bar" = 4, "zz" = 7, "quz" = 1, "massive_call" = 4),
by = .(example, `function`)]
call_table[, versatility := evals(`function`, example, output_val),
by = .(example, `function`)]
#some calls failed that, try once more
fun(foo)
fun(bar) #suces
fun(zz) #succes
fun(massive_call) #error
fun(quz)
fun(zaz)
eval(expand(foo)) #success
eval(expand(bar)) #sucess
eval(expand(zz)) #sucess
eval(expand(massive_call)) #succes (but overwrites environment)
eval(expand(quz))
replace_with_eval(foo, a) #sucess
replace_with_eval(bar, foo) #sucess
replace_with_eval(zz, bar) #error
evalception(zaz)
#Overwrite incorrect values.
call_table[`function` == "fun" & example %in% c("bar", "zz"), versatility := "succes"]
call_table[`function` == "oshka::expand", versatility := "sucess"]
call_table[`function` == "replace_with_eval" & example %in% c("bar","foo"), versatility := "sucess"]
dcast(call_table, `function` ~ example, value.var = "versatility")
require(microbenchmark)
microbenchmark(eval_throughout = eval_throughout(zz),
evalception = evalception(zz),
fun = fun(zz),
oshka_expand = eval(oshka::expand(zz)))
microbenchmark(eval_throughout = eval_throughout(massive_call),
oshka_expand = eval(oshka::expand(massive_call)))
ball <- quote(ball + 3)
eval_throughout(ball) #Stops successfully
eval(oshka::expand(ball)) #Stops succesfully
fun(ball) #Stops succesfully
#Do not run below code! Endless recursion
evalception(ball)
baz <- 1
quz <- quote(if(TRUE) baz else stop())
zaz <- quote(x > 3)
eval_throughout(zaz) #fails
oshka::expand(zaz) #succesfully evaluates
fun(zaz) #fails
evalception(zaz) #fails
I think you might want :
eval(do.call(substitute, list(bar, list(foo = foo))))
# [1] 4
The call before evaluation :
do.call(substitute, list(bar, list(foo = foo)))
#(a + a)^b
This also works and might be easier to understand:
eval(eval(substitute(
substitute(bar, list(foo=foo)),
list(bar = bar))))
# [1] 4
and going backwards :
eval(substitute(
substitute(bar, list(foo=foo)),
list(bar = bar)))
# (a + a)^b
And some more
substitute(
substitute(bar, list(foo=foo)),
list(bar = bar))
# substitute(foo^b, list(foo = foo))
Not completely the same but you could use bquote here too if you can afford to define bar differently :
bar2 <- bquote(.(foo)^b)
bar2
# (a + a)^b
eval(bar2)
# [1] 4
And in that case the close equivalent using rlang will be :
library(rlang)
foo <- expr(a + a) # same as quote(a + a)
bar2 <- expr((!!foo) ^ b)
bar2
# (a + a)^b
eval(bar2)
# [1] 4
And a minor thing, you say :
This is expected as R tries to run "foo" ^ 2
It doesn't, it tries to run quote(foo)^b , which will return this same error if you run it directly in the console.
Addendum on recursion
Borrowing Oliver's example you can deal with recursion by looping on my solution until you've evaluated all you can, we just have to slightly modifiy our substitute call to provide all the environment and not explicit substitutions :
a <- 1
b <- 2
c <- 3
foo <- quote(a + a)
bar <- quote(foo ^ b)
zz <- quote(bar + c)
fun <- function(x){
while(x != (
x <- do.call(substitute, list(x, as.list(parent.frame())))
)){}
eval.parent(x)
}
fun(bar)
# [1] 4
fun(zz)
# [1] 7
fun(foo)
# [1] 2
I found a CRAN package that can do this - oshka: Recursive Quoted Language Expansion.
It recursively replaces quoted language calls by objects in environment.
a <- 1
b <- 2
foo <- quote(a + a)
bar <- quote(foo ^ b)
So call oshka::expand(bar) gives (a + a)^b and eval(oshka::expand(bar)) returns 4.
It also works with more complicated calls that #Oliver suggested:
d <- 3
zz <- quote(bar + d)
oshka::expand(zz)
# (a + a)^b + d
I came up with a simple solution to this, but it seems a little improper and I hope that a more canonical method exists to cope with this situation. Nevertheless, this should hopefully get the job done.
The basic idea is to iterate through your expression and replace the un-evaluated first call with its evaluated value. Code below:
a <- 1
b <- 2
# First call
foo <- quote(a + a)
# Second call (call contains another call)
bar <- quote(foo ^ b)
bar[[grep("foo", bar)]] <- eval(foo)
eval(bar)
#> [1] 4
So far this is pretty easy. Of course if your expressions are more complicated this becomes more complicated quickly. For instance, if your expression has foo^2 + a then we need to be sure to replace the term foo^2 with eval(foo)^2 and not eval(foo) and so on. We can write a little helper function, but it would need a good deal of work to robustly generalize to complexly nested cases:
# but if your expressions are more complex this can
# fail and you need to descend another level
bar1 <- quote(foo ^ b + 2*a)
# little two-level wrapper funciton
replace_with_eval <- function(call2, call1) {
to.fix <- grep(deparse(substitute(call1)), call2)
for (ind in to.fix) {
if (length(call2[[ind]]) > 1) {
to.fix.sub <- grep(deparse(substitute(call1)), call2[[ind]])
call2[[ind]][[to.fix.sub]] <- eval(call1)
} else {
call2[[ind]] <- eval(call1)
}
}
call2
}
replace_with_eval(bar1, foo)
#> 2^b + 2 * a
eval(replace_with_eval(bar1, foo))
#> [1] 6
bar3 <- quote(foo^b + foo)
eval(replace_with_eval(bar3, foo))
#> [1] 6
I thought I should somehow be able to do this with substitute() but couldn't figure it out. I'm hopeful a more authoritative solution emerges but in the meantime this may work.
Here's something that (at least partially) works:
evalception <- function (expr) {
if (is.call(expr)) {
for (i in seq_along(expr))
expr[[i]] <- eval(evalception(expr[[i]]))
eval(expr)
}
else if (is.symbol(expr)) {
evalception(eval(expr))
}
else {
expr
}
}
It supports arbitrary nesting but will probably fail with objects of mode expression.
> a <- 1
> b <- 2
> # First call
> foo <- quote(a + a)
> # Second call (call contains another call)
> bar <- quote(foo ^ b)
> baz <- quote(bar * (bar + foo))
> sample <- quote(rnorm(baz, 0, sd=10))
> evalception(quote(boxplot.stats(sample)))
$stats
[1] -23.717520 -8.710366 1.530292 7.354067 19.801701
$n
[1] 24
$conf
[1] -3.650747 6.711331
$out
numeric(0)

Basics of argument passing in R's functions

How R interpret line : arg.list <- list(x, y) in below definition of function, does it copy x and y into arg.list object when execution happen or they are passed by reference ?
fplot <- function(x, y, add=FALSE){
arg.list <- list(x, y)
if(!add){
plot(arg.list))
}else{
lines(arg.list)
}
}
The variables are embedded into the list by reference (at least if you use vectors).
Proof:
library(pryr)
x <- 1:100
y <- 201:200
arg.list <- list(x,y)
al.x <- arg.list[[1]]
al.y <- arg.list[[2]]
Now look at the memory addresses (they are the same):
> address(x)
[1] "0x37598c0"
> address(y)
[1] "0x40fd6f8"
> address(al.x)
[1] "0x37598c0"
> address(al.y)
[1] "0x40fd6f8"
If you change one item a copy will be created ("copy on modification"):
> x[1]=42
> address(x)
[1] "0x417a470"
> al.x <- arg.list[[1]]
> address(al.x)
[1] "0x37598c0"
Edit:
As #HongOoi said: R semantically never uses references (except for objects in the environment class) but copies for variables. It "is clever enough to avoid copies until they are really required" ("copy on [first] modification"). Function parameters are passed "by value" semantically (even though references are used until a modification occurs).
The semantics of R is that function arguments are always passed by value. The underlying implementation may not necessarily make new copies of the arguments, so as to save memory. But your function will behave as if it has brand-new copies to work with.
This means you don't have to worry about changing a variable outside a function because you changed it inside:
x <- 1
f <- function(z) {
z <- z + 1
z
}
y <- f(x)
print(y) # y now contains 2
print(x) # but x still contains 1
If R was pass-by-reference, then modifying the argument of f would also modify the variable that was passed in. This doesn't happen.

substitute known variables in quoted expression

I want to write a function f that given:
x <- 1
f(.(y==x))
It returns the string:
"y == 1"
The function should be general enough, that the expression can be arbitrary. For instance, f(.(y==x & z==x)) should return "y == 1 & z == 1".
Any ideas?
The function:
g <- function(e) as.character(unlist(e))
does not work for me, as it returns "y == x". I want the replacement to take place.
UPDATE:
Also, this question is not a duplicate of How do I substitute symbols in a language object? I want to create a function f, that means that somehow I need to get the value of x from the caller's environment. My current attempt is this:
require(plyr)
x <- 1
f <- function(e) do.call(substitute, list(e[[1]], parent.frame(1)))
f(.(y==x & z==x))
But it still does not work. Maybe I should use something else instead of parent.frame?
Many thanks.
library(plyr)
g <- function(e) {
e[[1]][[3]] <- eval.parent(e[[1]][[3]])
deparse(e[[1]])
}
x <- 1
g(.(y==x))
This works, but is perhaps not as general as you wanted. (I would rather have a solution that substituted values if possible and left the values alone otherwise.)
UPDATE: not working yet, but maybe helpful.
g <- function(e) {
e1 <- e[[1]]
pe <- parent.env(environment())
vars <- all.vars(e1)
ss <- setNames(lapply(vars,
function(v) if (exists(v,pe)) get(v,pe) else v),
vars)
## at this point I'm stumped -- have tried various
## versions of eval, lapply, rapply, substitute ...
## afraid I would actually have to make up my own recursive
## function since rapply() only works on lists ...
}

Passing and using an environment in a function

I am working with some large data sets and have constructed a negative log likelihood function and associated gradient to pass to an optimisation routine. Both the functions require a vector of parameters and the passing of the large data sets into them.
The optimisation routine will call the two functions multiple times and the speed at which the two functions execute at is most of the bottleneck in the process. I dont want to pass the data directly to function as I was under the impression that some copying by R may occur.
I have considered:
# some large data sets
a<-1; b<-2
# place the data sets in an environment
varSpace <- new.env()
assign('c', a, envir = varSpace)
assign('d', b, envir = varSpace)
dFunA <- function(x){
x <- x + a+b
x
}
dFunB <- function(x, envir = varSpace){
x <- x + get('c', envir) + get('d', envir)
x
}
dFunC <- function(x, envir = varSpace){
with(envir,{
x <- x + c + d
})
x
}
dFunD <- function(x, envir = varSpace){
attach(envir)
on.exit({detach(envir)})
x <- x + c + d
x
}
> dFunA(1)
[1] 4
> dFunB(1)
[1] 4
> dFunC(1)
Error in eval(expr, envir, enclos) : object 'x' not found
> dFunD(1)
[1] 4
Approach A requires the data sets to be further up the calling stack. It works but I would like a tidier approach.
Approach B requires the use of get and calling the environment where the data has been placed.
Approach C doesnt work .
Approach D appears to work but I am mindful of ?detach which carries the good practice comment Use of attach/detach is best avoided in functions.
Any help and advice would be appreciated.
You don't need to fiddle around with assign, get or attach. Just set the environment for your functions to the one that you've created.
dFunA <- function(x)
x + a + b
varSpace <- new.env()
varSpace$a <- 1
varSpace$b <- 2
environment(dFunA) <- varSpace
... assuming that this is necessary in the first place. As Aaron commented, R is copy-on-write, so unless you're modifying a or b they're not likely to be copied.

What are Replacement Functions in R?

I searched for a reference to learn about replacement functions in R, but I haven't found any yet. I'm trying to understand the concept of the replacement functions in R. I have the code below but I don't understand it:
"cutoff<-" <- function(x, value){
x[x > value] <- Inf
x
}
and then we call cutoff with:
cutoff(x) <- 65
Could anyone explain what a replacement function is in R?
When you call
cutoff(x) <- 65
you are in effect calling
x <- "cutoff<-"(x = x, value = 65)
The name of the function has to be quoted as it is a syntactically valid but non-standard name and the parser would interpret <- as the operator not as part of the function name if it weren't quoted.
"cutoff<-"() is just like any other function (albeit with a weird name); it makes a change to its input argument on the basis of value (in this case it is setting any value in x greater than 65 to Inf (infinite)).
The magic is really being done when you call the function like this
cutoff(x) <- 65
because R is parsing that and pulling out the various bits to make the real call shown above.
More generically we have
FUN(obj) <- value
R finds function "FUN<-"() and sets up the call by passing obj and value into "FUN<-"() and arranges for the result of "FUN<-"() to be assigned back to obj, hence it calls:
obj <- "FUN<-"(obj, value)
A useful reference for this information is the R Language Definition Section 3.4.4: Subset assignment ; the discussion is a bit oblique, but seems to be the most official reference there is (replacement functions are mentioned in passing in the R FAQ (differences between R and S-PLUS), and in the R language reference (various technical issues), but I haven't found any further discussion in official documentation).
Gavin provides an excellent discussion of the interpretation of the replacement function. I wanted to provide a reference since you also asked for that: R Language Definition Section 3.4.4: Subset assignment.
As a complement to the accepted answer I would like to note that replacement functions can be defined also for non standard functions, namely operators (see ?Syntax) and control flow constructs. (see ?Control).
Note also that it is perfectly acceptable to design a generic and associated methods for replacement functions.
operators
When defining a new class it is common to define S3 methods for $<-, [[<- and [<-, some examples are data.table:::`$<-.data.table`, data.table:::`[<-.data.table`, or tibble:::`$.tbl_df`.
However for any other operator we can write a replacement function, some examples :
`!<-` <- function(x, value) !value
x <- NULL # x needs to exist before replacement functions are used!
!x <- TRUE
x
#> [1] FALSE
`==<-` <- function(e1, e2, value) replace(e1, e1 == e2, value)
x <- 1:3
x == 2 <- 200
x
#> [1] 1 200 3
`(<-` <- function(x, value) sapply(x, value, USE.NAMES = FALSE)
x <- c("foo", "bar")
(x) <- toupper
x
#> [1] "FOO" "BAR"
`%chrtr%<-` <- function(e1, e2, value) {
chartr(e2, value, e1)
}
x <- "woot"
x %chrtr% "o" <- "a"
x
#> [1] "waat"
we can even define <-<-, but the parser will prevent its usage if we call x <- y <- z, so we need to use the left to right assignment symbol
`<-<-` <- function(e1, e2, value){
paste(e2, e1, value)
}
x <- "b"
"a" -> x <- "c"
x
#> [1] "a b c"
Fun fact, <<- can have a double role
x <- 1:3
x < 2 <- NA # this fails but `<<-` was called!
#> Error in x < 2 <- NA: incorrect number of arguments to "<<-"
# ok let's define it then!
`<<-` <- function(x, y, value){
if (missing(value)) {
eval.parent(substitute(.Primitive("<<-")(x, y)))
} else {
replace(x, x < y, value)
}
}
x < 2 <- NA
x
#> [1] NA 2 3
x <<- "still works"
x
#> [1] "still works"
control flow constructs
These are in practice seldom encountered (in fact I'm responsible for the only practical use I know, in defining for<- for my package pbfor), but R is flexible enough, or crazy enough, to allow us to define them. However to actually use them, due to the way control flow constructs are parsed, we need to use the left to right assignment ->.
`repeat<-` <- function(x, value) replicate(value, x)
x <- "foo"
3 -> repeat x
x
#> [1] "foo" "foo" "foo"
function<-
function<- can be defined in principle but to the extent of my knowledge we can't do anything with it.
`function<-` <- function(x,value){NULL}
3 -> function(arg) {}
#> Error in function(arg) {: target of assignment expands to non-language object
Remember, in R everything operation is a function call (therefore also the assignment operations) and everything that exists is an object.
Replacement functions act as if they modify their arguments in place such as in
colnames(d) <- c("Input", "Output")
They have the identifier <- at the end of their name and return a modified copy of the argument object (non-primitive replacement functions) or the same object (primitive replacement functions)
At the R prompt, the following will not work:
> `second` <- function(x, value) {
+ x[2] <- value
+ x
+ }
> x <- 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> second(x) <- 9
Error in second(x) <- 9: couldn't find function "second<-"
As you can see, R is searching the environment not for second but for second<-.
So lets do the same thing but using such a function identifier instead:
> `second<-` <- function(x, value) {
+ x[2] <- value
+ x
+ }
Now, the assignment at the second position of the vector works:
> second(x) <- 9
> x
[1] 1 9 3 4 5 6 7 8 9 10
I also wrote a simple script to list all replacement functions in R base package, find it here.

Resources