Suppose you define a function in R using the following code:
a <- 1
f <- function(x) x + a
If you later redefine a, you change the behaviour of f. (So f(1) = 2 as given, but if you later redefine a = 2 then f(1) = 3.) Is there a way to force R to use the value of a at the time it defines the function? (That is, f would not change with later redefinitions of a.)
The above is the shortest case I could think of that embodies the problem I am having. More specifically, as requested, my situation is:
I am working with a bunch of objects I am calling "person". Each person is defined as a probability distribution that depends on an n-dimensional vector a and an n-dimensional vector of constraints w (the share of wealth).
I want to create a "society" with N people, that is a list of N persons. To that end, I created two n by N matrices A and W. I now loop over 1 to N to create the individuals.
Society <- list()
### doesn't evaluate theta at the time, but does w...
for (i in 1:Npeople) {
w <- WealthDist[i,]
u <- function(x) prod(x^A[i,])
P <- list(u,w)
names(P) <- c("objective","w")
Society[[length(Society)+1]] <- P
}
w is passed by value, so each person gets the right amount of wealth. But A seems to be passed by reference: everybody is being assigned the same function u (namely, the function using i = N).
To finish it up, the next steps are to get the Society and, via two optimizations get an "equilibrium point".
You can write a function that creates your function for you. It copies the current value of a into w, which is stored in an environment that the returned function reads from, so w is not replaced when a changes later.
a <- 1
j <- new.env() # create a new environment
create.func <- function () {
j$w <<- a
function (x) {
x+ j$w
}
}
f <- create.func()
a <- 2
f(2)
[1] 3 # if w was changed this should be 4
Credits to Andrew Taylor (see comments)
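EDIT: BE CAREFUL: with the code above, f will change if you call create.func again, even if you do not store the result in f, because every call overwrites the shared j$w. To avoid this, you could create the environment inside create.func (it clearly depends on what you want).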
a <- 1
create.func <- function (x) {
j <- new.env()
j$w <- a
function (x) {
x + j$w
}
}
f <- create.func()
f(1)
[1] 2
a <- 2
q <- create.func()
q(1)
[1] 3
f(1)
[1] 2
EDIT 2: Lazy evaluation doesn't apply here because a is evaluated by being set to j$w. If you had used it as an argument say:
function(a)
function(x)
#use a here
you would have to use force before defining the second function, because then it wouldn't be evaluated.
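A minimal sketch of the difference force makes, in case it helps. The names make_adder_lazy and make_adder_forced are made up for this illustration; the only assumption is a global variable a, as in the question.

```r
# Hypothetical factories for illustration only.
make_adder_lazy   <- function(a) function(x) x + a               # a stays an unevaluated promise
make_adder_forced <- function(a) { force(a); function(x) x + a } # a is evaluated right here

a <- 1
f_lazy   <- make_adder_lazy(a)
f_forced <- make_adder_forced(a)

a <- 2                       # redefine a before either function is called

lazy_result   <- f_lazy(1)   # promise is evaluated now, so it sees a = 2
forced_result <- f_forced(1) # value was captured when a was still 1
```

Here lazy_result is 3 and forced_result is 2: without force, the argument is only looked up on first use.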
EDIT 3: I removed the foo <- etc. The function will return as soon as it is declared, since you want it to be similar to the code factories defined in your link.
EDIT by OP: Just to add to the accepted answer that, in the spirit of
Function Factory in R
the code below works:
funs.gen <- function(n) {
force(n)
function(x) {
x + n
}
}
funs = list()
for (i in seq(length(names))) {
n = names[i]
funs[[n]] = funs.gen(i)
}
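Applied to the "Society" loop from the question, the same idea might look like the sketch below. make_person is a hypothetical helper, and the small A and WealthDist matrices are made-up data (one row per person) so that the example runs on its own.

```r
# Hypothetical factory: forces its arguments so each person keeps its own values.
make_person <- function(alpha, w) {
  force(alpha)
  force(w)
  list(objective = function(x) prod(x^alpha), w = w)
}

# Made-up example data: 3 people, 2 goods, one row per person.
Npeople    <- 3
A          <- matrix(c(1, 2, 3, 1, 1, 1), nrow = Npeople)  # rows: (1,1), (2,1), (3,1)
WealthDist <- matrix(1 / Npeople, nrow = Npeople, ncol = 2)

Society <- vector("list", Npeople)
for (i in seq_len(Npeople)) {
  Society[[i]] <- make_person(A[i, ], WealthDist[i, ])
}

# Each person's objective now uses that person's own row of A.
vals <- sapply(Society, function(p) p$objective(c(2, 3)))
```

With this data, vals is 6, 12, 24, so the three objective functions really are different.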
R doesn't do pass by reference; everything is passed to functions by value. As you've noticed, since a is defined in the global environment, functions which reference a are referencing the global value of a, which is subject to change. To ensure that a specific value of a is used, you can use it as a parameter in the function.
f <- function(x, a = 1) {
x + a
}
This defines a as a parameter that defaults to 1. The value of a used by the function will then always be the value passed to the function, regardless of whether a is defined in the global environment.
If you're going to use lapply(), you simply pass a as a parameter to lapply().
lapply(X, f, a = <value>)
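As a quick check of the parameter approach (the values 100 and 10 are arbitrary and only there to show that the global a is ignored):

```r
f <- function(x, a = 1) {
  x + a
}

a <- 100                       # a global a no longer matters to f

default_result <- f(1)         # uses the default a = 1
lapply_result  <- unlist(lapply(1:3, f, a = 10))  # a is passed through lapply
```

default_result is 2, and lapply_result is 11, 12, 13.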
Define a within f
f <- function(x) {a<-1;x + a}
I've just read about delayedAssign(), but the way you have to do it is by passing the name of the delayed variable as the first parameter. Is there a way to do it via direct assignment?
e.g.:
x <- delayed_variable("Hello World")
rather than
delayedAssign("x","Hello World")
I want to create a variable that will throw an error if accessed (use-case is obviously more complex), so for example:
f <- function(x){
y <- delayed_variable(stop("don't use y"))
x
}
f(10)
> 10
f <- function(x){
y <- delayed_variable(stop("don't use y"))
y
}
f(10)
> Error in f(10) : don't use y
No, you can't do it that way. Your example would be fine with the current setup, though:
f <- function(x){
delayedAssign("y", stop("don't use y"))
y
}
f(10)
which gives exactly the error you want. The reason for this limitation is that delayed_variable(stop("don't use y")) would create a value which would trigger the error when evaluated, and assigning it to y would evaluate it.
Another version of the same thing would be
f <- function(x, y = stop("don't use y")) {
...
}
Internally it's very similar to the delayedAssign version.
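To see the promise behaviour that both versions rely on, here is a small sketch. The function demo and the counter n_evals are made up for the illustration; it only shows that the delayed expression runs on first access, and only once.

```r
demo <- function() {
  n_evals <- 0
  # The expression is stored as a promise and not evaluated yet.
  delayedAssign("y", { n_evals <- n_evals + 1; 42 })
  before <- n_evals   # still 0: y has not been touched
  total  <- y + y     # first use forces the promise; second use reuses the value
  c(before = before, after = n_evals, total = total)
}

res <- demo()
```

res is c(before = 0, after = 1, total = 84).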
I reached a solution using makeActiveBinding() which works provided it is being called from within a function (so it doesn't work if called directly and will throw an error if it is). The main purpose of my use-case is a smaller part of this, but I generalised the code a bit for others to use.
Importantly for my use-case, this function lets other functions use delayed assignment internally and still passes R CMD check with no NOTEs.
Here is the function and it gives the desired outputs from my question.
delayed_variable <- function(call){
#Get the current call
prev.call <- sys.call()
attribs <- attributes(prev.call)
# If srcref isn't there, then we're not coming from a function
if(is.null(attribs) || !"srcref" %in% names(attribs)){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Extract the call including the assignment operator
this_call <- parse(text=as.character(attribs$srcref))[[1]]
# Check if this is an assignment `<-` or `=`
if(!(identical(this_call[[1]],quote(`<-`)) ||
identical(this_call[[1]],quote(`=`)))){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Get the variable being assigned to as a symbol and a string
var_sym <- this_call[[2]]
var_str <- deparse(var_sym)
#Get the parent frame that we will be assigning into
p_frame <- parent.frame()
var_env <- new.env(parent = p_frame)
#Create a random string to be an identifier
var_rand <- paste0(sample(c(letters,LETTERS),50,replace=TRUE),collapse="")
#Put the variables into the environment
var_env[["p_frame"]] <- p_frame
var_env[["var_str"]] <- var_str
var_env[["var_rand"]] <- var_rand
# Create the function that will be bound to the variable.
# Since this is an Active Binding (AB), we have three situations
# i) It is run without input, and thus the AB is
# being called on its own (missing(input)),
# and thus it should evaluate and return the output of `call`
# ii) It is being run as the lhs of an assignment
# as part of the initial assignment phase, in which case
# we do nothing (i.e. input is the output of this function)
# iii) It is being run as the lhs of a regular assignment,
# in which case, we want to overwrite the AB
fun <- function(input){
if(missing(input)){
# No assignment: variable is being called on its own
# So, we activate the delayed assignment call:
res <- eval(call,p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
res
} else if(!inherits(input,"assign_delay") &&
input != var_rand){
# Attempting to assign to the variable
# and it is not the initial definition
# So we overwrite the active binding
res <- eval(substitute(input),p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
invisible(res)
}
# Else: We are assigning and the assignee is the output
# of this function, in which case, we do nothing!
}
#Fix the call in the above eval to be the exact call
# rather than a variable (useful for debugging)
# This is in the line res <- eval(call,p_frame)
body(fun)[[c(2,3,2,3,2)]] <- substitute(call)
#Put the function inside the environment with
# all of the variables above
environment(fun) <- var_env
# Check if the variable already exists in the calling
# environment and if so, remove it
if(exists(var_str,envir=p_frame)){
rm(list=var_str,envir=p_frame)
}
# Create the AB
makeActiveBinding(var_sym,fun,p_frame)
# Return a specific object to check for
structure(var_rand,call="assign_delay")
}
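For comparison, a much smaller sketch that covers only the "throw an error if accessed" use case by calling makeActiveBinding() directly. It is not a replacement for delayed_variable(); the touch argument is hypothetical and only there to exercise both paths.

```r
f <- function(x, touch = FALSE) {
  # Bind y to a function that runs every time y is read.
  makeActiveBinding("y", function() stop("don't use y"), environment())
  if (touch) y else x
}

ok  <- f(10)   # y is never read, so no error
msg <- tryCatch(f(10, touch = TRUE),
                error = function(e) conditionMessage(e))
```

ok is 10 and msg is "don't use y".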
I'm trying to create multiple functions with varying arguments.
Just some background: I need to compute functions describing 75 days respectively and multiply them later to create a Maximum-Likelihood function. They all have the same form; they only differ in some arguments. That's why I wanted to do this via a loop.
I've tried to put all the equations in a list to have access to them later on.
The list this loop generates has 75 elements, but they're all the same, as the [i] in the defined function is not taken into account by the loop, meaning that the M_b[i] (a vector with 75 elements) does not vary.
Does someone know, why this is the case?
simplified equation used
for (i in 1:75){
log_likelihood[[i]] <-
list(function(e_b,mu_b){M_b[i]*log(e_b*mu_b)})
}
I couldn't find an answer to this in different questions. I'm sorry, if there's a similar thread already existing.
You need to force the evaluation of the variable M_b[i]; see https://adv-r.hadley.nz/function-factories.html. Below I try to make it work:
func = function(i){
i = force(i)
f = function(e_b,mu_b){i*log(e_b*mu_b) }
return(f)
}
# test
func(9)(7,3) == 9*log(7*3)
#some simulated values for M_b
M_b = runif(75)
log_likelihood = vector("list",75)
for (idx in 1:75){
log_likelihood[[idx]] <- func(M_b[idx])
}
# we test it on say e_b=5, mu_b=6
test = sapply(log_likelihood,function(i)i(5,6))
actual = sapply(M_b,function(i)i*log(5*6))
identical(test,actual)
[1] TRUE
This is called lazy evaluation: R doesn't evaluate an argument until it is actually used. As correctly pointed out by @SDS0, the value you get is the one at i=75. We can see it with a version of the function that does not call force:
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
M_b = 1:3
log_likelihood = vector("list",3)
for (idx in 1:3){
log_likelihood[[idx]] = func(M_b[idx])
}
sapply(log_likelihood,function(f)f(5,6))
[1] 10.20359 10.20359 10.20359
#you get 10.20359 which is M_b[3]*log(5*6)
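There is one last option, which I just learned of: use lapply. Each call gets its own idx, and since R 3.2.0 lapply forces the arguments of the function it applies, so lazy evaluation no longer causes the problem: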
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
log_likelihood = lapply(1:3,function(idx)func(M_b[idx]))
sapply(log_likelihood,function(f)f(5,6))
[1] 3.401197 6.802395 10.203592
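If you prefer to keep the for loop, another possible sketch is to wrap the body in local(), which evaluates M_b[idx] immediately and stores it in a fresh environment per iteration. The variable m is a made-up name; M_b = 1:3 is the same toy data as above.

```r
M_b <- 1:3
log_likelihood <- vector("list", 3)
for (idx in 1:3) {
  log_likelihood[[idx]] <- local({
    m <- M_b[idx]                            # evaluated now, in a new environment
    function(e_b, mu_b) m * log(e_b * mu_b)  # closure captures this iteration's m
  })
}
vals <- sapply(log_likelihood, function(f) f(5, 6))
```

vals then equals (1:3) * log(5 * 6), the same as the lapply result.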
I'm trying to figure out how to allow a function to directly alter or create variables in its parent environment, whether the parent environment is the global environment or another function.
For example if I have a function
my_fun <- function(){
a <- 1
}
I would like a call to my_fun() to produce the same results as doing a <- 1.
I know that one way to do this is by using parent.frame as per below but I would prefer a method that doesn't involve rewriting every variable assignment.
my_fun <- function(){
env = parent.frame()
env$a <- 1
}
Try with:
g <- function(env = parent.frame()) with(env, { b <- 1 })
g()
b
## [1] 1
Note that normally it is preferable to pass the variables back as return values rather than create them directly in the parent frame. If you have many variables to return you can always return them in a list, e.g. h <- function() list(a = 1, b = 2); result <- h(). Now result$a and result$b have the values of a and b.
Also see Function returning more than one value.
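If you do return a list but still want plain variables in the caller, one possible sketch is list2env(). The function caller is hypothetical; h is the example function from the note above.

```r
h <- function() list(a = 1, b = 2)

caller <- function() {
  # Copy the list elements into this function's own environment as variables.
  list2env(h(), envir = environment())
  a + b
}

total <- caller()
```

total is 3, and nothing is created outside caller.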
Approach 1
f1 <- function(x)
{
# Do calculation xyz ....
f2 <- function(y)
{
# Do stuff...
return(some_object)
}
return(f2(x))
}
Approach 2
f2 <- function(y)
{
# Do stuff...
return(some_object)
}
f3 <- function(x)
{
# Do calculation xyz ....
return(f2(x))
}
Assume f1 and f3 both do the same calculations and give the same result.
Are there any significant advantages in using approach 1, calling f1(), vs approach 2, calling f3()?
Is a certain approach more favourable when:
large data is being passed in and/or out of f2?
Speed is a big issue. E.g. f1 or f3 are called repeatedly in simulations.
(Approach 1 seems common in packages, defining inside another)
One advantage of using the approach f1 is that f2 won't exist outside f1 once f1 has finished being called (and f2 is only called in f1 or f3).
Benefits of defining f2 inside f1:
f2 only visible within f1, useful if f2 is only meant for use within f1, though within package namespaces this is debatable since you just wouldn't export f2 if you defined it outside
f2 has access to variables within f1, which could be considered a good or a bad thing:
good, because you don't have to pass variables through the function interface and you can use <<- to implement stuff like memoization, etc.
bad, for the same reasons...
Disadvantages:
f2 needs to be redefined every time you call f1, which adds some overhead (not very much overhead, but definitely there)
Data size should not matter since R won't copy the data unless it is being modified under either scenario. As noted in disadvantages, defining f2 outside of f1 should be a little faster, especially if you are repeating an otherwise relatively low overhead operation many times. Here is an example:
> fun1 <- function(x) {
+ fun2 <- function(x) x
+ fun2(x)
+ }
> fun2a <- function(x) x
> fun3 <- function(x) fun2a(x)
>
> library(microbenchmark)
> microbenchmark(
+ fun1(TRUE), fun3(TRUE)
+ )
Unit: nanoseconds
expr min lq median uq max neval
fun1(TRUE) 656 674.5 728.5 859.5 17394 100
fun3(TRUE) 406 434.5 480.5 563.5 1855 100
In this case we save 250ns (edit: the difference is actually 200ns; believe it or not the extra set of {} that fun1 has costs another 50ns). Not much, but can add up if the interior function is more complex or you repeat the function many many times.
You would typically use approach 2. Some exceptions are
Function closures:
f = function() {
counter = 1
g = function() {
counter <<- counter + 1
return(counter)
}
}
counter = f()
counter()
counter()
Function closures enable us to remember state.
Sometimes it's handy to define a function right where it is used, because it is only used in that one place. For example, when using optim, we often tweak an existing function:
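pdf = function(x, mu) dnorm(x, mu, log=TRUE)
f = function(d, lower, initial=lower) {
  # ll is the negative log-likelihood; optim() minimises, so values of mu
  # below the bound get Inf, and the start value must not lie below the bound
  ll = function(mu) {
    if(mu < lower) return(Inf)
    else -sum(pdf(d, mu))
  }
  optim(initial, ll)
}
f(d, 1.5)  # d is assumed to be a numeric data vector defined elsewhere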
The ll function uses the data set d and a lower bound. This is convenient, since this may be the only place we use or need the ll function.
An example of what is mentioned in existing answers is probably what I now think of as being the most useful benefit of defining a function in the environment of another function. In simple terms: You can define functions without specifying all the parameters used inside it, provided those parameters are defined somewhere in the environment in which the function is defined. A nice reference for function environments is of course: https://adv-r.hadley.nz/environments.html
This approach can be handy for breaking up blocks of code in a function, where multiple variables might be required and referred to within the function body, into a bunch of sub functions in the function's environment, allowing for cleaner representation of the code, without having to write out a potentially long parameter list.
A simple dummy example below highlights the point
f1 <- function(x)
{
f2 <- function(y)
{
# possibly long block of code relevant to the meaning of what `f2` represents
y + a + b + d
}
# might be 10+ variables in special cases
a <- 10
b <- 5
d <- 1
f2(x)
}
#test:
> f1(100)
[1] 116
You can't use this approach if you define the functions with separate parent environments:
f3 <- function(x)
{
a <- 10
b <- 5
d <- 1
f2a(x)
}
f2a <- function(y)
{
y + a + b + d
}
> f3(100)
Error in f2a(x) : object 'a' not found
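Two possible ways around that error, as a sketch. f2b, f3_args and f3_env are made-up names. The first passes the values explicitly; the second makes a local copy of f2a whose enclosing environment is the calling frame, which works but is arguably harder to follow.

```r
f2a <- function(y) y + a + b + d
f2b <- function(y, a, b, d) y + a + b + d   # explicit interface

f3_args <- function(x) {
  a <- 10; b <- 5; d <- 1
  f2b(x, a, b, d)
}

f3_env <- function(x) {
  a <- 10; b <- 5; d <- 1
  # Creates a local copy of f2a that looks up free variables in this frame.
  environment(f2a) <- environment()
  f2a(x)
}

res_args <- f3_args(100)
res_env  <- f3_env(100)
```

Both res_args and res_env are 116; the global f2a itself is left unchanged.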
Why does
f <- function(a) {
g <- function(a=a) {
return(a + 2)
}
return(g())
}
f(3) # Error in a + 2: 'a' is missing
cause an error? It has something to do with the a=a argument, particularly with the fact that the variable names are the same. What exactly is going on?
Here are some similar pieces of code that work as expected:
f <- function(a) {
g <- function(a) {
return(a + 2)
}
return(g(a))
}
f(3) # 5
f <- function(a) {
g <- function(g_a=a) {
return(g_a + 2)
}
return(g())
}
f(3) # 5
g <- function(a) a + 2
f <- function(a) g(a)
f(3) # 5
The problem is that, as explained in the R language definition:
The default arguments to a function are evaluated in the evaluation frame of the function.
In your first code block, when you call g() without any arguments, it falls back on its default value of a, which is a. Evaluating that in the "frame of the function" (i.e. the environment created by the call to g()), it finds an argument whose name matches the symbol a, and its value is a. When it looks for the value of that a, it finds an argument whose name matches that symbol, and whose value is a. When...
As you can see, you're stuck in a loop, which is what the error message is trying to tell you:
Error in g() :
promise already under evaluation: recursive default argument reference or
earlier problems?
Your second attempt, which calls g(a) works as you expected, because you've supplied an argument, and, as explained in the same section of R-lang:
The supplied arguments to a function are evaluated in the evaluation frame of the calling function.
There it finds a symbol a, which is bound to whatever value you passed in to the outer function's formal argument a, and all is well.
The problem is the a=a part. An argument can't be its own default. That is a circular reference.
This example may help clarify how it works:
x <- 1
f <- function(a = x) { x <- 2; a }
f()
## [1] 2
Note that a does not have the default 1; it has the default 2. It looks first in the function itself for the default. In a similar way a=a would cause a to be its own default which is circular.
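A small sketch that puts the two rules side by side; f_default and f_supplied are made-up names. The default a = a is looked up inside g's own frame and fails, while a supplied a = a is looked up in the caller's frame and works.

```r
f_default <- function(a) {
  g <- function(a = a) a + 2
  # g() falls back on the default, which refers to itself.
  tryCatch(g(), error = function(e) "error")
}

f_supplied <- function(a) {
  g <- function(a = a) a + 2
  # The right-hand a is a supplied argument, evaluated in f_supplied's frame.
  g(a = a)
}

res_default  <- f_default(3)
res_supplied <- f_supplied(3)
```

res_default is "error" (the exact message depends on the R version), and res_supplied is 5.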