When running a function in R, I run another function within it.
I have code along these lines:
f_a <- function(b, c){
return(b + c)
}
f_e <- function(){
b = 2
c = 2
d = f_a(b, c)
print(d)
}
This works fine. What I'd like to do is not pass the variables b and c to f_a. I'd like to do something like this (which throws an error):
f_a <- function(){
return(b + c)
}
f_e <- function(){
b = 2
c = 2
d = f_a()
print(d)
}
Is there a way to do this using environments or search paths or any other way?
I do encourage you to read about lexical scoping,
but I think a good approach to avoid writing a lot of variables could be:
get_args_for <- function(fun, env = parent.frame(), inherits = FALSE, ..., dots) {
potential <- names(formals(fun))
if ("..." %in% potential) {
if (missing(dots)) {
# return everything from parent frame
return(as.list(env))
}
else if (!is.list(dots)) {
stop("If provided, 'dots' should be a list.")
}
potential <- setdiff(potential, "...")
}
# get all formal arguments that can be found in parent frame
args <- mget(potential, env, ..., ifnotfound = list(NULL), inherits = inherits)
# remove not found
args <- args[sapply(args, Negate(is.null))]
# return found args and dots (dots may be missing when fun has no "...")
c(args, if (missing(dots)) list() else dots)
}
f_a <- function(b, c = 0, ..., d = 1) {
b <- b + 1
c(b = b, c = c, d = d, ...)
}
f_e <- function() {
b <- 2
c <- 2
arg_list <- get_args_for(f_a, dots = list(5))
do.call(f_a, arg_list)
}
> f_e()
b c d
3 2 1 5
Setting inherits = FALSE by default ensures that we only get variables from the specified environment.
We could also set dots = NULL when calling get_args_for so that we don't pass all variables,
but leave the ellipsis empty.
Nevertheless, it isn't entirely robust,
because dots is simply appended at the end,
and if some arguments are not named,
they could end up matched by position.
Also, if some values should be NULL in the call,
it wouldn't be easy to detect it.
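The last limitation could be worked around by testing whether a variable exists instead of filtering out NULL values. The following is only a sketch under that assumption: get_args_for2 is a hypothetical name, and it leaves out the dots handling shown above.

```r
# Hypothetical variant of get_args_for: keeps arguments whose value is NULL
# by checking whether a variable exists rather than whether it is NULL.
get_args_for2 <- function(fun, env = parent.frame()) {
  force(env)  # resolve the caller's frame before doing anything else
  potential <- setdiff(names(formals(fun)), "...")
  # keep only the formals that are defined directly in 'env'
  found <- vapply(potential, exists, logical(1), envir = env, inherits = FALSE)
  mget(potential[found], envir = env)
}

f_a <- function(b, c = 0, d = 1) {
  list(b = b, c = c, d = d)
}

f_e <- function() {
  b <- NULL  # an intentional NULL that should reach f_a
  c <- 2
  do.call(f_a, get_args_for2(f_a))
}

res <- f_e()
str(res)
```

Here b reaches f_a as NULL, while d is not defined in f_e and therefore keeps its default.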
I would strongly advise against using any of the options below inside an R package.
Not only will it be rather ugly,
you'll get a bunch of notes from R CMD check regarding undefined global variables.
Other options:
f_a <- function() {
return(b + c)
}
f_e <- function() {
b <- 2
c <- 2
# replace f_a's enclosing environment with the current evaluation's environment
environment(f_a) <- environment()
d <- f_a()
d
}
> f_e()
[1] 4
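Despite appearances, this does not alter the original f_a: the replacement form environment(f_a) <- ... creates a modified copy of f_a that is local to f_e,
so the function defined outside f_e keeps its enclosing environment. A locked package namespace should therefore not get in the way, since nothing in it is modified.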
Or:
f_a <- function() {
with(parent.frame(), {
b + c
})
}
f_e <- function() {
b <- 2
c <- 2
f_a()
}
> f_e()
[1] 4
That way you don't modify the other function's enclosing environment permanently.
However, both functions will share an environment,
so something like this could happen:
f_a <- function() {
with(parent.frame(), {
b <- b + 1
b + c
})
}
f_e <- function() {
b <- 2
c <- 2
d <- f_a()
c(b,d)
}
> f_e()
[1] 3 5
Where calling the inner function modifies the values in the outer environment.
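If that side effect is unwanted, one possible way around it is to evaluate the code in a scratch environment whose parent is the caller's frame. This is a sketch rather than a recommendation; the names caller and scratch are made up for illustration.

```r
f_a <- function() {
  caller <- parent.frame()
  # scratch environment whose parent is the caller's frame:
  # reads fall through to the caller, assignments stay in the scratch env
  scratch <- new.env(parent = caller)
  eval(quote({
    b <- b + 1
    b + c
  }), scratch)
}

f_e <- function() {
  b <- 2
  c <- 2
  d <- f_a()
  c(b, d)
}

res <- f_e()
res
```

With this variant, b in f_e stays at 2 while d is still computed from the incremented value.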
Yet another option is a bit more flexible,
since it only changes the enclosing environment temporarily, for the duration of the eval call.
However, certain R functions detect their current execution environment through "daRk magic"
and cannot be fooled by eval;
see this discussion.
f_a <- function() {
b <- b + 1
b + c
}
f_e <- function() {
b <- 2
c <- 2
# use current environment as enclosing environment for f_a's evaluation
d <- eval(body(f_a), list(), enclos=environment())
c(b=b, d=d)
}
> f_e()
b d
2 5
One option is to explicitly grab b and c from the calling environment:
f_a <- function(){
get('b', envir = parent.frame()) + get('c', envir = parent.frame())
}
f_e <- function(){
b = 2
c = 2
d = f_a()
d
}
f_e()
#> [1] 4
Alternatively, you can use quote to delay evaluation and then eval to evaluate the code in the calling environment, effectively doing the same thing:
f_a <- function(){
eval(quote(b + c), parent.frame())
}
This is not really a robust way to write code, though, as it limits the possible ways to call f_a successfully. It's much easier to follow code that explicitly passes variables.
Edit:
@alistaire's suggestion to use quote to construct the expressions brings up this further alternative that seems even less ugly:
expr_env <- new.env()
expr_env$f_a <- quote(b+c)
expr_env$f_z <- quote(x+y)
f_e<-function(){
b=2
c=2
d=eval( expr_env$f_a)
print(d)
}
Would defining the function using local be an acceptable alternative?
f_e<-function(){
b=2
c=2
d<-local({
b+c
})
print(d)
}
f_e()
[1] 4
An alternative would be to only return a parse tree and then finish the evaluation in the environment "local" to the function. This seems "ugly" to me:
expr_list<-function(){ f_a <- quote(b+c)
f_z <- quote(x+y)
list(f_a=f_a,f_z=f_z) }
f_e<-function(){
b=2
c=2
d=eval( (expr_list()$f_a))
print(d)
}
You can assign the variables to the global environment and use them inside the function.
f_a <- function(){
return(b + c)
}
f_e <- function(){
assign("b", 2, envir = .GlobalEnv)
assign("c", 2, envir = .GlobalEnv)
d = f_a()
print(d)
}
# > f_e()
# [1] 4
Related
I'd like to be able to pass current arguments in a function to another function without individually listing each of the arguments. This is for a slightly more complex function which will have about 15 arguments with potentially more arguments later added (it's based on an API for data which might have more complex data added later):
f_nested <- function(a, b, ...) {
c <- a + b
return(c)
}
f_main <- function(a, b) {
d <- do.call(f_nested, as.list(match.call(expand.dots = FALSE)[-1]))
c <- 2 / d
return(c)
}
f_main(2, 3)
#> [1] 0.4
sapply(2:4, function(x) f_main(x, 4))
#> Error in (function (a, b, ...) : object 'x' not found
Created on 2019-06-28 by the reprex package (v0.3.0)
The first call to f_main(2, 3) produces the expected result. However, when iterating over a vector of values with sapply an error arises that the object was not found. I suspect my match.call() use is not correct and I'd like to be able to iterate over my function.
I'll borrow from lm's use of match.call, replacing the first element with the next function. I think one key is to call eval with parent.frame(), so that x will be resolved correctly.
# no change
f_nested <- function(a, b, ...) {
c <- a + b
return(c)
}
# changed, using `eval` instead of `do.call`, reassigning the function name
f_main <- function(a, b) {
thiscall <- match.call(expand.dots = TRUE)
thiscall[[1]] <- as.name("f_nested")
d <- eval(thiscall, envir = parent.frame())
c <- 2 / d
return(c)
}
sapply(2:4, function(x) f_main(x, 4))
# [1] 0.3333333 0.2857143 0.2500000
As @MrFlick suggested, this can be shortened slightly with:
f_main <- function(a, b) {
thiscall <- match.call(expand.dots = TRUE)
thiscall[[1]] <- as.name("f_nested")
d <- eval.parent(thiscall)
c <- 2 / d
return(c)
}
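A different route, sketched here as an alternative to rewriting the call, is to forward the already evaluated argument values instead of the unevaluated expressions. It assumes every argument of f_main is supplied, since collecting them forces their evaluation; the variable name here is only illustrative.

```r
f_nested <- function(a, b, ...) {
  a + b
}

f_main <- function(a, b) {
  here <- environment()
  # collect the current values of f_main's own formal arguments
  args <- mget(names(formals(f_main)), envir = here)
  d <- do.call(f_nested, args)
  2 / d
}

res <- sapply(2:4, function(x) f_main(x, 4))
res
```

Because the values are looked up in f_main's own frame, it no longer matters that x only exists inside the anonymous function passed to sapply.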
Inside an R function, is it possible to detect if the user has assigned the output to an object?
For example, I would like to print some information on the console only if the output is not assigned to an object. I am looking for something like this:
fun <- function(a){
b <- a^2
if(!<OUTPUT ASSIGNED>) cat('a squared is ', b)
return(invisible(b))
}
So that the result on console would be different whether the function output is assigned or not, e.g:
> fun(5)
> a squared is 25
>
> out <- fun(5)
>
>
Not sure if I've completely thought this one through, but this seems to work for the example you've given. (Note it's important to use = or assign or .Primitive("<-") inside the fun you'd like to subject to this treatment.)
fun <- function(a){
b = a^2 # can't use <- here
if (!identical(Sys.getenv("R_IS_ASSIGNING"), "true")) cat('a squared is ', b)
return(invisible(b))
}
`<-` <- function(a, b) {
Sys.setenv("R_IS_ASSIGNING" = "true")
eval.parent(substitute(.Primitive("<-")(a, b)))
Sys.unsetenv("R_IS_ASSIGNING")
}
fun(5)
#> a squared is 25
out <- fun(6)
out
#> [1] 36
Created on 2019-02-17 by the reprex package (v0.2.1)
If I understand correctly what you need, it's better to use a custom print method:
print.squared_value = function(x, ...){
cat('a squared is', x, "\n")
x
}
fun = function(a){
b = a^2
class(b) = union("squared_value", class(b))
b
}
fun(2)
# a squared is 4
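To see why this behaves differently for assigned and unassigned results, the sketch below captures what printing would show. It restates the method with an invisible return value, which is an assumption on my part about how a print method would usually be written, not part of the answer above.

```r
print.squared_value <- function(x, ...) {
  cat("a squared is", unclass(x), "\n")
  invisible(x)  # return the object without triggering another print
}

fun <- function(a) {
  b <- a^2
  class(b) <- c("squared_value", class(b))
  b
}

out <- fun(3)                          # assigned: nothing is printed here
printed <- capture.output(print(out))  # what auto-printing would show
printed
```

The message only appears when the value is printed, which at the console happens exactly when the result is not assigned.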
UPDATE:
fun = function(a){
b = a^2
invisible(b)
}
h = taskCallbackManager()
# add a callback
h$add(function(expr, value, ok, visible) {
# if it was a call to 'fun' without assignment
if(is.call(expr) && identical(expr[[1]], quote(fun))){
cat('a squared is', value, "\n")
}
return(TRUE)
}, name = "simpleHandler")
fun(2)
# a squared is 4
b = fun(2)
b
# [1] 4
# remove handler
removeTaskCallback("R-taskCallbackManager")
If I understood well, this could do the trick:
fun <- function(a){
b <- a^2
if(sum(unlist(lapply(lapply(ls(envir = .GlobalEnv), get), function(x){ identical(x,a^2)})))==0) cat('a squared is ', b)
return(invisible(b))
}
So:
ls(envir=.GlobalEnv) will return all objects in your global environment
lapply(ls(envir = .GlobalEnv), get): will return a list with the content of all objects in your global environment
lapply(lapply(ls(envir = .GlobalEnv), get), function(x){ identical(x,a^2)}): will return a list of logicals indicating whether the content of each object in your global environment is identical to the output of your function
sum(unlist(lapply(lapply(ls(envir = .GlobalEnv), get), function(x){ identical(x,a^2)})))==0: if none of the objects has content identical to the output of your function, then... cat!
I hope this helps you!
Best!
tl;dr / summary
I have a function that takes a list with many elements as an argument. I would like to be able to work with those elements directly by assigning them within the function. As I have many elements, I would like to use the walk functions from purrr (part of the tidyverse).
library(tidyverse)
x <- list(a = 1, b = 2)
func <- function(x){
walk2(names(x), x, assign, envir = {some environment})
{Rest of the function, where I can use a and b instead of x$a and x$b}
}
What environment should I use? And no, parent.frame() does not work; see below.
Why parent.frame() doesn't work
Step 1: No functions - Here parent.frame() is necessary (because you're using assign inside a function). But if you do, everything works fine.
x <- list(a = 1, b = 2)
a <- 0
b <- 0
# Doesn't work:
walk2(names(x), x, assign)
a
b
# Does work:
walk2(names(x), x, assign, envir = parent.frame())
a
b
Step 2: One function - Things get more complicated when you do the same inside a function. Using parent.frame() as environment will cause the variables to change in the global environment as well.
func <- function(x) {
walk2(names(x), x, assign)
print(paste0("a inside func without parent.frame() is ", a))
print(paste0("b inside func without parent.frame() is ", b))
walk2(names(x), x, assign, envir = parent.frame())
print(paste0("a inside func with parent.frame() is ", a))
print(paste0("b inside func with parent.frame() is ", b))
}
# Using envir = parent.frame() seems to work at first
x <- list(a = 1, b = 2)
a <- 0
b <- 0
func(x)
# But a and b get overwritten in the global environment as well (unwanted)
a
b
This is not desirable. This does not happen when you assign in a "normal" way as shown by the following example.
# How it should work
func_desired <- function(x){
a <- x[[1]]
b <- x[[2]]
print(paste0("a inside func_alt is ", a))
print(paste0("b inside func_alt is ", b))
}
# a and b are correctly assigned within the function
x <- list(a = 1, b = 2)
a <- 0
b <- 0
func_desired(x)
# But outside of the function a and b are not altered
a
b
Step 3: Nested functions - Here it becomes clear that using parent.frame() will assign the variables not in the environment where walk is called, but the one behind that.
func2 <- function(x){
func(x)
print(paste0("a inside func2 is ", a))
print(paste0("b inside func2 is ", b))
}
# a and b are assigned 1 and 2 inside func2,
# But in the global environment they stay unchanged (both 0)
# func (called from func2) will look in the global environment and find a = b = 0
x <- list(a = 1, b = 2)
a <- 0
b <- 0
func2(x)
a
b
My question
What environment should I use so that this works as desired, i.e. all variables are assigned to their names in the list, but only within the environment of the function where walk is called?
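One candidate is the function's own evaluation environment, which environment() returns when called with no arguments inside the function body. The sketch below uses base R's list2env rather than walk2 so that it runs without extra packages; capturing environment() in a variable first and handing that to assign via envir should behave the same way, though that part is an assumption not tested here.

```r
func <- function(x) {
  # copy every element of x into this function's own frame
  list2env(x, envir = environment())
  a + b
}

a <- 0
b <- 0
res <- func(list(a = 1, b = 2))
res

# the variables outside the function are untouched
c(a = a, b = b)
```

Unlike parent.frame(), which points at whoever called func, environment() refers to the frame of the current call, so nothing leaks to the caller.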
Update 2
@G. Grothendieck posted several approaches. The second one changes the function's environment inside another function. This solves my problem of too much duplicated code. I am not sure whether this method will pass the CRAN check when I turn my scripts into a package. I will update again when I have some conclusions.
Update
I am trying to pass a lot of input variables to f2 and do not want to index every variable inside the function as env$c, env$d, env$calls; that is why I tried to use with in f5 and f6 (a modified f2). However, assign does not work inside the {} of with. Moving assign outside with does the job, but in my real case I have several assign calls inside the with expression, and I do not know how to move them out easily.
Here is an example:
## In the <environment: R_GlobalEnv>
a <- 1
b <- 2
f1 <- function(){
c <- 3
d <- 4
f2 <- function(P){
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1()
Function f2 is defined inside f1, so when f2 is called it looks up the variables calls, c, d in f1's execution environment (f2's enclosing environment). This is what I wanted.
However, since I also want to use f2 in other functions, I would rather define it in the Global environment; call it f4.
f4 <- function(P){
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
This won't work, because it will look for calls,c,d in the Global environment instead of inside a function where the function is called. For example:
f3 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f4(P=0) ## or replace here with f5(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f3()
The safe way would be to define calls, c, d as input arguments of f4 and pass these parameters into f4. However, in my case there are too many variables to pass into f4, and it would be better if I could pass them as an environment and tell f4 not to look in the Global environment (environment(f4)), but only inside the environment from which f3 calls it.
The way I solve it now is to use the environment as a list and use the with function.
f5 <- function(P,liste){
with(liste,{
assign("calls", calls+1, inherits=TRUE)
print(calls)
return(P+c+d)
}
)
}
f3 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f5(P=0,as.list(environment())) ## pass a copy of f3's variables as a list
c <- c+1
d <- d+1
}
return(v)
}
f3()
However, assign("calls", calls+1, inherits=TRUE) now does not work as it should, since assign no longer modifies the original object. The variable calls is connected to an optimization function whose objective function is f5; that is why I use assign instead of passing calls as an input argument. How to use attach here is also not clear to me. Here is my way to correct the assign issue:
f7 <- function(P,calls,liste){
##calls <<- calls+1
##browser()
assign("calls", calls+1, inherits=TRUE,envir = sys.frame(-1))
print(calls)
with(liste,{
print(paste('with the listed environment, calls=',calls))
return(P+c+d)
}
)
}
########
##################
f8 <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
##browser()
##v[i] <- f4(P=0) ## or replace here with f5(P=0)
v[i] <- f7(P=0,calls,liste=as.list(environment()))
c <- c+1
d <- d+1
}
f7(P=0,calls,liste=as.list(environment()))
print(paste('final call number',calls))
return(v)
}
f8()
I am not sure how this should be done in R. Am I heading in the right direction, especially with regard to passing the CRAN check? Does anyone have any hints on this?
(1) Pass caller's environment. You can explicitly pass the parent environment and index into it. Try this:
f2a <- function(P, env = parent.frame()) {
env$calls <- env$calls + 1
print(env$calls)
return(P + env$c + env$d)
}
a <- 1
b <- 2
# same as f1 except f2 removed and call to f2 replaced with call to f2a
f1a <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2a(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1a()
(2) Reset called function's environment. We can reset the environment of f2b in f1b as shown here:
f2b <- function(P) {
calls <<- calls + 1
print(calls)
return(P + c + d)
}
a <- 1
b <- 2
# same as f1 except f2 removed, call to f2 replaced with call to f2b
# and line marked ## at the beginning is new
f1b <- function(){
environment(f2b) <- environment() ##
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2b(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1b()
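A quick way to check what approach (2) does to the original function is sketched below. It is a reduced version of f1b without the loop, and the names defining_env, res and unchanged are made up for the check.

```r
f2b <- function(P) {
  calls <<- calls + 1
  P + c + d
}
defining_env <- environment(f2b)  # where f2b was originally defined

f1b <- function() {
  # the replacement creates a copy of f2b that is local to this call
  environment(f2b) <- environment()
  c <- 3
  d <- 4
  calls <- 0
  out <- f2b(P = 0)
  list(out = out,
       calls = calls,
       rebound = identical(environment(f2b), environment()))
}

res <- f1b()
# the f2b defined outside f1b still has its original environment
unchanged <- identical(environment(f2b), defining_env)
str(res)
unchanged
```

The copy inside f1b sees and updates f1b's variables, while the f2b defined outside is left as it was, so other callers can rebind it to their own frames in the same way.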
(3) Macro using eval.parent(substitute(...)). Yet another approach is to define a macro-like construct which effectively injects the body of f2c inline into f1c. Here f2c is the same as f2b except for the calls <- calls + 1 line (no <<- needed) and the wrapping of the entire body in eval.parent(substitute({...})). f1c is the same as f1a except the call to f2a is replaced with a call to f2c.
f2c <- function(P) eval.parent(substitute({
calls <- calls + 1
print(calls)
return(P + c + d)
}))
a <- 1
b <- 2
f1c <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2c(P=0)
c <- c+1
d <- d+1
}
return(v)
}
f1c()
(4) defmacro. This is almost the same as the last solution except it uses defmacro in the gtools package to define the macro rather than doing it ourselves. (Also see the Rcmdr package for another defmacro version.) Because of the way defmacro works we must also pass calls, but since it's a macro and not a function this just tells it to substitute calls in and is not the same as passing calls to a function.
library(gtools)
f2d <- defmacro(P, calls, expr = {
calls <- calls + 1
print(calls)
return(P + c + d)
})
a <- 1
b <- 2
f1d <- function(){
c <- 3
d <- 4
calls <- 0
v <- vector()
for(i in 1:10){
v[i] <- f2d(P=0, calls)
c <- c+1
d <- d+1
}
return(v)
}
f1d()
In general, I would say that any variable needed inside a function should be passed in through its arguments. In addition, if its value is needed later, pass it back from the function. Not doing this can quite quickly lead to strange results: for example, if multiple functions define a variable x, which one should be used? If the number of variables is large, you can create a custom data structure for them, e.g. by putting them into a named list.
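As a rough sketch of that advice applied to the example from the question, the shared variables can travel in a named list that the inner function receives and hands back. The names step, run and state are hypothetical, and the loop is shortened to three iterations.

```r
# inner function: takes the state explicitly and returns the updated state
step <- function(P, state) {
  state$calls <- state$calls + 1
  list(value = P + state$c + state$d, state = state)
}

run <- function() {
  state <- list(c = 3, d = 4, calls = 0)
  v <- numeric(3)
  for (i in seq_along(v)) {
    res <- step(P = 0, state)
    v[i] <- res$value
    state <- res$state  # keep the updated call counter
    state$c <- state$c + 1
    state$d <- state$d + 1
  }
  list(values = v, calls = state$calls)
}

out <- run()
str(out)
```

Adding another variable later only means adding an element to the list; the signature of step stays the same.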
One could also use a function that redefines other functions in the specified environment.
test_var <- "global"
get_test_var <- function(){
return(test_var)
}
some_function <- function(){
test_var <- "local"
return(get_test_var())
}
some_function() # Returns "global". Not what we want here...
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
some_function2 <- function(){
test_var <- "local"
# define function locally
get_test_var2 <- function(){
return(test_var)
}
return(get_test_var2())
}
some_function2() # Returns "local", but 'get_test_var2' can't be used in other places.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
add_function_to_envir <- function(my_function_name, to_envir) {
script_text <- capture.output(eval(parse(text = my_function_name)))
script_text[1] <- paste0(my_function_name, " <- ", script_text[1])
eval(parse(text = script_text), envir = to_envir)
}
some_function3 <- function(){
test_var <- "local"
add_function_to_envir("get_test_var", environment())
return(get_test_var())
}
some_function3() # Returns "local" and we can use 'get_test_var' from anywhere.
Here add_function_to_envir(my_function_name, to_envir) captures the script of the function, parses and reevaluates it in the new environment.
Note: the name of the function for my_function_name needs to be in quotes.
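A possibly simpler sketch of the same idea avoids deparsing and reparsing by copying the function object and changing the copy's environment. rebind_function and some_function4 are hypothetical names introduced for this illustration.

```r
test_var <- "global"

get_test_var <- function() {
  test_var
}

# return a copy of 'fun' whose enclosing environment is 'to_envir'
rebind_function <- function(fun, to_envir) {
  environment(fun) <- to_envir
  fun
}

some_function4 <- function() {
  test_var <- "local"
  here <- environment()
  # local copy that resolves free variables in this call's frame first
  get_test_var <- rebind_function(get_test_var, here)
  get_test_var()
}

res_local <- some_function4()
res_global <- get_test_var()
c(res_local, res_global)
```

Here the function is passed as an object, so its name does not need to be quoted, and the global get_test_var keeps returning the global value.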
Whenever I use nested functions and don't pass the variables on as arguments, but instead pass them on with ..., I use the following function in all nested functions to get variables from the parent environment.
LoadVars <- function(variables, ...){
for (var in seq_along(variables)) {
v <- get(variables[var], envir = parent.frame(n=2))
assign(variables[var], v, envir = parent.frame(n=1))
}
}
Inside a nested function, I then LoadVars(c("foo", "bar")).
This approach is useful in the sense that you only pass on the variables you need, similar to when you pass on the variables through arguments.
Approach 2
However, it is simple to rewrite this function to load all variables from the parent function, or from higher up if needed: just increase the n value in parent.frame from its original value of 2.
LoadVars <- function(){
variables <- ls(envir = parent.frame(n=2))
for (var in seq_along(variables)) {
v <- get(variables[var], envir = parent.frame(n=2))
assign(variables[var], v, envir = parent.frame(n=1))
}
}
Example
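# helper assumed by this example: printf is not part of base R
printf <- function(...) print(sprintf(...))
a <- 1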
A <- function(...){
b <- 2
printf("A, a = %s", a)
printf("A, b = %s", b)
B()
}
B <- function(...){
LoadVars()
printf("B, a = %s", a)
printf("B, b = %s", b)
}
A()
If you don't load the variables in B, B can still find a, because it lives in the global environment, but not b, which is local to A().
Output:
[1] "A, a = 1"
[1] "A, b = 2"
[1] "B, a = 1"
[1] "B, b = 2"
I have a function in R that I call multiple times. I want to keep track of the number of times that I've called it and use that to make decisions on what to do inside of the function. Here's what I have right now:
f = function( x ) {
count <<- count + 1
return( mean(x) )
}
count = 1
numbers = rnorm( n = 100, mean = 0, sd = 1 )
for ( x in seq(1,100) ) {
mean = f( numbers )
print( count )
}
I don't like that I have to declare the variable count outside the scope of the function. In C or C++ I could just make a static variable. Can I do a similar thing in the R programming language?
Here's one way by using a closure (in the programming language sense), i.e. store the count variable in an enclosing environment accessible only by your function:
make.f <- function() {
count <- 0
f <- function(x) {
count <<- count + 1
return( list(mean=mean(x), count=count) )
}
return( f )
}
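f1 <- make.f()
result <- f1(1:10)
# print() treats its second argument as 'digits', so combine the values first
print(c(count = result$count, mean = result$mean))
result <- f1(1:10)
print(c(count = result$count, mean = result$mean))
f2 <- make.f()
result <- f2(1:10)
print(c(count = result$count, mean = result$mean))
result <- f2(1:10)
print(c(count = result$count, mean = result$mean))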
Here is another approach. This one requires less typing and is (in my opinion) more readable:
f <- function(x) {
y <- attr(f, "sum")
if (is.null(y)) {
y <- 0
}
y <- x + y
attr(f, "sum") <<- y
return(y)
}
This snippet, as well as a more complex example of the concept, can be found in this R-Bloggers article
It seems the right answer was given by G. Grothendieck in "Emulating static variable within R functions", but somehow this post got a more favorable position in Google search, so I am copying that answer here:
Define f within a local like this:
f <- local({
static <- 0
function() { static <<- static + 1; static }
})
f()
## [1] 1
f()
## [1] 2
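Applied to the function from the question, the same pattern might look like the sketch below. The first_call element is only an illustration of using the counter to make a decision, and resetting the counter through environment(f) is an extra not mentioned in the answer above.

```r
f <- local({
  count <- 0  # private to f, survives between calls
  function(x) {
    count <<- count + 1
    list(mean = mean(x), count = count, first_call = count == 1)
  }
})

r1 <- f(1:10)
r2 <- f(1:10)
c(r1$count, r2$count)

# the counter lives in f's enclosing environment and can be reset from outside
assign("count", 0, envir = environment(f))
r3 <- f(1:10)
r3$count
```

No count variable is needed in the global environment, which was the original concern.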