I'm studying R environments but I have got a question reading the "Advanced R" book by Hadley Wickham. Is it possible to make the execution environment of a function permanent?
I will try to explain the why of my question.
When Wickham explains how the execution environment of a function works the following example is shown:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
j()
I have understood why every time the function j gets called the returned value is 1.
In another part of the text he says:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
So I have thought to create the following function:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
g <- function() {}
g()
}
j()
But the returned value is always 1, so I suppose it's because the execution environment continues to be destroyed every time. What does that "no longer ephemeral" mean?
Based on the book there is also possible to use function factory structure (a function that creates another function) to capture the ephemeral execution environment of the first function. The following example is just a simple way of how we could capture it:
library(rlang)
j <- function() {
print(current_env())
a <- 1
k <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
}
plus <- j()
> plus <- j()
<environment: 0x000001ca98bc1598>
Now no matter how many times you use the function plus, its environment will always be the then execution environment of the first function:
library(rlang)
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
plus()
[1] 2
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
I hope this to some extent answered your question, however there might be better answers out there too.
A permanent environment within a function is called a "closure". Here a toy example to demonstrate this. Check it out and then modify your code accordingly.
closure <- function(j) {
i <- 1
function(x) {
i <<- i + 1
j * i + x
}
}
i <- 12345
instance <- closure(11)
instance(3)
#[1] 25
instance(3)
#[1] 36
instance(3)
#[1] 47
otherObj <- closure(2)
otherObj(3)
#[1] 7
instance(2)
#[1] 57
Suppose I have the following function:
## Just an example
f = function() {
for(i in 1:10000)
cat(i)
return(1)
}
When I call f() is there a way to stop cat printing to the screen (without altering the function in anyway)?
Reason behind this question
My students upload their R files. I then run the scripts and check to see if they are correct. Every so often, a student leaves in the cat command. This is especially irritating when it's in a long for loop
On Linux, you can use a sink() call to /dev/null(or to a temporary file on another OS, see ?tempfile) :
sink(file="/dev/null")
f()
sink()
This should work?
oldcat = cat
cat = function( ..., file="", sep=" ", fill=F, labels=NULL, append=F ) {}
f()
cat = oldcat
Just replace cat with an empty function, and then set it back on completion
Here is a funny hack that comments out all the cat()'s in a function. Not sure if this gives errors or breaks the function though:
foo <- deparse(f)
f <- eval(parse(text=gsub("cat","#cat",foo)))
f()
[1] 1
Edit:
Another option is basically Juba's answer, using sink, but you can use the Defaults package to change the default behavior of cat. The file argument basically sinks its output in a file. So :
library("Defaults")
setDefaults(cat,file="sink.txt")
f()
Ensures that only output of cat and not print or so is sinked. However, this drastically reduces the runtime since now a file is opened and closed everytime cat() is run.
capture.output() with invisible() does what you want:
f <- function() {
cat("Hello")
return(TRUE)
}
f1 <- function() {
invisible(capture.output(f()))
}
x <- f1()
This also works:
f2 <- function() {
tmp <- tempfile()
sink(tmp)
on.exit(sink())
on.exit(file.remove(tmp), add = TRUE)
invisible(force(f()))
}
x <- f2()
The function quietly() from the purrr library creates a quiet version of a function:
library(purrr)
f <- function() {
cat("Hello")
return(TRUE)
}
f2 <- quietly(f)
f2()
#> $result
#> [1] TRUE
#>
#> $output
#> [1] "Hello"
#>
#> $warnings
#> character(0)
#>
#> $messages
#> character(0)
I want to be able to find the environment from which the ... (dots) arguments of a call originate.
Scenario
For example, consider a function
foo <- function(x, ...) {
# do something
}
We want a function env_dots(), which we invoke from within foo(), that finds the originating environment of the ... in a call to foo(), even when the call to foo() is deeply nested. That is, if we define
foo <- function(x, ...) {
# find the originating environment of '...'
env <- env_dots()
# do something
}
and nest a call to foo, like so,
baz <- function(...) {
a <- "You found the dots"
bar(1, 2)
}
bar <- function(...)
foo(...)
then calling baz() should return the environment in which the ... in the (nested) call to foo(...) originates: this is the environment where the call bar(1, 2) is made, since the 2 (but not the 1) gets passed to the dots of foo. In particular, we should get
baz()$a
#> [1] "You found the dots"
Naive implementation of env_dots()
Update — env_dots(), as defined here, will not work in general, because the final ... may be populated by arguments that are called at multiple levels of the call stack.
Here's one possibility for env_dots():
# mc: match.call() of function from which env_dots() is called
env_dots <- function(mc) {
# Return NULL if initial call invokes no dots
if (!rlang::has_name(mc, "...")) return(NULL)
# Otherwise, climb the call stack until the dots origin is found
stack <- rlang::call_stack()[-1]
l <- length(stack)
i <- 1
while (i <= l && has_dots(stack[[i]]$expr)) i <- i + 1
# return NULL if no dots invoked
if (i <= l) stack[[i + 1]]$env else NULL
}
# Does a call have dots?
has_dots <- function(x) {
if (is.null(x))
return(FALSE)
args <- rlang::lang_tail(x)
any(vapply(args, identical, logical(1), y = quote(...)))
}
This seems to work: with
foo <- function(x, ...)
env_dots(match.call(expand.dots = FALSE))
we get
baz()$a
#> [1] "You found the dots"
bar(1, 2) # 2 gets passed down to the dots of foo()
#> <environment: R_GlobalEnv>
bar(1) # foo() captures no dots
#> NULL
Questions
The above implementation of env_dots() is not very efficient.
Is there are more skillful way to implement env_dots() in rlang and/or base R?
How can I move the match.call() invocation to within env_dots()?
match.call(sys.function(-1), call = sys.call(-1), expand.dots = FALSE) will indeed work.
Remark — One can't infer the origin environment of the dots from rlang::quos(...), because some quosures won't be endowed with the calling environment (e.g., when an expression is a literal object).
I'm sorry to dig up an old question, but I'm not sure the desired behavior is well-defined. ... is not a single expression; it's a list of expressions. In case of rlang quosures, each of those expressions has their own environment. So what should the environment of the list be?
Furthermore, the ... list itself can be modified. Consider the following example, where g takes its ..., prepends it with an (unevaluated) expression x+3 and passes it onto f.
f <- function(...) {rlang::enquos( ... )}
g <- function(...) {
a <- rlang::quo( x + 3 )
l <- rlang::list2( a, ... )
f(!!!l)
}
b <- rlang::quo( 5 * y )
g( b, 10 )
# [[1]]
# <quosure>
# expr: ^x + 3
# env: 0x7ffd1eca16f0
# [[2]]
# <quosure>
# expr: ^5 * y
# env: global
# [[3]]
# <quosure>
# expr: ^10
# env: empty
Notice that each of the three quosures that make it over to f has their own environment. (As you noted in your question, literals like 10 have an empty environment. This is because the value is the same independent of which environment it's evaluated in.)
Given this scenario, what should the hypothetical env_dots() return when called inside f()?
This is probably not correct terminology, but hopefully I can get my point across.
I frequently end up doing something like:
myVar = 1
f <- function(myvar) { return(myVar); }
# f(2) = 1 now
R happily uses the variable outside of the function's scope, which leaves me scratching my head, wondering how I could possibly be getting the results I am.
Is there any option which says "force me to only use variables which have previously been assigned values in this function's scope"? Perl's use strict does something like this, for example. But I don't know that R has an equivalent of my.
EDIT: Thank you, I am aware of that I capitalized them differently. Indeed, the example was created specifically to illustrate this problem!
I want to know if there is a way that R can automatically warn me when I do this.
EDIT 2: Also, if Rkward or another IDE offers this functionality I'd like to know that too.
As far as I know, R does not provide a "use strict" mode. So you are left with two options:
1 - Ensure all your "strict" functions don't have globalenv as environment. You could define a nice wrapper function for this, but the simplest is to call local:
# Use "local" directly to control the function environment
f <- local( function(myvar) { return(myVar); }, as.environment(2))
f(3) # Error in f(3) : object 'myVar' not found
# Create a wrapper function "strict" to do it for you...
strict <- function(f, pos=2) eval(substitute(f), as.environment(pos))
f <- strict( function(myvar) { return(myVar); } )
f(3) # Error in f(3) : object 'myVar' not found
2 - Do a code analysis that warns you of "bad" usage.
Here's a function checkStrict that hopefully does what you want. It uses the excellent codetools package.
# Checks a function for use of global variables
# Returns TRUE if ok, FALSE if globals were found.
checkStrict <- function(f, silent=FALSE) {
vars <- codetools::findGlobals(f)
found <- !vapply(vars, exists, logical(1), envir=as.environment(2))
if (!silent && any(found)) {
warning("global variables used: ", paste(names(found)[found], collapse=', '))
return(invisible(FALSE))
}
!any(found)
}
And trying it out:
> myVar = 1
> f <- function(myvar) { return(myVar); }
> checkStrict(f)
Warning message:
In checkStrict(f) : global variables used: myVar
checkUsage in the codetools package is helpful, but doesn't get you all the way there.
In a clean session where myVar is not defined,
f <- function(myvar) { return(myVar); }
codetools::checkUsage(f)
gives
<anonymous>: no visible binding for global variable ‘myVar’
but once you define myVar, checkUsage is happy.
See ?codetools in the codetools package: it's possible that something there is useful:
> findGlobals(f)
[1] "{" "myVar" "return"
> findLocals(f)
character(0)
You need to fix the typo: myvar != myVar. Then it will all work...
Scope resolution is 'from the inside out' starting from the current one, then the enclosing and so on.
Edit Now that you clarified your question, look at the package codetools (which is part of the R Base set):
R> library(codetools)
R> f <- function(myVAR) { return(myvar) }
R> checkUsage(f)
<anonymous>: no visible binding for global variable 'myvar'
R>
Using get(x, inherits=FALSE) will force local scope.
myVar = 1
f2 <- function(myvar) get("myVar", inherits=FALSE)
f3 <- function(myvar){
myVar <- myvar
get("myVar", inherits=FALSE)
}
output:
> f2(8)
Error in get("myVar", inherits = FALSE) : object 'myVar' not found
> f3(8)
[1] 8
You are of course doing it wrong. Don't expect static code checking tools to find all your mistakes. Check your code with tests. And more tests. Any decent test written to run in a clean environment will spot this kind of mistake. Write tests for your functions, and use them. Look at the glory that is the testthat package on CRAN.
There is a new package modules on CRAN which addresses this common issue (see the vignette here). With modules, the function raises an error instead of silently returning the wrong result.
# without modules
myVar <- 1
f <- function(myvar) { return(myVar) }
f(2)
[1] 1
# with modules
library(modules)
m <- module({
f <- function(myvar) { return(myVar) }
})
m$f(2)
Error in m$f(2) : object 'myVar' not found
This is the first time I use it. It seems to be straightforward so I might include it in my regular workflow to prevent time consuming mishaps.
you can dynamically change the environment tree like this:
a <- 1
f <- function(){
b <- 1
print(b)
print(a)
}
environment(f) <- new.env(parent = baseenv())
f()
Inside f, b can be found, while a cannot.
But probably it will do more harm than good.
You can test to see if the variable is defined locally:
myVar = 1
f <- function(myvar) {
if( exists('myVar', environment(), inherits = FALSE) ) return( myVar) else cat("myVar was not found locally\n")
}
> f(2)
myVar was not found locally
But I find it very artificial if the only thing you are trying to do is to protect yourself from spelling mistakes.
The exists function searches for the variable name in the particular environment. inherits = FALSE tells it not to look into the enclosing frames.
environment(fun) = parent.env(environment(fun))
will remove the 'workspace' from your search path, leave everything else. This is probably closest to what you want.
#Tommy gave a very good answer and I used it to create 3 functions that I think are more convenient in practice.
strict
to make a function strict, you just have to call
strict(f,x,y)
instead of
f(x,y)
example:
my_fun1 <- function(a,b,c){a+b+c}
my_fun2 <- function(a,b,c){a+B+c}
B <- 1
my_fun1(1,2,3) # 6
strict(my_fun1,1,2,3) # 6
my_fun2(1,2,3) # 5
strict(my_fun2,1,2,3) # Error in (function (a, b, c) : object 'B' not found
checkStrict1
To get a diagnosis, execute checkStrict1(f) with optional Boolean parameters to show more ore less.
checkStrict1("my_fun1") # nothing
checkStrict1("my_fun2") # my_fun2 : B
A more complicated case:
A <- 1 # unambiguous variable defined OUTSIDE AND INSIDE my_fun3
# B unambiguous variable defined only INSIDE my_fun3
C <- 1 # defined OUTSIDE AND INSIDE with ambiguous name (C is also a base function)
D <- 1 # defined only OUTSIDE my_fun3 (D is also a base function)
E <- 1 # unambiguous variable defined only OUTSIDE my_fun3
# G unambiguous variable defined only INSIDE my_fun3
# H is undeclared and doesn't exist at all
# I is undeclared (though I is also base function)
# v defined only INSIDE (v is also a base function)
my_fun3 <- function(a,b,c){
A<-1;B<-1;C<-1;G<-1
a+b+A+B+C+D+E+G+H+I+v+ my_fun1(1,2,3)
}
checkStrict1("my_fun3",show_global_functions = TRUE ,show_ambiguous = TRUE , show_inexistent = TRUE)
# my_fun3 : E
# my_fun3 Ambiguous : D
# my_fun3 Inexistent : H
# my_fun3 Global functions : my_fun1
I chose to show only inexistent by default out of the 3 optional additions. You can change it easily in the function definition.
checkStrictAll
Get a diagnostic of all your potentially problematic functions, with the same parameters.
checkStrictAll()
my_fun2 : B
my_fun3 : E
my_fun3 Inexistent : H
sources
strict <- function(f1,...){
function_text <- deparse(f1)
function_text <- paste(function_text[1],function_text[2],paste(function_text[c(-1,-2,-length(function_text))],collapse=";"),"}",collapse="")
strict0 <- function(f1, pos=2) eval(substitute(f1), as.environment(pos))
f1 <- eval(parse(text=paste0("strict0(",function_text,")")))
do.call(f1,list(...))
}
checkStrict1 <- function(f_str,exceptions = NULL,n_char = nchar(f_str),show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
f <- try(eval(parse(text=f_str)),silent=TRUE)
if(inherits(f, "try-error")) {return(NULL)}
vars <- codetools::findGlobals(f)
vars <- vars[!vars %in% exceptions]
global_functions <- vars %in% functions
in_global_env <- vapply(vars, exists, logical(1), envir=globalenv())
in_local_env <- vapply(vars, exists, logical(1), envir=as.environment(2))
in_global_env_but_not_function <- rep(FALSE,length(vars))
for (my_mode in c("logical", "integer", "double", "complex", "character", "raw","list", "NULL")){
in_global_env_but_not_function <- in_global_env_but_not_function | vapply(vars, exists, logical(1), envir=globalenv(),mode = my_mode)
}
found <- in_global_env_but_not_function & !in_local_env
ambiguous <- in_global_env_but_not_function & in_local_env
inexistent <- (!in_local_env) & (!in_global_env)
if(typeof(f)=="closure"){
if(any(found)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),":", paste(names(found)[found], collapse=', '),"\n"))}
if(show_ambiguous & any(ambiguous)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Ambiguous :", paste(names(found)[ambiguous], collapse=', '),"\n"))}
if(show_inexistent & any(inexistent)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Inexistent :", paste(names(found)[inexistent], collapse=', '),"\n"))}
if(show_global_functions & any(global_functions)){cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Global functions :", paste(names(found)[global_functions], collapse=', '),"\n"))}
return(invisible(FALSE))
} else {return(invisible(TRUE))}
}
checkStrictAll <- function(exceptions = NULL,show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
n_char <- max(nchar(functions))
invisible(sapply(functions,checkStrict1,exceptions,n_char = n_char,show_global_functions,show_ambiguous, show_inexistent))
}
What works for me, based on #c-urchin 's answer, is to define a script which reads all my functions and then excludes the global environment:
filenames <- Sys.glob('fun/*.R')
for (filename in filenames) {
source(filename, local=T)
funname <- sub('^fun/(.*).R$', "\\1", filename)
eval(parse(text=paste('environment(',funname,') <- parent.env(globalenv())',sep='')))
}
I assume that
all functions and nothing else are contained in the relative directory ./fun and
every .R file contains exactly one function with an identical name as the file.
The catch is that if one of my functions calls another one of my functions, then the outer function has to also call this script first, and it is essential to call it with local=T:
source('readfun.R', local=T)
assuming of course that the script file is called readfun.R.
Is it possible to add an on.exit expr to the parent call? If so, how?
For example, say that parentOnExit(expr) is a function implementing this. Then for the following code:
f <- function() {
parentOnExit(print("B"))
print("A")
}
I want to see "A" printed, then "B".
Background: What brought this to mind was the following... we have a collection of functions, some of which call others, which require a resource that should be shared from the topmost call down and which also should be closed upon exiting the topmost function. Eg, a connection to a remote server which is expensive to open. One pattern for this is:
foo <- function(r=NULL) {
if (is.null(r)) { # If we weren't passed open connection, open one
r <- openR()
on.exit(close(r))
}
bar(r=r) # Pass the open connection down
}
I was hoping to abstract those three lines down to:
r <- openIfNull(r) # Magically call on.exit(close(r)) in scope of caller
Now that I think about it though, perhaps it's worth some repeated code to avoid anything too magical. But still I'm curious about the answer to my original question. Thank you!
I have seen in this recent mail discussion (https://stat.ethz.ch/pipermail/r-devel/2013-November/067874.html) that you can use do.call for this:
f <- function() { do.call("on.exit", list(quote(cat('ONEXIT!\n'))), envir = parent.frame()); 42 }
g <- function() { x <- f(); cat('Not yet!\n'); x }
g()
#Not yet!
#ONEXIT!
#[1] 42
Using this feature and an additional ugly trick to pass the R connection object to the caller environment, it seems to solve the problem:
openR <- function(id = "connection1") {
message('openR():', id)
list(id)
}
closeR <- function(r) {
message('closeR():', r[[1]])
}
openRIfNull <- function(r) {
if (length(r)) return(r)
# create the connection
r <- openR("openRIfNull")
# save it in the parent call environment
parent_env <- parent.frame()
assign("..openRIfNull_r_connection..", r, envir = parent_env)
do.call("on.exit", list(quote(closeR(..openRIfNull_r_connection..))), envir = parent_env)
r
}
foo <- function(r = NULL) {
message('entered foo()')
r <- openRIfNull(r)
bar(r = r) # Pass the open connection down
message('exited foo()')
}
bar <- function(r) {
message('bar()')
}
example use:
foo()
# entered foo()
# openR():openRIfNull
# bar()
# exited foo()
# closeR():openRIfNull
foo(openR('before'))
# entered foo()
# openR():before
# bar()
# exited foo()
I was intrigued by the problem and tried a couple of ways to solve it. Unfortunately, they didn't work. I'm therefore inclined to believe that it can't be done. ...But someone else might be able to prove me wrong!
Anyway, I though I'd post my failed attempts so that they are recorded. I made them so that they would print "ONEXIT!" after "Not yet!" if they worked...
1 - First, simply try to evaluate the on.exit in the parent environment:
f <- function() { eval(on.exit(cat('ONEXIT!\n')), parent.frame()); 42 }
g <- function() { x<-f(); cat('Not yet!\n'); x }
g() # Nope, doesn't work!
This doesn't work, probably because the on.exit function adds stuff to the current stack frame, not the current environment.
2 - Step up the game and try to return an expression that is evaluated by the caller:
f <- function() { quote( {on.exit(cat('ONEXIT!\n')); 42}) }
g <- function() { x<-eval(f()); cat('Not yet!\n'); x }
g() # Nope, doesn't work!
This doesn't work either, probably because eval has its own stack frame, different from g.
3 - Bring my A-game, and try to rely on lazy evaluation:
h <- function(x) sys.frame(sys.nframe())
f <- function() { h({cat('Registering\n');on.exit(cat("ONEXIT!\n"));42}) }
g <- function() { x<-f()$x; cat('Not yet!\n'); x }
g() # Worse, "ONEXIT!" is never printed...
This one returns an environment to the caller, and when the caller accesses "x" in it, the expression including on.exit is evaluated. ...But it seems on.exit does not register at all in this case.
4 - Hmm. There is one way that might still work: a .Call to some C code that calls on.exit. It might be that calling C won't add another stack frame... This is a bit too complex for me to test now, but maybe some RAPI/RCpp guru could give it a shot?
I remain confused, but if Tommy can't do it, I suspect I won't be able to either. This does the first task and since it seemed so simple I thought I must be missing something:
f <- function() {
on.exit(print("B"))
print("A")
}
Second effort:
txtB <- textConnection("test b")
txt <-textConnection("test A")
f <- function(con) { df <- read.table(con);
if( isOpen(txtB)){ print("B open")
eval( close(txtB), env=.GlobalEnv ) }
return(df) }
txtB #just to make sure it's still open
# description class mode text
# "\"test b\"" "textConnection" "r" "text"
# opened can read can write
# "opened" "yes" "no"
dat <- f(txt); dat
#[1] "B open"
# V1 V2
#1 test A
txtB
#Error in summary.connection(x) : invalid connection
(OK, I edited it to close a connection within the calling environment.)
So what am I missing? (It wasn't clear to me as I tested this that connections actually have environments.)
Though this question is quite old, there is a simple fix for any future visitors:
Use add=TRUE (I don't find the documentation very clear.)
f <- function() {
on.exit(expr = print("B"),
add = TRUE)
print("A")
}
A another solution is using withr::defer() which has more options and better documentation.
The vignette is especially helpful.