eval: why does enclos = parent.frame() make a difference? - r

In this question, the following throws an error:
subset2 = function(df, condition) {
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
condition = 3
subset2(df, a < condition)
## Error in eval(substitute(condition), df) : object 'a' not found
Josh and Jason from the original question did a great job explaining why this is. What I don't get is why supplying the enclos argument to eval apparently fixes it.
subset3 = function(df, condition) {
condition_call = eval(substitute(condition), envir = df, enclos = parent.frame())
df[condition_call, ]
}
subset3(df, a < condition)
## a b
## 1 1 2
## 2 2 3
I understand that skipping the function environment means R is no longer trying to evaluate the promise, and instead grabs the condition object from the global environment.
But I think supplying enclos = parent.frame() should not make a difference. From ?eval on the enclos argument:
Specifies the enclosure, i.e., where R looks for objects not found in envir.
But if not provided, it defaults to:
enclos = if(is.list(envir) || is.pairlist(envir)) parent.frame() else baseenv())
which, in my mind, should resolve to parent.frame() anyway, because surely, df satisfies the is.list() check.
This means that as long as some object data returns TRUE on is.list(), the behavior of eval(expr, envir = data) and eval(expr, envir = data, enclos = parent.frame()) should be identical. But as evidenced by the above, it isn't.
What am I missing?
EDIT: Thanks to SmokeyShakers who pointed out the difference between default and user-supplied arguments regarding the time of evaluation. I think this is actually already expressed here: https://stackoverflow.com/a/15505111/2416535
It might make sense to keep this one alive though, as it touches eval() specifically (the other does not), and it is not trivial to realize what the generalized question should be until one has the answer.

So, different parents. In the example that doesn't work parent.frame is looking up from inside eval into the internal environment of subset2. In the working example, parent.frame is looking up from inside subset3, likely to your global where your df sits.
Example:
tester <- function() {
print(parent.frame())
}
tester() # Global
(function() {
tester()
})() # Anonymous function's internal env

IT is getting confused by the double use of condition. Change one of them to cond:
cond <- 3
subset2(df, a < cond)
## a b
## 1 1 2
## 2 2 3
Note that even though this works it won't work if you put subset2 into a function since it will look into the global environment where subset2 was defined rather than looking inside the f execution frame and subset3 will be needed.
if (exists("cond")) rm(cond)
f <- function() {
cond <- 3
subset2(df, a < cond)
}
f() # error

Related

R: cannot make use of `data.table`'s environment, base environment and global environment all at once

Consider the dummy example below: I want to run a model on a range of subsets of the data.table in a loop, and want to specify the exact line to iterate as a string (with an iterator i)
library(data.table)
DT <- data.table(X = runif(100), Y = runif(100))
f1 <- function(code) {
for (i in c(20,30,50)) {
eval(parse(text = code))
}
}
f1("lm(X ~ Y, data = DT[sample(.N, i)])")
Obviously this doesn't return any output as lm() is merely evaluated in the background 3 times. The actual use case is more convoluted, but this is meant to be a theoretical simplification of it.
The example above, nonetheless, works fine. The problems begin when the function f1 is included in the package, instead of being defined in the global environment. If I'm not mistaken, in this case f1 is defined in the package's base env. Then, calling f1 from global env gives the error: Error in [.data.frame(x, i) : undefined columns selected. R can correctly access iterator i in its base env and DT in the global env, but cannot access the column by name inside data.table's square brackets.
I tried experimenting by setting envir and enclos arguments to eval() to baseenv(), globalenv(), parent.frame(), but haven't managed to find a combination that works.
For example, setting envir = globalenv() seems to result in accessing DT and i, but not X and Y from the DT inside lm(). Setting envir = baseenv() we lose the global env and cannot access DT (envir = baseenv(), enclos = globalenv() doesn't change it). Using envir = list(baseenv(), globalenv()) results in not being able to access anything inside data.table's square brackets, I think, error message: "Error in [.data.frame(x, i) : undefined columns selected".
The problem is that variables are resolved lexicographically. You could try passing in the expression and the substituting the value of i specifically before evaluating. This would take care of eliminating the need for explicit parsing.
f1 <- function(code) {
code <- substitute(code)
for (i in c(20,30,50)) {
cmd <- do.call("substitute", list(code, list(i=i)))
print(cmd)
result <- eval.parent(cmd)
print(result)
}
}
f1(lm(X ~ Y, data = DT[sample(.N, i)]))

force the evaluation of .SD in data.table

In order to debug j in data.table I prefer to interactively inspect the resulting -by- dt´s with browser(). SO 2013 adressed this issue and I understand that .SD must be invoked in j in order for all columns to be evaluated. I use Rstudio and using the SO 2013 method, there are two problems:
The environment pane is not updated reflecting the browser environment
I often encounter the following error msg
Error: option error has NULL value
In addition: Warning message:
In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
I can get around this by doing:
f <- function(sd=force(.SD),.env = parent.frame(n = 1)) {
by = .env$.BY;
i = .env$.I;
sd = .env$.SD;
grp = .env$.GRP;
N = .env$.N;
browser()
}
library (data.table)
setDT(copy(mtcars))[,f(.SD),by=.(gear)]
But - in the data.table spirit of keeping things short and sweet- can I somehow force (the force in f does not work) the evaluation of .SD in the call to f so that the final code could run:
setDT(copy(mtcars))[,f(),by=.(gear)]
As far as I know,
data.table needs to explicitly see .SD somewhere in the code passed to j,
otherwise it won't even expose it in the environment it creates for the execution.
See for example this question and its comments.
Why don't you create a different helper function that always specifies .SD in j?
Something like:
dt_debugger <- function(dt, ...) {
f <- function(..., .caller_env = parent.frame()) {
by <- .caller_env$.BY;
i <- .caller_env$.I;
sd <- .caller_env$.SD;
grp <- .caller_env$.GRP;
N <- .caller_env$.N;
browser()
}
dt[..., j = f(.SD)]
}
dt_debugger(as.data.table(mtcars), by = .(gear))

How to achieve block scoping in R?

I am thinking of ways to achieve block scoping in R. This would be nice for keeping a clean workspace in data science notebooks/interactive sessions. At the moment I am using an IIFE pattern like so
(function(){
temp1 <- ...
temp2 <- ...
temp3 <- ...
data <<- fn(temp1, temp2, temp3)
})()
This way I can create/update data and let the temporary be cleaned up after me. Obviously it still has side-effects with regards to potentially assigning to global, but for data analysis and not software packages I'm not concerned.
Until IIFE becomes more popular in R I thought it'd be neat to have a special operator for this, but I don't know enough about R metaprogramming. In my naive head the following should have been sufficient
`%gets%` <- function(x, val) {
val <- local(val)
assign(deparse(substitute(x)), val, envir = parent.frame())
}
x1 %gets% {
x = 10;
x + 5
}
But x still get dumped out to my global scope. So
Is this a reasonable implementation for simulating block scoping?
If so, how can I prevent my x from escaping to the outside scope?
1) local First note that this works:
if (exists("x")) rm(x) # just for reproducibility. Don't need this normally.
x1 <- local({ x <- 10; x + 5})
x1
## [1] 15
x
## Error: object 'x' not found
2) %gets% To implement %gets% we can use substitute like this:
`%gets%` <- function(.x, .value) {
assign(deparse(substitute(.x)), eval.parent(substitute(local(.value))), parent.frame())
}
x1 %gets% {
x = 10;
x + 5
}
x1
## [1] 15
x
## Error: object 'x' not found
2a) := We can make this even nicer by defining := like this:
`:=` <- `%gets%`
# test
x1 := { x <- 10; x + 5}
x1
## [1] 15
x
## Error: object 'x' not found
3) pipes Also piping can be used to avoid globals. Here x and y do not persist after the pipe completes.
library(magrittr)
list(x = 6) %$% { y <- 1; x + y + 5 }
## [1] 12
x
## Error: object 'x' not found
y
## Error: object 'y' not found
or if we have nothing to pass:
x1 <- list() %>% { x <- 10; x + 5 }
x1
## [1] 15
x
## Error: object 'x' not found
or we could use 0 to save keystrokes:
x1 <- 0 %>% { x <- 10; x + 5 }
Update Have revised (2) to simplify and correct it. Also Added (2a) and (3).
local does what you want (and IIFE is a hack in JavaScript to work around the lack of a local-like functionality).
Your %gets% code fails because you’re misunderstanding how arguments are evaluated: in your function, val is an argument. This means that it is evaluated in the caller’s scope, no exceptions. Wrapping it in local simply means that the result of evaluating val is wrapped in local — i.e. meaningless in this case. It does not mean that the expression is evaluated locally; if that were the case, you wouldn’t need local at all, you could just evaluate it in the function’s scope.
You can do that, if you want, by using eval:
`%gets%` = function (x, expr) {
assign(
as.character(substitute(x)),
eval(substitute(expr)),
parent.frame()
)
}
… but that won’t be very useful, since it cannot access variables of the caller’s scope; rather, you’d have to evaluate it in a scope that injects the caller’s scope, so that you have a “clean” environment yet can access existing variables:
`%gets%` = function (x, expr) {
parent = parent.frame()
assign(
as.character(substitute(x)),
eval.parent(substitute(eval(quote(expr), new.env(parent = parent)))),
parent
)
}
… but this is essentially just a convoluted way of redefining local assignment.

Why must local({...}) be defined using two rounds of expression quoting?

I'm trying to understand how R's local function is working. With it, you can open a temporary local scope, which means what happens in local (most notably, variable definitions), stays in local. Only the last value of the block is returned to the outside world. So:
x <- local({
a <- 2
a * 2
})
x
## [1] 4
a
## Error: object 'a' not found
local is defined like this:
local <- function(expr, envir = new.env()){
eval.parent(substitute(eval(quote(expr), envir)))
}
As I understand it, two rounds of expression quoting and subsequent evaluation happen:
eval(quote([whatever expr input]), [whatever envir input]) is generated as an unevaluated call by substitute.
The call is evaluated in local's caller frame (which is in our case, the Global Environment), so
[whatever expr input] is evaluated in [whatever envir input]
However, I do not understand why step 2 is nessecary. Why can't I simply define local like this:
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
I would think it evaluates the expression expr in an empty environment? So any variable defined in expr should exist in envir and therefore vanish after the end of local2?
However, if I try this, I get:
x <- local2({
a <- 2
a * 2
})
x
## [1] 4
a
## [1] 2
So a leaks to the Global Environment. Why is this?
EDIT: Even more mysterious: Why does it not happen for:
eval(quote({a <- 2; a*2}), new.env())
## [1] 4
a
## Error: object 'a' not found
Parameters to R functions are passed as promises. They are not evaluated unless the value is specifically requested. So look at
# clean up first
if exists("a") rm(a)
f <- function(x) print(1)
f(a<-1)
# [1] 1
a
# Error: object 'a' not found
g <- function(x) print(x)
g(a<-1)
# [1] 1
a
# [1] 1
Note that in the g() function, we are using the value passed to the function which is that assignment to a so that creates a in the global environment. With f(), that variable is never created because that function parameter remained a promise end was never evaluated.
If you want to access a parameter without evaluating it, you need to use something like match.call() or subsititute(). The local() function does the latter.
If you remove the eval.parent(), you'll see that the substitute puts the un-evaluated expression from the parameter into a new call to eval().
h <- function(expr, envir = new.env()){
substitute(eval(quote(expr), envir))
}
h(a<-1)
# eval(quote(a <- 1), new.env())
Where as if you do
j<- function(x) {
quote(x)
}
j(a<-1)
# x
you are not really creating a new function call. Further more when you eval() that expression, you are triggering the evaluation of x from it's original calling environment (triggering the evaluation of the promise), not evaluating the expression in a new environment.
local() then uses the eval.parent() so that you can use existing variables in the environment within your block. For example
b<-5
local({
a <- b
a * 2
})
# [1] 10
Look at the behaviors here
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
local2({a<-5; a})
# [1] 5
local2({a<-5; a}, list(a=100, expr="hello"))
# [1] "hello"
See how when we use a non-empty environment, the eval() is looking up expr in the environment, it's not evaluating your code block in the environment.

How to prevent namespace pollution in R [duplicate]

This is probably not correct terminology, but hopefully I can get my point across.
I frequently end up doing something like:
myVar = 1
f <- function(myvar) { return(myVar); }
# f(2) = 1 now
R happily uses the variable outside of the function's scope, which leaves me scratching my head, wondering how I could possibly be getting the results I am.
Is there any option which says "force me to only use variables which have previously been assigned values in this function's scope"? Perl's use strict does something like this, for example. But I don't know that R has an equivalent of my.
EDIT: Thank you, I am aware of that I capitalized them differently. Indeed, the example was created specifically to illustrate this problem!
I want to know if there is a way that R can automatically warn me when I do this.
EDIT 2: Also, if Rkward or another IDE offers this functionality I'd like to know that too.
As far as I know, R does not provide a "use strict" mode. So you are left with two options:
1 - Ensure all your "strict" functions don't have globalenv as environment. You could define a nice wrapper function for this, but the simplest is to call local:
# Use "local" directly to control the function environment
f <- local( function(myvar) { return(myVar); }, as.environment(2))
f(3) # Error in f(3) : object 'myVar' not found
# Create a wrapper function "strict" to do it for you...
strict <- function(f, pos=2) eval(substitute(f), as.environment(pos))
f <- strict( function(myvar) { return(myVar); } )
f(3) # Error in f(3) : object 'myVar' not found
2 - Do a code analysis that warns you of "bad" usage.
Here's a function checkStrict that hopefully does what you want. It uses the excellent codetools package.
# Checks a function for use of global variables
# Returns TRUE if ok, FALSE if globals were found.
checkStrict <- function(f, silent=FALSE) {
vars <- codetools::findGlobals(f)
found <- !vapply(vars, exists, logical(1), envir=as.environment(2))
if (!silent && any(found)) {
warning("global variables used: ", paste(names(found)[found], collapse=', '))
return(invisible(FALSE))
}
!any(found)
}
And trying it out:
> myVar = 1
> f <- function(myvar) { return(myVar); }
> checkStrict(f)
Warning message:
In checkStrict(f) : global variables used: myVar
checkUsage in the codetools package is helpful, but doesn't get you all the way there.
In a clean session where myVar is not defined,
f <- function(myvar) { return(myVar); }
codetools::checkUsage(f)
gives
<anonymous>: no visible binding for global variable ‘myVar’
but once you define myVar, checkUsage is happy.
See ?codetools in the codetools package: it's possible that something there is useful:
> findGlobals(f)
[1] "{" "myVar" "return"
> findLocals(f)
character(0)
You need to fix the typo: myvar != myVar. Then it will all work...
Scope resolution is 'from the inside out' starting from the current one, then the enclosing and so on.
Edit Now that you clarified your question, look at the package codetools (which is part of the R Base set):
R> library(codetools)
R> f <- function(myVAR) { return(myvar) }
R> checkUsage(f)
<anonymous>: no visible binding for global variable 'myvar'
R>
Using get(x, inherits=FALSE) will force local scope.
myVar = 1
f2 <- function(myvar) get("myVar", inherits=FALSE)
f3 <- function(myvar){
myVar <- myvar
get("myVar", inherits=FALSE)
}
output:
> f2(8)
Error in get("myVar", inherits = FALSE) : object 'myVar' not found
> f3(8)
[1] 8
You are of course doing it wrong. Don't expect static code checking tools to find all your mistakes. Check your code with tests. And more tests. Any decent test written to run in a clean environment will spot this kind of mistake. Write tests for your functions, and use them. Look at the glory that is the testthat package on CRAN.
There is a new package modules on CRAN which addresses this common issue (see the vignette here). With modules, the function raises an error instead of silently returning the wrong result.
# without modules
myVar <- 1
f <- function(myvar) { return(myVar) }
f(2)
[1] 1
# with modules
library(modules)
m <- module({
f <- function(myvar) { return(myVar) }
})
m$f(2)
Error in m$f(2) : object 'myVar' not found
This is the first time I use it. It seems to be straightforward so I might include it in my regular workflow to prevent time consuming mishaps.
you can dynamically change the environment tree like this:
a <- 1
f <- function(){
b <- 1
print(b)
print(a)
}
environment(f) <- new.env(parent = baseenv())
f()
Inside f, b can be found, while a cannot.
But probably it will do more harm than good.
You can test to see if the variable is defined locally:
myVar = 1
f <- function(myvar) {
if( exists('myVar', environment(), inherits = FALSE) ) return( myVar) else cat("myVar was not found locally\n")
}
> f(2)
myVar was not found locally
But I find it very artificial if the only thing you are trying to do is to protect yourself from spelling mistakes.
The exists function searches for the variable name in the particular environment. inherits = FALSE tells it not to look into the enclosing frames.
environment(fun) = parent.env(environment(fun))
will remove the 'workspace' from your search path, leave everything else. This is probably closest to what you want.
#Tommy gave a very good answer and I used it to create 3 functions that I think are more convenient in practice.
strict
to make a function strict, you just have to call
strict(f,x,y)
instead of
f(x,y)
example:
my_fun1 <- function(a,b,c){a+b+c}
my_fun2 <- function(a,b,c){a+B+c}
B <- 1
my_fun1(1,2,3) # 6
strict(my_fun1,1,2,3) # 6
my_fun2(1,2,3) # 5
strict(my_fun2,1,2,3) # Error in (function (a, b, c) : object 'B' not found
checkStrict1
To get a diagnosis, execute checkStrict1(f) with optional Boolean parameters to show more ore less.
checkStrict1("my_fun1") # nothing
checkStrict1("my_fun2") # my_fun2 : B
A more complicated case:
A <- 1 # unambiguous variable defined OUTSIDE AND INSIDE my_fun3
# B unambiguous variable defined only INSIDE my_fun3
C <- 1 # defined OUTSIDE AND INSIDE with ambiguous name (C is also a base function)
D <- 1 # defined only OUTSIDE my_fun3 (D is also a base function)
E <- 1 # unambiguous variable defined only OUTSIDE my_fun3
# G unambiguous variable defined only INSIDE my_fun3
# H is undeclared and doesn't exist at all
# I is undeclared (though I is also base function)
# v defined only INSIDE (v is also a base function)
my_fun3 <- function(a,b,c){
A<-1;B<-1;C<-1;G<-1
a+b+A+B+C+D+E+G+H+I+v+ my_fun1(1,2,3)
}
checkStrict1("my_fun3",show_global_functions = TRUE ,show_ambiguous = TRUE , show_inexistent = TRUE)
# my_fun3 : E
# my_fun3 Ambiguous : D
# my_fun3 Inexistent : H
# my_fun3 Global functions : my_fun1
I chose to show only inexistent by default out of the 3 optional additions. You can change it easily in the function definition.
checkStrictAll
Get a diagnostic of all your potentially problematic functions, with the same parameters.
checkStrictAll()
my_fun2 : B
my_fun3 : E
my_fun3 Inexistent : H
sources
strict <- function(f1,...){
function_text <- deparse(f1)
function_text <- paste(function_text[1],function_text[2],paste(function_text[c(-1,-2,-length(function_text))],collapse=";"),"}",collapse="")
strict0 <- function(f1, pos=2) eval(substitute(f1), as.environment(pos))
f1 <- eval(parse(text=paste0("strict0(",function_text,")")))
do.call(f1,list(...))
}
checkStrict1 <- function(f_str,exceptions = NULL,n_char = nchar(f_str),show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
f <- try(eval(parse(text=f_str)),silent=TRUE)
if(inherits(f, "try-error")) {return(NULL)}
vars <- codetools::findGlobals(f)
vars <- vars[!vars %in% exceptions]
global_functions <- vars %in% functions
in_global_env <- vapply(vars, exists, logical(1), envir=globalenv())
in_local_env <- vapply(vars, exists, logical(1), envir=as.environment(2))
in_global_env_but_not_function <- rep(FALSE,length(vars))
for (my_mode in c("logical", "integer", "double", "complex", "character", "raw","list", "NULL")){
in_global_env_but_not_function <- in_global_env_but_not_function | vapply(vars, exists, logical(1), envir=globalenv(),mode = my_mode)
}
found <- in_global_env_but_not_function & !in_local_env
ambiguous <- in_global_env_but_not_function & in_local_env
inexistent <- (!in_local_env) & (!in_global_env)
if(typeof(f)=="closure"){
if(any(found)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),":", paste(names(found)[found], collapse=', '),"\n"))}
if(show_ambiguous & any(ambiguous)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Ambiguous :", paste(names(found)[ambiguous], collapse=', '),"\n"))}
if(show_inexistent & any(inexistent)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Inexistent :", paste(names(found)[inexistent], collapse=', '),"\n"))}
if(show_global_functions & any(global_functions)){cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Global functions :", paste(names(found)[global_functions], collapse=', '),"\n"))}
return(invisible(FALSE))
} else {return(invisible(TRUE))}
}
checkStrictAll <- function(exceptions = NULL,show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
n_char <- max(nchar(functions))
invisible(sapply(functions,checkStrict1,exceptions,n_char = n_char,show_global_functions,show_ambiguous, show_inexistent))
}
What works for me, based on #c-urchin 's answer, is to define a script which reads all my functions and then excludes the global environment:
filenames <- Sys.glob('fun/*.R')
for (filename in filenames) {
source(filename, local=T)
funname <- sub('^fun/(.*).R$', "\\1", filename)
eval(parse(text=paste('environment(',funname,') <- parent.env(globalenv())',sep='')))
}
I assume that
all functions and nothing else are contained in the relative directory ./fun and
every .R file contains exactly one function with an identical name as the file.
The catch is that if one of my functions calls another one of my functions, then the outer function has to also call this script first, and it is essential to call it with local=T:
source('readfun.R', local=T)
assuming of course that the script file is called readfun.R.

Resources