How to achieve block scoping in R? - r

I am thinking of ways to achieve block scoping in R. This would be nice for keeping a clean workspace in data science notebooks/interactive sessions. At the moment I am using an IIFE pattern like so
(function(){
temp1 <- ...
temp2 <- ...
temp3 <- ...
data <<- fn(temp1, temp2, temp3)
})()
This way I can create/update data and let the temporary be cleaned up after me. Obviously it still has side-effects with regards to potentially assigning to global, but for data analysis and not software packages I'm not concerned.
Until IIFE becomes more popular in R I thought it'd be neat to have a special operator for this, but I don't know enough about R metaprogramming. In my naive head the following should have been sufficient
`%gets%` <- function(x, val) {
val <- local(val)
assign(deparse(substitute(x)), val, envir = parent.frame())
}
x1 %gets% {
x = 10;
x + 5
}
But x still get dumped out to my global scope. So
Is this a reasonable implementation for simulating block scoping?
If so, how can I prevent my x from escaping to the outside scope?

1) local First note that this works:
if (exists("x")) rm(x) # just for reproducibility. Don't need this normally.
x1 <- local({ x <- 10; x + 5})
x1
## [1] 15
x
## Error: object 'x' not found
2) %gets% To implement %gets% we can use substitute like this:
`%gets%` <- function(.x, .value) {
assign(deparse(substitute(.x)), eval.parent(substitute(local(.value))), parent.frame())
}
x1 %gets% {
x = 10;
x + 5
}
x1
## [1] 15
x
## Error: object 'x' not found
2a) := We can make this even nicer by defining := like this:
`:=` <- `%gets%`
# test
x1 := { x <- 10; x + 5}
x1
## [1] 15
x
## Error: object 'x' not found
3) pipes Also piping can be used to avoid globals. Here x and y do not persist after the pipe completes.
library(magrittr)
list(x = 6) %$% { y <- 1; x + y + 5 }
## [1] 12
x
## Error: object 'x' not found
y
## Error: object 'y' not found
or if we have nothing to pass:
x1 <- list() %>% { x <- 10; x + 5 }
x1
## [1] 15
x
## Error: object 'x' not found
or we could use 0 to save keystrokes:
x1 <- 0 %>% { x <- 10; x + 5 }
Update Have revised (2) to simplify and correct it. Also Added (2a) and (3).

local does what you want (and IIFE is a hack in JavaScript to work around the lack of a local-like functionality).
Your %gets% code fails because you’re misunderstanding how arguments are evaluated: in your function, val is an argument. This means that it is evaluated in the caller’s scope, no exceptions. Wrapping it in local simply means that the result of evaluating val is wrapped in local — i.e. meaningless in this case. It does not mean that the expression is evaluated locally; if that were the case, you wouldn’t need local at all, you could just evaluate it in the function’s scope.
You can do that, if you want, by using eval:
`%gets%` = function (x, expr) {
assign(
as.character(substitute(x)),
eval(substitute(expr)),
parent.frame()
)
}
… but that won’t be very useful, since it cannot access variables of the caller’s scope; rather, you’d have to evaluate it in a scope that injects the caller’s scope, so that you have a “clean” environment yet can access existing variables:
`%gets%` = function (x, expr) {
parent = parent.frame()
assign(
as.character(substitute(x)),
eval.parent(substitute(eval(quote(expr), new.env(parent = parent)))),
parent
)
}
… but this is essentially just a convoluted way of redefining local assignment.

Related

R function for obtaining a reference to a variable

In Advanced R, environments are advertised as a useful way to get pass-by-reference semantics in R: instead of passing a list, which gets copied, I can pass an environment, which is not. This is useful to know.
But it assumes that whoever is calling my function is happy to agree on an "environment"-based data type, with named slots corresponding to the variables we want to modify.
Hasn't someone made a class which allows me to just refer to a single variable by reference? For example,
v = 1:5
r <- ref(v)
(function() {
getRef(r) # same as v
setRef(r, 1:6) # same as v <<- 1:6, in this case
})()
It would seem to be pretty easy to do this, by storing the character name of v together with the environment where it is bound.
Is there a standard library which accomplishes this semantics, or can someone provide a short snippet of code? (I haven't finished reading "Advanced R"; apologies if this is covered later in the book)
As you have already mentioned in your question, you can store the variable name and its environment and access it with get and assign what will be somehow like a reference to a single variable.
v <- 1:5
r <- list(name="v", env=environment())
(function() {
get(r$name, envir = r$env)
assign(r$name, 1:6, envir = r$env)
})()
v
#[1] 1 2 3 4 5 6
Alternatively you can store the reference to an environment but then you can access everything in this referenced environment.
v <- 1:5
r <- globalenv() #reference to everything in globalenv
(function() {
r$v
r$v <- 1:6
})()
v
#[1] 1 2 3 4 5 6
You can also create an environment with only one variable and make a reference to it.
v <- new.env(parent=emptyenv())
v$v <- 1:5
r <- v
(function() {
r$v
r$v <- 1:6
})()
v$v
#[1] 1 2 3 4 5 6
Implemented as functions using find or set the environment during creation. Have also a look at How to get environment of a variable in R.
ref <- function(name, envir = NULL) {
name <- substitute(name)
if (!is.character(name)) name <- deparse(name)
if(length(envir)==0) envir <- as.environment(find(name))
list(name=name, envir=envir)
}
getRef <- function(r) {
get(r$name, envir = r$envir, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$envir, inherits = FALSE)
}
x <- 1
r1 <- ref(x) #x from Global Environment
#x from Function Environment
r2 <- (function() {x <- 2; ref(x, environment())})()
#But simply returning x might here be better
r2b <- (function() {x <- 2; x})()
a <- new.env(parent=emptyenv())
a$x <- 3
r3 <- ref(x, a) #x from Environment a
This is based on GKi's answer, thanks to him for stepping up.
It includes pryr::where so you don't have to install the whole library
Note that we need to point "where" to parent.frame() in the definition of "ref"
Added some test cases which I used to check correctness
The code:
# copy/modified from pryr::where
where = function(name, env=parent.frame()) {
if (identical(env, emptyenv())) {
stop("Can't find ", name, call. = FALSE)
}
if (exists(name, env, inherits = FALSE)) {
env
} else {
where(name, parent.env(env))
}
}
ref <- function(v) {
arg <- deparse(substitute(v))
list(name=arg, env=where(arg, env=parent.frame()))
}
getRef <- function(r) {
get(r$name, envir = r$env, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$env)
}
if(1) { # tests
v <- 1:5
r <- ref(v)
(function() {
stopifnot(identical(getRef(r),1:5))
setRef(r, 1:6)
})()
stopifnot(identical(v,1:6))
# this refers to v in the global environment
v=2; r=(function() {ref(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# same as above
v=2; r=(function() {v <<- 3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# this creates a local binding first, and refers to that. the
# global binding is unaffected
v=2; r=(function() {v=3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
# additional tests
r=(function() {v=4; (function(v1) { ref(v1) })(v)})()
stopifnot(r$name=="v1")
stopifnot(getRef(r)==4)
setRef(r,5)
stopifnot(getRef(r)==5)
# check that outer v is not modified
v=2; r=(function() {(function(v1) { ref(v1) })(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
}
I imagine there may be some garbage collection inefficiency if you're creating a reference to a small variable in a temporary environment with a different large variable, since the reference must retain the whole environment - although the same problem could arise with other uses of lexical scoping.
I will probably use this code next time I need pass-by-reference semantics.

declare a variable without registering it to the environment

In some R script, I use some dummy variable in a for loop.
The variable has no purpose itself, so I don't need it recorded at all.
For instance :
database = read.csv("data/somefile.csv")
for (i in 1:ncol(database)) {
name <- names(database)[i]
if (name %in% some_vector) {
label(database[, .i]) <- some_function(databas$somecolumn)
}
}
In R Studio, the "Global Environement" tab keeps track of variables i and name (and give it the last value it had), although they have no usefulness at all.
Is there any elegant way to declare my value so it is not tracked in the global environment ?
Use local for all your workspace hygiene needs.
foo <- local({
x <- 0
for(i in 1:nrow(mtcars))
x <- x + mtcars$mpg[i]
x
})
foo now contains the result of the calculation, and the temporary variables i and x are discarded.
To hide objects from RStudio's object explorer, you can prefix with . like
.x = 2
Downsides. This still creates .x and keeps it in memory, where it might take up space or accidentally be used again after you've forgotten about it. It also hides from the standard "clear workspace" command rm(list = ls()). See ?ls for a way of handling this.
Aside. Generally, I would not create any variables like this, instead wrapping any operation involving temporary objects in a function as #Aurèle suggested and not leaning too heavily on what RStudio's object browser shows me.
The only case so far where I've used dot-prefixed objects is for interactive use in a function, like:
f = function(x, y, debug.obj = FALSE){
dx = dim(x)
dy = dim(y)
if (!(length(dx) == 2 && length(dy) == 2 && dx[2] == dy[1])){
if (debug.obj){
.debug.f <<- list(dx = dx, dy = dy)
stop("Dims don't match. See .debug.f")
}
stop("Dims don't match.")
}
x %*% y
}
# example usage
f(matrix(1,1,1), matrix(2,2,2), debug.obj = TRUE)
# Error in f(matrix(1, 1, 1), matrix(2, 2, 2), debug.obj = TRUE) :
# Dims don't match. See .debug.f
.debug.f
# $dx
# [1] 1 1
#
# $dy
# [1] 2 2
Even this might be a bad idea, though.

Passing and using an environment in a function

I am working with some large data sets and have constructed a negative log likelihood function and associated gradient to pass to an optimisation routine. Both the functions require a vector of parameters and the passing of the large data sets into them.
The optimisation routine will call the two functions multiple times and the speed at which the two functions execute at is most of the bottleneck in the process. I dont want to pass the data directly to function as I was under the impression that some copying by R may occur.
I have considered:
# some large data sets
a<-1; b<-2
# place the data sets in an environment
varSpace <- new.env()
assign('c', a, envir = varSpace)
assign('d', b, envir = varSpace)
dFunA <- function(x){
x <- x + a+b
x
}
dFunB <- function(x, envir = varSpace){
x <- x + get('c', envir) + get('d', envir)
x
}
dFunC <- function(x, envir = varSpace){
with(envir,{
x <- x + c + d
})
x
}
dFunD <- function(x, envir = varSpace){
attach(envir)
on.exit({detach(envir)})
x <- x + c + d
x
}
> dFunA(1)
[1] 4
> dFunB(1)
[1] 4
> dFunC(1)
Error in eval(expr, envir, enclos) : object 'x' not found
> dFunD(1)
[1] 4
Approach A requires the data sets to be further up the calling stack. It works but I would like a tidier approach.
Approach B requires the use of get and calling the environment where the data has been placed.
Approach C doesnt work .
Approach D appears to work but I am mindful of ?detach which carries the good practice comment Use of attach/detach is best avoided in functions.
Any help and advice would be appreciated.
You don't need to fiddle around with assign, get or attach. Just set the environment for your functions to the one that you've created.
dFunA <- function(x)
x + a + b
varSpace <- new.env()
varSpace$a <- 1
varSpace$b <- 2
environment(dFunA) <- varSpace
... assuming that this is necessary in the first place. As Aaron commented, R is copy-on-write, so unless you're modifying a or b they're not likely to be copied.

How to prevent namespace pollution in R [duplicate]

This is probably not correct terminology, but hopefully I can get my point across.
I frequently end up doing something like:
myVar = 1
f <- function(myvar) { return(myVar); }
# f(2) = 1 now
R happily uses the variable outside of the function's scope, which leaves me scratching my head, wondering how I could possibly be getting the results I am.
Is there any option which says "force me to only use variables which have previously been assigned values in this function's scope"? Perl's use strict does something like this, for example. But I don't know that R has an equivalent of my.
EDIT: Thank you, I am aware of that I capitalized them differently. Indeed, the example was created specifically to illustrate this problem!
I want to know if there is a way that R can automatically warn me when I do this.
EDIT 2: Also, if Rkward or another IDE offers this functionality I'd like to know that too.
As far as I know, R does not provide a "use strict" mode. So you are left with two options:
1 - Ensure all your "strict" functions don't have globalenv as environment. You could define a nice wrapper function for this, but the simplest is to call local:
# Use "local" directly to control the function environment
f <- local( function(myvar) { return(myVar); }, as.environment(2))
f(3) # Error in f(3) : object 'myVar' not found
# Create a wrapper function "strict" to do it for you...
strict <- function(f, pos=2) eval(substitute(f), as.environment(pos))
f <- strict( function(myvar) { return(myVar); } )
f(3) # Error in f(3) : object 'myVar' not found
2 - Do a code analysis that warns you of "bad" usage.
Here's a function checkStrict that hopefully does what you want. It uses the excellent codetools package.
# Checks a function for use of global variables
# Returns TRUE if ok, FALSE if globals were found.
checkStrict <- function(f, silent=FALSE) {
vars <- codetools::findGlobals(f)
found <- !vapply(vars, exists, logical(1), envir=as.environment(2))
if (!silent && any(found)) {
warning("global variables used: ", paste(names(found)[found], collapse=', '))
return(invisible(FALSE))
}
!any(found)
}
And trying it out:
> myVar = 1
> f <- function(myvar) { return(myVar); }
> checkStrict(f)
Warning message:
In checkStrict(f) : global variables used: myVar
checkUsage in the codetools package is helpful, but doesn't get you all the way there.
In a clean session where myVar is not defined,
f <- function(myvar) { return(myVar); }
codetools::checkUsage(f)
gives
<anonymous>: no visible binding for global variable ‘myVar’
but once you define myVar, checkUsage is happy.
See ?codetools in the codetools package: it's possible that something there is useful:
> findGlobals(f)
[1] "{" "myVar" "return"
> findLocals(f)
character(0)
You need to fix the typo: myvar != myVar. Then it will all work...
Scope resolution is 'from the inside out' starting from the current one, then the enclosing and so on.
Edit Now that you clarified your question, look at the package codetools (which is part of the R Base set):
R> library(codetools)
R> f <- function(myVAR) { return(myvar) }
R> checkUsage(f)
<anonymous>: no visible binding for global variable 'myvar'
R>
Using get(x, inherits=FALSE) will force local scope.
myVar = 1
f2 <- function(myvar) get("myVar", inherits=FALSE)
f3 <- function(myvar){
myVar <- myvar
get("myVar", inherits=FALSE)
}
output:
> f2(8)
Error in get("myVar", inherits = FALSE) : object 'myVar' not found
> f3(8)
[1] 8
You are of course doing it wrong. Don't expect static code checking tools to find all your mistakes. Check your code with tests. And more tests. Any decent test written to run in a clean environment will spot this kind of mistake. Write tests for your functions, and use them. Look at the glory that is the testthat package on CRAN.
There is a new package modules on CRAN which addresses this common issue (see the vignette here). With modules, the function raises an error instead of silently returning the wrong result.
# without modules
myVar <- 1
f <- function(myvar) { return(myVar) }
f(2)
[1] 1
# with modules
library(modules)
m <- module({
f <- function(myvar) { return(myVar) }
})
m$f(2)
Error in m$f(2) : object 'myVar' not found
This is the first time I use it. It seems to be straightforward so I might include it in my regular workflow to prevent time consuming mishaps.
you can dynamically change the environment tree like this:
a <- 1
f <- function(){
b <- 1
print(b)
print(a)
}
environment(f) <- new.env(parent = baseenv())
f()
Inside f, b can be found, while a cannot.
But probably it will do more harm than good.
You can test to see if the variable is defined locally:
myVar = 1
f <- function(myvar) {
if( exists('myVar', environment(), inherits = FALSE) ) return( myVar) else cat("myVar was not found locally\n")
}
> f(2)
myVar was not found locally
But I find it very artificial if the only thing you are trying to do is to protect yourself from spelling mistakes.
The exists function searches for the variable name in the particular environment. inherits = FALSE tells it not to look into the enclosing frames.
environment(fun) = parent.env(environment(fun))
will remove the 'workspace' from your search path, leave everything else. This is probably closest to what you want.
#Tommy gave a very good answer and I used it to create 3 functions that I think are more convenient in practice.
strict
to make a function strict, you just have to call
strict(f,x,y)
instead of
f(x,y)
example:
my_fun1 <- function(a,b,c){a+b+c}
my_fun2 <- function(a,b,c){a+B+c}
B <- 1
my_fun1(1,2,3) # 6
strict(my_fun1,1,2,3) # 6
my_fun2(1,2,3) # 5
strict(my_fun2,1,2,3) # Error in (function (a, b, c) : object 'B' not found
checkStrict1
To get a diagnosis, execute checkStrict1(f) with optional Boolean parameters to show more ore less.
checkStrict1("my_fun1") # nothing
checkStrict1("my_fun2") # my_fun2 : B
A more complicated case:
A <- 1 # unambiguous variable defined OUTSIDE AND INSIDE my_fun3
# B unambiguous variable defined only INSIDE my_fun3
C <- 1 # defined OUTSIDE AND INSIDE with ambiguous name (C is also a base function)
D <- 1 # defined only OUTSIDE my_fun3 (D is also a base function)
E <- 1 # unambiguous variable defined only OUTSIDE my_fun3
# G unambiguous variable defined only INSIDE my_fun3
# H is undeclared and doesn't exist at all
# I is undeclared (though I is also base function)
# v defined only INSIDE (v is also a base function)
my_fun3 <- function(a,b,c){
A<-1;B<-1;C<-1;G<-1
a+b+A+B+C+D+E+G+H+I+v+ my_fun1(1,2,3)
}
checkStrict1("my_fun3",show_global_functions = TRUE ,show_ambiguous = TRUE , show_inexistent = TRUE)
# my_fun3 : E
# my_fun3 Ambiguous : D
# my_fun3 Inexistent : H
# my_fun3 Global functions : my_fun1
I chose to show only inexistent by default out of the 3 optional additions. You can change it easily in the function definition.
checkStrictAll
Get a diagnostic of all your potentially problematic functions, with the same parameters.
checkStrictAll()
my_fun2 : B
my_fun3 : E
my_fun3 Inexistent : H
sources
strict <- function(f1,...){
function_text <- deparse(f1)
function_text <- paste(function_text[1],function_text[2],paste(function_text[c(-1,-2,-length(function_text))],collapse=";"),"}",collapse="")
strict0 <- function(f1, pos=2) eval(substitute(f1), as.environment(pos))
f1 <- eval(parse(text=paste0("strict0(",function_text,")")))
do.call(f1,list(...))
}
checkStrict1 <- function(f_str,exceptions = NULL,n_char = nchar(f_str),show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
f <- try(eval(parse(text=f_str)),silent=TRUE)
if(inherits(f, "try-error")) {return(NULL)}
vars <- codetools::findGlobals(f)
vars <- vars[!vars %in% exceptions]
global_functions <- vars %in% functions
in_global_env <- vapply(vars, exists, logical(1), envir=globalenv())
in_local_env <- vapply(vars, exists, logical(1), envir=as.environment(2))
in_global_env_but_not_function <- rep(FALSE,length(vars))
for (my_mode in c("logical", "integer", "double", "complex", "character", "raw","list", "NULL")){
in_global_env_but_not_function <- in_global_env_but_not_function | vapply(vars, exists, logical(1), envir=globalenv(),mode = my_mode)
}
found <- in_global_env_but_not_function & !in_local_env
ambiguous <- in_global_env_but_not_function & in_local_env
inexistent <- (!in_local_env) & (!in_global_env)
if(typeof(f)=="closure"){
if(any(found)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),":", paste(names(found)[found], collapse=', '),"\n"))}
if(show_ambiguous & any(ambiguous)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Ambiguous :", paste(names(found)[ambiguous], collapse=', '),"\n"))}
if(show_inexistent & any(inexistent)) {cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Inexistent :", paste(names(found)[inexistent], collapse=', '),"\n"))}
if(show_global_functions & any(global_functions)){cat(paste(f_str,paste(rep(" ",n_char-nchar(f_str)),collapse=""),"Global functions :", paste(names(found)[global_functions], collapse=', '),"\n"))}
return(invisible(FALSE))
} else {return(invisible(TRUE))}
}
checkStrictAll <- function(exceptions = NULL,show_global_functions = FALSE,show_ambiguous = FALSE, show_inexistent = TRUE){
functions <- c(lsf.str(envir=globalenv()))
n_char <- max(nchar(functions))
invisible(sapply(functions,checkStrict1,exceptions,n_char = n_char,show_global_functions,show_ambiguous, show_inexistent))
}
What works for me, based on #c-urchin 's answer, is to define a script which reads all my functions and then excludes the global environment:
filenames <- Sys.glob('fun/*.R')
for (filename in filenames) {
source(filename, local=T)
funname <- sub('^fun/(.*).R$', "\\1", filename)
eval(parse(text=paste('environment(',funname,') <- parent.env(globalenv())',sep='')))
}
I assume that
all functions and nothing else are contained in the relative directory ./fun and
every .R file contains exactly one function with an identical name as the file.
The catch is that if one of my functions calls another one of my functions, then the outer function has to also call this script first, and it is essential to call it with local=T:
source('readfun.R', local=T)
assuming of course that the script file is called readfun.R.

How can I create a custom assignment using a replacement function?

I have defined a function called once as follows:
once <- function(x, value) {
xname <- deparse(substitute(x))
if(!exists(xname)) {
assign(xname, value, env=parent.frame())
}
invisible()
}
The idea is that value is time-consuming to evaluate, and I only want to assign it to x the first time I run a script.
> z
Error: object 'z' not found
> once(z, 3)
> z
[1] 3
I'd really like the usage to be once(x) <- value rather than once(x, value), but if I write a function once<- it gets upset that the variable doesn't exist:
> once(z) <- 3
Error in once(z) <- 3 : object 'z' not found
Does anyone have a way around this?
ps: is there a name to describe functions like once<- or in general f<-?
If you are willing to modify your requirements slightly to use square brackets rather than parentheses then you could do this:
once <- structure(NA, class = "once")
"[<-.once" <- function(once, x, value) {
xname <- deparse(substitute(x))
pf <- parent.frame()
if (!exists(xname, pf)) assign(xname, value, pf)
once
}
# assigns 3 to x (assuming x does not currently exist)
once[x] <- 3
x # 3
# skips assignment (since x now exists)
once[x] <- 4
x # 3
As per item 3.4.4 in the R Language Reference, something like a names replacement is evaluated like this:
`*tmp*` <- x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
This is bad news for your requirement, because the assignment will fail on the first line (as x is not found), and even if it would work, your deparse(substitute) call will never evaluate to what you want it to.
Sorry to disappoint you

Resources