Why does rm inside a function not delete objects? - r

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.

The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions

If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

Related

Precise specification of function environments

I would like to have the option of specifying very precisely (exactly, if possible) the behavior of functions. For example,
foo <- function() {}
Env <- new.env(parent=emptyenv())
environment(foo) <- Env
foo()
# Error in { : could not find function "{"
Env$`{` <- base::`{`
environment(foo) <- Env
foo()
# NULL
I know I need to set the environment of foo, and that whatever foo depends upon is defined in that environment (or that environment's parent, or the parent's parent, etc.). But
what all must be specified, and
what can't be specified (i.e., in what cases am I stuck with default behavior)?
For example, in the following, why is there no error when the call to bar is attempted, when I leave " undefined (what kind of a thing is ", anyway? Or is it ""?).
bar <- function() {""}
Env <- new.env(parent=emptyenv())
Env$`{` <- base::`{`
environment(bar) <- Env
bar()
# [1] ""
My larger exercise is exploring ways to build very large systems with lots of dependencies, but no circular dependencies. I think I will need precise control over function environments in order to guarantee that there are no cycles. I am aiming for a ball of mud of sorts, where I can, given certain discipline, put all the R code I ever write into a single structure (which might not all be in memory, or even on local storage)—and, whenever I write new code, have the option of depending on some code I wrote in the past, provided that I have access to that ealier code.
Related posts include:
[distinct], [specify], and [environments-enclosures-frames].

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out in which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function exectuion.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Auto assignment

Recently I have been working with a set of R scripts that I inherited from a colleague. It is for me a trusted source but more than once I found in his code auto-assignments like
x <<- x
Is there any scope where such an operation could make sense?
This is a mechanism for copying a value defined within a function into the global environment (or at least, somewhere within the stack of parent of environments): from ?"<<-"
The operators ‘<<-’ and ‘->>’ are normally only used in functions,
and cause a search to be made through parent environments for an
existing definition of the variable being assigned. If such a
variable is found (and its binding is not locked) then its value
is redefined, otherwise assignment takes place in the global
environment.
I don't think it's particularly good practice (R is a mostly-functional language, and it's generally better to avoid function side effects), but it does do something. (#Roland points out in comments and #BrianO'Donnell in his answer [quoting Thomas Lumley] that using <<- is good practice if you're using it to modify a function closure, as in demo(scoping). In my experience it is more often misused to construct global variables than to work cleanly with function closures.)
Consider this example, starting in an empty/clean environment:
f <- function() {
x <- 1 ## assignment
x <<- x ## global assignment
}
Before we call f():
x
## Error: object 'x' not found
Now call f() and try again:
f()
x
## [1] 1
<<-
is a global assignment operator and I would imagine there should hardly ever be a reason to use it because it effectively causes side effects.
The scope to use it would be in any case when one wants to define a global variable or a variable one level up from current environment.
Alan gives a good answer: Use the superassignment operator <<- to write upstairs.
Hadley also gives a good answer: How do you use "<<-" (scoping assignment) in R?.
For details on the 'superassignment' operator see Scope.
Here is some critical information on the operator from the section on Assignment Operators in the R manual:
"The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment."
Thomas Lumley sums it up nicely: "The good use of superassignment is in conjuction with lexical scope, where an environment stores state for a function or set of functions that modify the state by using superassignment."
For example:
x <- NA
test <- function(x) {
x <<- x
}
> test(5)
> x
#[1] 5
That's a simple use here, <<- will do a parent environment search (case of nested functions declarations) and if not found assign in the global environment.
Usually this is a really bad ideaTM as you have no real control on where the variable will be assigned and you have chances it will overwrite a variable used for another purpose somewhere.

lapply() emptied list step by step while processing

First of all, excuse me for the bad title. I'm still so confused about this behavior, that I wasn't able to describe it; however I was able to reproduce it and broke it down to an (goofy) example.
Please, could you be so kind and explain why other.list appears to be full of NULLs after calling lapply()?
some.list <- rep(list(rnorm(1)),33)
other.list <- rep(list(), length = 33)
lapply(seq_along(some.list), function(i, other.list) {
other.list[[i]] <- some.list[[i]]
browser()
}, other.list)
I watched this in debugging mode in RStudio. For certain i, other.list[[i]] gets some.list[[i]] assigned, but it will be NULLed for the next iteration. I want to understand this behavior so bad!
The reason is that the assignment is taking place inside a function, and you've used the normal assignment operator <-, rather than the superassignment operator <<-. When inside a function scope, IOW when a function is executed, the normal assignment operator always assigns to a local variable in the evaluation environment that is created for that particular evaluation of that function (returned by a call to environment() from inside the function with fun=NULL). Thus, your global other.list variable, which is defined in the global environment (returned by globalenv()), will not be touched by such an assignment. The superassignment operator, on the other hand, will follow the closure environment chain (can be followed recursively via parent.env()) back until it finds a variable with the name on the LHS of the assignment, and then it assigns to that. The global environment is always at the base of the closure environment chain. If no such variable is found, the superassignment operator creates one in the global environment.
Thus, if you change <- to <<- in the assignment that takes place inside the function, you will be able to modify the global other.list variable.
See https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html.
Here, I tried to make a little demo to demonstrate these concepts. In all my assignments, I'm assigning the actual environment that contains the variable being assigned to:
oldGlobal <- environment(); ## environment() is same as globalenv() in global scope
(function() {
newLocal1 <- environment(); ## creates a new local variable in this function evaluation's evaluation environment
print(newLocal1); ## <environment: 0x6014cbca8> (different for every evaluation)
oldGlobal <<- parent.env(environment()); ## target search hits oldGlobal in closure environment; RHS is same as globalenv()
newGlobal1 <<- globalenv(); ## target search fails; creates a new variable in the global environment
(function() {
newLocal2 <- environment(); ## creates a new local variable in this function evaluation's evaluation environment
print(newLocal2); ## <environment: 0x6014d2160> (different for every evaluation)
newLocal1 <<- parent.env(environment()); ## target search hits the existing newLocal1 in closure environment
print(newLocal1); ## same value that was already in newLocal1
oldGlobal <<- parent.env(parent.env(environment())); ## target search hits oldGlobal two closure environments up in the chain; RHS is same as globalenv()
newGlobal2 <<- globalenv(); ## target search fails; creates a new variable in the global environment
})();
})();
oldGlobal; ## <environment: R_GlobalEnv>
newGlobal1; ## <environment: R_GlobalEnv>
newGlobal2; ## <environment: R_GlobalEnv>
I haven't run your code, but two observations:
I usually avoid putting browser() as the last line inside a function because that gets treated as the return value
other.list does not get modified by your lapply. You need to understand the basics of environments and that any bindings you make inside lapply do not hold outside of it. It's a design feature and the whole point is that lapply can't have side effects - you should only use its return value. You can either use the <<- operator instead of <- though I don't recommend that, or you can use the assign function instead. Or you can do it properly the way lapply is meant to be used:
others.list <- lapply(seq_along(some.list), function(i, other.list) {
some.list[[i]]
})
Note that it's generally recommended to not make assignments inside lapply that change variables outside of it. lapply is meant to perform a function on every element and return a list, and that list should be all that lapply is used for

Is stricter error reporting available in R?

In PHP we can do error_reporting(E_ALL) or error_reporting(E_ALL|E_STRICT) to have warnings about suspicious code. In g++ you can supply -Wall (and other flags) to get more checking of your code. Is there some similar in R?
As a specific example, I was refactoring a block of code into some functions. In one of those functions I had this line:
if(nm %in% fields$non_numeric)...
Much later I realized that I had overlooked adding fields to the parameter list, but R did not complain about an undefined variable.
(Posting as an answer rather than a comment)
How about ?codetools::checkUsage (codetools is a built-in package) ... ?
This is not really an answer, I just can't resist showing how you could declare globals explicitly. #Ben Bolker should post his comment as the Answer.
To avoiding seeing globals, you can take a function "up" one environment -- it'll be able to see all the standard functions and such (mean, etc), but not anything you put in the global environment:
explicit.globals = function(f) {
name = deparse(substitute(f))
env = parent.frame()
enclos = parent.env(.GlobalEnv)
environment(f) = enclos
env[[name]] = f
}
Then getting a global is just retrieving it from .GlobalEnv:
global = function(n) {
name = deparse(substitute(n))
env = parent.frame()
env[[name]] = get(name, .GlobalEnv)
}
assign('global', global, env=baseenv())
And it would be used like
a = 2
b = 3
f = function() {
global(a)
a
b
}
explicit.globals(f)
And called like
> f()
Error in f() : object 'b' not found
I personally wouldn't go for this but if you're used to PHP it might make sense.
Summing up, there is really no correct answer: as Owen and gsk3 point out, R functions will use globals if a variable is not in the local scope. This may be desirable in some situations, so how could the "error" be pointed out?
checkUsage() does nothing that R's built-in error-checking does not (in this case). checkUsageEnv(.GlobalEnv) is a useful way to check a file of helper functions (and might be great as a pre-hook for svn or git; or as part of an automated build process).
I feel the best solution when refactoring is: at the very start to move all global code to a function (e.g. call it main()) and then the only global code would be to call that function. Do this first, then start extracting functions, etc.

Resources