Using a parent environment to access child - r

I'm attempting to create a hierarchy of environments, where I have one main environment which contains a variety of other environments which could then contain other environments (and so on).
I create the environment and its child like this:
parentEnv <- new.env()
childEnv <- new.env(parentEnv)
We can see that the parent and child were created:
> childEnv
<environment: 0x000000000e811208>
> parentEnv
<environment: 0x000000000d9e2440>
However, when I then check the child's parent, it tells me it's R_GlobalEnv. (This isn't actually surprising, as I was able to access it directly.) Does new.env(parent) not do what I think it does?
> parent.env(childEnv)
<environment: R_GlobalEnv>
Fine, so I then set the child's parent with parent.env(childEnv) <- parentEnv (the R docs say this is 'dangerous' and may become deprecated, but I wanted to try it anyway).
parent.env(childEnv) <- parentEnv
> childEnv
<environment: 0x000000000e811208>
> parent.env(childEnv)
<environment: 0x000000000d9e2440>
We now see that the child's parent is parentEnv! Everything should be great, right...?
> parentEnv$childEnv
NULL
> with(parentEnv, childEnv)
<environment: 0x000000000e811208>
I can't access it with $, though I can with with(). What's going on here? Am I misunderstanding how environments work in R?

To answer your question about with: the argument parentEnv in with(parentEnv, childEnv) makes no difference here. childEnv is not bound in parentEnv, so the lookup continues to parentEnv's parent, R_GlobalEnv, which is where the object childEnv belongs. You can see that by running any of these:
eval(childEnv)
evalq(childEnv)
with(new.env(), childEnv)
The environment where an (environment) object belongs is not necessarily its parent environment.
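Here is a minimal sketch of that distinction. It reuses the question's names but creates childEnv with parent = parentEnv. The object is bound in the environment where the code runs, not inside parentEnv. So $ finds nothing, while with() finds it by falling through to parentEnv's parent.

```r
parentEnv <- new.env()
childEnv  <- new.env(parent = parentEnv)  # bound where this code runs, not inside parentEnv

# `$` only looks at parentEnv's own bindings, so it returns NULL
via_dollar <- parentEnv$childEnv
# exists() with inherits = FALSE confirms there is no such binding in parentEnv
own_binding <- exists("childEnv", envir = parentEnv, inherits = FALSE)
# with() evaluates inside parentEnv; the lookup then continues to parentEnv's
# parent, which is where childEnv is actually bound
via_with <- with(parentEnv, childEnv)

is.null(via_dollar)            # TRUE
own_binding                    # FALSE
identical(via_with, childEnv)  # TRUE
```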
If you want the children to have parentEnv as their parent environment and also to be stored in parentEnv, I suggest one of these two options:
parentEnv <- new.env()
parentEnv$childEnv1 <- new.env(parent=parentEnv)
evalq(childEnv2 <- new.env(), parentEnv)
Note that they give the same result:
parentEnv # <environment: 0x0000000007ec0c18>
parent.env(parentEnv$childEnv1) # <environment: 0x0000000007ec0c18>
parent.env(parentEnv$childEnv2) # <environment: 0x0000000007ec0c18>
EDIT: I've changed local to evalq, since they give the same result and the latter is more intuitive. Thanks @hadley.

It doesn't look like you are actually passing the parent argument when you create childEnv. You are relying on R to match the argument to parent by its type, but R matches arguments by name, partial name, or position, never by type. Here R matches your environment by position to the first formal argument, hash, and parent keeps its default value, parent.frame() (the global environment when called at top level). Instead, name the argument:
parentEnv <- new.env()
childEnv <- new.env( parent = parentEnv )
parentEnv
#<environment: 0x1078dde60>
parent.env( childEnv )
#<environment: 0x1078dde60>
It didn't work as written in your OP because of argument matching, not because of anything special about new.env(). It is an ordinary closure (not a .Primitive) with the signature new.env(hash = TRUE, parent = parent.frame(), size = 29L), so an unnamed first argument is always matched to hash.
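You can check the matching yourself. The formals of new.env() are hash, parent, and size, in that order, so an unnamed first argument lands in hash. This is a sketch, and the names wrong and right are only illustrative.

```r
# new.env() is an ordinary closure; positional arguments fill its formals in order
names(formals(new.env))  # "hash" "parent" "size"
is.primitive(new.env)    # FALSE

parentEnv <- new.env()
wrong <- new.env(parentEnv)           # parentEnv is matched to `hash` by position
right <- new.env(parent = parentEnv)  # parentEnv is matched to `parent` by name

# `wrong` gets the default parent, parent.frame(): the environment this code
# runs in (R_GlobalEnv at top level); `right` gets parentEnv
identical(parent.env(wrong), environment())  # TRUE
identical(parent.env(right), parentEnv)      # TRUE
```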

Related

Strictly speaking does the scoping assignment <<- assign to the parent environment or global environment?

Often the parent environment is the global environment.
But occasionally it isn't. For example in functions within functions, or in an error function in tryCatch().
Strictly speaking, does <<- assign to the global environment, or simply to the parent environment?
Try it out:
env = new.env()
env2 = new.env(parent = env)
local(x <<- 42, env2)
ls(env)
# character(0)
ls()
# [1] "env" "env2" "x"
But:
env$x = 1
local(x <<- 2, env2)
env$x
# [1] 2
… so <<- does walk up the entire chain of parent environments until it finds an existing object of the given name, and replaces that. However, if it doesn’t find any such object, it creates a new object in .GlobalEnv.
(The documentation states much the same. But in a case such as this nothing beats experimenting to gain a better understanding.)
Per the documentation:
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned.
Use of this operator causes R to search through the chain of enclosing environments until it finds a match. The search starts at the parent (enclosing) environment of the one in which the operator is used, not at that environment itself, and moves up the chain from there. So it's not guaranteed to be a "global" assignment, but it could be.
As sindri_baldur points out, if the variable is not found in any existing environment, a new one will be created at the global level.
Lastly, I should point out that use of the operator is confusing more often than it is helpful, as it breaks the otherwise highly functional nature of R programming. There's more than likely a way to avoid using <<-.
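Here is a minimal sketch with nested functions. The names outer_fn and inner_fn are illustrative. It shows that <<- skips the frame it is written in and starts searching at the enclosing one.

```r
outer_fn <- function() {
  x <- "outer"
  inner_fn <- function() {
    x <- "inner"     # ordinary local assignment in inner_fn's own frame
    x <<- "changed"  # search starts in inner_fn's enclosure: outer_fn's frame
    x                # inner_fn's own x is untouched
  }
  from_inner <- inner_fn()
  c(inner = from_inner, outer = x)
}

outer_fn()  # returns c(inner = "inner", outer = "changed")
```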

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
1. f() should not be visible in .GlobalEnv with ls(all=TRUE).
2. f() should be usable from .GlobalEnv.
3. f()'s internal function g() should be neither visible nor usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
If I now run:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
Indeed, the enclosing environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
[1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() is visible in .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep places ep in .GlobalEnv.
Cluttering the Global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
Based on the feedback, you suggest building a package to get proper namespace services.
A package is not flexible enough for me. My helper functions are stored in a project subdirectory, say hlp, and sourced like source("hlp/util1.R").
In this way scripts can be easily mixed and updated on the fly on a project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.
Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' @export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
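On a more technical note, your code results in ep being inside the global environment because the replacement call environment(ep$e$f)=ep ends with an assignment to ep in the calling environment. R finds ep in the base namespace and then creates a new binding named ep in your global environment. That binding refers to the same environment object, since environments are not copied. Once you’ve attached the environment, you can delete this binding. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).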
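Here is a minimal sketch of that mechanism. It uses hypothetical stand-ins so that nothing touches the base namespace:

- home stands for wherever ep was defined.
- holder stands for ep.
- fn stands for f.
- work stands for the global environment.

```r
home <- new.env()
assign("holder", new.env(), envir = home)
assign("fn", function() NULL, envir = home$holder)

work <- new.env(parent = home)  # `holder` is visible here only by inheritance
ls(work)                        # character(0)

# Same shape as environment(ep$e$f) = ep: a replacement call on a nested target
evalq(environment(holder$fn) <- baseenv(), work)

ls(work)                                           # "holder": new binding in `work`
identical(work$holder, home$holder)                # TRUE: same environment, not a copy
identical(environment(home$holder$fn), baseenv())  # TRUE: the change took effect
```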
First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor man's namespace:
e <- local({
g <- function() "hello"
f <- function() g()
environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.
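Combining local() with attach() can meet all three requirements from the question without leaving anything in .GlobalEnv. This is a sketch that makes a few assumptions. The search-path name "helpers" and the exports environment are hypothetical. Because attach() copies only what is in exports, g stays reachable from f through f's enclosure but not from the search path.

```r
# Build f and g inside local(), export only f, and attach the result directly,
# so no helper object is ever bound in .GlobalEnv.
attach(local({
  g <- function() "hello"  # private helper, bound only in this local environment
  f <- function() g()      # f's enclosure is the local environment, which holds g
  exports <- new.env()     # hypothetical container for the public functions
  exports$f <- f
  exports
}), name = "helpers")      # "helpers" is an illustrative search-path name

f()                                                 # "hello": found via the search path
ls("helpers")                                       # "f": g was never copied there
exists("f", envir = globalenv(), inherits = FALSE)  # FALSE: .GlobalEnv stays clean
```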

Does an environment NOT on the search path have a parent?

I'm reading through Hadley Wickham's Advanced R book and am currently reading the "Environments" chapter.
It says that every environment except the emptyenv has a parent. I need help understanding something that clears this up to me - maybe I'm just overcomplicating it or misunderstanding how variables work in R.
Let's say I define my own environment:
myenv <- new.env()
Now if I do a simple parent.env(myenv) I do get the global env as expected.
But now what happens when I attach this environment, causing it to go in the search path above the global env?
attach(myenv)
Now if I look at the search path using search() I can see that myenv is a parent of .GlobalEnv. I can also verify this using parent.env(globalenv()) which returns myenv. And if I run parent.env(parent.env(globalenv())) then I get tools:rstudio which makes sense.
BUT if I look at the parent of myenv directly using parent.env(myenv), I get the global env. To get the result I expect (tools:rstudio), I need to look the environment up on the search path by name: parent.env(as.environment("myenv")) returns tools:rstudio.
So which is correct? What is the parent environment of myenv?
I know I can change the parent of an environment with parent.env<-, but if I just attach it as in the example above, I don't understand what is considered to be the true parent.
The problem is that attaching myenv creates a copy of myenv and gives that copy a different parent, the next environment on the search path. So we now have two myenv environments, and they can be, and are, different. You have already shown in the question that they have different parents. Try this in a fresh session to further show that they are distinct:
myenv <- new.env()
myenv$x <- 1
# with the attach below we now have 2 myenv environments -
# the one above and the new one created in the attach below on the search path.
attach(myenv)
# this changes x in the original myenv but not the x in the copy on the search path
myenv$x <- 2
myenv$x
## 2
# the copy of myenv on the search path still has the original value of x
as.environment("myenv")$x
## 1
Check out this blog post for more.
The parent environment of a new environment is the environment from which you created it. This is not the same thing as the parent frame of a function call. A function's enclosing environment is the one in which it was defined, whereas its parent frame is the environment from which it is called (see help(sys.parent)).
The Environment documentation is quite informative.
new.env returns a new (empty) environment with (by default) enclosure the parent frame.
parent.env returns the enclosing environment of its argument.
So it's no surprise that the parent environment is the global environment. Had you created myenv in a different environment (for example, inside a function), that would be its parent.
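Here is a small sketch of both points; all function names are illustrative. The default parent of new.env() is the frame it is called from. A function's enclosure, by contrast, is fixed where the function was defined, no matter who calls it.

```r
# 1. new.env(): the default parent is the frame new.env() is called from
parent_of_new_env <- function() {
  e <- new.env()
  identical(parent.env(e), environment())
}
parent_of_new_env()  # TRUE

# 2. Functions: the enclosure is where the function was *defined*,
#    the parent frame is where it was *called*
make_probe <- function() {
  function() list(enclosure = parent.env(environment()), caller = parent.frame())
}
probe <- make_probe()  # probe's enclosure is make_probe's evaluation frame

call_probe <- function() {
  res <- probe()
  c(enclosure_is_caller = identical(res$enclosure, environment()),
    caller_is_caller    = identical(res$caller, environment()))
}
call_probe()  # enclosure_is_caller = FALSE, caller_is_caller = TRUE
```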
The empty env is the end of the line:
parent.env(baseenv())
# <environment: R_EmptyEnv>
parent.env(parent.env(baseenv()))
# Error in parent.env(parent.env(baseenv())) :
# the empty environment has no parent
And here's something else that's useful...
If one follows the chain of enclosures found by repeatedly calling parent.env from any environment, eventually one reaches the empty environment emptyenv(), into which nothing may be assigned.
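As a sketch, you can follow that chain yourself. env_chain() is a hypothetical helper that repeatedly calls parent.env() and records environmentName() until it reaches emptyenv(). Unnamed environments show up as "".

```r
# Walk parent.env() from a starting environment until reaching emptyenv()
env_chain <- function(env = globalenv()) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))  # "" for unnamed environments
    env <- parent.env(env)
  }
  c(out, environmentName(emptyenv()))    # the chain always ends here
}

env_chain()                             # "R_GlobalEnv" ... "base" "R_EmptyEnv"
env_chain(new.env(parent = emptyenv())) # "" "R_EmptyEnv"
```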
I guess it also depends on what's on the search path and whether or not you attach it to the search path at all.
myenv <- new.env()
attach(myenv)
sapply(search(), function(x) {
attr(parent.env(as.environment(x)), "name")
})
This gives a list of the environments on my search path and their parents:
$.GlobalEnv
[1] "myenv"
$myenv
[1] "package:stringi"
...
...
$`package:base`
NULL
[[12]]
<environment: R_EmptyEnv>

How to find unreferenced environments?

This is a follow-up to an answer to "efficiently move environment from inside function to global environment", which pointed out that it's necessary to return a reference to an environment created inside a function if one wishes to work with the contents of that environment.
Is it true that the newly created environment continues to exist if we don't return a reference, and if so how does one track down such an environment, either to access its contents or delete it?
Sure, if it was assigned to a symbol somewhere outside of the function's evaluation environment (as it was in the OP's example), an environment will continue to exist. In that sense, an environment is just like any other named R object. (The fact that unassigned environments can be kept in existence by closures does mean that environments sometimes persist where other types of object wouldn't, but that's not what's happening here.)
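Here is a minimal sketch of the closure case mentioned in parentheses; make_counter and counter are illustrative names. The evaluation frame of make_counter() is never assigned to a symbol. It still survives, because the returned function uses it as its enclosure, and environment() gives you a handle to it.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in make_counter's otherwise unreferenced frame
    count
  }
}

counter <- make_counter()
counter()
counter()
ls(environment(counter))     # "count"
environment(counter)$count   # 2
```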
## OP's example function
funfun <- function(inc = 1){
dataEnv <- new.env()
dataEnv$d1 <- 1 + inc
dataEnv$d2 <- 2 + inc
dataEnv$d3 <- 2 + inc
assign('dataEnv', dataEnv, envir = globalenv()) ## Assignment to .GlobalEnv
}
funfun()
ls(envir = .GlobalEnv)
# [1] "dataEnv" "funfun"
## It's easy to find environments assigned to a symbol in another environment,
## if you know which environment to look in.
Filter(isTRUE, eapply(.GlobalEnv, is.environment))
# $dataEnv
# [1] TRUE
In the OP's example, it's relatively easy to track down, because the environment was assigned to a symbol in .GlobalEnv. In general, though, (and again, just like any other R object) it will be difficult to track down if, for instance, it's assigned to an element in a list or some more complicated structure.
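If you do need to hunt for environments inside nested lists, a recursive walk is one option. find_envs() below is a hypothetical helper, not a base R function. It only descends into lists and does not look inside environments or other containers.

```r
# Hypothetical helper: return the access paths of all environments stored,
# possibly nested, inside a list.
find_envs <- function(x, path = "") {
  if (is.environment(x)) return(path)
  if (!is.list(x)) return(character())
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  out <- character()
  for (i in seq_along(x)) {
    # Use $name for named elements and [[i]] for unnamed ones
    step <- if (nzchar(nms[i])) paste0("$", nms[i]) else paste0("[[", i, "]]")
    out <- c(out, find_envs(x[[i]], paste0(path, step)))
  }
  out
}

stash <- list(a = 1, b = list(new.env(), c = new.env()))
find_envs(stash, "stash")  # "stash$b[[1]]" "stash$b$c"
```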
(This, incidentally, is why non-local assignment is usually discouraged in R and other more purely functional languages. When functions only return a value, and that value is only assigned to a symbol via explicit assignments (like v <- f()), the effects of executing code become a lot easier to reason about and predict. Fewer surprises make for nicer code!)

attaching NULL environment results in scoping error

I've run into a strange (to me) behaviour in R's lexical scoping that results from first attaching a NULL environment to the search path, as suggested in the help file for attach(), and then populating it using sys.source().
Here is a simplified and reproducible example of the issue. I have 3 functions (f1, f2, and f3) in three separate files I wish to attach into three separate environments (env.A, env.B, and env.C, respectively). Here is the setup function:
setup <- function() {
for (i in sprintf('env.%s',LETTERS[1:3])) if (i%in%search())
detach(i, unload=TRUE, force=TRUE, character.only=TRUE) # detach existing to avoid duplicates
env.A = attach(NULL, name='env.A')
env.B = attach(NULL, name='env.B')
env.C = attach(NULL, name='env.C')
sys.source('one.R', envir=env.A)
sys.source('two.R', envir=env.B)
sys.source('three.R', envir=env.C)
}
setup()
Once this function is called, 3 new environments are created with the functions f1, f2, and f3 contained within each environment. Each function lives in one of 3 separate files: "one.R", "two.R", and "three.R". The functions are trivial:
f1 <- function() {
print('this is my f1 function')
return('ok')
}
f2 <- function() {
f1()
f3()
print('this is my f2 function')
return('ok')
}
f3 <- function() {
print('this is my f3 function')
return('ok')
}
As you can see, functions f1 and f3 have no dependencies, but function f2 depends on both f1 and f3. Calling search() shows the following:
[1] ".GlobalEnv" "env.C" "env.B"
[4] "env.A" "package:stats" "package:graphics"
[7] "package:grDevices" "package:utils" "package:datasets"
[10] "package:methods" "Autoloads" "package:base"
The Issue:
Calling f2, gives the following:
> f2()
[1] "this is my f1 function"
Error in f2() : could not find function "f3"
Clearly f2 can "see" f1, but it cannot find f3. Permuting the order of the attached environments leads me to conclude that the order of the search path is critical: functions lower down in the search path are visible, whereas functions "upstream" of where the function is being called from are not found.
In this case, f2 (env.B) found f1 (env.A) but could not find f3 (env.C). This is contrary to how I understand R's scoping rules (at least I thought I understood them). My understanding is that R first checks the local environment, then the enclosing environment, then any additional enclosing environments. It then works its way down the search path, starting with ".GlobalEnv", until it finds the first matching (function/object) name. If it makes it all the way to R_EmptyEnv, it returns the "could not find function" error. That obviously isn't happening in this simple example.
Question:
What is happening? Why doesn't R traverse the entire search path and find f3 sitting in env.C? I assume there is something going on behind the scenes when the attach call is made. Perhaps some attributes are set detailing dependencies? I have found a workaround that does not run into this issue, whereby I create and populate the environment prior to attaching it. Using pseudocode:
env.A <- new.env(); ... B ... C
sys.source('one.R', envir=env.A)
...
attach(env.A)
...
This workaround exhibits a behaviour consistent with my expectations, but I am puzzled by the difference: attach then populate vs. populate then attach.
Comments, explanations, thoughts greatly appreciated. Thanks.
The difference between the two methods has to do with the parent environment of each of the newly created environments.
When R evaluates a function, it looks up each variable first in the function's own frame and then in the function's enclosing environment (for f2, the environment its file was sourced into). If it cannot find a variable there, it looks in that environment's parent, and so on until it reaches the empty environment. So if a function's enclosing environment has the global environment as its parent, every environment on the search path will be searched, as you were expecting.
When you create an environment with
env.A <- new.env();
the default value of the parent= argument is parent.frame(), so the new environment's parent is the environment from which new.env() is called (the global environment here). Observe:
parent.env(env.A)
# <environment: R_GlobalEnv>
So env.A is a child of the global environment. However, when you do
env.A = attach(NULL, name='env.A')
parent.env(env.A)
# <environment: 0x1089c0ea0>
# attr(,"name")
# [1] "tools:RGUI"
You will see that it sets the parent to the environment that was previously in position 2 of the search path, i.e. the one most recently attached (which happens to be "tools:RGUI" for me after a fresh R restart). Continuing:
env.B = attach(NULL, name='env.B')
parent.env(env.B)
#<environment: 0x108a2edf8>
#attr(,"name")
#[1] "env.A"
env.C = attach(NULL, name='env.C')
parent.env(env.C)
# <environment: 0x108a4f6e0>
# attr(,"name")
# [1] "env.B"
Notice that, as we continue to add environments via attach(), they do not get GlobalEnv as their parent; each one's parent is the environment attached just before it. So once a lookup reaches env.B (where f2 lives), it can continue on to env.A, but it has no way back "up the chain" to env.C. This is why it cannot find f3(). This is the same as doing
env.A <- new.env(parent=parent.env(globalenv()));
env.B <- new.env(parent=env.A);
env.C <- new.env(parent=env.B);
with explicit calls to new.env.
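The behaviour can be reproduced without the three files. This sketch assumes that evalq() into each attached environment is an acceptable stand-in for sys.source(), since both give the new functions that environment as their enclosure. It detaches everything at the end.

```r
env.A <- attach(NULL, name = "env.A")
env.B <- attach(NULL, name = "env.B")
env.C <- attach(NULL, name = "env.C")

# Each attach(NULL) takes the former second search-path entry as its parent
chain <- c(B_parent_is_A = identical(parent.env(env.B), env.A),
           C_parent_is_B = identical(parent.env(env.C), env.B))

# Stand-ins for sys.source(): each function's enclosure is the attached env
evalq(f1 <- function() "f1", env.A)
evalq(f3 <- function() "f3", env.C)
evalq(f2 <- function() c(f1(), f3()), env.B)

# From env.B the lookup goes env.B -> env.A -> ... -> base, never reaching env.C
result <- tryCatch(f2(), error = function(e) conditionMessage(e))
print(chain)
print(result)

# Clean up the search path
for (nm in c("env.A", "env.B", "env.C")) detach(nm, character.only = TRUE)
```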
Note that if I switch the order of attaches to
env.C = attach(NULL, name='env.C')
env.B = attach(NULL, name='env.B')
env.A = attach(NULL, name='env.A')
and try to run f2(), this time it can't find f1(), again because it can only go one way up the chain.
So the two different ways to create environments differ in the way they assign the default parent environment. So perhaps the attach(NULL) method really isn't appropriate for you in this case.
I agree the answer seems to lie in the default parent assignment differing between attach() and new.env(). I find it a little strange that attach() would assign parentage to the environment second in the search list by default, but it is what it is; there is probably a valid reason behind it. The solution is simple enough:
env.A <- attach(NULL, name='env.A')
parent.env(env.A) <- .GlobalEnv
In the alternate solution using new.env(), there is a small caveat that you didn't run into because you were working directly in the .GlobalEnv, but in the OP, I was working within a temporary environment (the "setup" function). So the parent frame of the new.env() call is actually this setup environment. See below:
setup <- function() {
env.A <- new.env(); env.B <- new.env(); env.C <- new.env()
print(parent.env(environment()))
print(parent.frame())
print(environment())
print(parent.env(env.A))
print(parent.env(env.B))
print(parent.env(env.C))
}
setup()
#<environment: R_GlobalEnv>
#<environment: R_GlobalEnv>
#<environment: 0x2298368>
#<environment: 0x2298368>
#<environment: 0x2298368>
#<environment: 0x2298368>
When setup() is called from the command line, notice that its parent is .GlobalEnv, as is the parent frame. However, the parent of environments A-C is the temporary setup environment (0x2298368). When setup() completes, that environment is not deleted: env.A-C still refer to it as their parent, so it stays alive. The alternative works because the setup environment's own parent is .GlobalEnv (where setup was defined). Lookups that fail in env.B therefore continue through the setup environment to .GlobalEnv and then down the whole search path.
I think a cleaner way is not to depend on this indirection and to specify the parent directly: env.A <- new.env(parent=.GlobalEnv). This works fine in my test case ... we'll see what happens when I scale up to ~750 interdependent functions!
Thanks again for your clear answer. I'd upvote it, but I'm apparently too new to have that privilege.
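Here is a sketch of the populate-then-attach approach with the parent set explicitly. It again uses evalq() as an assumed stand-in for the sys.source() calls, so it runs without the files. attach() copies each environment onto the search path, but the copied functions keep their original enclosures, whose parent is .GlobalEnv.

```r
setup <- function() {
  env.A <- new.env(parent = globalenv())
  env.B <- new.env(parent = globalenv())
  env.C <- new.env(parent = globalenv())
  # Stand-ins for sys.source("one.R", envir = env.A), etc.
  evalq(f1 <- function() "f1", env.A)
  evalq(f2 <- function() c(f1(), f3()), env.B)
  evalq(f3 <- function() "f3", env.C)
  # attach() copies the bindings; f2's enclosure is still env.B (parent .GlobalEnv)
  attach(env.A, name = "env.A")
  attach(env.B, name = "env.B")
  attach(env.C, name = "env.C")
  invisible(NULL)
}

setup()
result <- f2()  # f1 and f3 are both found via .GlobalEnv and the search path
print(result)

# Clean up the search path
for (nm in c("env.A", "env.B", "env.C")) detach(nm, character.only = TRUE)
```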

Resources