Does an environment NOT on the search path have a parent? - r

I'm reading through Hadley Wickham's Advanced R book and am currently on the "Environments" chapter.
It says that every environment except the emptyenv has a parent. I need help understanding something that would clear this up for me - maybe I'm just overcomplicating it or misunderstanding how variables work in R.
Let's say I define my own environment:
myenv <- new.env()
Now if I do a simple parent.env(myenv) I do get the global env as expected.
But now what happens when I attach this environment, causing it to go in the search path above the global env?
attach(myenv)
Now if I look at the search path using search() I can see that myenv is a parent of .GlobalEnv. I can also verify this using parent.env(globalenv()) which returns myenv. And if I run parent.env(parent.env(globalenv())) then I get tools:rstudio which makes sense.
BUT if I try to look at the parent of myenv directly using parent.env(myenv) I get the global env. In order to get the result I expect (tools:rstudio) I need to cast my environment to an environment manually: parent.env(as.environment("myenv")) returns tools:rstudio.
So which is correct? What is the parent environment of myenv?
I know I can change the parent of an environment with parent.env<-, but if I just attach it like in the above example, I don't understand what is considered to be the true parent.

The problem is that attaching myenv creates a copy of myenv (and also modifies its parent) so we now have two myenv environments and they can be and are different. You have already shown in the question that they have different parents. Try this in a fresh session to further show that they are distinct:
myenv <- new.env()
myenv$x <- 1
# with the attach below we now have 2 myenv environments -
# the one above and the new one created in the attach below on the search path.
attach(myenv)
# this changes x in the original myenv but not the x in the copy on the search path
myenv$x <- 2
myenv$x
## 2
# the copy of myenv on the search path still has the original value of x
as.environment("myenv")$x
## 1
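To tie this back to the parents seen in the question, here is a small sketch comparing the original environment with the copy on the search path. It assumes a fresh session; the name attached is only illustrative, and the parent is passed explicitly so the result does not depend on where the code is run.

```r
myenv <- new.env(parent = globalenv())
attach(myenv)                          # puts a copy at position 2 of the search path
attached <- as.environment("myenv")    # the copy, looked up by its search-path name

identical(attached, myenv)                    # FALSE: two distinct environments
identical(parent.env(myenv), globalenv())     # TRUE: the original keeps its parent
identical(parent.env(globalenv()), attached)  # TRUE: the copy sits right above .GlobalEnv

detach("myenv")                        # clean up the search path
```

So both answers in the question are "correct": they are parents of two different environments.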
Check out this blog post for more.

By default, the parent of a new environment is the environment from which new.env() was called. Don't confuse this with how functions work: a function's enclosing environment is the one in which it was defined, whereas its parent frame is the environment from which it was called (see help(sys.parent)).
The Environment documentation is quite informative.
new.env returns a new (empty) environment with (by default) enclosure the parent frame.
parent.env returns the enclosing environment of its argument.
So it's no surprise that the parent environment is the global environment. Had you assigned myenv in a different enclosure, that would be its parent.
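As a quick check of that last point, here is a minimal sketch (make_env and here are made-up names) that creates an environment inside a function and compares its parent with the function's own evaluation environment:

```r
make_env <- function() {
  here <- environment()   # the evaluation environment of this call
  e <- new.env()          # parent defaults to parent.frame(), which is `here`
  identical(parent.env(e), here)
}

make_env()
# [1] TRUE
```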
The empty env is the end of the line:
parent.env(baseenv())
# <environment: R_EmptyEnv>
parent.env(parent.env(baseenv()))
# Error in parent.env(parent.env(baseenv())) :
# the empty environment has no parent
And here's something else that's useful...
If one follows the chain of enclosures found by repeatedly calling parent.env from any environment, eventually one reaches the empty environment emptyenv(), into which nothing may be assigned.
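That walk can be sketched with a small helper. env_chain is a hypothetical function written for this illustration, not part of base R; it collects the name of each environment until it reaches emptyenv().

```r
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))   # "" for environments without a name
    env <- parent.env(env)
  }
  c(out, environmentName(emptyenv()))     # finish with "R_EmptyEnv"
}

chain <- env_chain(globalenv())
chain[1]                # "R_GlobalEnv"
chain[length(chain)]    # "R_EmptyEnv"
```

The entries in between depend on which packages are attached in your session.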
I guess it also depends on what's on the search path and whether or not you even attach it to the search path.
myenv <- new.env()
attach(myenv)
sapply(search(), function(x) {
attr(parent.env(as.environment(x)), "name")
})
Which gives a list of the environments on my search path and their parents.
$.GlobalEnv
[1] "myenv"
$myenv
[1] "package:stringi"
...
...
$`package:base`
NULL
[[12]]
<environment: R_EmptyEnv>

Related

Prevent function definition in .Rprofile from being overridden by variable assignment

I'd like to create shorter aliases for some R functions, like j=paste0. However when I add the line j=paste0 to ~/.Rprofile, it is later overridden when I use j as a variable name.
I was able to create my own package by first running package.skeleton() and then running this:
rm anRpackage/R/*
echo 'j=paste0'>anRpackage/R/test.R
echo 'export("j")'>anRpackage/NAMESPACE
rm -r anRpackage/man
R CMD INSTALL anRpackage
And then library(anRpackage);j=0;j("a",1) returns "a1". However is there some easier way to prevent the function definition from being overridden by the variable assignment?
The code in .Rprofile will run in the global environment so that's where the variable j will be defined. But if you use j as a variable later in the global environment, it will replace that value. Variables in a given environment must have unique names. But two different environments may have the same name defined and R will use the first version it finds that will work as a function when you attempt to call the variable name as a function. So basically you need to create a separate environment. You can do that with a package as you've done. You could also use attach() to add a new environment to your search path.
attach(list(j=paste0))
This will allow for the behavior you want.
attach(list(j=paste0))
j("a",1)
# [1] "a1"
j <- 5
j("a",1)
# [1] "a1"
Normally I would discourage people from using attach() because it causes confusing changes to your workspace but this would do what you want. Just be careful if anyone else will be using your code because creating aliases for built in functions and reusing those as variable names is not a common thing to see in R scripts.
You can see where it was attached by looking at the output of search(). Normally it will be attached in the second position so you can remove it with detach(2)
I ended up putting my custom functions in ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))
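The same trick also works without a separate file. The following is a minimal sketch, with myfuns_demo as a made-up search-path name, that populates the attached environment with assign() and then checks that a variable of the same name does not hide the alias:

```r
if ("myfuns_demo" %in% search()) detach("myfuns_demo")   # avoid attaching twice
funs <- attach(NULL, name = "myfuns_demo")   # new, empty environment on the search path
assign("j", paste0, envir = funs)            # populate it with the alias

j <- 0       # an ordinary variable that happens to be called j
j("a", 1)    # the call skips the non-function j and finds the alias
# [1] "a1"
```

Run detach("myfuns_demo") to remove the environment again.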

Strictly speaking does the scoping assignment <<- assign to the parent environment or global environment?

Often the parent environment is the global environment.
But occasionally it isn't. For example in functions within functions, or in an error function in tryCatch().
Strictly speaking, does <<- assign to the global environment, or simply to the parent environment?
Try it out:
env = new.env()
env2 = new.env(parent = env)
local(x <<- 42, env2)
ls(env)
# character(0)
ls()
# [1] "env" "env2" "x"
But:
env$x = 1
local(x <<- 2, env2)
env$x
# [1] 2
… so <<- does walk up the entire chain of parent environments until it finds an existing object of the given name, and replaces that. However, if it doesn’t find any such object, it creates a new object in .GlobalEnv.
(The documentation states much the same. But in a case such as this nothing beats experimenting to gain a better understanding.)
Per the documentation:
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned.
Use of this operator will cause R to search through the environment tree until it finds a match. The search starts in the parent of the environment in which the operator is used and moves up through the enclosing environments from there (not through the call stack). So it's not guaranteed to be a "global" assignment, but could be.
As sindri_baldur points out, if the variable is not found in any existing environment, a new one will be created at the global level.
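The functions-within-functions case from the question can be sketched with a closure. make_counter is an illustrative name; the point is that <<- finds count in the enclosing function's environment and never reaches the global environment.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # modifies `count` in make_counter's environment
    count
  }
}

counter <- make_counter()
counter()
# [1] 1
counter()
# [1] 2
get("count", envir = environment(counter))   # the modified variable lives here
# [1] 2
```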
Lastly, I should point out that use of the operator is confusing more often than it is helpful, as it breaks the otherwise highly functional nature of R programming. There's more than likely a way to avoid using <<-.

R how to restrict the names that are in scope to those I create explicitly?

I thought that it would be enough to use fully qualified names to avoid polluting my scope with names I did not explicitly introduce, but apparently, with R, this is not the case.
For example,
% R_PROFILE_USER= /usr/bin/R --quiet --no-save --no-restore
> ls(all = TRUE)
character(0)
> load("/home/berriz/_/projects/fda/deseq/.R/data_for_deseq.RData")
> ls(all = TRUE)
[1] "a" "b" "c"
> ?rlog
No documentation for ‘rlog’ in specified packages and libraries:
you could try ‘??rlog’
So far, so good. In particular, as the last command shows, the interpreter knows nothing of rlog.
But after I run
> d <- DESeq2::DESeqDataSetFromMatrix(countData = a, colData = b, design = c)
...then, henceforth, the command ?rlog will produce a documentation page for a function I did not explicitly introduce into the environment (and did not refer to with a fully qualified name).
I find this behavior disconcerting.
In particular, I don't know when some definition I have explicitly made will be silently shadowed as a side-effect of some seemingly unrelated command.
How can I control what the environment can see?
Or to put it differently, how can I prevent side effects like the one illustrated above?
Not sure if "scope" means the same thing in R as it may to other languages. R uses "environments" (see http://adv-r.had.co.nz/Environments.html for detailed explanation). Your scope in R includes all environments that are loaded, and as you have discovered, the user doesn't explicitly control every environment that is loaded.
For example,
ls()
lists the objects in your default environment '.GlobalEnv'
search()
lists the currently attached environments (the search path).
ls(name='package:stats')
In default R installations, 'package:stats' is one of the environments attached on startup.
By default, everything you create is stored in the global environment.
ls(name='.GlobalEnv')
You can explicitly reference objects you create by referencing their environment with the $ syntax.
x <- c(1,2,3)
.GlobalEnv$x
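To see what a fully qualified call actually changes, here is a hedged sketch that uses the tools package purely as a stand-in for DESeq2 (an assumption for illustration, since tools ships with R). A pkg::fun call loads the package's namespace but does not attach it, so the search path, and therefore what unqualified names resolve to, stays the same. As far as I can tell from the documentation of help(), ? looks in all packages whose namespaces are loaded, which would explain why ?rlog starts working without anything being shadowed.

```r
before <- search()                       # the search path before the qualified call
ext <- tools::file_ext("report.txt")     # loads the tools namespace if needed

identical(before, search())              # TRUE: nothing new was attached
"tools" %in% loadedNamespaces()          # TRUE: the namespace is loaded
"package:tools" %in% search()            # FALSE unless tools was attached earlier
```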

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
1. f() should not be visible in .GlobalEnv with ls(all=TRUE).
2. f() should be usable from .GlobalEnv.
3. f()'s internal function g() should not be visible and not usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
Should I run now:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
In fact, the enclosing environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
## [1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() are visible from .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep, places ep in .GlobalEnv.
Cluttering the Global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
From your feedback, you suggest to build a package to get proper namespace services.
The package is not flexible. My helper functions are stored in a project subdir, say hlp, and sourced like source("hlp/util1.R").
In this way scripts can be easily mixed and updated on the fly on a project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.
Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' @export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
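On a more technical note, your code results in ep being inside the global environment because the replacement-style assignment environment(ep$e$f)=ep ends by assigning to the name ep in the environment where it is run, which here is the global environment. Environments are not copied on assignment, so this is a second binding to the same environment rather than a separate copy. Once you’ve attached the environment, you can delete this object. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).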
First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor-man's namespacing:
e <- local({
g <- function() "hello"
f <- function() g()
environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.
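If f() also needs to be callable without the e$ prefix, the local() approach can be combined with attach(). This is only a sketch under the question's requirements; helpers and helpers_demo are made-up names, and only f is copied onto the search path.

```r
helpers <- local({
  g <- function() "hello"   # internal helper, never exported
  f <- function() g()       # public function; its enclosure is this local environment
  environment()
})
attach(list(f = helpers$f), name = "helpers_demo")   # expose only f on the search path
rm(helpers)                                          # drop the remaining reference

f()
# [1] "hello"
exists("g", envir = as.environment("helpers_demo"), inherits = FALSE)
# [1] FALSE
```

f() keeps working after rm(helpers) because the function itself holds a reference to the environment in which g lives.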

How to find unreferenced environments?

This is a followup to an answer to "efficiently move environment from inside function to global environment", which pointed out that it's necessary to return a reference to an environment which was created inside a function if one wishes to work with the contents of that environment.
Is it true that the newly created environment continues to exist if we don't return a reference, and if so how does one track down such an environment, either to access its contents or delete it?
Sure, if it was assigned to a symbol somewhere outside of the function's evaluation environment (as it was in the OP's example), an environment will continue to exist. In that sense, an environment is just like any other named R object. (The fact that unassigned environments can be kept in existence by closures does mean that environments sometimes persist where other types of object wouldn't, but that's not what's happening here.)
## OP's example function
funfun <- function(inc = 1){
dataEnv <- new.env()
dataEnv$d1 <- 1 + inc
dataEnv$d2 <- 2 + inc
dataEnv$d3 <- 2 + inc
assign('dataEnv', dataEnv, envir = globalenv()) ## Assignment to .GlobalEnv
}
funfun()
ls(env=.GlobalEnv)
# [1] "dataEnv" "funfun"
## It's easy to find environments assigned to a symbol in another environment,
## if you know which environment to look in.
Filter(isTRUE, eapply(.GlobalEnv, is.environment))
# $dataEnv
# [1] TRUE
In the OP's example, it's relatively easy to track down, because the environment was assigned to a symbol in .GlobalEnv. In general, though, (and again, just like any other R object) it will be difficult to track down if, for instance, it's assigned to an element in a list or some more complicated structure.
(This, incidentally, is why non-local assignment is usually discouraged in R and other more purely functional languages. When functions only return a value, and that value is only assigned to a symbol via explicit assignments (like v <- f()), the effects of executing code become a lot easier to reason about and predict. Fewer surprises make for nicer code!)
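The closure case mentioned in passing above can be sketched as follows (make_getter and hidden are illustrative names). The environment created inside the function is never returned or assigned globally, yet it stays alive because the returned function's enclosure still refers to it, and it can be reached through environment():

```r
make_getter <- function() {
  hidden <- new.env()          # created inside the function, never returned directly
  hidden$value <- 42
  function() hidden$value      # the closure keeps `hidden` reachable
}

getter <- make_getter()
getter()
# [1] 42
ls(environment(getter))        # objects in the closure's enclosing environment
# [1] "hidden"
get("hidden", envir = environment(getter))$value
# [1] 42
```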
