Making saveRDS() work for globalenv() - r

I would like to save an arbitrary environment with saveRDS(). Whether global or custom, I was hoping saveRDS() would have the same behavior. However, the global environment seems like a special case.
ls()
## [1] "dl" "gwd" "ld" "td" # Helpers defined in my .Rprofile
x <- 1
ls()
## [1] "dl" "gwd" "ld" "td" "x"
saveRDS(globalenv(), "global.rds")
Now, when I start a new R session and try to load the environment, x is dropped. This does not happen with a custom environment created with new.env(parent = globalenv()).
ls(readRDS("global.rds"))
## [1] "dl" "gwd" "ld" "td"
I expect this behavior is due to the fact that environments in R are really just collections of pointers. Still, I am looking for a way to save globalenv() with saveRDS(). I am currently using save.image() for my application, but it is a clunky special case that increases the demands on my testing workflow.
EDIT
If someone could elaborate on why exactly globalenv() does not really get saved, I would really appreciate it. I assume it is because of pointers, but that does not explain why this sort of thing does work for environments created with new.env(parent = globalenv()).

Related

Hide data in global environment

Is there a function that allows for data to be hidden in the global environment but it can still be accessed?
For example, I have a very long script with up to 100+ lines and my global environment is looking messy, there is too much and it strains my brain finding what is necessary.
I have searched up similar questions and they involve creating a package, quite frankly I have no time to learn, at this moment.
If you name all the objects you don't want to appear in the global environment starting with a dot (.), for example:
.foo <- 'bar' the object will be accesible but will be hidden in the global environment or in any ls() call:
> .foo <- 'bar'
> .foo
[1] "bar"
> ls()
character(0)
>
Edit: Adding a working example
Possible solutions would be:
removing objects once they're not needed any more
put related variables into a list (hello lapply, sapply)
move into a separate custom environment (new.env())
simplify the script to not use as many objects
run script in batch mode and miss out on what the environment looks like altogether
Sounds like an XY problem, 100 lines is quite small, it's likely that you are using too many temporary variables or are numbering objects that should be in a list.
You also don't mention why you don't like your environment to be "messy", my guess is that maybe you don't like the output of ls() ?
Then maybe you'll be happy to learn about the pattern argument of ls() which will allow you to filter the result, mostly useful with prefixes or suffixes as in the following examples :
something <- 1
some_var <- 2
another_var <- 3
ls(pattern ="^some")
#> [1] "some_var" "something"
ls(pattern="var$")
#> [1] "another_var" "some_var"
Created on 2019-11-17 by the reprex package (v0.3.0)

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
f() should not be visible in .GlobalEnv with ls(all=TRUE).
f() should be usable from .GlobalEnv.
f() internal function g() should not be visible and not usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
Should I run now:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
In fact, the calling environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
[1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() are visible from .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep, places ep in .GlobalEnv.
Cluttering the Global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
From your feedback, you suggest to build a package to get proper namespace services.
The package is not flexible. My helper functions are stored in a project subdir, say hlp, and sourced like source("hlp/util1.R").
In this way scripts can be easily mixed and updated on the fly on a project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.
Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' #export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
On a more technical note, your code results in ep being inside the global environment because the assignment environment(ep$e$f)=ep creates a copy of ep inside your global environment. Once you’ve attached the environment, you can delete this object. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).
First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor-man's namespacing:
e <- local({
g <- function() "hello"
f <- function() g()
environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.

How to find unreferenced environments?

This is a followup to an answer here efficiently move environment from inside function to global environment , which pointed out that it's necessary to return a reference to an environment which was created inside a function if one wishes to work with the contents of that environment
Is it true that the newly created environment continues to exist if we don't return a reference, and if so how does one track down such an environment, either to access its contents or delete it?
Sure, if it was assigned to a symbol somewhere outside of the function's evaluation environment (as it was in the OP's example), an environment will continue to exist. In that sense, an environment is just like any other named R object. (The fact that unassigned environments can be kept in existence by closures does mean that environments sometimes persist where other types of object wouldn't, but that's not what's happening here.)
## OP's example function
funfun <- function(inc = 1){
dataEnv <- new.env()
dataEnv$d1 <- 1 + inc
dataEnv$d2 <- 2 + inc
dataEnv$d3 <- 2 + inc
assign('dataEnv', dataEnv, envir = globalenv()) ## Assignment to .GlobalEnv
}
funfun()
ls(env=.GlobalEnv)
# [1] "dataEnv" "funfun"
## It's easy to find environments assigned to a symbol in another environment,
## if you know which environment to look in.
Filter(isTRUE, eapply(.GlobalEnv, is.environment))
# $dataEnv
# [1] TRUE
In the OP's example, it's relatively easy to track down, because the environment was assigned to a symbol in .GlobalEnv. In general, though, (and again, just like any other R object) it will be difficult to track down if, for instance, it's assigned to an element in a list or some more complicated structure.
(This, incidentally, is why non-local assignment is usually discouraged in R and other more purely functional languages. When functions only return a value, and that value is only assigned to a symbol via explicit assignments (like v <- f()), the effects of executing code becomes a lot easier to reason about and predict. Fewer surprises makes for nicer code!)

python rpy2 module: refresh global R environment

The documentation for rpy2 states that the robjects.r object gives access to an R global environment. Is there a way to "refresh" this global environment to its initial state?
I would like to be able to restore the global environment to the state it was in when the rpy2.robjects module was imported but not yet used. In this manner, I don't have to worry about memory leaks on long running jobs or other unexpected side effects. Yes, refreshing the environment could introduce a different category of bug, but I believe in my case it will be a win.
Taking your question to mean literally what it says, if you just want to clear out .GlobalEnv, you can do that with a single line:
rm(list = ls(all.names=TRUE))
The all.names=TRUE bit is necessary because some object names are not returned by vanilla ls(). For example:
x <- rnorm(5)
ls()
# [1] "x"
# Doesn't remove objects with names starting with "."
rm(list=ls())
ls(all.names = TRUE)
# [1] ".Random.seed"
# Removes all objects
rm(list = ls(all.names=TRUE))
ls(all.names = TRUE)
# character(0)
There is only /one/ "global environment" in R; it is initialized when R starts. You can clear out its members, as Josh points it out, but if you happen to need to it this might mean that you'd better instanciate new environments and either switch between them or delete them when no longer needed.

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.
1: For examining the contents of an R data file without loading it: This question is similar, but different from, the one asked at listing contents of an R data file without loading
In that case, the solution offered was:
attach(filename)
ls(pos = 2)
detach()
If there are naming conflicts between objects in the file and those in the global environment, this warning appears:
The following object(s) are masked _by_ '.GlobalEnv':
I tried creating a new environment, but I cannot seem to attach into that.
For instance, this produces the same error:
lsfile <- function(filename){
tmpEnv <- new.env()
evalq(attach(filename), envir = tmpEnv)
tmpls <- ls(pos = 2)
detach()
return(tmpls)
}
lsfile(filename)
Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?
2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?
For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?
I currently solve this problem by loading everything into a temporary environment, and then copy out what's needed, but this is inefficient.
Since this question has just been referenced let's clarify two things:
attach() simply calls load() so there is really no point in using it instead of load
if you want selective access to prevent masking it's much easier to simply load the file into a new environment:
e = local({load("foo.RData"); environment()})
You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
FWIW .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert it to the lazy-load format instead which simply adds an index so each object can be loaded separately (see Get specific object from Rdata file)
I just use an env= argument to load():
> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", env=ne)
> ls(env=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"
>
The cost of this approach is that you do read the whole RData file---but on the other hand that seems to be unavoidable anyway as no other method seems to offer a list of the 'content' of such a file.
You can suppress the warning by setting warn.conflicts=FALSE on the call to attach. If an object is masked by one in the global environment, you can use get to retreive it from your attached data.
x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10
Thanks to #Dirk and #Joshua.
I had an epiphany. The command/package foreach with SMP or MC seems to produce environments that only inherit, but do not seem to conflict with, the global environment.
lsfile <- function(list_files){
aggregate_ls = foreach(ix = 1:length(list_files)) %dopar% {
attach(list_files[ix])
tmpls <- ls(pos = 2)
return(tmpls)
}
return(aggregate_ls)
}
lsfile("f1.rdat")
lsfile(dir(pattern = "*rdat"))
This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without ignore.
So, question #1 can be resolved by either ignoring the warnings (as #Joshua suggested) or by using whatever magic foreach summons.
For part 2, loading an object, I think #Joshua has the right idea - "get" will do.
The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.
#Joshua's answer is far simpler than my foreach detour.

Resources