.onLoad to create a new environment in R

Within a package I intend to submit to CRAN, I am using .onLoad(...) to create a new environment in which needed variables will be stored without directly modifying the global environment.
.onLoad <- function(...) {
  envr <- new.env() # when package is loaded, create new environment to store needed variables
}
This function is saved in a file called zzz.R.
I then use assign(...) to assign variables to the new environment:
assign("x", x, envir = envr)
To retrieve variables in the new environment within my created functions, I do
envr$x
However, on building, installing, loading my package, and running my main function, I receive an error that the object 'envr' cannot be found.
I'm wondering what's happening here.
Creating a new environment directly in R works fine:
envr <- new.env()
envr$a <- 5
envr$a
[1] 5
Any thoughts on resolving the issue?

Your code
envr <- new.env()
assigns the new environment to a local variable in the .onLoad function. When that function exits, the variable isn't visible anywhere else.
You can make your assignment outside the function using <<-, but you have to be careful. That makes R look up through enclosing environments until it finds the variable. If it never finds it, it will do the assignment in the global environment, and that's not yours to write in, so CRAN won't accept your package.
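To make the lookup rule concrete, here is a small sketch of how <<- behaves with and without an enclosing binding. All names (make_counter, leaky, stray_value) are hypothetical and only illustrate the rule; it is best tried in a fresh session.

```r
# Case 1: an enclosing binding exists, so <<- updates it.
make_counter <- function() {
  count <- 0                      # binding in the enclosing environment
  function() {
    count <<- count + 1           # finds `count` one level up and updates it
    count
  }
}
counter <- make_counter()
counter()
counter()
third_call <- counter()           # the counter has now been advanced three times

# Case 2: no enclosing binding exists, so <<- falls through to the
# global environment. This is the situation CRAN objects to.
leaky <- function() {
  stray_value <<- "oops"
  invisible(NULL)
}
leaky()
leaked <- exists("stray_value", envir = globalenv(), inherits = FALSE)
rm("stray_value", envir = globalenv())   # clean up the stray global
```

In the first case the global environment is never touched; in the second, the variable appears there even though the function never mentions globalenv().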
So the right way to do this is to create the variable outside any function as suggested in https://stackoverflow.com/a/12605694/2372064, or to create a variable outside the function but create the environment on loading, e.g.
envr <- NULL
.onLoad <- function(...) {
  envr <<- new.env() # when package is loaded, create new environment to store needed variables
}
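For comparison, the first option (creating the environment at top level) might look like the sketch below. It is written so it can be run outside a package; in a real package these definitions would sit in a file under R/. The names pkg_env, set_value and get_value are hypothetical.

```r
# Created once, when the package code is evaluated.
# parent = emptyenv() keeps lookups from leaking into other environments.
pkg_env <- new.env(parent = emptyenv())

# Store a value in the package-level environment.
set_value <- function(name, value) {
  assign(name, value, envir = pkg_env)   # changes the contents, not the binding
  invisible(value)
}

# Retrieve a value, looking only in pkg_env itself.
get_value <- function(name) {
  get(name, envir = pkg_env, inherits = FALSE)
}

set_value("x", 42)
get_value("x")
```

This works even though a package's namespace bindings are locked after loading, because only the binding pkg_env is locked; the contents of the environment it points to remain modifiable.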

Related

exposing functions without cluttering ls()

I have a package that generates some functions when you call an initialize function. I create these functions in the parent.frame of initialize(), which I guess is the global environment. I want to emulate the normal package behavior that allows you to directly call a function from a package after loading it, but without having to see those functions when you list your workspace contents using ls(). For example, doing
library(ggplot2)
ls()
doesn't return geom_line, geom_point, etc., but you don't have to use :: to call those functions. They are exposed to the user but do not live in the global environment.
Is there a clever way for me to do the same thing for functions generated by the call to initialize, e.g. by defining environments or namespaces in zzz.r and the onLoad or onAttach hooks? I thought of trying to set the function environments to the package namespace, but it seems that you cannot modify the package namespace after it is loaded.
EDIT the package I'm working on is here: https://github.com/mkoohafkan/arcpyr. The arcpy.initialize function connects to Python using PythonInR, imports the arcpy package, and then creates interfaces for a list of functions. I'll try to create a simplified dummy package later today.
So I eventually found a solution that uses both environments (thanks @ssdecontrol!) and attach.
f = new.env() # create the environment f
assign("foo", "bar", pos = f) # create the variable foo inside f
ls() # lists f
ls(f) # lists foo
attach(f) # attach a copy of f to the search path
foo # foo can now be accessed directly
## bar
ls() # but still only shows f
rm(f) # can even remove f
foo # and foo is still accessible
## bar
Of course, there are some risks to using attach.
I redid the arcpyr package to use environments instead, but you can get the old behavior back by doing
arcpy = arcpy_env()
attach(arcpy)

functions in .Rprofile are not found when they are in .env

I have a .Rprofile that I copied from https://www.r-bloggers.com/fun-with-rprofile-and-customizing-r-startup/. However, when I start an R session, the functions stored in .env don't work, while the functions outside .env work perfectly. Here is an example:
sshhh <- function(a.package){
  suppressWarnings(suppressPackageStartupMessages(
    library(a.package, character.only=TRUE)))
}
auto.loads <-c("dplyr", "ggplot2")
if(interactive()){
  invisible(sapply(auto.loads, sshhh))
}
.env <- new.env()
attach(.env)
.env$unrowname <- function(x) {
  rownames(x) <- NULL
  x
}
.env$unfactor <- function(df){
  id <- sapply(df, is.factor)
  df[id] <- lapply(df[id], as.character)
  df
}
message("\n*** Successfully loaded .Rprofile ***\n")
Once R has loaded, I can type sshhh and it shows the function, but if I type unfactor I get an "object not found" error.
Any help? Should I put all the functions in my workspace?
The functions created in a separate environment are intentionally hidden. This is to protect them from calls to rm(list=ls()).
From the original article:
[Lines 58-59]: This creates a new hidden namespace that we can store
some functions in. We need to do this in order for these functions to
survive a call to “rm(list=ls())” which will remove everything in the
current namespace. This is described wonderfully in this blog post [1].
To use the unfactor function, you would call .env$unfactor().
If you want to make those functions available in the global namespace without having to refer to .env, you can simply leave out the whole .env part and define them the same way you did the sshhh function.
[1] http://gettinggeneticsdone.blogspot.com.es/2013/07/customize-rprofile.html
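One more thing that may be worth checking in the .Rprofile from the question: attach(.env) is called before the functions are added. attach() copies the objects that exist at that moment into a new environment on the search path, so anything added to .env afterwards is not visible by bare name. A minimal sketch, with hypothetical names (demo_env, early, late, demo_env_copy):

```r
demo_env <- new.env()
demo_env$early <- function() "defined before attach"

# attach() copies the current contents of demo_env onto the search path.
attach(demo_env, name = "demo_env_copy")

# Added after attach(): lives only in demo_env, not in the attached copy.
demo_env$late <- function() "defined after attach"

attached <- as.environment("demo_env_copy")
early_visible <- exists("early", envir = attached, inherits = FALSE)
late_visible  <- exists("late",  envir = attached, inherits = FALSE)

detach("demo_env_copy", character.only = TRUE)   # clean up the search path
```

Under that reading, moving attach(.env) to the end of the .Rprofile, after all the functions have been defined, should make unfactor callable directly while still keeping it out of ls().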

saving and loading compressed R object

save(something, file="something.RData", compress="xz")
then when I load for reuse
load("something.RData")
print(something)
Error in print(something) : object 'something' not found
It is a random forest object.
Am I missing the unzip code?
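This works at the console but not inside a function, because load() restores objects into the environment it is called from (its envir argument defaults to parent.frame()). At the console that is the global environment; inside a function it is the function's own evaluation frame, so the objects disappear when the function returns.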
Two simple alternatives:
Use saveRDS() and readRDS() for single objects.
Create an environment and use it as shown below.
Here is a short example of the second approach:
ne <- new.env()
load(somefile, ne) # now ls(ne) will show what was loaded
foo <- ne$something
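Both alternatives can be sketched end to end as follows. The object something is a placeholder list standing in for the random forest, the files are temporary, and read_saved is a hypothetical helper name.

```r
something <- list(model = "placeholder", ntree = 500)   # stand-in for the fitted object

# Option 1: saveRDS()/readRDS() return the object, so you pick the name on reading.
rds_path <- tempfile(fileext = ".rds")
saveRDS(something, rds_path, compress = "xz")
restored <- readRDS(rds_path)

# Option 2: save(), then load() into an explicit environment inside a function.
rdata_path <- tempfile(fileext = ".RData")
save(something, file = rdata_path, compress = "xz")
rm(something)

read_saved <- function(path) {
  ne <- new.env()
  loaded_names <- load(path, envir = ne)   # load() returns the names it restored
  mget(loaded_names, envir = ne)           # hand the objects back as a named list
}
objs <- read_saved(rdata_path)
names(objs)
```

No separate decompression step is needed in either case; the compression method is detected when the file is read.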

How do I load objects to the current environment from a function in R?

Instead of doing
a <- loadBigObject("a")
b <- loadBigObject("b")
I'd like to call a function like
loadBigObjects(list("a","b"))
And be able to access the a and b objects.
It is not clear what loadBigObjects() does or where it will look for a and b. How does it load the objects from file or sourcing code?
There are lots of options in general:
sys.source() allows an R file to be sourced to a given environment
load() which will load an .Rdata file to a given environment
assign() in combination with any object created by loadBigObjects() or a call to readRDS() can also load an object to a given environment.
From within your function, you'll want to specify the environment in which to load objects as the Global Environment by using globalenv(). If you don't do that then the object will only exist in the evaluation frame of the running loadBigObjects(). E.g.
loadBigObjects <- function(list) {
  lapply(list, function(x) assign(x, readRDS(x), envir = globalenv()))
}
(as per your comment on @GSee's answer, and assuming the list("a","b") is sufficient information for readRDS() to locate and open the objects.)
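If writing to the global environment is not what you want, a variant can take the target environment as an argument and default to the caller's frame. This is only a sketch under assumptions: the helper name load_big_objects, its dir argument, and the "<name>.rds" file layout are all hypothetical.

```r
# Assumes each object name has a matching "<name>.rds" file in `dir`.
load_big_objects <- function(names, dir = ".", envir = parent.frame()) {
  for (nm in names) {
    path <- file.path(dir, paste0(nm, ".rds"))
    assign(nm, readRDS(path), envir = envir)   # bind the object under its own name
  }
  invisible(names)
}

# Usage with two throwaway files in a temporary directory.
tmp <- tempfile("bigobjs")
dir.create(tmp)
saveRDS(1:3, file.path(tmp, "a.rds"))
saveRDS(letters[1:2], file.path(tmp, "b.rds"))

target <- new.env()
load_big_objects(c("a", "b"), dir = tmp, envir = target)
sort(ls(target))
```

Called without envir, the objects land in whichever environment the call was made from, which at the console is the global environment.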
Without knowing anything about what loadBigObject is or does, you can use lapply to apply a function to a list of objects
lapply(list("a", "b"), loadBigObject)
If you provided the code for loadBigObject or at least describe what it is supposed to do, a better loadBigObjects function could probably be written.
The assign function can be used to define a variable in an environment other than the current one.
loadBigObjects <- function(lst) {
  lapply(lst, function(l) {
    assign(l, loadBigObject(l), envir=globalenv())
  })
  lst
}
(Not that this is necessarily a good idea.)

R: creating an environment in the globalenv() from inside a function

Right now I have the lines:
envCache <- new.env( hash=TRUE, parent = .GlobalEnv )
print(parent.env(envCache))
R claims the environment is in the global environment, but when I try to find the environment later it's not there.
What I'm trying to do here is cache some dataframes in an environment under the global environment, so each time I call a function it does not have to hit the server to get the same data again. Ideally, I'll call the function once using a source command in the R console, it will grab the data necessary, save it to an environment in the global environment, and then when I call the same function from the R console it will see the environment and dataframe from which it will grab the data as opposed to re-querying the server.
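When R looks for a symbol, it looks in the current environment, then in that environment's parent, and so on. Setting parent = .GlobalEnv only makes the global environment the enclosure of the new environment; it does not create a variable named envCache in the global environment, so when the code runs inside a function, envCache is a local variable that disappears when the function returns. One way to implement what you would like to do is to create a 'closure' that remembers state, along the lines of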
makeCache <- function() {
    cache <- new.env(parent=emptyenv())
    list(get = function(key) cache[[key]],
         set = function(key, value) cache[[key]] <- value,
         ## next two added in response to @sunpyg
         load = function(rdaFile) load(rdaFile, cache),
         ls = function() ls(cache))
}
Invoking makeCache() returns a list of functions: get and set, plus the load and ls helpers.
a <- makeCache()
Each function has an environment in which it was defined (the environment created when you invoked makeCache()). When you invoke a$set("a", 1), the rules of variable look-up mean that R looks for a variable cache first inside the function a$set and, when it doesn't find it there, in the environment in which set was defined.
> a$get("foo")
NULL
> a$set("foo", 1)
> a$get("foo")
[1] 1
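Cool, eh? Note that parent = emptyenv() keeps the cache isolated from every other environment. cache[[key]] itself never searches parent environments, but an inheriting lookup such as get(key, envir = cache) on a non-existent key would otherwise have continued into the parent environment of cache, and so on.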
There's a bank account example in the An Introduction to R manual that is really fun. In response to @sunpyg's comment, I've added a load and ls function to add data from an Rda file and to list the content of the cache, e.g., a$load("foo.Rda").
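Applied to the original goal of not hitting the server twice, the same closure idea can wrap the expensive call itself. The sketch below is an assumption-laden illustration: make_cached_fetch, slow_fetch and get_data are hypothetical names, and slow_fetch merely simulates a server query while counting how often it runs.

```r
# Returns a function that calls `fetch_from_server` at most once per key.
make_cached_fetch <- function(fetch_from_server) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch_from_server(key), envir = cache)   # first request: fetch and store
    }
    get(key, envir = cache, inherits = FALSE)              # later requests: read the cache
  }
}

# Stand-in for the real query; counts how many times it is actually run.
calls <- 0
slow_fetch <- function(key) {
  calls <<- calls + 1
  data.frame(id = key, value = nchar(key))
}

get_data <- make_cached_fetch(slow_fetch)
first  <- get_data("sales")
second <- get_data("sales")   # served from the cache; slow_fetch is not called again
```

The cache lives as long as get_data does, and nothing is written to the global environment apart from the function itself.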
Here's what I came up with as an alternate solution. It may be doing the same thing as the other answer in the background, but the code is more intuitive to me.
cacheTesting <- function()
{
    if (exists("cache"))
    {
        print("IT WORKS!!!")
        cacheData <- get("test", envir = cache)
        print(cacheData)
    }
    else
    {
        assign("cache", new.env(hash = TRUE), envir = .GlobalEnv)
        test <- 42
        assign("test", test, envir = cache)
    }
}
The first run of the code creates the environment in the .GlobalEnv using an assign statement. The second run sees that environment, because it actually made it to .GlobalEnv, and pulls the data placed there from it before printing it.
