Keep user-defined functions in the global environment during removal of objects - r

Question: How can I control the deletion (and saving) of user-defined functions?
What I have tried so far:
I've gotten a recommendation to add a dot [.] at the beginning of every function name, being told that such functions would not be deleted. When tested, the functions are deleted despite starting with a dot.
Requirements:
All "non-function" objects should be handled by rm().
Due to automation, the procedure needs to be triggerable from base R in a terminal; it is not enough for the solution to work only in RStudio.
The global environment is to be used, to keep the solution standardized.
If possible, one should be able to define which functions to keep/delete.
Expected outcome:
None of the functions in the example should be deleted.
Below you find the example code:
# Create 3 object variables.
a <- 1
b <- 2
c <- 3
# Create 3 functions.
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from global.env.
# Keep the ones specified below.
rm(list = ls()[!ls() %in% c("a", "c")])

You can use ls.str to specify a mode of object to find. With this you can exclude functions from the rm list.
rm(list = setdiff(ls(), ls.str(mode = "function")))
ls()
[1] "myFunction1" "myFunction2" "myFunction3"
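If you also need to choose which functions to keep (the question's last requirement), the same idea combines with an explicit keep-list. A sketch, with hypothetical keepObjects and keepFunctions vectors:
keepObjects   <- c("a", "c")                # non-function objects to keep
keepFunctions <- as.character(lsf.str())    # or a hand-picked subset of function names
rm(list = setdiff(ls(), c(keepObjects, keepFunctions)))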
However, you might be better off formalising your functions in a package and then you would not need to worry about deleting them with rm.

I strongly recommend a different approach. Don’t partially remove objects; use proper scoping instead. That is, don’t define objects in the global environment that don’t need to be defined there; define them inside functions or local scopes instead.
Going one step further, your functions.r file also shouldn’t define functions in the global environment. Instead, as suggested in a comment, it should define them inside a dedicated environment which you may attach, if convenient. This is in fact what R packages solve. If you feel that R packages are too heavy for your purpose, I suggest you write modules using my ‘box’ package: it cleanly implements file-based code modules.
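For illustration, a minimal ‘box’ module might look like the sketch below (this assumes the ‘box’ package is installed; the file names are hypothetical):
# functions.r: a 'box' module; only tagged functions are exported
#' @export
myFunction1 <- function() {}

# analysis.r: import the module; nothing is written to the global environment
box::use(./functions)
functions$myFunction1()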
If you use scoping as it was designed, there’s no need to call rm on temporary variables, and hence your problem won’t arise.
If you really want a clean slate, restart R and re-execute your script: this is the only way to consistently reset the state of the R session; all other ways are error-prone hacks because they only perform a partial cleanup.
A note on what you wrote:
When tested, the functions are deleted despite starting with a dot.
They’re not — they’re just invisible; that’s what the leading dot does. However, this recommendation also strikes me as bad practice: it’s an unnecessary hack.
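You can check this yourself:
.myFunction <- function() {}
ls()                  # character(0): the leading dot hides it
ls(all.names = TRUE)  # ".myFunction"
rm(list = ls())       # removes nothing here, since ls() omits dot-names
.myFunction()         # still callable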

Easy. Don't use the global environment.
myenv <- new.env()
with(myenv, {
  # Create 3 object variables.
  a <- 1
  b <- 2
  c <- 3
})
myenv$a
#[1] 1
# Create 3 functions (these are created in the global environment, not in myenv).
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from env.
# Keep the ones specified below.
rm(list = ls(envir = myenv)[!ls(envir = myenv) %in% c("a", "c")],
   envir = myenv)
ls(envir = myenv)
#[1] "a" "c"

Related

Unit testing functions with global variables in R

Preamble: package structure
I have an R package that contains an R/globals.R file with the following content (simplified):
utils::globalVariables("COUNTS")
Then I have a function that simply uses this variable. For example, R/addx.R contains a function that adds a number to COUNTS:
addx <- function(x) {
  COUNTS + x
}
This is all fine when doing a devtools::check() on my package, there's no complaining about COUNTS being out of the scope of addx().
Problem: writing a unit test
However, say I also have a tests/testthat/test-addx.R file with the following content:
test_that("addition works", expect_gte(addx(1), 1))
The content of the test doesn't really matter here, because when running devtools::test() I get an "object 'COUNTS' not found" error.
What am I missing? How can I correctly write this test (or set up my package)?
What I've tried to solve the problem
Adding utils::globalVariables("COUNTS") to R/addx.R, either before, inside or after the function definition.
Adding utils::globalVariables("COUNTS") to tests/testthat/test-addx.R in all places I could think of.
Manually initializing COUNTS (e.g., with COUNTS <- 0 or COUNTS <<- 0) in all places of tests/testthat/test-addx.R I could think of.
Reading some examples from other packages on GitHub that use a similar syntax (source).
I think you misunderstand what utils::globalVariables("COUNTS") does. It just declares that COUNTS is a global variable, so when the code analysis sees
addx <- function(x) {
  COUNTS + x
}
it won't complain about the use of an undefined variable. However, it is up to you to actually create the variable, for example by an explicit
COUNTS <- 0
somewhere in your source. I think if you do that, you won't even need the utils::globalVariables("COUNTS") call, because the code analysis will see the global definition.
Where you would need it is when you're doing some nonstandard evaluation, so that it's not obvious where a variable comes from. Then you declare it as a global, and the code analysis won't worry about it. For example, you might get a warning about
subset(df, Col1 < 0)
because it appears to use a global variable named Col1, but of course that's fine, because the subset() function evaluates its condition in a non-standard way, letting you use column names without writing df$Col1.
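A small runnable illustration of that point:
df <- data.frame(Col1 = c(-1, 5, -3))
subset(df, Col1 < 0)  # Col1 is resolved inside df, not in the global environment
#   Col1
# 1   -1
# 3   -3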
@user2554330's answer is great for many things.
If I understand correctly, you have a COUNTS that needs to be updateable, so putting it in the package environment might be an issue.
One technique you can use is local environments.
Two alternatives:
If it will always be referenced in one function, it might be easiest to change the function from
myfunc <- function(...) {
  # do something
  COUNTS <- COUNTS + 1
}
to
myfunc <- local({
  COUNTS <- NA
  function(...) {
    # do something
    COUNTS <<- COUNTS + 1
  }
})
What this does is create a local environment "around" myfunc, so when it looks for COUNTS, it will be found immediately. Note that it reassigns using <<- instead of <-, since the latter would not update the different-environment-version of the variable.
You can actually access this COUNTS from another function in the package:
otherfunc <- function(...) {
  COUNTScopy <- get("COUNTS", envir = environment(myfunc))
  COUNTScopy <- COUNTScopy + 1
  assign("COUNTS", COUNTScopy, envir = environment(myfunc))
}
(Feel free to name it COUNTS here as well, I used a different name to highlight that it doesn't matter.)
While the use of get and assign is a little inconvenient, it should only be required twice per function that needs to do this.
Note that the user can get to this if needed, but they'll need to use similar mechanisms. Perhaps that's a problem; in my packages where I need some form of persistence like this, I have used convenience getter/setter functions.
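Such accessors might look like this (a sketch; the names get_counts and set_counts are hypothetical):
get_counts <- function() {
  get("COUNTS", envir = environment(myfunc))
}
set_counts <- function(value) {
  assign("COUNTS", value, envir = environment(myfunc))
  invisible(value)
}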
You can place an environment within your package, and then use it like a named list within your package functions:
E <- new.env(parent = emptyenv())
myfunc <- function(...) {
  # do something
  E$COUNTS <- E$COUNTS + 1
}
otherfunc <- function(...) {
  E$COUNTS <- E$COUNTS + 1
}
We do not need the get/assign pair of functions, since E (a horrible name, chosen for its brevity) should be visible to all functions in your package. If you don't need the user to have access, then keep it unexported. If you want users to be able to access it, then exporting it via the normal package mechanisms should work.
Note that with both of these, if the user unloads and reloads the package, the COUNTS value will be lost/reset.
I'll provide a third option, in case the user wants/needs direct access, or you don't want to do this type of value management within your package.
Make the user provide it at all times. For this, add an argument to every function that needs it, and have the user pass an environment. I recommend an environment because most arguments are passed by value, whereas environments allow referential semantics (pass by reference).
For instance, in your package:
myfunc <- function(..., countenv) {
  stopifnot(is.environment(countenv))
  # do something
  countenv$COUNT <- countenv$COUNT + 1
}
otherfunc <- function(..., countenv) {
  countenv$COUNT <- countenv$COUNT + 1
}
new_countenv <- function(init = 0) {
  E <- new.env(parent = emptyenv())
  E$COUNT <- init
  E
}
where new_countenv is really just a convenience function.
The user would then use your package as:
mycount <- new_countenv()
myfunc(..., countenv = mycount)
otherfunc(..., countenv = mycount)

Function hiding in R

Consider the following file.r:
foo = function(){}
bar = function(){}
useful = function() {foo(); bar()}
foo and bar are meant only for internal use by useful - they are not reusable at all, because they require very specific data layout, have embedded constants, do something obscure that no one is going to need etc.
I don't want to define them inside useful{}, because then it will become too long (>10 LOC).
A client could do the following to import only useful into their namespace, though I am still not sure whether useful will work with foo and bar out of visibility:
# Source a single function from a source file.
# Example use
# max.a.posteriori <- source1( "file.r","useful" )
source1 <- function(path, fun) {
  source(path, local = TRUE)
  get(fun)
}
How can I properly do this on the file.r side, i.e. export only specific functions?
Furthermore, there is the problem of ordering of functions, which I feel is related to the above. Let us have
douglas = function() { adams() }
adams = function() { douglas() }
How do I handle circular dependencies?
You can achieve this by setting the binding environment of your useful function, as in the code listed below. This is similar to what packages do and if your project gets bigger, I would really recommend creating a package using the great devtools package.
If the functions foo and bar are not used by other functions I would just define them inside useful. Since the functions are quite independent pieces of code it does not make the code more complicated to understand, even if the line count of useful increases. (Except of course if you are forced by some guideline to keep the line count short.)
For more on environments see: http://adv-r.had.co.nz/Environments.html
# define new environment
myenv <- new.env()
# define functions in this environment
myenv$foo <- function(){}
myenv$bar <- function(){}
# define useful in global environment
useful <- function() {
  foo()
  bar()
}
# useful does not find the called functions so far
useful()
# neither can they be found in the globalenv
foo()
# but of course in myenv
myenv$foo()
# set the binding environment of useful to myenv
environment(useful) <- myenv
# everything works now
useful()
foo()
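As for the ordering question: definition order does not matter, because names in a function body are resolved when the function is called, not when it is defined. Mutually recursive functions therefore work as long as both exist before the first call (and, unlike the douglas/adams example, reach a base case):
is_even <- function(n) if (n == 0) TRUE else is_odd(n - 1)
is_odd <- function(n) if (n == 0) FALSE else is_even(n - 1)
is_even(10)
# [1] TRUE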
My recommendation is to use packages; they were created for such situations. But even then you cannot hide the functions themselves in pure R.
In order to encapsulate foo and bar you need to implement a class. The easiest way, in my opinion, to do that in R is through R6 classes: https://cran.r-project.org/web/packages/R6/vignettes/Introduction.html#private-members. There you have an example of how to hide the length function.

Determining current environment, or maybe I'm going about this all wrong?

I'm attempting to use environments to keep specialized constants out of the global namespace and for potentially masking each other. This is resulting in a slew of warnings along the lines of The following object(s) are masked from ....
I have:
foo <- new.env()
with(foo, {
  # Define variables pertaining to foo.
})
bar <- new.env()
with(bar, {
  # Define variables pertaining to bar.
})
Now it gets interesting. I have various functions that need to access the items in foo and bar. For example:
fooFunc1 <- function(args) {
  attach(foo)
  on.exit(detach(foo))
  ## Do foo things.
  fooFunc2()
}
Now, fooFunc2 is defined similarly, with an attach() statement at the top. This results in a warning that everything defined in foo has been masked, which makes sense, because we're already in foo. The answer would appear to be having each function check whether it's already in the correct environment and only attach() if not. But I'm not seeing a way to name an environment so that it works with environmentName().
So how do people actually effect encapsulation and hiding in R? Having to type foo$fooVar1, foo$fooVar2, etc. seems absurd. Same with wrapping every statement in with(). What am I missing?
You could use something like:
if (!"foo" %in% search()) {attach(foo); on.exit(detach(foo))}
Or alternatively, use local:
fooFunc1 <- local(function(args) {
  ## Do foo things
  fooFunc2()
}, envir = foo)
I would use with again. For example:
foo <- new.env()
with(foo, {x = 1; y = 2})
fooFunc1 <- function() {
  xx <- with(foo, {
    x^2 + 1/2
  })
}
You could just turn off the conflict warnings with attach(foo, warn.conflicts = FALSE). Alternatively, if you want to keep redundancies out of your search path, you could do something like this instead:
try(detach(foo), silent=TRUE)
attach(foo)
on.exit(try(detach(foo), silent=TRUE))
I think the best way, though, is to define the functions with the environment you want to run them in.
f <- function(...) {print(...)}
environment(f) <- foo
or equivalently,
f <- local({
  function(...) {print(...)}
}, envir = foo)
Functions in R are all closures, meaning they're all bundled with a reference to the environment that they are supposed to run in. By default each function's environment is the environment in which it is created, but you can change this using the environment or local functions to any environment you want.
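A minimal demonstration of rebinding a function's environment:
x <- 1
f <- function() x
f()                 # 1: x is found in f's enclosing (here global) environment
e <- new.env()
e$x <- 99
environment(f) <- e
f()                 # 99: x is now resolved in e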

how to add functions in an existing environment

Is it possible to use environments as a substitute for namespaces, and how do you check whether an environment already exists before adding functions to it?
This is related to this question, and Brendan's suggestion
How to organize large R programs?
I understand Dirk's point in that question; however, for development it is sometimes impractical to put functions in packages.
EDIT: The idea is to mimic namespaces across files, and hence to be able to load different files independently. If a file has been previously loaded then the environment doesn't need to be created, just added to.
Thanks for ideas
EDIT: So presumably the code below would be the equivalent of namespaces in other languages:
# how to use environment as namespaces
# file 1
# equivalent of 'namespace e'
if (!(exists("e") && is.environment(e))) { e <- new.env(parent = baseenv()) }
e$f1 <- function(x) {1}
# file 2
# equivalent of 'namespace e'
if (!(exists("e") && is.environment(e))) { e <- new.env(parent = baseenv()) }
e$f2 <- function(x) {2}
Yes, you can, for the most part. Each function has an environment, and that's where it looks for other functions and global variables. By using your own environment you have full control over that.
Typically functions are also assigned to an environment (by assigning them to a name), and typically those two environments are the same - but not always. In a package, the namespace environment is used for both, but then the (different) package environment on the search path also has the same (exported) functions defined. So there the environments are different.
# this will ensure only stats and packages later on the search list are searched for
# functions from your code (similar to import in a package)
e <- new.env(parent=as.environment("package:stats"))
# simple alternative if you want access to everything
# e <- new.env(parent=globalenv())
# Make all functions in "myfile.R" have e as environment
source("myfile.R", local=e)
# Or change an existing function to have a new environment:
e$myfunc <- function(x) sin(x)
environment(e$myfunc) <- e
# Alternative one-liner:
e$myfunc <- local(function(x) sin(x), e)
# Attach it if you want to be able to call them as usual.
# Note that this creates a new environment "myenv".
attach(e, name="myenv")
# remove all temp objects
rm(list=ls())
# and try your new function:
myfunc(1:3)
# Detach when it's time to clean up or reattach an updated version...
detach("myenv")
In the example above, e corresponds to a namespace and the attached "myenv" corresponds to a package environment (like "package:stats" on the search path).
Namespaces are environments, so you can use exactly the same mechanism. Since R uses lexical scoping, the parent of the environment defines what the function will see (i.e. how free variables are bound). And exactly as with namespaces, you can attach environments and look them up.
So to create a new "manual namespace" you can use something like
e <- new.env(parent=baseenv())
# use local(), sys.source(), source() or e$foo <- assignment to populate it, e.g.
local({
  f <- function() { ... }
  # ...
}, e)
attach(e, name = "mySuperNamespace")
Now it is loaded and attached just like a namespace, so you can use f just as if it were in a namespace. Namespaces use one more parent environment to resolve imports; you can do that too if you care. If you need to check for your cool environment, just check the search path, e.g. "mySuperNamespace" %in% search(). If you need the actual environment, use as.environment("mySuperNamespace").
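A sketch of that extra import layer (the names imports and e are illustrative):
imports <- new.env(parent = baseenv())
imports$median <- stats::median   # "import" just what the module needs
e <- new.env(parent = imports)    # module code sees imports, then base
local(f <- function(x) median(x), e)
e$f(1:10)
# [1] 5.5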
You can check that environments exist in the same way that you would any other variable.
e <- new.env()
exists("e") && is.environment(e)

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as this handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting and I do this with rm(list=ls()) which deletes all my user created objects AND my custom functions. I'd really like to not blow up my custom functions.
One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:
foo <- function(x) {
  mean(x)
}
bar <- function(x) {
  sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
Still not visible as we haven't attached anything
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible, and because we have attached a copy of the environment to the search path the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them then the above might be useful but it is probably more hassle than it is worth if you are developing the functions and modifying them.
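To see that it really is a copy, continuing the example above:
> myEnv$foo <- function(x) sum(x)   # edit the original in myEnv
> foo(1:10)
[1] 5.5
> detach(myEnv)
> attach(myEnv)
> foo(1:10)
[1] 55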
A second option is to just name them all .foo rather than foo as ls() will not return objects named like that unless argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE); therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
  f = function(x) x,
  g = function(x) x * x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace, and they will be accessible almost the same as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace, the rm command you normally use won't remove them. If you do wish to remove them, use detach("MyFunctions").
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R (in Emacs with ESS)
to create a new session, which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced within .RProfile. You can re-source the contents directly from within R at your leisure.
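For example, with a hypothetical path, .Rprofile might contain:
source("~/R/my_functions.R")
and after an rm(list = ls()) wipes the workspace, the same source() call restores the functions.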
I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does 2 things:
1) deletes all non-function objects that do not begin with a capital letter and then
2) saves the environment as an RData file
(requires the R.oo package)
cleanup <- function(filename = "C:/mymainR.RData") {
  library(R.oo)
  # create a data frame listing all personal objects
  everything <- ll(envir = 1)
  # get the objects that are not functions
  nonfunction <- as.vector(everything[everything$data.class != "function", 1])
  # non-function objects whose names begin with a lowercase letter should be deleted
  trash <- nonfunction[grep("^[[:lower:]]", nonfunction)]
  remove(list = trash, pos = 1)
  # save the R environment
  save.image(filename)
  print(paste("New, CLEAN R environment saved in", filename))
}
In order to use this function 3 rules must always be kept:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Obsolete functions must be removed manually with rm.
Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can backup as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
An nth, quick and dirty option would be to use lsf.str() when calling rm(), to get all the functions in the current workspace... and it lets you name the functions as you wish.
pattern <- paste0('^', lsf.str(), '$', collapse = "|")
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if ('my_namespace' %in% search()) detach('my_namespace')
source('my_functions.R', attach(NULL, name = 'my_namespace'))
This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output for
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).
