rm() and detach() in function - not working - r

I have a function, which is supposed to do the following:
remove a given vector
unload a given package
re-load a given package
Here's an example:
removeReload <- function(old, new){
rm(old)
detach("package:anypackage")
library(anypackage)
new <- new
}
However, this function does not remove old from the workspace. I also tried this function as old <- NULL, but again to no avail.
Any ideas as to why this is the case, and how to get old to be removed?
Thanks!

rm comes with an envir argument to specify the environment to remove objects from. The default is the environment in which rm was called. Normally, if you use rm(blah), the calling environment is the environment you're working in, but if you put rm inside a function, the calling environment is the function environment. You can use rm(old, envir = .GlobalEnv)
Beware programming with this function - it may lead to unintended consequences if you put it inside yet another function.
Example:
> foo = function() {
+ rm(x, envir = .GlobalEnv)
+ }
> x = 1
> foo()
> x
There are more details in at the help page, ?rm, and that page links to ?environment for even more detail.
Similarly, the new <- new as the last line of your function is not doing assignment in the global environment. Normal practice would be to have your function return(new) and do the assignment as it is called, something like new <- removeUnload(old, new). But it's hard to make a "good practice" recommendation from the pseudocode you provide, since you pass in new as an input... it isn't clear whether your function arguments are objects or character strings of object names.

Related

Unit testing functions with global variables in R

Preamble: package structure
I have an R package that contains an R/globals.R file with the following content (simplified):
utils::globalVariables("COUNTS")
Then I have a function that simply uses this variable. For example, R/addx.R contains a function that adds a number to COUNTS
addx <- function(x) {
COUNTS + x
}
This is all fine when doing a devtools::check() on my package, there's no complaining about COUNTS being out of the scope of addx().
Problem: writing a unit test
However, say I also have a tests/testthtat/test-addx.R file with the following content:
test_that("addition works", expect_gte(fun(1), 1))
The content of the test doesn't really matter here, because when running devtools::test() I get an "object 'COUNTS' not found" error.
What am I missing? How can I correctly write this test (or setup my package).
What I've tried to solve the problem
Adding utils::globalVariables("COUNTS") to R/addx.R, either before, inside or after the function definition.
Adding utils::globalVariables("COUNTS") to tests/testthtat/test-addx.R in all places I could think of.
Manually initializing COUNTS (e.g., with COUNTS <- 0 or <<- 0) in all places of tests/testthtat/test-addx.R I could think of.
Reading some examples from other packages on GitHub that use a similar syntax (source).
I think you misunderstand what utils::globalVariables("COUNTS") does. It just declares that COUNTS is a global variable, so when the code analysis sees
addx <- function(x) {
COUNTS + x
}
it won't complain about the use of an undefined variable. However, it is up to you to actually create the variable, for example by an explicit
COUNTS <- 0
somewhere in your source. I think if you do that, you won't even need the utils::globalVariables("COUNTS") call, because the code analysis will see the global definition.
Where you would need it is when you're doing some nonstandard evaluation, so that it's not obvious where a variable comes from. Then you declare it as a global, and the code analysis won't worry about it. For example, you might get a warning about
subset(df, Col1 < 0)
because it appears to use a global variable named Col1, but of course that's fine, because the subset() function evaluates in a non-standard way, letting you include column names without writing df$Col.
#user2554330's answer is great for many things.
If I understand correctly, you have a COUNTS that needs to be updateable, so putting it in the package environment might be an issue.
One technique you can use is the use of local environments.
Two alternatives:
If it will always be referenced in one function, it might be easiest to change the function from
myfunc <- function(...) {
# do something
COUNTS <- COUNTS + 1
}
to
myfunc <- local({
COUNTS <- NA
function(...) {
# do something
COUNTS <<- COUNTS + 1
}
})
What this does is create a local environment "around" myfunc, so when it looks for COUNTS, it will be found immediately. Note that it reassigns using <<- instead of <-, since the latter would not update the different-environment-version of the variable.
You can actually access this COUNTS from another function in the package:
otherfunc <- function(...) {
COUNTScopy <- get("COUNTS", envir = environment(myfunc))
COUNTScopy <- COUNTScopy + 1
assign("COUNTS", COUNTScopy, envir = environment(myfunc))
}
(Feel free to name it COUNTS here as well, I used a different name to highlight that it doesn't matter.)
While the use of get and assign is a little inconvenient, it should only be required twice per function that needs to do this.
Note that the user can get to this if needed, but they'll need to use similar mechanisms. Perhaps that's a problem; in my packages where I need some form of persistence like this, I have used convenience getter/setter functions.
You can place an environment within your package, and then use it like a named list within your package functions:
E <- new.env(parent = emptyenv())
myfunc <- function(...) {
# do something
E$COUNTS <- E$COUNTS + 1
}
otherfunc <- function(...) {
E$COUNTS <- E$COUNTS + 1
}
We do not need the get/assign pair of functions, since E (a horrible name, chosen for its brevity) should be visible to all functions in your package. If you don't need the user to have access, then keep it unexported. If you want users to be able to access it, then exporting it via the normal package mechanisms should work.
Note that with both of these, if the user unloads and reloads the package, the COUNTS value will be lost/reset.
I'll list provide a third option, in case the user wants/needs direct access, or you don't want to do this type of value management within your package.
Make the user provide it at all times. For this, add an argument to every function that needs it, and have the user pass an environment. I recommend that because most arguments are passed by-value, but environments allow referential semantics (pass by-reference).
For instance, in your package:
myfunc <- function(..., countenv) {
stopifnot(is.environment(countenv))
# do something
countenv$COUNT <- countenv$COUNT + 1
}
otherfunc <- function(..., countenv) {
countenv$COUNT <- countenv$COUNT + 1
}
new_countenv <- function(init = 0) {
E <- new.env(parent = emptyenv())
E$COUNT <- init
E
}
where new_countenv is really just a convenience function.
The user would then use your package as:
mycount <- new_countenv()
myfunc(..., countenv = mycount)
otherfunc(..., countenv = mycount)

R functions that execute functions

I'm trying to break out common lines of code used in a fairly large R script into encapsulated functions...however, they don't seem to be running the intended code when called. I feel like I'm missing some conceptual piece of how R works, or functional programming in general.
Examples:
Here's a piece of code I'd like to call to clear the workspace -
clearWorkSpace <- function() {
rm(list= ls(all=TRUE))
}
As noted, the code inside of the function executes as expected, however if the parent function is called, the environment is not cleared.
Again, here's a function intended to load all dependency files -
loadDependencies <- function() {
dep_files <- list.files(path="./dependencies")
for (file in dep_files) {
file_path <- paste0("./dependencies/",file)
source(file_path,local=TRUE)
}
}
If possible, it'd be great to be able to encapsulate code into easy to read functions. Thanks for your help in advance.
What you are calling workspace is more properly referred to as the global environment.
Functions execute in their own environments. This is, for example, why you don't see the variables defined inside a function in your global environment. Also how a function knows to use a variable named x defined in the function body rather than some x you might happen to have in your global environment.
Most functions don't modify the external environments, which is good! It's the functional programming paradigm. Functions that do modify environments, such as rm and source, usually take arguments so that you can be explicit about which environment is modified. If you look at ?rm you'll see an envir argument, and that argument is most of what its Details section describes. source has a local argument:
local - TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
You explicitly set local = TRUE when you call source, which explicitly tells source to only modify the local (inside the function) environment, so of course your global environment is untouched!
To make your functions work as I assume you want them to, you could modify clearWorkSpace like this:
clearWorkSpace <- function() {
rm(list= ls(all=TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
}
And for loadDependencies simply delete the local = TRUE. (Or more explicitly set local = FALSE or local = .GlobalEnv) Though you could re-write it in a more R-like way:
loadDependencies = function() {
invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source))
}
For both of these (especially with the simplified dependency running above) I'd question whether you really need these wrapped up in functions. Might be better to just get in the habit of restarting R when you resume work on a project and keeping invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source)) at the top of your script...
For more reading on environments, there is The Evironments Section of Advanced R. Notably, there are several ways to specify environments that might be useful for different use cases rather than hard-coding the global environment.
In theory you need just to do something like:
rm(list= ls(all=TRUE, envir = .GlobalEnv))
I mean you set explicitly the environment ( even it is better here to use pos argument). but this will delete also the clearWorkSpace function since it is a defined in the global environment. So this will fails with a recursive call.
Personally I never use rm within a function or a local call. My understanding , rm is intended to be called from the console to clear the work space.

R user-defined functions in new environment

I use some user-defined small functions as helpers. These functions are all stored in a R_HOME_USER/helperdirectory. Until now, these functions were sourced at R start up. The overall method is something like `lapply(my.helper.list,source). I want now these functions to be sourced but not to appear in my environment, as they pollute it.
A first and clean approach would be to build a package with all my helper.R. For now, I do not want to follow this method. A second approach would be to name these helpers with a leading dot. This annoys me to have to run R > .helper1().
Best way would be to define these helpers in a specific and accessible environment, but I am messing with the code. My idea is to create first a new environment:
.helperEnv <- new.env(parent = baseenv())
attach(.helperEnv, name = '.helperEnv')
Fine, R > search() returns 'helperEnv' in the list. Then I run :
assign('helper1', helper1, envir = .helperEnv)
rm(helper1)
Fine, ls(.helperEnv)returns 'helper1' and this function does not appear anymore in my environment.
The issue is I can't run helper1 (object not found). I guess I am not on the right track and would appreciate some hints.
I think you should assign the pos argument in your call to attach as a negative number:
.helperEnv <- new.env()
.helperEnv$myfunc<-function(x) x^3+1
attach(.helperEnv,name="helper",pos=-1)
ls()
#character(0)
myfunc
#function(x) x^3+1

How do I load objects to the current environment from a function in R?

Instead of doing
a <- loadBigObject("a")
b <- loadBigObject("b")
I'd like to call a function like
loadBigObjects(list("a","b"))
And be able to access the a and b objects.
It is not clear what loadBigObjects() does or where it will look for a and b. How does it load the objects from file or sourcing code?
There are lots of options in general:
sys.source() allows an R file to be sourced to a given environment
load() which will load an .Rdata file to a given environment
assign() in combination with any object created by loadBigObjects() or a call to readRDS() can also load an object to a given environment.
From within your function, you'll want to specify the environment in which to load objects as the Global Environment by using globalenv(). If you don't do that then the object will only exist in the evaluation frame of the running loadBigObjects(). E.g.
loadBigObjects <- function(list) {
lapply(list, function(x) assign(x, readRDS(x), envir = globalenv()))
}
(as per your comment to #GSee's Answer, and assuming the list("a","b") is sufficient information for readRDS() to locate and open the object.
Without knowing anything about what loadBigObject is or does, you can use lapply to apply a function to a list of objects
lapply(list("a", "b"), loadBigObject)
If you provided the code for loadBigObject or at least describe what it is supposed to do, a better loadBigObjects function could probably be written.
The assign function can be used to define a variable in an environment other than the current one.
loadBigObjects <- function(lst) {
lapply(lst, function(l) {
assign(l, loadBigObject(l), envir=globalenv())
}
lst
}
(Not that this is necessarily a good idea.)

R: creating an environment in the globalenv() from inside a function

Right now I have the lines:
envCache <- new.env( hash=TRUE, parent = .GlobalEnv )
print(parent.env(envCache))
R claims the environment is in the global environment, but when I try to find the environment later it's not there.
What I'm trying to do here is cache some dataframes in and environment under the global environment, so each time I call a function it does not have to hit the server to get the same data again. Ideally, I'll call the function once using a source command in the R console, it will grab the data necessary, save it to an environment in the global environment, and then when I call the same function from the R console it will see the environment and dataframe from which it will grab the data as opposed to re-querying the server.
When R looks for a symbol, it looks in the current environment, then the environment's parent, and so on. It has not assigned envCache into the global environment. One way to implement what you would like to do is to create a 'closure' that remembers state, along the lines of
makeCache <- function() {
cache <- new.env(parent=emptyenv())
list(get = function(key) cache[[key]],
set = function(key, value) cache[[key]] <- value,
## next two added in response to #sunpyg
load = function(rdaFile) load(rdaFile, cache),
ls = function() ls(cache))
}
invoking makeCache() returns a list of two functions, get and set.
a <- makeCache()
Each function has an environment in which it was defined (the environment created when you invoked makeCache()). When you invoke a$set("a", 1) the rules of variable look-up mean that R looks for a variable cache, first inside the function aCache$set, and when it doesn't find it there in the environment in which set was defined.
> a$get("foo")
NULL
> a$set("foo", 1)
> a$get("foo")
[1] 1
Cool, eh? Note that parent=emptyenv()) means that a get() on a non-existent keys stops looking in cache, otherwise it would have continued to look in the parent environment of cache, and so on.
There's a bank account example in the Introduction to R document that is really fun. In response to #sunpyg's comment, I've added a load and ls function to add data from an Rda file and to list the content of the cache, e.g., a$load("foo.Rda").
Here's what I came up with as an alternate solution. It may be doing the same thing as the other answer in the backround, but the code is more intuitive to me.
cacheTesting <- function()
{
if (exists("cache"))
{
print("IT WORKS!!!")
cacheData <- get("test", envir = cache)
print(cacheData)
}
else
{
assign("cache", new.env(hash = TRUE), envir = .GlobalEnv)
test <- 42
assign("test", test, envir = cache)
}
}
The first run of the code creates the environment in the .GlobalEnv using an assign statement. The second run sees that environment, because it actually made it to .GlobalEnv, and pulls the data placed there from it before printing it.

Resources