Force R function call to be self-sufficient


I'm looking for a way to call a function that is not influenced by other objects in .GlobalEnv.
Take a look at the two functions below:
y = 3
f1 = function(x) x+y
f2 = function(x) {
library(dplyr)
x %>%
mutate(area = Sepal.Length *Sepal.Width) %>%
head()
}
In this case:
f1(5) should fail, because y is not defined in the function scope
f2(iris) should pass, because the function does not reference variables outside its scope
Now, I can overwrite the environment of f1 and f2, setting it either to baseenv() or to new.env(parent=as.environment(2L)):
environment(f1) = baseenv()
environment(f2) = baseenv()
f1(3) # fails, as it should
f2(iris) # fails, because %>% is not in function env
or:
# detaching here makes `dplyr` inaccessible for `f2`
# not detaching leaves `head` inaccessible for `f2`
detach("package:dplyr", unload=TRUE)
environment(f1) = new.env(parent=as.environment(2L))
environment(f2) = new.env(parent=as.environment(2L))
f1(3) # fails, as it should
f2(iris) # fails, because %>% is not in function env
Is there a way to overwrite a function's environment so that it has to be self-sufficient, but it also always works as long as it loads its own libraries?

The problem here is, fundamentally, that library and similar tools don’t provide scoping, and are not designed to be made to work with scopes.[1] Even though library is executed inside the function, its effect is actually global, not local. Ugh.
Specifically, your approach of isolating the function from the global environment is sound; however, library manipulates the search path (via attach), and the function’s environment isn’t “notified” of this: it will still point to the previous second search path entry as its grandparent.
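A small experiment makes the global effect visible. This is only a sketch: it assumes the splines package (which ships with R) is not already attached in your session, and attach_in_function is a made-up name.

```r
# Hypothetical helper: calls library() inside a function body.
attach_in_function <- function() {
  library(splines)  # looks local, but changes the session-wide search path
  invisible(NULL)
}

attach_in_function()

# The package is now attached for the whole session, not just for the function.
attached_after <- "package:splines" %in% search()
print(attached_after)

# Clean up so later code is not affected by the experiment.
detach("package:splines")
```

The function has long returned by the time search() is inspected, yet the package is still on the search path.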
You need to find a way of updating the function environment’s grandparent environment when library/attach/… is called. You could achieve this by replacing library etc. in the function’s parent environment with your own versions that call a modified version of attach. This attach2 would then not only call the original attach but also relink your environment’s parent.
1 As an aside, ‘box’ fixes all of these problems. Replacing library(foo) with box::use(foo[...]) in your code makes it work. This is because modules are strongly scoped and environment-aware.
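If neither replacing library nor switching to ‘box’ is an option, one further workaround (not part of the approach above, and only a sketch) is to avoid attaching altogether and qualify every non-base call as pkg::name. Since :: itself lives in base, it stays reachable even when the function's environment is baseenv(). The names strict_f1 and strict_head are made up for illustration.

```r
y <- 3

# Relies on a free variable, so it should fail once isolated.
strict_f1 <- function(x) x + y

# Names its dependency explicitly instead of relying on the search path.
strict_head <- function(x) utils::head(x, 2)

environment(strict_f1) <- baseenv()
environment(strict_head) <- baseenv()

# y is not visible from baseenv(), so this call raises an error.
f1_result <- tryCatch(strict_f1(5), error = function(e) "error: y not found")

# utils::head is resolved through the package namespace, so this still works.
head_result <- strict_head(1:10)
```

The cost of this approach is verbosity: every function and operator that does not come from base has to be written with its package prefix.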

Related

Strictly speaking does the scoping assignment <<- assign to the parent environment or global environment?

Often the parent environment is the global environment.
But occasionally it isn't. For example in functions within functions, or in an error function in tryCatch().
Strictly speaking, does <<- assign to the global environment, or simply to the parent environment?

Try it out:
env = new.env()
env2 = new.env(parent = env)
local(x <<- 42, env2)
ls(env)
# character(0)
ls()
# [1] "env" "env2" "x"
But:
env$x = 1
local(x <<- 2, env2)
env$x
# [1] 2
… so <<- does walk up the entire chain of parent environments until it finds an existing object of the given name, and replaces that. However, if it doesn’t find any such object, it creates a new object in .GlobalEnv.
(The documentation states much the same. But in a case such as this nothing beats experimenting to gain a better understanding.)
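The "function within a function" case from the question can be tried the same way. The following is a minimal sketch with made-up names (make_counter, n_calls): the inner function's <<- finds n_calls in the enclosing function's environment and updates it there, so nothing is created in the global environment.

```r
make_counter <- function() {
  n_calls <- 0                 # lives in make_counter's evaluation environment
  function() {
    n_calls <<- n_calls + 1    # found in the enclosing environment, updated there
    n_calls
  }
}

counter <- make_counter()
counter()
second <- counter()

# The variable was modified in the closure's environment, not created globally.
in_global <- exists("n_calls", envir = globalenv(), inherits = FALSE)
in_closure <- get("n_calls", envir = environment(counter))
```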

Per the documentation:
The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned.
Use of this operator will cause R to search through the chain of enclosing environments until it finds a match. The search starts in the parent of the environment in which the operator is used and moves up through the enclosing environments (not the call stack) from there. So it's not guaranteed to be a "global" assignment, but could be.
As sindri_baldur points out, if the variable is not found in any enclosing environment, a new variable will be created at the global level.
Lastly, I should point out that use of the operator is confusing more often than it is helpful, as it breaks the otherwise highly functional nature of R programming. There's more than likely a way to avoid using <<-.

Keep user-defined functions in global environment, during removal of objects

Question: How can I control the deletion (and saving) of user-defined function?
What I have tried so far:
I've gotten a recommendation to add a dot [.] at the beginning of every function name, being told that the functions would then not be deleted. When tested, the functions are deleted despite starting with a dot.
Requirements:
All "non-function" should be handled by the [rm].
Due to automation, the procedure needs to be able to be triggered by R base from a terminal. It is not enough that solution works only in Rstudio.
Global environment to be used, due to keeping the solution standardized.
If possible, one should be able to define which function to keep/delete.
Expected outcome:
None of the functions in the example should be deleted.
Below you find the example code:
# Create 3 object variables.
a <- 1
b <- 2
c <- 3
# Create 3 functions.
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from global.env.
# Keep the ones specified below.
rm(list = ls()[! ls() %in% c(
"a",
"c"
)
]
)

You can use ls.str to specify a mode of object to find. With this you can exclude functions from the rm list.
rm(list=setdiff(ls(),ls.str(mode="function")))
ls()
[1] "myFunction1" "myFunction2" "myFunction3"
However, you might be better off formalising your functions in a package and then you would not need to worry about deleting them with rm.
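If you also need to choose which functions to keep or delete, the same idea can be wrapped in a small helper. This is only a sketch: remove_except and its arguments are hypothetical names, and it is demonstrated on a scratch environment rather than on .GlobalEnv (which is the default for envir).

```r
# Hypothetical helper: removes non-function objects except those in `keep`,
# plus any functions explicitly listed in `drop_functions`.
remove_except <- function(keep = character(), drop_functions = character(),
                          envir = globalenv()) {
  all_names <- ls(envir = envir)
  is_fun <- vapply(
    all_names,
    function(nm) is.function(get(nm, envir = envir, inherits = FALSE)),
    logical(1)
  )
  to_remove <- union(
    setdiff(all_names[!is_fun], keep),
    intersect(all_names[is_fun], drop_functions)
  )
  rm(list = to_remove, envir = envir)
  invisible(to_remove)
}

# Scratch environment standing in for the workspace.
scratch <- new.env()
for (nm in c("a", "b", "c")) assign(nm, 1, envir = scratch)
for (nm in c("myFunction1", "myFunction2", "myFunction3")) {
  assign(nm, function() {}, envir = scratch)
}

remove_except(keep = c("a", "c"), drop_functions = "myFunction2", envir = scratch)
remaining <- ls(envir = scratch)
print(remaining)
```

Because it only uses base R, the helper works the same from Rscript in a terminal as it does in RStudio.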

I strongly recommend a different approach. Don’t partially remove objects; use proper scope instead. That is, don’t define objects in the global environment that don’t need to be defined there; define them inside functions or local scopes instead.
Going one step further, your functions.r file also shouldn’t define functions in the global environment. Instead, as suggested in a comment, it should define them inside a dedicated environment which you may attach, if convenient. This is in fact what R packages solve. If you feel that R packages are too heavy for your purpose, I suggest you write modules using my ‘box’ package: it cleanly implements file-based code modules.
If you use scoping as it was designed, there’s no need to call rm on temporary variables, and hence your problem won’t arise.
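As a minimal sketch of what scoping buys you here (the variable names are made up for illustration), temporaries created inside local() disappear on their own, so there is nothing to rm afterwards:

```r
total <- local({
  squares_tmp <- (1:10)^2   # temporary, exists only inside this block
  sum(squares_tmp)
})

print(total)

# The temporary never reached the surrounding environment.
tmp_leaked <- exists("squares_tmp", inherits = FALSE)
```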
If you really want a clean slate, restart R and re-execute your script: this is the only way to consistently reset the state of the R session; all other ways are error-prone hacks because they only perform a partial cleanup.
A note on what you wrote:
When tested, the functions are deleted despite starting with a dot.
They’re not — they’re just invisible; that’s what the leading dot does. However, this recommendation also strikes me as bad practice: it’s an unnecessary hack.
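You can check this yourself. The sketch below uses a scratch environment and made-up object names; ls() skips names that start with a dot unless all.names = TRUE is given, so rm(list = ls()) never sees them.

```r
scratch <- new.env()
assign(".hidden_fun", function() "still here", envir = scratch)
assign("visible_value", 1, envir = scratch)

# Removes only what ls() reports, i.e. the non-dotted names.
rm(list = ls(envir = scratch), envir = scratch)

visible_left <- ls(envir = scratch)                  # nothing listed
all_left <- ls(envir = scratch, all.names = TRUE)    # the dotted name survives
```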

Easy. Don't use the global environment.
myenv <- new.env()
with(myenv,
{
# Create 3 object variables.
a <- 1
b <- 2
c <- 3
}
)
myenv$a
#[1] 1
# Create 3 functions.
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from env.
# Keep the ones specified below.
rm(list = ls(envir = myenv)[! ls(envir = myenv) %in% c(
"a",
"c"
)
], envir = myenv
)
ls(envir = myenv)
#[1] "a" "c"

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
f() should not be visible in .GlobalEnv with ls(all=TRUE).
f() should be usable from .GlobalEnv.
f() internal function g() should not be visible and not usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
Should I run now:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
In fact, the calling environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
## [1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() are visible from .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep, places ep in .GlobalEnv.
Cluttering the Global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
From your feedback, you suggest to build a package to get proper namespace services.
The package is not flexible. My helper functions are stored in a project subdir, say hlp, and sourced like source("hlp/util1.R").
In this way scripts can be easily mixed and updated on the fly on a project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.

Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' @export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
On a more technical note, your code results in ep being inside the global environment because the assignment environment(ep$e$f)=ep creates a copy of ep inside your global environment. Once you’ve attached the environment, you can delete this object. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).

First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor-man's namespacing:
e <- local({
g <- function() "hello"
f <- function() g()
environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.
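If the public function should also be callable without a prefix, one possible extension of the local() idea is to attach only the public part. This is a sketch under assumptions, not a replacement for proper namespaces: greet, greet_impl and the search path entry name "project_helpers" are made up for illustration.

```r
helpers <- local({
  greet_impl <- function() "hello"     # internal, never exported
  greet <- function() greet_impl()
  list(greet = greet)                  # only what should be public
})

attach(helpers, name = "project_helpers")
rm(helpers)                            # nothing is left behind in the calling environment

greeting <- greet()                    # found via the search path
impl_visible <- exists("greet_impl")   # the internal helper is not reachable

# Undo the attach once the helpers are no longer needed.
detach("project_helpers", character.only = TRUE)
```

The attached function keeps the local() environment as its enclosure, which is why it can still call the internal helper even though that helper is not on the search path.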

R functions that execute functions

I'm trying to break out common lines of code used in a fairly large R script into encapsulated functions...however, they don't seem to be running the intended code when called. I feel like I'm missing some conceptual piece of how R works, or functional programming in general.
Examples:
Here's a piece of code I'd like to call to clear the workspace -
clearWorkSpace <- function() {
rm(list= ls(all=TRUE))
}
As noted, the code inside of the function executes as expected, however if the parent function is called, the environment is not cleared.
Again, here's a function intended to load all dependency files -
loadDependencies <- function() {
dep_files <- list.files(path="./dependencies")
for (file in dep_files) {
file_path <- paste0("./dependencies/",file)
source(file_path,local=TRUE)
}
}
If possible, it'd be great to be able to encapsulate code into easy to read functions. Thanks for your help in advance.

What you are calling workspace is more properly referred to as the global environment.
Functions execute in their own environments. This is, for example, why you don't see the variables defined inside a function in your global environment. It is also how a function knows to use a variable named x defined in the function body rather than some x you might happen to have in your global environment.
Most functions don't modify the external environments, which is good! It's the functional programming paradigm. Functions that do modify environments, such as rm and source, usually take arguments so that you can be explicit about which environment is modified. If you look at ?rm you'll see an envir argument, and that argument is most of what its Details section describes. source has a local argument:
local - TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
You explicitly set local = TRUE when you call source, which explicitly tells source to only modify the local (inside the function) environment, so of course your global environment is untouched!
To make your functions work as I assume you want them to, you could modify clearWorkSpace like this:
clearWorkSpace <- function() {
rm(list= ls(all=TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
}
And for loadDependencies simply delete the local = TRUE. (Or more explicitly set local = FALSE or local = .GlobalEnv) Though you could re-write it in a more R-like way:
loadDependencies = function() {
invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source))
}
For both of these (especially with the simplified dependency running above) I'd question whether you really need these wrapped up in functions. Might be better to just get in the habit of restarting R when you resume work on a project and keeping invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source)) at the top of your script...
For more reading on environments, there is the Environments section of Advanced R. Notably, there are several ways to specify environments that might be useful for different use cases rather than hard-coding the global environment.
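As a sketch of that last point, both helpers can take the target environment as an argument instead of hard-coding .GlobalEnv. The names load_dependencies and clear_environment, the envir parameter, and the temporary directory used for the demonstration are assumptions made for this example.

```r
# Source every .R file in `path` into `envir` (defaults to the caller's environment).
load_dependencies <- function(path = "./dependencies", envir = parent.frame()) {
  files <- list.files(path, pattern = "\\.[Rr]$", full.names = TRUE)
  for (file in files) source(file, local = envir)
  invisible(files)
}

# Remove everything, including dotted names, from `envir`.
clear_environment <- function(envir = parent.frame()) {
  rm(list = ls(envir = envir, all.names = TRUE), envir = envir)
}

# Demonstration with a throwaway dependency file and a scratch environment.
dep_dir <- tempfile("dependencies")
dir.create(dep_dir)
writeLines("helper_value <- 42", file.path(dep_dir, "helper.R"))

target <- new.env()
load_dependencies(dep_dir, envir = target)
loaded <- get("helper_value", envir = target)

clear_environment(target)
remaining <- ls(envir = target, all.names = TRUE)
```

Called without an envir argument from the top level of a script, both functions act on the global environment, which matches the behaviour of the versions above.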

In theory you need just to do something like:
rm(list= ls(all=TRUE, envir = .GlobalEnv))
That is, you set the environment explicitly (though it would be even better to use the pos argument here). But this will also delete the clearWorkSpace function, since it is defined in the global environment, so this will fail on a recursive call.
Personally, I never use rm within a function or a local call. My understanding is that rm is intended to be called from the console to clear the workspace.

How to find unreferenced environments?

This is a follow-up to an answer to "Efficiently move environment from inside function to global environment", which pointed out that it's necessary to return a reference to an environment that was created inside a function if one wishes to work with the contents of that environment.
Is it true that the newly created environment continues to exist if we don't return a reference, and if so how does one track down such an environment, either to access its contents or delete it?

Sure, if it was assigned to a symbol somewhere outside of the function's evaluation environment (as it was in the OP's example), an environment will continue to exist. In that sense, an environment is just like any other named R object. (The fact that unassigned environments can be kept in existence by closures does mean that environments sometimes persist where other types of object wouldn't, but that's not what's happening here.)
## OP's example function
funfun <- function(inc = 1){
dataEnv <- new.env()
dataEnv$d1 <- 1 + inc
dataEnv$d2 <- 2 + inc
dataEnv$d3 <- 2 + inc
assign('dataEnv', dataEnv, envir = globalenv()) ## Assignment to .GlobalEnv
}
funfun()
ls(env=.GlobalEnv)
# [1] "dataEnv" "funfun"
## It's easy to find environments assigned to a symbol in another environment,
## if you know which environment to look in.
Filter(isTRUE, eapply(.GlobalEnv, is.environment))
# $dataEnv
# [1] TRUE
In the OP's example, it's relatively easy to track down, because the environment was assigned to a symbol in .GlobalEnv. In general, though, (and again, just like any other R object) it will be difficult to track down if, for instance, it's assigned to an element in a list or some more complicated structure.
(This, incidentally, is why non-local assignment is usually discouraged in R and other more purely functional languages. When functions only return a value, and that value is only assigned to a symbol via explicit assignments (like v <- f()), the effects of executing code become a lot easier to reason about and predict. Fewer surprises make for nicer code!)
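Both points can be sketched briefly. The first half is a variant of the OP's function that returns the environment instead of assigning it non-locally; the second half shows the closure case mentioned earlier, where an environment stays alive without ever being assigned to a symbol. The names make_data_env and make_getter are made up for illustration.

```r
# Return the environment and let the caller decide where to keep it.
make_data_env <- function(inc = 1) {
  data_env <- new.env()
  data_env$d1 <- 1 + inc
  data_env$d2 <- 2 + inc
  data_env
}

data_env <- make_data_env()
data_names <- ls(data_env)

# A closure keeps its defining environment alive implicitly.
make_getter <- function() {
  hidden <- "kept alive by the closure"
  function() hidden
}

getter <- make_getter()

# The otherwise unreferenced environment can be reached through the closure.
closure_names <- ls(environment(getter))
closure_value <- getter()
```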