Modifying many objects in enclosing environment of a function - r

Usually in R a function first creates a new environment and does its work inside it. I would like to have a function that defines or reinitializes a whole lot of things and makes them accessible to the parent environment of the function.
I know I can use the <<- operator for specific variables, but here I have a lot of functions, variables, and even environments that are defined, and I would like a parameter of the function to control whether the parent environment is used or not.
Currently, I'm calling the function and then attaching its environment if needed, as follows:
init <- function() {
  things <- 0
  ICI <<- environment()
  success <- TRUE
  return(success)
}
init(); attach(ICI)
It works fine, but is there a way to change the current environment of the function to be the parent environment, so that I can define a parameter of the function that switches this behavior on or off?

It turns out that attach can be called inside the function, and the attachment is not undone when the function returns. attach puts a copy of the environment's objects on the search path, so they become visible from the parent environment. The following makes this behavior switchable with an argument:
init <- function(transparent = FALSE) {
  # compute values
  things <- 0
  success <- TRUE
  # follow the "set back variables" argument
  ICI <- environment()
  if (transparent) {
    attach(ICI)  # objects are copied onto the search path, so the caller can see them
  } else {
    ICI <<- ICI  # only transmit a handle for the environment
  }
  return(success)
}
init()  # attach(ICI)
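An alternative to attach, sketched here under assumptions, is to copy selected locals into the caller's frame with list2env. The target argument and the explicit list of names are hypothetical additions for illustration; they are not part of the original code.

```r
# Sketch: optionally export chosen local variables to a target environment.
# `target` defaults to the caller's frame; it is an illustrative argument.
init <- function(transparent = FALSE, target = parent.frame()) {
  things <- 0
  success <- TRUE
  if (transparent) {
    here <- environment()
    # Copy only the named objects, so helper variables do not leak out.
    list2env(mget(c("things", "success"), envir = here), envir = target)
  }
  invisible(success)
}

# Usage: export into a scratch environment instead of the workspace.
demo_env <- new.env()
init(transparent = TRUE, target = demo_env)
ls(demo_env)
```

Unlike attach, this creates real bindings in the target environment and does not lengthen the search path, which may or may not be what you want.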

Related

rm() and detach() in function - not working

I have a function, which is supposed to do the following:
remove a given vector
unload a given package
re-load a given package
Here's an example:
removeReload <- function(old, new) {
  rm(old)
  detach("package:anypackage")
  library(anypackage)
  new <- new
}
However, this function does not remove old from the workspace. I also tried this function as old <- NULL, but again to no avail.
Any ideas as to why this is the case, and how to get old to be removed?
Thanks!
rm comes with an envir argument to specify the environment to remove objects from. The default is the environment in which rm was called. Normally, if you use rm(blah), the calling environment is the environment you're working in, but if you put rm inside a function, the calling environment is the function environment, so only the function's local copy is removed. You can use rm(old, envir = .GlobalEnv) instead.
Beware programming with this function - it may lead to unintended consequences if you put it inside yet another function.
Example:
> foo = function() {
+   rm(x, envir = .GlobalEnv)
+ }
> x = 1
> foo()
> x
Error: object 'x' not found
There are more details on the help page, ?rm, and that page links to ?environment for even more detail.
Similarly, the new <- new as the last line of your function is not doing assignment in the global environment. Normal practice would be to have your function return(new) and do the assignment where it is called, something like new <- removeReload(old, new). But it's hard to make a "good practice" recommendation from the pseudocode you provide, since you pass in new as an input... it isn't clear whether your function arguments are objects or character strings of object names.
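As a rough sketch of the "character strings of object names" reading, the helper below removes an object by name from a chosen environment. The function name remove_object and its envir argument are made up for illustration; the package detach/reload steps are left out.

```r
# Sketch: remove an object, given its name as a string, from `envir`.
# `remove_object` is a hypothetical helper, not part of the original question.
remove_object <- function(name, envir = parent.frame()) {
  stopifnot(is.character(name), length(name) == 1)
  if (exists(name, envir = envir, inherits = FALSE)) {
    rm(list = name, envir = envir)
  }
  invisible(NULL)
}

# Usage on a scratch environment standing in for the workspace.
workspace <- new.env()
assign("old", 1:3, envir = workspace)
remove_object("old", envir = workspace)
exists("old", envir = workspace, inherits = FALSE)
```

Defaulting envir to parent.frame() rather than .GlobalEnv keeps the helper usable when it is called from inside another function.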

Force R function call to be self-sufficient

I'm looking for a way to call a function that is not influenced by other objects in .GlobalEnv.
Take a look at the two functions below:
y = 3
f1 = function(x) x + y
f2 = function(x) {
  library(dplyr)
  x %>%
    mutate(area = Sepal.Length * Sepal.Width) %>%
    head()
}
In this case:
f1(5) should fail, because y is not defined in the function scope
f2(iris) should pass, because the function does not reference variables outside its scope
Now, I can overwrite the environment of f1 and f2, either to baseenv() or new.env(parent=as.environment(2L)):
environment(f1) = baseenv()
environment(f2) = baseenv()
f1(3) # fails, as it should
f2(iris) # fails, because %>% is not in function env
or:
# detaching here makes `dplyr` inaccessible for `f2`
# not detaching leaves `head` inaccessible for `f2`
detach("package:dplyr", unload=TRUE)
environment(f1) = new.env(parent=as.environment(2L))
environment(f2) = new.env(parent=as.environment(2L))
f1(3) # fails, as it should
f2(iris) # fails, because %>% is not in function env
Is there a way to overwrite a function's environment so that it has to be self-sufficient, but it also always works as long as it loads its own libraries?
The problem here is, fundamentally, that library and similar tools don’t provide scoping, and are not designed to be made to work with scopes [1]: even though library is executed inside the function, its effect is actually global, not local. Ugh.
Specifically, your approach of isolating the function from the global environment is sound; however, library manipulates the search path (via attach), and the function’s environment isn’t “notified” of this: it will still point to the previous second search path entry as its grandparent.
You need to find a way of updating the function environment’s grandparent environment when library/attach/… is called. You could achieve this by replacing library etc. in the function’s parent environment with your own versions that call a modified version of attach. This attach2 would then not only call the original attach but also relink your environment’s parent.
[1] As an aside, ‘box’ fixes all of these problems. Replacing library(foo) with box::use(foo[...]) in your code makes it work. This is because modules are strongly scoped and environment-aware.
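A different workaround, not the attach2 approach described above, is to avoid library altogether and qualify every non-base call with pkg::fun. The sketch below assumes that is acceptable; isolate is a hypothetical helper name, and stats::median stands in for whatever package functions the real code needs.

```r
# Sketch: give a function an environment whose only ancestor is base R.
# `isolate` is an illustrative helper, not an existing R function.
isolate <- function(f) {
  environment(f) <- new.env(parent = baseenv())
  f
}

y <- 3
f1 <- isolate(function(x) x + y)                 # relies on an outside variable
f3 <- isolate(function(x) stats::median(x) + 1)  # fully qualified, self-sufficient

# f1 cannot see `y` any more, so the call raises an error that we catch here.
f1_result <- tryCatch(f1(5), error = function(e) "error: y not found")
f3_result <- f3(c(1, 2, 6))
```

Because `::` lives in base, qualified calls keep working even though the search path is no longer reachable from the function; operators such as %>% would have to be bound explicitly inside the function.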

Why does "<<-" mess with the scope of a function in Shiny

I ran into a very interesting problem. I wrote a function and wanted to check the output of some variables within the function as well as the return result.
observe({
  result <- myFunction()
})
myFunction <- function() {
  # some calculations
  # ...
  # create a data frame from previously calculated variables
  # I was interested in the result of problematicVariable,
  # that's why I wanted to make it global for checking after
  # closing down the shiny app
  problematicVariable <<- data.frame(...)
  if (someCondition) {
    # ...
  } else {
    # some calculations
    # ...
    # now I used problematicVariable for the first time
    foo <- data.frame(problematicVariable$bar, problematicVariable$foo)
  }
}
That gave me
data.frame: arguments imply differing number of rows: ...
However, since I had made problematicVariable global, I ran the line where the app crashed manually (foo <- data.frame(problematicVariable$bar, problematicVariable$foo)). There was absolutely no problem. So I thought that this was strange...
I got rid of the double << and changed it to problematicVariable <- ... and now it works.
So, using <<- to assign problematicVariable somehow made problematicVariable unavailable in the if...else.
Why does <<- cause behaviour like this? Does it mess with the scope?!
<<- doesn't always create variables in the global environment. It searches the enclosing (parent) scopes for an existing variable of that name and assigns there; only if none is found does it create the variable in the global environment. Sometimes the parent scope is the same as the global environment.
?assign is what you want. But I don't see any reason to create global variables from inside a function. Just return the variable - code is easier to debug that way and you'll get fewer unexpected results.
EDIT: Suspected that this was a dupe. Good discussion about this can be found here.
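To illustrate how <<- picks its target, here is a minimal sketch with made-up names (make_counter, count): the assignment lands in the enclosing function's environment because a variable of that name already exists there, not in the global environment.

```r
# Sketch: `<<-` assigns to the nearest enclosing scope that already has `count`.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates `count` in make_counter's environment
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
```

Each call to make_counter() creates a fresh environment, so two counters built this way do not share state.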

R functions that execute functions

I'm trying to break out common lines of code used in a fairly large R script into encapsulated functions...however, they don't seem to be running the intended code when called. I feel like I'm missing some conceptual piece of how R works, or functional programming in general.
Examples:
Here's a piece of code I'd like to call to clear the workspace -
clearWorkSpace <- function() {
  rm(list = ls(all = TRUE))
}
The code inside the function executes as expected when run on its own; however, when the function itself is called, the environment is not cleared.
Again, here's a function intended to load all dependency files -
loadDependencies <- function() {
  dep_files <- list.files(path = "./dependencies")
  for (file in dep_files) {
    file_path <- paste0("./dependencies/", file)
    source(file_path, local = TRUE)
  }
}
If possible, it'd be great to be able to encapsulate code into easy to read functions. Thanks for your help in advance.
What you are calling workspace is more properly referred to as the global environment.
Functions execute in their own environments. This is, for example, why you don't see the variables defined inside a function in your global environment. It is also how a function knows to use a variable named x defined in the function body rather than some x you might happen to have in your global environment.
Most functions don't modify the external environments, which is good! It's the functional programming paradigm. Functions that do modify environments, such as rm and source, usually take arguments so that you can be explicit about which environment is modified. If you look at ?rm you'll see an envir argument, and that argument is most of what its Details section describes. source has a local argument:
local - TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
You explicitly set local = TRUE when you call source, which explicitly tells source to only modify the local (inside the function) environment, so of course your global environment is untouched!
To make your functions work as I assume you want them to, you could modify clearWorkSpace like this:
clearWorkSpace <- function() {
  rm(list = ls(all.names = TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
}
And for loadDependencies simply delete the local = TRUE. (Or more explicitly set local = FALSE or local = .GlobalEnv) Though you could re-write it in a more R-like way:
loadDependencies = function() {
  invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source))
}
For both of these (especially with the simplified dependency running above) I'd question whether you really need these wrapped up in functions. Might be better to just get in the habit of restarting R when you resume work on a project and keeping invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source)) at the top of your script...
For more reading on environments, there is the Environments section of Advanced R. Notably, there are several ways to specify environments that might be useful for different use cases rather than hard-coding the global environment.
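As one way to avoid hard-coding the global environment, the sketch below passes the target environment as an argument. The names clear_env and load_dependencies and their parameters are assumptions for illustration; source() accepting an environment for local is documented behaviour.

```r
# Sketch: let the caller choose which environment is cleared or populated.
clear_env <- function(envir = parent.frame()) {
  rm(list = ls(envir = envir, all.names = TRUE), envir = envir)
  invisible(envir)
}

load_dependencies <- function(dir = "./dependencies", envir = parent.frame()) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  # `local` may be an environment: the sourced code is evaluated there.
  for (f in files) source(f, local = envir)
  invisible(files)
}

# Usage on a scratch environment.
scratch <- new.env()
assign("a", 1, envir = scratch)
clear_env(scratch)
length(ls(scratch))
```

Called from the console with no arguments, both functions act on the workspace, since parent.frame() is then the global environment.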
In theory you just need to do something like:
rm(list= ls(all=TRUE, envir = .GlobalEnv))
I mean you set the environment explicitly (although it is better here to use the pos argument). But this will also delete the clearWorkSpace function, since it is defined in the global environment, so this will fail with a recursive call.
Personally, I never use rm within a function or a local call. My understanding is that rm is intended to be called from the console to clear the workspace.

R: creating an environment in the globalenv() from inside a function

Right now I have the lines:
envCache <- new.env( hash=TRUE, parent = .GlobalEnv )
print(parent.env(envCache))
R claims the environment is in the global environment, but when I try to find the environment later it's not there.
What I'm trying to do here is cache some data frames in an environment under the global environment, so each time I call a function it does not have to hit the server to get the same data again. Ideally, I'll call the function once using a source command in the R console, it will grab the data necessary, save it to an environment in the global environment, and then when I call the same function from the R console it will see the environment and data frame from which it will grab the data as opposed to re-querying the server.
When R looks for a symbol, it looks in the current environment, then the environment's parent, and so on. It has not assigned envCache into the global environment. One way to implement what you would like to do is to create a 'closure' that remembers state, along the lines of
makeCache <- function() {
  cache <- new.env(parent = emptyenv())
  list(get = function(key) cache[[key]],
       set = function(key, value) cache[[key]] <- value,
       ## next two added in response to #sunpyg
       load = function(rdaFile) load(rdaFile, cache),
       ls = function() ls(cache))
}
Invoking makeCache() returns a list of functions: get and set, plus the load and ls helpers.
a <- makeCache()
Each function has an environment in which it was defined (the environment created when you invoked makeCache()). When you invoke a$set("a", 1), the rules of variable look-up mean that R looks for a variable cache first inside the function a$set and, when it doesn't find it there, in the environment in which set was defined.
> a$get("foo")
NULL
> a$set("foo", 1)
> a$get("foo")
[1] 1
Cool, eh? Note that parent=emptyenv() means that a get() on a non-existent key stops looking in cache; otherwise it would have continued to look in the parent environment of cache, and so on.
There's a bank account example in the Introduction to R document that is really fun. In response to #sunpyg's comment, I've added a load and ls function to add data from an Rda file and to list the content of the cache, e.g., a$load("foo.Rda").
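Applied to the original goal of not re-querying the server, the same closure idea might look like the sketch below. make_cached_fetch, slow_fetch and the calls counter are all hypothetical; slow_fetch merely stands in for the real server query.

```r
# Sketch: wrap an expensive fetch function so each key is fetched only once.
make_cached_fetch <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a server query; counts how often it is really called.
calls <- 0
slow_fetch <- function(key) {
  calls <<- calls + 1
  data.frame(key = key, value = nchar(key))
}

cached <- make_cached_fetch(slow_fetch)
first <- cached("sales")
second <- cached("sales")  # served from the cache, slow_fetch is not called again
```

The cache lives only as long as the cached function object does, so it needs to be created once at the top level (or saved) rather than rebuilt on every call.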
Here's what I came up with as an alternate solution. It may be doing the same thing as the other answer in the background, but the code is more intuitive to me.
cacheTesting <- function()
{
  if (exists("cache"))
  {
    print("IT WORKS!!!")
    cacheData <- get("test", envir = cache)
    print(cacheData)
  }
  else
  {
    assign("cache", new.env(hash = TRUE), envir = .GlobalEnv)
    test <- 42
    assign("test", test, envir = cache)
  }
}
The first run of the code creates the environment in the .GlobalEnv using an assign statement. The second run sees that environment, because it actually made it to .GlobalEnv, and pulls the data placed there from it before printing it.

Resources