Calling an R function in a different environment - r

I feel like it should be fairly straightforward to do this, but I can't for the life of me find a solution... I want to evaluate an R function in an environment different from the one where it was defined.
What I'd like:
# A simple function
f <- function() {
x + 1
}
# Create an env and assign x <- 3
env <- new.env()
assign("x", 3, envir = env)
# Call f on env
call_on_env(f, env)
#> 4
The closest I got to "call_on_env()" was:
# Quote call and evaluate
quo <- quote(f())
eval(quo, envir = env)
Unfortunately the code above returns an error: Error in f() : object 'x' not found. So then... Is there a way for me to evaluate f() on env?
Edit: I'm able to send f() to env and then call it, but this leaves f() permanently there. For context [see below], I want to call the function in parallel with some pre-loaded packages.
Context: I'm calling a function in parallel with parallel::clusterMap() and I'd like for the packages loaded in my global environment to also be loaded on the clusters. As far as I can tell, parallel::clusterExport() can only export a list of variables, so it doesn't work for me...

Move f into env
environment(f) <- env
f()
# [1] 4
Note: Evaluation of objects across different environments is not desirable, as you have encountered here. It's best to keep all objects that you plan to interact with one another in the same environment.
If you don't want to change the environment of f, you could put all the above into a new function.
fx <- function(f, env) {
environment(f) <- env
f()
}
fx(f, env)
# [1] 4
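For what it's worth, here is a minimal sketch of the call_on_env() helper the question asks for, built on the same idea. The extra ... argument is my own addition, and the final check only illustrates that the caller's f keeps its original environment, because environment<- acts on the local copy inside the helper.

```r
# Hypothetical helper; the name call_on_env comes from the question.
call_on_env <- function(f, env, ...) {
  environment(f) <- env  # changes only the local copy of f
  f(...)
}

f <- function() {
  x + 1
}
orig_env <- environment(f)

env <- new.env()
assign("x", 3, envir = env)

result <- call_on_env(f, env)
result
#> [1] 4

# The original f is untouched
identical(environment(f), orig_env)
#> [1] TRUE
```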

The source() function might help:
source('scriptfilename.R')
If the file is located in another path then use:
source('YOURPATH/scriptfilename.R')
When you run source() it will pull all of the functions into your current Environment. You can then reference any of the functions contained in the R script where it sits.
However I wouldn't recommend referencing functions/scripts outside of your R project folder structure, since the links will break if you share your R project folder with others.

Related

Separate scripts from .GlobalEnv: Source script that source scripts

This question is similar to Source script to separate environment in R, not the global environment, but with a key twist.
Consider a script that sources another script:
# main.R
source("funs.R")
x <- 1
# funs.R
hello <- function() {message("Hi")}
I want to source the script main.R and keep everything in a "local" environment, say env <- new.env(). Normally, one could call source("main.R", local = env) and expect everything to be in the env environment. However, that's not the case here: x is part of env, but the function hello is not! It is in .GlobalEnv.
Question: How can I source a script to a separate environment in R, even if that script itself sources other scripts, and without modifying the other scripts being sourced?
Thanks for helping, and let me know if I can clarify anything.
EDIT 1: Updated question to be explicit that the scripts being sourced cannot be modified (assume they are not under your control).
You can use trace to inject code in functions,
so you could force all source calls to set local = TRUE.
Here I just override it if local is FALSE in case any nested calls to source actually set it to other environments due to special logic of their own.
env <- new.env()
# use !isTRUE if you want to support older R versions (<3.5.0)
tracer <- quote(
if (isFALSE(local)) {
local <- TRUE
}
)
trace(source, tracer, print = FALSE, where = .GlobalEnv)
# if you're doing this inside a function, uncomment the next line
#on.exit(untrace(source, where = .GlobalEnv))
source("main.R", local = env)
As mentioned in the code,
if you wrap this logic in a function,
consider using on.exit to make sure you untrace even if there are errors.
EDIT: as mentioned in the comments,
this could have issues if some of the scripts you will be loading assume there is 1 (global) environment where everything ends up.
I suppose you could change the tracer to something like
tracer <- quote(
if (missing(local)) {
local <- TRUE
}
)
or maybe
tracer <- quote(
if (isFALSE(local)) {
# fetch the specific environment you created
local <- get("env", .GlobalEnv)
}
)
The former assumes that if the script didn't specify local at all,
it doesn't care about which environment ends up holding everything.
The latter assumes that source calls that didn't specify local or set it to FALSE want everything to end up in 1 environment,
and modifies the logic to use your environment instead of the global one.
Disclaimer: Very ugly and potentially dangerous, but whatever.
Redefine source:
env<-new.env()
source<-function(...) base::source(..., local = env)
source("main.R")
#just remove your redefinition when you don't need it
rm(source)
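A variation on the same idea, as a sketch only: define the wrapper inside env rather than in the global environment, so nothing is masked globally. This assumes the nested scripts call source() by its bare name and do not pass local themselves. The temporary directory and chdir = TRUE are only there to make the example self-contained.

```r
# Recreate the two scripts from the question in a temporary directory
dir <- tempfile("scripts")
dir.create(dir)
writeLines(c('source("funs.R")', "x <- 1"), file.path(dir, "main.R"))
writeLines('hello <- function() {message("Hi")}', file.path(dir, "funs.R"))

env <- new.env()

# Shadow source() inside env only; code evaluated in env finds this first
env$source <- function(...) base::source(..., local = env)

# chdir = TRUE lets the relative path "funs.R" resolve while main.R runs
base::source(file.path(dir, "main.R"), local = env, chdir = TRUE)

# Remove the wrapper once it is no longer needed
rm("source", envir = env)

objs <- sort(ls(env))
objs
#> [1] "hello" "x"
```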
The best way to protect yourself from side effects of code you cannot control is isolation. You can use callr to easily execute the scripts isolated in a separate R session:
using environments:
env <- new.env()
env <- as.environment(callr::r(function(env) {
list2env(env, .GlobalEnv)
source("main.R")
as.list(.GlobalEnv)
}, args = list(as.list(env))))
env
#> <environment: 0x0000000018124878>
env$hello()
#> Hi
simpler version sticking to lists:
params <- list()
results <- callr::r(function(params) {
list2env(params, .GlobalEnv)
source("main.R")
as.list(.GlobalEnv)
}, args = list(params))
results
#> $x
#> [1] 1
#>
#> $hello
#> function ()
#> {
#> message("Hi")
#> }
results$hello()
#> Hi
The params part is only needed if you actually need to provide input to the scripts (not used in your example).
Obviously, this will not work for open connections and similar stuff. In that case, you might want to look into callr::r_session.

R Script as a Function

I have a long script that involves data manipulation and estimation. I have it set up to use a set of parameters, though I would like to be able to run this script multiple times with different sets of inputs, kind of like a function.
Running the script produces plots and saves estimates to a csv, I am not particularly concerned with the objects it creates.
I would rather not wrap the script in a function as it is meant to be used interactively.
How do people go about doing something like this?
I found this for command-line arguments: How to pass command-line arguments when source() an R file, but it still doesn't solve the interactive problem.
I have dealt with something similar before. Below is the solution I came up with.
I basically use list2env to push variables to either the global or function's local environment
and I then source the script in the designated environment.
This can be quite useful especially when coupled with exists as shown in the example below which would allow you to keep your script stand-alone.
These two questions may also be of help:
Source-ing an .R script within a function and passing a variable through (RODBC)
How to pass command-line arguments when source() an R file
# Function ----------------------------------------------------------------
subroutine <- function(file, param = list(), local = TRUE, ...) {
list2env(param, envir = if (local) environment() else globalenv())
source(file, local = local, ...)
}
# Example -----------------------------------------------------------------
# Create an example script
tmp <- "test_subroutine.R"
cat("if (!exists('msg')) msg <- 'no argument provided'; print(msg)", file = tmp)
# Example of using exists in the script to keep it stand-alone
subroutine(tmp)
# Evaluate in functions environment
subroutine(tmp, list(msg = "use function's environment"), local = TRUE)
exists("msg", envir = globalenv()) # FALSE
# Evaluate in global environment
subroutine(tmp, list(msg = "use global environment"), local = FALSE)
exists("msg", envir = globalenv()) # TRUE
unlink(tmp)
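To run the same script over several parameter sets, a variation along these lines might work. This is a sketch under my own assumptions: run_script, the toy script, and the names n, scale and result are all hypothetical. Each run gets a fresh environment, so values from one run cannot leak into the next.

```r
# Toy script standing in for the long analysis script
script <- tempfile(fileext = ".R")
writeLines("result <- n * scale", script)

# Hypothetical helper: evaluate the script in a fresh environment
# that is pre-populated with one set of parameters
run_script <- function(file, params) {
  env <- list2env(params, parent = globalenv())
  source(file, local = env)
  env
}

param_sets <- list(
  list(n = 1, scale = 10),
  list(n = 2, scale = 10)
)

results <- vapply(
  param_sets,
  function(p) run_script(script, p)$result,
  numeric(1)
)
results
#> [1] 10 20

unlink(script)
```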
Just to clarify what was alluded to in Hansi's comment, here is one approach to this issue:
Wrap the script into a function, since this will let you go up one level of abstraction if needed, and will also make it easier to call the function whenever it is needed in any other script.
In cases where you want to use the script interactively, you can put a browser() call somewhere in your script. At the point where browser() is called, the function will pause and keep the environment as-is within the function, and you can then step through the function and use R interactively from within the function.
In the base package, check ?commandArgs, you can use this to parse out arguments from the command line.
If I have a script, test.R, containing the code:
args <- commandArgs(trailingOnly=TRUE)
for (arg in args){
print(arg)
}
and I call it from the command line with Rscript as follows:
Rscript test.R arg1 arg2 arg3
The output is:
[1] "arg1"
[1] "arg2"
[1] "arg3"
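If the script needs named parameters rather than positional ones, one possible convention is key=value pairs. The sketch below is only an illustration: parse_args is a hypothetical helper, and the argument names are made up. In a real script the input would come from commandArgs(trailingOnly = TRUE).

```r
# Hypothetical helper: turn c("n=100", "out=estimates.csv") into a named list
parse_args <- function(args) {
  kv <- strsplit(args, "=", fixed = TRUE)
  values <- vapply(kv, function(p) p[2], character(1))
  names(values) <- vapply(kv, function(p) p[1], character(1))
  as.list(values)
}

# Stand-in for commandArgs(trailingOnly = TRUE)
params <- parse_args(c("n=100", "out=estimates.csv"))
params$n
#> [1] "100"
params$out
#> [1] "estimates.csv"
```

All values arrive as character strings, so numeric parameters still need as.numeric() or similar.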

Set the environment of a function placed outside the .GlobalEnv

I want to attach functions from a custom environment to the global environment, while masking possible internal functions.
Specifically, say that f() uses an internal function g(), then:
f() should not be visible in .GlobalEnv with ls(all=TRUE).
f() should be usable from .GlobalEnv.
f() internal function g() should not be visible and not usable from .GlobalEnv.
First let us create environments and functions as follows:
assign('ep', value=new.env(parent=.BaseNamespaceEnv), envir=.BaseNamespaceEnv)
assign('e', value=new.env(parent=ep), envir=ep)
assign('g', value=function() print('hello'), envir=ep)
assign('f', value=function() g(), envir=ep$e)
ls(.GlobalEnv)
## character(0)
Should I run now:
ep$e$f()
## Error in ep$e$f() (from #1) : could not find function "g"
In fact, the enclosing environment of f is:
environment(get('f', envir=ep$e))
## <environment: R_GlobalEnv>
where g is not present.
Trying to change f's environment gives an error:
environment(get('f', envir=ep$e))=ep
## Error in environment(get("f", envir = ep$e)) = ep :
## target of assignment expands to non-language object
Apparently it works with:
environment(ep$e$f)=ep
attach(ep$e)
Now, as desired, only f() is usable from .GlobalEnv, g() is not.
f()
## [1] "hello"
g()
## Error: could not find function "g" (intended behaviour)
Also, neither f() nor g() are visible from .GlobalEnv, but unfortunately:
ls(.GlobalEnv)
## [1] "ep"
Setting the environment associated with f() to ep, places ep in .GlobalEnv.
Cluttering the Global environment was exactly what I was trying to avoid.
Can I reset the parent environment of f without making it visible from the Global one?
UPDATE
From your feedback, you suggest to build a package to get proper namespace services.
The package is not flexible. My helper functions are stored in a project subdir, say hlp, and sourced like source("hlp/util1.R").
In this way scripts can be easily mixed and updated on the fly on a project basis.
(Added new enumerated list on top)
UPDATE 2
An almost complete solution, which does not require external packages, is now here.
Either packages or modules do exactly what you want. If you’re not happy with packages’ lack of flexibility, I suggest you give ‘box’ modules a shot: they elegantly solve your problem and allow you to treat arbitrary R source files as modules:
Just mark public functions inside the module with the comment #' @export, and load it via
box::use(./foo)
foo$f()
or
box::use(./foo[...])
f()
This fulfils all the points in your enumeration. In particular, both pieces of code make f, but not g, available to the caller. In addition, modules have numerous other advantages over using source.
On a more technical note, your code results in ep being inside the global environment because the assignment environment(ep$e$f)=ep creates a copy of ep inside your global environment. Once you’ve attached the environment, you can delete this object. However, the code still has issues (it’s more complex than necessary and, as Hong Ooi mentioned, you shouldn’t mess with the base namespace).
First, you shouldn't be messing around with the base namespace. Cluttering up the base because you don't want to clutter up the global environment is just silly.*
Second, you can use local() as a poor-man's namespacing:
e <- local({
g <- function() "hello"
f <- function() g()
environment()
})
e$f()
# [1] "hello"
* If what you have in mind is a method for storing package state, remember that (essentially) anything you put in the global environment will be placed in its own namespace when you package it up. So don't worry about cluttering things up.
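Building on the local() idea, here is a rough sketch of how the three requirements from the question could be met without packages: attach only the public function and drop the reference to the private environment. The search-path name "helpers" is hypothetical.

```r
e <- local({
  g <- function() "hello"  # private helper
  f <- function() g()      # public function
  environment()
})

# Attach a list holding only the public function; attach() copies it
# into a new environment on the search path
attach(list(f = e$f), name = "helpers")

# f keeps its enclosing environment alive, so e itself can be removed
rm(e)

out <- f()
out
#> [1] "hello"

# Clean up the search path when done
detach("helpers")
```

Here f is usable without a prefix, it does not show up in ls(), and g is reachable only through f's enclosing environment.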

R user-defined functions in new environment

I use some user-defined small functions as helpers. These functions are all stored in a R_HOME_USER/helper directory. Until now, these functions were sourced at R start up. The overall method is something like lapply(my.helper.list, source). I now want these functions to be sourced but not to appear in my environment, as they pollute it.
A first and clean approach would be to build a package with all my helper.R. For now, I do not want to follow this method. A second approach would be to name these helpers with a leading dot. This annoys me to have to run R > .helper1().
Best way would be to define these helpers in a specific and accessible environment, but I am messing with the code. My idea is to create first a new environment:
.helperEnv <- new.env(parent = baseenv())
attach(.helperEnv, name = '.helperEnv')
Fine, R > search() returns '.helperEnv' in the list. Then I run:
assign('helper1', helper1, envir = .helperEnv)
rm(helper1)
Fine, ls(.helperEnv) returns 'helper1' and this function does not appear anymore in my environment.
The issue is I can't run helper1 (object not found). I guess I am not on the right track and would appreciate some hints.
I think you should assign the pos argument in your call to attach as a negative number:
.helperEnv <- new.env()
.helperEnv$myfunc<-function(x) x^3+1
attach(.helperEnv,name="helper",pos=-1)
ls()
#character(0)
myfunc
#function(x) x^3+1
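As far as I understand ?attach, the reason the original attempt fails is that attach() does not put the environment itself on the search path: it creates a new environment there and copies the objects in. Anything assigned to .helperEnv afterwards never reaches the attached copy. A small sketch of that behaviour, and of assigning into the attached environment directly (the names "copyDemo" and "helperEnv" are arbitrary):

```r
# attach() copies: later changes to the source environment are not seen
src <- new.env()
attached <- attach(src, name = "copyDemo")
assign("late", 1, envir = src)
seen <- exists("late", envir = attached, inherits = FALSE)
seen
#> [1] FALSE
detach("copyDemo")

# Alternative: attach an empty environment and assign into the
# environment that attach() returns
helpers <- attach(NULL, name = "helperEnv")
assign("helper1", function(x) x^3 + 1, envir = helpers)
value <- helper1(2)
value
#> [1] 9
detach("helperEnv")
```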

attaching NULL environment results in scoping error

I've run into a strange (to me) behaviour in R's lexical scoping that results from first attaching a NULL environment to the search path, as suggested in the help file for attach(), and then populating it using sys.source().
Here is a simplified and reproducible example of the issue. I have 3 functions (f1, f2, and f3) in three separate files I wish to attach into three separate environments (env.A, env.B, and env.C, respectively). Here is the setup function:
setup <- function() {
for (i in sprintf('env.%s',LETTERS[1:3])) if (i%in%search())
detach(i, unload=TRUE, force=TRUE, character.only=TRUE) # detach existing to avoid duplicates
env.A = attach(NULL, name='env.A')
env.B = attach(NULL, name='env.B')
env.C = attach(NULL, name='env.C')
sys.source('one.R', envir=env.A)
sys.source('two.R', envir=env.B)
sys.source('three.R', envir=env.C)
}
setup()
Once this function is called, 3 new environments are created with the functions f1, f2, and f3 contained within each environment. Each function lives in one of 3 separate files: "one.R", "two.R", and "three.R". The functions are trivial:
f1 <- function() {
print('this is my f1 function')
return('ok')
}
f2 <- function() {
f1()
f3()
print('this is my f2 function')
return('ok')
}
f3 <- function() {
print('this is my f3 function')
return('ok')
}
As you can see, functions f1 and f3 have no dependencies, but function f2 depends on both f1 and f3. Calling search() shows the following:
[1] ".GlobalEnv" "env.C" "env.B"
[4] "env.A" "package:stats" "package:graphics"
[7] "package:grDevices" "package:utils" "package:datasets"
[10] "package:methods" "Autoloads" "package:base"
The Issue:
Calling f2, gives the following:
> f2()
[1] "this is my f1 function"
Error in f2() : could not find function "f3"
Clearly f2 can "see" f1, but it cannot find f3. Permuting the order of the attached environments leads me to conclude that the order of the search path is critical. Functions lower down in the search path are visible, whereas functions "upstream" of where the function is being called from are not found.
In this case, f2 (env.B) found f1 (env.A), but could not find f3 (env.C). This is contrary to how I understand R's scoping rules (at least I thought I understood it). My understanding is that R first checks the local environment, then the enclosing environment, then any additional enclosing environments, then works its way down the search, starting with ".GlobalEnv", until it finds the first matching appropriate (function/object) name. If it makes it all the way to the "R_empty_env" then returns the "could not find function" error. This obviously isn't happening in this simple example.
Question:
What is happening? Why doesn't R traverse the entire search path and find f3 sitting in env.C? I assume there is something going on behind the scenes when the attach call is made. Perhaps some attributes are set detailing dependencies? I have found a workaround that does not run into this issue, whereby I create and populate the environment prior to attaching it. Using pseudocode:
env.A <- new.env(); ... B ... C
sys.source('one.R', envir=env.A)
...
attach(env.A)
...
This workaround exhibits a behaviour consistent with my expectations, but I am puzzled by the difference: attach then populate vs. populate then attach.
Comments, explanations, thoughts greatly appreciated. Thanks.
The difference between the two methods has to do with the parent environment of each of the newly created environments.
When R finds an object, it will then try to resolve all variables in that environment. If it cannot find them, it will then look next in the parent environment. It will continue to do so until it gets all the way to the empty environment. So if a function has the global environment as a parent environment, then every environment in the search path will be searched as you were expecting.
When you create an environment with
env.A <- new.env();
the default value for the parent= parameter is parent.frame() and as such when you call it it will set the value to the current environment() value. Observe
parent.env(env.A)
# <environment: R_GlobalEnv>
So env.A is a child of the global environment. However, when you do
env.A = attach(NULL, name='env.A')
parent.env(env.A)
# <environment: 0x1089c0ea0>
# attr(,"name")
# [1] "tools:RGUI"
You will see that it sets the parent to the environment in the search path that was last loaded (which happens to be "tools:RGUI" for me after a fresh R restart.) And continuing
env.B = attach(NULL, name='env.B')
parent.env(env.B)
#<environment: 0x108a2edf8>
#attr(,"name")
#[1] "env.A"
env.C = attach(NULL, name='env.C')
parent.env(env.C)
# <environment: 0x108a4f6e0>
# attr(,"name")
# [1] "env.B"
Notice how as we continue to add environments via attach(), they do not have a parent of GlobalEnv. This means that once we resolve a variable to env.B, lookups can only continue through env.B's parents (env.A and onward); there is no way to go back "up the chain" to env.C. This is why it cannot find f3(). This is the same as doing
env.A <- new.env(parent=parent.env(globalenv()));
env.B <- new.env(parent=env.A);
env.C <- new.env(parent=env.B);
with explicit calls to new.env.
Note that if I switch the order of attaches to
env.C = attach(NULL, name='env.C')
env.B = attach(NULL, name='env.B')
env.A = attach(NULL, name='env.A')
and try to run f2(), this time it can't find f1(), again because it can only go one way up the chain.
So the two different ways to create environments differ in the way they assign the default parent environment. So perhaps the attach(NULL) method really isn't appropriate for you in this case.
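To make the one-way lookup concrete without touching the search path, here is a small sketch that builds the same parent chain with explicit new.env() calls. The function bodies are simplified, and f4 is a hypothetical extra function added only to show that lookups from env.C can reach both of its ancestors.

```r
# Same parent chain as three successive attach(NULL) calls would give
env.A <- new.env(parent = parent.env(globalenv()))
env.B <- new.env(parent = env.A)
env.C <- new.env(parent = env.B)

# evalq() creates each function with the given environment as its enclosure
evalq(f1 <- function() "f1", env.A)
evalq(f2 <- function() c(f1(), f3()), env.B)
evalq(f3 <- function() "f3", env.C)
evalq(f4 <- function() c(f1(), f3()), env.C)  # hypothetical extra function

# From env.B, lookups go env.B -> env.A -> ...; env.C is never visited
from_B <- tryCatch(env.B$f2(), error = function(e) conditionMessage(e))
from_B

# From env.C, lookups go env.C -> env.B -> env.A, so both are found
from_C <- env.C$f4()
from_C
#> [1] "f1" "f3"
```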
I agree the answer seems to lie in the default parent assignment differing between attach() and new.env(). I find it a little strange that attach() would assign parentage to the environment second in the search list by default, but it is what it is, there is probably a valid reason behind it. The solution is simple enough:
env.A <- attach(NULL, name='env.A')
parent.env(env.A) <- .GlobalEnv
In the alternate solution using new.env(), there is a small caveat that you didn't run into because you were working directly in the .GlobalEnv, but in the OP, I was working within a temporary environment (the "setup" function). So the parent frame of the new.env() call is actually this setup environment. See below:
setup <- function() {
env.A <- new.env(); env.B <- new.env(); env.C <- new.env()
print(parent.env(environment()))
print(parent.frame())
print(environment())
print(parent.env(env.A))
print(parent.env(env.B))
print(parent.env(env.C))
}
setup()
#<environment: R_GlobalEnv>
#<environment: R_GlobalEnv>
#<environment: 0x2298368>
#<environment: 0x2298368>
#<environment: 0x2298368>
#<environment: 0x2298368>
When setup() is called from the command line, notice its parent is .GlobalEnv, as is the parent frame. However, the parent of environments A-C is the temporary setup environment (0x2298368). When setup() completes, that environment is not deleted as long as env.A-C still reference it as their parent, and its own parent is .GlobalEnv. Lookups from env.A-C therefore pass through the setup environment to .GlobalEnv and then down the search path, which is why this alternative works.
I think a cleaner way would be not to depend on this indirect route to .GlobalEnv and to specify the parent directly: env.A <- new.env(parent=.GlobalEnv). This works fine in my test case ... we'll see what happens when I scale up to ~750 interdependent functions!
Thanks again for your clear answer, I'd up-vote it but I'm apparently too new to have that privilege.
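For reference, a self-contained sketch of the populate-then-attach approach with the parent given explicitly. The three files are recreated in a temporary directory with simplified function bodies, and the setup() signature here is my own, not the one from the original post.

```r
# Recreate simplified versions of the three files
dir <- tempfile("envs")
dir.create(dir)
writeLines('f1 <- function() "f1"', file.path(dir, "one.R"))
writeLines("f2 <- function() c(f1(), f3())", file.path(dir, "two.R"))
writeLines('f3 <- function() "f3"', file.path(dir, "three.R"))

setup <- function(dir) {
  files <- c(env.A = "one.R", env.B = "two.R", env.C = "three.R")
  for (nm in names(files)) {
    # Explicit parent: lookups from each function start at its own
    # environment, then go to .GlobalEnv and along the whole search path
    env <- new.env(parent = globalenv())
    sys.source(file.path(dir, files[[nm]]), envir = env)
    attach(env, name = nm)
  }
}

setup(dir)
out <- f2()
out
#> [1] "f1" "f3"

# Detach again to leave the search path as it was
for (nm in c("env.C", "env.B", "env.A")) detach(nm, character.only = TRUE)
```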
