Separate scripts from .GlobalEnv: Source script that source scripts - r

This question is similar to Source script to separate environment in R, not the global environment, but with a key twist.
Consider a script that sources another script:
# main.R
source("funs.R")
x <- 1
# funs.R
hello <- function() {message("Hi")}
I want to source the script main.R and keep everything in a "local" environment, say env <- new.env(). Normally, one could call source("main.R", local = env) and expect everything to be in the env environment. However, that's not the case here: x is part of env, but the function hello is not! It is in .GlobalEnv.
Question: How can I source a script to a separate environment in R, even if that script itself sources other scripts, and without modifying the other scripts being sourced?
Thanks for helping, and let me know if I can clarify anything.
EDIT 1: Updated question to be explicit that scripts being source cannot be modified (assume they are not under your control).

You can use trace to inject code in functions,
so you could force all source calls to set local = TRUE.
Here I just override it if local is FALSE in case any nested calls to source actually set it to other environments due to special logic of their own.
env <- new.env()
# use !isTRUE if you want to support older R versions (<3.5.0)
tracer <- quote(
if (isFALSE(local)) {
local <- TRUE
}
)
trace(source, tracer, print = FALSE, where = .GlobalEnv)
# if you're doing this inside a function, uncomment the next line
#on.exit(untrace(source, where = .GlobalEnv))
source("main.R", local = env)
As mentioned in the code,
if you wrap this logic in a function,
consider using on.exit to make sure you untrace even if there are errors.
EDIT: as mentioned in the comments,
this could have issues if some of the scripts you will be loading assume there is 1 (global) environment where everything ends.
I suppose you could change the tracer to something like
tracer <- quote(
if (missing(local)) {
local <- TRUE
}
)
or maybe
tracer <- quote(
if (isFALSE(local)) {
# fetch the specific environment you created
local <- get("env", .GlobalEnv)
}
)
The former assumes that if the script didn't specify local at all,
it doesn't care about which environment ends up holding everything.
The latter assumes that source calls that didn't specify local or set it to FALSE want everything to end up in 1 environment,
and modify the logic to use your environment instead of the global one.

Disclaimer: Very ugly and potentially dangerous, but whatever.
Redefine source:
env<-new.env()
source<-function(...) base::source(..., local = env)
source("main.R")
#just remove your redefinition when you don't need it
rm(source)

The best way to protect yourself from side effects of code you cannot control is isolation. You can use callr to easily execute the scripts isolated in a separate R session:
using environments:
env <- new.env()
env <- as.environment(callr::r(function(env) {
list2env(env, .GlobalEnv)
source("main.R")
as.list(.GlobalEnv)
}, args = list(as.list(env))))
env
#> <environment: 0x0000000018124878>
env$hello()
#> Hi
simpler version sticking to lists:
params <- list()
results <- callr::r(function(params) {
list2env(params, .GlobalEnv)
source("main.R")
as.list(.GlobalEnv)
}, args = list(params))
results
#> $x
#> [1] 1
#>
#> $hello
#> function ()
#> {
#> message("Hi")
#> }
results$hello()
#> Hi
The param part is only needed if you actually need to provide input the scripts (not used for you example).
Obviously, this will not work for open connections and similar stuff. In that case, you might want to look into callr::r_session.

Related

Calling an R function in a different environment

I fell like it should be fairly straightforward to do this, but I can't for the life of me find a solution... I want to evaluate an R function in an environment different from the one where it is.
What I'd like:
# A simple function
f <- function() {
x + 1
}
# Create an env and assign x <- 3
env <- new.env()
assign("x", 3, envir = env)
# Call f on env
call_on_env(f, env)
#> 4
The closest I got to "call_on_env()" was:
# Quote call and evaluate
quo <- quote(f())
eval(quo, envir = env)
Unfortunately the code above returns an error: Error in f() : object 'x' not found. So then... Is there a way for me to evaluate f() on env?
Edit: I'm able to send f() to env and then call it, but this leaves f() permanently there. For context [see below], I want to call the function in parallel with some pre-loaded packages.
Context: I'm calling a function in parallel with parallel::clusterMap() and I'd like for the packages loaded in my global environment to also be loaded on the clusters. As far as I can tell, parallel::clusterExport() can only export a list of variables, so it doesn't work for me...
Move f into env
environment(f) <- env
f()
# [1] 4
Note: Evaluation of objects across different environments is not desirable, as you have encountered here. It's best to keep all objects that you plan to interact with one another in the same environment.
If you don't want to change the environment of f, you could put all the above into a new function.
fx <- function(f, env) {
environment(f) <- env
f()
}
fx(f, env)
# [1] 4
The source() function might help:
source('scriptfilename.R')
If the file is located in another path then use:
source('YOURPATH/scriptfilename.R')
When you run source() it will pull all of the functions into your current Environment. You can then reference any of the functions contained in the R script where it sits.
However I wouldn't recommend referencing functions/scripts outside of your R project folder structure, since the links will break if you share your R project folder with others.

R Script as a Function

I have a long script that involves data manipulation and estimation. I have it setup to use a set of parameters, though I would like to be able to run this script multiple times with different sets of inputs kind of like a function.
Running the script produces plots and saves estimates to a csv, I am not particularly concerned with the objects it creates.
I would rather not wrap the script in a function as it is meant to be used interactively.
How do people go about doing something like this?
I found this for command line arguments : How to pass command-line arguments when source() an R file but still doesn't solve the interactive problem
I have dealt with something similar before. Below is the solution I came up with.
I basically use list2env to push variables to either the global or function's local environment
and I then source the function in the designated environment.
This can be quite useful especially when coupled with exists as shown in the example below which would allow you to keep your script stand-alone.
These two questions may also be of help:
Source-ing an .R script within a function and passing a variable through (RODBC)
How to pass command-line arguments when source() an R file
# Function ----------------------------------------------------------------
subroutine <- function(file, param = list(), local = TRUE, ...) {
list2env(param, envir = if (local) environment() else globalenv())
source(file, local = local, ...)
}
# Example -----------------------------------------------------------------
# Create an example script
tmp <- "test_subroutine.R"
cat("if (!exists('msg')) msg <- 'no argument provided'; print(msg)", file = tmp)
# Example of using exists in the script to keep it stand-alone
subroutine(tmp)
# Evaluate in functions environment
subroutine(tmp, list(msg = "use function's environment"), local = TRUE)
exists("msg", envir = globalenv()) # FALSE
# Evaluate in global environment
subroutine(tmp, list(msg = "use global environment"), local = FALSE)
exists("msg", envir = globalenv()) # TRUE
unlink(tmp)
Just to clarify what was alluded to in Hansi's comment, here is one approach to this issue:
Wrap the script into a function, since this will let you go up one level of abstraction if needed, and will also make it easier to call the function whenever it is needed in any other script.
In cases where you want to use the script interactively, you can put a browser() call somewhere in your script. At the point where browser() is called, the function will pause and keep the environment as-is within the function, and you can then step through the function and use R interactively from within the function.
In the base package, check ?commandArgs, you can use this to parse out arguments from the command line.
If I have a script, test.R, containing the code:
args <- commandArgs(trailingOnly=TRUE)
for (arg in args){
print(arg)
}
and I call it from the command line with rscript as follows:
rscript test.R arg1 arg2 arg3
The output is:
[1] "arg1"
[1] "arg2"
[1] "arg3"

R functions that execute functions

I'm trying to break out common lines of code used in a fairly large R script into encapsulated functions...however, they don't seem to be running the intended code when called. I feel like I'm missing some conceptual piece of how R works, or functional programming in general.
Examples:
Here's a piece of code I'd like to call to clear the workspace -
clearWorkSpace <- function() {
rm(list= ls(all=TRUE))
}
As noted, the code inside of the function executes as expected, however if the parent function is called, the environment is not cleared.
Again, here's a function intended to load all dependency files -
loadDependencies <- function() {
dep_files <- list.files(path="./dependencies")
for (file in dep_files) {
file_path <- paste0("./dependencies/",file)
source(file_path,local=TRUE)
}
}
If possible, it'd be great to be able to encapsulate code into easy to read functions. Thanks for your help in advance.
What you are calling workspace is more properly referred to as the global environment.
Functions execute in their own environments. This is, for example, why you don't see the variables defined inside a function in your global environment. Also how a function knows to use a variable named x defined in the function body rather than some x you might happen to have in your global environment.
Most functions don't modify the external environments, which is good! It's the functional programming paradigm. Functions that do modify environments, such as rm and source, usually take arguments so that you can be explicit about which environment is modified. If you look at ?rm you'll see an envir argument, and that argument is most of what its Details section describes. source has a local argument:
local - TRUE, FALSE or an environment, determining where the parsed expressions are evaluated. FALSE (the default) corresponds to the user's workspace (the global environment) and TRUE to the environment from which source is called.
You explicitly set local = TRUE when you call source, which explicitly tells source to only modify the local (inside the function) environment, so of course your global environment is untouched!
To make your functions work as I assume you want them to, you could modify clearWorkSpace like this:
clearWorkSpace <- function() {
rm(list= ls(all=TRUE, envir = .GlobalEnv), envir = .GlobalEnv)
}
And for loadDependencies simply delete the local = TRUE. (Or more explicitly set local = FALSE or local = .GlobalEnv) Though you could re-write it in a more R-like way:
loadDependencies = function() {
invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source))
}
For both of these (especially with the simplified dependency running above) I'd question whether you really need these wrapped up in functions. Might be better to just get in the habit of restarting R when you resume work on a project and keeping invisible(lapply(list.files(path = "./dependencies", full.names = TRUE), source)) at the top of your script...
For more reading on environments, there is The Evironments Section of Advanced R. Notably, there are several ways to specify environments that might be useful for different use cases rather than hard-coding the global environment.
In theory you need just to do something like:
rm(list= ls(all=TRUE, envir = .GlobalEnv))
I mean you set explicitly the environment ( even it is better here to use pos argument). but this will delete also the clearWorkSpace function since it is a defined in the global environment. So this will fails with a recursive call.
Personally I never use rm within a function or a local call. My understanding , rm is intended to be called from the console to clear the work space.

how to add functions in an existing environment

Is it possible to use env() as a substitute for namespaces, and how do you check if an environment exists already before adding functions to it?
This is related to this question, and Brendan's suggestion
How to organize large R programs?
I understand Dirk's point in that question, however for development it is sometimes impractical to put functions in packages.
EDIT: The idea is to mimic namespaces across files, and hence to be able to load different files independently. If a file has been previously loaded then the environment doesn't need to be created, just added to.
Thanks for ideas
EDIT: So presumably this code below would be the equivalent of namespaces in other languages:-
# how to use environment as namespaces
# file 1
# equivalent of 'namespace e' if (!(exists("e") && is.environment(e))) { e <- new.env(parent=baseenv()) }
e$f1 <- function(x) {1}
# file 2
# equivalent of 'namespace e' if (!(exists("e") && is.environment(e))) { e <- new.env(parent=baseenv()) }
e$f2 <- function(x) {2}
Yes you can for the most part. Each function has an environment and that's where it looks for other functions and global variables. By using your own environment you have full control over that.
Typically functions are also assigned to an environment (by assigning them to a name), and typically those two environments are the same - but not always. In a package, the namespace environment is used for both, but then the (different) package environment on the search path also has the same (exported) functions defined. So there the environments are different.
# this will ensure only stats and packages later on the search list are searched for
# functions from your code (similar to import in a package)
e <- new.env(parent=as.environment("package:stats"))
# simple alternative if you want access to everything
# e <- new.env(parent=globalenv())
# Make all functions in "myfile.R" have e as environment
source("myfile.R", local=e)
# Or change an existing function to have a new environment:
e$myfunc <- function(x) sin(x)
environment(e$myfunc) <- e
# Alternative one-liner:
e$myfunc <- local(function(x) sin(x), e)
# Attach it if you want to be able to call them as usual.
# Note that this creates a new environment "myenv".
attach(e, name="myenv")
# remove all temp objects
rm(list=ls())
# and try your new function:
myfunc(1:3)
# Detach when it's time to clean up or reattach an updated version...
detach("myfile")
In the example above, e corresponds to a namespace and the attached "myenv" corresponds to a package environment (like "package:stats" on the search path).
Namespaces are environments, so you can use exactly the same mechanism. Since R uses lexical scoping the parent of the environment defines what the function will see (i.e. how free variables are bound). And exactly like namespace you can attach environments and look them up.
So to create a new "manual namespace" you can use something like
e <- new.env(parent=baseenv())
# use local(), sys.source(), source() or e$foo <- assignment to populate it, e.g.
local({
f <- function() { ... }
#...
}, e)
attach(e, name = "mySuperNamespace")
Now it is loaded and attached just like a namespace - so you can use f just like it was in a namespace. Namespaces use one more parent environment to resolve imports - you can do that too if you care. If you need to check for your cool environment, just check the search path, e.g "mySuperNamespace" %in% search(). If you need the actual environment, use as.environment("mySuperNamespace")
You can check that environments exist in the same way that you would any other variable.
e <- new.env()
exists("e") && is.environment(e)

R: creating an environment in the globalenv() from inside a function

Right now I have the lines:
envCache <- new.env( hash=TRUE, parent = .GlobalEnv )
print(parent.env(envCache))
R claims the environment is in the global environment, but when I try to find the environment later it's not there.
What I'm trying to do here is cache some dataframes in and environment under the global environment, so each time I call a function it does not have to hit the server to get the same data again. Ideally, I'll call the function once using a source command in the R console, it will grab the data necessary, save it to an environment in the global environment, and then when I call the same function from the R console it will see the environment and dataframe from which it will grab the data as opposed to re-querying the server.
When R looks for a symbol, it looks in the current environment, then the environment's parent, and so on. It has not assigned envCache into the global environment. One way to implement what you would like to do is to create a 'closure' that remembers state, along the lines of
makeCache <- function() {
cache <- new.env(parent=emptyenv())
list(get = function(key) cache[[key]],
set = function(key, value) cache[[key]] <- value,
## next two added in response to #sunpyg
load = function(rdaFile) load(rdaFile, cache),
ls = function() ls(cache))
}
invoking makeCache() returns a list of two functions, get and set.
a <- makeCache()
Each function has an environment in which it was defined (the environment created when you invoked makeCache()). When you invoke a$set("a", 1) the rules of variable look-up mean that R looks for a variable cache, first inside the function aCache$set, and when it doesn't find it there in the environment in which set was defined.
> a$get("foo")
NULL
> a$set("foo", 1)
> a$get("foo")
[1] 1
Cool, eh? Note that parent=emptyenv()) means that a get() on a non-existent keys stops looking in cache, otherwise it would have continued to look in the parent environment of cache, and so on.
There's a bank account example in the Introduction to R document that is really fun. In response to #sunpyg's comment, I've added a load and ls function to add data from an Rda file and to list the content of the cache, e.g., a$load("foo.Rda").
Here's what I came up with as an alternate solution. It may be doing the same thing as the other answer in the backround, but the code is more intuitive to me.
cacheTesting <- function()
{
if (exists("cache"))
{
print("IT WORKS!!!")
cacheData <- get("test", envir = cache)
print(cacheData)
}
else
{
assign("cache", new.env(hash = TRUE), envir = .GlobalEnv)
test <- 42
assign("test", test, envir = cache)
}
}
The first run of the code creates the environment in the .GlobalEnv using an assign statement. The second run sees that environment, because it actually made it to .GlobalEnv, and pulls the data placed there from it before printing it.

Resources