attaching packages to a "temporary" search path in R

Inside a function, I am sourcing a script:
f <- function() {
  source("~/Desktop/sourceme.R")  # source someone else's script
  # do some stuff to the variables read in
}
f()
search()  # the package sourceme.R attaches is all the way at the back!
and unfortunately, the scripts that I am sourcing are not fully under my control. They make calls to library(somePackage), and it pollutes the search path.
This is mostly a problem if the author of sourceme.R expects the package that he/she is attaching to be at the top level/close to the global environment. If I myself have attached some package that masks some of the function names he/she is expecting to be available, then that's no good.
Is there a way I can source scripts but somehow make my own temporary search path that "resets" after the function is finished running?

I would consider sourcing the script in a separate R process using the callr package and then returning the environment created by the sourced file.
Using a separate R process prevents your search path from being polluted. I'm guessing there may be some side effects (such as defining new functions or variables) in your global environment that you do want. The local argument of source() lets you specify the environment in which the parsed script is evaluated, and if you return that environment from the other R process, you can access any result you need.
Not sure what yours looks like, but say I have this file, which modifies the search path:
# messWithSearchPath.R
library(dplyr)
a <- data.frame(groupID = rep(1:3, 10), value = rnorm(30))
b <- a %>%
  group_by(groupID) %>%
  summarize(agg = sum(value))
From my top level script, I would write a wrapper function to source it in a new environment and have callr execute this function:
RogueScript <- function() {
  rogueEnv <- new.env()
  source("messWithSearchPath.R", local = rogueEnv)  # evaluate the script in its own environment
  rogueEnv
}
before <- search()
scriptResults <- callr::r(RogueScript)
scriptResults$b
#> groupID agg
#> 1 1 -2.871642
#> 2 2 3.368499
#> 3 3 1.159509
identical(before, search())
#> [1] TRUE
If the scripts have other side effects (such as setting options or establishing external connections), this method probably won't work. There may be workarounds depending on what they are intended to do, but this approach should work if you just want the variables/functions they create. It also prevents the scripts from conflicting with each other, not just with your top-level script.
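If the only side effect you care about is options the script sets, one hedged variation on the same callr idea (the file name is reused from the example above; this only captures options the script adds, not ones it merely changes in value) is to return them alongside the environment:
RogueScriptWithOptions <- function(path) {
  rogueEnv <- new.env()
  opts_before <- options()                     # snapshot of options in the child process
  source(path, local = rogueEnv)
  added <- setdiff(names(options()), names(opts_before))
  list(
    env         = rogueEnv,                    # variables/functions the script created
    new_options = options()[added]             # options the script added
  )
}
res <- callr::r(RogueScriptWithOptions, args = list("messWithSearchPath.R"))
ls(res$env)
res$new_options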

One way would be to "snapshot" your current search path and try to return to it later:
search.snapshot <- local({
  .snap <- character(0)
  function(restore = FALSE) {
    if (restore) {
      if (length(.snap) == 0) {
        # nothing has been snapshotted yet, so there is nothing to restore
        return(character(0))
      } else {
        extras <- setdiff(search(), .snap)
        # may not work if DLLs are loaded
        for (pkg in extras) {
          suppressWarnings(detach(pkg, character.only = TRUE, unload = TRUE))
        }
        return(extras)
      }
    } else .snap <<- search()
  }
})
In action:
search.snapshot() # store current state
get(".snap", envir = environment(search.snapshot)) # view snapshot
# [1] ".GlobalEnv" "ESSR" "package:stats"
# [4] "package:graphics" "package:grDevices" "package:utils"
# [7] "package:datasets" "package:r2" "package:methods"
# [10] "Autoloads" "package:base"
library(ggplot2)
library(zoo)
# Attaching package: 'zoo'
# The following objects are masked from 'package:base':
# as.Date, as.Date.numeric
library(dplyr)
# Attaching package: 'dplyr'
# The following objects are masked from 'package:stats':
# filter, lag
# The following objects are masked from 'package:base':
# intersect, setdiff, setequal, union
search()
# [1] ".GlobalEnv" "package:dplyr" "package:zoo"
# [4] "package:ggplot2" "ESSR" "package:stats"
# [7] "package:graphics" "package:grDevices" "package:utils"
# [10] "package:datasets" "package:r2" "package:methods"
# [13] "Autoloads" "package:base"
search.snapshot(TRUE) # returns detached packages
# [1] "package:dplyr" "package:zoo" "package:ggplot2"
search()
# [1] ".GlobalEnv" "ESSR" "package:stats"
# [4] "package:graphics" "package:grDevices" "package:utils"
# [7] "package:datasets" "package:r2" "package:methods"
# [10] "Autoloads" "package:base"
I am somewhat confident (though I have not verified it) that this will not work with every package, perhaps due to dependencies and/or loaded DLLs. You can try adding force = TRUE to the detach() call; I'm not sure whether that works better or instead has other undesirable side effects.
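Putting this together with the original question, here is a minimal sketch (untested against tricky cases such as loaded DLLs or reverse dependencies) of a wrapper that snapshots the search path, sources the file, and detaches whatever was attached, using force = TRUE as mentioned above:
source_clean <- function(file, ...) {
  before <- search()
  on.exit({
    extras <- setdiff(search(), before)
    for (pkg in extras) {
      # force = TRUE detaches even if other attached packages depend on it
      suppressWarnings(detach(pkg, character.only = TRUE, unload = TRUE, force = TRUE))
    }
  }, add = TRUE)
  source(file, ...)
}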

Related

Why would loading a package change the resid function being used?

I understand that resid() is a generic function in R, and which specific residual function is used depends on the object to which resid() is applied, just like print().
However, I noticed that sometimes loading a package changes which specific residual function is used, yielding drastically different residual plots. Could anyone help me understand why that happens?
This is an example from my data:
> #### Showing packages loaded after starting up R ####
> search()
[1] ".GlobalEnv" "tools:rstudio" "package:stats" "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads" "package:base"
>
> #### Before loading nlme ####
>
> ## s1 is a gls object, calculated using the nlme package
> s1 <- readRDS("../Data/my_gls.RDS")
> qqnorm(resid(s1, type = "pearson"), main = "before loading nlme")
> qqline(resid(s1, type = "pearson"))
>
> methods(resid)
[1] residuals.default* residuals.glm residuals.HoltWinters* residuals.isoreg* residuals.lm
[6] residuals.nls* residuals.smooth.spline* residuals.tukeyline*
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, envir) :
generic function 'resid' dispatches methods for generic 'residuals'
> sloop::s3_dispatch(resid(s1, type = "pearson"))
resid.gls
=> resid.default
> ## the resid.default is used
And the resulting Q-Q plot is: [figure omitted; Q-Q plot titled "before loading nlme"]
Then, after loading the nlme package,
> #### After loading nlme ####
>
> library(nlme)
Warning message:
package ‘nlme’ was built under R version 4.1.2
> search()
[1] ".GlobalEnv" "package:nlme" "tools:rstudio" "package:stats" "package:graphics" "package:grDevices"
[7] "package:utils" "package:datasets" "package:methods" "Autoloads" "package:base"
>
> # s2 is the same as s1
> s2 <- readRDS("../Data/my_gls.RDS")
> qqnorm(resid(s2, type = "pearson"), main = "after loading nlme")
> qqline(resid(s2, type = "pearson"))
>
> methods(resid)
[1] residuals.default* residuals.glm residuals.gls* residuals.glsStruct* residuals.gnls*
[6] residuals.gnlsStruct* residuals.HoltWinters* residuals.isoreg* residuals.lm residuals.lme*
[11] residuals.lmeStruct* residuals.lmList* residuals.nlmeStruct* residuals.nls* residuals.smooth.spline*
[16] residuals.tukeyline*
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, envir) :
generic function 'resid' dispatches methods for generic 'residuals'
> sloop::s3_dispatch(resid(s2, type = "pearson"))
=> resid.gls
* resid.default
> # resid.gls is used
the Q-Q plot looks like this: [figure omitted; Q-Q plot titled "after loading nlme"]
As sloop::s3_dispatch(resid(s1, type = "pearson")) indicates, resid.default is the function used before the nlme package is loaded, but resid.gls is the one used after nlme is loaded. Why such a change? Is it because resid.gls is not among the methods available by default, as the first methods(resid) output suggests?
I am using R 4.1.0. I would very much appreciate any feedback. Thank you.
> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 4
minor 1.0
year 2021
month 05
day 18
svn rev 80317
language R
version.string R version 4.1.0 (2021-05-18)
nickname Camp Pontaneze
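A toy sketch of the mechanism the question is probing (the generic, class, and method names below are invented; this is not nlme's code): as soon as a class-specific method becomes visible, S3 dispatch prefers it over the default for the very same object.
val <- structure(list(x = 1:3), class = "gls_like")           # made-up class
resid_like <- function(object, ...) UseMethod("resid_like")
resid_like.default <- function(object, ...) "default method"
resid_like(val)
# [1] "default method"

# registering a method for the object's class changes which function runs,
# much as loading nlme registers residuals.gls
resid_like.gls_like <- function(object, ...) "class-specific method"
resid_like(val)
# [1] "class-specific method"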

Accessing current environment in R's eval function

According to this post, the environment() function is the way to get the current environment.
However, I found that this does not seem to be the case inside the eval function, as the following examples show.
.env <- new.env()
.env$info$progress <- 3
.expr <- "environment()$info$progress <- 5"
eval(parse(text = .expr), envir = .env, enclos = .env)
> invalid (NULL) left side of assignment
I also tried the assign function, but it does not work either:
.env <- new.env()
.env$info$progress <- 3
.expr <- "assign(info$progress, 11, envir = environment())"
eval(parse(text = .expr), envir = .env, enclos = .env)
> Error in assign(info$progress, 11, envir = environment()) :
> invalid first argument
So environment() fails to find the current environment inside eval().
I would appreciate it if anyone could let me know how to access the current environment in the examples above, or how to work around this issue in eval().
environment() does what you think it does. The issue is with assigning directly to the result of a function call.
> new.env()$info$progress <- 3
Error in new.env()$info$progress <- 3 :
invalid (NULL) left side of assignment
> .env <- new.env()
> .env$info$progress <- 3
> evalq(identical(environment(), .env), envir = .env)
[1] TRUE
> evalq({ e <- environment(); e$info$progress <- 5 }, envir = .env)
> .env$info
$progress
[1] 5
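The same idea works with a parsed string, closer to the question's setup; a minimal sketch (note that the helper binding e is created inside .env as a side effect):
.env <- new.env()
.env$info$progress <- 3
.expr <- "e <- environment(); e$info$progress <- 7"   # assign through a named handle
eval(parse(text = .expr), envir = .env)
.env$info$progress
# [1] 7
ls(.env)   # the intermediate binding 'e' also ends up in .env
# [1] "e"    "info"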
The goal (which I take to be assigning into a specific, already-defined environment) can be accomplished by recognizing that no call to environment() is needed at all; calling it there does not retrieve anything useful for this purpose. The .env object is an environment, so the assignment should simply target it:
.env <- new.env()
.env$info$progres <- 3
.expr <- ".env$info$progres <- 5"
eval(parse(text = .expr) )
#------------
> ls(envir=.env)
[1] "info"
> ?get
> get("info", envir=.env)
$progres
[1] 5
The environment assignment operation is supposed to put values into the environment of functions. I think it's probably undefined when you make an assignment into an unbound environment. I would not have thought that environment()$info$progres <- 5 would have succeeded in placing a value into .env since the target of environment(.)<- was NULL.
Responding to your comment: I'm not sure what was meant by "a current environment". There is "the current environment" and the .env-environment was not that environment (nor was it ever that environment, even for an instant). Creating an environment with new.env does not make it the current environment. It only creates an environment which allows you to store or retrieve objects in it by referencing its name.
.env <- new.env()
environment()
#<environment: R_GlobalEnv>
It isn't even on the search path. It's kind of "on the sidelines" waiting to be referenced.
> search()
[1] ".GlobalEnv" "package:acs" "package:XML" "package:acepack" "package:abind"
[6] "package:downloader" "package:forcats" "package:stringr" "package:dplyr" "package:purrr"
[11] "package:readr" "package:tidyr" "package:tibble" "package:tidyverse" "tools:RGUI"
[16] "package:grDevices" "package:utils" "package:datasets" "package:graphics" "package:rms"
[21] "package:SparseM" "package:Hmisc" "package:ggplot2" "package:Formula" "package:stats"
[26] "package:survival" "package:sos" "package:brew" "package:lattice" "package:methods"
[31] "Autoloads" "package:base"
> ls(envir=.env)
[1] "info"
I find myself wondering whether the goal was to use a more object-oriented style; if so, I would recommend looking at the ?R6 help page and the section of the R Language Definition entitled "5 Object-oriented programming".
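If that object-oriented route appeals, here is a minimal R6 sketch of the sort of thing meant; the class and field names are invented to mirror the question's info$progress:
library(R6)
Progress <- R6Class("Progress",
  public = list(
    info = list(progress = 0),
    set = function(value) {
      self$info$progress <- value   # state lives inside the object, not on the search path
      invisible(self)
    }
  )
)
p <- Progress$new()
p$set(5)
p$info$progress
# [1] 5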
After navigating through the help pages looking at the code for getAnywhere, ?find, ?ls, ?objects, I found a particular use of apropos that you might find interesting:
apropos("\\.", mode="environment")
[1] ".AutoloadEnv" ".BaseNamespaceEnv" ".env" ".GenericArgsEnv" ".GlobalEnv"
[6] ".userHooksEnv"
If you use:
apropos(".", mode = "environment")
..., constructed with the most generic pattern possible, you will also find the 100 or so ggproto environments defined by ggplot2 functions, assuming you have that package loaded. I think Hadley Wickham's "Advanced R" may have more of interest on this topic, because he defines an "environment list" class and functions to manipulate them.

R: Source personal scripts keeping some functions hidden

Follow up to this
I want to source scripts inside a given environment, as sys.source does, but "exporting" only some functions and keeping the others private.
I created this function:
source2 <- function(script) {
  ps <- paste0(script, "_")                      # name of the "private" parent environment
  assign(ps, new.env(parent = baseenv()))        # private environment
  assign(script, new.env(parent = get(ps)))      # public environment the script is sourced into
  private <- function(f) {
    # move a function from the script environment into its (private) parent
    fn <- deparse(substitute(f))
    assign(fn, f, parent.env(parent.frame()))
    rm(list = fn, envir = parent.frame())
  }
  assign("private", private, get(script))
  sys.source(paste0(script, ".R"), envir = get(script))
  rm(private, envir = get(script))
  attach(get(script), name = script)
}
For the most part, this function works as expected.
Consider the script:
## foo.R
f=function() g()
g=function() print('hello')
private(g)
Note the private() function, which will hide g().
If I, so to speak, import the module foo:
source2("foo")
I have a new environment in the search path:
search()
## [1] ".GlobalEnv" "foo" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
The current environment, .GlobalEnv, shows only:
ls()
## [1] "source2"
But if I list items in foo environment:
ls("foo")
## [1] "f"
Therefore I can run:
f()
## [1] "hello"
The problem is that g() is completely hidden.
getAnywhere(g)
## no object named 'g' was found
Too much. In fact, if I want to debug f():
debug(f)
f()
## debugging in: f()
## Error in f() : could not find function "g"
The question is:
Where is g()? Can I still retrieve it?
Use:
get("g",env=environment(f))
## function ()
## print("hello")
## <environment: 0x0000000018780c30>
ls(parent.env(environment(f)))
## [1] "g"
Credit goes to Alexander Griffith for the solution.
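For what it's worth, a hedged follow-up for the debugging use case: get() returns the very same closure that f() calls, so flagging it with debug() should take effect on the next call.
g_hidden <- get("g", envir = parent.env(environment(f)))
debug(g_hidden)    # same function object, so the flag applies when f() calls g()
f()                # should now drop into the debugger inside g()
undebug(g_hidden)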

Character Vector of loaded packages

I am currently trying to turn my loaded packages into a character vector to use in the pkgDep function. Does anyone have an idea how to do this? Currently my result is formatted as a list, and using the unlist() function has not worked for me. I think rapply would do the trick, but I am running into issues with how to set up the function. I have pasted my code below. Thanks!
x <- loaded_packages()
typeof(x)
#need a character vector with package names to pass into function
pkgList <- pkgDep(x, availPkgs = pkgdata, suggests = TRUE)
Use the search() function to see the packages currently attached to the search path.
x <- search()
x
# [1] ".GlobalEnv" "package:dplyr" "package:stats"
# [4] "package:graphics" "package:grDevices" "package:utils"
# [7] "package:datasets" "package:methods" "Autoloads"
# [10] "package:base"
pkgList <- pkgDep(x, availPkgs = pkgdata, suggests = TRUE)
If you can tell us what the pkgDep() function does, we can get the list of loaded packages into the specific format it needs.
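Assuming pkgDep() here wants bare package names (as miniCRAN's pkgDep() does), a hedged sketch of extracting those from the current session; the final call just mirrors the question's usage, and the exact contents depend on what you have attached:
attached <- grep("^package:", search(), value = TRUE)   # drop ".GlobalEnv", "Autoloads", etc.
pkgs <- sub("^package:", "", attached)                  # e.g. "dplyr" "stats" ... "base"
# .packages() returns the attached packages directly;
# loadedNamespaces() would also include packages loaded but not attached
identical(sort(pkgs), sort(.packages()))
# should be TRUE
pkgList <- pkgDep(pkgs, availPkgs = pkgdata, suggests = TRUE)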
Try this function:
x <- search()
As per this link.

parent.env( x ) confusion

I've read the documentation for parent.env() and it seems fairly straightforward: it returns the enclosing environment. However, if I use parent.env() to walk the chain of enclosing environments, I see something that I cannot explain. First, the code (taken from "R in a Nutshell"):
library( PerformanceAnalytics )
x = environment(chart.RelativePerformance)
while (environmentName(x) != environmentName(emptyenv()))
{
print(environmentName(parent.env(x)))
x <- parent.env(x)
}
And the results:
[1] "imports:PerformanceAnalytics"
[1] "base"
[1] "R_GlobalEnv"
[1] "package:PerformanceAnalytics"
[1] "package:xts"
[1] "package:zoo"
[1] "tools:rstudio"
[1] "package:stats"
[1] "package:graphics"
[1] "package:utils"
[1] "package:datasets"
[1] "package:grDevices"
[1] "package:roxygen2"
[1] "package:digest"
[1] "package:methods"
[1] "Autoloads"
[1] "base"
[1] "R_EmptyEnv"
How can we explain the "base" at the top and the "base" at the bottom? Also, how can we explain "package:PerformanceAnalytics" and "imports:PerformanceAnalytics"? Everything would seem consistent without the first two lines. That is, function chart.RelativePerformance is in the package:PerformanceAnalytics environment which is created by xts, which is created by zoo, ... all the way up (or down) to base and the empty environment.
Also, the documentation is not super clear on this - is the "enclosing environment" the environment in which another environment is created and thus walking parent.env() shows a "creation" chain?
Edit
Shameless plug: I wrote a blog post that explains environments, parent.env(), enclosures, namespace/package, etc. with intuitive diagrams.
1) Regarding how base can appear twice (given that environments form a tree): it's an artifact of the environmentName function. The first occurrence is actually .BaseNamespaceEnv and the latter occurrence is baseenv().
> identical(baseenv(), .BaseNamespaceEnv)
[1] FALSE
2) Regarding imports:PerformanceAnalytics, that is a special environment R sets up to hold the imports mentioned in the package's NAMESPACE and DESCRIPTION files, so that objects in it are found before anything on the normal search path.
Try running this for some clarity; the str(p) call and the if statements that follow will give a better idea of what p is:
library( PerformanceAnalytics )
x <- environment(chart.RelativePerformance)
str(x)
while (environmentName(x) != environmentName(emptyenv())) {
p <- parent.env(x)
cat("------------------------------\n")
str(p)
if (identical(p, .BaseNamespaceEnv)) cat("Same as .BaseNamespaceEnv\n")
if (identical(p, baseenv())) cat("Same as baseenv()\n")
x <- p
}
The first few items in your results show the rules R uses to search for variables used by functions in packages with namespaces. From the Writing R Extensions manual:
The namespace controls the search strategy for variables used by functions in the package.
If not found locally, R searches the package namespace first, then the imports, then the base
namespace and then the normal search path.
Elaborating just a bit, have a look at the first few lines of chart.RelativePerformance:
head(body(chart.RelativePerformance), 5)
# {
# Ra = checkData(Ra)
# Rb = checkData(Rb)
# columns.a = ncol(Ra)
# columns.b = ncol(Rb)
# }
When a call to chart.RelativePerformance is being evaluated, each of those symbols --- whether the checkData on line 1, or the ncol on line 3 --- needs to be found somewhere on the search path. Here are the first few enclosing environments checked:
First off is namespace:PerformanceAnalytics. checkData is found there, but ncol is not.
Next stop (and the first location listed in your results) is imports:PerformanceAnalytics. This is the list of functions specified as imports in the package's NAMESPACE file. ncol is not found here either.
The base namespace (where ncol will be found) is the last stop before proceeding to the normal search path. Almost any R function uses some base functions, so this stop ensures that none of that functionality can be broken by objects in the global environment or in other packages. (R's designers could have left it to package authors to explicitly import the base environment in their NAMESPACE files, but adding this default pass through base does seem like the better design decision.)
The "base" near the top of the list is .BaseNamespaceEnv, while the "base" second from the bottom is baseenv(). They are not identical; in particular, their parents differ: the parent of .BaseNamespaceEnv is .GlobalEnv, while the parent of baseenv() is emptyenv().
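You can check the parent relationships directly in a plain R session:
environmentName(parent.env(.BaseNamespaceEnv))
# [1] "R_GlobalEnv"
environmentName(parent.env(baseenv()))
# [1] "R_EmptyEnv"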
In a package, as @Josh says, R searches the namespace of the package, then the imports, and then the base namespace (i.e., .BaseNamespaceEnv).
You can see this, for example, as follows:
> library(zoo)
> packageDescription("zoo")
Package: zoo
# ... snip ...
Imports: stats, utils, graphics, grDevices, lattice (>= 0.18-1)
# ... snip ...
> x <- environment(zoo)
> x
<environment: namespace:zoo>
> ls(x) # objects in zoo
[1] "-.yearmon" "-.yearqtr" "[.yearmon"
[4] "[.yearqtr" "[.zoo" "[<-.zoo"
# ... snip ...
> y <- parent.env(x)
> y # namespace of imported packages
<environment: 0x116e37468>
attr(,"name")
[1] "imports:zoo"
> ls(y) # objects in the imported packages
[1] "?" "abline"
[3] "acf" "acf2AR"
# ... snip ...

Resources