how to add functions in an existing environment - r

Is it possible to use env() as a substitute for namespaces, and how do you check if an environment exists already before adding functions to it?
This is related to this question, and Brendan's suggestion
How to organize large R programs?
I understand Dirk's point in that question, however for development it is sometimes impractical to put functions in packages.
EDIT: The idea is to mimic namespaces across files, and hence to be able to load different files independently. If a file has been previously loaded then the environment doesn't need to be created, just added to.
Thanks for ideas
EDIT: So presumably this code below would be the equivalent of namespaces in other languages:-
# how to use environment as namespaces
# file 1
# equivalent of 'namespace e'
if (!(exists("e") && is.environment(e))) { e <- new.env(parent=baseenv()) }
e$f1 <- function(x) {1}
# file 2
# equivalent of 'namespace e'
if (!(exists("e") && is.environment(e))) { e <- new.env(parent=baseenv()) }
e$f2 <- function(x) {2}

Yes you can for the most part. Each function has an environment and that's where it looks for other functions and global variables. By using your own environment you have full control over that.
Typically functions are also assigned to an environment (by assigning them to a name), and typically those two environments are the same - but not always. In a package, the namespace environment is used for both, but then the (different) package environment on the search path also has the same (exported) functions defined. So there the environments are different.
# this will ensure only stats and packages later on the search list are searched for
# functions from your code (similar to import in a package)
e <- new.env(parent=as.environment("package:stats"))
# simple alternative if you want access to everything
# e <- new.env(parent=globalenv())
# Make all functions in "myfile.R" have e as environment
source("myfile.R", local=e)
# Or change an existing function to have a new environment:
e$myfunc <- function(x) sin(x)
environment(e$myfunc) <- e
# Alternative one-liner:
e$myfunc <- local(function(x) sin(x), e)
# Attach it if you want to be able to call them as usual.
# Note that this creates a new environment "myenv".
attach(e, name="myenv")
# remove all temp objects
rm(list=ls())
# and try your new function:
myfunc(1:3)
# Detach when it's time to clean up or reattach an updated version...
detach("myenv")
In the example above, e corresponds to a namespace and the attached "myenv" corresponds to a package environment (like "package:stats" on the search path).
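To make the difference between the environment a function is bound in and the environment it encloses more concrete, here is a minimal sketch. The names e and myfunc follow the example above; the helper variables (created_in, bound_only, now_encloses_e, result) are only illustrative.

```r
# A fresh environment whose parent is the stats package environment
e <- new.env(parent = as.environment("package:stats"))

# Bind a function in e. It is created here, so it still encloses the
# environment in which this code runs, not e.
e$myfunc <- function(x) sin(x)
created_in <- environment(e$myfunc)
bound_only <- !identical(created_in, e)

# Re-point the enclosing environment so free variables resolve via e
environment(e$myfunc) <- e
now_encloses_e <- identical(environment(e$myfunc), e)

# sin() is found by walking e -> package:stats -> ... -> base
result <- e$myfunc(0)
```

Binding decides where the name can be found; the enclosing environment decides where the function body looks up its free variables.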

Namespaces are environments, so you can use exactly the same mechanism. Since R uses lexical scoping, the parent of the environment defines what the function will see (i.e. how free variables are bound). And, exactly as with a namespace, you can attach environments and look them up.
So to create a new "manual namespace" you can use something like
e <- new.env(parent=baseenv())
# use local(), sys.source(), source() or e$foo <- assignment to populate it, e.g.
local({
f <- function() { ... }
#...
}, e)
attach(e, name = "mySuperNamespace")
Now it is loaded and attached just like a namespace, so you can use f just as if it were in a namespace. Namespaces use one more parent environment to resolve imports; you can do that too if you care. If you need to check for your cool environment, just check the search path, e.g. "mySuperNamespace" %in% search(). If you need the actual environment, use as.environment("mySuperNamespace").
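The extra parent environment for imports could be sketched roughly as follows. This is only an illustration of the idea, not how R builds real namespaces internally; imports_env, ns and spread are hypothetical names, and stats::sd is just an arbitrary function chosen as the "import".

```r
# Imports environment: holds only what we explicitly import
imports_env <- new.env(parent = baseenv())
imports_env$sd <- stats::sd   # hypothetical import

# The manual namespace sits on top of the imports environment
ns <- new.env(parent = imports_env)

# Functions defined inside ns resolve free variables via
# ns -> imports_env -> base
local({
  spread <- function(x) sd(x)
}, ns)

value <- ns$spread(c(1, 2, 3))
```

Here sd is not defined in ns itself; it is found one level up in the imports environment, which is the role imports play for a package namespace.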

You can check that environments exist in the same way that you would any other variable.
e <- new.env()
exists("e") && is.environment(e)
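Building on that check, a small "get or create" helper would let several files add to the same environment regardless of load order. This is a sketch only; use_namespace and scratch are hypothetical names, and the demo uses a scratch environment instead of the global one so that nothing leaks into the workspace.

```r
# Hypothetical helper: return the environment called `name` in `where`,
# creating it first if it does not exist yet
use_namespace <- function(name, where = globalenv()) {
  found <- exists(name, envir = where, inherits = FALSE) &&
    is.environment(get(name, envir = where, inherits = FALSE))
  if (!found) {
    assign(name, new.env(parent = baseenv()), envir = where)
  }
  get(name, envir = where, inherits = FALSE)
}

scratch <- new.env()

# "file 1"
ns <- use_namespace("e", scratch)
ns$f1 <- function(x) 1

# "file 2", loaded later: the existing environment is reused
ns <- use_namespace("e", scratch)
ns$f2 <- function(x) 2

loaded <- sort(ls(scratch$e))
```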

Related

Passing values between functions in an R package

In an R package, let's say we have two functions. One is setting some parameters; the other one is using those parameters. How can I build such a pattern in R. It is similar to event-driven applications. But I am not sure if it is possible in R or not.
For example:
If we run set_param(a = 10), then every subsequent run of print_a.R prints 10; after running set_param(a = 20), it prints 20.
I need a solution without assigning value to the global environment because CRAN checks raise notes.
I suggest adding a variable to your package, as @MrFlick suggested.
For instance, in ./R/myoptions.R:
.myoptions <- new.env(parent = emptyenv())
getter <- function(k) {
.myoptions[[k]]
}
setter <- function(k, v) {
.myoptions[[k]] <- v
}
lister <- function() {
names(.myoptions)
}
Then other package functions can use this as a key/value store:
getter("optA")
# NULL
setter("optA", 99)
getter("optA")
# [1] 99
lister()
# [1] "optA"
and all the while, nothing is in the .GlobalEnv:
ls(all.names = TRUE)
# character(0)
Values can be as complex as you want.
Note that these are not exported, so if you want/need the user to have direct access to this, then you'll need to update NAMESPACE or, if using roxygen2, add #' @export before each function definition.
NB: I should add that a more canonical approach might be to use options(.) for these, so that users can preemptively control them and access them programmatically.
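For comparison, the options(.) route mentioned above might look like the sketch below. The function names set_param and print_a come from the question; the option name "mypkg.a" is an assumption, and a real package would pick its own prefix.

```r
# Store the parameter as a session option (hypothetical name "mypkg.a")
set_param <- function(a) {
  options(mypkg.a = a)
  invisible(a)
}

# Read it back; NA is returned if the option was never set
print_a <- function() {
  a <- getOption("mypkg.a", default = NA)
  print(a)
  invisible(a)
}

set_param(a = 10)
first <- print_a()
set_param(a = 20)
second <- print_a()
```

Nothing is assigned in the global environment, and users can also set the option themselves before calling any package function.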

R Package Development - Global Variables vs Parameter [duplicate]

I'm developing a package in R. I have a bunch of functions, some of them need some global variables. How do I manage global variables in packages?
I've read something about environments, but I do not understand how it would work, or if this is even the way to go about things.
You can use package-local variables through an environment. These variables will be available to multiple functions in the package, but not (easily) accessible to the user, and they will not interfere with the user's workspace. A quick and simple example is:
pkg.env <- new.env()
pkg.env$cur.val <- 0
pkg.env$times.changed <- 0
inc <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val + by
pkg.env$cur.val
}
dec <- function(by=1) {
pkg.env$times.changed <- pkg.env$times.changed + 1
pkg.env$cur.val <- pkg.env$cur.val - by
pkg.env$cur.val
}
cur <- function(){
cat('the current value is', pkg.env$cur.val, 'and it has been changed',
pkg.env$times.changed, 'times\n')
}
inc()
inc()
inc(5)
dec()
dec(2)
inc()
cur()
You could set an option, eg
options("mypkg-myval"=3)
1+getOption("mypkg-myval")
[1] 4
In general global variables are evil. The underlying principle why they are evil is that you want to minimize the interconnections in your package. These interconnections often cause functions to have side effects, i.e. the outcome depends not only on the input arguments but also on the value of some global variable. Especially when the number of functions grows, this can be hard to get right and hell to debug.
For global variables in R see this SO post.
Edit in response to your comment:
An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:
token_information = list(token1 = "087091287129387",
token2 = "UA2329723")
and require all functions that need this information to have it as an argument:
do_stuff = function(arg1, arg2, token) {
# ... use token$token1 and token$token2 here ...
}
do_stuff(arg1, arg2, token = token_information)
In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:
token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
I hope this makes things more clear.
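A runnable version of this pattern could look like the following sketch. The body of do_stuff and the implementation of create_token are assumptions made purely for illustration; only the function names and token values come from the text above.

```r
# Hypothetical constructor bundling the two tokens into one object
create_token <- function(token1, token2) {
  list(token1 = token1, token2 = token2)
}

# The function receives everything it needs as arguments,
# so its result depends only on its inputs
do_stuff <- function(arg1, arg2, token) {
  sprintf("%s-%s using %s", arg1, arg2, token$token1)
}

token_info <- create_token("087091287129387", "UA2329723")
out <- do_stuff("a", "b", token = token_info)
```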
The question is unclear:
Just one R process or several?
Just on one host, or across several machines?
Is there common file access among them or not?
In increasing order of complexity, I'd use a file, a SQLite backend via the RSQLite package or (my favourite :) the rredis package to set to / read from a Redis instance.
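The simplest of these, a file, could be sketched with saveRDS() and readRDS(). The path and the helper names write_tokens and read_tokens are hypothetical; tempdir() is used here only so the demo cleans up after itself, whereas processes that really need to share state would need a path they can all reach.

```r
# Hypothetical location for the shared state
state_file <- file.path(tempdir(), "shared_tokens.rds")

write_tokens <- function(tokens, path = state_file) {
  saveRDS(tokens, path)
  invisible(path)
}

# Returns NULL when nothing has been written yet
read_tokens <- function(path = state_file) {
  if (!file.exists(path)) return(NULL)
  readRDS(path)
}

write_tokens(list(token1 = "abc"))
tokens <- read_tokens()
unlink(state_file)
after_cleanup <- read_tokens()
```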
You could also create a list of tokens and add it to R/sysdata.rda with usethis::use_data(..., internal = TRUE). The data in this file is internal, but accessible by all functions. The only problem would arise if you only want some functions to access the tokens, which would be better served by:
the environment solution already proposed above; or
creating a hidden helper function that holds the tokens and returns them. Then just call this hidden function inside the functions that use the tokens, and (assuming it is a list) you can inject them to their environment with list2env(..., envir = environment()).
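The hidden-helper variant might look roughly like this. The names .get_tokens, uses_tokens and no_tokens, as well as the placeholder token values, are assumptions for illustration.

```r
# Hidden helper (would not be exported); values are placeholders
.get_tokens <- function() {
  list(token1 = "placeholder-1", token2 = "placeholder-2")
}

uses_tokens <- function() {
  # Inject token1 and token2 as local variables of this call only
  list2env(.get_tokens(), envir = environment())
  paste(token1, token2)
}

# A function that does not call the helper never sees the tokens
no_tokens <- function() {
  exists("token1", envir = environment(), inherits = FALSE)
}

combined <- uses_tokens()
leaked <- no_tokens()
```

Because list2env() writes into the calling function's own frame, the tokens disappear again as soon as that call returns.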
If you don't mind adding a dependency to your package, you can use an R6 object from the package of the same name, as suggested in the comments to @greg-snow's answer.
R6 objects are actual environments that can carry public and private methods. They are very lightweight and could be a good, more rigorous option for sharing a package's global variables without polluting the global environment.
Compared to @greg-snow's solution, it allows for stricter control of your variables (you can add methods that check for types, for example). The drawback can be the dependency and, of course, learning the R6 syntax.
library(R6)
MyPkgOptions = R6::R6Class(
"mypkg_options",
public = list(
get_option = function(x) private$.options[[x]]
),
active = list(
var1 = function(x){
if(missing(x)) private$.options[['var1']]
else stop("This is an environment parameter that cannot be changed")
}
,var2 = function(x){
if(missing(x)) private$.options[['var2']]
else stop("This is an environment parameter that cannot be changed")
}
),
private = list(
.options = list(
var1 = 1,
var2 = 2
)
)
)
# Create an instance
mypkg_options = MyPkgOptions$new()
# Fetch values from active fields
mypkg_options$var1
#> [1] 1
mypkg_options$var2
#> [1] 2
# Alternative way
mypkg_options$get_option("var1")
#> [1] 1
mypkg_options$get_option("var3")
#> NULL
# Variables are locked unless you add a method to change them
mypkg_options$var1 = 3
#> Error in (function (x) : This is an environment parameter that cannot be changed
Created on 2020-05-27 by the reprex package (v0.3.0)

Keep user-defined functions in global environment, during removal of objects

Question: How can I control the deletion (and saving) of user-defined functions?
What I have tried so far:
I was advised to add a dot [.] at the beginning of every function name, and told that such functions would not be deleted. When tested, the functions are deleted despite starting with a dot.
Requirements:
All non-function objects should be handled by [rm].
Because of automation, the procedure must be triggerable from base R in a terminal. It is not enough that the solution works only in RStudio.
The global environment is to be used, to keep the solution standardized.
If possible, one should be able to define which functions to keep/delete.
Expected outcome:
None of the functions in the example should be deleted.
Below you find the example code:
# Create 3 object variables.
a <- 1
b <- 2
c <- 3
# Create 3 functions.
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from global.env.
# Keep the ones specified below.
rm(list = ls()[! ls() %in% c(
"a",
"c"
)
]
)
You can use ls.str to specify a mode of object to find. With this you can exclude functions from the rm list.
rm(list=setdiff(ls(),ls.str(mode="function")))
ls()
[1] "myFunction1" "myFunction2" "myFunction3"
However, you might be better off formalising your functions in a package and then you would not need to worry about deleting them with rm.
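If you also want to name specific objects to keep, the same idea can be wrapped in a small helper. This is a sketch under assumptions: rm_except_functions and its arguments are hypothetical names, and it tests each object with is.function() instead of relying on ls.str(). The demo runs on a scratch environment; pass envir = globalenv() to apply it to the workspace.

```r
# Hypothetical helper: remove everything from `envir` except functions
# and any objects named in `keep`
rm_except_functions <- function(keep = character(), envir = globalenv()) {
  all_names <- ls(envir = envir)
  is_fun <- vapply(
    all_names,
    function(n) is.function(get(n, envir = envir, inherits = FALSE)),
    logical(1)
  )
  to_remove <- setdiff(all_names[!is_fun], keep)
  rm(list = to_remove, envir = envir)
  invisible(to_remove)
}

# Demo on a scratch environment rather than the real workspace
demo <- new.env()
demo$a <- 1
demo$b <- 2
demo$myFunction1 <- function() {}
removed <- rm_except_functions(keep = "a", envir = demo)
left <- ls(demo)
```

Functions you do want to delete can still be removed explicitly with rm() afterwards.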
I strongly recommend a different approach. Don’t partially remove objects, use proper scope instead. That is, don’t define objects in the global environment that don’t need to be defined there, define them inside functions or local scopes instead.
Going one step further, your functions.r file also shouldn’t define functions in the global environment. Instead, as suggested in a comment, it should define them inside a dedicated environment which you may attach, if convenient. This is in fact what R packages solve. If you feel that R packages are too heavy for your purpose, I suggest you write modules using my ‘box’ package: it cleanly implements file-based code modules.
If you use scoping as it was designed, there’s no need to call rm on temporary variables, and hence your problem won’t arise.
If you really want a clean slate, restart R and re-execute your script: this is the only way to consistently reset the state of the R session; all other ways are error-prone hacks because they only perform a partial cleanup.
A note on what you wrote:
When tested, the functions are deleted despite starting with a dot.
They’re not — they’re just invisible; that’s what the leading dot does. However, this recommendation also strikes me as bad practice: it’s an unnecessary hack.
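A quick way to convince yourself of this is the sketch below. It uses a scratch environment (a hypothetical name) so that it does not touch your workspace; the same behaviour applies to the global environment.

```r
scratch <- new.env()
scratch$.helper <- function() "still here"
scratch$tmp <- 42

# ls() does not list names that start with a dot ...
visible <- ls(scratch)

# ... so the usual rm(list = ls()) idiom never sees them
rm(list = ls(scratch), envir = scratch)
remaining <- ls(scratch, all.names = TRUE)
still_works <- scratch$.helper()
```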
Easy. Don't use the global environment.
myenv <- new.env()
with(myenv,
{
# Create 3 object variables.
a <- 1
b <- 2
c <- 3
}
)
myenv$a
#[1] 1
# Create 3 functions.
myFunction1 <- function() {}
myFunction2 <- function() {}
myFunction3 <- function() {}
# Remove all from env.
# Keep the ones specified below.
rm(list = ls(envir = myenv)[! ls(envir = myenv) %in% c(
"a",
"c"
)
], envir = myenv
)
ls(envir = myenv)
#[1] "a" "c"

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.
1: For examining the contents of an R data file without loading it: This question is similar, but different from, the one asked at listing contents of an R data file without loading
In that case, the solution offered was:
attach(filename)
ls(pos = 2)
detach()
If there are naming conflicts between objects in the file and those in the global environment, this warning appears:
The following object(s) are masked _by_ '.GlobalEnv':
I tried creating a new environment, but I cannot seem to attach into that.
For instance, this produces the same error:
lsfile <- function(filename){
tmpEnv <- new.env()
evalq(attach(filename), envir = tmpEnv)
tmpls <- ls(pos = 2)
detach()
return(tmpls)
}
lsfile(filename)
Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?
2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?
For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?
I currently solve this problem by loading everything into a temporary environment, and then copy out what's needed, but this is inefficient.
Since this question has just been referenced let's clarify two things:
attach() simply calls load() so there is really no point in using it instead of load
if you want selective access to prevent masking it's much easier to simply load the file into a new environment:
e = local({load("foo.RData"); environment()})
You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
FWIW .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert it to the lazy-load format instead which simply adds an index so each object can be loaded separately (see Get specific object from Rdata file)
I just use an env= argument to load():
> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", env=ne)
> ls(env=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"
>
The cost of this approach is that you do read the whole RData file---but on the other hand that seems to be unavoidable anyway as no other method seems to offer a list of the 'content' of such a file.
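Wrapping this in a helper that copies out only the requested objects could look like the sketch below. load_selected and its arguments are hypothetical names; as noted above, the whole file is still read, so this avoids name clashes but not the loading cost.

```r
# Hypothetical helper: copy only `names` from an .RData file into `envir`
# and return the names of everything the file contains
load_selected <- function(file, names, envir = parent.frame()) {
  tmp <- new.env(parent = emptyenv())
  load(file, envir = tmp)   # the whole file is still read
  absent <- setdiff(names, ls(tmp, all.names = TRUE))
  if (length(absent) > 0) {
    stop("not found in file: ", paste(absent, collapse = ", "))
  }
  for (n in names) {
    assign(n, get(n, envir = tmp, inherits = FALSE), envir = envir)
  }
  invisible(ls(tmp, all.names = TRUE))
}

# Demo with a temporary file and a scratch target environment
rdata <- tempfile(fileext = ".RData")
x <- 1; y <- 2; z <- "foo"
save(x, y, z, file = rdata)

target <- new.env()
contents <- load_selected(rdata, "z", envir = target)
unlink(rdata)
```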
You can suppress the warning by setting warn.conflicts=FALSE on the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.
x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1] 1 2 3 4 5 6 7 8 9 10
Thanks to @Dirk and @Joshua.
I had an epiphany. The command/package foreach with SMP or MC seems to produce environments that only inherit, but do not seem to conflict with, the global environment.
lsfile <- function(list_files){
aggregate_ls = foreach(ix = 1:length(list_files)) %dopar% {
attach(list_files[ix])
tmpls <- ls(pos = 2)
return(tmpls)
}
return(aggregate_ls)
}
lsfile("f1.rdat")
lsfile(dir(pattern = "*rdat"))
This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without suppressing the warnings.
So, question #1 can be resolved by either ignoring the warnings (as @Joshua suggested) or by using whatever magic foreach summons.
For part 2, loading an object, I think @Joshua has the right idea - "get" will do.
The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.
@Joshua's answer is far simpler than my foreach detour.

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as this handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting and I do this with rm(list=ls()) which deletes all my user created objects AND my custom functions. I'd really like to not blow up my custom functions.
One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:
foo <- function(x) {
mean(x)
}
bar <- function(x) {
sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
Still not visible as we haven't attached anything
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible, and because we have attached a copy of the environment to the search path the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them then the above might be useful but it is probably more hassle than it is worth if you are developing the functions and modifying them.
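The "it is only a copy" caveat can be seen in a short sketch. The search-path name "demo_env" and the toy function are assumptions for illustration.

```r
myEnv <- new.env()
myEnv$foo <- function() "old"
attach(myEnv, name = "demo_env", warn.conflicts = FALSE)

# Editing the original environment afterwards ...
myEnv$foo <- function() "new"

# ... does not change the copy on the search path
from_search_path <- get("foo", pos = "demo_env")()
from_original <- myEnv$foo()

detach("demo_env")
```

To pick up an edited function you have to detach and attach again, which is why this approach is less convenient while the functions are still changing.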
A second option is to just name them all .foo rather than foo as ls() will not return objects named like that unless argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot, e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE), therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
f = function(x) x,
g = function(x) x*x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace and they will be accessible almost the same as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace the rm command you normally use won't remove them. If you do wish to remove them use detach("MyFunctions") .
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R
to create a new session---which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced within .Rprofile. You can re-source the contents directly from within R at your leisure.
I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does 2 things:
1) deletes all non-function objects that do not begin with a capital letter and then
2) saves the environment as an RData file
(requires the R.oo package)
cleanup=function(filename="C:/mymainR.RData"){
library(R.oo)
# create a dataframe listing all personal objects
everything=ll(envir=1)
#get the objects that are not functions
nonfunction=as.vector(everything[everything$data.class!="function",1])
#nonfunction objects that do not begin with a capital letter should be deleted
trash=nonfunction[!grepl('^[[:upper:]]',nonfunction)]
remove(list=trash,pos=1)
#save the R environment
save.image(filename)
print(paste("New, CLEAN R environment saved in",filename))
}
In order to use this function 3 rules must always be kept:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Obsolete functions must be removed manually with rm.
Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can backup as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
An nth, quick and dirty option would be to use lsf.str() when calling rm(), to get all the functions in the current workspace and exclude them from removal. This lets you name the functions as you wish.
pattern <- paste0('^', lsf.str(), '$', collapse = "|")
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if('my_namespace' %in% search()) detach('my_namespace'); source('my_functions.R', attach(NULL, name='my_namespace'))
This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output for
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).
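For repeated use during development, the one-liner can be wrapped in a function. This is a sketch: reload_functions and the default name are hypothetical, and it uses sys.source() instead of source() to evaluate the file in the freshly attached environment. The demo writes a throwaway file so that it is self-contained.

```r
# Hypothetical helper: (re)attach an environment called `name`
# and fill it from `file`
reload_functions <- function(file, name = "my_namespace") {
  # Drop a previously attached version so copies do not pile up
  if (name %in% search()) detach(name, character.only = TRUE)
  env <- attach(NULL, name = name)   # attach() returns the new environment
  sys.source(file, envir = env)
  invisible(env)
}

# Demo with a throwaway file of functions
fun_file <- tempfile(fileext = ".R")
writeLines("square <- function(x) x * x", fun_file)

reload_functions(fun_file, "demo_namespace")
result <- square(4)

# Reloading replaces the attached environment instead of adding another
reload_functions(fun_file, "demo_namespace")
n_attached <- sum(search() == "demo_namespace")

detach("demo_namespace")
unlink(fun_file)
```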

Resources