Cleaning up the workspace "hiding" objects [duplicate] - r

This question already has answers here:
hiding personal functions in R
(7 answers)
Closed 9 years ago.
It's getting to the point where I have about 100 or so personal functions that I use for line by line data analysis. I generally use f.<mnemonic> nomenclature for my functions, but I'm finding that they're starting to get in the way of my work. Is there any way to hide them from the workspace? Such that ls() doesn't show them, but I can still use them?

If you have that many functions which you use on a repeated basis, consider putting them into a package. They can then live in their own namespace, which removes ls() clutter and also allows you to remove the f. prefix.

You can also put the function definitions into a separate environment, and then attach() that environment. (This is similar to Hong Ooi's suggestion, without the added step of making that into a loadable package.) I have this code in my .Rprofile file to set up some utility functions I commonly use:
local(env = my.fns, { # create a new env. all variables created below go into this env.
foo <- function (bar) {
# whatever foo does
}
# put as many function definitions here as you want
})
attach(my.fns)
All the functions inside my.fns are now available at the commandline, but the only thing that shows up in ls() is my.fns itself.

Try this to leave out the "f-dots":
fless <- function() { ls(env=.GlobalEnv)[!grepl("^f\\.", ls(env=.GlobalEnv) )]}
The ls() function looks at objects in an environment. If you only used (as I initially did) :
fless <- function() ls()[!grepl("^f\\.", ls())]
You get ... nothing. Adding .GlobalEnv moves the focus for ls() out to the usual workspace. The indexing is pretty straightforward. You are just removing (with the ! operator) anything that starts with "f." and since the "." is a special character in regex expressions, you need to escape it, ... and since the "\" is also a special character, the escape needs to be doubled.

A couple of options not already mentioned are
objects with names beginning with . are not shown by ls() (by default; you can turn this on with argument all.names = TRUE in the ls() call), so you could rename everything to .f.<mnemonic> in the source files.
In a similar vein to #Aaron's answer but use sys.source() to to source directly into an environment.
An example using sys.source() is shown below:
env <- attach(NULL, name = "myenv")
sys.source(fnames, env)
where fnames is a list of file names/paths from which to read your functions.

Related

Prevent function definition in .Rprofile from being overridden by variable assignment

I'd like to create shorter aliases for some R functions, like j=paste0. However when I add the line j=paste0 to ~/.Rprofile, it is later overridden when I use j as a variable name.
I was able to create my own package by first running package.skeleton() and then running this:
rm anRpackage/R/*
echo 'j=paste0'>anRpackage/R/test.R
echo 'export("j")'>anRpackage/NAMESPACE
rm -r anRpackage/man
R CMD install anRpackage
And then library(anRpackage);j=0;j("a",1) returns "a1". However is there some easier way to prevent the function definition from being overridden by the variable assignment?
The code in .Rprofile will run in the global environment so that's where the variable j will be defined. But if you use j as a variable later in the global environment, it will replace that value. Variables in a given environment must have unique names. But two different environments may have the same name defined and R will use the first version it finds that will work as a function when you attempt to call the variable name as a function. So basically you need to create a separate environment. You can do what with a package as you've done. You could also use attach() to add a new environment to your search path.
attach(list(j=paste0))
This will allow for the behavior you want.
attach(list(j=paste0))
j("a",1)
# [1] "a1"
j <- 5
j("a",1)
# [1] "a1"
Normally I would discourage people from using attach() because it causes confusing change to your workspace but this would do what you want. Just be careful if anyone else will be using your code because creating aliases for built in functions and reusing those as variable names is not a common thing to see in R scripts.
You can see where it was attached by looking at the output of search(). Normally it will be attached in the second position so you can remove it with detach(2)
I ended up putting my custom functions to ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

A note on graphics::curve() in R CMD check

I use the following code in my own package.
graphics::curve( foo (x) )
When I run R CMD check, it said the following note.How do I delete the NOTE?
> checking R code for possible problems ... NOTE
foo: no visible binding for global variable 'x'
Undefined global functions or variables:
x
Edit for the answers:
I try the answer as follows.
function(...){
utils::globalVariables("x")
graphics::curve( sin(x) )
}
But it did not work. So,..., now, I use the following code, instead
function(...){
x <-1 # This is not used but to avoid the NOTE, I use an object "x".
graphics::curve( sin(x) )
}
The last code can remove the NOTE.
Huuum, I guess, the answer is correct, but, I am not sure but it dose not work for me.
Two things:
Add
utils::globalVariables("x")
This can be added in a file of its own (e.g., globals.R), or (my technique) within the file that contains that code.
It is not an error to include the same named variables in multiple files, so the same-file technique will preclude you from accidentally removing it when you remove one (but not another) reference. From the help docs: "Repeated calls in the same package accumulate the names of the global variables".
This must go outside of any function declarations, on its own (top-level). While it is included in the package source (it needs to be, in order to have an effect on the CHECK process), but otherwise has no impact on the package.
Add
importFrom(utils,globalVariables)
to your package NAMESPACE file, since you are now using that function (unless you want another CHECK warning about objects not found in the global environment :-).

hiding personal functions in R

I have a few convenience functions in my .Rprofile, such as this handy function for returning the size of objects in memory. Sometimes I like to clean out my workspace without restarting and I do this with rm(list=ls()) which deletes all my user created objects AND my custom functions. I'd really like to not blow up my custom functions.
One way around this seems to be creating a package with my custom functions so that my functions end up in their own namespace. That's not particularly hard, but is there an easier way to ensure custom functions don't get killed by rm()?
Combine attach and sys.source to source into an environment and attach that environment. Here I have two functions in file my_fun.R:
foo <- function(x) {
mean(x)
}
bar <- function(x) {
sd(x)
}
Before I load these functions, they are obviously not found:
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
Create an environment and source the file into it:
> myEnv <- new.env()
> sys.source("my_fun.R", envir = myEnv)
Still not visible as we haven't attached anything
> foo(1:10)
Error: could not find function "foo"
> bar(1:10)
Error: could not find function "bar"
and when we do so, they are visible, and because we have attached a copy of the environment to the search path the functions survive being rm()-ed:
> attach(myEnv)
> foo(1:10)
[1] 5.5
> bar(1:10)
[1] 3.027650
> rm(list = ls())
> foo(1:10)
[1] 5.5
I still think you would be better off with your own personal package, but the above might suffice in the meantime. Just remember the copy on the search path is just that, a copy. If the functions are fairly stable and you're not editing them then the above might be useful but it is probably more hassle than it is worth if you are developing the functions and modifying them.
A second option is to just name them all .foo rather than foo as ls() will not return objects named like that unless argument all = TRUE is set:
> .foo <- function(x) mean(x)
> ls()
character(0)
> ls(all = TRUE)
[1] ".foo" ".Random.seed"
Here are two ways:
1) Have each of your function names start with a dot., e.g. .f instead of f. ls will not list such functions unless you use ls(all.names = TRUE) therefore they won't be passed to your rm command.
or,
2) Put this in your .Rprofile
attach(list(
f = function(x) x,
g = function(x) x*x
), name = "MyFunctions")
The functions will appear as a component named "MyFunctions" on your search list rather than in your workspace and they will be accessible almost the same as if they were in your workspace. search() will display your search list and ls("MyFunctions") will list the names of the functions you attached. Since they are not in your workspace the rm command you normally use won't remove them. If you do wish to remove them use detach("MyFunctions") .
Gavin's answer is wonderful, and I just upvoted it. Merely for completeness, let me toss in another one:
R> q("no")
followed by
M-x R
to create a new session---which re-reads the .Rprofile. Easy, fast, and cheap.
Other than that, private packages are the way in my book.
Another alternative: keep the functions in a separate file which is sourced within .RProfile. You can re-source the contents directly from within R at your leisure.
I find that often my R environment gets cluttered with various objects when I'm creating or debugging a function. I wanted a way to efficiently keep the environment free of these objects while retaining personal functions.
The simple function below was my solution. It does 2 things:
1) deletes all non-function objects that do not begin with a capital letter and then
2) saves the environment as an RData file
(requires the R.oo package)
cleanup=function(filename="C:/mymainR.RData"){
library(R.oo)
# create a dataframe listing all personal objects
everything=ll(envir=1)
#get the objects that are not functions
nonfunction=as.vector(everything[everything$data.class!="function",1])
#nonfunction objects that do not begin with a capital letter should be deleted
trash=nonfunction[grep('[[:lower:]]{1}',nonfunction)]
remove(list=trash,pos=1)
#save the R environment
save.image(filename)
print(paste("New, CLEAN R environment saved in",filename))
}
In order to use this function 3 rules must always be kept:
1) Keep all data external to R.
2) Use names that begin with a capital letter for non-function objects that I want to keep permanently available.
3) Obsolete functions must be removed manually with rm.
Obviously this isn't a general solution for everyone...and potentially disastrous if you don't live by rules #1 and #2. But it does have numerous advantages: a) fear of my data getting nuked by cleanup() keeps me disciplined about using R exclusively as a processor and not a database, b) my main R environment is so small I can backup as an email attachment, c) new functions are automatically saved (I don't have to manually manage a list of personal functions) and d) all modifications to preexisting functions are retained. Of course the best advantage is the most obvious one...I don't have to spend time doing ls() and reviewing objects to decide whether they should be rm'd.
Even if you don't care for the specifics of my system, the "ll" function in R.oo is very useful for this kind of thing. It can be used to implement just about any set of clean up rules that fit your personal programming style.
Patrick Mohr
A nth, quick and dirty option, would be to use lsf.str() when using rm(), to get all the functions in the current workspace. ...and let you name the functions as you wish.
pattern <- paste0('*',lsf.str(), '$', collapse = "|")
rm(list = ls()[-grep(pattern, ls())])
I agree, it may not be the best practice, but it gets the job done! (and I have to selectively clean after myself anyway...)
Similar to Gavin's answer, the following loads a file of functions but without leaving an extra environment object around:
if('my_namespace' %in% search()) detach('my_namespace'); source('my_functions.R', attach(NULL, name='my_namespace'))
This removes the old version of the namespace if it was attached (useful for development), then attaches an empty new environment called my_namespace and sources my_functions.R into it. If you don't remove the old version you will build up multiple attached environments of the same name.
Should you wish to see which functions have been loaded, look at the output for
ls('my_namespace')
To unload, use
detach('my_namespace')
These attached functions, like a package, will not be deleted by rm(list=ls()).

Protecting function names in R

Is it possible in R to protect function names (or variables in general) so that they cannot be masked.
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is, here "rm(new)" did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
#hdallazuanna recommends (via Twitter)
new <- 1
lockBinding('new', globalenv())
this makes sense when the variable is user created but does not, of course, prevent overwriting a function from a package.
I had the reverse problem from the OP, and I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function, but I ended up putting my functions to ~/.R.R and I added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

R: disentangling scopes

My question is about avoiding namespace pollution when writing modules in R.
Right now, in my R project, I have functions1.R with doFoo() and doBar(), functions2.R with other functions, and main.R with the main program in it, which first does source('functions1.R'); source('functions2.R'), and then calls the other functions.
I've been starting the program from the R GUI in Mac OS X, with source('main.R'). This is fine the first time, but after that, the variables that were defined the first time through the program are defined for the second time functions*.R are sourced, and so the functions get a whole bunch of extra variables defined.
I don't want that! I want an "undefined variable" error when my function uses a variable it shouldn't! Twice this has given me very late nights of debugging!
So how do other people deal with this sort of problem? Is there something like source(), but that makes an independent namespace that doesn't fall through to the main one? Making a package seems like one solution, but it seems like a big pain in the butt compared to e.g. Python, where a source file is automatically a separate namespace.
Any tips? Thank you!
I would explore two possible solutions to this.
a) Think more in a more functional manner. Don't create any variables outside of a function. so, for example, main.R should contain one function main(), which sources in the other files, and does the work. when main returns, none of the clutter will remain.
b) Clean things up manually:
#main.R
prior_variables <- ls()
source('functions1.R')
source('functions2.R')
#stuff happens
rm(list = setdiff(ls(),prior_variables))`
The main function you want to use is sys.source(), which will load your functions/variables in a namespace ("environment" in R) other than the global one. One other thing you can do in R that is fantastic is to attach namespaces to your search() path so that you need not reference the namespace directly. That is, if "namespace1" is on your search path, a function within it, say "fun1", need not be called as namespace1.fun1() as in Python, but as fun1(). [Method resolution order:] If there are many functions with the same name, the one in the environment that appears first in the search() list will be called. To call a function in a particular namespace explicitly, one of many possible syntaxes - albeit a bit ugly - is get("fun1","namespace1")(...) where ... are the arguments to fun1(). This should also work with variables, using the syntax get("var1","namespace1"). I do this all the time (I usually load just functions, but the distinction between functions and variables in R is small) so I've written a few convenience functions that loads from my ~/.Rprofile.
name.to.env <- function(env.name)
## returns named environment on search() path
pos.to.env(grep(env.name,search()))
attach.env <- function(env.name)
## creates and attaches environment to search path if it doesn't already exist
if( all(regexpr(env.name,search())<0) ) attach(NULL,name=env.name,pos=2)
populate.env <- function(env.name,path,...) {
## populates environment with functions in file or directory
## creates and attaches named environment to search() path
## if it doesn't already exist
attach.env(env.name)
if( file.info(path[1])$isdir )
lapply(list.files(path,full.names=TRUE,...),
sys.source,name.to.env(env.name)) else
lapply(path,sys.source,name.to.env(env.name))
invisible()
}
Example usage:
populate.env("fun1","pathtofile/functions1.R")
populate.env("fun2","pathtofile/functions2.R")
and so on, which will create two separate namespaces: "fun1" and "fun2", which are attached to the search() path ("fun2" will be higher on the search() list in this case). This is akin to doing something like
attach(NULL,name="fun1")
sys.source("pathtofile/functions1.R",pos.to.env(2))
manually for each file ("2" is the default position on the search() path). The way that populate.env() is written, if a directory, say "functions/", contains many R files without conflicting function names, you can call it as
populate.env("myfunctions","functions/")
to load all functions (and variables) into a single namespace. With name.to.env(), you can also do something like
with(name.to.env("fun1"), doStuff(var1))
or
evalq(doStuff(var1), name.to.env("fun1"))
Of course, if your project grows big and you have lots and lots of functions (and variables), writing a package is the way to go.
If you switch to using packages, you get namespaces as a side-benefit (provided you use a NAMESPACE file). There are other advantages for using packages.
If you were really trying to avoid packages (which you shouldn't), then you could try assigning your variables in specific environments.
Well avoiding namespace pollution, as you put it, is just a matter of diligently partitioning the namespace and keeping your global namespace uncluttered.
Here are the essential functions for those two kinds of tasks:
Understanding/Navigating the Namespace Structure
At start-up, R creates a new environment to store all objects created during that session--this is the "global environment".
# to get the name of that environment:
globalenv()
But this isn't the root environment. The root is an environment called "the empty environment"--all environments chain back to it:
emptyenv()
returns: <environment: R_EmptyEnv>
# to view all of the chained parent environments (which includes '.GlobalEnv'):
search()
Creating New Environments:
workspace1 = new.env()
is.environment(workspace1)
returns: [1] TRUE
class(workspace1)
returns: [1] "environment"
# add an object to this new environment:
with(workspace1, attach(what="/Users/doug/Documents/test_obj.RData",
name=deparse(substitute(what)), warn.conflicts=T, pos=2))
# verify that it's there:
exists("test_obj", where=workspace1)
returns: [1] TRUE
# to locate the new environment (if it's not visible from your current environment)
parent.env(workspace1)
returns: <environment: R_GlobalEnv>
objects(".GlobalEnv")
returns: [1] "test_obj"
Coming from python, et al., this system (at first) seemed to me like a room full of carnival mirrors. The R Gurus on the other hand seem to be quite comfortable with it. I'm sure there are a number of reasons why, but my intuition is that they don't let environments persist. I notice that R beginners use 'attach', as in attach('this_dataframe'); I've noticed that experienced R users don't do that; they use 'with' instead eg,
with(this_dataframe, tapply(etc....))
(I suppose they would achieve the same thing if they used 'attach' then 'detach' but 'with' is faster and you don't have to remember the second step.) In other words, namespace collisions are avoided in part by limiting the objects visible from the global namespace.

Resources