Prevent function definition in .Rprofile from being overridden by variable assignment

I'd like to create shorter aliases for some R functions, like j=paste0. However, when I add the line j=paste0 to ~/.Rprofile, it is later overridden when I use j as a variable name.
I was able to create my own package by first running package.skeleton() and then running this:
rm anRpackage/R/*
echo 'j=paste0'>anRpackage/R/test.R
echo 'export("j")'>anRpackage/NAMESPACE
rm -r anRpackage/man
R CMD INSTALL anRpackage
And then library(anRpackage);j=0;j("a",1) returns "a1". However, is there some easier way to prevent the function definition from being overridden by the variable assignment?

The code in .Rprofile runs in the global environment, so that's where the variable j will be defined. If you later use j as a variable name in the global environment, it will replace that value, because names within a single environment must be unique. However, two different environments may each define the same name, and when you call that name as a function, R searches the environments in order and uses the first binding that is actually a function. So basically you need to create a separate environment. You can do that with a package, as you've done. You could also use attach() to add a new environment to your search path.
attach(list(j=paste0))
This will allow for the behavior you want.
attach(list(j=paste0))
j("a",1)
# [1] "a1"
j <- 5
j("a",1)
# [1] "a1"
Normally I would discourage people from using attach() because it causes confusing changes to your workspace, but this will do what you want. Just be careful if anyone else will be using your code, because creating aliases for built-in functions and then reusing those names as variable names is not a common thing to see in R scripts.
You can see where the environment was attached by looking at the output of search(). Normally it will be attached in the second position, so you can remove it with detach(2).
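For example, a minimal round trip (attaching under an explicit name is an optional refinement; "aliases" is just a label I picked):
attach(list(j = paste0), name = "aliases")
search()
# ".GlobalEnv" "aliases" "package:stats" ...  ("aliases" sits in position 2)
j("a", 1)
# [1] "a1"
detach("aliases")  # equivalent to detach(2) while it occupies position 2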

I ended up putting my custom functions in ~/.R.R and added these lines to .Rprofile:
if("myfuns"%in%search())detach("myfuns")
source("~/.R.R",attach(NULL,name="myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

Related

A note on graphics::curve() in R CMD check

I use the following code in my own package.
graphics::curve( foo(x) )
When I run R CMD check, it reports the following NOTE. How do I get rid of the NOTE?
> checking R code for possible problems ... NOTE
foo: no visible binding for global variable 'x'
Undefined global functions or variables:
x
Edit for the answers:
I tried the answer as follows.
function(...){
  utils::globalVariables("x")
  graphics::curve( sin(x) )
}
But it did not work. So, for now, I use the following code instead:
function(...){
  x <- 1  # not actually used; defining an object "x" merely avoids the NOTE
  graphics::curve( sin(x) )
}
The last version removes the NOTE. Hmm, I guess the answer is correct, but it does not work for me.
Two things:
Add
utils::globalVariables("x")
This can be added in a file of its own (e.g., globals.R), or (my technique) within the file that contains that code.
It is not an error to include the same named variables in multiple files, so the same-file technique will preclude you from accidentally removing it when you remove one (but not another) reference. From the help docs: "Repeated calls in the same package accumulate the names of the global variables".
This must go outside of any function declarations, on its own (top-level). It is included in the package source (it needs to be, in order to have an effect on the check process) but otherwise has no impact on the package.
Add
importFrom(utils,globalVariables)
to your package NAMESPACE file, since you are now using that function (unless you want another CHECK warning about objects not found in the global environment :-).
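Putting the two pieces together, a minimal sketch of the fix (the sin(x) body comes from the question; the file name globals.R is merely a convention):
# R/globals.R -- top level, outside any function definition
utils::globalVariables("x")

foo <- function(...) {
  graphics::curve(sin(x))  # 'x' no longer triggers the NOTE
}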

Externally set default argument of a function from a package

I am building a package with functions that have default arguments. I would like to find a clean way to set the values of the default arguments once the function has been imported.
My first attempt was to set the default values to unknown objects (within the package). Then, when I load the package, I would have an external script that assigns a value to the unknown objects.
But it does not seem very clean since I am compiling a function with an unknown object.
My issue is that I will need other people to use the functions, and since they have many arguments I want to keep the code as concise as possible. And many arguments can be set by a configuration script before running the program.
So, for instance, I define in my package:
function_try <- function(x = VAL){
  return(x)
}
I compile the package and load it, and then I have an external script that does this (or reads it from a config file):
VAL <- "hello"
And then a user of the function can just run
function_try()
I would use options for that. So your function looks like:
function_try <- function(x = getOption("mypackage.default.value")) x
In your external script you make sure that the option value is set:
options(mypackage.default.value = "hello")
IMHO that is a clean solution, because anybody reading your function will see at first sight that a certain option value is used as a default, and also has a clear understanding of how to overwrite this value if needed.
I would also define a fallback value in your package's .onLoad to make sure that the value is defined in the first place. You can then even react to this fallback value in your functions and issue a meaningful warning if the function realizes that the external script did, for whatever reason, not provide a new value.
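A minimal sketch of such a fallback (the option name follows the answer above; the sentinel string "fallback" is just an illustration):
# in the package, e.g. R/zzz.R
.onLoad <- function(libname, pkgname) {
  # only set the default if the user or script has not already done so
  if (is.null(getOption("mypackage.default.value")))
    options(mypackage.default.value = "fallback")
}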

R user-defined functions in new environment

I use some small user-defined functions as helpers. These functions are all stored in an R_HOME_USER/helper directory. Until now, these functions were sourced at R start-up; the overall method is something like lapply(my.helper.list, source). I now want these functions to be sourced but not to appear in my environment, as they pollute it.
A first and clean approach would be to build a package with all my helper .R files. For now, I do not want to follow this method. A second approach would be to name these helpers with a leading dot, but it annoys me to have to type .helper1().
The best way would be to define these helpers in a specific and accessible environment, but I am messing up the code. My idea is to first create a new environment:
.helperEnv <- new.env(parent = baseenv())
attach(.helperEnv, name = '.helperEnv')
Fine: search() returns '.helperEnv' in the list. Then I run:
assign('helper1', helper1, envir = .helperEnv)
rm(helper1)
Fine: ls(.helperEnv) returns 'helper1', and this function no longer appears in my environment.
The issue is that I can't run helper1 (object not found). I guess I am not on the right track and would appreciate some hints.
I think you should set the pos argument in your call to attach to a negative number:
.helperEnv <- new.env()
.helperEnv$myfunc <- function(x) x^3 + 1
attach(.helperEnv, name = "helper", pos = -1)
ls()
#character(0)
myfunc
#function(x) x^3+1
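Assuming the attach() call behaves as shown above, the helper is callable without cluttering the workspace, and the attached environment can be dropped again when no longer needed:
myfunc(2)
# [1] 9
detach("helper")  # removes the attached environment from the search path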

Protecting function names in R

Is it possible in R to protect function names (or variables in general) so that they cannot be masked?
I recently spotted that this can be a problem when creating a data frame with the name "new", which masked a function used by lmer and thus stopped it working. (Recovery is easy once you know what the problem is, here "rm(new)" did it.)
There is an easy workaround for your problem, without worrying about protecting variable names (though playing with lockBinding does look fun). If a function becomes masked, as in your example, it is still possible to call the masked version, with the help of the :: operator.
In general, the syntax is packagename::variablename.
(If the function you want has not been exported from the package, then you need three colons instead, :::. This shouldn't apply in this case however.)
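For example, using stats::sd() as a stand-in for the masked function:
sd <- "masked"         # a variable now shadows the function in .GlobalEnv
stats::sd(c(1, 2, 3))  # the packaged version remains reachable via ::
# [1] 1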
Maybe use environments! This is a great way to separate namespaces. For example:
> a <- new.env()
> assign('printer', function(x) print(x), envir=a)
> get('printer', envir=a)('test!')
[1] "test!"
@hdallazuanna recommends (via Twitter):
new <- 1
lockBinding('new', globalenv())
This makes sense when the variable is user-created but does not, of course, prevent overwriting a function from a package.
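Continuing that snippet: once the binding is locked, assignment fails loudly, and the lock can be lifted again with unlockBinding():
new <- 2
# Error: cannot change value of locked binding for 'new'
unlockBinding('new', globalenv())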
I had the reverse problem from the OP: I wanted to prevent my custom functions in .Rprofile from being overridden when I defined a variable with the same name as a function. I ended up putting my functions in ~/.R.R and adding these lines to .Rprofile:
if ("myfuns" %in% search()) detach("myfuns")
source("~/.R.R", attach(NULL, name = "myfuns"))
From the help page of attach:
One useful ‘trick’ is to use ‘what = NULL’ (or equivalently a
length-zero list) to create a new environment on the search path
into which objects can be assigned by assign or load or
sys.source.
...
## create an environment on the search path and populate it
sys.source("myfuns.R", envir = attach(NULL, name = "myfuns"))

R: disentangling scopes

My question is about avoiding namespace pollution when writing modules in R.
Right now, in my R project, I have functions1.R with doFoo() and doBar(), functions2.R with other functions, and main.R with the main program in it, which first does source('functions1.R'); source('functions2.R'), and then calls the other functions.
I've been starting the program from the R GUI on Mac OS X, with source('main.R'). This is fine the first time, but after that, the variables defined during the first run are still around when functions*.R are sourced again, and so the functions end up with a whole bunch of extra variables defined.
I don't want that! I want an "undefined variable" error when my function uses a variable it shouldn't! Twice this has given me very late nights of debugging!
So how do other people deal with this sort of problem? Is there something like source(), but that makes an independent namespace that doesn't fall through to the main one? Making a package seems like one solution, but it seems like a big pain in the butt compared to e.g. Python, where a source file is automatically a separate namespace.
Any tips? Thank you!
I would explore two possible solutions to this.
a) Think in a more functional manner: don't create any variables outside of a function. So, for example, main.R should contain one function, main(), which sources in the other files and does the work; when main() returns, none of the clutter will remain. (A minimal sketch of this appears after the snippet below.)
b) Clean things up manually:
# main.R
prior_variables <- ls()
source('functions1.R')
source('functions2.R')
# stuff happens
rm(list = setdiff(ls(), prior_variables))
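And here is the sketch of approach (a) promised above (doFoo() and doBar() are the functions from the question; local = TRUE keeps the sourced helpers inside main()'s own frame):
# main.R -- nothing runs at the top level except the final call
main <- function() {
  source('functions1.R', local = TRUE)  # helpers live only in main()'s frame
  source('functions2.R', local = TRUE)
  doFoo()
  doBar()
}
main()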
The main function you want to use is sys.source(), which will load your functions/variables into a namespace ("environment" in R) other than the global one. Another fantastic thing you can do in R is attach namespaces to your search() path, so that you need not reference the namespace directly. That is, if "namespace1" is on your search path, a function within it, say "fun1", need not be called as namespace1.fun1() as in Python, but simply as fun1(). If there are many functions with the same name, the one in the environment that appears first in the search() list will be called. To call a function in a particular namespace explicitly, one of many possible syntaxes (albeit a bit ugly) is get("fun1", "namespace1")(...), where ... are the arguments to fun1(). This also works with variables, using the syntax get("var1", "namespace1").
I do this all the time (I usually load just functions, but the distinction between functions and variables in R is small), so I've written a few convenience functions that I load from my ~/.Rprofile.
name.to.env <- function(env.name)
  ## returns the named environment on the search() path
  pos.to.env(grep(env.name, search()))

attach.env <- function(env.name)
  ## creates and attaches an environment to the search path
  ## if it doesn't already exist
  if (all(regexpr(env.name, search()) < 0)) attach(NULL, name = env.name, pos = 2)

populate.env <- function(env.name, path, ...) {
  ## populates the environment with the functions in a file or directory;
  ## creates and attaches the named environment to the search() path
  ## if it doesn't already exist
  attach.env(env.name)
  if (file.info(path[1])$isdir) {
    lapply(list.files(path, full.names = TRUE, ...),
           sys.source, name.to.env(env.name))
  } else {
    lapply(path, sys.source, name.to.env(env.name))
  }
  invisible()
}
Example usage:
populate.env("fun1","pathtofile/functions1.R")
populate.env("fun2","pathtofile/functions2.R")
and so on, which will create two separate namespaces: "fun1" and "fun2", which are attached to the search() path ("fun2" will be higher on the search() list in this case). This is akin to doing something like
attach(NULL,name="fun1")
sys.source("pathtofile/functions1.R",pos.to.env(2))
manually for each file ("2" is the default position on the search() path). The way that populate.env() is written, if a directory, say "functions/", contains many R files without conflicting function names, you can call it as
populate.env("myfunctions","functions/")
to load all functions (and variables) into a single namespace. With name.to.env(), you can also do something like
with(name.to.env("fun1"), doStuff(var1))
or
evalq(doStuff(var1), name.to.env("fun1"))
Of course, if your project grows big and you have lots and lots of functions (and variables), writing a package is the way to go.
If you switch to using packages, you get namespaces as a side-benefit (provided you use a NAMESPACE file). There are other advantages for using packages.
If you were really trying to avoid packages (which you shouldn't), then you could try assigning your variables in specific environments.
Well, avoiding namespace pollution, as you put it, is just a matter of diligently partitioning the namespace and keeping your global namespace uncluttered.
Here are the essential functions for those two kinds of tasks:
Understanding/Navigating the Namespace Structure
At start-up, R creates a new environment to store all objects created during that session--this is the "global environment".
# to get the name of that environment:
globalenv()
But this isn't the root environment. The root is an environment called "the empty environment"--all environments chain back to it:
emptyenv()
returns: <environment: R_EmptyEnv>
# to view all of the chained parent environments (which includes '.GlobalEnv'):
search()
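A quick way to see that chaining in action (a minimal illustration using base R's environmentName() and parent.env()):
e <- globalenv()
while (!identical(e, emptyenv())) {
  print(environmentName(e))  # "R_GlobalEnv", attached packages, ..., "base"
  e <- parent.env(e)
}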
Creating New Environments:
workspace1 = new.env()
is.environment(workspace1)
returns: [1] TRUE
class(workspace1)
returns: [1] "environment"
# add an object to this new environment:
with(workspace1, attach(what = "/Users/doug/Documents/test_obj.RData",
                        name = deparse(substitute(what)), warn.conflicts = TRUE, pos = 2))
# verify that it's there:
exists("test_obj", where=workspace1)
returns: [1] TRUE
# to locate the new environment (if it's not visible from your current environment)
parent.env(workspace1)
returns: <environment: R_GlobalEnv>
objects(".GlobalEnv")
returns: [1] "test_obj"
Coming from Python, et al., this system (at first) seemed to me like a room full of carnival mirrors. The R gurus, on the other hand, seem quite comfortable with it. I'm sure there are a number of reasons why, but my intuition is that they don't let environments persist. I notice that R beginners use attach(), as in attach(this_dataframe), while experienced R users use with() instead, e.g.,
with(this_dataframe, tapply(etc....))
(I suppose they would achieve the same thing if they used attach() and then detach(), but with() is faster and you don't have to remember the second step.) In other words, namespace collisions are avoided in part by limiting the objects visible from the global namespace.
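For instance, with a toy data frame (the names df, g, and v are just for illustration):
df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 3))
with(df, tapply(v, g, mean))  # nothing is attached to the search path
#   a   b
# 1.5 3.0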
