How to check for accidentally redefining a function name in R - r

I'm writing some fairly involved R code spread across multiple files and collected together into a package. A problem I've run into on occasion is that I will define a utility function in one file that has the same name as another utility function defined in another file. One of the two definitions gets replaced, leading to unintended behavior. Is there any sort of tool to check for this kind of accidental redefinition? Something that would check that no two top-level assignments foo <- ... in the package assign to the same name?

As pointed out in the comments, the right way to do this is to use packages. Packages give functions their own namespaces automatically, plus they make it very easy to reuse and share code. If you're using RStudio, you can create one with very little effort from the New Project menu.
However, if you can't use packages or namespaces for some reason, there's still a way to do what you want: you can lock a variable (including a function) so that it's not possible to overwrite it.
> pin <- 11
> lockBinding("pin", .GlobalEnv)
> pin <- 12
Error: cannot change value of locked binding for 'pin'
See Binding and Environment Locking for details.
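For the check the question actually asks about, one possible approach is to parse each source file and collect the names assigned at top level. The following is a minimal sketch, not an established tool: the helper names top_level_names and find_duplicate_definitions are made up for illustration, and it assumes the sources are .R files in a single directory and that definitions use plain `name <- value` or `name = value` assignments.

```r
# Hypothetical helper: names assigned at top level in one R source file.
top_level_names <- function(file) {
  exprs <- parse(file, keep.source = FALSE)
  out <- character()
  for (e in as.list(exprs)) {
    # Keep only calls of the form `name <- value` or `name = value`
    if (is.call(e) && is.name(e[[1]]) &&
        as.character(e[[1]]) %in% c("<-", "=") &&
        is.name(e[[2]])) {
      out <- c(out, as.character(e[[2]]))
    }
  }
  out
}

# Hypothetical helper: report every name defined more than once in `dir`.
find_duplicate_definitions <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  nms <- lapply(files, top_level_names)
  defs <- data.frame(
    name = as.character(unlist(nms)),
    file = rep(basename(files), lengths(nms)),
    stringsAsFactors = FALSE
  )
  dup <- defs$name[duplicated(defs$name)]
  defs[defs$name %in% dup, , drop = FALSE]
}
```

Calling find_duplicate_definitions("R") from the package root would return one row per clashing definition. Names created through assign() or inside conditionals are not detected by this sketch.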

Related

Function parameters - replace by reference

Thanks for all your advice. My remaining question is this:
Can I replace column name 'sulfate' in the following statement ...
dataclean <- datatable$sulfate[!datanas]
.... with a reference to a parameter 'pollutant', which may or may not have a value of 'sulfate'?
When you pass values to arguments, they behave as if they were objects in your workspace, except that they live in the function's environment rather than in the workspace.
So in your case, directory would be a character string and it would work, but only the first time. Your working directory is now changed and you need to revert back to the previous one for the function to work again. This can get pretty messy, so what I like to do is just refer to raw files by full path. See ?list.files for more info.
For your second question, your best bet for referring to a column whose name is stored in a variable is to do
x[, pollutant]
It is convenient to add the drop = FALSE argument there, in order to keep the result as what I'm assuming is a data.frame.
You could improve your function by also implementing the datatable argument. That way you have all the objects bundled together nicely.
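Put together, the suggestion could look roughly like the sketch below. The function name clean_pollutant and the example data are made up for illustration; it assumes datatable is a data.frame and pollutant is a column name given as a character string.

```r
# Hypothetical helper: return the non-missing values of one column,
# with the column name passed as a character string.
clean_pollutant <- function(datatable, pollutant) {
  values <- datatable[[pollutant]]   # look the column up by name
  values[!is.na(values)]             # drop the missing values
}

# Made-up example data
datatable <- data.frame(sulfate = c(1.2, NA, 3.4),
                        nitrate = c(NA, 0.5, 0.7))
clean_pollutant(datatable, "sulfate")
clean_pollutant(datatable, "nitrate")
```

Using [[pollutant]] returns a plain vector; datatable[, pollutant, drop = FALSE] would keep a one-column data.frame instead.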
The most important thing to note here would be "debugging". You should learn to use at least browser(). This function will stop the execution of your function at the very step where it was called. This enables you, in the R console, to inspect elements in the function and run code to see what's going on. This way you can speed up the development of code, at least initially when you usually haven't internalized all the data structures and paradigms yet.

how to use utils::globalVariables

Following your recommendations (or trying to do it, at least), I have tried some options, but the problem remains, so there must be something I am missing.
I have included a more complete code
setwd("C:/naapp")
#' @import utils
#' @import devtools
I have tried with and without using suppressForeignCheck
if(getRversion() >= "2.15.1"){
utils::globalVariables(c("eleven"))
utils::suppressForeignCheck(c("eleven"))
}
myFunctionSum <- function(X){print(X+eleven)}
myFunctionMul <- function(X){print(X*eleven)}
myFunction11 <- function(X){
assign("eleven",11,envir=environment(myFunctionMul))
}
maybe I should use a particular environment?
package.skeleton(name = "myPack11", list=ls(),
path = "C:/naapp", force = TRUE,
code_files = character())
I remove the "man" directory from the directory myPack11,
otherwise I would get an error because the help files are empty.
I add the imports utils and devtools to the DESCRIPTION
Then I run check
devtools::check("myPack11")
And I still get this note
#checking R code for possible problems ... NOTE
#myFunctionMul: no visible binding for global variable 'eleven'
#myFunctionSum: no visible binding for global variable 'eleven'
#Undefined global functions or variables:eleven
I have also tried to make an environment, combining Tomas Kalibera's suggestion and an example I found on the Internet.
myEnvir <- new.env()
myEnvir$eleven <- 11
etc
In this case, I get the same note, but with "myEnvir", instead of "eleven"
First version of the question
I am trying to use "globalVariables" from the package utils. I am building an interface in R and I am planning to submit it to CRAN. This is my first time, so sorry if the question is very basic. I have read the help and I have tried to find examples, but I still don't know how to use it.
I have made a little silly example to illustrate my question, which is:
Where do I have to place this line exactly?:
if(getRversion() >= "2.15.1"){utils::globalVariables("eleven")}
My example has three functions. myFunction11 creates the global variable "eleven" and the other two functions manipulate it. In my real code, I cannot use arguments in the functions that are called by means of a button. Consider that this is just a silly example to learn how to use globalVariables (to avoid binding notes).
myFunction11 <- function(){
assign("eleven",11,envir=environment(myFunctionSum))
}
myFunctionSum <- function(X){
print(X+eleven)
}
myFunctionMul <- function(X){
print(X*eleven)
}
Thank you in advance
I thought that the file globals.R would be automatically generated when using globalVariables. The problem was that I needed to create the package skeleton, then create the file globals.R, add it to the R directory in the package and check the package.
So, I needed to place this in a different file:
#' @import utils
utils::globalVariables(c("eleven"))
and save it
The documentation clearly says:
## In the same source file (to remind you that you did it) add:
if(getRversion() >= "2.15.1") utils::globalVariables(c(".obj1", "obj2"))
so put it in the same source file as your functions. It can go in any of your R source files, but the comment above recommends you put it close to your code. Looking at a bunch of GitHub packages reveals another common pattern, which is to have a globals.R file with it in, but this is probably a bad idea. If you later remove the global from your package but neglect to update globals.R, you could mask a problem. Putting it right next to the functions that use it will hopefully remind you when you edit those functions.
Make sure you put it outside any function definitions in the file, or it won't get seen.
You cannot modify bindings in a package namespace once the package is loaded (and namespace sealed, and bindings locked). The check tool helps you to spot violations of this restriction, so you find out about the problem when checking the package rather than while running it. globalVariables is just a call to silence check when looking for these violations, which is undesirable in almost all cases. If you really need mutable state in a package, you can create a new environment (using new.env) and bind it to an (unexported) "global" variable in your namespace. This binding will be locked, but this is ok, because in R you can change an environment in place (add/remove elements, effectively modifying the elements).
The best situation is however when you can keep all mutable state in user objects (passed in as arguments into functions, and their modified versions returned as output values of functions).
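The environment-based approach described above could be sketched as follows. This is an illustration only: the name .state is an assumption, and the functions are simplified versions of the ones in the question.

```r
# Unexported package-level environment that holds the mutable state.
# The binding .state is locked once the package is loaded, but the
# contents of the environment can still be changed in place.
.state <- new.env(parent = emptyenv())

myFunction11 <- function() {
  assign("eleven", 11, envir = .state)
  invisible(NULL)
}

# Look the value up in .state rather than relying on a global variable
myFunctionSum <- function(X) X + get("eleven", envir = .state)
myFunctionMul <- function(X) X * get("eleven", envir = .state)

myFunction11()
myFunctionSum(1)
myFunctionMul(2)
```

Because eleven is never referenced as a free variable, the functions themselves give the check tool no global variable to report.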

Dynamically Generate Reference Classes

I'm attempting to generate reference classes within an R package on the fly, and it's proving to be fairly difficult. Here are the approaches I've taken and problems I've run into:
I'm creating a package in which I hope to be able to dynamically read in a schema and automatically generate an associated reference class (think SOAP). Of course, this means I won't be able to define my reference classes before-hand in the package sources.
I initially attempted to create a new class using a simple:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"))
which, of course, works fine when executed interactively, but when included in the package sources, I get a locked binding error. From my reading, it looks like this occurs because when running interactively, the class information is being stored in the global environment, which is not locked, while my package's base environment is locked.
I then found a thread that suggested using something to the effect of:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=globalenv())
This actually crashed R/Studio when I tried to build the package, so I don't have a log of the error it generated, unfortunately, but it certainly didn't work.
Next I tried creating a new environment within my package which I could use to store these reference classes. So I added a .classEnv <- new.env() line in my package sources (not inside of any function) and then attempted to use this class when creating a new reference class:
myClass <- setRefClass("NewClassName", fields=list(fieldA="character"), where=.classEnv)
This actually seemed to work OK, but generates the following warning:
> myClass <- setRefClass("NewClassName", where=.classEnv)
Warning message:
In getPackageName(where) :
Created a package name, ‘2013-04-23 10:19:14’, when none found
So, for some reason, methods::getPackageName() isn't able to pick up which package my new environment is in?
Is there a way to create my new environment differently so that getPackageName() can properly recognize the package? Can I add some feature which allows me to help getPackageName() detect the package? Will this even work if I can deal with the warning, or am I misusing reference classes by trying to create them dynamically?
To get the conversation going, I found that getPackageName stores the package name in a hidden .packageName variable in the specified environment.
So you can actually get around the warning with
assign(".packageName", "MyPkg", envir=.classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
which resolves the warning, but the documentation says not to trust the .packageName variable indefinitely, and I still feel like I'm hacking this in and may be misunderstanding something important about reference classes and their relationship to environments.
Full details from documentation:
Package names are normally installed during loading of the package, by the INSTALL script or by the library function. (Currently, the name is stored as the object .packageName but don't trust this for the future.)
Edit:
After reading a little further, the setPackageName method may be a more reliable way to set the package name for the environment. Per the docs:
setPackageName can be used to establish a package name in an environment that would otherwise not have one. This allows you to create classes and/or methods in an arbitrary environment, but it is usually preferable to create packages by the standard R programming tools (package.skeleton, etc.)
So it looks like one valid solution would be the following:
setPackageName("MyPkg", .classEnv)
myClass <- setRefClass("NewClassName", fields=classFields, where=.classEnv)
That eliminates the warning message and doesn't rely on anything that's documented as unstable. I'm still not clear why it's necessary, but...

How to prevent functions polluting global namespace?

My R project is getting increasingly complex, and I'm starting to look for some construct that's equivalent to classes in Java/C#, or modules in python, so that my global namespace doesn't become littered with functions that are never used outside of one particular .r file.
So, I guess my question is: to what extent is it possible to limit the scope of functions to within a specific .r file, or similar?
I think I can just make the entire .r file into one giant function, and put functions inside that, but that messes with the echoing:
myfile.r:
myfile <- function() {
somefunction <- function(a,b,c){}
anotherfunction <- function(a,b,c){}
# do some stuff here...
123
456
# ...
}
myfile()
Output:
> source("myfile.r",echo=T)
> myfile <- function() {
+ somefunction <- function(a,b,c){}
+ anotherfunction <- function(a,b,c){}
+
+ # do some stuff here...
+ # . .... [TRUNCATED]
> myfile()
>
You can see that "123" is not printed, even though we used echo=T in the source command.
I'm wondering if there is some other construct which is more standard, since putting everything inside a single function doesn't sound like something that is really standard? But perhaps it is? Also, if it means that echo=T works then that is a definite bonus for me.
Firstly, as @Spacedman has said, you'll be best served by a package but there are other options.
S3 Methods
R's original "object orientation" is known as S3. The majority of R's code base uses this particular paradigm. It is what makes plot() work for all kinds of objects. plot() is a generic function and the R Core Team and package developers can and have written their own methods for plot(). Strictly these methods might have names like plot.foo() where foo is a class of object for which the function defines a plot() method. The beauty of S3 is that you (hardly) ever need to know or call plot.foo(); you just use plot(bar) and R works out which plot() method to dispatch to based on the class of object bar.
In your comments on your Question you mention that you have a function populate() that has methods (in effect) for classes "crossvalidate" and "prod" which you keep in separate .r files. The S3 way to set this up is to do:
populate <- function(x, ...) { ## add whatever args you want/need
UseMethod("populate")
}
populate.crossvalidate <-
function(x, y, z, ...) { ## add args but must have those of generic
## function code here
}
populate.prod <-
function(x, y, z, ...) { ## add args but must have those of generic
## function code here
}
Then, given some object bar with class "prod", calling
populate(bar)
will result in R calling populate() (the generic), which then looks for a function with name populate.prod because that is the class of bar. It finds our populate.prod() and so dispatches to that function, passing on to it the arguments we initially specified.
So you see that you only ever refer to the methods using the name of the generic, not the full function name. R works out for you what method needs to be called.
The two populate() methods can have very different arguments, with the exception that strictly they should have the same arguments as the generic function. So in the example above, all methods should have arguments x and .... (There is an exception for methods that employ formula objects but we don't need to worry about that here.)
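To see the dispatch in action, here is a small self-contained sketch. The method bodies and the objects foo and bar are placeholders invented for the example.

```r
# Generic: dispatches on the class of its first argument
populate <- function(x, ...) UseMethod("populate")

# Placeholder methods that just report which one was called
populate.crossvalidate <- function(x, ...) "crossvalidate method"
populate.prod <- function(x, ...) "prod method"

# Made-up objects carrying the two classes
foo <- structure(list(), class = "crossvalidate")
bar <- structure(list(), class = "prod")

populate(foo)   # dispatches to populate.crossvalidate()
populate(bar)   # dispatches to populate.prod()
```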
Package Namespaces
Since R 2.14.0, all R packages have had their own namespace, even if one were not provided by the package author, although namespaces have been around for a lot longer in R than that.
In your example, we wish to register the populate() generic and its two methods with the S3 system. We also wish to export the generic function. Usually we don't want or need to export the individual methods. So, pop your functions into .R files in the R folder of the package sources and then in the top level of the package sources create a file named NAMESPACE and add the following statements:
export(populate) ## export generic
S3method(populate, crossvalidate) ## register methods
S3method(populate, prod)
Then once you have installed your package, you will note that you can call populate() but R will complain if you try to call populate.prod() etc directly by name from the prompt or in another function. This is because the functions that are the individual methods have not been exported from the namespace and thence are not visible outside it. Any function in your package that calls populate() will be able to access the methods you have defined, but any functions or code outside your package can't see the methods at all. If you want, you can call non-exported functions using the ::: operator, i.e.
mypkg:::populate.crossvalidate(foo, bar)
will work, where mypkg is the name of your package.
To be honest, you don't even need a NAMESPACE file as R will auto generate one when you install the package, one that automatically exports all functions. That way your two methods will be visible as populate.xxx() (where xxx is the particular method) and will operate as S3 methods.
Read Section 1 Creating R Packages in the Writing R Extensions manual for details of what is involved, but you won't need to do half of this if you don't want to, especially if the package is for your own use. Just create the appropriate package folders (i.e. R and man) and stick your .R files in R. Write a single .Rd file in man where you add
\name{Misc Functions}
\alias{populate}
\alias{populate.crossvalidate}
\alias{populate.prod}
at the top of the file. Add \alias{} for any other functions you have. Then you'll need to build and install the package.
Alternative using sys.source()
Although I don't (can't!) really recommend what I mention below as a long-term viable option here, there is an alternative that will allow you to isolate the functions from individual .r files as you initially requested. This is achieved through the use of environments not namespaces and doesn't involve creating a package.
The sys.source() function can be used to source R code/functions from a .R file and evaluate it in an environment. As your .R file is creating/defining functions, if you source it inside another environment then those functions will be defined there, in that environment. They won't be visible on the standard search path by default and hence a populate() function defined in crossvalidate.R will not clash with a populate() defined in prod.R as long as you use two separate environments. When you need to use one set of functions you can attach the environment to the search path, upon which it will then be miraculously visible to everything, and when you are done you can detach it. Then attach the other environment, use it, detach etc. Or you can arrange for R code to be evaluated in a specific environment using things like eval().
Like I said, this isn't a recommended solution but it will work, after a fashion, in the manner you describe. For example
## two source files that both define the same function
writeLines("populate <- function(x) 1:10", con = "crossvalidate.R")
writeLines("populate <- function(x) letters[1:10]", con = "prod.R")
## create two environments
crossvalidate <- new.env()
prod <- new.env()
## source the .R files into their respective environments
sys.source("crossvalidate.R", envir = crossvalidate)
sys.source("prod.R", envir = prod)
## show that there are no populates find-able on the search path
> ls()
[1] "crossvalidate" "prod"
> find("populate")
character(0)
Now, attach one of the environments and call populate():
> attach(crossvalidate)
> populate()
[1] 1 2 3 4 5 6 7 8 9 10
> detach(crossvalidate)
Now call the function in the other environment
> attach(prod)
> populate()
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> detach(prod)
Clearly, each time you want to use a particular function, you need to attach() its environment and then call it, followed by a detach() call. Which is a pain.
I did say you can arrange for R code (expressions really) to be evaluated in a stated environment. You can use eval() or with() for this, for example.
> with(crossvalidate, populate())
[1] 1 2 3 4 5 6 7 8 9 10
At least now you only need a single call to run the version of populate() of your choice. However, if calling the functions by their full name, e.g. populate.crossvalidate() is too much effort (as per your comments) then I dare say that even the with() idea will be too much hassle? And anyway, why would you use this when you can quite easily have your own R package.
Don't worry about the complexity of 'making a package'. Stop thinking of it like that. What you are going to do is this:
in the folder where you are working on your project, make a folder called 'R'
put your R code in there, one function per file
make a DESCRIPTION file in your project directory. Check out existing examples for the exact format, but you only need a few fields.
Get devtools. install.packages("devtools")
Use devtools. library(devtools)
Now, write your functions in your R files in your R folder. To load them into R, DON'T source them. Do load_all(). Your functions will be loaded but NOT into the global environment.
Edit one of your R files, then do load_all() again. This will load any modified files in the R folder, thus updating your function.
That's it. Edit, load_all, rinse and repeat. You have created a package, but it's pretty lightweight and you don't have to deal with the bondage and discipline of R's package building tools.
I've seen, used, and even written code that tries to implement a lightweight packagey mechanism for loading objects, and none are as good as what devtools does.
All Hail Hadley!
You might want to consider making a package. As an alternative, you could look at environments. Finally, RStudio's projects may be closer to what would suit you.
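As one illustration of the environment route, base R's local() can keep helper functions out of the global namespace without a package. This is only a sketch: the helper scale_values is invented for the example, and it is a lighter-weight substitute for, not an equivalent of, a package namespace.

```r
# Only `populate` is bound in the calling environment; the helper stays
# private to the environment created by local().
populate <- local({
  scale_values <- function(x) x / max(x)   # hypothetical private helper
  function(x) scale_values(x)              # the function that is exposed
})

populate(c(1, 2, 4))
exists("scale_values")   # the helper is not visible from outside
```

Unlike wrapping the whole file in one big function, each .r file can expose just the functions it means to share this way.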

when do you want to set up new environments in R

Per the discussion of R programming styles, I saw someone once say he puts all his custom functions into a new environment and attaches it. I also remember that an R environment may be used as a hash table. Is this good style? When do you want to put your data/functions into a new environment? Or just use the .GlobalEnv for everything?
EDIT put my second part of question back:
how to inspect same name variable for different environments?
Martin Mächler suggests that this is the one time you might want to consider attach(), although he suggested it in the context of attaching a .Rdata file to the search path but your Q is essentially the same thing.
The advantage is that you don't clutter the global environment with functions, that might get overwritten accidentally etc. Whilst I wouldn't go so far as to call this bad style, you might be better off sticking your custom functions into your own personal R package. Yes, this will incur a bit of overhead of setting up the package structure and providing some documentation to allow the package to be installed, but in the longer term this is a better solution. With tools like roxygen this process is getting easier to boot.
Personally, I haven't found a need for fiddling with environments in 10+ years of using R; well documented scripts that load, process and analyse data, cleaning up after themselves all the while have served me well thus far.
Another suggestion for the second part of your question (now deleted) is to use with() (following on from @Joshua's example):
> .myEnv <- new.env()
> .myEnv$a <- 2
> a <- 1
> str(a)
num 1
> ls.str(.myEnv, a)
a : num 2
> str(.myEnv$a)
num 2
> with(.myEnv, a)
[1] 2
> a
[1] 1
If your ecosystem of data and code has grown large enough that you are considering isolating it in an environment, you are better off creating a package. A package gives you much more support for:
Managing a project that is growing large and complex by separating code and data into files so there is less to dig through at one time.
A package makes it dead simple to hand off your work to someone else so they can use your code and data.
A package provides additional support for documentation and reporting.
Setting up a package for R is so easy, just call package.skeleton(), that every project I work on gets its code and data stored in a package.
The only time I use environments is when I need to isolate a run of some code, usually a script written by someone else, so that its side effects and variable names don't get crossed with mine. I do this by evalq(source('someScript.R', local=TRUE), SomeEnvironment).
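A variant of the same idea can be sketched by passing the environment directly to source() through its local argument instead of wrapping the call in evalq(). The script contents and the name sandbox below are made up for the example.

```r
# Write a throwaway script that defines a helper and a result
script <- tempfile(fileext = ".R")
writeLines(c("helper <- function(x) x * 2",
             "result <- helper(21)"), script)

# Evaluate the script inside its own environment
sandbox <- new.env()
source(script, local = sandbox)

sandbox$result     # value computed by the script
ls(sandbox)        # the script's objects live here, not in the workspace
```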
To answer your second question (that you've now deleted), use ls.str, or just access the object in the environment with $:
.myEnv <- new.env()
.myEnv$a <- 2
a <- 1
str(a)
ls.str(.myEnv, a)
str(.myEnv$a)
