R package can see variables not passed to it

I am writing a new R package and find that variables I have not explicitly passed to a function in the package (as an input argument) are visible within it, e.g.:
myFunc <- function(a, b, c) {
  print(d)
}
where d, which is defined in the calling .R script but has not been passed to myFunc, is visible.
Any help would be great, thanks. I'm using R 3.2.4 and have been using roxygen2 (via devtools::document()) to create the NAMESPACE, if that helps.

Isn't this just a consequence of the scoping rules in R?
Each call to your function creates a new evaluation environment for myFunc. When print(d) is evaluated, R first looks in that environment for an object called d. Because no such object exists, R then searches the function's enclosing environments; for a packaged function that chain runs through the package namespace and its imports and eventually reaches the global environment, where it finds the d defined in your .R script and prints it.
Here's a link with more info and a pile of examples.
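A minimal interactive illustration of that lookup (run at the top level rather than from a package, so the global environment is the function's enclosure):
d <- "found in an enclosing environment"

myFunc <- function(a, b, c) {
  # d is neither an argument nor defined locally, so R walks up the
  # enclosing environments until it finds one that defines d
  print(d)
}

myFunc(1, 2, 3)
#> [1] "found in an enclosing environment"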

Very useful link, thanks. It looks like forcing limited scoping within a function (i.e. getting a function to not access the global scope) is not a default property of R.
I found a similar question here: R force local scope
Using the checkStrict function posted by the main responder to that question seems to have worked; it found an unintended use of a global variable.
> require(myCustomPackage)
> checkStrict(showDendro)
Warning message:
In checkStrict(showDendro) : global variables used: palName
where showDendro is a function inside my custom package.
So it seems the solution to my problem is:
1) While you can stop R from moving up to the global environment by enclosing all your functions in local(), that seems like a tedious solution.
2) When moving code from the global environment into its own function, run something like checkStrict to catch unintended uses of global variables (a simplified sketch of such a check follows below).
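For reference, a much-simplified check along the same lines can be built directly on codetools::findGlobals(); this is only a sketch, not the checkStrict() from the linked answer:
library(codetools)
# warn about any variables a function refers to that are not bound locally (sketch only)
checkGlobals <- function(f) {
  globals <- findGlobals(f, merge = FALSE)$variables
  if (length(globals) > 0) {
    warning("global variables used: ", paste(globals, collapse = ", "))
  }
  invisible(globals)
}
Running checkGlobals(showDendro) would then flag palName in much the same way checkStrict did.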

Related

Setting the environment of a function to the package environment it came from

I have a function that is part of my own package. When called from namespace:package_name, the function behaves slightly differently than when it is called from R_GlobalEnv. To find out why, I want to debug the function.
To make debugging the function as easy as possible, I want to write the changes I made to the function to the namespace:package_name environment, so I can immediately test it, but I cannot figure out how to do this.
If I print the function, it gives the environment at the bottom as follows:
<bytecode: 0x00000236403c9da0>
<environment: namespace:package_name>
But if I try environment(fun) <- namespace:package_name, it says:
Error: object 'namespace' not found.
How do I set the environment of my function to the package environment?
EDIT:
Surprisingly fixInNamespace does not leave the namespace intact.
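For what it's worth, a minimal sketch of the usual approach (package_name and fun are just the placeholders from the question): namespace:package_name is only how the environment prints, not valid syntax; to refer to it in code you use asNamespace(). And since namespaces are locked, writing a modified function back generally goes through assignInNamespace():
# point the function's enclosure at the package namespace
environment(fun) <- asNamespace("package_name")

# or write a modified copy of the function back into the (locked) namespace
assignInNamespace("fun", fun, ns = "package_name")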

How to use R environments inside package?

I'm having a problem where a variable seems not to be found when using with().
I have a package with internal data that is an environment (ENV), so every function within my package can access (and modify) ENV.
Essentially I'm using the following code inside my package:
In /data-raw I make an environment, and save it to /data
ENV = new.env()
ENV$A = 2
ENV$B = 3
And in my package:
foo <- function(bar) {
  with(ENV, {
    if (nrow(bar) == 0) {
      print(ENV$A)
    } else {
      print(ENV$B)
    }
  })
}
bar = data.frame()
foo(bar)
What I actually get is: Error in nrow(bar) { : object '.result' not found
I thought the function environment would be the parent of the with environment... Can I not access the function's variables like this?
Thanks for any help.
Update
So, they're definitely in different places. The parent environment inside the function is my package namespace, whereas the parent inside with is the global environment.
I thought the function environment would be the parent of the with environment...
No: when you pass with() an environment, the expression is evaluated directly in that environment, so the enclosing (parent) environment of the evaluation is the parent environment of that environment, not the function's environment. In fact, that's one of the fundamental issues that commands such as with() have.
To work around this you could convert your environment to a list (via as.list). Of course that copies all objects in the environment, so it's potentially inefficient. It also makes modifying objects inside the environment impossible.
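A sketch of that workaround applied to the foo() above (A and B can then be referenced directly, and bar becomes visible again because the list-based evaluation environment encloses the calling frame):
foo <- function(bar) {
  # as.list(ENV) copies A and B into a plain list; with() then evaluates
  # the expression in a new environment whose parent is this function's
  # frame, so bar is found again
  with(as.list(ENV), {
    if (nrow(bar) == 0) print(A) else print(B)
  })
}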

How to include a closure in an R-package?

I would like to include a closure with the functions of an R package we are writing. The function (and its siblings) will have data in its environment, perform a comparison of input with the data, and return the result. To illustrate, think of a function with an inbuilt telephone directory: you query with a number and the function returns a name.
This function will be called as a helper by several other functions in our R package, so it has to exist once the package is loaded. And we want the function to be available in the package environment, just like any other function.
Should I create it via its factory function in .onLoad() and assign() it to the package environment? Could I ship it as an .RDS? Or RData, or does this violate CRAN policy on "binary executable code"? Or is there a different, canonical way? And where would the code and the data (or the RDS/RData) go in the package directory structure?
(I see that the question of how to document a closure has been discussed here).
For the benefit of anyone stumbling on this question: the solution I finally worked out involves a few steps but is "clean" as far as I can tell.
1) Put the factory function in a file R/aaa.R to ensure it gets loaded before the closure.
2) Put the data that the closure uses into the standard inst/extdata/ folder.
3) Put a file with the closure's name and a proper docstring into R/: define the closure as a normal function that just returns nothing. This is necessary so the function is properly exported and known in the package namespace. Immediately call the factory function to create the closure and overwrite the original definition. Note: it's not enough to just bring the data into the factory function as an argument; it actually needs to be accessed before defining the closure. Why? Because lazy loading won't actually have loaded the data into the environment you need it in unless you access it.
That's all. Summary: create a stub for your closure, then overwrite that with the return value of the factory function.
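A sketch of what that looks like in practice; make_lookup, lookup_name, and the inline lookup table are all made-up stand-ins (in the steps above the data would come from inst/extdata/ instead), and the roxygen block is abbreviated:
# R/aaa.R -- the factory; the file name makes it load before the closure's file
make_lookup <- function(directory) {
  # access the data here so it is captured in the closure's environment
  force(directory)
  function(number) directory$name[match(number, directory$number)]
}

# R/lookup_name.R -- stub with the docstring, immediately overwritten
#' Look up a name by telephone number
#' @export
lookup_name <- function(number) NULL  # stub: exported and documented

# overwrite the stub with the real closure returned by the factory
lookup_name <- make_lookup(
  data.frame(number = c("555-0100", "555-0101"),
             name   = c("Ada", "Grace"),
             stringsAsFactors = FALSE)
)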
If the factory function is instead called later by the package user, but we still want the returned closure to live inside the package (for example so that it can't be changed by anything other than the factory, is reliably accessible from within the package, can be documented, etc.):
# exported function (visible to the user);
# everything this function does is 'outsourced'
# to a non-exported function that we can overwrite with the factory:
visible_function <- function(...) {
  hidden_function(...)
}

# non-exported function (invisible to the user),
# called by the visible function;
# fails unless the factory has been called first
hidden_function <- function(x) {
  stop("call factory_function() before you can use visible_function()")
}

# exported function, visible to the user;
# replaces the hidden function called by the visible function
factory_function <- function(x) {
  produced_function <- function() {
    print(paste(x, "is an object forever stored in my namespace!"))
  }
  assignInNamespace("hidden_function",
                    produced_function,
                    ns = "myPackageName")
}
Note that R CMD check throws a NOTE on assignInNamespace, so CRAN won't easily accept this solution.

Define Global Variables when creating packages

I have this problem: I am creating a new R package named "mypackagefunction", whose partial code is this:
mypackagefunction <- function() {
  ## This is the constructor of my package
  ## 1st step: define variables
  gdata <<- NULL
  # ...
  # below this, there are more functions and code
}
So I build and reload in RStudio and then run a check, and at that step I receive this warning:
mypackagefunction: no visible binding for '<<-' assignment to ‘gdata’
But when I run my package with:
mypackagefunction()
I can call that variable, which lives inside the package, with these results:
> mypackagefunction()
> gdata
NULL
How can I remove this NOTE or warning when I check my package? Or is there another way to define global variables?
There are standard ways to include data in a package - if you want some particular R object to be available to the user of the package, this is what you should do. Data is not limited to data frames and matrices - any R object(s) can be included.
If, on the other hand, your intention was to modify the global environment every time a function is called, then you're doing it wrong. In R's functional programming paradigm, functions return objects that can be assigned into the global environment by the user. Objects don't just "appear" in the global environment, with the programmer hoping that the user both (a) knows to look for them and (b) didn't have any objects of the same name that they wanted to keep (because they just got overwritten). It is possible to write code like this (using <<- as in your question, or explicitly calling assign as in @abhiieor's answer), but it will probably not be accepted to CRAN as it violates CRAN policy.
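A sketch of that functional style, with made-up names: the package function builds and returns the object, and the caller decides where it goes.
# inside the package: build and return the object instead of assigning it
make_gdata <- function() {
  gdata <- list()   # whatever the package needs to compute
  # ... fill gdata in ...
  gdata
}

# user code: the assignment happens explicitly, in the user's own workspace
gdata <- make_gdata()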
Another way of defining a global variable is assign('prev_id', id, envir = .GlobalEnv), where id is the value being assigned and prev_id is the name of the global variable.

Can we have more error (messages)?

Is there a way, in R, to pop up an error message if a function uses a variable that is not declared in the body of the function? I.e., I want something to flag this type of function:
aha <- function(p) {
  return(p + n)
}
You see, if there happens to be an "n" variable lying around somewhere, aha(p = 2) will give me an "answer", since R will just take "n" from that mysterious place called the enclosing environment.
If you want to detect such potential problems during the code-writing phase and not during run-time, then the codetools package is your friend.
library(codetools)
aha <- function(p) {
  return(p + n)
}
# check a specific function:
checkUsage(aha)
# check all loaded functions:
checkUsageEnv(.GlobalEnv)
Both will tell you: no visible binding for global variable ‘n’.
Richie's suggestion is very good.
I would just add that you should consider creating unit test cases that run in a clean R environment. That will also eliminate the concern about global variables and ensure that your functions behave the way they should. You might want to consider using RUnit for this. I have my test suite scheduled to run every night in a new environment using Rscript, and that's very effective at catching any kind of scope issue, etc.
Writing R codes to check other R code is going to be tricky. You'd have to find a way to determine which bits of code were variable declarations, and then try and work out whether they'd already been declared within the function.
EDIT: The previous statement would have been true, but as Aniko pointed out, the hard work has already been done in the codetools package.
One related thing that may be useful to you is to force a variable to be taken from the function itself (rather than from an enclosing environment).
This modified version of your function will always fail, since n is not declared.
aha <- function(p) {
  n <- get("n", inherits = FALSE)
  return(p + n)
}
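For example, even with a stray global n lying around, the call now fails (output shown approximately):
n <- 5
aha(p = 2)
#> Error in get("n", inherits = FALSE) : object 'n' not found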
