I think this is a simple question, and the answer is probably somewhere in Hadley's book, the official R documentation, or the answers to Writing robust R code: namespaces, masking and using the `::` operator, but I haven't come across it yet (or don't remember reading about it).
Let's say I want to write a package that contains the function:
sort_by_column <- function(x, j) {
  j_ord <- order(x[, j])
  x[j_ord, ]
}
If I define this function in the global environment and I pass in a value for x that is a data.table, it will silently fail because [ will dispatch to [.data.table instead of [.data.frame:
library(data.table)
sort_by_column(iris, 3)              # works: [ dispatches to [.data.frame
sort_by_column(data.table(iris), 3)  # silently wrong: [ dispatches to [.data.table
My gut tells me that [.data.table won't even be available to my package unless I explicitly import it, in which case this problem can't occur in a package the way it can occur in the global environment. Is that true? If not, how can I handle this kind of masking?
edit 2: Function sort_by_column is defined in package A. Another package, B, was loaded before package A but is not explicitly imported by A. Do calls from functions defined in A search in package B?
edit: To clarify, I want to define a function inside a package in such a way that it "ignores" other packages the user might have loaded, in order to avoid function naming conflicts like the one demonstrated above. Is this ignoring behavior automatic, or do I need to do something in particular?
If you want to specify a particular method for "[" then you should be able to use:
`[.data.frame`(x, TRUE, j)
Or test for data.tables using inherits and trap that as an edge case?
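For instance, a minimal sketch combining both ideas (note this coerces data.tables to plain data.frames, so the result loses the data.table class):
sort_by_column <- function(x, j) {
  if (inherits(x, "data.table")) {
    # Coerce so that [ dispatches to [.data.frame below
    x <- as.data.frame(x)
  }
  j_ord <- order(x[, j])
  x[j_ord, ]
}
If preserving the data.table class matters, you could convert back with as.data.table() before returning.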
Related
There's a bit of a preamble before I get to my question, so hang with me!
For an R package I'm working on, I'd like to make it as easy as possible for users to partially apply functions inline. I had the idea of using the [] operator to call my partial application function, which I've named "partialApplication". What I aim to achieve is this:
dnorm[mean = 3](1:10)
# Which would be exactly equivalent to:
dnorm(1:10, mean = 3)
To achieve this I tried defining a new [] method for objects of class function, i.e.
`[.function` <- function(...) partialApplication(...)
However, R gives a warning that the [] method for function objects is "locked." (Is there any way to override this?)
My idea seemed to be thwarted, but I thought of one simple solution: I can invent a new S3 class "partialAppliable" and create a [] method for it, i.e.
`[.partialAppliable` = function(...) partialApplication(...)
Then, I can take any function I want and append 'partialAppliable' to its class, and now my method will work.
class(dnorm) = append(class(dnorm), 'partialAppliable')
dnorm[mean = 3](1:10)
# It works!
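(For context, partialApplication itself isn't shown here; a minimal hypothetical version consistent with the behaviour above might look like this:)
partialApplication <- function(f, ...) {
  fixed <- list(...)                        # arguments captured at [ time
  function(...) do.call(f, c(list(...), fixed))
}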
Now here's my question/problem: I'd like users to be able to use any function they want, so I thought, what if I loop through all the objects in the active environment (using ls) and append 'partialAppliable' to the class of all functions? For instance:
allobjs = unlist(lapply(search(), ls))
# This lists all objects defined in all active packages
for (i in allobjs) {
  if (is.function(get(i))) {
    curfunc = get(i)
    class(curfunc) = append(class(curfunc), 'partialAppliable')
    assign(i, curfunc)
  }
}
Voilà! It works. (I know, I should probably assign the modified functions back into their original package environments, but you get the picture).
Now, I'm not a professional programmer, but I've picked up that doing this sort of thing (globally modifying all variables in all packages) is generally considered unwise/risky. However, I can't think of any specific problems that will arise. So here's my question: what problems might arise from doing this? Can anyone think of specific functions/packages that will be broken by doing this?
Thanks!
This is similar to what the Defaults package did. The package is archived because the author decided that modifying other packages' code is a "very bad thing". I think most people would agree. Just because you can do it does not mean it's a good idea.
And, no, you most certainly should not assign the modified functions back to their original package environments. CRAN does not like it when packages modify the user's search path unnecessarily, so I would be surprised if they allowed a package to modify other packages' function arguments.
You could work around that by putting all the modified functions in an environment on the search path. But then you have to ensure that environment is always searched first, which means modifying the search path every time another package is loaded.
Changing arguments for functions in other packages also has the potential to make it very difficult for others to reproduce your results because they must have all your argument settings. Unless you always call functions with all their arguments specified, which defeats the purpose of what you're trying to do.
I would like to add an idiosyncratically modified function to a package written by someone else, with an R script, i.e. just for the session, not permanently. The specific example is, let's say, bls_map_county2() added to the blscrapeR package. bls_map_county2 is just a copy of the bls_map_county() function with an added ... argument, for purposes of changing a few of the map-drawing parameters. I have not yet inserted the additional parameters. Running the function as-is, I get the error:
Error in BLS_map_county(map_data = df, fill_rate = "unemployed_rate", :
could not find function "geom_map"
I assume this is because my function does not point to the blscrapeR namespace. How do I assign my function to the (installed, loaded) blscrapeR namespace, and is there anything else I need to do to let it access whatever machinery from the package it requires?
When I am hacking on a function in a particular package that in turn calls other functions, I often use this form after the definition:
mod_func <- function(args) {
  # ... hacked body ...
}
environment(mod_func) <- environment(old_func)
But I think the function you might really want is assignInNamespace. These methods will allow access to non-exported functions in loaded packages. They will not, however, succeed if the package is not loaded, so you may want to surround it with a stopifnot() check on require(pkgname).
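For example, a rough sketch for the package in the question (assuming bls_map_county2 is your modified copy of bls_map_county):
stopifnot(require(blscrapeR))
# Give the modified copy access to blscrapeR's internals:
environment(bls_map_county2) <- asNamespace("blscrapeR")
# Or replace the original function inside the namespace for this session:
assignInNamespace("bls_map_county", bls_map_county2, ns = "blscrapeR")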
There are two parts to this answer: first a generic answer to your question, and second a specific answer for the particular function you reference, in which the problem is something slightly different.
1) Generic solution to accessing internal functions when you edit a package function
You should already have access to the package namespace, since you loaded it, so it is only the unexported functions that will give you issues.
I usually just prepend the package name and the ::: operator to the non-exported function calls. That is, find every instance of a call to some_internal_function(), and replace it with PackageName:::some_internal_function(). If several different internal functions are called within the function you are editing, you may need to repeat this for each of the offending calls.
The help page for ::: does contain these warnings
Beware -- use ':::' at your own risk!
and
It is typically a design mistake to use ::: in your code since the corresponding object has probably been kept internal for a good reason. Consider contacting the package maintainer if you feel the need to access the object for anything but mere inspection.
But for what you are doing, in terms of temporarily hacking another function from the same package for your own use, these warnings should be safe to ignore (at your own risk, of course, as it says in the manual).
2) In the case of blscrapeR::bls_map_county()
The offending line in this case is
ggplot2::ggplot() + geom_map(...
in which the package writers have specified the ggplot2 namespace for ggplot(), but forgotten to do so for geom_map(), which is also part of ggplot2 (and not an internal function in blscrapeR).
In this case, just load ggplot2, and you should be good to go.
You may also consider contacting the package maintainer to inform them of this error.
I have this problem. I am creating a new package for R with the name "mypackagefunction"; part of its code is this:
mypackagefunction <- function() {
  ## This is the constructor of my package
  ## 1st step: define variables
  gdata <<- NULL
  # ...
  # below this, there are more functions and code
}
So, I build and reload in RStudio and then run the check, and at this step I receive this warning:
mypackagefunction: no visible binding for '<<-' assignment to ‘gdata’
But when I run my package with:
mypackagefunction()
I can call that variable created by the package, with these results:
> mypackagefunction()
> gdata
NULL
How can I remove this NOTE or warning when I check my package? Or is there another way to define global variables?
There are standard ways to include data in a package - if you want some particular R object to be available to the user of the package, this is what you should do. Data is not limited to data frames and matrices - any R object(s) can be included.
If, on the other hand, your intention was to modify the global environment every time a function is called, then you're doing it wrong. In R's functional programming paradigm, functions return objects that can be assigned into the global environment by the user. Objects don't just "appear" in the global environment, with the programmer hoping that the user both (a) knows to look for them and (b) didn't have any objects of the same name that they wanted to keep (because they just got overwritten). It is possible to write code like this (using <<- as in your question, or explicitly calling assign as in @abhiieor's answer), but it will probably not be accepted to CRAN because it violates CRAN policy.
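A minimal sketch of the functional alternative (names are illustrative):
mypackagefunction <- function() {
  gdata <- NULL   # local, not global
  # ... fill in gdata ...
  gdata           # return it instead of assigning with <<-
}
# The user then assigns the result explicitly:
gdata <- mypackagefunction()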
Another way of defining a global variable is assign('prev_id', id, envir = .GlobalEnv), where id is the value being assigned and prev_id is the global variable.
I have to make an R package that depends on the package data.table. However, if I write a function such as the following in the package,
randomdt <- function() {
  dt <- data.table(random = rnorm(10))
  dt[dt$random > 0]
}
the [ function will use the method for data.frame, not the one for data.table, and therefore the error
Error in `[.data.frame`(x, i) : undefined columns selected
will appear. Usually this would be solved by using get('[.data.table') or a similar method (package::function is the simplest), but that appears not to work. After all, [ is a primitive function and I don't know how its methods work.
So, how can I call the data.table [ function from my package?
Updated based on some feedback from MichaelChirico and comments by Arun and Soheil.
Roughly speaking, there are two approaches you might consider. The first is building the dependency into your package itself, while the second is including lines in your R code that test for the presence of data.table (and possibly even install it automatically if it is not found).
The data.table FAQ specifically addresses this in 6.9, and states that you can ensure that data.table is appropriately loaded by your package by:
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
As noted in the comments, this is common R behavior that is in numerous packages.
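Concretely, option (ii) from the FAQ amounts to excerpts like these in your package files:
## In DESCRIPTION:
Imports: data.table
## In NAMESPACE:
import(data.table)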
An alternative approach is to create specific lines of code which test for and import the required packages as part of your code. This is, I would contend, not the ideal solution given the elegance of using the option provided above. However, it is technically possible.
A simple way of doing this would be to use either require or library to check for the existence of data.table, throwing an error if it cannot be attached. You could even use a simple conditional statement to run install.packages and install what you need if loading fails, as sketched below.
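A minimal sketch of that runtime check (auto-installing like this is fine in a script, though generally discouraged inside package code):
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}
library(data.table)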
Yihui Xie (of knitr fame) has a great post about the difference between library and require here and makes a strong case for just using library in cases where the package is absolutely essential for the upcoming code.
Is there a way, in R, to pop up an error message if a function uses a variable not declared in the body of the function? That is, I want to be able to flag this type of function:
aha <- function(p) {
  return(p + n)
}
You see, if there happens to be an n variable lying around somewhere, aha(p = 2) will give me an "answer", since R will just take n from that mysterious place called the "environment".
If you want to detect such potential problems during the code-writing phase and not during run-time, then the codetools package is your friend.
library(codetools)

aha <- function(p) {
  return(p + n)
}

# Check a specific function:
checkUsage(aha)

# Check all loaded functions:
checkUsageEnv(.GlobalEnv)
These will tell you that there is no visible binding for global variable ‘n’.
Richie's suggestion is very good.
I would just add that you should consider creating unit test cases that run in a clean R environment. That will also eliminate the concern about global variables and ensures that your functions behave the way they should. You might want to consider using RUnit for this. I have my test suite scheduled to run every night in a new environment using Rscript, and that's very effective at catching any kind of scope issue.
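A minimal RUnit sketch (file and object names are illustrative, and aha() is assumed to be defined elsewhere and sourced into the test session):
library(RUnit)
test.aha <- function() {
  # In a clean session there is no global n, so aha() should error
  checkException(aha(p = 2))
}
# Run in a fresh session, e.g.:
# Rscript -e 'library(RUnit); source("aha.R"); runTestFile("test_aha.R")'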
Writing R codes to check other R code is going to be tricky. You'd have to find a way to determine which bits of code were variable declarations, and then try and work out whether they'd already been declared within the function.
EDIT: The previous statement would have been true, but as Aniko pointed out, the hard work has already been done in the codetools package.
One related thing that may be useful to you is to force a variable to be taken from the function itself (rather than from an enclosing environment).
This modified version of your function will always fail, since n is not declared.
aha <- function(p)
{
  n <- get("n", inherits = FALSE)
  return(p + n)
}