When creating an R package, there are at least two alternatives for referencing functions in imported packages.
Either,
Explicitly name the function using the double colon operator whenever you call it, package::function.
Add importFrom(package, function) to the NAMESPACE file, either directly or via an #' #importFrom package function roxygen tag.
What are the advantages and disadvantages of each method?
Are there any technical differences in what each syntax achieves?
Arguments in favour of using package::function
It makes it completely clear where the function has come from.
Arguments in favour of using #importFrom package function
It involves less typing, particularly when a function is used many times by your package.
Since it involves looking up the package and a call to the :: function, package::function has a small runtime performance penalty. See https://stackoverflow.com/a/7283511/134830.
On balance, what's the verdict?
Both methods do the job and arguments either way aren't overwhelming, so don't lose sleep over this. Just pick one method and stick to it.
The policy that has been adopted at my place of work is that for a few commonly used packages, #importFrom roxygen tags should be used. For example, developers are expected to know that ddply comes from plyr, or functions beginning str_ come from stringr. In this case, the explicit parentage of the function isn't as useful to know. For functions outside this core list, (or if there is any ambiguity) :: should be used to make it clear where it came from.
Related
I'm writing some R functions that employ some useful functions in other packages like stringr and base64enc. Is it good not to call library(...) or require(...) to load these packages first but to use :: to directly refer to the function I need, like stringr::str_match(...)?
Is it a good practice in general case? Or what problem might it induce?
It all depends on context.
:: is primarily necessary if there are namespace collisions, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default in the stats package. So if I want to use the stats version of the function after loading dplyr, I'll need to call it with stats::filter.
This also gives motivation for not loading lots of packages. If you really only want one function from a package, it can be better to use :: than load the whole package, especially if you know the package will mask other functions you want to use.
Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".
From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote (on the r-devel mailing list (Sept 2017))
Many people seem to forget that every use of :: is an R
function call and using it is inefficient compared to just using
the already imported name.
The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
As far as portability goes, it's nice to explicitly load packages at the top of a script because it makes it easy to glance at the first few lines and see what packages are needed, installing them if necessary before getting too deep in anything else, i.e., getting halfway through a long process that now can't be completed without starting over.
Aside: a similar argument can be made to prefer library() over require(). Library will cause an error and stop if the package isn't there, whereas require will warn but continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package you should use library(package) at the top so it fails early and clearly.
Within your own package
The general solution is to make your own package that imports the other packages you need to use in the DESCRIPTION file. Those packages will be automatically installed when your package is installed, so you can use pkg::fun internally. Or, by also importing them in the NAMESPACE file, you can import an entire package or selectively importFrom specific functions and not need ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:
Personally I've got the impression that :: is
much "overused" nowadays, notably in packages where I'd strongly
advocate using importFrom() in NAMESPACE, so all this happens
at package load time, and then not using :: in the package
sources itself.
On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:
It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.
If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add #author and #references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.
When making your own package for R, one often wants to make use of functions from a different package.
Maybe it's a plotting library like ggplot2, dplyr, or some niche function.
However, when making a function that depends on functions in other packages, what is the appropriate way to call them? In particular, I am looking for examples of when to use
myFunction <- function(x) {
example_package::function(x)
}
or
require(example_package)
myFunction <- function(x) {
function(x)
}
When should I use one over the other?
If you're actually creating an R package (as opposed to a script to source, R Project, or other method), you should NEVER use library() or require(). This is not an alternative to using package::function(). You are essentially choosing between package::function() and function(), which as highlighted by #Bernhard, explicitly calling the package ensures consistency if there are conflicting names in two or more packages.
Rather than require(package), you need to worry about properly defining your DESCRIPTION and NAMESPACE files. There's many posts about that on SO and elsewhere, so won't go into details, see here for example.
Using package::function() can help with above if you are using roxygen2 to generate your package documentation (it will automatically generate a proper NAMESPACE file.
The douple-colon variant :: has a clear advantage in the rare situations, when the same function name is used by two packages. There is a function psych::alpha to calculate Cronbach's alpha as a measure of internal consistency and a function scales::alpha to modify color transparency. There are not that many examples but then again, there are examples. dplyr even masks functions from the stats and base package! (And the tidyverse is continuing to produce more and more entries in our namespaces. Should you use dyplr you do not know, if the base function you use today will be masked by a future version of dplyr thus leading to an unexpected runtime problem of your package in the future.)
All of that is no problem if you use the :: variant. All of that is not a problem if in your package the last package opened is the one you mean.
The require (or library) variant leads to overall shorter code and it is obvious, at what time and place in the code the problem of a not-available package will lead to an error and thus become visible.
In general, both work well and you are free to choose, which of these admittedly small differences appears more important to you.
I am making an R-Package and I'm struggling with the import of infix functions like %>%, := or %dopar%.
In the DESCRIPTION-file I use the Imports: <otherPackage> (e.g. Imports: doParallel) notion. Within the code I use the package::function() (e.g. dplyr::mutate()) notion, which seems to work (R CMD check is pleased) but how do I import infix functions?
The #importFrom (e.g #' #importFrom magrittr %>%) roxygen way seems to work for %>%, := and %dopar%. But since it is copied over into the NAMSEPACE-file, adding the #importFrom to one function solves the problem package-wide, which seems rather "hacky".
What is the best practice to import such functions into my package?
I'm not sure if there's a single best practice in this case.
Using #importFrom to update the NAMESPACE file is indeed to be a package-wide directive,
but I've never come across a package that had problems with that,
or reasons to avoid it.
You can annotate several functions with the same #importFrom directive if you like,
denoting which functions use which imports,
and it won't cause any conflicts;
it's entirely up to you though,
a single one would suffice.
Using #import might be frowned upon,
but I think it really depends on which package you import.
From your question I gather you use :: explicitly
(which I would personally say is good practice),
and then you don't even need to alter the NAMESPACE.
For most cases that would be just fine,
though there can be very special cases that usually need to be considered individually.
These special cases, at least in my experience, are usually related with S4 generics.
Take for instance the base::rowSums function:
it is not a generic function in base,
but if the Matrix package is attached,
rowSums is "transformed" into an S4 generic,
but the generic is not in the base package.
Why that's the case is beyond the scope of this answer
(see ?Methods_for_Nongenerics for more information),
but it means that if your package uses the notation base::rowSums,
it would not dispatch to methods from Matrix.
The only way to support both cases
(i.e. when Matrix is not used by the user and when it is)
would be to use rowSums without base::.
Now, regarding infix operators,
if you wanted to use ::,
you'd need something like base::`%in%`("a", c("a", "b")),
which essentially entails using it as a function and losing the infix syntax,
something you probably don't want.
So unless you have very specific reasons to avoid one or the other,
just use whatever notation you prefer.
I'd personally stick to :: as much as possible,
but would never use it for infix operators.
In Python we have chance to import a certain function from a library with a command "import function from library as smth. Do we have something similar in R?
I know that we can call the function like "library::function()", my question mostly refers to the "as" part.
It is not common and not necessary to do this in R. The assignment operator <- can be used to give a new name to an existing function. For example, one could define a function that does exactly the same as lubridate's, year() function with:
asYear <- lubridate::year
One could argue that, by doing so, the year() function has been "imported" from the lubridate package and that it is now called asYear(). In fact, the new function does just the same (which is no surprise, simply because it is the same):
asYear(Sys.Date())
#[1] 2016
So it is possible to construct an analogy to "from package import as", but it is not a good idea to do this. Here are a few reasons I can think of:
Debugging a code where library functions have been renamed will be
much more difficult.
The documentation is not available for the renamed function. In this example, ?asYear won't work, in contrast to ?lubridate::year or library(lubridate); help(year).
The function is not only renamed but it is copied, which clutters the environment and is inefficient in terms of memory usage.
The maintenance of the code becomes unnecessarily difficult. If another programmer (or the original programmer a few years later) looks at a code containing such a redefinition of a function, it will be harder for her or him to understand what this function is doing.
There are probably more reasons, but I hope that this is sufficient to discourage the use of such a construction. Different programming languages have different peculiarities, and as a programmer it is necessary to adapt to them. What is common in Python can be awkward in R, and vice versa.
A simple and commonly used way to handle such a standard situation in R is to use library() to load the entire namespace of the package containing the requested function:
library (lubridate)
year(Sys.Date())
However, one should be aware of possible namespace clashes, especially if many libraries are loaded simultaneously. Different functions could be defined with the same name in different packages. A well-known example thereof are the contrasting implementations of the lag() function in the dplyr and stats package.
In such cases one can use the double colon operator :: to resolve the namespace that should be addressed. This would be similar to the use of "from" in the case of "import", but such a specification would be needed each time the function is called.
lubridate::year(Sys.Date())
#[1] 2016
I'm writing some R functions that employ some useful functions in other packages like stringr and base64enc. Is it good not to call library(...) or require(...) to load these packages first but to use :: to directly refer to the function I need, like stringr::str_match(...)?
Is it a good practice in general case? Or what problem might it induce?
It all depends on context.
:: is primarily necessary if there are namespace collisions, functions from different packages with the same name. When I load the dplyr package, it provides a function filter, which collides with (and masks) the filter function loaded by default in the stats package. So if I want to use the stats version of the function after loading dplyr, I'll need to call it with stats::filter.
This also gives motivation for not loading lots of packages. If you really only want one function from a package, it can be better to use :: than load the whole package, especially if you know the package will mask other functions you want to use.
Not in code, but in text, I do find :: very useful. It's much more concise to type stats::filter than "the filter function from the stats package".
From a performance perspective, there is a (very) small price for using ::. Long-time R-Core development team member Martin Maechler wrote (on the r-devel mailing list (Sept 2017))
Many people seem to forget that every use of :: is an R
function call and using it is inefficient compared to just using
the already imported name.
The performance penalty is very small, on the order of a few microseconds, so it's only a concern when you need highly optimized code. Running a line of code that uses :: one million times will take a second or two longer than code that doesn't use ::.
As far as portability goes, it's nice to explicitly load packages at the top of a script because it makes it easy to glance at the first few lines and see what packages are needed, installing them if necessary before getting too deep in anything else, i.e., getting halfway through a long process that now can't be completed without starting over.
Aside: a similar argument can be made to prefer library() over require(). Library will cause an error and stop if the package isn't there, whereas require will warn but continue. If your code has a contingency plan in case the package isn't there, then by all means use if (require(package)) ..., but if your code will fail without a package you should use library(package) at the top so it fails early and clearly.
Within your own package
The general solution is to make your own package that imports the other packages you need to use in the DESCRIPTION file. Those packages will be automatically installed when your package is installed, so you can use pkg::fun internally. Or, by also importing them in the NAMESPACE file, you can import an entire package or selectively importFrom specific functions and not need ::. Opinions differ on this. Martin Maechler (same r-devel source as above) says:
Personally I've got the impression that :: is
much "overused" nowadays, notably in packages where I'd strongly
advocate using importFrom() in NAMESPACE, so all this happens
at package load time, and then not using :: in the package
sources itself.
On the other hand, RStudio Chief Scientist Hadley Wickham says in his R Packages book:
It's common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). Unless there is a strong reason not to, it's better to be explicit.
With two esteemed R experts giving opposite recommendations, I think it's fair to say that you should pick whichever style suits you best and meets your needs for clarity, efficiency, and maintainability.
If you frequently find yourself using just one function from another package, you can copy the code and add it to your own package. For example, I have a package for personal use that borrows %nin% from the Hmisc package because I think it's a great function, but I don't often use anything else from Hmisc. With roxygen2, it's easy to add #author and #references to properly attribute the code for a borrowed function. Also make sure the package licenses are compatible when doing this.