Importing functions in R - r

In Python we can import a specific function from a library under a new name with a command like "from library import function as smth". Do we have something similar in R?
I know that we can call the function like "library::function()", my question mostly refers to the "as" part.

It is not common and not necessary to do this in R. The assignment operator <- can be used to give a new name to an existing function. For example, one could define a function that does exactly the same as lubridate's year() function with:
asYear <- lubridate::year
One could argue that, by doing so, the year() function has been "imported" from the lubridate package and that it is now called asYear(). In fact, the new function does just the same (which is no surprise, simply because it is the same):
asYear(Sys.Date())
#[1] 2016
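The same aliasing can be tried without installing lubridate. The following is only a sketch: stats::median stands in for lubridate::year, and the name as_median is made up for illustration.

```r
# Give an existing function a second name (as_median is a hypothetical name).
as_median <- stats::median

# Both names refer to the very same function object.
identical(as_median, stats::median)

# The alias behaves exactly like the original.
as_median(c(1, 3, 2))
```

Since identical() compares the two objects directly, it is a quick way to check that an alias really points at the function you meant.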
So it is possible to construct an analogy to "from package import as", but it is not a good idea to do this. Here are a few reasons I can think of:
Debugging code in which library functions have been renamed will be much more difficult.
The documentation is not available for the renamed function. In this example, ?asYear won't work, in contrast to ?lubridate::year or library(lubridate); help(year).
The new name adds another binding for the same function, which clutters the environment. (R does not actually duplicate the function in memory here; both names refer to the same object, so the cost is clutter rather than memory.)
The maintenance of the code becomes unnecessarily difficult. If another programmer (or the original programmer a few years later) looks at code containing such a redefinition of a function, it will be harder for her or him to understand what this function is doing.
There are probably more reasons, but I hope that this is sufficient to discourage the use of such a construction. Different programming languages have different peculiarities, and as a programmer it is necessary to adapt to them. What is common in Python can be awkward in R, and vice versa.
A simple and commonly used way to handle such a standard situation in R is to use library() to load the entire namespace of the package containing the requested function:
library(lubridate)
year(Sys.Date())
However, one should be aware of possible namespace clashes, especially if many libraries are loaded simultaneously. Different functions could be defined with the same name in different packages. A well-known example is the pair of contrasting implementations of the lag() function in the dplyr and stats packages.
In such cases one can use the double colon operator :: to resolve the namespace that should be addressed. This would be similar to the use of "from" in the case of "import", but such a specification would be needed each time the function is called.
lubridate::year(Sys.Date())
#[1] 2016
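To see what such a clash looks like without installing dplyr, here is a minimal sketch in which a user-defined lag() (a made-up stand-in for a masking package) shadows the stats version. The double colon still reaches the original.

```r
# A hypothetical definition that shadows stats::lag() on the search path.
lag <- function(x, ...) "my own lag"

x <- ts(1:5)

# The unqualified call now finds the shadowing definition first.
lag(x)

# The qualified call always resolves to the stats implementation.
shifted <- stats::lag(x, 1)
class(shifted)
```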

Related

R Development: Use of `::` Operator for `base` Package

TLDR
Does rigorous best practice recommend that a proactive R developer explicitly disambiguate all base functions — even the ubiquitously common functions like c() or cat() — within the .R files of their package, using the package::function() convention?
Context
Though a novice developer, I need to create a (proprietary) package in R. The text R Packages (by the authoritative authors Hadley Wickham and Jenny Bryan) has proven extremely helpful (if occasionally deprecated).
I am keen on following best practices from the start, so as to save myself time and effort down the road. As described in the text, the use of the :: operator can prevent current and future conflicts, by disambiguating functions whose names are overloaded. To wit, the authors are careful to introduce each function with the package::function() convention, and they recommend its general use within the .R files of one's package.
However, their code examples often call functions that hail from the base package yet are unaccompanied by base::. Many base functions, like the ubiquitous c() or cat(), are used by R programmers in their sleep and (I imagine) are unlikely to ever be overloaded by a presumptuous developer. Nonetheless, it is confusing to see (for example) the juxtaposition of base::with() against (the base function) print(), all within a few lines of text.
...(These functions are inspired by how base::with() works.)
f <- function(x, sig_digits) {
  # imagine lots of code here
  withr::with_options(
    list(digits = sig_digits),
    print(x)
  )
  # ... and a lot more code here
}
I understand that the purpose of base::with() is to unambiguously introduce the with() function to the reader. However, the absence of base:: (within the code itself) seems to stick out like a sore thumb, when the package is explicitly named for any function called from any other package. Given my inexperience, I don't feel comfortable assuming the authors' intent.
Question
Are the names of base functions sufficiently unique that using this convention — of calling base::function() for every function() within the base package — would not be worth it? That the risk of overloading the functions (at some point in the future) is far outweighed by the inconvenience (and sheer ugliness) of
my_vector <- base::c(1, 2, 3)
throughout one's .R files? If not, is there an established convention that would balance unambiguity with elegance?
As always, I am grateful for any help, especially on this, my first post to Stack Overflow.

Using functions from other packages - when to use package::function?

When making your own package for R, one often wants to make use of functions from a different package.
Maybe it's a plotting library like ggplot2, dplyr, or some niche function.
However, when making a function that depends on functions in other packages, what is the appropriate way to call them? In particular, I am looking for examples of when to use
myFunction <- function(x) {
example_package::function(x)
}
or
require(example_package)
myFunction <- function(x) {
function(x)
}
When should I use one over the other?
If you're actually creating an R package (as opposed to a script to source, R Project, or other method), you should NEVER use library() or require(). They are not an alternative to using package::function(). You are essentially choosing between package::function() and function(); as highlighted by @Bernhard, explicitly naming the package ensures consistency if there are conflicting names in two or more packages.
Rather than require(package), you need to worry about properly defining your DESCRIPTION and NAMESPACE files. There are many posts about that on SO and elsewhere, so I won't go into details.
Using package::function() can help with the above if you are using roxygen2 to generate your package documentation (it will automatically generate a proper NAMESPACE file).
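As a rough illustration of calling a dependency without attaching it, here is a sketch that combines requireNamespace() with the :: notation. The function name describe_median is made up, and stats merely stands in for whatever package you depend on. For packages listed under Imports in DESCRIPTION the check is not needed; it is the kind of guard typically used for optional (Suggests) dependencies.

```r
# Hypothetical helper: use a package only if it is installed,
# without attaching it to the search path.
describe_median <- function(x) {
  if (!requireNamespace("stats", quietly = TRUE)) {
    stop("Package 'stats' is needed for this function.", call. = FALSE)
  }
  # The explicit prefix makes the origin of median() unambiguous.
  stats::median(x)
}

describe_median(c(5, 1, 3))
```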
The double-colon variant :: has a clear advantage in the rare situations when the same function name is used by two packages. There is a function psych::alpha to calculate Cronbach's alpha as a measure of internal consistency and a function scales::alpha to modify color transparency. There are not that many examples, but then again, there are examples. dplyr even masks functions from the stats and base packages! (And the tidyverse is continuing to produce more and more entries in our namespaces. Should you use dplyr, you do not know whether the base function you use today will be masked by a future version of dplyr, leading to an unexpected runtime problem in your package in the future.)
All of that is no problem if you use the :: variant. All of that is not a problem if in your package the last package opened is the one you mean.
The require (or library) variant leads to overall shorter code, and it makes obvious at what time and place in the code an unavailable package will raise an error and thus become visible.
In general, both work well and you are free to choose which of these admittedly small differences appears more important to you.

R: How to best import infix operators like %>% into my package?

I am making an R-Package and I'm struggling with the import of infix functions like %>%, := or %dopar%.
In the DESCRIPTION file I use the Imports: <otherPackage> (e.g. Imports: doParallel) notation. Within the code I use the package::function() (e.g. dplyr::mutate()) notation, which seems to work (R CMD check is pleased), but how do I import infix functions?
The @importFrom (e.g. #' @importFrom magrittr %>%) roxygen way seems to work for %>%, := and %dopar%. But since it is copied over into the NAMESPACE file, adding the @importFrom to one function solves the problem package-wide, which seems rather "hacky".
What is the best practice to import such functions into my package?
I'm not sure if there's a single best practice in this case.
Using @importFrom to update the NAMESPACE file is indeed a package-wide directive,
but I've never come across a package that had problems with that,
or reasons to avoid it.
You can annotate several functions with the same @importFrom directive if you like,
denoting which functions use which imports,
and it won't cause any conflicts;
it's entirely up to you though,
a single one would suffice.
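One common way to make the package-wide nature explicit is to attach the roxygen tags (spelled @importFrom) to a NULL object in a single file instead of to an arbitrary function. This is only a sketch: the file name is hypothetical, and it assumes magrittr and foreach are listed under Imports in DESCRIPTION.

```r
# R/imports.R  (hypothetical file name)
# All infix imports of the package are collected in one place.

#' @importFrom magrittr %>%
#' @importFrom foreach %dopar%
NULL
```

After regenerating the documentation with roxygen2, the NAMESPACE file should contain one importFrom() entry per operator, and the operators can be used with infix syntax anywhere in the package.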
Using @import might be frowned upon,
but I think it really depends on which package you import.
From your question I gather you use :: explicitly
(which I would personally say is good practice),
and then you don't even need to alter the NAMESPACE.
For most cases that would be just fine,
though there can be very special cases that usually need to be considered individually.
These special cases, at least in my experience, are usually related to S4 generics.
Take for instance the base::rowSums function:
it is not a generic function in base,
but if the Matrix package is attached,
rowSums is "transformed" into an S4 generic,
but the generic is not in the base package.
Why that's the case is beyond the scope of this answer
(see ?Methods_for_Nongenerics for more information),
but it means that if your package uses the notation base::rowSums,
it would not dispatch to methods from Matrix.
The only way to support both cases
(i.e. when Matrix is not used by the user and when it is)
would be to use rowSums without base::.
Now, regarding infix operators,
if you wanted to use ::,
you'd need something like base::`%in%`("a", c("a", "b")),
which essentially entails using it as a function and losing the infix syntax,
something you probably don't want.
So unless you have very specific reasons to avoid one or the other,
just use whatever notation you prefer.
I'd personally stick to :: as much as possible,
but would never use it for infix operators.
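To make the trade-off concrete, here is a small sketch using only base R. The operator name %isin% is made up to show that a local binding keeps the infix syntax.

```r
# Usual infix use of the operator.
"a" %in% c("a", "b")

# The same call with an explicit namespace prefix: the infix form is lost.
base::`%in%`("a", c("a", "b"))

# A local binding (hypothetical name) restores infix syntax.
`%isin%` <- base::`%in%`
"a" %isin% c("a", "b")
```

All three calls give the same result; only the last two state explicitly where the operator comes from.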

How can I prevent a library from masking functions

A typical situation is the following:
library(dplyr)
library(xgboost)
When I import the library xgboost, the function slice of dplyr is masked, and I have to write dplyr::slice even though I never use xgboost::slice explicitly.
The obvious solution to the problem is to import xgboost before dplyr. But it is crazy to have to import, in advance, every library that can affect the functions of dplyr. Moreover, this problem often happens when I use the caret library: its train function automatically imports the libraries it requires, and some functions get masked at that point.
Is it possible to prevent some functions from being masked?
Is it possible to mask "the masking function" (e.g. xgboost::slice) with an early imported function (e.g. dplyr::slice)?
Notes
I am NOT asking how to disable warning message.
I am NOT asking how to use the masked functions.
The next version of R has this in the NEWS{.Rd} file (quoted from the NEWS file post-build):
• The import() namespace directive now accepts an argument except
which names symbols to exclude from the imports. The except
expression should evaluate to a character vector (after
substituting symbols for strings). See Writing R Extensions.
The referenced text from the manual is here (in raw texi format).
So soon we can. Right now one cannot, and that is a huge pain in the aRse, particularly when functions from base R packages are being masked: lag(), filter(), ...
We have used the term anti-social for this behaviour in the past. I don't think it is too strong.
To illustrate the problem, here is a snippet of code I wrote a decade ago (and had it posted on the now-vanished R Graph Gallery) which uses a clever and fast way to compute a moving average:
## create a (normalised, but that's just candy) weight vector
weights <- rep(1/ndays, ndays)
## and apply it as a one-sided moving average calculation, see help(filter)
bbmiddle <- as.vector(filter(dat$Close, weights,
                             method="convolution", sides=1))
If you do library(dplyr) as you might in an interactive session, you're dead in the water as filter() is now something completely different. Not nice.
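A version of the snippet that survives library(dplyr) simply qualifies the call. The sketch below uses made-up closing prices and a window of three days, since the original dat and ndays are not shown.

```r
# Made-up data for illustration only.
close_prices <- c(1, 2, 3, 4, 5, 6)
ndays <- 3

# Equal weights that sum to one.
weights <- rep(1 / ndays, ndays)

# stats::filter() is named explicitly, so attaching dplyr cannot change
# which filter() is called. sides = 1 uses past values only.
bbmiddle <- as.vector(stats::filter(close_prices, weights,
                                    method = "convolution", sides = 1))
bbmiddle
```

The first ndays - 1 entries are NA because a full window of past values is not yet available.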
Is it possible to prevent some functions from being masked?
I don't believe so, but I could be wrong. I'm not sure what this would look like.
Is it possible to mask "the masking function" (e.g. xgboost::slice) with an early imported function (e.g. dplyr::slice)?
If you're asking just about use in an interactive session, you can always define slice to be the function you actually want to use, like so:
slice <- dplyr::slice
and then you can use slice as if it is the dplyr version (because now it is).
The solution is to manage your namespace like it is common to do in other languages. You can selectively import dplyr functions:
select <- dplyr::select
For convenience you can also import the whole package and selectively reimport functions from previously attached packages:
library("dplyr")
filter <- stats::filter
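A quick way to check which implementation a rebound name refers to is to ask for the environment of the function. This sketch uses only base R and assumes nothing beyond the rebinding shown above.

```r
# Rebind the name in the current environment to the stats version.
filter <- stats::filter

# The function lives in the stats namespace, whatever else is attached.
environmentName(environment(filter))

# The new binding and the qualified name are the same object.
identical(filter, stats::filter)
```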
R has a great module system and attaching whole namespaces is especially handy for interactive use. It does require a bit of manual adjusting if the preferences of the package authors do not match yours.
Note that in packages and long-term maintenance scripts you should privilege selective imports, in part because it is hard to predict new exported functions in future releases. Having several packages imported in bulk might give rise to unexpected masking over time.
More generally a good rule is to rely on a single attached package and selectively import the rest. To this end the tidyverse package might be handy if you're a heavy tidyverse user because it provides a single import point for several packages.
Finally it seems from your question that you think that the order of attached packages might have side effects inside other packages. This is nothing to worry about because all packages have their own contexts. The import scheme will only affect your script.
You can also now use the conflict_prefer() function from the conflicted package to specify which package's function should "win" and which should be masked when there are conflicting function names (details here). In your example, you would run
conflict_prefer("slice", "dplyr", "xgboost")
right after loading your libraries. Then when you run slice, it will default to using dplyr::slice rather than xgboost::slice. Or you can simply run
conflict_prefer("slice", "dplyr")
if you want to give dplyr::slice precedence over all other packages' slice functions.
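Another option, assuming R 3.6.0 or later, is to control what library() attaches in the first place through its exclude and include.only arguments. The sketch below uses the tools package, which ships with R, as a stand-in so that it runs without extra installations. The xgboost line is left as a comment because it depends on that package being installed.

```r
# Attach only file_ext() from the 'tools' package; nothing else from
# tools is placed on the search path.
library(tools, include.only = "file_ext")
file_ext("report.csv")

# For the situation in the question, the analogous call would be
# (not run here, requires xgboost):
# library(xgboost, exclude = "slice")
```

With exclude, everything except the named functions is attached, so dplyr::slice stays visible no matter in which order the packages are loaded.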
I know this is a silly answer and this thread is very old (but I only had the same issue today): I changed the order in which the packages are loaded. Personally, I was having a problem with the "select" function of MASS and dplyr. I always want to use the dplyr version, so I loaded MASS first!

How should I reference functions in imported packages?

When creating an R package, there are at least two alternatives for referencing functions in imported packages.
Either,
Explicitly name the function using the double colon operator whenever you call it, package::function.
Add importFrom(package, function) to the NAMESPACE file, either directly or via an #' @importFrom package function roxygen tag.
What are the advantages and disadvantages of each method?
Are there any technical differences in what each syntax achieves?
Arguments in favour of using package::function
It makes it completely clear where the function has come from.
Arguments in favour of using @importFrom package function
It involves less typing, particularly when a function is used many times by your package.
Since it involves looking up the package and a call to the :: function, package::function has a small runtime performance penalty. See https://stackoverflow.com/a/7283511/134830.
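The lookup that :: performs can be made visible with base R alone. In this sketch, median_once is a made-up name for a binding that is resolved a single time, which is roughly what an importFrom() entry gives you inside a package.

```r
# pkg::name is, in effect, a lookup of an exported object at call time.
via_lookup <- getExportedValue("stats", "median")
identical(via_lookup, stats::median)

# Resolving the function once and reusing the binding avoids repeating
# the lookup on every call (hypothetical name).
median_once <- stats::median
median_once(c(1, 2, 9))
```

The difference per call is small, so it only matters in tight loops.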
On balance, what's the verdict?
Both methods do the job and arguments either way aren't overwhelming, so don't lose sleep over this. Just pick one method and stick to it.
The policy that has been adopted at my place of work is that for a few commonly used packages, @importFrom roxygen tags should be used. For example, developers are expected to know that ddply comes from plyr, or that functions beginning with str_ come from stringr. In this case, the explicit parentage of the function isn't as useful to know. For functions outside this core list (or if there is any ambiguity), :: should be used to make it clear where the function came from.

Resources