Possible duplicate: What is the benefit of import in a namespace in R?
I noticed that some answers on SO contain the use of pkg::name where name is typically a function.
What is the advantage of this over library(pkg); ... name() or require(pkg); ... name()? The R help page (help("::")) says:
For a package pkg, pkg::name returns the value of the exported variable name in namespace pkg, ... The namespace will be loaded if it was not loaded before the call, but the package will not be attached to the search path.
Does this mean that the function is used without the additional memory cost of loading the entire package (i.e., is it equivalent to from <package> import <function> in Python)? Or is it simply a means of telling R to use the function from this package when there may be ambiguities?
My question relates to the use of :: in an Rscript or directly in the console, and so is not a duplicate of the linked question: the OP there is discussing the use of functions from the stats4 package during a package development project. However, some answers in that post do shed some light on my question. Thanks for the link. (Note the following discussion on Meta: duplicates flag)
It avoids namespace collisions but it still has to load the pkg.
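The load-versus-attach distinction from the help text can also be checked without measuring memory, by comparing loadedNamespaces() with search(). A minimal sketch, using the tools package only because it ships with every R installation (any installed but unattached package would do):

```r
# Access one export via ::, then inspect what happened to the tools package
ext <- tools::file_ext("data.csv")
print(ext)                                # "csv"

# The namespace is now loaded (its code is in memory) ...
print("tools" %in% loadedNamespaces())    # TRUE

# ... but the package is not attached, so file_ext() alone is not visible
print("package:tools" %in% search())      # FALSE
print(exists("file_ext"))                 # FALSE
```

In other words, :: protects you from name clashes, not from loading the package's code.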
For example, I ran this:
pryr::mem_used()
dplyr::filter(mtcars, cyl==4)
pryr::mem_used()
in one R instance and:
pryr::mem_used()
library(dplyr)
filter(mtcars, cyl==4)
pryr::mem_used()
in another.
mem before/after for the 1st was: 27.7 MB / 30.6 MB
mem before/after for the 2nd was: 27.7 MB / 30.7 MB
I didn't do multiple tests or check whether the difference was due to rounding or something else, but no, there were no real savings there IMO.
There are two main reasons why I use this notation:
Disambiguation: Some packages provide functions that have the same name as functions in base R or in other packages. Attaching such packages therefore hides the earlier functions behind the new ones. This effect is referred to as "masking". In such cases I consider it better coding style to use the notation package::function() in order to make clear which of the homonymous functions is used, even if it is the one from the most recently attached package and the prefix is therefore not necessary for obtaining the desired output. If a function is masked by another package, this notation is the simplest way to address it.
For example, the raster package contains (and, via library(raster), attaches) a function called stack() which is different from the base R function stack() in the utils package. In order to still use stack() from base R once raster has been attached, it should be called with utils::stack().
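Masking can be reproduced without installing raster: any object named stack that comes earlier on the search path than package:utils hides the utils version. In this sketch, a throwaway definition in the global environment plays the role of raster::stack(); the list x is made-up example data.

```r
# A stand-in for a masking function (raster::stack() in the text above)
stack <- function(...) "masking stack()"

x <- list(a = 1:2, b = 3)

stack(x)                 # finds the global definition first
res <- utils::stack(x)   # explicitly asks for the utils version
print(res)               # a data frame with columns values and ind

rm(stack)                # remove the stand-in again
```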
Accessing functions that are not exported from the namespace: This case is much less frequent and slightly different. Some packages foo.pkg contain functions which are not exported, so they are not made available by library(foo.pkg). As a result, library(foo.pkg) does not help in accessing these functions.
The only example where I encounter this situation on a regular basis is the quite useful function cbind.na() from the qpcR package. It can only be accessed by specifying qpcR:::cbind.na(). Note that three colons are required in this case.
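The difference between :: and ::: can be seen with a function that ships with R, so qpcR is not needed. This sketch assumes, as is the case in current R versions to my knowledge, that stats registers print.lm() as an S3 method without exporting it:

```r
# print.lm is defined in the stats namespace but not exported
print("print.lm" %in% getNamespaceExports("stats"))   # FALSE

# ::: reaches into the namespace regardless of exports
f <- stats:::print.lm
print(is.function(f))                                 # TRUE

# :: refuses, because the object is not exported
err <- tryCatch(stats::print.lm, error = function(e) conditionMessage(e))
print(err)
```

Since unexported functions are not part of a package's public interface, they can change or disappear between releases, so ::: is best kept to interactive use or to code you are prepared to maintain.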
Upon reconsideration, there could also be a third reason for me to use this notation: code compactness. If I know that I will only need one specific function of a package, possibly only once in the code, without being interested in the rest of the functionalities offered by that package, then I find that the notation package::function() is preferable. This may not give any tangible advantage in R, but I adopted this style from the general advice in other programming languages to avoid namespace pollution.
Related
When making your own package for R, one often wants to make use of functions from a different package.
Maybe it's a plotting library like ggplot2, a data-manipulation package like dplyr, or some niche function.
However, when making a function that depends on functions in other packages, what is the appropriate way to call them? In particular, I am looking for examples of when to use
myFunction <- function(x) {
example_package::some_function(x)
}
or
require(example_package)
myFunction <- function(x) {
some_function(x)
}
When should I use one over the other?
If you're actually creating an R package (as opposed to a script to source, an R Project, or another method), you should NEVER use library() or require(). This is not an alternative to using package::function(). You are essentially choosing between package::function() and function(). As highlighted by @Bernhard, explicitly calling the package ensures consistency if there are conflicting names in two or more packages.
Rather than require(package), you need to worry about properly defining your DESCRIPTION and NAMESPACE files. There are many posts about that on SO and elsewhere, so I won't go into details; see here for example.
Using package::function() can help with the above if you are using roxygen2 to generate your package documentation (roxygen2 will then automatically generate a proper NAMESPACE file).
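A minimal sketch of the package-side pattern: the body uses an explicit stats:: prefix instead of library() or require(), and the roxygen tags are what roxygen2 would translate into an export() entry in NAMESPACE. The function name robust_median() and the DESCRIPTION fragment shown in the comments are hypothetical.

```r
# DESCRIPTION (hypothetical fragment) would declare the dependency:
#   Imports: stats

#' Median of a numeric vector, ignoring missing values
#'
#' @param x A numeric vector.
#' @return The median of `x` with `NA`s removed.
#' @export
robust_median <- function(x) {
  # No library()/require() inside package code: the prefix states where
  # median() comes from and works whether or not stats is attached.
  stats::median(x, na.rm = TRUE)
}

robust_median(c(1, NA, 3, 5))   # 3
```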
The double-colon variant :: has a clear advantage in the rare situations when the same function name is used by two packages. There is a function psych::alpha to calculate Cronbach's alpha as a measure of internal consistency and a function scales::alpha to modify color transparency. There are not that many examples, but then again, there are examples. dplyr even masks functions from the stats and base packages! (And the tidyverse keeps adding more and more entries to our namespaces. If you use dplyr, you cannot know whether a base function you use today will be masked by a future version of dplyr, leading to an unexpected runtime problem in your package in the future.)
None of that is a problem if you use the :: variant. Nor is it a problem if, in your package, the last package attached is the one you mean.
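To check which names are currently masked, base R offers conflicts(). In this sketch a deliberately shadowing sum() in the global environment stands in for a package that masks a base function:

```r
# Shadow base::sum() on purpose
sum <- function(...) "shadowed"

# detail = TRUE returns a list with one element per search-path entry,
# holding the names that clash with an object elsewhere on the path
masked <- conflicts(detail = TRUE)
print("sum" %in% masked[[".GlobalEnv"]])   # TRUE

# The explicit prefix is unaffected by the masking
print(base::sum(1:10))                      # 55

rm(sum)
```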
The require (or library) variant leads to overall shorter code, and it is obvious at what time and place in the code a missing package will cause an error and thus become visible.
In general, both work well and you are free to choose which of these admittedly small differences matters more to you.
Suppose I wish to find instances of the use of one or more functions in the code of base R or contributed packages, for the purpose of better understanding idiomatic use of those functions. That is to say, I want to do a code search for the places where a function is used, not for the places where that function is defined. So I would want to include, e.g., unexported functions.
Ideally I would like to do RegEx matching so as to find functions with similar names that might serve a parallel function. I would also like to be able to restrict the output based on R's logical tests of output type, to find, e.g., only functions, or some finer subdivisions, such as is.primitive() or is.closure(), or (from rlang) is_primitive_eager() or is_primitive_lazy().
I note that some of the kinds of search I am asking about exist for package documentation in the sos package. Also, I know that grep searches can be done on the names of exported functions of loaded packages, as here: Searching functions using grep over multiple loaded packages in R, and Jim Hester's lookup package finds function definitions in CRAN packages even if they are not installed. See also Ben Bolker's answer, here: Name of a package for a given function in R. But none of these methods will search for function usage as opposed to function definition.
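For installed packages, a rough base-R starting point is possible: walk every object in a namespace (exported or not), keep only closures (primitives have no R-level body), and grep their deparsed bodies for calls whose name matches a regular expression. The helper find_usage() is hypothetical and purely textual, so it can report false positives (for example a match inside a string) and will miss calls built dynamically with do.call() or match.fun().

```r
# Hypothetical helper: names of closures in `env` whose body contains a
# call to a function whose name matches the regular expression `pattern`.
find_usage <- function(env, pattern) {
  call_regex <- paste0("\\b(", pattern, ")\\s*\\(")
  hits <- character(0)
  for (nm in ls(env, all.names = TRUE)) {
    obj <- get(nm, envir = env)
    # Primitives have no R-level body; keep closures only
    if (!is.function(obj) || is.primitive(obj)) next
    src <- deparse(body(obj))
    if (any(grepl(call_regex, src, perl = TRUE))) hits <- c(hits, nm)
  }
  hits
}

# Toy environment standing in for a namespace
e <- new.env()
e$f <- function(x) stats::sd(x) / mean(x)
e$g <- function(x) mean(x)
e$h <- sum  # a primitive, skipped
find_usage(e, "sd")

# On a real namespace, unexported functions included, e.g.:
# find_usage(asNamespace("stats"), "vapply|sapply")
```

Filtering by type (is.primitive(), rlang's is_primitive_eager(), and so on) would slot in at the point where non-closures are skipped.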
A typical situation is the following:
library(dplyr)
library(xgboost)
When I attach xgboost, dplyr's slice() is masked, and I have to write dplyr::slice even though I never use xgboost::slice explicitly.
The obvious solution to the problem is to load xgboost before dplyr. But it is crazy to load in advance every package that could affect the functions of dplyr. Moreover, this problem often happens when I use the caret package: its train() function automatically loads the packages it requires, and some functions get masked at that point.
Is it possible to prevent some functions from being masked?
Is it possible to mask "the masking function" (e.g. xgboost::slice) with an early imported function (e.g. dplyr::slice)?
Notes
I am NOT asking how to disable warning message.
I am NOT asking how to use the masked functions.
The next version of R has this in the NEWS{.Rd} file (quoted from the NEWS file post-build):
• The import() namespace directive now accepts an argument except
which names symbols to exclude from the imports. The except
expression should evaluate to a character vector (after
substituting symbols for strings). See Writing R Extensions.
The referenced text from the manual is here (in raw texi format).
So soon we can. Right now one cannot, and that is a huge pain in the aRse, particularly when functions from base R packages are being masked: lag(), filter(), ...
We have used the term anti-social for this behaviour in the past. I don't think it is too strong.
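For reference, a sketch of both mechanisms. The NAMESPACE line illustrates the import(..., except = ...) directive from the NEWS entry above (dplyr and the excluded names are only examples, and it is shown as a comment because it belongs in a package's NAMESPACE file). For scripts and interactive sessions, library() has since gained include.only and exclude arguments (R 3.6.0 and later); the runnable part uses tools only because it ships with R.

```r
# In a package's NAMESPACE file (not evaluated here):
#   import(dplyr, except = c(filter, lag))

# In a script: attach only file_ext() from tools, nothing else
library(tools, include.only = "file_ext")
print(file_ext("report.csv"))        # "csv"
print(exists("md5sum"))              # FALSE: md5sum() was not attached ...
print(is.function(tools::md5sum))    # TRUE: ... but is still reachable via ::

# The opposite direction: attach everything except some names, e.g.
#   library(dplyr, exclude = c("filter", "lag"))
```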
To illustrate the problem, here is a snippet of code I wrote a decade ago (and had it posted on the now-vanished R Graph Gallery) which uses a clever and fast way to compute a moving average:
## create a (normalised, but that's just candy) weight vector
weights <- rep(1/ndays, ndays)
## and apply it as a one-sided moving average calculation, see help(filter)
bbmiddle <- as.vector(filter(dat$Close, weights,
method="convolution", sides=1))
If you do library(dplyr) as you might in an interactive session, you're dead in the water as filter() is now something completely different. Not nice.
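A defensive variant of the snippet calls stats::filter() explicitly, so it keeps working whatever gets attached later. In this sketch a throwaway global filter() stands in for dplyr::filter(), and a short made-up price vector replaces dat$Close:

```r
# Stand-in for a masking filter(), e.g. the one dplyr attaches
filter <- function(...) stop("not stats::filter")

prices <- c(1, 2, 3, 4, 5)   # hypothetical closing prices
ndays <- 2
weights <- rep(1 / ndays, ndays)

# Explicit namespace: the one-sided convolution filter from stats
ma <- as.vector(stats::filter(prices, weights,
                              method = "convolution", sides = 1))
print(ma)   # NA 1.5 2.5 3.5 4.5

rm(filter)
```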
Is it possible to prevent some functions from being masked?
I don't believe so, but I could be wrong. I'm not sure what this would look like.
Is it possible to mask "the masking function" (e.g. xgboost::slice) with an early imported function (e.g. dplyr::slice)?
If you're asking about use in an interactive session, you can always define slice to be the function you actually want to use, like so:
slice <- dplyr::slice
and then you can use slice as if it is the dplyr version (because now it is).
The solution is to manage your namespace like it is common to do in other languages. You can selectively import dplyr functions:
select <- dplyr::select
For convenience you can also import the whole package and selectively reimport functions from previously attached packages:
library("dplyr")
filter <- stats::filter
R has a great module system and attaching whole namespaces is especially handy for interactive use. It does require a bit of manual adjusting if the preferences of the package authors do not match yours.
Note that in packages and long-term maintenance scripts you should privilege selective imports, in part because it is hard to predict new exported functions in future releases. Having several packages imported in bulk might give rise to unexpected masking over time.
More generally a good rule is to rely on a single attached package and selectively import the rest. To this end the tidyverse package might be handy if you're a heavy tidyverse user because it provides a single import point for several packages.
Finally it seems from your question that you think that the order of attached packages might have side effects inside other packages. This is nothing to worry about because all packages have their own contexts. The import scheme will only affect your script.
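This can be checked with base R alone. Functions inside a package look up names in their own namespace (then its imports and base) before the global environment. In this sketch a deliberately broken median() is defined at top level, yet stats::mad(), whose default center argument is median(x), still uses the real median:

```r
# A deliberately broken median() in the global environment
median <- function(x, ...) stop("global median() was called")

x <- c(1, 2, 3, 4, 100)

# mad()'s default argument center = median(x) is evaluated inside the
# stats namespace, so it finds stats::median(), not the global one
res <- stats::mad(x)
print(res)   # 1.4826, i.e. 1.4826 * median(abs(x - 3))

rm(median)
```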
You can also now use the conflict_prefer() function from the conflicted package to specify which package's function should "win" and which should be masked when there are conflicting function names (details here). In your example, you would run
conflict_prefer("slice", "dplyr", "xgboost")
right after loading your libraries. Then when you run slice, it will default to using dplyr::slice rather than xgboost::slice. Or you can simply run
conflict_prefer("slice", "dplyr")
if you want to give dplyr::slice precedence over all other packages' slice functions.
I know this is a silly answer and this thread is very old (but I only had the same issue today): I changed the order in which I load the packages. Personally, I was having a problem with the select() function in MASS and dplyr. I always want to use the dplyr version, so I load MASS first!
In Python we can import a certain function from a library under another name with a command like "from library import function as smth". Do we have something similar in R?
I know that we can call the function like "library::function()", my question mostly refers to the "as" part.
It is not common and not necessary to do this in R. The assignment operator <- can be used to give a new name to an existing function. For example, one could define a function that does exactly the same as lubridate's year() function with:
asYear <- lubridate::year
One could argue that, by doing so, the year() function has been "imported" from the lubridate package and that it is now called asYear(). In fact, the new function does just the same (which is no surprise, simply because it is the same):
asYear(Sys.Date())
#[1] 2016
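That the alias "is the same" function can be verified directly. In this sketch stats::median stands in for lubridate::year so that nothing needs to be installed; asMedian is an illustrative name:

```r
asMedian <- stats::median

# Same closure: identical code, formals and enclosing environment
print(identical(asMedian, stats::median))       # TRUE

# It still belongs to the stats namespace, which is where it looks up names
print(environmentName(environment(asMedian)))   # "stats"

print(asMedian(c(3, 1, 2)))                     # 2
```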
So it is possible to construct an analogy to "from package import as", but it is not a good idea to do this. Here are a few reasons I can think of:
Debugging code where library functions have been renamed will be much more difficult.
The documentation is not available for the renamed function. In this example, ?asYear won't work, in contrast to ?lubridate::year or library(lubridate); help(year).
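The function is not only renamed but also bound to a new name in your environment, which clutters that environment. (Strictly speaking, R does not copy the function here: both names refer to the same object, so the memory overhead is negligible.)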
The maintenance of the code becomes unnecessarily difficult. If another programmer (or the original programmer a few years later) looks at code containing such a redefinition of a function, it will be harder for her or him to understand what this function is doing.
There are probably more reasons, but I hope that this is sufficient to discourage the use of such a construction. Different programming languages have different peculiarities, and as a programmer it is necessary to adapt to them. What is common in Python can be awkward in R, and vice versa.
A simple and commonly used way to handle such a standard situation in R is to use library() to attach the entire package containing the requested function:
library(lubridate)
year(Sys.Date())
However, one should be aware of possible namespace clashes, especially if many libraries are loaded simultaneously. Different functions could be defined with the same name in different packages. A well-known example is the pair of contrasting lag() functions in the dplyr and stats packages.
In such cases one can use the double colon operator :: to resolve the namespace that should be addressed. This would be similar to the use of "from" in the case of "import", but such a specification would be needed each time the function is called.
lubridate::year(Sys.Date())
#[1] 2016
[Revised based on suggestion of exporting names.]
I have been working on an R package that is nearing 100 functions, maybe more.
I want to have, say, 10 visible functions and each may have 10 "invisible" sub-functions.
Is there an easy way to select which functions are visible, and which are not?
Also, in the interest of avoiding 'diff', is there a command like "all.equal" that can be applied to two different packages to see where they differ?
You can make a file called NAMESPACE in the base directory of your package. In this you can define which functions you want to export to the user, and you can also import functions from other packages. Exporting will make a function usable, and import will transfer a function from another package to you without making it available to the user (useful if you just need one function and don't want to require your users to load another package when they load yours).
A truncated part of my package's NAMESPACE:
useDynLib(qgraph)
export(qgraph)
(...)
importFrom(psych,"principal")
(...)
import(plyr)
which respectively loads the compiled functions, makes the function qgraph() available, imports from psych the principal function and imports from plyr all functions that are exported in plyr's NAMESPACE.
For more details read:
http://cran.r-project.org/doc/manuals/R-exts.pdf
I think you should organise your package and code the way you feel most comfortable with; it is your package after all. NAMESPACE can be used to control what gets exposed or not to the user up-front, as others have mentioned, and you don't need to document all the functions, just the main user-called functions, by adding \alias{} tags to the Rd files for all the support functions you don't want people to know too much about, or by hiding them in a package.internals.Rd man page.
That being said, if you want people to help develop your package, or run with it and do amazing things, the better organised it is, the easier that job will be. So lay out your functions logically: perhaps one file per function, named after the function, or group all the related functions into a single R file, for example. But be consistent in whichever approach you take.
If you have generic functions that have more general use, consider splitting those functions out into a separate package that others can use, without having to depend on your mega package with the extra cruft that is more specific. Your package can then depend on this generic package, as can packages of other authors. But don't split packages up just for the sake of making them smaller.
The answer is almost certainly to create a package. Some rules of thumb may help in your design choice:
A package should solve one problem
If you have functions that solve a different problem, put them in a separate package
For example, have a look at the ggplot2 package:
ggplot2 is a package that creates wonderful graphics
It imports plyr, a package that gives a consistent syntax and approach to solve the Split, Apply, Combine problem
It depends on reshape2, a package with only a few functions that turns wide data into long, and vice versa.
The point is that all of these packages were written by a single author, i.e. Hadley Wickham.
If you do decide to make a package, you can control the visibility of your functions:
Only functions that are exported are directly visible in the namespace
You can additionally mark some functions with the keyword internal, which will prevent them appearing in automatically generated lists of functions.
If you decide to develop your own package, I strongly recommend the devtools package, and reading the devtools wiki
If your reformulated question is about 'how to organise large packages', then this may apply:
NAMESPACE allows for very fine-grained exporting of functions: your users would see only the 10 visible functions
even the invisible functions are accessible if you or your users know their names; that is done via the ::: triple-colon operator
packages do come in all sizes and shapes; one common rule about 'when to split' may be: as soon as you have functionality that is useful in different contexts
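To see the split between what a NAMESPACE exports and what stays internal but reachable with :::, compare a namespace's full contents with its export list. tools is used here only because it ships with R; for your own package you would pass its name instead:

```r
ns <- asNamespace("tools")

everything <- ls(ns, all.names = TRUE)       # all objects in the namespace
exported   <- getNamespaceExports("tools")   # what library() and :: expose
internal   <- setdiff(everything, exported)  # reachable only via :::

cat(length(exported), "exported,", length(internal), "internal\n")
head(sort(internal))
```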
As for diff on packages: Huh? Packages are not usually all that close so that one would need a comparison function. The diff command is indeed quite useful on source code. You could use a hash function on binary code if you really wanted to but I am still puzzled as to why one would want to.
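If a quick comparison is wanted anyway, for example between two development copies of the same code sourced into separate environments (e.g. with sys.source()), a rough sketch could list names present on only one side and names whose definitions differ. compare_defs() is a hypothetical helper, not an existing function, and v1/v2 below are toy environments:

```r
# Hypothetical helper: compare the objects defined in two environments
compare_defs <- function(a, b) {
  na <- ls(a, all.names = TRUE)
  nb <- ls(b, all.names = TRUE)
  common <- intersect(na, nb)
  # ignore.environment: the two copies live in different environments
  # ignore.srcref: compare code, not source references
  differs <- vapply(common, function(n) {
    !identical(get(n, envir = a), get(n, envir = b),
               ignore.environment = TRUE, ignore.srcref = TRUE)
  }, logical(1))
  list(only_in_a = setdiff(na, nb),
       only_in_b = setdiff(nb, na),
       changed   = common[differs])
}

v1 <- new.env(); v2 <- new.env()
evalq({ f <- function(x) x + 1; g <- function(x) x * 2 }, v1)
evalq({ f <- function(x) x + 1; g <- function(x) x * 3; h <- function() NULL }, v2)
compare_defs(v1, v2)   # h only in v2, g changed
```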