This question is related to Rcmd check in R-Devel (3.1.0).
I am maintaining a package, call it A, that "Depends" on another package, call it B. I have used "Depends" instead of "Imports" for the following reasons:
Most people using package A also use package B.
Package A makes extensive use of functions from package B.
Package B itself "Depends" on other packages.
Package A makes use of unexported functions of package B, and Rcmd check complains with the following: Unexported objects imported by ':::' calls (as a NOTE, but I want to remove all of them). My question is: how should I properly handle this note? How should I properly make use of unexported functions from another package?
I know that we should not use unexported functions, but I am involved in the development of both packages.
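For readers unsure what the NOTE is objecting to, here is a minimal sketch of the exported/unexported distinction. It uses t.test.default from the stats package purely as a stand-in for an unexported function in package B; it only illustrates what ':::' does and is not meant as a recommended way to silence the NOTE.

```r
# t.test.default is an S3 method defined in stats but not exported from it
# (a stand-in here for an unexported function in "package B").
is_exported <- "t.test.default" %in% getNamespaceExports("stats")

# Exported-only lookup (the kind of lookup '::' performs) fails for it.
double_colon_fails <- inherits(
  try(getExportedValue("stats", "t.test.default"), silent = TRUE),
  "try-error"
)

# ':::' reaches into the namespace itself; this is what the check flags.
fn <- stats:::t.test.default
is.function(fn)
```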
Related
I would like to know of some packages that make use of assignInMyNamespace and what it's used for, if it is even advisable to use this function in production code. The help page gives the following information:
assignInMyNamespace is intended to be called from functions within a package, and chooses the namespace as the environment of the function calling it.
However it also gives the following warning about assignInNamespace:
assignInNamespace should not be used in final code, and will in future throw an error if called from a package. Already certain uses are disallowed.
Presumably this is because packages shouldn't ever try to change the namespaces of other packages, which is why the warning doesn't apply to assignInMyNamespace. Is this true?
NB: I am developing a package with an unexported testing function that allows any unexported function in the package to be temporarily replaced with one that saves its inputs and outputs. I am also considering such a technique for switching between memoised / un-memoised versions of functions.
EDIT: In practise, assignInMyNamespace only changes unexported functions - not 'any function in the package' as previously stated. I only realised this recently, and it's actually thrown a spanner in the works with the package I'm developing. Therefore I would also be very interested to know if there is a solution to the problem that works across both exported and unexported functions during package use.
When writing an R package, I use namespaces to call functions from existing packages, e.g. raster::writeRaster(...).
However, I am wondering whether functions from the base package also have to be called like this, e.g. base::sum(...). This might end up producing very confusing code:
foo[base::which(base::sapply(bar, function(x) ...))]
No, you don't need to reference base packages like this. You only need to reference non-base packages, to ensure they are available to your functions when they are run, either by using :: or by using an @import tag in the roxygen2 comments at the top of your script. See why you don't need to reference base packages below:
http://adv-r.had.co.nz/Environments.html
"Package namespaces keep packages independent. For example, if package A uses the base mean() function, what happens if package B creates its own mean() function? Namespaces ensure that package A continues to use the base mean() function, and that package A is not affected by package B (unless explicitly asked for)."(Hadley Wickham)
The only time you need to write base:: is when your package's namespace imports a package that defines an alternative function of the same name.
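A minimal sketch of that one exception, run at the top level rather than inside a package: the locally defined sum() below is a hypothetical stand-in for a same-named function that your package has imported from elsewhere.

```r
# A definition of sum() that sits ahead of base on the lookup path,
# standing in for a same-named function your package has imported.
sum <- function(...) "not base"

unqualified <- sum(1, 2)        # finds the masking definition
qualified   <- base::sum(1, 2)  # always the base version

rm(sum)  # remove the mask again
```

Without such a clash, the unqualified call already resolves to base, so the prefix adds nothing but noise.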
I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding #' @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I use the function.
Any pointers would be appreciated!
(This question might be relevant.)
If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need, should be in the package namespace because this is where R will look first for the definition of functions, before then traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
And you can just refer to ldply() without the plyr:: prefix just as if it were another function in your package.
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated because it implies more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should use @import plyr and refer to the function as plyr::ldply(). But in reality, it's actually suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is only preferred for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just use @importFrom, as WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.
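The two styles discussed in this answer can be sketched side by side. To keep the sketch runnable without extra installs, stats::median stands in for plyr::ldply, and the function names centre_qualified and centre_imported are hypothetical.

```r
# Style 1: fully qualified call. The package is listed under Imports: in
# DESCRIPTION, and NAMESPACE contains no import directive for it.
centre_qualified <- function(x) {
  stats::median(x, na.rm = TRUE)
}

# Style 2: formal import. roxygen2 turns the tag below into
# importFrom(stats, median) in NAMESPACE, so the bare name resolves
# inside the package regardless of what the user has attached.
#' @importFrom stats median
centre_imported <- function(x) {
  median(x, na.rm = TRUE)
}

centre_qualified(c(1, 3, NA, 2))
centre_imported(c(1, 3, NA, 2))
```

Run as a plain script, the second function only works because stats is attached by default; inside a package it is the importFrom directive that makes the bare name safe.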
The "Writing R Extensions" manual provides the following guidance on when to use Imports or Depends:
The general rules are
Packages whose namespace only is needed to load the package using library(pkgname) must be listed in the ‘Imports’ field and not in the
‘Depends’ field.
Packages that need to be attached to successfully load the package using library(pkgname) must be listed in the ‘Depends’ field, only.
Can someone provide a bit more clarity on this? How do I know when my package only needs namespaces loaded versus when I need a package to be attached? What are examples of both? I think the typical package is just a collection of functions that sometimes call functions in other packages (where some bit of work has already been coded-up). Is this scenario 1 or 2 above?
Edit
I wrote a blog post with a section on this specific topic (search for 'Imports v Depends'). The visuals make it a lot easier to understand.
"Imports" is safer than "Depends" (and also makes a package using it a 'better citizen' with respect to other packages that do use "Depends").
A "Depends" directive attempts to ensure that a function from another package is available by attaching the other package to the main search path (i.e. the list of environments returned by search()). This strategy can, however, be thwarted if another package, loaded later, places an identically named function earlier on the search path. Chambers (in SoDA) uses the example of the function "gam", which is found in both the gam and mgcv packages. If two other packages were loaded, one of them depending on gam and one depending on mgcv, the function found by calls to gam() would depend on the order in which those two packages were attached. Not good.
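The order dependence can be reproduced without installing either package. In this sketch, two plain lists (hypothetical stand-ins, not the real gam and mgcv packages) are attached to the search path in both orders:

```r
# Stand-ins for two packages that both export gam(); the names are hypothetical.
fake_gam  <- list(gam = function() "gam from 'gam'")
fake_mgcv <- list(gam = function() "gam from 'mgcv'")

# Order 1: 'mgcv' attached last, so it sits earlier on the search path.
attach(fake_gam,  name = "fake:gam",  warn.conflicts = FALSE)
attach(fake_mgcv, name = "fake:mgcv", warn.conflicts = FALSE)
order1 <- gam()
detach("fake:mgcv", character.only = TRUE)
detach("fake:gam",  character.only = TRUE)

# Order 2: the same two "packages", attached the other way round.
attach(fake_mgcv, name = "fake:mgcv", warn.conflicts = FALSE)
attach(fake_gam,  name = "fake:gam",  warn.conflicts = FALSE)
order2 <- gam()
detach("fake:gam",  character.only = TRUE)
detach("fake:mgcv", character.only = TRUE)

c(order1, order2)  # the same call, two different functions
```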
An "Imports" directive should be used for any supporting package whose functions are to be placed in <imports:packageName> (searched immediately after <namespace:packageName>), instead of on the regular search path. If either one of the packages in the example above used the "Imports" mechanism (which also requires import or importFrom directives in the NAMESPACE file), matters would be improved in two ways. (1) The package would itself gain control over which mgcv function is used. (2) By keeping the main search path clear of the imported objects, it would not even potentially break the other package's dependency on the other mgcv function.
This is why using namespaces is such a good practice, why it is now enforced by CRAN, and (in particular) why using "Imports" is safer than using "Depends".
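The lookup order described above can be inspected directly. This sketch walks the enclosing environments of stats::sd, chosen only because stats ships with every R installation; any function from a package with a namespace should show the same shape.

```r
ns      <- environment(stats::sd)   # where sd() looks first
imports <- parent.env(ns)           # then the package's imports
base_ns <- parent.env(imports)      # then the base namespace
after   <- parent.env(base_ns)      # only then the workspace and search path

chain <- c(environmentName(ns), environmentName(imports),
           environmentName(base_ns), environmentName(after))
print(chain)
# "stats" "imports:stats" "base" "R_GlobalEnv"
```

Anything attached with library(), including packages pulled in through "Depends", is only reached after the last of these environments.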
Edited to add an important caveat:
There is one unfortunately common exception to the advice above: if your package relies on a package A which itself "Depends" on another package B, your package will likely need to attach A with a "Depends" directive.
This is because the functions in package A were written with the expectation that package B and its functions would be attached to the search() path.
A "Depends" directive will load and attach package A, at which point package A's own "Depends" directive will, in a chain reaction, cause package B to be loaded and attached as well. Functions in package A will then be able to find the functions in package B on which they rely.
An "Imports" directive will load but not attach package A and will neither load nor attach package B. ("Imports", after all, expects that package writers are using the namespace mechanism, and that package A will be using "Imports" to point to any functions in B that it needs access to.) Calls by your functions to any functions in package A which rely on functions in package B will consequently fail.
The only two solutions are to either:
Have your package attach package A using a "Depends" directive.
Better in the long run, contact the maintainer of package A and ask them to do a more careful job of constructing their namespace (in the words of Martin Morgan in this related answer).
Hadley Wickham gives an easy explanation (http://r-pkgs.had.co.nz/namespace.html):
Listing a package in either Depends or Imports ensures that it’s
installed when needed. The main difference is that where Imports just
loads the package, Depends attaches it. There are no other
differences. [...]
Unless there is a good reason otherwise, you should always list
packages in Imports not Depends. That’s because a good package is
self-contained, and minimises changes to the global environment
(including the search path). The only exception is if your package is
designed to be used in conjunction with another package. For example,
the analogue package builds on top of vegan. It’s not useful without
vegan, so it has vegan in Depends instead of Imports. Similarly,
ggplot2 should really Depend on scales, rather than Importing it.
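The load-versus-attach difference in that quote can be observed in a fresh session. This is a sketch using the tools package, picked only because it ships with R and is not attached by default; it mimics what "Imports" and "Depends" do rather than going through a real DESCRIPTION file.

```r
pkg_env <- "package:tools"

# What 'Imports' amounts to: load the namespace, leave the search path alone.
loadNamespace("tools")
loaded        <- "tools" %in% loadedNamespaces()
attached_then <- pkg_env %in% search()
visible_then  <- exists("file_ext")             # unqualified lookup from the workspace
ext           <- tools::file_ext("report.csv")  # qualified lookup still works

# What 'Depends' amounts to: additionally attach the package.
library(tools)
attached_now <- pkg_env %in% search()
visible_now  <- exists("file_ext")
```

After loadNamespace() the function is reachable only through tools::, whereas after library() the end user can call file_ext() directly, which is the whole practical difference between the two fields.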
Chambers in SoDA says to use 'Imports' when the package uses the namespace mechanism; since all packages are now required to have a namespace, the answer might now be to always use 'Imports'. In the past, packages could be loaded without actually having a namespace, and in that case you would have needed to use 'Depends'.
Here is a simple question to help you decide which to use:
Does your package require the end user to have direct access to the functions of another package?
NO -> Imports (most common answer)
YES -> Depends
The only time you should use 'Depends' is when your package is an add-on or companion to another package, where your end user will be using functions from both your package and the 'Depends' package in their code. If your end user will only be interfacing with your functions, and the other package will only be doing work behind the scenes, then use 'Imports' instead.
The caveat to this is that if you add a package to 'Imports', as you usually should, your code will need to refer to functions from that package using the full namespace syntax, e.g. dplyr::mutate(), instead of just mutate(), unless you also import them in your NAMESPACE file with import() or importFrom(). It makes the code a little clunkier to read, but it’s a small price to pay for better package hygiene.