R with roxygen2: How to use a single function from another package?

I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding #' @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I use the function.
Any pointers would be appreciated!
(This question might be relevant.)

If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need should be in the package namespace, because this is where R will look first for the definition of functions, before traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
You can then refer to ldply() without the plyr:: prefix, just as if it were another function in your package.
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated because it implies more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should use @import plyr and refer to the function as plyr::ldply(). But in reality, it's suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is only preferred for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just use @importFrom, as WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.
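The "loaded, but not attached" behaviour of :: described in the WRE quote can be checked directly. The following is a minimal sketch that uses splines (a package shipped with R) as a stand-in for plyr; the variable names are illustrative only.

```r
# 'splines' ships with R and stands in for plyr here (an assumption made
# so that the sketch runs without installing anything).
was_loaded <- isNamespaceLoaded("splines")

# Evaluating pkg::fun loads the namespace if it was not loaded already
basis_fun <- splines::bs

now_loaded <- isNamespaceLoaded("splines")
# Attaching is a separate step that only library()/require() perform
attached <- "package:splines" %in% search()

cat("loaded before:", was_loaded,
    "| loaded after:", now_loaded,
    "| attached:", attached, "\n")
```

The namespace ends up loaded after the :: call, while the search path is left alone unless the package had already been attached by something else.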

Related

How to declare a dependency on an R package from which you only use S3/S4 methods, but no exports?

Currently I have in my package DESCRIPTION, a dependency on dbplyr:
Imports:
dbplyr,
dplyr
dbplyr is useful almost solely because of the S3 methods it defines: https://github.com/tidyverse/dbplyr/blob/main/NAMESPACE. The actual functions you call to use dbplyr are almost entirely from dplyr.
By putting dbplyr in my Imports, it should automatically get loaded, but not attached, which should be enough to register its S3 methods: https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-attach-vs-load.
This seems to work fine, but whenever I R CMD check, it tells me:
N checking dependencies in R code (10.8s)
Namespace in Imports field not imported from: ‘dbplyr’
All declared Imports should be used.
Firstly, why does R CMD check even check this, considering that it often makes sense to load packages without importing them? Secondly, how am I supposed to satisfy R CMD check without loading things into my namespace that I don't want or need?
I am pretty sure two of your assumptions are false.
First, putting Imports: dbplyr into your DESCRIPTION file won't load it, so its methods won't be loaded from that alone. Basically the Imports field in the DESCRIPTION file just guarantees that dbplyr is available to be loaded when requested. If you import something via the NAMESPACE file, that will cause it to be loaded. If you evaluate dbplyr::something that will cause it to be loaded. Executing loadNamespace("dbplyr") is another way, and there are a few others. You may also load some other package that loads it.
Second, I think you have misinterpreted the error message. It isn't saying that you loaded it without importing it (though it would complain about that too), it is saying that it can't detect any use of it in your package, so maybe it shouldn't be a requirement for installing your package.
Unfortunately, the code to detect uses is fallible, so it sometimes misses uses. Examples I've heard about are:
if the package is only used in the default value for a function argument. This has been fixed in R-devel.
if the package is only used during the build to construct some object, e.g. code like someclass <- R6::R6Class( ... ) needs R6, but the check code won't see it because it looks at someclass, not at the source code that created it.
if the use of the package is hidden by specifying the name of the package in a character variable.
if the need for the package is indirect, e.g. you need to use ggplot2::geom_hex. That needs the hexbin package, but ggplot2 only declares it as "Suggested".
These examples come from this discussion: https://github.com/hadley/r-pkgs/issues/828#issuecomment-1421353457 .
The recommended workaround there is to create an object that refers to the imported package explicitly, e.g. putting the line
dummy_r6 <- function() R6::R6Class
into your package is enough to suppress the note without actually loading R6. (It will be loaded if you ever call this function.)
However, your requirement is stronger: you do need to make sure dbplyr is loaded if you want its methods to be used. I'd put something in your .onLoad() function that triggers the load. For example,
.onLoad <- function(lib, pkg) {
# Make sure the dbplyr methods are loaded
loadNamespace("dbplyr")
}
EDITED TO ADD: As pointed out in the comments, there's a bug in the check code that means it won't detect this as being a use of dbplyr. You really need to do both things, e.g.
.onLoad <- function(lib, pkg) {
# Make sure the dbplyr methods are loaded
loadNamespace("dbplyr")
# Work around bug in code checking in R 4.2.2 for use of packages
dummy <- function() dbplyr::across_apply_fns
}
The function used in the dummy construction is arbitrary; it probably doesn't even need to exist, but I chose one that does.
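To see why loading the namespace is enough for method registration, here is a small sketch that uses splines (shipped with R) in place of dbplyr. It assumes only that splines registers a predict method for class "bs" in its NAMESPACE; the variable names are illustrative.

```r
# 'splines' stands in for dbplyr: it registers S3 methods such as
# predict.bs for a generic (predict) that lives in another package (stats).
registered_before <- !is.null(
  utils::getS3method("predict", "bs", optional = TRUE)
)

# Loading the namespace registers its S3 methods; nothing is attached
loadNamespace("splines")

registered_after <- !is.null(
  utils::getS3method("predict", "bs", optional = TRUE)
)
attached <- "package:splines" %in% search()

cat("registered before load:", registered_before,
    "| after load:", registered_after,
    "| attached:", attached, "\n")
```

The same reasoning applies to the loadNamespace("dbplyr") call inside .onLoad(): it makes the methods available for dispatch without changing the user's search path.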

Why library() or require() should not be used in an R package

My goal is to create an R package that uses other packages such as grid and ggplot2.
According to
https://tinyheero.github.io/jekyll/update/2015/07/26/making-your-first-R-package.html, library() or require() should not be used in an R package.
My questions are:
1) Is there a reason? (Although I put library("ggplot2") and library("grid") in an R script in my package, it still worked.)
2) Do I have to delete library("ggplot2") and library("grid") from my code and use "::" instead, as in ggplot2::geom_segment()?
Is there an efficient way to convert a script into package code?
You should never use library() or require() in a package, because they affect the user's search list, possibly causing errors for the user.
For example, both the dplyr and stats packages export a function called filter. If a user had only library(stats), then filter would mean stats::filter, but if your package called library(dplyr), the user might suddenly find that filter means dplyr::filter, and things would break.
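The masking effect can be reproduced without dplyr. The sketch below attaches a made-up environment named "fake_pkg" that contains its own filter, which is roughly what a library() call inside a package would do to the user's search path; the name and the function body are purely illustrative.

```r
# Where does `filter` resolve before anything extra is attached?
before <- environmentName(environment(filter))

# Simulate a package being attached by someone else's library() call.
# "fake_pkg" is a hypothetical name used only for this illustration.
attach(list(filter = function(...) "masked by fake package"),
       name = "fake_pkg", warn.conflicts = FALSE)

during <- filter(1:3)   # the user's call now hits the attached version

# Clean up so the search path is back to what it was
detach("fake_pkg", character.only = TRUE)
after <- environmentName(environment(filter))

cat("before:", before, "| during:", during, "| after:", after, "\n")
```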
There are a couple of alternatives for your package. You can import functions from another package by listing it in the Imports: field in the DESCRIPTION file and specifying the imports in the NAMESPACE file. (The roxygen2 package can make these changes for you automatically if you put appropriate comments in your .R source files, e.g.
#' @importFrom jsonlite toJSON unbox
before a function that uses those to import toJSON() and unbox() from the jsonlite package.)
The other way to do it is using the :: notation. Then you can still list a package in the Imports: field of DESCRIPTION, but use code like
jsonlite::toJSON(...)
every time you want to call it. Alternatively, if you don't want a strong dependence on jsonlite, you can put jsonlite in Suggests:, and wrap any uses of it in code like
if (requireNamespace("jsonlite", quietly = TRUE)) {
jsonlite::toJSON(...)
}
Then people who don't have that package will still be able to run your function, but it may skip some operations that require jsonlite.
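A fuller version of the Suggests: pattern might look like the sketch below. The helper name to_text and the deparse() fallback are assumptions made for illustration; the point is only that the function still returns something when jsonlite is not installed.

```r
# Hypothetical helper: use jsonlite when it is installed, otherwise
# fall back to a plain deparse() of the object.
to_text <- function(x) {
  if (requireNamespace("jsonlite", quietly = TRUE)) {
    # Optional dependency is available: use it via ::
    as.character(jsonlite::toJSON(x))
  } else {
    # Optional dependency is missing: degrade gracefully
    paste(deparse(x), collapse = "")
  }
}

out <- to_text(list(a = 1))
print(out)
```

Whether to fall back silently, warn, or stop with an informative error is a design choice that depends on how essential the optional feature is.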

Name space of base package needed?

Writing an R-package I use name spaces to use functions from existing packages, e.g. raster::writeRaster(...).
However, I am wondering whether functions from the base package also have to be called like this, e.g. base::sum(...). This might end up in very confusing code:
foo[base::which(base::sapply(bar, function(x) ...))]
No, you don't need to reference base packages like this. You only need to reference non-base packages to ensure their functions are available when functions from your package are run, either by using :: or an @import tag in the roxygen comments of your script. See why you don't need to reference base packages below:
http://adv-r.had.co.nz/Environments.html
"Package namespaces keep packages independent. For example, if package A uses the base mean() function, what happens if package B creates its own mean() function? Namespaces ensure that package A continues to use the base mean() function, and that package A is not affected by package B (unless explicitly asked for)."(Hadley Wickham)
The only time you need to reference base:: is if your package defines or imports a function with the same name as a base function.
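The independence that the quote describes can be observed with a function from the stats package. In this sketch the user masks sqrt() in the calling environment, yet stats::sd(), which calls sqrt() internally, keeps using the base version because it looks names up through its own namespace.

```r
# The user masks a base function in their own workspace
sqrt <- function(x) "masked"
user_result <- sqrt(4)                # the user's own version is used here

# stats::sd() calls sqrt() internally, but resolves it via
# namespace:stats -> imports:stats -> base namespace, so it is unaffected
pkg_result <- stats::sd(c(1, 2, 3))

rm(sqrt)                              # remove the mask again

print(user_result)
print(pkg_result)
```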

How to use S3 methods from another package which uses export rather than S3method in its namespace without using Depends or library()

I'm working on an R package at present and trying to follow the best practice guidelines provided by Hadley Wickham at http://r-pkgs.had.co.nz. As part of this, I'm aiming to have all of the package dependencies within the Imports section of the DESCRIPTION file rather than the Depends since I agree with the philosophy of not unnecessarily altering the global environment (something that many CRAN and Bioconductor packages don't seem to follow).
I want to use functions within the Bioconductor package rhdf5 within one of my package functions, in particular h5write(). The issue I've now run into is that it doesn't have its S3 methods declared as such in its NAMESPACE. They are declared using (e.g.)
export(h5write.default)
export(h5writeDataset.matrix)
rather than
S3method(h5write, default)
S3method(h5writeDataset, matrix)
The generic h5write is defined as:
h5write <- function(obj, file, name, ...) {
res <- UseMethod("h5write")
invisible(res)
}
In practice, this means that calls to rhdf5::h5write fail because there is no appropriate h5write method registered.
As far as I can see, there are three solutions to this:
Use Depends rather than Imports in the DESCRIPTION file.
Use library("rhdf5") or require("rhdf5") in the code for the relevant function.
Amend the NAMESPACE file for rhdf5 to use S3method() rather than export().
All of these have disadvantages. Option 1 means that the package is loaded and attached to the global environment even if the relevant function in my package is never called. Option 2 means using library() in a package, which again attaches the package to the global environment and is also discouraged by Hadley Wickham's guidelines. Option 3 would mean relying on the other package author to update their package on Bioconductor, and also means that the S3 methods are no longer exported, which could in turn break other packages that rely on calling them explicitly.
Have I missed another alternative? I've looked elsewhere on StackOverflow and found the following somewhat relevant questions Importing S3 method from another package and
How to export S3 method so it is available in namespace? but nothing that directly addresses my issue. Of note, the key difference from the first of these two is that the generic and the method are both in the same package, but the issue is the use of export rather than S3method.
Sample code to reproduce the error (without needing to create a package):
loadNamespace("rhdf5")
rhdf5::h5write(1:4, "test.h5", "test")
Error in UseMethod("h5write") :
no applicable method for 'h5write' applied to an object of class "c('integer', 'numeric')"
Alternatively, there is a skeleton package at https://github.com/NikNakk/s3issuedemo which provides a single function demonstrateIssue() which reproduces the error message. It can be installed using devtools::install_github("NikNakk/s3issuedemo").
The key here is to import the specific methods in addition to the generic you want to use. Here is how you can get it to work for the default method.
Note: this assumes that the test.h5 file already exists.
#' @importFrom rhdf5 h5write.default
#' @importFrom rhdf5 h5write
#' @export
myFun <- function(){
h5write(1:4, "test.h5", "test")
}
I also have put up my own small package demonstrating this here.
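The mechanism can be imitated with plain environments, without rhdf5. In this sketch, pkg plays the role of a package whose method exists but is not registered, and caller plays the role of a package that imports both the generic and the method. All names (pkg, caller, speak, my_fun) are hypothetical.

```r
# 'pkg' stands in for a package namespace: it defines a generic and a
# default method, but never registers the method for S3 dispatch.
pkg <- new.env()
local({
  speak <- function(x, ...) UseMethod("speak")
  speak.default <- function(x, ...) "default method"
}, envir = pkg)

# Calling the generic from a place that cannot see speak.default fails,
# which mirrors the rhdf5::h5write() error in the question
failed <- inherits(try(pkg$speak(1:4), silent = TRUE), "try-error")

# 'caller' stands in for a package that imports the generic AND the method
caller <- new.env()
local({
  speak <- pkg$speak                  # like @importFrom pkg speak
  speak.default <- pkg$speak.default  # like @importFrom pkg speak.default
  my_fun <- function() speak(1:4)
}, envir = caller)

# Dispatch now finds speak.default by searching from the calling function
fixed <- caller$my_fun()

cat("direct call failed:", failed, "| with imported method:", fixed, "\n")
```

This works because UseMethod() searches for methods starting from the environment in which the generic was called before it consults the registered methods of the generic's home.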

R: selective import with importFrom: namespace issues [duplicate]

The "Writing R Extensions" manual provides the following guidance on when to use Imports or Depends:
The general rules are
Packages whose namespace only is needed to load the package using library(pkgname) must be listed in the ‘Imports’ field and not in the
‘Depends’ field.
Packages that need to be attached to successfully load the package using library(pkgname) must be listed in the ‘Depends’ field, only.
Can someone provide a bit more clarity on this? How do I know when my package only needs namespaces loaded versus when I need a package to be attached? What are examples of both? I think the typical package is just a collection of functions that sometimes call functions in other packages (where some bit of work has already been coded-up). Is this scenario 1 or 2 above?
Edit
I wrote a blog post with a section on this specific topic (search for 'Imports v Depends'). The visuals make it a lot easier to understand.
"Imports" is safer than "Depends" (and also makes a package using it a 'better citizen' with respect to other packages that do use "Depends").
A "Depends" directive attempts to ensure that a function from another package is available by attaching the other package to the main search path (i.e. the list of environments returned by search()). This strategy can, however, be thwarted if another package, loaded later, places an identically named function earlier on the search path. Chambers (in SoDA) uses the example of the function "gam", which is found in both the gam and mgcv packages. If two other packages were loaded, one of them depending on gam and one depending on mgcv, the function found by calls to gam() would depend on the order in which those two packages were attached. Not good.
An "Imports" directive should be used for any supporting package whose functions are to be placed in <imports:packageName> (searched immediately after <namespace:packageName>), instead of on the regular search path. If either one of the packages in the example above used the "Imports" mechanism (which also requires import or importFrom directives in the NAMESPACE file), matters would be improved in two ways. (1) The package would itself gain control over which mgcv function is used. (2) By keeping the main search path clear of the imported objects, it would not even potentially break the other package's dependency on the other mgcv function.
This is why using namespaces is such a good practice, why it is now enforced by CRAN, and (in particular) why using "Imports" is safer than using "Depends".
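The lookup order described above (namespace first, then the imports environment, and only later the search path) can be inspected for any loaded package. The sketch below walks the chain of enclosing environments for the stats namespace; stats is used only because it is always available.

```r
# Start at the namespace of an installed package and walk outwards
env <- asNamespace("stats")
chain <- character()
while (!identical(env, emptyenv())) {
  chain <- c(chain, environmentName(env))
  env <- parent.env(env)
}

# The first entries are the namespace, its imports environment, the base
# namespace and the global environment; the rest is the search() path.
print(head(chain, 4))
print(tail(chain, -4))
```

Anything a package gets through "Imports" sits in the second environment of this chain, ahead of everything a user can attach with library().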
Edited to add an important caveat:
There is one unfortunately common exception to the advice above: if your package relies on a package A which itself "Depends" on another package B, your package will likely need to attach A with a "Depends" directive.
This is because the functions in package A were written with the expectation that package B and its functions would be attached to the search() path.
A "Depends" directive will load and attach package A, at which point package A's own "Depends" directive will, in a chain reaction, cause package B to be loaded and attached as well. Functions in package A will then be able to find the functions in package B on which they rely.
An "Imports" directive will load but not attach package A and will neither load nor attach package B. ("Imports", after all, expects that package writers are using the namespace mechanism, and that package A will be using "Imports" to point to any functions in B that it needs access to.) Calls by your functions to any functions in package A which rely on functions in package B will consequently fail.
The only two solutions are to either:
Have your package attach package A using a "Depends" directive.
Better in the long run, contact the maintainer of package A and ask them to do a more careful job of constructing their namespace (in the words of Martin Morgan in this related answer).
Hadley Wickham gives an easy explanation (http://r-pkgs.had.co.nz/namespace.html):
Listing a package in either Depends or Imports ensures that it’s
installed when needed. The main difference is that where Imports just
loads the package, Depends attaches it. There are no other
differences. [...]
Unless there is a good reason otherwise, you should always list
packages in Imports not Depends. That’s because a good package is
self-contained, and minimises changes to the global environment
(including the search path). The only exception is if your package is
designed to be used in conjunction with another package. For example,
the analogue package builds on top of vegan. It’s not useful without
vegan, so it has vegan in Depends instead of Imports. Similarly,
ggplot2 should really Depend on scales, rather than Importing it.
Chambers in SoDA says to use 'Imports' when the package uses a 'namespace' mechanism. Since all packages are now required to have one, the answer might now be to always use 'Imports'. In the past, packages could be loaded without actually having namespaces, and in that case you would have needed to use Depends.
Here is a simple question to help you decide which to use:
Does your package require the end user to have direct access to the functions of another package?
NO -> Imports (most common answer)
YES -> Depends
The only time you should use 'Depends' is when your package is an add-on or companion to another package, where your end user will be using functions from both your package and the 'Depends' package in their code. If your end user will only be interfacing with your functions, and the other package will only be doing work behind the scenes, then use 'Imports' instead.
The caveat is that if you add a package to 'Imports', as you usually should, your code must either refer to functions from that package with the full namespace syntax, e.g. dplyr::mutate() instead of just mutate(), or import them in the NAMESPACE file with import() or importFrom(). The :: form makes the code a little clunkier to read, but it's a small price to pay for better package hygiene.

Resources