Run-time vs develop-time dependencies in R - r

I'm developing a package (golem) in R, and it returns a NOTE about excess package in an Import (DESCRIPTION):
checking package dependencies … NOTE
Imports includes 34 non-default packages.
Importing from so many packages makes the package vulnerable to any of
them becoming unavailable. Move as many as possible to Suggests and
use conditionally.
I have allocated some packages in Suggests (DESCRIPTION), like this:
usethis::use_package(package = "ggplot2", type = "Suggests")
usethis::use_package(package = "MASS", type = "Suggests")
I would like to know :
What is the difference between Imports (run-time) vs Suggests (develop-time) and if the latter has anything to do with the term "compile time" of other programming languages.
How do I know a package is needed by the user at runtime? Is there any universal rule for this (like a phrase to help you know)? And for Suggests?

In R, packages listed in the Imports clause of the DESCRIPTION file must be available or your package won't load. Normally they will all be loaded when your package is loaded, though it's possible to delay that by not importing anything, just using :: notation to access them.
Packages listed in the Suggests clause don't need to be available, and won't be automatically loaded. To access their functions, you normally call requireNamespace() to find out if the package is available, and if so use :: for access. If it is not available, your package should fail gracefully in whatever the user was trying to do, letting them know that they need to install the missing package if they want the task to succeed.
These aren't really "run-time" versus "develop-time" differences. It's all run-time.
There are two things in R that might be called "compile-time" in other languages. The best match is installing your package. That configures it to the particular R version and platform it is running on. R also has a "just-in-time" compiler that optimizes functions, but other than a bit of a speed increase that is pretty much invisible to the user.
I think #r2evans answered your second question clearly in a comment: the user needs a package to use functions that use that package. If some of your functions that use it are unlikely to be used by most users, use Suggests, and add the test.

Related

R testthat: use external package only in test file - not in DESCRIPTION

I'm trying to run a testthat script using GitHub Actions.
I would like to test a functionality of my function that allows it to be combined with (many) external packages. Now I want to test these external packages for the R CMD Check but I don't want to load the external packages generally (i.e. putting them into the Description) - after all, most people will not use these external packages.
Any ideas how to just include an external package in the testing files but not in the DESCRIPTION?
Thanks!
I think you describe a very standard use of Suggests.
I see two related but separable issues:
You want to test something using CI, in this case GHA. That is fine. Because you control the execution of the code, you could move your code from the test runner to, say, inst/examples and call it explicitly. That way the standard check of 'is the package using undeclared code' passes as inst/examples is not checked
You want to not force other people to have to load these packages. That is fine too, and we have Suggests: for this! Read Section 1.1 of Writing R Extensions about all the detailed semantics. If your package invokes other packages via tests, the every R CMD check touches this (and the external packages) so they must be declared. But you already know that only "some" people will want to use this "some of the time": that is precisely what Suggests: does, and you bracket the use with if (requireNamespace(pkgHere, quietly=TRUE)).
You can go either way, or even combine both. But you cannot call packages from tests and not declare them.

Use ::: or export everything when developing multiple, related R packages?

I am working on a family of R packages, all of which share substantial common code which is housed in an internal package, lets call it myPackageUtilities. So I have several packages
myPackage1, myPackage2, etc...
All of these packages depend on every method in myPackageUtilities. For a real-world example, please see statnet on CRAN. Each dependent package, of course, has
Depends: myPackageUtilities
in its DESCRIPTION file.
My question is: In the R code for myPackage1, which of the following two techniques for accessing methods from myPackageUtilities is preferable:
Use ::: to access the methods in myPackageUtilities, or
Export everything from myPackageUtilities (e.g. by including exportPattern("^[^\\.]") in the NAMESPACE)?
Option 2 clutters the end-user's search path, but the R gurus recommend against using :::.
Follow-up question: If (2) is the better choice, is there a way to export everything using roxygen2?
Suppose we have a package called randomUtils and this package has a function called sd that calculates the Slytherin Defiance Quotient.
Now I write a package called spellbound. If spellbound Depends on randomUtils, then randomUtils::sd will be found in the search path and can conflict with calculating standard deviation.
If spellbound Imports randomUtils, however, then R will install randomUtils but will not load it when spellbound is loaded. Thus, the new version of sd can't be found on the search path, but can still be accessed by randomUtils::sd
With an ever growing body of contributed work on CRAN, it is becoming very important that we use Imports as much as possible so that we don't introduce unexpected behaviors by having conflicting function definitions.
An example of when I have used Depends: when writing the HydeNet package, I wanted the end user to be able to use the rjags package in concert with HydeNet. So I put rjags in Depends so that library (HydeNet) would loaf both packages. (In other words, put rjags on the search path.
Moral of the story, if you don't intend for the user to directly access the functions, it should go in Imports.

I'm writing a package. How can make it such that when library(my_package) is called, other packages are loaded as well?

Title should be pretty clear I hope. I'm writing a package called forecasting, with imports for dplyr among other packages. With the imports written in to the DESCRIPTION file, I am able to force these other packages to be installed along with forecasting - is there an equivalent way to do this for the loading of the package? In other words, is there a way that when I load my package with library(forecasting), it automatically also loads dplyr and the other packages?
Thanks
Yes.
Re-read "Writing R Extensions". The Depends: forces both the initial installation as well as the loading of the depended-upon packages.
But these days you want Imports: along with importFrom() in the NAMESPACE file which is more fine-grained.
But first things first: get it working with Depends.
Edit:
Opps you're correct, the documentation I referenced is not a primary source. Perhaps this is better:
From the R documentation:
The ‘Depends’ field gives a comma-separated list of package names which this package depends on. Those packages will be attached before the current package when library or require is called.
and
The ‘Imports’ field lists packages whose namespaces are imported from (as specified in the NAMESPACE file) but which do not need to be attached. Namespaces accessed by the ‘::’ and ‘:::’ operators must be listed here, or in ‘Suggests’ or ‘Enhances’
Original:
From the R packages documentation:
Adding a package dependency here [the DESCRIPTION file] ensures that it’ll be installed. However, it does not mean that it will be attached along with your package (i.e., library(x)). The best practice is to explicitly refer to external functions using the syntax package::function(). This makes it very easy to identify which functions live outside of your package. This is especially useful when you read your code in the future.

R: selective import with importFrom: namespace issues [duplicate]

The "Writing R Extensions" manual provides the following guidance on when to use Imports or Depends:
The general rules are
Packages whose namespace only is needed to load the package using library(pkgname) must be listed in the ‘Imports’ field and not in the
‘Depends’ field.
Packages that need to be attached to successfully load the package using library(pkgname) must be listed in the ‘Depends’ field, only.
Can someone provide a bit more clarity on this? How do I know when my package only needs namespaces loaded versus when I need a package to be attached? What are examples of both? I think the typical package is just a collection of functions that sometimes call functions in other packages (where some bit of work has already been coded-up). Is this scenario 1 or 2 above?
Edit
I wrote a blog post with a section on this specific topic (search for 'Imports v Depends'). The visuals make it a lot easier to understand.
"Imports" is safer than "Depends" (and also makes a package using it a 'better citizen' with respect to other packages that do use "Depends").
A "Depends" directive attempts to ensure that a function from another package is available by attaching the other package to the main search path (i.e. the list of environments returned by search()). This strategy can, however, be thwarted if another package, loaded later, places an identically named function earlier on the search path. Chambers (in SoDA) uses the example of the function "gam", which is found in both the gam and mgcv packages. If two other packages were loaded, one of them depending on gam and one depending on mgcv, the function found by calls to gam() would depend on the order in which they those two packages were attached. Not good.
An "Imports" directive should be used for any supporting package whose functions are to be placed in <imports:packageName> (searched immediately after <namespace:packageName>), instead of on the regular search path. If either one of the packages in the example above used the "Imports" mechanism (which also requires import or importFrom directives in the NAMESPACE file), matters would be improved in two ways. (1) The package would itself gain control over which mgcv function is used. (2) By keeping the main search path clear of the imported objects, it would not even potentially break the other package's dependency on the other mgcv function.
This is why using namespaces is such a good practice, why it is now enforced by CRAN, and (in particular) why using "Imports" is safer than using "Depends".
Edited to add an important caveat:
There is one unfortunately common exception to the advice above: if your package relies on a package A which itself "Depends" on another package B, your package will likely need to attach A with a "Depends directive.
This is because the functions in package A were written with the expectation that package B and its functions would be attached to the search() path.
A "Depends" directive will load and attach package A, at which point package A's own "Depends" directive will, in a chain reaction, cause package B to be loaded and attached as well. Functions in package A will then be able to find the functions in package B on which they rely.
An "Imports" directive will load but not attach package A and will neither load nor attach package B. ("Imports", after all, expects that package writers are using the namespace mechanism, and that package A will be using "Imports" to point to any functions in B that it need access to.) Calls by your functions to any functions in package A which rely on functions in package B will consequently fail.
The only two solutions are to either:
Have your package attach package A using a "Depends" directive.
Better in the long run, contact the maintainer of package A and ask them to do a more careful job of constructing their namespace (in the words of Martin Morgan in this related answer).
Hadley Wickham gives an easy explanation (http://r-pkgs.had.co.nz/namespace.html):
Listing a package in either Depends or Imports ensures that it’s
installed when needed. The main difference is that where Imports just
loads the package, Depends attaches it. There are no other
differences. [...]
Unless there is a good reason otherwise, you should always list
packages in Imports not Depends. That’s because a good package is
self-contained, and minimises changes to the global environment
(including the search path). The only exception is if your package is
designed to be used in conjunction with another package. For example,
the analogue package builds on top of vegan. It’s not useful without
vegan, so it has vegan in Depends instead of Imports. Similarly,
ggplot2 should really Depend on scales, rather than Importing it.
Chambers in SfDA says to use 'Imports' when this package uses a 'namespace' mechanism and since all packages are now required to have them, then the answer might now be always use 'Imports'. In the past packages could have been loaded without actually having namespaces and in that case you would need to have used Depends.
Here is a simple question to help you decide which to use:
Does your package require the end user to have direct access to the functions of another package?
NO -> Imports (most common answer)
YES -> Depends
The only time you should use 'Depends' is when your package is an add-on or companion to another package, where your end user will be using functions from both your package and the 'Depends' package in their code. If your end user will only be interfacing with your functions, and the other package will only be doing work behind the scenes, then use 'Imports' instead.
The caveat to this is that if you add a package to 'Imports', as you usually should, your code will need to refer to functions from that package, using the full namespace syntax, e.g. dplyr::mutate(), instead of just mutate(). It makes the code a little clunkier to read, but it’s a small price to pay for better package hygiene.

Better explanation of when to use Imports/Depends

The "Writing R Extensions" manual provides the following guidance on when to use Imports or Depends:
The general rules are
Packages whose namespace only is needed to load the package using library(pkgname) must be listed in the ‘Imports’ field and not in the
‘Depends’ field.
Packages that need to be attached to successfully load the package using library(pkgname) must be listed in the ‘Depends’ field, only.
Can someone provide a bit more clarity on this? How do I know when my package only needs namespaces loaded versus when I need a package to be attached? What are examples of both? I think the typical package is just a collection of functions that sometimes call functions in other packages (where some bit of work has already been coded-up). Is this scenario 1 or 2 above?
Edit
I wrote a blog post with a section on this specific topic (search for 'Imports v Depends'). The visuals make it a lot easier to understand.
"Imports" is safer than "Depends" (and also makes a package using it a 'better citizen' with respect to other packages that do use "Depends").
A "Depends" directive attempts to ensure that a function from another package is available by attaching the other package to the main search path (i.e. the list of environments returned by search()). This strategy can, however, be thwarted if another package, loaded later, places an identically named function earlier on the search path. Chambers (in SoDA) uses the example of the function "gam", which is found in both the gam and mgcv packages. If two other packages were loaded, one of them depending on gam and one depending on mgcv, the function found by calls to gam() would depend on the order in which they those two packages were attached. Not good.
An "Imports" directive should be used for any supporting package whose functions are to be placed in <imports:packageName> (searched immediately after <namespace:packageName>), instead of on the regular search path. If either one of the packages in the example above used the "Imports" mechanism (which also requires import or importFrom directives in the NAMESPACE file), matters would be improved in two ways. (1) The package would itself gain control over which mgcv function is used. (2) By keeping the main search path clear of the imported objects, it would not even potentially break the other package's dependency on the other mgcv function.
This is why using namespaces is such a good practice, why it is now enforced by CRAN, and (in particular) why using "Imports" is safer than using "Depends".
Edited to add an important caveat:
There is one unfortunately common exception to the advice above: if your package relies on a package A which itself "Depends" on another package B, your package will likely need to attach A with a "Depends directive.
This is because the functions in package A were written with the expectation that package B and its functions would be attached to the search() path.
A "Depends" directive will load and attach package A, at which point package A's own "Depends" directive will, in a chain reaction, cause package B to be loaded and attached as well. Functions in package A will then be able to find the functions in package B on which they rely.
An "Imports" directive will load but not attach package A and will neither load nor attach package B. ("Imports", after all, expects that package writers are using the namespace mechanism, and that package A will be using "Imports" to point to any functions in B that it need access to.) Calls by your functions to any functions in package A which rely on functions in package B will consequently fail.
The only two solutions are to either:
Have your package attach package A using a "Depends" directive.
Better in the long run, contact the maintainer of package A and ask them to do a more careful job of constructing their namespace (in the words of Martin Morgan in this related answer).
Hadley Wickham gives an easy explanation (http://r-pkgs.had.co.nz/namespace.html):
Listing a package in either Depends or Imports ensures that it’s
installed when needed. The main difference is that where Imports just
loads the package, Depends attaches it. There are no other
differences. [...]
Unless there is a good reason otherwise, you should always list
packages in Imports not Depends. That’s because a good package is
self-contained, and minimises changes to the global environment
(including the search path). The only exception is if your package is
designed to be used in conjunction with another package. For example,
the analogue package builds on top of vegan. It’s not useful without
vegan, so it has vegan in Depends instead of Imports. Similarly,
ggplot2 should really Depend on scales, rather than Importing it.
Chambers in SfDA says to use 'Imports' when this package uses a 'namespace' mechanism and since all packages are now required to have them, then the answer might now be always use 'Imports'. In the past packages could have been loaded without actually having namespaces and in that case you would need to have used Depends.
Here is a simple question to help you decide which to use:
Does your package require the end user to have direct access to the functions of another package?
NO -> Imports (most common answer)
YES -> Depends
The only time you should use 'Depends' is when your package is an add-on or companion to another package, where your end user will be using functions from both your package and the 'Depends' package in their code. If your end user will only be interfacing with your functions, and the other package will only be doing work behind the scenes, then use 'Imports' instead.
The caveat to this is that if you add a package to 'Imports', as you usually should, your code will need to refer to functions from that package, using the full namespace syntax, e.g. dplyr::mutate(), instead of just mutate(). It makes the code a little clunkier to read, but it’s a small price to pay for better package hygiene.

Resources