why roxygen2 does not automatically update "Imports" in DESCRIPTION file? - r

I am trying to follow closely #hadley's book to learn best practices in writing R packages. And I was thrilled to read these lines about the philosophy of the book:
anything that can be automated, should be automated. Do as little as
possible by hand. Do as much as possible with functions.
So when I was reading about dependencies and the (sort of) confusing differences between import directives in the NAMESPACE file and the "Imports:" field in the DESCRIPTION file, I was hoping that roxygen2 would automatically handle both of them. After all
Every package mentioned in NAMESPACE must also be present in the
Imports or Depends fields.
I was hoping that roxygen2 would take every #import in my functions and make sure it is included in the DESCRIPTION file. But it does not do that automatically.
So I either have to add it manually to the DESCRIPTION file or almost manually using devtools::use_package.
Looking around for an answer, I found this question in SO, where #hadley confirms in the comments that
Currently, the namespace roclet will modify NAMESPACE but not
DESCRIPTION
and other posts (e.g. here or here) where collate_roclet is discussed, but "This only matters if your code has side-effects; most commonly because you’re using S4".
I wonder:
the reason that DESCRIPTION is not automatically updated (sort of contradicting the aforementioned philosophy, which is presumably shared by roxygen2) and
If someone has already crafted a way to do it

I have written a little R package for that task:
https://github.com/markusdumke/pkghelper
It scans the R Code and NAMESPACE for packages in use and adds them to the Imports section.

The namespace_roclet edits the NAMESPACE file based on the tags added in the script before the function. As there are three types of dependencies (Depends, Imports, and Suggests), a similar method as used by the namespace_roclet would require three different tags (notice Imports should be a different one, to differentiate it from the packages to attach in NAMESPACE).
If you are willing to take a semi-automated process, you could identify the packages you have used and add the missing ones to DESCRIPTION, in the adequate sections.
library(reinstallr)
package.dir <- getwd()
base_path <- normalizePath(package.dir)
files <- list.files(file.path(base_path, "R"), full.names = TRUE)
packages <- unique(reinstallr:::scan_for_packages(files)$package)
packages

Regarding the two bullets you wonder about at the bottom:
Updates to the DESCRIPTION file could be further automated with additional roclets, however already >4 years ago such a pull request was deferred:
https://github.com/klutometis/roxygen/pull/76
I have to assume that the guys would indeed rather have you use the devtools package for updating the DESCRIPTION file, instead of adding this to roxygen2. So in that sense, devtools would be the first available choice

Related

Read a package NAMESPACE file

I am looking for a fast R solution to read a package NAMESPACE file. The solution should contain already preprocessed (and aggregated) records and separated imports and exports.
Unfortunately I can’t use getNamespaceExports("dplyr")/getNamespaceImports("dplyr") as they need the package to be loaded to the R session, which is too slow.
I need a solution which simply process a text from the NAMESPACE file. Any solutions using external packages as well as partial solutions would still be welcome.
The raw data we could grabbed with a call like readLines("https://raw.githubusercontent.com/cran/dplyr/master/NAMESPACE"). roxygen2 generated files are formatted properly, but this will be not true for all manually generated files.
EDIT:
Thanks to Konrad R. answer I could develop such functionality in my new CRAN package - pacs. I recommended to check pacs::pac_namespace function. There is even one function which goes one step further, comparing NAMESPACE files between different package versions pacs::pac_comapre_namespace.
The function is included in R as base::parseNamespaceFile. Unfortunately the function does not directly take a path as an argument. Instead it constructs the path from a package name and the library location. However, armed with this knowledge you should be able to call it; e.g.:
parseNamespaceFile('dplyr', .libPaths()[1L])
EDIT
Somebody has to remember that the whole packages imports (like import(rlang)) have to be still invoked with the same function and the exports for them extracted. Two core elements are using parse on NAMESPACE code and then using the recursive extract function parseDirective.

R Package: How to automatically generate the `Imports` field in the DESC based on NAMESPACE imports

I'm developing an R package where I frequently use functions from other packages. I know it's usually best practice to be explicit about other packages in your source code (as in dplyr::filter()), but, for example, in plotting functions which are based on ggplot2 it becomes tedious and verbose to write something like ggplot2::ggplot(data, ggplot2::aes(x = ...)), and so on.
Let's say I follow the recommendations and add the #import ggplot2 tag as a roxygen comment in the respective function. Now, this adds import(ggplot2) to the NAMESPACE file, which is nice. But, it does not add ggplot2 to the Imports: section in the DESCRIPTION file, which is exactly what I would like to do. Calling roxygen2::roxygenize() does not do this either.
Note that the same question was basically asked here, but unless I missed something, the question is not answered (after all, the Imports: section is not automatically generated, only the NAMESPACE file).
I am aware of the importance of a correctly maintained NAMESPACE file, and given there are easy and convenient ways to populate it, I guess my main question is whether there is a function to automatically "translate" the NAMESPACE imports into the respective field in the DESCRIPTION. I know you can manually edit your DESCRIPTION file via usethis::use_package("ggplot2") but it seems odd to me to have to specify the same information twice, and in two completely different ways.

Use data.table in functions/packages (With roxygen)

I am quite new to R but it seems, this question is closely related to the following post 1, 2, 3 and a bit different topic 4. Unfortunately, I have not enough reputation to comment right there. My problem is that after going through all the suggestions there, the code still does not work:
I included "Depends" in the description file
I tried the second method including a change of NAMESPACE (Not reproducable)
I created a example package here containing a very small part of the code which showed a bit different error ("J" not found in routes[J(lat1, lng1, lat2, lng2), .I, roll = "nearest", by = .EACHI] instead of 'lat1' not found in routes[order(lat1, lng1, lat2, lng2, time)])
I tested all scripts using the console and R-scripts. There, the code ran without problems.
Thank you very much for your support!
Edit: #Roland
You are right. Roxygen overwrites the namespace. You have to include #' #import data.table to the function. Do you understand, why only inserting Depends: data.table in the DESCRIPTION file does not work? This might be a useful hint in the documentation or did I miss it?
It was missleading that changing to routes <- routes[order("lat1", "lng1", "lat2", "lng2", "time")] helped at least a bit as this line was suddenly no problem any more. Is it correct, that in this case data.frame order is used? I will see how far I get now. I will let you know the final result...
Answering your questions (after edit).
Quoting R exts manual:
Almost always packages mentioned in ‘Depends’ should also be imported from in the NAMESPACE file: this ensures that any needed parts of those packages are available when some other package imports the current package.
So you still should have import in NAMESPACE despite the fact if you depends or import data.table.
The order call doesn't seems to be what you expect, try the following:
order("lat1", "lng1", "lat2", "lng2", "time")
library(data.table)
data.table(a=2:1,b=1:2)[order("a","b")]
In case of issues I recommend to start debugging by writing unit test for your expected results. The most basic way to put unit tests in package is just plain R script in tests directory having stopifnot(...) call. Be aware you need to library/require your package at the start of the script.
This is more in addition to the answers above: I found this to be really useful...
From the docs [Hadley-description](http://r-pkgs.had.co.nz/description.html und)
Imports packages listed here must be present for your package to
work. In fact, any time your package is installed, those packages
will, if not already present, be installed on your computer
(devtools::load_all() also checks that the packages are installed).
Adding a package dependency here ensures that it’ll be installed.
However, it does not mean that it will be attached along with your
package (i.e., library(x)). The best practice is to explicitly refer
to external functions using the syntax package::function(). This
makes it very easy to identify which functions live outside of your
package. This is especially useful when you read your code in the
future.
If you use a lot of functions from other packages this is rather
verbose. There’s also a minor performance penalty associated with
:: (on the order of 5$\mu$s, so it will only matter if you call the
function millions of times).
From the docs Hadley-namespace
NAMESPACE also controls which external functions can be used by your
package without having to use ::. It’s confusing that both
DESCRIPTION (through the Imports field) and NAMESPACE (through import
directives) seem to be involved in imports. This is just an
unfortunate choice of names. The Imports field really has nothing to
do with functions imported into the namespace: it just makes sure the
package is installed when your package is. It doesn’t make functions
available. You need to import functions in exactly the same way
regardless of whether or not the package is attached.
... this is what I recommend: list the package in DESCRIPTION so that it’s
installed, then always refer to it explicitly with pkg::fun().
Unless there is a strong reason not to, it’s better to be explicit.
It’s a little more work to write, but a lot easier to read when you
come back to the code in the future. The converse is not true. Every
package mentioned in NAMESPACE must also be present in the Imports or
Depends fields.

I'm writing a package. How can make it such that when library(my_package) is called, other packages are loaded as well?

Title should be pretty clear I hope. I'm writing a package called forecasting, with imports for dplyr among other packages. With the imports written in to the DESCRIPTION file, I am able to force these other packages to be installed along with forecasting - is there an equivalent way to do this for the loading of the package? In other words, is there a way that when I load my package with library(forecasting), it automatically also loads dplyr and the other packages?
Thanks
Yes.
Re-read "Writing R Extensions". The Depends: forces both the initial installation as well as the loading of the depended-upon packages.
But these days you want Imports: along with importFrom() in the NAMESPACE file which is more fine-grained.
But first things first: get it working with Depends.
Edit:
Opps you're correct, the documentation I referenced is not a primary source. Perhaps this is better:
From the R documentation:
The ‘Depends’ field gives a comma-separated list of package names which this package depends on. Those packages will be attached before the current package when library or require is called.
and
The ‘Imports’ field lists packages whose namespaces are imported from (as specified in the NAMESPACE file) but which do not need to be attached. Namespaces accessed by the ‘::’ and ‘:::’ operators must be listed here, or in ‘Suggests’ or ‘Enhances’
Original:
From the R packages documentation:
Adding a package dependency here [the DESCRIPTION file] ensures that it’ll be installed. However, it does not mean that it will be attached along with your package (i.e., library(x)). The best practice is to explicitly refer to external functions using the syntax package::function(). This makes it very easy to identify which functions live outside of your package. This is especially useful when you read your code in the future.

R Packages - What is the file 'zzz.R' used for?

I'm planning to condense some of my code into a package, and was looking at the source of a few published packages on CRAN as a guide. I notice many packages include the file R\zzz.R, so I presume there must be some convention surrounding this.
However, I cannot find any mention of zzz.R in the official Writing R Extensions guide. What is this file for, and do I need to include one in my package? Why is it named the way it is - why not zzzz.R?
It's a file where one usually puts actions on load of the package. It is tradition/convention that it's called zzz.R and could be called anything.R
You only need to include this if you want you package to do something out of the ordinary when it loads. Keep looking at what people put in there and you'll begin to get a sense of what they're used for.
This zzz.R file was also mentioned by Hadley Wickham in his book "R packages", at the bottom of "When you do need side-effects" section.
https://r-pkgs.org/Code.html#when-you-do-need-side-effects
If you use .onLoad(), consider using .onUnload() to clean up any side effects. By convention, .onLoad() and friends are usually saved in a file called R/zzz.R. (Note that .First.lib() and .Last.lib() are old versions of .onLoad() and .onUnload() and should no longer be used.)

Resources