Use data.table in functions/packages (With roxygen) - r

I am quite new to R but it seems, this question is closely related to the following post 1, 2, 3 and a bit different topic 4. Unfortunately, I have not enough reputation to comment right there. My problem is that after going through all the suggestions there, the code still does not work:
I included "Depends" in the description file
I tried the second method including a change of NAMESPACE (Not reproducable)
I created a example package here containing a very small part of the code which showed a bit different error ("J" not found in routes[J(lat1, lng1, lat2, lng2), .I, roll = "nearest", by = .EACHI] instead of 'lat1' not found in routes[order(lat1, lng1, lat2, lng2, time)])
I tested all scripts using the console and R-scripts. There, the code ran without problems.
Thank you very much for your support!
Edit: #Roland
You are right. Roxygen overwrites the namespace. You have to include #' #import data.table to the function. Do you understand, why only inserting Depends: data.table in the DESCRIPTION file does not work? This might be a useful hint in the documentation or did I miss it?
It was missleading that changing to routes <- routes[order("lat1", "lng1", "lat2", "lng2", "time")] helped at least a bit as this line was suddenly no problem any more. Is it correct, that in this case data.frame order is used? I will see how far I get now. I will let you know the final result...

Answering your questions (after edit).
Quoting R exts manual:
Almost always packages mentioned in ‘Depends’ should also be imported from in the NAMESPACE file: this ensures that any needed parts of those packages are available when some other package imports the current package.
So you still should have import in NAMESPACE despite the fact if you depends or import data.table.
The order call doesn't seems to be what you expect, try the following:
order("lat1", "lng1", "lat2", "lng2", "time")
library(data.table)
data.table(a=2:1,b=1:2)[order("a","b")]
In case of issues I recommend to start debugging by writing unit test for your expected results. The most basic way to put unit tests in package is just plain R script in tests directory having stopifnot(...) call. Be aware you need to library/require your package at the start of the script.

This is more in addition to the answers above: I found this to be really useful...
From the docs [Hadley-description](http://r-pkgs.had.co.nz/description.html und)
Imports packages listed here must be present for your package to
work. In fact, any time your package is installed, those packages
will, if not already present, be installed on your computer
(devtools::load_all() also checks that the packages are installed).
Adding a package dependency here ensures that it’ll be installed.
However, it does not mean that it will be attached along with your
package (i.e., library(x)). The best practice is to explicitly refer
to external functions using the syntax package::function(). This
makes it very easy to identify which functions live outside of your
package. This is especially useful when you read your code in the
future.
If you use a lot of functions from other packages this is rather
verbose. There’s also a minor performance penalty associated with
:: (on the order of 5$\mu$s, so it will only matter if you call the
function millions of times).
From the docs Hadley-namespace
NAMESPACE also controls which external functions can be used by your
package without having to use ::. It’s confusing that both
DESCRIPTION (through the Imports field) and NAMESPACE (through import
directives) seem to be involved in imports. This is just an
unfortunate choice of names. The Imports field really has nothing to
do with functions imported into the namespace: it just makes sure the
package is installed when your package is. It doesn’t make functions
available. You need to import functions in exactly the same way
regardless of whether or not the package is attached.
... this is what I recommend: list the package in DESCRIPTION so that it’s
installed, then always refer to it explicitly with pkg::fun().
Unless there is a strong reason not to, it’s better to be explicit.
It’s a little more work to write, but a lot easier to read when you
come back to the code in the future. The converse is not true. Every
package mentioned in NAMESPACE must also be present in the Imports or
Depends fields.

Related

Run-time vs develop-time dependencies in R

I'm developing a package (golem) in R, and it returns a NOTE about excess package in an Import (DESCRIPTION):
checking package dependencies … NOTE
Imports includes 34 non-default packages.
Importing from so many packages makes the package vulnerable to any of
them becoming unavailable. Move as many as possible to Suggests and
use conditionally.
I have allocated some packages in Suggests (DESCRIPTION), like this:
usethis::use_package(package = "ggplot2", type = "Suggests")
usethis::use_package(package = "MASS", type = "Suggests")
I would like to know :
What is the difference between Imports (run-time) vs Suggests (develop-time) and if the latter has anything to do with the term "compile time" of other programming languages.
How do I know a package is needed by the user at runtime? Is there any universal rule for this (like a phrase to help you know)? And for Suggests?
In R, packages listed in the Imports clause of the DESCRIPTION file must be available or your package won't load. Normally they will all be loaded when your package is loaded, though it's possible to delay that by not importing anything, just using :: notation to access them.
Packages listed in the Suggests clause don't need to be available, and won't be automatically loaded. To access their functions, you normally call requireNamespace() to find out if the package is available, and if so use :: for access. If it is not available, your package should fail gracefully in whatever the user was trying to do, letting them know that they need to install the missing package if they want the task to succeed.
These aren't really "run-time" versus "develop-time" differences. It's all run-time.
There are two things in R that might be called "compile-time" in other languages. The best match is installing your package. That configures it to the particular R version and platform it is running on. R also has a "just-in-time" compiler that optimizes functions, but other than a bit of a speed increase that is pretty much invisible to the user.
I think #r2evans answered your second question clearly in a comment: the user needs a package to use functions that use that package. If some of your functions that use it are unlikely to be used by most users, use Suggests, and add the test.

Undo `usethis::use_XYZ()`

I'm developing a package in RStudio with usethis, trying to make use of best practices. Previously, I had run usethis::use_tidy_eval(). Now, I'm using data.table, and set this up by running usethis::use_data_table(). I get a warning,
Warning message:
replacing previous import ‘data.table:::=’ by ‘rlang:::=’ when loading ‘breakdown’
because the NAMESPACE contains the the two lines:
importFrom(rlang,":=")
importFrom(data.table,":=")
It turns out I no longer need usethis::use_tidy_eval(), so I'd like to revert it and in doing so get rid of the warning.
How can I undo whatever usethis helper functions do? Must I edit the NAMESPACE myself? How do I know what else was modified by usethis::use_tidy_eval()? What about undoing usethis::use_pipe()?
Unless you made a Git commit before and after running that code, there's probably not an extremely easy way. The two options I'd consider would be:
Read the source code of the function. This can require some hopping around to find definitions of helper functions, but use_tidy_eval looks like it:
adds roxygen to Suggests in DESCRIPTION
adds rlang to Imports in DESCRIPTION
adds the template R file tidy-eval.R
asks you to run document() which is what actually updates the NAMESPACE. You can find the lines added by looking for the importFrom roxygen tags in the template file.
To undo this, you should just be able to delete all of the above. However, you need to be a bit careful - e.g. if you import functions from rlang outside of tidy-eval.R, removing it from DESCRIPTION might prevent installation. Hopefully any such issues would be revealed by devtools::check() if they do happen.
The other option would be to get an older version of your package, run use_tidy_eval() and document() and then compare the changes. That will be more comprehensive and might catch things I missed above, but the same caveats about not being able to necessarily just reverse everything still apply.
Same strategy for use_pipe().
Sidenote: there are probably ways to adequately qualify different uses of := so that both can coexist in your package, in case that would be preferable.

Dealing with intentional NAMESPACE conflicts

I'm writing a package that imports (and uses) both SparkR::sql and dbplyr::sql. The other relevant questions involve unintentional collisions brought on my haphazard importing.
In my NAMESPACE I have:
importFrom(dbplyr, sql)
importFrom(SparkR, sql)
Both of these functions are used in the script and, being aware of the conflict, I'm sure to always prefix the package name:
dbplyr::sql(...)
SparkR::sql(...)
Nevertheless, I get an import replacement warning when building/checking the package:
Warning: replacing previous import ‘dbplyr::sql’ by ‘SparkR::sql’ when loading ‘my_pkg’
What I see in Writing R Extensions seems to be the following:
If a package only needs a few objects from another package it can use a fully qualified variable reference [foo::f] in the code instead of a formal import... This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
Am I right that the takeaway of this / best practice is to:
Pick which function is used most often and add that to importFrom
Remove the "less-frequent" function from importFrom but keep that package in DESCRIPTION
Just use :: (perhaps preceded by require()) for the "less-frequent" function
I have always followed this advice:
It’s common for packages to be listed in Imports in DESCRIPTION, but not in NAMESPACE. In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun().
In your case:
Remove both importFrom
Keep both packages in Imports:
Use dbplyr::sql and SparkR::sql
My main motivation here is consistency: Even without any name clashes I want to always use the full name to make it clear when reading the code where some function comes from. If I do not use importForm and forget about using the full name at one place, R CMD ckeck will catch that. I value this code clarity higher than the (perceived) advantage of collecting dependencies in two places: DESCRIPTION and (more explicitly) NAMESPACE.

why roxygen2 does not automatically update "Imports" in DESCRIPTION file?

I am trying to follow closely #hadley's book to learn best practices in writing R packages. And I was thrilled to read these lines about the philosophy of the book:
anything that can be automated, should be automated. Do as little as
possible by hand. Do as much as possible with functions.
So when I was reading about dependencies and the (sort of) confusing differences between import directives in the NAMESPACE file and the "Imports:" field in the DESCRIPTION file, I was hoping that roxygen2 would automatically handle both of them. After all
Every package mentioned in NAMESPACE must also be present in the
Imports or Depends fields.
I was hoping that roxygen2 would take every #import in my functions and make sure it is included in the DESCRIPTION file. But it does not do that automatically.
So I either have to add it manually to the DESCRIPTION file or almost manually using devtools::use_package.
Looking around for an answer, I found this question in SO, where #hadley confirms in the comments that
Currently, the namespace roclet will modify NAMESPACE but not
DESCRIPTION
and other posts (e.g. here or here) where collate_roclet is discussed, but "This only matters if your code has side-effects; most commonly because you’re using S4".
I wonder:
the reason that DESCRIPTION is not automatically updated (sort of contradicting the aforementioned philosophy, which is presumably shared by roxygen2) and
If someone has already crafted a way to do it
I have written a little R package for that task:
https://github.com/markusdumke/pkghelper
It scans the R Code and NAMESPACE for packages in use and adds them to the Imports section.
The namespace_roclet edits the NAMESPACE file based on the tags added in the script before the function. As there are three types of dependencies (Depends, Imports, and Suggests), a similar method as used by the namespace_roclet would require three different tags (notice Imports should be a different one, to differentiate it from the packages to attach in NAMESPACE).
If you are willing to take a semi-automated process, you could identify the packages you have used and add the missing ones to DESCRIPTION, in the adequate sections.
library(reinstallr)
package.dir <- getwd()
base_path <- normalizePath(package.dir)
files <- list.files(file.path(base_path, "R"), full.names = TRUE)
packages <- unique(reinstallr:::scan_for_packages(files)$package)
packages
Regarding the two bullets you wonder about at the bottom:
Updates to the DESCRIPTION file could be further automated with additional roclets, however already >4 years ago such a pull request was deferred:
https://github.com/klutometis/roxygen/pull/76
I have to assume that the guys would indeed rather have you use the devtools package for updating the DESCRIPTION file, instead of adding this to roxygen2. So in that sense, devtools would be the first available choice

Importing snowfall into custom R package

I'm developing an R package which needs to use parallelisation as made available by the snowfall package. snowfall doesn't seem to import the same was as other packages like ggplot2, data.table, etc. I've included snowfall, rlecuyer, and snow in the description file, name space file, and as an import argument in the function itself. When I try to access this function, I get the following error:
Error in sfInit() : could not find function "setDefaultClusterOptions"
The sfInit function seems to have a nostart / nostop argument which it says is related to nested usage of sfInit but that doesn't seem to do the trick for me either.
The actual code itself uses an sfInit (which is where I get the error), some sfExports and sfLibrarys, and an sfLapply.
Possible solution:
It seems to work if I move snow from the import section to the depends section in the Desciption file. I don't know why though.
When you include a package in 'Depends' when one attaches your package they also attach the package on which your package Depends to their namespace.
This and other differences between Depends and Imports is explained well in other questions on this site.
If you look at {snowfall}'s DESCRIPTION you'll see that it Depends on {snow}. It is plausible that the authors of snowfall know something we don't and that {snow} has to be attached to the global search path in order to work. In fact that is the top caveat in the top answer to the question I linked above...
... if your package relies on a package A which itself "Depends" on
another package B, your package will likely need to attach A with a
"Depends directive.
This is because the functions in package A were written with the
expectation that package B and its functions would be attached to the
search() path.
So, in your case, it just so happens that all {snowfall} wants is {snow} and you happened to provide it. However, it appears the more correct behavior may be for you to Depend on {snowfall} directly.
setDefaultClusterOptions is a function from the snow package. You need to import that too.

Resources