Use `[` method from data.table package in package development

We are creating a package where one of our functions uses functions of the data.table package.
Instead of importing entire packages through our roxygen header, we try to use :: as much as possible in our code.
For a function, this is easy. For example:
data.table::setkey(our_data_1, our_variable)
Yet, we do not know how to do this for a method. For example:
our_data_3 <- our_data_1[our_data_2, roll = "nearest"]
where [ has a specific method for data.tables, which is indicated by:
methods(`[`)
I have tried multiple approaches. Several combinations using @importFrom failed. For example, adding the following line to our roxygen header...
#' @importFrom data.table `[.data.table`
...returned the following when running devtools::document():
Warning message:
object ‘[.data.table’ is not exported by 'namespace:data.table'
I have also tried things like `[.data.table` within our code, but those failed as well...
Importing the entire data.table package in our roxygen header worked (@import data.table), but this is not preferred since we want to refer to the package of each function within our code (or at least use @importFrom).
Is there a way to use the [ method of data.table within the code of a function without importing the entire data.table package? Or is it at least possible to import only the method, for example by using @importFrom in our roxygen header?
Thank you in advance!

There is no need to import S3 methods; they are dispatched automatically based on the class of the object.
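A minimal base-R sketch of that dispatch, using a hypothetical toy class `myframe` (not part of data.table): once a function named `[.myframe` is visible, ordinary `x[...]` syntax reaches it without any `::` call or import. data.table registers its `[.data.table` method as an S3 method when its namespace loads, which is consistent with the "not exported" warning quoted in the question.

```r
# Toy S3 class "myframe" (hypothetical, for illustration only).
# Defining a function named `[.myframe` is enough for `[` to dispatch to it.
`[.myframe` <- function(x, i, ...) {
  # Drop the class, subset as a plain list, then restore the class
  out <- unclass(x)[i]
  class(out) <- "myframe"
  out
}

obj <- structure(list(a = 1, b = 2, c = 3), class = "myframe")

# Ordinary bracket syntax: no explicit `[.myframe`(...) call and no :: needed
sub <- obj[c("a", "c")]

# The method can still be looked up explicitly if you want to inspect it
m <- getS3method("[", "myframe")
```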
In the case of the `[` data.table method, there is a trick which we use to ensure that a data.table passed to a library that expects a data.frame will be handled properly, as a data.frame. This handling is decided based on the NAMESPACE file: if you don't import data.table in NAMESPACE, then the data.table method assumes you want to use it as a data.frame.
You can state your intent explicitly by defining the extra variable .datatable.aware = TRUE in any of your R script files.
You should read the Importing data.table vignette, where this is well described.
I also put up an example package which you can run and debug if for some reason your code still does not work: https://gitlab.com/jangorecki/useDTmethod
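A minimal sketch of the question's rolling join written this way, assuming data.table is listed under Imports in DESCRIPTION but not imported in NAMESPACE; the file name and `nearest_join()` are hypothetical. `setkeyv()` is used instead of `setkey()` because the key column is passed as a character string.

```r
# Sketch of a package source file, e.g. R/join.R (hypothetical names).
# data.table is in Imports (DESCRIPTION) but NOT imported in NAMESPACE.

# Tell data.table that this code intentionally uses data.table syntax,
# so `[` keeps data.table semantics instead of falling back to data.frame.
.datatable.aware <- TRUE

nearest_join <- function(our_data_1, our_data_2, key_col) {
  # setkeyv() takes the key as a character vector; it sorts our_data_1
  # by reference, so the caller's table is modified too.
  data.table::setkeyv(our_data_1, key_col)
  # Plain `[` dispatches to the data.table method by class;
  # roll = "nearest" matches each row of our_data_2 to the closest key value.
  our_data_1[our_data_2, roll = "nearest"]
}

# Usage
x <- data.table::data.table(t = c(1, 5, 10), val = c("a", "b", "c"))
i <- data.table::data.table(t = c(2, 9))
res <- nearest_join(x, i, "t")
```

Here t = 2 is closest to 1 and t = 9 is closest to 10, so `res$val` should be `c("a", "c")`.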

I think you don't need to import the S3 method or use :: like we do for functions.
In my opinion, you just need to add data.table as a dependency in DESCRIPTION and it should work.
R will know that you are applying [ to a data.table object and will use the correct method.

Related

How can I use data.table in a package without importing all functions?

I'm building an R package in which I would like to use dtplyr to perform various bits of data manipulation. My issue is that dtplyr seems to only work if I import the whole of data.table (i.e. using the roxygen #' @import data.table). Without this I get errors like:
Error in .(x = sum(x), y = sum(y), :
could not find function "."
If I can solve this problem by only importing certain functions from data.table that would be great, but there seems to be no function .() in the package. My knowledge of data.table is limited, but I can only assume it uses .() to edit parsed code (similar to the base R bquote()), but that dtplyr for some reason needs data.table to be loaded for this to work.
I've tried various things such as withr::with_package("data.table", code) and requireNamespace("data.table"), but so far importing the whole package is the only thing that seems to work. This is not a viable solution because it completely ruins the well-maintained namespace in the package I'm working on by importing so many functions from data.table.
NB, this package houses a project which will be worked on by many other analysts well into the future. While simply writing data.table code may be preferable in terms of performance and general good-practice, using dtplyr to translate dplyr code gives a boost in readability and ease-of-use that is far more important in this context.
The (documented) solution I found is to set .datatable.aware <- TRUE somewhere in the package source code. According to the documentation, if you're using data.table in a package without importing the whole thing, you should do this so that [.data.table() does not revert to calling [.data.frame(). From the docs:
...please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.
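To illustrate why `.()` cannot simply be imported: inside data.table's `[`, `.()` in `j` is documented as an alias for `list()`. A minimal sketch, assuming data.table is installed and the code runs at top level. In a package where data.table is neither imported nor declared aware, `[` appears to fall back to `[.data.frame`, which then evaluates `.()` as an ordinary function call and fails with the "could not find function" error above.

```r
dt <- data.table::data.table(x = 1:3, y = 4:6)

# Inside data.table's `[`, .() in j is an alias for list()
with_dot  <- dt[, .(x = sum(x), y = sum(y))]
with_list <- dt[, list(x = sum(x), y = sum(y))]
```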

import all the functions of a package except one when building a package

I'm building an R package (mypackage) that imports data.table and another package (let's call it myotherpackage).
Imports: data.table, myotherpackage is in the DESCRIPTION file of mypackage.
myotherpackage imports dplyr, which has several functions named like data.table functions, so I get warnings like this every time I load mypackage:
Warning: replacing previous import ‘data.table::first’ by ‘dplyr::first’ when loading ‘mypackage’
Is there a way to import all the functions of data.table except "first" for example? I'd then use data.table::first in the code if I need to use it.
Or is there a better way to handle it? I'm trying to avoid the warning every time someone imports the package. Thank you!
The NAMESPACE file is somewhat flexible here, as described in Writing R Extensions.
The two main import directives are:
import(PACKAGE)
which imports all objects in the namespace into your package. The second option is to do specific imports using:
importFrom(PACKAGE, foo)
which gives you access to foo() without needing the fully qualified reference PACKAGE::foo().
But these aren't the only two options. You can also use the except argument to exclude just a handful of imports:
import(PACKAGE, except=c(foo,bar))
which gives you everything from PACKAGE's namespace but foo() and bar(). This is useful - as in your case - for avoiding conflicts.
For roxygen, great catch on figuring out that you can do:
#' @rawNamespace import(PACKAGE, except = foo)
to pass a raw NAMESPACE directive through roxygen.
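Applied to the data.table/dplyr clash from the question, a sketch might look like this; the file name and `first_value()` are hypothetical. In a script the roxygen line is only a comment, but in a package roxygen2 copies the `@rawNamespace` text verbatim into NAMESPACE.

```r
# Sketch of a package file, e.g. R/mypackage-package.R (hypothetical).
# roxygen2 writes this line into NAMESPACE as:
#   import(data.table, except = first)
#' @rawNamespace import(data.table, except = first)
NULL

# The excluded function is still available through a fully qualified call
first_value <- function(x) data.table::first(x)
```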

Calling an operator from a namespace in R

I am developing a package, which depends on zoo (listed in Imports, not Depends in DESCRIPTION).
In a function, I need to subset a two-dimensional zoo object with the [ operator. However, as long as the zoo package is not loaded, R uses the base [, which returns a numeric vector instead of a zoo object.
A standard solution would be to use the zoo namespace (like zoo::`[.zoo`()). However, when I try to execute this in the function, R throws the error Error: '[.zoo' is not an exported object from 'namespace:zoo', so I conclude that this operator is not exported from the zoo namespace (even though I could see it in https://github.com/rforge/zoo/blob/master/pkg/zoo/NAMESPACE).
Solution 1: It is possible to use the ::: operator to access a non-exported function from a package. It is probably not best practice.
Solution 2: I can create a new zoo using coredata and index, which are explicitly exported, i.e.:
zoo_new <- zoo::zoo(x = zoo::coredata(zoo_old), order.by = zoo::index(zoo_old))
which is not very elegant.
Solution 3: Move zoo from Imports to Depends in DESCRIPTION. This is also not best practice.
I think you are right: using ::: is not best practice and is not allowed for CRAN packages. According to this post http://kbroman.org/pkg_primer/pages/depends.html, you should either contact the authors of zoo or use the code of their function directly inside your package.
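A sketch of Solution 2 adapted to row subsetting, assuming zoo is installed; `subset_zoo_rows()` is a hypothetical helper. For comparison it also calls plain `[` on the same object: once zoo's namespace is loaded (creating `z` with `zoo::zoo()` loads it), zoo's registered `[` method should be dispatched without any import.

```r
# Assumes the zoo package is installed; helper name is hypothetical.
subset_zoo_rows <- function(z, rows) {
  # Solution 2 from the question: rebuild a zoo object from the exported
  # accessors coredata() and index() instead of calling `[.zoo` directly
  zoo::zoo(
    x = zoo::coredata(z)[rows, , drop = FALSE],
    order.by = zoo::index(z)[rows]
  )
}

z <- zoo::zoo(matrix(1:6, ncol = 2), order.by = 1:3)
s <- subset_zoo_rows(z, 1:2)

# With zoo's namespace loaded, plain `[` dispatches to zoo's method by class
s2 <- z[1:2, ]
```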

R with roxygen2: How to use a single function from another package?

I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding #' @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I use the function.
Any pointers would be appreciated!
(This question might be relevant.)
If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need, should be in the package namespace because this is where R will look first for the definition of functions, before then traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
And you can just refer to ldply() without the plyr:: prefix just as if it were another function in your package.
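A sketch of both styles side by side, assuming plyr is installed and listed under Imports in DESCRIPTION; `summarise_list()` is a hypothetical helper. In a plain script the roxygen tag is only a comment, while in a package roxygen2 turns it into an `importFrom(plyr,ldply)` line in NAMESPACE.

```r
# Style 1: namespace import via roxygen (in a package, this lets you
# call ldply() without the plyr:: prefix)
#' @importFrom plyr ldply
NULL

# Style 2: fully qualified call; no NAMESPACE entry is needed and plyr
# is loaded (not attached) the first time this runs
summarise_list <- function(x) {
  plyr::ldply(x, function(v) data.frame(value = v))
}

res <- summarise_list(list(a = 1, b = 2))
```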
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated because it implies more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should use @import plyr and refer to the function as plyr::ldply(). But in reality, it's actually suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is only preferred for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just do #importFrom like WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.

Making a package in R that depends on data.table

I have to make an R package that depends on the package data.table. However, if I would do a function such as the next one in the package
randomdt <- function(){
  dt <- data.table(random = rnorm(10))
  dt[dt$random > 0]
}
the function [ will use the method for data.frame not for data.table and therefore the error
Error in `[.data.frame`(x, i) : undefined columns selected
will appear. Usually this would be solved by using get('[.data.table') or a similar method (package::function is the simplest), but that appears not to work. After all, [ is a primitive function and I don't know how its methods work.
So, how can I call the data.table [ function from my package?
Updated based on some feedback from MichaelChirico and comments by Arun and Soheil.
Roughly speaking, there are two approaches you might consider. The first is building the dependency into your package itself, while the second is including lines in your R code that test for the presence of data.table (and possibly even install it automatically if it is not found).
The data.table FAQ specifically addresses this in 6.9, and states that you can ensure that data.table is appropriately loaded by your package by:
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
As noted in the comments, this is common R behavior found in numerous packages.
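A sketch of the question's function under option ii, assuming data.table is installed. The DESCRIPTION and NAMESPACE lines appear only as comments; run as a plain script, the function should behave as data.table code because it is defined at top level, which data.table treats as aware of its syntax. `set.seed(1)` is added only to make the run reproducible.

```r
# DESCRIPTION (option ii):  Imports: data.table
# NAMESPACE   (option ii):  import(data.table)
# Roxygen equivalent of the NAMESPACE line:
#' @import data.table
NULL

randomdt <- function() {
  # Fully qualified constructor so this sketch also runs without attaching
  dt <- data.table::data.table(random = rnorm(10))
  # Under data.table semantics, column names can be used directly in i
  dt[random > 0]
}

set.seed(1)
res <- randomdt()
```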
An alternative approach is to create specific lines of code which test for and import the required packages as part of your code. This is, I would contend, not the ideal solution given the elegance of using the option provided above. However, it is technically possible.
A simple way of doing this would be to use either require or library to check for the existence of data.table, with an error thrown if it could not be attached. You could even use a simple set of conditional statements to run install.packages to install what you need if loading them fails.
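A minimal sketch of such a check. Note that it swaps in requireNamespace() for the require()/library() calls mentioned above, because requireNamespace() loads the package without attaching it; `check_installed()` is a hypothetical helper, and the automatic install.packages() step is not shown.

```r
# Hypothetical helper: stop with a clear message if a package is missing.
check_installed <- function(pkg) {
  # requireNamespace() returns FALSE instead of erroring when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is required but not installed.", pkg),
         call. = FALSE)
  }
  invisible(TRUE)
}

check_installed("stats")  # base package, always available
```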
Yihui Xie (of knitr fame) has a great post about the difference between library and require here and makes a strong case for just using library in cases where the package is absolutely essential for the upcoming code.
