I recently ran into a situation in which existing R code broke due to the introduction of the dplyr library. Specifically, the lag function from the stats package is being masked by dplyr::lag. The problem has been documented previously, but no workaround was provided. Research into R namespaces and environments leads to two possible solutions, neither very robust in my opinion:
Make sure that package:stats appears first in the search() path so that lag resolves as the function in the stats package.
Change all references of lag in my code to stats::lag
My question is whether either of these other solutions is possible:
Loading the dplyr package in a way to force it to be in a "private" namespace in which its objects can only be accessed through the :: operator.
A directive at library loading to force lag to resolve as stats::lag. This could be done either by removing dplyr::lag or overriding the search path (similar to the C++ using namespace::function directive.)
You should consider library(conflicted), as it is designed for exactly this problem.
https://cran.r-project.org/web/packages/conflicted/index.html
Putting conflicted::conflict_prefer(name = "lag", winner = "stats") after you load your packages ensures that any call to lag() in your script uses the stats version by default.
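The masking itself can be reproduced without dplyr installed. The following is a minimal base-R sketch, not the conflicted implementation: the `fake_dplyr` search-path entry and its toy `lag` are hypothetical stand-ins for an attached package that exports its own `lag`.

```r
x <- ts(1:5)

# Attach an empty environment ahead of package:stats
# (hypothetical stand-in for an attached dplyr).
fake <- attach(NULL, name = "fake_dplyr")
assign(
  "lag",
  function(x, n = 1L, ...) c(rep(NA, n), utils::head(as.vector(x), -n)),
  envir = fake
)

masked_result <- lag(x, 1)        # found in fake_dplyr first: a plain shifted vector
stats_result <- stats::lag(x, 1)  # explicit namespace: still a ts object

detach("fake_dplyr")
```

The unqualified call picks up whichever definition comes first on the search path, while the `stats::` call is unaffected by what is attached. A preference declared with conflicted is meant to give the unqualified call the same predictability.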
Currently I have in my package DESCRIPTION, a dependency on dbplyr:
Imports:
    dbplyr,
    dplyr
dbplyr is useful almost solely because of the S3 methods it defines: https://github.com/tidyverse/dbplyr/blob/main/NAMESPACE. The actual functions you call to use dbplyr are almost entirely from dplyr.
By putting dbplyr in my Imports, it should automatically get loaded, but not attached, which should be enough to register its S3 methods: https://r-pkgs.org/dependencies-mindset-background.html#sec-dependencies-attach-vs-load.
This seems to work fine, but whenever I R CMD check, it tells me:
N checking dependencies in R code (10.8s)
Namespace in Imports field not imported from: ‘dbplyr’
All declared Imports should be used.
Firstly, why does R CMD check even check this, considering that it often makes sense to load packages without importing them? Secondly, how am I supposed to satisfy R CMD check without loading things into my namespace that I don't want or need?
I am pretty sure two of your assumptions are false.
First, putting Imports: dbplyr into your DESCRIPTION file won't load it, so its methods won't be loaded from that alone. Basically the Imports field in the DESCRIPTION file just guarantees that dbplyr is available to be loaded when requested. If you import something via the NAMESPACE file, that will cause it to be loaded. If you evaluate dbplyr::something that will cause it to be loaded. Executing loadNamespace("dbplyr") is another way, and there are a few others. You may also load some other package that loads it.
Second, I think you have misinterpreted the error message. It isn't saying that you loaded it without importing it (though it would complain about that too), it is saying that it can't detect any use of it in your package, so maybe it shouldn't be a requirement for installing your package.
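The load-versus-attach distinction behind the first point can be checked interactively. This is a small sketch that uses the tools package, which ships with R, purely as a stand-in for dbplyr; the variable names are illustrative.

```r
pkg <- "tools"  # ships with R; used here only as a stand-in for dbplyr
search_entry <- paste0("package:", pkg)

attached_before <- search_entry %in% search()

# Load the namespace without attaching it to the search path.
loadNamespace(pkg)

loaded_after <- isNamespaceLoaded(pkg)
attached_after <- search_entry %in% search()
```

After `loadNamespace()`, the namespace is loaded but the search path is unchanged, which is the state wanted for a package that is needed only for the S3 methods it registers.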
Unfortunately, the code to detect uses is fallible, so it sometimes misses uses. Examples I've heard about are:
if the package is only used in the default value for a function argument. This has been fixed in R-devel.
if the package is only used during the build to construct some object, e.g. code like someclass <- R6::R6Class( ... ) needs R6, but the check code won't see it because it looks at someclass, not at the source code that created it.
if the use of the package is hidden by specifying the name of the package in a character variable.
if the need for the package is indirect, e.g. you need to use ggplot2::geom_hex. That needs the hexbin package, but ggplot2 only declares it as "Suggested".
These examples come from this discussion: https://github.com/hadley/r-pkgs/issues/828#issuecomment-1421353457 .
The recommended workaround there is to create an object that refers to the imported package explicitly, e.g. putting the line
dummy_r6 <- function() R6::R6Class
into your package is enough to suppress the note without actually loading R6. (It will be loaded if you ever call this function.)
However, your requirement is stronger: you do need to make sure dbplyr is loaded if you want its methods to be used. I'd put something in your .onLoad() function that triggers the load. For example,
.onLoad <- function(lib, pkg) {
  # Make sure the dbplyr methods are loaded
  loadNamespace("dbplyr")
}
EDITED TO ADD: As pointed out in the comments, there's a bug in the check code that means it won't detect this as being a use of dbplyr. You really need to do both things, e.g.
.onLoad <- function(lib, pkg) {
  # Make sure the dbplyr methods are loaded
  loadNamespace("dbplyr")
  # Work around bug in code checking in R 4.2.2 for use of packages
  dummy <- function() dbplyr::across_apply_fns
}
The function used in the dummy construction is arbitrary; it probably doesn't even need to exist, but I chose one that does.
I'm building an R package in which I would like to use dtplyr to perform various bits of data manipulation. My issue is that dtplyr seems to work only if I import the whole of data.table (i.e. using the roxygen tag #' @import data.table). Without this I get errors like:
Error in .(x = sum(x), y = sum(y), :
could not find function "."
If I can solve this problem by only importing certain functions from data.table that would be great, but there seems to be no function .() in the package. My knowledge of data.table is limited, but I can only assume it uses .() to edit parsed code (similar to the base R bquote()), but that dtplyr for some reason needs data.table to be loaded for this to work.
I've tried various things such as withr::with_package("data.table", code) and requireNamespace("data.table"), but so far importing the whole package is the only thing that seems to work. This is not a viable solution because it completely ruins the well-maintained namespace in the package I'm working on by importing so many functions from data.table.
NB, this package houses a project which will be worked on by many other analysts well into the future. While simply writing data.table code may be preferable in terms of performance and general good-practice, using dtplyr to translate dplyr code gives a boost in readability and ease-of-use that is far more important in this context.
The (documented) solution I found is to set .datatable.aware <- TRUE somewhere in the package source code. According to the documentation, if you're using data.table in a package without importing the whole thing, you should do this so that [.data.table() does not revert to calling [.data.frame(). From the docs:
...please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.
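As a minimal sketch, the flag is just a top-level assignment in the package source. The file name below is an assumption; per the quoted documentation it can go anywhere in the R source code and does not need to be exported.

```r
# R/zzz.R (the file name is an assumption; the docs only require that this
# appears somewhere in the package's R source code)

# Declare that this package's code intentionally relies on data.table
# behaviour, even though data.table is not imported wholesale in NAMESPACE.
.datatable.aware <- TRUE
```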
Writing an R-package I use name spaces to use functions from existing packages, e.g. raster::writeRaster(...).
However, I am wondering whether functions from the base package also have to be used like this, e.g. base::sum(...). This might end up in very confusing code parts:
foo[base::which(base::sapply(bar, function(x) ...))]
No, you don't need to reference base functions like this. You only need to reference non-base packages, so that they are loaded and their functions can be found when functions from your package are run, either by using :: or @import in the roxygen comments at the top of your script. See below for why you don't need to reference base packages:
http://adv-r.had.co.nz/Environments.html
"Package namespaces keep packages independent. For example, if package A uses the base mean() function, what happens if package B creates its own mean() function? Namespaces ensure that package A continues to use the base mean() function, and that package A is not affected by package B (unless explicitly asked for)."(Hadley Wickham)
The only time you need to reference base:: is if your package defines, or imports from another package, a function with the same name as the base function.
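The isolation described in the quote can be observed from the other side. In this sketch a user-level `mean()` masks the base function for top-level code, while code inside the stats namespace is unaffected. It relies on `stats::median()` calling `mean()` internally for even-length input.

```r
# A user-level mean() that masks base::mean for code run at top level.
mean <- function(x, ...) "masked"

top_level <- mean(c(2, 3))                    # uses the masking definition above
inside_stats <- stats::median(c(1, 2, 3, 4))  # median's internal mean() still reaches base

rm(mean)  # remove the mask again
```

Functions in a package are evaluated against that package's namespace, its imports, and the base namespace before the global environment is consulted, so the mask never reaches `median()`.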
Is there a way to exclude a function from an imported package? For example, I use almost all of dplyr, but recently they added a new function called recode that overwrites a function that I have from a proprietary package (that I can't make changes to).
Is there a way to exclude the function from the namespace so that it only sees the function from my package and ignores the one from dplyr?
I'm aware that we can import one-off functions from a package with ease, but in this case, I'm looking to exclude - just one.
R 3.3.0 or later supports "import all but x,y,z from foo" statements:
\item The \code{import()} namespace directive now accepts an
argument \code{except} which names symbols to exclude from the
imports. The \code{except} expression should evaluate to a
character vector (after substituting symbols for strings). See
Writing R Extensions.
Methinks that is exactly what you want here, and what most people want who do not intend to have dplyr clobber functions from the stats package included with R, such as filter or lag.
Edited based on later discussion in comments:
Example usage in file NAMESPACE, per Section 1.5.1 of WRE, is as follows:
import(dplyr, except = c(recode, lag, filter))
The other alternative would be to use
recode <- SILLY_PROPRIETARY_PACKAGENAME::recode
at the head of your code (with an explanatory comment) to create a copy of recode in the global workspace (which should then mask the version from dplyr). This could prevent future confusion when you hand your code to someone who has the stock dplyr, rather than your personally hacked version, installed.
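The effect of such a global copy can be sketched in base R. Here `stats::lag` stands in for the proprietary package's `recode`, and `fake_pkg` is a hypothetical stand-in for a package attached later that exports the same name.

```r
# Pin the preferred definition in the global environment
# (stats::lag stands in for SILLY_PROPRIETARY_PACKAGENAME::recode).
lag <- stats::lag

# Hypothetical stand-in for a later-attached package exporting the same name.
fake <- attach(NULL, name = "fake_pkg")
assign("lag", function(...) "fake_pkg version", envir = fake)

# The pinned copy is found before anything on the search path.
pinned_result <- lag(ts(1:3), 1)

detach("fake_pkg")
rm(lag)
```

This only helps code evaluated from the global environment, such as a script; inside a package the NAMESPACE-based approach applies instead.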
Use the Hack-R version of dplyr instead of the Hadley version. Given that I created this in the past 2 minutes, you could also easily make your own version.
require(devtools)
install_github("hack-r/dplyr")
require(dplyr)
All I did was fork it, open the project in RStudio via version control, remove recode, commit, and push it back to my GitHub.
It looks like library() gained this functionality in version 3.6, in the form of the exclude and include.only parameters.
See https://developer.r-project.org/Blog/public/2019/03/19/managing-search-path-conflicts/
library(digest, exclude="sha1")
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> Error in sha1(letters): could not find function "sha1"
or:
library(digest, include.only="sha1")
digest(letters)
#> Error in digest(letters): could not find function "digest"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
As compared to how it would appear without either of the options:
library(digest)
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
Very neat!
(R 4.0.3 was used for the reprexes above)
I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I use the function.
Any pointers would be appreciated!
(This question might be relevant.)
If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need should be in the package namespace because this is where R will look first for the definition of functions, before then traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
And you can just refer to ldply() without the plyr:: prefix just as if it were another function in your package.
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated because it is implying more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should use @import plyr and refer to the function as plyr::ldply(). But in reality, it's actually suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is only preferred for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just do @importFrom like WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.
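For comparison, the two styles discussed above might look like this in package source. This is a sketch only: the function names `stack_qualified` and `stack_imported` are hypothetical, and the final call is guarded so that it runs only where plyr is installed.

```r
# Style 1: fully qualified call; plyr is listed under Imports: in DESCRIPTION.
stack_qualified <- function(xs) {
  plyr::ldply(xs, function(x) data.frame(value = x))
}

# Style 2: formal import via a roxygen2 tag, which writes
# importFrom(plyr, ldply) into NAMESPACE when the package is documented.
#' @importFrom plyr ldply
stack_imported <- function(xs) {
  ldply(xs, function(x) data.frame(value = x))
}

# Outside a package build, only the qualified style can be tried directly,
# and only when plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  result <- stack_qualified(list(1, 2, 3))
}
```

Style 2 resolves `ldply` only inside a built package whose NAMESPACE contains the import; when the file is sourced on its own, the roxygen comment has no effect.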