Remove (or exclude) one function from an imported package - r

Is there a way to exclude a function from an imported package? For example, I use almost all of dplyr, but recently they added a new function called recode that overwrites a function that I have from a proprietary package (that I can't make changes to).
Is there a way to exclude the S3 function from the namespace so it only sees the function from my package and ignores the one from dplyr?
I'm aware that we can import one-off functions from a package with ease, but in this case, I'm looking to exclude just one.

R 3.3.0 or later now supports "import all but x,y,z from foo" statements:
The import() namespace directive now accepts an argument except, which names symbols to exclude from the imports. The except expression should evaluate to a character vector (after substituting symbols for strings). See Writing R Extensions.
Methinks that is exactly what you want here, and what most people want who do not intend to have dplyr clobber functions from the stats package included with R, such as filter or lag.
Edited based on later discussion in comments:
Example usage in the NAMESPACE file, per Section 1.5.1 of WRE, is as follows:
import(dplyr, except = c(recode, lag, filter))

The other alternative would be to use
recode <- SILLY_PROPRIETARY_PACKAGENAME::recode
at the head of your code (with an explanatory comment) to create a copy of recode in the global workspace, which should then mask the version from dplyr. This could prevent future confusion when you hand your code to someone who has the stock dplyr installed, rather than your personally hacked version.

Use the Hack-R version of dplyr instead of the Hadley version. Given that I created this in the past 2 minutes, you could also easily make your own version.
require(devtools)
install_github("hack-r/dplyr")
require(dplyr)
All I did was fork it, open the project in RStudio via version control, remove recode, commit, and push it back to my GitHub.

It looks like library() gained this functionality in version 3.6, in the form of the exclude and include.only parameters.
See https://developer.r-project.org/Blog/public/2019/03/19/managing-search-path-conflicts/
library(digest, exclude="sha1")
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> Error in sha1(letters): could not find function "sha1"
or:
library(digest, include.only="sha1")
digest(letters)
#> Error in digest(letters): could not find function "digest"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
As compared to how it would appear without either of the options:
library(digest)
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
Very neat!
(R 4.0.3 was used for the reprexes above.)

Related

Name space of base package needed?

Writing an R package, I use namespaces to call functions from existing packages, e.g. raster::writeRaster(...).
However, I am wondering whether functions from the base package also have to be used like this, e.g. base::sum(...). This might end up in very confusing code parts:
foo[base::which(base::sapply(bar, function(x) ...))]
No, you don't need to reference base packages like this. You only need to reference non-base packages, to ensure they are loaded into the function environment when functions from your package are run, either by using :: or @import in the roxygen comments at the top of your script. See why you don't need to reference base packages below:
http://adv-r.had.co.nz/Environments.html
"Package namespaces keep packages independent. For example, if package A uses the base mean() function, what happens if package B creates its own mean() function? Namespaces ensure that package A continues to use the base mean() function, and that package A is not affected by package B (unless explicitly asked for)."(Hadley Wickham)
The only time you need to reference base:: is if the namespace for your package imports a package that exports an alternative function of the same name.
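For example (a minimal illustration), a bare call and a qualified call resolve to the same base function unless an attached package masks it:
union(1:3, 2:4)        # normally resolves to base::union
base::union(1:3, 2:4)  # always the base version, even if another package masks union
#> [1] 1 2 3 4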

R with roxygen2: How to use a single function from another package?

I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I use the function.
Any pointers would be appreciated!
(This question might be relevant.)
If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need should be in the package namespace, because this is where R will look first for the definition of functions, before traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
And you can just refer to ldply() without the plyr:: prefix, just as if it were another function in your package.
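For instance, a roxygen block along these lines (the wrapper function here is a hypothetical sketch) generates an importFrom(plyr, ldply) directive in NAMESPACE when you run devtools::document():
#' Count the rows of each data frame in a list (hypothetical example)
#'
#' @importFrom plyr ldply
#' @export
count_rows <- function(dfs) {
  ldply(dfs, nrow)  # unqualified call works because of the namespace import
}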
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated, because it implies more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should forgo the formal import and just refer to the function as plyr::ldply(). But in reality, it's actually suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is preferred only for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just use @importFrom as WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.

Avoiding function name conflicts in R

I recently ran into a situation in which existing R code broke due to the introduction of the dplyr library. Specifically, the lag function from the stats package is being masked by dplyr::lag. The problem is previously documented here, however no workaround is provided. Research into R namespaces and environments leads to 2 possible solutions, neither very robust in my opinion:
1. Make sure that package:stats appears first in the search() path so that lag resolves to the function in the stats package.
2. Change all references to lag in my code to stats::lag.
My question is whether either of these other solutions is possible:
1. Loading the dplyr package in a way that forces it into a "private" namespace, in which its objects can only be accessed through the :: operator.
2. A directive at library loading to force lag to resolve to stats::lag. This could be done either by removing dplyr::lag or by overriding the search path (similar to the C++ using namespace::function directive).
You should consider library(conflicted), as it's designed for exactly this problem.
https://cran.r-project.org/web/packages/conflicted/index.html
Putting conflicted::conflict_prefer(name = "lag", winner = "stats") after you load your packages ensures that any time the function lag() is called in your script, it will use the stats function by default.
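A minimal sketch of that setup:
library(conflicted)
library(dplyr)
# declare the winner once, after loading packages; a bare lag() now always means stats::lag
conflicted::conflict_prefer(name = "lag", winner = "stats")
lag(ts(1:10))  # dispatches to stats::lag, not dplyr::lag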

Making a package in R that depends on data.table

I have to make an R package that depends on the package data.table. However, if I write a function such as the following one in the package
randomdt <- function() {
  dt <- data.table(random = rnorm(10))
  dt[dt$random > 0]
}
the [ function will use the method for data.frame, not for data.table, and therefore the error
Error in `[.data.frame`(x, i) : undefined columns selected
will appear. Usually this would be solved by using get('[.data.table') or a similar method (package::function is the simplest), but that appears not to work. After all, [ is a primitive function and I don't know how its methods work.
So, how can I call the data.table [ function from my package?
Updated based on some feedback from MichaelChirico and comments by Arun and Soheil.
Roughly speaking, there are two approaches you might consider. The first is building the dependency into your package itself, while the second is including lines in your R code that test for the presence of data.table (and possibly even install it automatically if it is not found).
The data.table FAQ specifically addresses this in 6.9, and states that you can ensure that data.table is appropriately loaded by your package by:
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
As noted in the comments, this is common R behavior that is in numerous packages.
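For option ii), the relevant lines would look roughly like this (a minimal sketch). In DESCRIPTION:
Imports: data.table
and in NAMESPACE:
import(data.table)
With that in place, [ on a data.table inside your package dispatches to data.table's method rather than data.frame's.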
An alternative approach is to create specific lines of code which test for and import the required packages as part of your code. This is, I would contend, not the ideal solution, given the elegance of the option provided above. However, it is technically possible.
A simple way of doing this would be to use either require or library to check for the existence of data.table, with an error thrown if it could not be attached. You could even use a simple set of conditional statements to run install.packages to install what you need if loading them fails.
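A minimal sketch of that check, using requireNamespace() as a quieter variant of the require() test described above (whether auto-installing on a user's machine is appropriate is a judgment call):
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}
library(data.table)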
Yihui Xie (of knitr fame) has a great post about the difference between library and require here and makes a strong case for just using library in cases where the package is absolutely essential for the upcoming code.

How to tell what packages you have used in R

I have a very long R script with many if statements and exception cases. As I've been going, I've been importing and testing libraries without documenting them very well. The problem is that if I run this from a clean installation, I'm not sure which statements the script will run, and so which libraries will be needed.
My question is: Is there any R function to test which libraries are being used in a script?
EDIT: I have not used all of the libraries that are installed, so print(sessionInfo()) won't be useful; I just want to start the script with an install.packages call.
I found the list.functions.in.file() function from NCmisc (install.packages("NCmisc")) quite helpful for this:
list.functions.in.file(filename, alphabetic = TRUE)
For more info see this link: https://rdrr.io/cran/NCmisc/man/list.functions.in.file.html
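For example (the script path here is hypothetical), the result is a named list keyed by package, so the names alone give you a rough dependency list:
library(NCmisc)
funs <- list.functions.in.file("my_analysis.R", alphabetic = TRUE)
names(funs)  # e.g. "package:stats", "package:dplyr", ...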
The ‘renv’ package provides a robust solution for this nowadays via renv::dependencies.
renv::dependencies performs proper static analysis and reliably finds package dependencies even when they are declared in non-standard ways (e.g. via box::use) or via a package DESCRIPTION file rather than via library or ::.
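For example (the project path is illustrative):
# install.packages("renv")
deps <- renv::dependencies("path/to/my/project")
unique(deps$Package)  # the packages detected in the project's sources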
As a quick hack I’ve previously (pre-‘renv’) used a shell script for this:
#!/usr/bin/env bash
source_files=($(git ls-files '*.R'))
grep -hE '\b(require|library)\([\.a-zA-Z0-9]*\)' "${source_files[@]}" | \
sed '/^[[:space:]]*#/d' | \
sed -E 's/.*\(([\.a-zA-Z0-9]*)\).*/\1/' | \
sort -uf \
> DEPENDS
This uses Git to collect all R files under version control in a project. Since you should be using version control anyway, this is normally a good solution (although you may need to adapt it to your version control system). For the few cases where the project isn't under version control you should (1) put it under version control. Or, failing that, (2) use find . -regex '.*\.[rR]' instead of git ls-files '*.R'.
And it produces a DEPENDS file containing a very simple list of dependencies.
It only finds direct calls to library and require though – if you wrap those calls, the script won’t work.
Based on everyone's responses, especially eh21's suggestion of the NCmisc package, I put together a little function that outputs a list of the packages used in all the R scripts in a directory, as well as their frequencies.
library(NCmisc)
library(stringr)
library(dplyr)
checkPacks <- function(path) {
  ## get all R files in your directory
  ## by the way, extract R code from Rmd: http://felixfan.github.io/extract-r-code/
  files <- list.files(path)[str_detect(list.files(path), "\\.R$")]
  ## extract all functions and which package they are from
  ## using NCmisc::list.functions.in.file
  funs <- unlist(lapply(paste0(path, "/", files), list.functions.in.file))
  packs <- funs %>% names()
  ## "character" functions such as reactive objects in Shiny
  characters <- packs[str_detect(packs, "^character")]
  ## user-defined functions in the global environment
  globals <- packs[str_detect(packs, "^\\.GlobalEnv")]
  ## functions that are in multiple packages' namespaces
  multipackages <- packs[str_detect(packs, ", ")]
  ## get just the unique package names from multipackages
  mpackages <- multipackages %>%
    str_extract_all(., "[a-zA-Z0-9]+") %>%
    unlist() %>%
    unique()
  mpackages <- mpackages[!mpackages %in% c("c", "package")]
  ## functions that are from single packages
  packages <- packs[str_detect(packs, "package:") & !packs %in% multipackages] %>%
    str_replace(., "[0-9]+$", "") %>%
    str_replace(., "package:", "")
  ## unique packages
  packages_u <- packages %>%
    unique() %>%
    union(., mpackages)
  return(list(packs = packages_u, tb = table(packages)))
}
checkPacks("~/your/path")
I am not sure of a good way to automate this, but what you could do is:
1. Open a new R console.
2. Check with sessionInfo() that you don't have extra packages loaded. If you load extra packages by default (e.g. via your .Rprofile file), I suggest you avoid doing that, as it's a recipe for disaster. Normally you should only have the base packages loaded: stats, graphics, grDevices, utils, datasets, methods, and base.
3. Unload any extra libraries using:
detach("package:<packageName>", unload = TRUE)
4. Run the script after commenting out all of the library and require calls and see which functions give an error.
5. To find which package is required by each failing function, type in the console:
??<functionName>
6. Load the required packages and re-run steps 3-5 until satisfied.
You might want to look at the checkpoint package from Revolution Analytics on GitHub: https://github.com/RevolutionAnalytics/checkpoint
It does some of this, and solves the problem of reproducibility, but I don't see that it can report a list of what you are using.
However, if you look at the code, you'll probably get some ideas.
I had a similar need when I had to convert my code into a package, so I needed to identify every package dependency and either import it or use the fully qualified name.
Reading the book Extending R, I found that XRtools::makeImports can scan a package and find all the packages that need to be imported. This doesn't solve our problem yet, since it only applies to an existing package, but it provided the main insight on how to do it.
I made a function and put it into my package mischelper. You can install the package and either use the RStudio addin menu to scan the current file or selected code, or use the command-line functions. Every external function (fun_inside) and the function that calls it (usage) will be listed in a table.
You can then go to each function and press F1 to find which package it belongs to. I actually have another package that can scan all installed packages for function names and build a database, but that may cause more false positives for this usage, because pressing F1 only searches the packages you have loaded.
See details of the usage on my package page:
https://github.com/dracodoc/mischelper
I'd trust the {renv}-based solutions the most for identifying package dependencies.
That said, I wrote a package funspotr that contains similar functionality to the answers mentioning NCmisc::list.functions.in.file() and can be used for parsing the functions or packages in a file or files:
library(dplyr)
funspotr::spot_pkgs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R")
#> [1] "knitr" "magrittr" "stringr" "readr" "purrr" "glue"
funspotr::spot_funs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R") %>%
select(-in_multiple_pkgs)
#> # A tibble: 13 x 2
#> funs pkgs
#> <chr> <chr>
#> 1 tempfile base
#> 2 purl knitr
#> 3 getOption base
#> 4 options base
#> 5 .Call base
#> 6 source base
#> 7 library base
#> 8 read_file readr
#> 9 map purrr
#> 10 str_extract stringr
#> 11 glue glue
#> 12 str_c stringr
#> 13 write_file readr
