I was helping somebody with their code recently and they asked me why it is possible to use functions from a certain package (e.g. dplyr) without explicitly 'loading' the package and attaching it to the search path within R.
Specifically, why it is possible to write:
dplyr::mutate(...)
As opposed to:
library(dplyr)
mutate(...)
When you use ::, it doesn't appear to automatically 'attach' the functions from the namespace dplyr onto the search path. That only happens when you call library(). This confuses me slightly: when you use the :: approach, how does R find the function mutate() within dplyr without it being attached to the search path?
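For example, in a fresh session (assuming dplyr is installed) I can see that a :: call loads the namespace but still does not put it on the search path:
"dplyr" %in% loadedNamespaces()    # FALSE in a fresh session
"package:dplyr" %in% search()      # FALSE

invisible(dplyr::mutate(mtcars, mpg2 = mpg * 2))

"dplyr" %in% loadedNamespaces()    # TRUE  -- :: loaded the namespace
"package:dplyr" %in% search()      # FALSE -- but it is still not attached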
You may also consider making use of with. with is traditionally used to perform operations on data, but combined with asNamespace it can be used to access functions from a package without attaching that package.
with(asNamespace("dplyr"),
     select(mtcars, cyl) %>%
       mutate(cyl_two = cyl * 2)
)
You can use :: and ::: to access a single function from a package, depending on whether the function is exported or not. If you look at the code of the double and triple colon operators you will see that they rely on a very similar mechanism: the package name is used to construct an environment via asNamespace, and get¹ is then used to retrieve the function from it.
Broadly speaking, a useful way of thinking about this is to treat package functions as objects stored in an environment; by calling ::, ::: or with we can reference those objects.
¹ In the case of :: the actual function is getExportedValue.
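For reference, the same lookup can be reproduced by hand; this sketch mirrors what ::: does internally (the exact operator bodies may differ slightly between R versions):
# Build the namespace environment and fetch the binding from it,
# which is essentially what dplyr:::mutate does.
mutate_fun <- get("mutate", envir = asNamespace("dplyr"), inherits = FALSE)
identical(mutate_fun, dplyr::mutate)   # TRUE

# Printing the operator itself shows the same mechanism:
`:::`
## function (pkg, name)
## {
##     pkg <- as.character(substitute(pkg))
##     name <- as.character(substitute(name))
##     get(name, envir = asNamespace(pkg), inherits = FALSE)
## }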
Related
I am trying to use where in my own R package. I was going to use it in the code as tidyselect::where() but the function is not exported. For the same reason, you cannot use @importFrom tidyselect where.
I do not want to reference it with :::. The code will work if I simply refer to it as where(), but then I receive a note in the checks.
Undefined global functions or variables:
where
What is going on here? I assume the function works as is since it is captured as an expression in my code, and tidyeval knows how to handle it on evaluation?
Example
As an example, if you start a clean R session, the following will work (dplyr 1.0.0) without running library(dplyr). It clearly knows how to handle where.
dplyr::mutate(iris, dplyr::across(where(is.numeric), ~.x + 10))
Likewise, this will also work, but I do not want to use it in a package. So I use the above, which gets flagged in devtools::check().
dplyr::mutate(iris, dplyr::across(tidyselect:::where(is.numeric), ~.x + 10))
Question
How do I use where from tidyselect in a package without it being flagged as undefined?
There is an existing workaround for this type of problem. One of the objects exported from tidyselect is a list of helper functions called vars_select_helpers. This includes where, so if you do
mutate(iris, dplyr::across(tidyselect::vars_select_helpers$where(is.numeric), ~.x + 10))
You should get the same functionality without any grumbling from the check tools.
I would like to add an idiosyncratically modified function to a package written by someone else, with an R script, i.e. just for the session, not permanently. The specific example is, let's say, bls_map_county2() added to the blscrapeR package. bls_map_county2 is just a copy of the bls_map_county() function with an added ... argument, for the purpose of changing a few of the map drawing parameters. I have not yet inserted the additional parameters. Running the function as-is, I get the error:
Error in BLS_map_county(map_data = df, fill_rate = "unemployed_rate", :
could not find function "geom_map"
I assume this is because my function does not point to the blscrapeR namespace. How do I assign my function to the (installed, loaded) blscrapeR namespace, and is there anything else I need to do to let it access whatever machinery from the package it requires?
When I am hacking on a function in a particular package that in turn calls other functions I often use this form after the definition:
mod_func <- function(args) {
  # ... hacked body ...
}
environment(mod_func) <- environment(old_func)
But I think the function you might really want is assignInNamespace. These methods will allow access to non-exported functions in loaded packages. They will not, however, succeed if the package is not loaded, so you may want to wrap the require(pkgname) call in stopifnot().
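Put together for the example in the question, a sketch might look like this (bls_map_county2 is the hypothetical hand-edited copy; the body shown is a placeholder):
stopifnot(require(blscrapeR))

bls_map_county2 <- function(map_data, fill_rate, ...) {
  # ... edited copy of bls_map_county()'s body goes here ...
}

# Option 1: give the copy the same enclosing environment as the original,
# so it can see blscrapeR's internals and imports for this session.
environment(bls_map_county2) <- environment(blscrapeR::bls_map_county)

# Option 2: replace the original inside the namespace for this session only.
# assignInNamespace("bls_map_county", bls_map_county2, ns = "blscrapeR")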
There are two parts to this answer - first a generic answer to your question, and second a specific answer for the particular function that you reference, in which the problem is something slightly different.
1) Generic solution to accessing internal functions when you edit a package function
You should already have access to the package namespace, since you loaded it, so it is only the unexported functions that will give you issues.
I usually just prepend the package name and the ::: operator to the non-exported functions. That is, find every instance of a call to some_internal_function(), and replace it with PackageName:::some_internal_function(). If there are several different internal functions called within the function you are editing, you may need to do this for each of the offending function calls.
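For example, a line in the copied function body such as the following (some_internal_function and PackageName are placeholders for the actual helper and its package):
result <- some_internal_function(x)
would be rewritten as
result <- PackageName:::some_internal_function(x)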
The help page for ::: does contain these warnings
Beware -- use ':::' at your own risk!
and
It is typically a design mistake to use ::: in your code since the
corresponding object has probably been kept internal for a good
reason. Consider contacting the package maintainer if you feel the
need to access the object for anything but mere inspection.
But for what you are doing, in terms of temporarily hacking another function from the same package for your own use, these warnings should be safe to ignore (at your own risk, of course - as it says in the manual).
2) In the case of blscrapeR::bls_map_county()
The offending line in this case is
ggplot2::ggplot() + geom_map(...
in which the package writers have specified the ggplot2 namespace for ggplot(), but forgotten to do so for geom_map(), which is also part of ggplot2 (and not an internal function in blscrapeR).
In this case, just load ggplot2, and you should be good to go.
You may also consider contacting the package maintainer to inform them of this error.
I'm creating an R package that will use a single function from plyr. According to this roxygen2 vignette:
If you are using just a few functions from another package, the
recommended option is to note the package name in the Imports: field
of the DESCRIPTION file and call the function(s) explicitly using ::,
e.g., pkg::fun().
That sounds good. I'm using plyr::ldply() - the full call with :: - so I list plyr in Imports: in my DESCRIPTION file. However, when I use devtools::check() I get this:
* checking dependencies in R code ... NOTE
All declared Imports should be used:
‘plyr’
All declared Imports should be used.
Why do I get this note?
I am able to avoid the note by adding @importFrom plyr ldply in the file that is using plyr, but then I end up having ldply in my package namespace, which I do not want and should not need, as I am using plyr::ldply() the single time I call the function.
Any pointers would be appreciated!
(This question might be relevant.)
If ldply() is important for your package's functionality, then you do want it in your package namespace. That is the point of namespace imports. Functions that you need should be in the package namespace because this is where R will look first for the definition of functions, before traversing the base namespace and the attached packages. It means that no matter what other packages are loaded or unloaded, attached or unattached, your package will always have access to that function. In such cases, use:
#' @importFrom plyr ldply
And then you can refer to ldply() without the plyr:: prefix, just as if it were another function in your package.
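In a roxygen2 workflow that looks roughly like this (my_summarise is a made-up example function):
#' Summarise a list of data frames
#' @importFrom plyr ldply
#' @export
my_summarise <- function(dfs) {
  ldply(dfs, nrow)   # no plyr:: prefix needed; ldply is imported
}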
If ldply() is not so important - perhaps it is called only once in a not commonly used function - then, Writing R Extensions 1.5.1 gives the following advice:
If a package only needs a few objects from another package it can use a fully qualified variable reference in the code instead of a formal import. A fully qualified reference to the function f in package foo is of the form foo::f. This is slightly less efficient than a formal import and also loses the advantage of recording all dependencies in the NAMESPACE file (but they still need to be recorded in the DESCRIPTION file). Evaluating foo::f will cause package foo to be loaded, but not attached, if it was not loaded already—this can be an advantage in delaying the loading of a rarely used package.
(I think this advice is actually a little outdated because it implies more separation between DESCRIPTION and NAMESPACE than currently exists.) It implies you should use @import plyr and refer to the function as plyr::ldply(). But in reality, it's actually suggesting something like putting plyr in the Suggests field of DESCRIPTION, which isn't exactly accommodated by roxygen2 markup nor exactly compliant with R CMD check.
In sum, the official line is that Hadley's advice (which you are quoting) is only preferred for rarely used functions from rarely used packages (and/or packages that take a considerable amount of time to load). Otherwise, just use @importFrom as WRE advises:
Using importFrom selectively rather than import is good practice and recommended notably when importing from packages with more than a dozen exports.
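If you do opt for the fully qualified style for a rarely used function, it looks like this (a sketch; rare_helper is a made-up name, and plyr is listed in the Imports: field of DESCRIPTION but gets no NAMESPACE entry):
#' @export
rare_helper <- function(x) {
  # plyr is loaded (but not attached) the first time this line runs
  plyr::ldply(x, nrow)
}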
I recently ran into a situation in which existing R code broke due to the introduction of the dplyr library. Specifically, the lag function from the stats package is being masked by dplyr::lag. The problem is previously documented here, however no workaround is provided. Research into R namespaces and environments leads to two possible solutions, neither very robust in my opinion:
Make sure that package:stats appears first in the search() path so that lag resolves as the function in the stats package.
Change all references to lag in my code to stats::lag
My question is whether either of these other solutions are possible:
Loading the dplyr package in a way to force it to be in a "private" namespace in which its objects can only be accessed through the :: operator.
A directive at library loading to force lag to resolve as stats::lag. This could be done either by removing dplyr::lag or overriding the search path (similar to the C++ using namespace::function directive.)
You should consider library(conflicted), as it's designed for exactly this problem.
https://cran.r-project.org/web/packages/conflicted/index.html
Putting conflicted::conflict_prefer(name = "lag", winner = "stats") after you load your packages ensures that any time the function lag() is called in your script, it will use the stats function by default.
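A minimal sketch of how that might look in a script (assuming conflicted and dplyr are installed):
library(conflicted)
library(dplyr)

conflicted::conflict_prefer(name = "lag", winner = "stats")

lag(ts(1:5))   # resolves to stats::lag() even though dplyr is attached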
I have two related questions about writing functions with the magrittr package and including them in a package.
In the normal way of writing a function, you can call library(package.a) within the function if any of the steps uses a function from package.a. How would you do this in the pipe environment (from magrittr)?
This part of the question arose when I tried to package my functions, as a few of my functions use magrittr's way of creating functions. I wasn't able to add those functions to the package: the devtools package's combine function didn't recognize the %>% pipe. Basically I had to re-write them as normal functions to include them in the package. How do you overcome this?
Update your NAMESPACE file, see 1.5 Package namespaces.
Add import(magrittr), and don't forget to add Imports: magrittr to the DESCRIPTION file.
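For example (a sketch; col_means is a made-up function), with import(magrittr) in NAMESPACE and Imports: magrittr in DESCRIPTION, a package function can use the pipe directly:
# In NAMESPACE:
#   import(magrittr)
# In DESCRIPTION:
#   Imports: magrittr

# In R/col_means.R:
col_means <- function(df) {
  df %>%
    colMeans() %>%
    round(2)
}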
Regarding your comment on ::.
While you are importing all of magrittr's exported functions by using import(magrittr), you don't have to use the :: operator to point to the package.
Of course, if you create a function with the same name in your package, it will override the name from the imported package, and then you do need ::.
The :: operator would also be needed if you used importFrom() instead of import() and did not import the required function - although that is not recommended anyway.
Another case where you may want to use :: is when you use Suggests or Enhances, but neither of those is in scope for this question.