I am trying to use where in my own R package. I was going to call it as tidyselect::where(), but the function is not exported. For the same reason, you cannot use @importFrom tidyselect where.
I do not want to reference it with :::. The code will work if I simply refer to it as where(), but then I receive a note in the checks.
Undefined global functions or variables:
where
What is going on here? I assume the function works as is since it is captured as an expression in my code, and tidyeval knows how to handle it on evaluation?
Example
As an example, if you start a clean R session, the following will work (dplyr 1.0.0) without running library(dplyr). It clearly knows how to handle where.
dplyr::mutate(iris, dplyr::across(where(is.numeric), ~.x + 10))
Likewise, this will also work, but I do not want to use it in a package. So I use the above, which gets flagged in devtools::check().
dplyr::mutate(iris, dplyr::across(tidyselect:::where(is.numeric), ~.x + 10))
Question
How do I use where from tidyselect in a package without it being flagged as undefined?
There is an existing workaround for this type of problem. One of the objects exported from tidyselect is a list of helper functions called vars_select_helpers. This list includes where, so if you do
dplyr::mutate(iris, dplyr::across(tidyselect::vars_select_helpers$where(is.numeric), ~.x + 10))
You should get the same functionality without any grumbling from the check tools.
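Inside a package, the workaround above might look like the following minimal sketch. The function name add_ten_to_numeric is hypothetical, and the sketch assumes dplyr and tidyselect are listed under Imports in DESCRIPTION and that the installed tidyselect still provides vars_select_helpers.

```r
# Hypothetical package function (the name is illustrative only).
# Assumes dplyr and tidyselect are installed and listed in Imports.
add_ten_to_numeric <- function(data) {
  dplyr::mutate(
    data,
    dplyr::across(
      # Reach `where` through the exported helper list instead of `:::`
      tidyselect::vars_select_helpers$where(is.numeric),
      ~ .x + 10
    )
  )
}

# Numeric columns are shifted by 10; the factor column is left alone
out <- add_ten_to_numeric(iris)
```

Another commonly used option is to declare the name with utils::globalVariables("where") somewhere in the package source, which silences the NOTE while leaving the bare where() call in place. As far as I know, newer tidyselect releases export where() directly, so it is worth checking your installed version before reaching for either workaround.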
Related
I was helping somebody with their code recently and they asked me why it was possible to use functions from a certain package (e.g. dplyr) without explicitly 'loading' (and attaching them) to the search path within R.
Specifically, why it is possible to write:
dplyr::mutate(...)
As opposed to:
library(dplyr)
mutate(...)
When you use ::, it doesn't appear to automatically 'attach' the functions from the namespace dplyr onto the search path. That only happens when you call library(). This confuses me slightly: when you use the :: approach, how does R find the function mutate() within dplyr without it being attached to the search path?
You may also consider making use of with. with is traditionally used to operate on data, but combined with asNamespace it can be used to access functions from a package without attaching the package to the search path.
with(asNamespace("dplyr"),
select(mtcars, cyl) %>%
mutate(cyl_two = cyl * 2)
)
You can use :: and ::: to access a single object from a package: :: for exported objects, ::: for any object in the namespace, exported or not. If you look at the source of the double and triple colon operators, you will see that they rely on a very similar mechanism: the package name is turned into a namespace environment via asNamespace, and get [1] retrieves the function from it.
Broadly speaking, a useful way of thinking about this is to treat package functions as objects in an environment. By calling ::, ::: or with, we can reference those objects.
[1] In the case of ::, the actual function is getExportedValue.
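To make the mechanism concrete, here is a small sketch using the tools package. It is chosen only because it ships with R and is usually not attached by default, which is an assumption about your session. The sketch checks that :: loads the namespace without changing the search path, and that the object it returns matches a direct lookup through getExportedValue or asNamespace.

```r
# Is "package:tools" on the search path before we start? (Usually not.)
attached_before <- "package:tools" %in% search()

# `::` loads the namespace if needed and fetches the exported object
ext <- tools::file_ext("report.csv")

# The namespace is now loaded, but the search path has not changed
loaded_after   <- isNamespaceLoaded("tools")
attached_after <- "package:tools" %in% search()

# Roughly what `::` does: look up an exported value by name
f_exported <- getExportedValue("tools", "file_ext")

# Roughly what `:::` does: look in the namespace environment itself
f_internal <- get("file_ext", envir = asNamespace("tools"))

# All three routes return the very same function object
same_function <- identical(tools::file_ext, f_exported) &&
  identical(f_exported, f_internal)
```

So "loaded" and "attached" are separate states: library() does both, while :: only needs the first.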
With dplyr and the pipe, one can use '.' to refer to the data being piped, e.g.
x <- data.frame(x = 2:4)
y <- data.frame(y = 1:3)
y %>% dplyr::bind_cols(x,.)
but when this is used in a function inside a package, running the package check produces the NOTE
no visible binding for global variable '.'.
What is the best practice to handle the NOTE?
It seems that best practice is to use .data instead of . and then import .data from the rlang package. From the programming with dplyr vignette:
If this function is in a package, using .data also prevents R CMD check from giving a NOTE about undefined global variables (provided that you’ve also imported rlang::.data with @importFrom rlang .data).
Unfortunately that doesn't work for the original question with dplyr::bind_cols, but it works for example in dplyr::mutate and dplyr::do.
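A minimal sketch of the .data approach in a package function, for the verbs where it does apply. The function name double_cyl is hypothetical, and the sketch assumes dplyr is installed. In a real package the roxygen tag would generate the importFrom(rlang, .data) entry in NAMESPACE.

```r
# Hypothetical package function (illustrative name).
# The roxygen tag below is what makes R CMD check see `.data` as defined.
#' @importFrom rlang .data
double_cyl <- function(df) {
  # `.data$cyl` refers to the column explicitly, so `cyl` is never
  # treated as an undefined global variable by the checker
  dplyr::mutate(df, cyl_two = .data$cyl * 2)
}

out <- double_cyl(mtcars)
```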
Best practice now is to probably use quosures. This other SO post has a good summary: How to evaluate a constructed string with non-standard evaluation using dplyr?
In practice, I've just included . = NULL at the top of my functions.
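The dummy-binding trick can be sketched as follows. The function name bind_after is hypothetical, base cbind stands in for dplyr::bind_cols to keep the example light, and magrittr is assumed to be installed. The local assignment only exists to give the static checker a visible binding for '.'; the pipe still supplies the real value when the code runs.

```r
# In a package you would import the pipe via NAMESPACE; this alias is
# only here so the sketch runs as a standalone script.
`%>%` <- magrittr::`%>%`

# Hypothetical helper (illustrative name)
bind_after <- function(x, y) {
  . <- NULL           # dummy local binding so the checker sees `.` as defined
  y %>% cbind(x, .)   # the pipe binds `.` to `y` when this is evaluated
}

x <- data.frame(x = 2:4)
y <- data.frame(y = 1:3)
res <- bind_after(x, y)
```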
EDIT
As @MrFlick pointed out, quosures won't actually help in this case. You can feasibly use quosures to define column names etc. in a way that would allow you to avoid notes about non-standard evaluation in package functions (I haven't done this yet, but it's on my to-do list for at least one of my packages), but you can't use this strategy for piping values to a specified argument or position with '.'.
It's worth pointing out that there is at least some overhead with using pipes. It might be that best practice is to not use pipes at all in your package functions, which gets around the issue of using '.'. For the rest of NSE with dplyr commands, you can use quosures.
I recently ran into a situation in which existing R code broke due to the introduction of the dplyr library. Specifically, the lag function from the stats package is being masked by dplyr::lag. The problem is previously documented here, however no workaround is provided. Research into R namespaces and environments leads to 2 possible solutions, neither very robust in my opinion:
Make sure that package:stats appears first in the search() path so that lag resolves as the function in the stats package.
Change all references of lag in my code to stats::lag
My question is whether either of these other solutions are possible:
Loading the dplyr package in a way to force it to be in a "private" namespace in which its objects can only be accessed through the :: operator.
A directive at library loading to force lag to resolve as stats::lag. This could be done either by removing dplyr::lag or overriding the search path (similar to the C++ using namespace::function directive.)
You should consider library(conflicted), as it's designed for exactly this problem.
https://cran.r-project.org/web/packages/conflicted/index.html
Putting conflicted::conflict_prefer(name = "lag", winner = "stats") after you load your packages ensures that any time the function lag() is called in your script, it will use the stats function by default.
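If you would rather not add a dependency, a base R sketch of the same idea is to bind the name yourself. This is not from the answer above; it relies on the fact that objects in the global environment are found before anything on the rest of the search path. The commented-out line shows the exclude argument of library(), which I believe is available from R 3.6.0 onward.

```r
# Option A (needs dplyr, so left commented out here): attach dplyr
# without its lag(), leaving stats::lag as the only lag on the path.
# library(dplyr, exclude = "lag")

# Option B: bind the name explicitly. The global environment comes
# first in lookup, so this wins over any attached package.
lag <- stats::lag

x <- ts(1:5)
shifted <- lag(x, k = 1)   # stats::lag shifts the time base, not the values
tsp(shifted)               # start, end and frequency of the shifted series
```

Note that Option B only affects code that looks up lag from the global environment, such as your own script. Functions inside other packages resolve names through their own namespaces and are unaffected.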
I have two related questions about writing functions with the magrittr package and including them in a package.
In the normal way of writing a function, you can specify library(package.a) within the function body if any of the steps uses a function from package.a. How would you do this in a pipe environment (from magrittr)?
This part of the question arose when I tried to package my functions, and a few of my functions use magrittr's way of creating functions. I wasn't able to add those functions to the package. Devtools package's combine function didn't recognize the %>% pipe. Basically I had to rewrite them as normal functions to include them in the package. How do you overcome this?
Update your NAMESPACE file, see 1.5 Package namespaces.
Add import(magrittr) to NAMESPACE, and don't forget to add magrittr to the Imports field of the DESCRIPTION file.
Regarding your comment on ::.
Since import(magrittr) imports every function that magrittr exports, you don't have to use the :: operator to point to the package.
The exception is when you define a function with the same name in your own package: it masks the imported one, and then you do need ::.
The :: would also be needed if you used importFrom() instead of import() and did not import the function you require, although leaving out a required import is not recommended anyway.
Another case where you may want to use :: is when the package is listed under Suggests or Enhances, but neither is in the scope of this question.
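If you generate NAMESPACE with roxygen2, the narrower importFrom variant can be sketched like this. The file name and the function add_ten are hypothetical. After running devtools::document(), the tag should produce importFrom(magrittr, "%>%") in NAMESPACE, and magrittr still has to appear under Imports in DESCRIPTION.

```r
# R/utils-pipe.R (hypothetical file in your package)

# Import only the pipe rather than everything magrittr exports.
#' @importFrom magrittr %>%
NULL

# Hypothetical package function: inside the package, `%>%` resolves
# through the import, so no library() call and no `::` are needed.
add_ten <- function(x) {
  x %>% `+`(10)
}
```

Calling library() inside a package function is generally avoided, because it changes the user's search path as a side effect; declaring the import once in NAMESPACE covers every function in the package.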