How to test if a function argument is a "quosurable" name? - r

I'm making a function that should be able to handle multiple classes for its first argument: formulas, characters, tidy-selection, var names... The goal is then to use tidyselection with tidyselect::vars_select, except with bare formulas.
The problem is that when I test the class of this argument, it will throw an error if the value is a name to be tidy-selected, since it will be considered as a not found object.
I found a workaround with tryCatch, which enquotes the first argument if its evaluation fails (and thus if it doesn't exist in this scope).
library(rlang)
foo=function(.vars){
.vars2=tryCatch(.vars, error=function(e) enquo(.vars))
print(class(.vars2))
print(class(.vars))
}
foo(Species)
# [1] "quosure" "formula"
# Error in print(class(.vars)) : object 'Species' not found
# In addition: Warning message:
# In print(class(.vars)) : restarting interrupted promise evaluation
foo(~Species)
# [1] "formula"
# [1] "formula"
foo(1)
# [1] "numeric"
# [1] "numeric"
foo("Species")
# [1] "character"
# [1] "character"
This doesn't seem clean to me, as I'm catching all errors without filtering on my specific case.
Is there a built-in function to test this, or a cleaner solution than this workaround?

I think the following is what you are trying to do (using here only base R).
foo=function(.vars) {
.vars2 = substitute(.vars)
ifelse(is.symbol(.vars2), class(.vars2), class(.vars))
}
foo(Species)
#[1] "name"
foo(~Species)
#[1] "formula"
foo(1)
#[1] "numeric"
foo("Species")
#[1] "character"

I don't think that there is a function which lets you avoid a structured control flow along the different input types.
library(rlang)
library(tidyselect)
library(dplyr)
foo <- function(df, .vars){
en_vars <- enquo(.vars)
var_expr <- quo_get_expr(en_vars)
if (is.name(var_expr)){
vars_select(names(df), !! en_vars)
} else if (is_formula(var_expr)) {
vars_select(names(df), all.vars(.vars))
} else {
vars_select(names(df), .vars)
}
}
iris_tbl <- as_tibble(iris)
foo(iris_tbl, Species)
#> Species
#> "Species"
foo(iris_tbl, ~Species)
#> Species
#> "Species"
foo(iris_tbl, 1)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(.vars)` instead of `.vars` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> Sepal.Length
#> "Sepal.Length"
foo(iris_tbl, "Species")
#> Species
#> "Species"
Created on 2020-06-21 by the reprex package (v0.3.0)

Related

How to make Ops method compatible between two objects of a non base class

Let's imagine that I have a class "my" and I want to trigger certain behaviour when it is added to an object that has units (i.e. from units package):
library(units)
my1 = structure(2, class="my")
Ops.my <- function(e1, e2=NULL) {
ok <-
switch(
.Generic,
`-` = ,
`*` = ,
`+` = ,
'<=' = TRUE,
FALSE
)
if (!ok) {
stop(gettextf("%s not meaningful", sQuote(.Generic)))
}
get(.Generic)(as.integer(e1), as.integer(e2))
}
my1+set_units(5,nm)
Currently, it gives me the following warning:
Warning message:
Incompatible methods ("Ops.my", "Ops.units") for "+"
But I actually want to handle "my" and "units" addition in a certain way, how do I do it?
I tried with something like Ops.my.units <- but it doesn't seem to work.
There doesn't seem to be a way to do this with Ops. From the docs:
The classes of both arguments are considered in dispatching any member of this group. For each argument its vector of classes is examined to see if there is a matching specific (preferred) or Ops method. If a method is found for just one argument or the same method is found for both, it is used. If different methods are found, there is a warning about ‘incompatible methods’
This is probably a good thing. Part of the benefit of an object-oriented system in a non-compiled language like R is that it helps preserve type safety. This stops you from accidentally adding apples to oranges, as we can see in the following example:
apples <- structure(2, class = "apples")
oranges <- structure(2, class = "oranges")
Ops.apples <- function(e1, e2) {
value <- do.call(.Generic, list(as.integer(e1), as.integer(e2)))
class(value) <- "apples"
value
}
Ops.oranges <- function(e1, e2) {
value <- do.call(.Generic, list(as.integer(e1), as.integer(e2)))
class(value) <- "oranges"
value
}
apples + apples
#> [1] 4
#> attr(,"class")
#> [1] "apples"
oranges + oranges
#> [1] 4
#> attr(,"class")
#> [1] "oranges"
apples + oranges
#> [1] 4
#> attr(,"class")
#> [1] "apples"
#> Warning message:
#> Incompatible methods ("Ops.apples", "Ops.oranges") for "+"
You can see that even here, we could just ignore the warning.
suppressWarnings(apples + oranges)
#> [1] 4
#> attr(,"class")
#> [1] "apples"
But hopefully you can see why this may not be good - we have added 2 apples and 2 oranges, and have returned 4 apples.
Throughout R and its extension packages, there are numerous type-conversion functions such as as.integer, as.numeric, as.logical, as.character, as.difftime etc. These allow for some element of control when converting between types and performing operations on different types.
The "right" way to do this kind of thing is specifically convert one of the object types to the other in order to perform the operation:
as.my <- function(x) UseMethod("as.my")
as.my.default <- function(x) {
value <- as.integer(x)
class(value) <- 'my'
value
}
my1 + as.my(set_units(5,nm))
#> [1] 7

R regex quantifier dash vs. comma with {stringr} vs. {pointblank}

I'm noticing some odd behavior with R regex quantifiers written as either {min, max} (as recommend in the stringr cheatsheet) vs. as {min - max}, when using the pointblank package. I expect the regexes to work with {min, max} and fail with {min - max}. However, in the two examples below, one works with {min, max} and one works with {min - max}.
Example 1 works as expected: pattern_comma works and pattern_dash does not. But example 2 works unexpectedly: doi_pattern_comma does not work and doi_pattern_dash does work.
Any suggestions about this regex? Or might this be a bug in pointblank (in which case I can open an issue there)?
Thank you, SO community!
library(dplyr)
library(stringr)
library(pointblank)
# EXAMPLE 1
df1 <- tibble(x = c("123", "68"))
pattern_comma <- "^\\d{1,3}$"
pattern_dash <- "^\\d{1-3}$"
stringr::str_detect(df1$x, pattern_comma) #pass
#> [1] TRUE TRUE
stringr::str_detect(df1$x, pattern_dash) #fail
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)): Error in {min,max} interval. (U_REGEX_BAD_INTERVAL, context=`^\d{1-3}$`)
#pass
df1 %>%
pointblank::col_vals_regex(
vars(x),
pattern_comma
)
#> # A tibble: 2 x 1
#> x
#> <chr>
#> 1 123
#> 2 68
#fail
df1 %>%
pointblank::col_vals_regex(
vars(x),
pattern_dash
)
#> Error: Exceedance of failed test units where values in `x` should have matched the regular expression: `^\d{1-3}$`.
#> The `col_vals_regex()` validation failed beyond the absolute threshold level (1).
#> * failure level (2) >= failure threshold (1)
# EXAMPLE 2
df2 <- tibble(doi = c("10.1186/s12872-020-01551-9", "10.1002/cpp.1968"))
doi_pattern_comma <- "^10\\.\\d{4,9}/[-.;()/:\\w\\d]+$"
doi_pattern_dash <- "^10\\.\\d{4-9}/[-.;()/:\\w\\d]+$"
stringr::str_detect(df2$doi, doi_pattern_comma) #pass
#> [1] TRUE TRUE
stringr::str_detect(df2$doi, doi_pattern_dash) #fail
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)): Error in {min,max} interval. (U_REGEX_BAD_INTERVAL, context=`^10\.\d{4-9}/[-.;()/:\w\d]+$`)
#fail
df2 %>%
col_vals_regex(
vars(doi),
doi_pattern_comma
)
#> Error: Exceedance of failed test units where values in `doi` should have matched the regular expression: `^10\.\d{4,9}/[-.;()/:\w\d]+$`.
#> The `col_vals_regex()` validation failed beyond the absolute threshold level (1).
#> * failure level (2) >= failure threshold (1)
#pass
df2 %>%
col_vals_regex(
vars(doi),
doi_pattern_dash
)
#> # A tibble: 2 x 1
#> doi
#> <chr>
#> 1 10.1186/s12872-020-01551-9
#> 2 10.1002/cpp.1968
Created on 2021-05-09 by the reprex package (v0.3.0)
You must not doubt: {min-max} quantifier does not exist, you need to use
{min,max}. \d{4-9} throws an exception (try it with sub and you will get invalid regular expression '\d{4-9}', reason 'Invalid contents of {}' ).
Next, the second issue is that the regex is parsed with the default TRE regex engine, and you can't use shorthand character classes like \w or \W inside bracket expressions there, so you need to use [:alnum:]_ instead of \w inside square brackets.
Now, that you know the right regex:
"^10\\.\\d{4,9}/[-.;()/:[:alnum:]_]+$"
you can dive deeper.
You can see what results you get if you use test_col_vals_regex:
> df2 %>% test_col_vals_regex(vars(doi), "^10\\.\\d{4,9}/[-.;()/:[:alnum:]_]+$")
[1] TRUE
> df2 %>% test_col_vals_regex(vars(doi), "^10\\.\\d{4-9}/[-.;()/:[:alnum:]_]+$")
[1] NA
> df2 %>% test_col_vals_regex(vars(doi), "^10\\.\\d{4,9}/[-.;()/:\\w]+$")
[1] FALSE
> df2 %>% test_col_vals_regex(vars(doi), "^10\\.\\d{4-9}/[-.;()/:\\w]+$")
[1] NA
So, all the cases when the regex is malformed return NA and the validation for those items is skipped, passing them in the end.
CONCLUSION: Always test your regex patterns for validity before using them in col_vals_regex.

purr::pmap does not keep parameter names

I'm trying to execute a function that uses the names of passed parameters with purrr::pmap. Unlike purrr::map (see below), pmap doesn't preserve these names. The below MWE captures the issue:
print_names <- function(x) {
print(names(x))
}
namedVec <- c(nameA = "valueA")
purrr::map(list(namedVec), print_names)
# [1] "nameA"
# [[1]]
# [1] "nameA"
purrr::pmap(list(namedVec), print_names)
# NULL
# $nameA
# NULL
Note that, in pmap, the .l argument needs to be a list of listed arguments, but in your function call it's just a list:
print_names <- function(x) {
print(names(x))
}
namedVec <- c(nameA = "valueA")
purrr::map(list(namedVec), ~print_names(.))
#> [1] "nameA"
#> [[1]]
#> [1] "nameA"
purrr::pmap(list(list(namedVec)), print_names)
#> [1] "nameA"
#> [[1]]
#> [1] "nameA"
Created on 2018-10-07 by the reprex package (v0.2.1)

Date column coerced to numeric when indexing dataframe with [[ and a vector

I am creating a data.frame with a column of type Date. When indexing the data frame with [[ and a numeric vector, the Date becomes a number. This is causing a problem when using purrr::pmap. Can anyone explain why this is happening and is there a work around?
Example:
x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01")))
class(x$d1)
# [1] "Date"
x[[1]]
# [1] "2018-01-01" "2018-02-01"
x[[c(1, 1)]]
# [1] 17532
Overview
After reading why does unlist() kill dates in R and the documentation of unlist(), you've got to manually prevent purrr::map() from coercing the Date objects to integer by way of using the base::c() function.
Load mikmart's PR version of purrr::pmap()
After reading pmap strips Date, it looks like someone very awesome submitted a pull request to resolve this issue within a refactored version of the indexing that happens under the hood in purrr::pmap().
Using devtools::dev_mode(), you can install mikmart/purrr's "pmap" branch version of purrr to retain Date objects while using pmap().
# ******pmap() example ****
# load necessary packages -----
library(devtools)
library(lubridate)
# enter dev mode so you don't have to uninstall the cran version of purrr ----
dev_mode(on = TRUE)
# install mikmart's PR to fix the coercing of Dates to integer ----
install_github(repo = "mikmart/purrr", ref = "pmap")
# load mikmart's PR version of purrr ----
library(purrr)
# load necessary data
x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01")))
# for the first column in x ----
# give me each element
# note: no need for c()
list.of.dates <-
x %>%
pmap(.f = ~ .x)
# view results -----
list.of.dates
# [[1]]
# [1] "2018-01-01"
#
# [[2]]
# [1] "2018-02-01"
# view the class of each list -----
map_chr(list.of.dates, class) # [1] "Date" "Date"
#
#
# turn off dev mode ---
dev_mode(on = FALSE)
#
# restart R -----
# Manually hit Shift+Cmd+F10 or point in click under the "Session" tab
#
# clear global environment ----
rm(list = ls())
#
# ******map() example********
# load necessary packages -----
library(tidyverse)
library(lubridate)
# load necessary data ----
x <- data.frame(d1 = lubridate::ymd(c("2018-01-01","2018-02-01")))
# from the first column ------
# give me each element
# and ensure the dates don't get coerced to integers
list.of.dates <-
x$d1 %>%
map(.f = ~ .x %>% c())
# view results -----
list.of.dates
# [[1]]
# [1] "2018-01-01"
#
# [[2]]
# [1] "2018-02-01"
# # view the class of each list -----
map_chr(list.of.dates, class) # [1] "Date" "Date"
# end of script #

Nested List Parsing with jsonlite

This is the second time that I have faced this recently, so I wanted to reach out to see if there is a better way to parse dataframes returned from jsonlite when one of elements is an array stored as a column in the dataframe as a list.
I know that this part of the power with jsonlite, but I am not sure how to work with this nested structure. In the end, I suppose that I can write my own custom parsing, but given that I am almost there, I wanted to see how to work with this data.
For example:
## options
options(stringsAsFactors=F)
## packages
library(httr)
library(jsonlite)
## setup
gameid="2015020759"
SEASON = '20152016'
BASE = "http://live.nhl.com/GameData/"
URL = paste0(BASE, SEASON, "/", gameid, "/PlayByPlay.json")
## get the data
x <- GET(URL)
## parse
api_response <- content(x, as="text")
api_response <- jsonlite::fromJSON(api_response, flatten=TRUE)
## get the data of interest
pbp <- api_response$data$game$plays$play
colnames(pbp)
And exploring what comes back:
> class(pbp$aoi)
[1] "list"
> class(pbp$desc)
[1] "character"
> class(pbp$xcoord)
[1] "integer"
From above, the column pbp$aoi is a list. Here are a few entries:
> head(pbp$aoi)
[[1]]
[1] 8465009 8470638 8471695 8473419 8475792 8475902
[[2]]
[1] 8470626 8471276 8471695 8476525 8476792 8477956
[[3]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[4]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[5]]
[1] 8469619 8471695 8473492 8474625 8475727 8476525
[[6]]
[1] 8469619 8471695 8473492 8474625 8475727 8475902
I don't really care if I parse these lists in the same dataframe, but what do I have for options to parse out the data?
I would prefer to take the data out of out lists and parse them into a dataframe that can be "related" to the original record it came from.
Thanks in advance for your help.
From #hrbmstr above, I was able to get what I wanted using unnest.
select(pbp, eventid, aoi) %>% unnest() %>% head

Resources