using quasiquotation in functions with formula interface - r

I want to write a custom function that can take bare and "string" inputs, and can handle both functions with and without the formula interface.
custom function example
# setup
set.seed(123)
library(tidyverse)
# custom function
foo <- function(data, x, y) {
# function without formula
print(table(data %>% dplyr::pull({{ x }}), data %>% dplyr::pull({{ y }})))
# function with formula
print(
broom::tidy(stats::t.test(
formula = rlang::new_formula({{ rlang::ensym(y) }}, {{ rlang::ensym(x) }}),
data = data
))
)
}
bare
works for both functions with and without formula interface
foo(mtcars, am, cyl)
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.87 6.95 5.08 3.35 0.00246 25.9 0.724 3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>
string
works for both functions with and without formula interface
foo(mtcars, "am", "cyl")
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> # A tibble: 1 x 10
#> estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1.87 6.95 5.08 3.35 0.00246 25.9 0.724 3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>
as colnames
works only for functions without the formula interface
foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#>
#> 4 6 8
#> 0 3 4 12
#> 1 8 3 2
#> Error: Only strings can be converted to symbols
#> Backtrace:
#> x
#> 1. \-global::foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#> 2. +-base::print(...)
#> 3. +-broom::tidy(...)
#> 4. +-stats::t.test(...)
#> 5. +-rlang::new_formula(...)
#> 6. \-rlang::ensym(y)
How can I modify the original function so that it will work with all the above-mentioned ways of entering the inputs and for both kinds of functions used?

The nice philosophy of rlang is that you get to control when you want values to be evaluated via the !! and {{}} operators. You seem to want to make a function that takes strings, symbols, and (possibly evaluated) expressions all in the same parameter. Using symbols or bare strings is actually easy with ensym but also wanting to allow for code like colnames(mtcars)[9] that has to be evaulated before returning a string is the problem. This potentially can be quite confusing. For example, what's the behavior you expect when you run the following?
am <- 'disp'
cyl <- 'gear'
foo(mtcars, am, cyl)
You could write a helper function if you want to assume all "calls" should be evaluated but symbols and literals should not. Here's a "cleaner" function
clean_quo <- function(x) {
if (rlang::quo_is_call(x)) {
x <- rlang::eval_tidy(x)
} else if (!rlang::quo_is_symbolic(x)) {
x <- rlang::quo_get_expr(x)
}
if (is.character(x)) x <- rlang::sym(x)
if (!rlang::is_quosure(x)) x <- rlang::new_quosure(x)
x
}
and you could use that in your function with
foo <- function(data, x, y) {
x <- clean_quo(rlang::enquo(x))
y <- clean_quo(rlang::enquo(y))
# function without formula
print(table(data %>% dplyr::pull(!!x), data %>% dplyr::pull(!!y)))
# function with formula
print(
broom::tidy(stats::t.test(
formula = rlang::new_formula(rlang::quo_get_expr(y), rlang::quo_get_expr(x)),
data = data
))
)
}
Doing so will allow all these to return the same values
foo(mtcars, am, cyl)
foo(mtcars, "am", "cyl")
foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
But you are probably just delaying possible other problems. I would not recommend over-interpreting user intentions with this kind of code. That's why it's better to explicitly allow them to un-escape themselves. Perhaps provide two different versions of the function that can be used with parameter that require evaluation and those that do not.

I have to agree with #MrFlick and others about inherent ambiguity when mixing standard and non-standard evaluation. (I also pointed this out in your similar question from a while ago.)
However, one can argue that dplyr::select() works with symbols, strings and expressions of the form colnames(.)[.]. If you absolutely must have the same interface, then you can leverage tidyselect to resolve your inputs:
library( rlang )
library( tidyselect )
ttest <- function(data, x, y) {
## Identify locations of x and y in data, get column names as symbols
s <- eval_select( expr(c({{x}},{{y}})), data ) %>% names %>% syms
## Use the corresponding symbols to build the formula by hand
broom::tidy(stats::t.test(
formula = new_formula( s[[2]], s[[1]] ),
data = data
))
}
## All three now work
ttest( mtcars, am, cyl )
ttest( mtcars, "am", "cyl" )
ttest( mtcars, colnames(mtcars)[9], colnames(mtcars)[2] )

Related

How to use xtabs in a user defined function?

I'm trying to create a wrapper for xtabs with a custom function.
I tried doing this:
xtabs_fun <- function(data, variable) {
xtabs(~ variable, data = data)
}
But using it doesn't return the correct value.
xtabs_fun(mtcars, "cyl")
cyl
1
The expected output should be this:
xtabs(~ cyl, data = mtcars)
cyl
4 6 8
11 7 14
We could use reformulate
xtabs_fun <- function(data, variable) {
xtabs(reformulate(variable), data = data)
}
-testing
> xtabs_fun(mtcars, "cyl")
cyl
4 6 8
11 7 14
xtabs accepts a data frame so:
xtabs_fun <- function(data, variable) xtabs(data[variable])
# test
xtabs_fun(mtcars, "cyl")
## cyl
## 4 6 8
## 11 7 14
Alternately use table which works the same:
table_fun <- function(data, variable) table(data[variable])
Variable names inside formulas are always taken literally rather than being substituted, so inside your function, xtabs is looking for a column called variable inside mtcars, which doesn't exist. Instead, you can build the formula as a string:
xtabs_fun <- function(data, variable) {
xtabs(as.formula(paste("~", variable)), data = data)
}
xtabs_fun(mtcars, "cyl")
#> cyl
#> 4 6 8
#> 11 7 14
If you prefer your function to take unquoted variable names in a tidyverse style, you can do
xtabs_fun <- function(data, variable) {
xtabs(as.formula(paste("~", deparse(substitute(variable)))), data = data)
}
mtcars |> xtabs_fun(cyl)
#> cyl
#> 4 6 8
#> 11 7 14
Created on 2022-09-24 with reprex v2.0.2

why write.csv is not updating the changes on test$number_s

I am changing the values of a column of a data frame. Then, I am saving the file, supposedly with the changes, but not. What am I missing? Thanks,
test <- data.frame(name_s = c("x","y","z"), number_s = c(1,2,3))
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <- test$number_s[[x]] + 1
})
write.csv(test,paste0("test ",format(Sys.time(),"%Y%m%d"),".csv"),
row.names = F)
that was oversimplified, the real deal is this one:
date_format_1 = "[0-9]-[:alpha:][:alpha:][:alpha:]"
date_format_2 = "[:alpha:][:alpha:][:alpha:]-[0-9][0-9]"
test <- data.frame(name_s = c("v","w","x","y","z"), event_text = c("Aug-89","7-May","9-Jun","4-Dec-2021","Feb-99"))
lapply(1:length(test$event_text), function(x) {
if (str_detect(test$event_text[[x]], paste0("\\b",date_format_1,"\\b")) == T){
test$event_text[x] <- paste0(str_sub(test$event_text[[x]],1,1), "/F",
which(month.abb %in% str_sub(
test$event_text[[x]], 3,5)))
} else if(str_detect(test$event_text[[x]], paste0("\\b", date_format_2,"\\b"))
== T) {
test$event_text[[x]] = paste0(which(month.abb %in% str_sub(
test$event_text[x],1,3)),"/F",str_sub(test$event_text[[x]],-2))
} else {
test$event_text[x] <- test$event_text[[x]]
}
})
write.csv(test,paste0("test ",format(Sys.time(),"%Y%m%d"),".csv"),
row.names = F)
Below I have written two calls to lapply that fix the issue you were having. The problem stems from the fact that R has scoped variables and so the value is changed within the function but the result is never returned or extracted from the function. As such I have demonstrated this by printing the dataframe after each of the lapply() calls below.
We can fix this in two ways. The first more correct version is to let lapply modify the exact vector directly by adding one to each value and returning x+1. (Note I have skipped curly braces and this will return the value from the next ppiece of code run, in this case x+1 alternatively you could write function(x) {return(x+1)} in that argument).
An alternate approach that will run slower but still use the indexing method is to use global assignment. <<- assigns the variable to the global scope/environment rather than the local scope of the function. (Note this code is run sequentially so the written call to this function is adding + 1 for the second time to the dataframe when shown below).
test <- data.frame(name_s = c("x","y","z"), number_s = c(1,2,3))
# Original Behaviour, doesn't work due to scoping issues
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <- test$number_s[[x]] + 1
})
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 3
#>
#> [[3]]
#> [1] 4
print(test)
#> name_s number_s
#> 1 x 1
#> 2 y 2
#> 3 z 3
# function that is syntactically and functionally correct
# instead of modifying the vector in the function scope the function returns the
# mutated vector which we then assign to the dataframe's vector
test$number_s <- lapply(test$number_s, function(x) x + 1)
print(test)
#> name_s number_s
#> 1 x 2
#> 2 y 3
#> 3 z 4
# function that is syntactically odd but functionally correct
# the function affects the values in the global scope, this works but is slower
# and is not best practice as it would be difficult to read
lapply(1:length(test$number_s), function(x) {
test$number_s[x] <<- test$number_s[[x]] + 1
})
#> [[1]]
#> [1] 3
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 5
print(test)
#> name_s number_s
#> 1 x 3
#> 2 y 4
#> 3 z 5
Created on 2021-07-23 by the reprex package (v2.0.0)

How to replace vector tidier

I am looking to find another function that can replace the broom::tidy() function after it get removed. Here is what the broom package warning says:
Tidy Atomic Vectors
Vector tidiers are deprecated and will be removed from an upcoming release of broom.
Here is a description of function:
tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This is can be useful if you want to inspect a model or create custom visualizations.
Thanks you,
John
As I understand the warning, there is no general deprecation of the function broom::tidy, this warning only occurs when it is called with an atomic vector. In this case tibble() seems to be a slot-in replacement:
No deprecation warning for tidy() when called for a linear model:
library(broom)
fit <- lm(Volume ~ Girth + Height, trees)
tidy(fit)
## A tibble: 3 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 (Intercept) -58.0 8.64 -6.71 2.75e- 7
#2 Girth 4.71 0.264 17.8 8.22e-17
#3 Height 0.339 0.130 2.61 1.45e- 2
#Deprecation warning:
tidy(1:5)
## A tibble: 5 x 1
# x
# <int>
#1 1
#2 2
#3 3
#4 4
#5 5
#Warning messages:
#1: 'tidy.numeric' is deprecated.
#See help("Deprecated")
#2: `data_frame()` is deprecated as of tibble 1.1.0.
#Please use `tibble()` instead.
No warning for tibble, same output :
tibble(1:5)
## A tibble: 5 x 1
# `1:5`
# <int>
#1 1
#2 2
#3 3
#4 4
#5 5
The deprecation warning is letting you know that the method tidy.numeric is being removed.
broom:::tidy.numeric
function (x, ...)
{
.Deprecated()
if (!is.null(names(x))) {
dplyr::data_frame(names = names(x), x = unname(x))
}
else {
dplyr::data_frame(x = x)
}
}
You can see the call to .Deprecated there, and the rest of the function just calls data_frame. As this function is also being deprecated, tibble is the new solution. As tibble does not honour row names, if you want to save the names, you could create something similar to the above.
tidy.numeric <- function (x, ...)
{
if (!is.null(names(x))) {
tibble::tibble(names = names(x), x = unname(x))
}
else {
tibble::tibble(x = x)
}
}
If you try to convert a named vector as mentioned by #Miff, you can also use the function enframe(). It creates a tibble with two columns, one with the names in the vector and one column with the values.

How does ddply split the data?

I have this data frame.
mydf<- data.frame(c("a","a","b","b","c","c"),c("e","e","e","e","e","e")
,c(1,2,3,10,20,30),
c(5,10,20,20,15,10))
colnames(mydf)<-c("Model", "Class","Length", "Speed")
I'm trying to get a better understanding on how ddply works.
I'd like to get the average length and speed for each pairing of model and class.
I know this is one way to do it: ddply(mydf, .(Model, Class), .fun = summarize, mSpeed = mean(Speed), mLength = mean(Length)).
I wonder if I can get the mean using ddply and without specifying it one at a time.
I tried ddply(mydf, .(Model, Class), .fun = mean) but I get the error
Warning messages: 1: In mean.default(piece, ...) : argument is not
numeric or logical: returning NA
What does ddply pass on to the function argument? Is there a way to apply one function to every column using ddply?
My goal is to learn more about ddply. I will only accept answers will ddply
Here's a solution using dplyr and the summarize function.
library(dplyr)
mydf<- data.frame(c("a","a","b","b","c","c"),c("e","e","e","e","e","e")
,c(1,2,3,10,20,30),
c(5,10,20,20,15,10))
colnames(mydf)<-c("Model", "Class","Length", "Speed")
#summarize data by Model & Class
mydf %>% group_by(Model, Class) %>% summarize_if(is.numeric, mean)
#> # A tibble: 3 x 4
#> # Groups: Model [3]
#> Model Class Length Speed
#> <fct> <fct> <dbl> <dbl>
#> 1 a e 1.5 7.5
#> 2 b e 6.5 20
#> 3 c e 25 12.5
Created on 2019-04-16 by the reprex package (v0.2.1)

Tidyeval: pass list of columns as quosure to select()

I want to pass a bunch of columns to pmap() inside mutate(). Later, I want to select those same columns.
At the moment, I'm passing a list of column names to pmap() as a quosure, which works fine, although I have no idea whether this is the "right" way to do it. But I can't figure out how to use the same quosure/list for select().
I've got almost no experience with tidyeval, I've only got this far by playing around. I imagine there must be a way to use the same thing both for pmap() and select(), preferably without having to put each of my column names in quotation marks, but I haven't found it yet.
library(dplyr)
library(rlang)
library(purrr)
df <- tibble(a = 1:3,
b = 101:103) %>%
print
#> # A tibble: 3 x 2
#> a b
#> <int> <int>
#> 1 1 101
#> 2 2 102
#> 3 3 103
cols_quo <- quo(list(a, b))
df2 <- df %>%
mutate(outcome = !!cols_quo %>%
pmap_int(function(..., word) {
args <- list(...)
# just to be clear this isn't what I actually want to do inside pmap
return(args[[1]] + args[[2]])
})) %>%
print()
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
# I get why this doesn't work, but I don't know how to do something like this that does
df2 %>%
select(!!cols_quo)
#> Error in .f(.x[[i]], ...): object 'a' not found
This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().
Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:
quoting_fn1 <- function(...) {
exprs <- enquos(...)
# For illustration purposes, return the quoted inputs instead of
# doing something with them. Normally you'd call `mutate()` here:
exprs
}
quoting_fn2 <- function(...) {
expr <- quo(list(!!!enquos(...)))
expr
}
Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():
quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env: global
The second version returns a quoted expression that instructs R to create a list with quoted inputs:
quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env: 0x7fdb69d9bd20
There is a subtle but important difference between the two. The first version creates an actual list object:
exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"
On the other hand, the second version does not return a list, it returns an expression for creating a list:
expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"
Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:
myfunction <- function(..., word) {
args <- list(...)
# just to be clear this isn't what I actually want to do inside pmap
args[[1]] + args[[2]]
}
Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))
rlang::qq_show(
mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:
mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
On the other hand, if we unquote an actual list of quoted expressions, we get an error:
mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: Element 1 is not a vector (language).
That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))
Anything inside angular brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:
rlang::qq_show(
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))
Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as separate argument. Indeed we get a type error:
mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: `.x` is not a list (integer).
That's easy to fix, just splice into a call to list():
rlang::qq_show(
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))
And voilĂ !
mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#> a b outcome
#> <int> <int> <int>
#> 1 1 101 102
#> 2 2 102 104
#> 3 3 103 106
We can use quos when there are more than one element and evaluate with !!!
cols_quo <- quos(a, b)
df2 %>%
select(!!!cols_quo)
The object 'df2' can be created with
df %>%
mutate(output = list(!!! cols_quo) %>%
reduce(`+`))
If we want to use the quosure as in the OP's post
cols_quo <- quo(list(a, b))
df2 %>%
select(!!! as.list(quo_expr(cols_quo))[-1])
# A tibble: 3 x 2
# a b
# <int> <int>
#1 1 101
#2 2 102
#3 3 103

Resources