I want to expand the !!! expression just like dplyr verbs do, e.g.
aggregate_expressions <- list(n = quote(n()))
do_something(iris, !!!(aggregate_expressions))
and say I want do_something to perform
do_something <- function(...) {
iris %>%
some_function( # expand the ... # ) # some_function(n = n())
}
which will do this but the n = n() is dynamic
do_something <- function(...) {
iris %>%
some_function(n = n())
}
I tried to trace the code for dplyr::summarise and I see that it calls enquos(...), which converts the ... to a list of quosures, but then how do I apply the quosures? I think I am meant to create the code summarise(n = n()) from the quosures and then evaluate it using eval_tidy, but I can't figure out how to generate the code. I know that passing ... to summarise works, but the actual use case is to pass it to summarise.disk.frame, which means I can't just reuse dplyr::summarise.
The actual case i not
For example in dplyr, the below works by expanding the aggregate_expression using !!!
aggregate_expressions <- list(n = quote(n()))
iris %>%
group_by(Species) %>%
summarise(!!!(aggregate_expressions))
Modify it like this:
do_something <- function(x) {
iris %>%
summarise(!!!x)
}
aggregate_expressions <- list(n = quote(n()))
do_something(aggregate_expressions)
## n
## 1 150
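If the function should keep the do_something(iris, !!!aggregate_expressions) interface from the question, a minimal sketch could capture the dots with enquos() and splice them back in with !!!. The second variant builds the call first and then evaluates it with eval_tidy(), which is closer to what the question asks. The name do_something_call is made up for this illustration, and whether summarise.disk.frame accepts spliced quosures the same way is an assumption that has not been checked here.

```r
library(dplyr)
library(rlang)

# Variant 1: capture the dots as quosures and splice them into summarise().
do_something <- function(data, ...) {
  dots <- enquos(...)          # list of quosures, argument names preserved
  data %>% summarise(!!!dots)  # same as summarise(n = n()) for the input below
}

# Variant 2 (hypothetical helper name): build the call, then evaluate it.
do_something_call <- function(data, ...) {
  dots <- enquos(...)
  code <- expr(summarise(data, !!!dots))  # an unevaluated call object
  eval_tidy(code)                         # evaluated in this function's environment
}

aggregate_expressions <- list(n = quote(n()))
res1 <- do_something(iris, !!!aggregate_expressions)
res2 <- do_something_call(iris, !!!aggregate_expressions)
```

Printing code inside the second variant shows the generated call, which may help when adapting it to another summarise method.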
I am trying to use group_by within a function call in dplyr (R) and I am getting unexpected results. Here is an example of what I am trying to do:
df = data.frame(a = c(0,0,1,1), b = c(0,1,0,1), c = c(1,2,3,4))
result1 = df %>%
group_by(a,b) %>%
mutate(d = sum(c))
result1$d
myFunc <- function(df, var) {
output = df %>%
group_by(a,!!var) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,"b")
result2$d
result1$d yields [1,2,3,4] which is what I expected. result2$d yields [3,3,7,7] which I do not want, and I am not sure what is going on.
It works to have b (without quotes) as the function argument, and {{var}} in place of !!var. Unfortunately, in my case, my column names are in string format (but maybe there is a way to transform the string beforehand so that it will work with the {{}} notation?)
If you want to pass a character object that can refer to a certain column of a data frame, you should use !!sym(var):
myFunc <- function(df, var) {
output = df %>%
group_by(a, !!sym(var)) %>%
mutate(d = sum(c))
return(output)
}
myFunc(df, "b")
If you want to pass a data-masked argument, you should use {{ var }} or equivalently !!enquo(var):
myFunc <- function(df, var) {
output = df %>%
group_by(a, {{ var }}) %>%
mutate(d = sum(c))
return(output)
}
myFunc(df, b)
Note that I pass "b" and b respectively into the function in the two different cases.
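To see why the original attempt returned 3, 3, 7, 7: unquoting a string with !! inserts the literal value "b", so group_by() groups by a constant column rather than by column b. The sketch below reproduces that and, as one possible alternative to !!sym(var), uses the .data pronoun. The names by_string and by_column are only for illustration.

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))
var <- "b"

# !!var inserts the string "b" itself, i.e. group_by(a, "b"):
# every row gets the same constant, so only a defines the groups.
by_string <- df %>%
  group_by(a, !!var) %>%
  mutate(d = sum(c))

# .data[[var]] looks up the column whose name is stored in var.
by_column <- df %>%
  group_by(a, .data[[var]]) %>%
  mutate(d = sum(c))

by_string$d
by_column$d
```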
If we want to use quoting and unquoting instead of curly-curly {{ }}, then we should consider this basic procedure: https://tidyeval.tidyverse.org/dplyr.html
Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.
1. Abstraction step:
Here we identify the varying steps. In our case var in group_by:
2. Quoting step:
Identify all the arguments where the user is allowed to refer to data frame columns directly.
The function can’t evaluate these arguments right away.
Instead they should be automatically quoted. Apply enquo() to these arguments.
3. Unquoting step:
Identify where these variables are passed to other quoting functions and unquote with !!.
In this case we pass var to group_by():
myFunc <- function(df, var) {
var <- enquo(var)
output = df %>%
group_by(a,!!var) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,b)
output:
[1] 1 2 3 4
Just as I post a question, I come across something that works...
myFunc <- function(df, var) {
output = df %>%
group_by_at(.vars = c("a",var)) %>%
mutate(d = sum(c))
return(output)
}
result2 = myFunc(df,"b")
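Since dplyr 1.0.0 the _at variants such as group_by_at() are superseded in favour of across(). A sketch of the same function in that style, assuming a dplyr version that provides across() and all_of():

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1), c = c(1, 2, 3, 4))

myFunc <- function(df, var) {
  df %>%
    # all_of() turns a character vector of names into a column selection
    group_by(across(all_of(c("a", var)))) %>%
    mutate(d = sum(c))
}

result2 <- myFunc(df, "b")
result2$d
```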
Say I have created a function using the tidy eval framework -
library(tidyverse)
library(rlang)
my_function <- function(data, var){
var_expr <- enquo(var)
data %>%
group_by(!!var_expr) %>%
summarise(count = n()) %>%
ungroup()
}
When I run the following function, I get the result below it
my_function(mtcars, cyl)
# A tibble: 3 x 2
cyl count
<dbl> <int>
1 4 11
2 6 7
3 8 14
How do I add the following checks to this function -
Check if data is a dataframe. If not, return the error data should be a dataframe
Check if var is missing. If so return the error var is missing
You can make the following modifications.
To check whether the input data is of a particular class, we can inspect its class attribute. Both data frames and tibbles contain the class data.frame, so a single check covers both.
missing() is normally used inside a function to check whether an argument was supplied, typically so that a default value can be generated when it was not. In your case we can instead terminate the execution of the function (see also the source code of the sample function for how it sets a value for the size argument when it is missing).
You can use base::stop in place of rlang::abort, as suggested by @akrun.
library(rlang)
my_function <- function(data, var){
if(!"data.frame" %in% attr(data, "class")) {
abort("data should be a data frame")
}
if(missing(var)) {
abort("var is missing")
}
var_expr <- enquo(var)
data %>%
group_by(!!var_expr) %>%
summarise(count = n()) %>%
ungroup()
}
Special thanks to @27 ϕ 9 for bringing this valuable point to my attention. We can also customize the error message in stopifnot by naming its arguments (this requires R 4.0.0 or later), which is another way of checking your input arguments:
my_function <- function(data, var){
stopifnot("The input data is not of class data frame" = "data.frame" %in% attr(data, "class") ,
"var is missing" = !missing(var))
var_expr <- enquo(var)
data %>%
group_by(!!var_expr) %>%
summarise(count = n()) %>%
ungroup()
}
Special thanks to @IceCreamToucan for presenting yet another option, which is to use the inherits function in lieu of attr. If the class attribute of the input data does not include data.frame, inherits returns FALSE:
my_function <- function(data, var){
if(!inherits(data, "data.frame")) {
stop("data is not of class data.frame")
}
if(missing(var)) {
stop("var is missing")
}
var_expr <- enquo(var)
data %>%
group_by(!!var_expr) %>%
summarise(count = n()) %>%
ungroup()
}
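A quick way to confirm that both checks fire is to call the function with bad input and capture the message. The sketch below repeats the inherits() version so it runs on its own; err_data and err_var are names chosen for this illustration.

```r
library(dplyr)
library(rlang)

my_function <- function(data, var) {
  if (!inherits(data, "data.frame")) {
    stop("data is not of class data.frame")
  }
  if (missing(var)) {
    stop("var is missing")
  }
  var_expr <- enquo(var)
  data %>%
    group_by(!!var_expr) %>%
    summarise(count = n()) %>%
    ungroup()
}

# Capture the error messages instead of aborting the script.
err_data <- tryCatch(my_function(1:3, cyl), error = function(e) conditionMessage(e))
err_var  <- tryCatch(my_function(mtcars),   error = function(e) conditionMessage(e))

# Valid input still returns the grouped counts.
ok <- my_function(mtcars, cyl)
```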
I'm trying to create a function that essentially gets me the MODE, or MODE-X (the 2nd to Xth most common value), and the associated counts for each column in a data frame.
I can't figure out what I may be missing and I'm looking for some assistance. I believe it has to do with passing a variable into a dplyr function.
library(tidyverse)
myfunct_get_mode = function(x, rank=1){
mytable = dplyr::count(rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = table %>% dplyr::slice(rlang::sym(rank))
return(result)
}
mtcars %>% lapply(. %>% (function(x) myfunct_get_mode(x, rank=2)))
There are some problems with your function:
Your function call is not doing what you think. Check with mtcars %>% lapply(. %>% (function(x) print(x))) that your x is actually the whole column of mtcars. To get the names of the columns, apply the function to names(mtcars). But then you also have to specify the data frame you're working on.
To evaluate the symbol returned by rlang::sym(x), you need to put !! in front of it.
rank is not a variable name, thus no need for rlang::sym here.
table should be mytable in second to last line of your function.
So how could it work (although there are probably better ways):
myfunct_get_mode = function(df, x, rank=1){
mytable = count(df, !!rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = mytable %>% slice(rank)
return(result)
}
names(mtcars) %>% lapply(function(x) myfunct_get_mode(mtcars, x, rank=2))
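As one of the possibly better ways, the same idea can be sketched with the .data pronoun, naming the output columns directly inside count() so that no renaming step is needed. The function name get_mode is made up for this illustration.

```r
library(dplyr)

get_mode <- function(df, x, rank = 1) {
  df %>%
    # .data[[x]] refers to the column whose name is stored in the string x
    count(variable = .data[[x]], sort = TRUE, name = "counts") %>%
    slice(rank)  # keep only the row at the requested rank
}

# One result per column, named after the column it came from.
modes <- sapply(names(mtcars), function(x) get_mode(mtcars, x, rank = 2),
                simplify = FALSE)
modes$cyl
```

Ties in the counts are ordered however count() happens to sort them, so the rank is only well defined when the counts differ.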
If we need this in a list, we can use purrr::imap:
f1 <- function(dat, rank = 1) {
purrr::imap(dat, ~
dat %>%
count(!! rlang::sym(.y)) %>%
rename_all(~ c('variable', 'counts')) %>%
arrange(desc(counts)) %>%
slice(seq_len(rank))) #%>%
#bind_cols - convert to a data.frame
}
f1(mtcars, 2)
I have several variables (id.type and id.subtype in this example) I would like to check for distinct values in a tibble all.snags using the dplyr package. I would like them sorted and all values printed out in the console (a tibble typically prints only the first 10). The output would be equivalent to the following code:
distinct(all.snags,id.type) %>% arrange(id.type) %>% print(n = Inf)
distinct(all.snags,id.subtype) %>% arrange(id.subtype) %>% print(n = Inf)
I think this is better done by looping over the values in a vector, but I can't get it to work.
distinct.vars <- c("id.type","id.subtype")
for (i in distinct.vars) {
distinct(all.snags,distinct.vars[i]) %>%
arrange(distinct.vars[i]) %>%
print(n = Inf)
}
I think this function is what you want:
library(dplyr)
df = iris
print_distinct = function(df, columns) {
for (c in columns) {
print(df %>% distinct_(c) %>% arrange_(c))
}
}
print_distinct(df, c("Sepal.Length", "Sepal.Width"))
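The underscore verbs distinct_() and arrange_() have been deprecated since dplyr 0.7.0. A sketch of the same loop without them is shown below. Note also that in the question's loop i already holds the column name, so distinct.vars[i] indexes an unnamed vector by a string and yields NA. The helper name distinct_sorted is an assumption for this example, and iris stands in for all.snags.

```r
library(dplyr)

# Return the sorted distinct values of one column, given its name as a string.
distinct_sorted <- function(df, col) {
  df %>%
    as_tibble() %>%            # so that print(n = Inf) is available
    select(all_of(col)) %>%
    distinct() %>%
    arrange(.data[[col]])
}

distinct.vars <- c("Species", "Petal.Width")
for (v in distinct.vars) {
  print(distinct_sorted(iris, v), n = Inf)  # n = Inf prints every row
}
```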
I'm trying to put together a function that creates a subset from my original data frame, and then uses dplyr's SELECT and MUTATE to give me the number of large/small entries, based on the sum of the width and length of sepals/petals.
filter <- function (spp, LENGTH, WIDTH) {
d <- subset (iris, subset=iris$Species == spp) # This part seems to work just fine
large <- d %>%
select (LENGTH, WIDTH) %>% # This is where the problem arises.
mutate (sum = LENGTH + WIDTH)
big_samples <- which(large$sum > 4)
return (length(big_samples))
}
Basically, I want the function to return the number of large flowers. However, when I run the function I get the following error -
filter("virginica", "Sepal.Length", "Sepal.Width")
Error: All select() inputs must resolve to integer column positions.
The following do not:
* LENGTH
* WIDTH
What am I doing wrong?
You are running into NSE/SE problems, see the vignette for more info.
Briefly, dplyr uses non-standard evaluation (NSE) of names, so passing column names into functions as strings breaks it unless you use the standard evaluation (SE) versions.
The SE versions of the dplyr functions end in _. You can see that select_ works nicely with your original arguments.
However, things get more complicated when using functions. We can use lazyeval::interp to convert most function arguments into column names, see the conversion of the mutate to mutate_ call in your function below and more generally, the help: ?lazyeval::interp
Try:
filter <- function (spp, LENGTH, WIDTH) {
d <- subset (iris, subset=iris$Species == spp)
large <- d %>%
select_(LENGTH, WIDTH) %>%
mutate_(sum = lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH)))
big_samples <- which(large$sum > 4)
return (length(big_samples))
}
UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.
See http://dplyr.tidyverse.org/articles/programming.html for more details.
filter_big <- function(spp, LENGTH, WIDTH) {
LENGTH <- enquo(LENGTH) # Create quosure
WIDTH <- enquo(WIDTH) # Create quosure
iris %>%
filter(Species == spp) %>%
select(!!LENGTH, !!WIDTH) %>% # Use !! to unquote the quosure
mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
filter(sum > 4) %>%
nrow()
}
filter_big("virginica", Sepal.Length, Sepal.Width)
> filter_big("virginica", Sepal.Length, Sepal.Width)
[1] 50
If quosure and quasiquotation are too much for you, use either .data[[ ]] or rlang {{ }} (curly curly) instead. See Hadley Wickham's 5min video on tidy evaluation and (maybe) Tidy evaluation section in Hadley's Advanced R book for more information.
library(rlang)
library(dplyr)
filter_data <- function(df, spp, LENGTH, WIDTH) {
res <- df %>%
filter(Species == spp) %>%
select(.data[[LENGTH]], .data[[WIDTH]]) %>%
mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>%
filter(sum > 4) %>%
nrow()
return(res)
}
filter_data(iris, "virginica", "Sepal.Length", "Sepal.Width")
#> [1] 50
filter_rlang <- function(df, spp, LENGTH, WIDTH) {
res <- df %>%
filter(Species == spp) %>%
select({{LENGTH}}, {{WIDTH}}) %>%
mutate(sum = {{LENGTH}} + {{WIDTH}}) %>%
filter(sum > 4) %>%
nrow()
return(res)
}
filter_rlang(iris, "virginica", Sepal.Length, Sepal.Width)
#> [1] 50
Created on 2019-11-10 by the reprex package (v0.3.0)
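Newer tidyselect releases (1.2.0 and later) deprecate the .data pronoun inside select(), so with string inputs a variant along these lines may be preferable: all_of() for the selection and .data[[ ]] only in the data-masking verbs. The name filter_data2 is made up for this sketch.

```r
library(dplyr)

filter_data2 <- function(df, spp, LENGTH, WIDTH) {
  df %>%
    filter(Species == spp) %>%
    select(all_of(c(LENGTH, WIDTH))) %>%             # tidyselect: strings via all_of()
    mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>%  # data masking: .data pronoun
    filter(sum > 4) %>%
    nrow()
}

n_big <- filter_data2(iris, "virginica", "Sepal.Length", "Sepal.Width")
n_big
#> [1] 50
```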