I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a variable?
library(dplyr)
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
df
# this that
# 1 1 1
# 2 2 1
# 3 2 2
df %>% filter(this == 1)
# this that
# 1 1 1
But say I want to use the variable column to hold either "this" or "that", and filter on whatever the value of column is. Both as.symbol and get work in other contexts, but not this:
column <- "this"
df %>% filter(as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)
df %>% filter(get(column) == 1)
# Error in get("this") : object 'this' not found
How can I turn the value of column into a column name?
Using rlang's injection paradigm
From the current dplyr documentation (emphasis by me):
dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
So, essentially we need to perform two steps to be able to refer to the value "this" of the variable column inside dplyr::filter():
1) We need to turn the variable column, which is of type character, into type symbol. Using base R, this can be achieved with the function as.symbol(), which is an alias for as.name(). The former is preferred by the tidyverse developers because it follows a more modern terminology (R types instead of S modes). Alternatively, the same can be achieved with rlang::sym() from the tidyverse.
2) We need to inject the symbol from 1) into the dplyr::filter() expression. This is done with the so-called injection operator !!, which is basically syntactic sugar that lets you modify a piece of code before R evaluates it.
(In earlier versions of dplyr (or rather of the underlying rlang) there used to be situations (including yours) where !! would collide with the single !, but this is no longer an issue since !! gained the right operator precedence.)
Applied to your example:
library(dplyr)
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"
df %>% filter(!!as.symbol(column) == 1)
# this that
# 1 1 1
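To reuse the two steps above for either column, they can be wrapped in a small function. This is only a sketch: filter_equal() is a hypothetical helper name, not something dplyr provides, and the .env pronoun is used on the assumption that a reasonably recent rlang is installed.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows where the column
# named by the string `column` equals `value`.
filter_equal <- function(data, column, value) {
  col_sym <- rlang::sym(column)               # step 1: "this" -> symbol `this`
  data %>% filter(!!col_sym == .env$value)    # step 2: inject the symbol
}

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))

by_this <- filter_equal(df, "this", 1)  # filters on column `this`
by_that <- filter_equal(df, "that", 1)  # filters on column `that`
```

Writing .env$value rather than plain value keeps the comparison value from being mistaken for a column that happens to be called value.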
Using alternative solutions
Other ways to refer to the value "this" of the variable column inside dplyr::filter() that don't rely on rlang's injection paradigm include:
Via the tidyselection paradigm, i.e. dplyr::if_any()/dplyr::if_all() with tidyselect::all_of()
df %>% filter(if_any(.cols = all_of(column),
.fns = ~ .x == 1))
Via rlang's .data pronoun and base R's [[:
df %>% filter(.data[[column]] == 1)
Via magrittr's . argument placeholder and base R's [[:
df %>% filter(.[[column]] == 1)
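As a quick check, the three alternatives can be run side by side on the example data. The sketch below assumes a dplyr version that provides if_any() (1.0.4 or later, as far as I know); the result names are made up for illustration.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "that"

# tidyselect: all_of() turns the string into a column selection
via_if_any <- df %>% filter(if_any(all_of(column), ~ .x == 1))

# .data pronoun: subset the data mask by name
via_data <- df %>% filter(.data[[column]] == 1)

# magrittr placeholder: `.` is the data frame piped in by %>%
via_dot <- df %>% filter(.[[column]] == 1)
```

All three keep the same rows here. Note that the last form relies on magrittr's %>% supplying the . placeholder, whereas .data[[column]] does not depend on which pipe is used.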
I would steer clear of using get() altogether. It seems like it would be quite dangerous in this situation, especially if you're programming. You could use either an unevaluated call or a pasted character string, but you'll need to use filter_() instead of filter().
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"
Option 1 - using an unevaluated call:
You can hard-code y as 1, but here I show it as y to illustrate how you can change the expression values easily.
expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
# this that
# 1 1 1
Option 2 - using paste() (and obviously easier):
df %>% filter_(paste(column, "==", 1))
# this that
# 1 1 1
The main thing about these two options is that we need to use filter_() instead of filter(). In fact, from what I've read, if you're programming with dplyr you should always use the *_() functions.
I used this post as a helpful reference: character string as function argument r, and I'm using dplyr version 0.3.0.2.
Here's another solution for the latest dplyr version:
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"
df %>% filter(.[[column]] == 1)
# this that
#1 1 1
Regarding Richard's solution, I just want to add that if the column is of type character, you can use shQuote to filter by character values.
For example, you can use
df %>% filter_(paste(column, "==", shQuote("a")))
If you have multiple filters, you can specify collapse = "&" in paste.
df %>% filter_(paste(c("column1","column2"), "==", shQuote(c("a","b")), collapse = "&"))
The latest way to do this is to use my.data.frame %>% filter(.data[[myName]] == 1), where myName is a variable in the calling environment that contains the column name as a string.
Or using filter_at
library(dplyr)
df %>%
filter_at(vars(column), any_vars(. == 1))
Like Salim B explained above but with a minor change:
df %>% filter(1 == !!as.name(column))
i.e. just reverse the condition, because otherwise !! (at least in older rlang versions, where it shared the low precedence of !) behaves like
!!(as.name(column)==1)
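One way to see what filter() actually receives is to build the expression with rlang::expr() and print it instead of evaluating it. This is a sketch that assumes a current rlang (0.2.0 or later, where !! binds tightly); on such versions both orderings should inject the symbol as intended.

```r
library(rlang)

column <- "this"

# Build the expressions without evaluating them.
injected <- expr(!!as.name(column) == 1)
reversed <- expr(1 == !!as.name(column))

print(injected)
print(reversed)
```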
You can use the across(all_of()) syntax; it takes a string as its argument:
column = "this"
df %>% filter(across(all_of(column)) == 1)
Related
Starting with dplyr version 0.7, the methods ending with an underscore, such as summarize_ and group_by_, are deprecated, since we are supposed to use quosures.
See:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
I am trying to implement the following example using quo and !!
Working example:
df <- data.frame(x = c("a","a","a","b","b","b"), y=c(1,1,2,2,3,3), z = 1:6)
lFG <- df %>%
group_by( x,y)
lFG %>% summarize( min(z))
However, in the case I need to implement, the columns to group by and to summarize are specified as strings.
cols2group <- c("x","y")
col2summarize <- "z"
How can I get the same example as above working?
For this you can now use _at versions of the verbs
df %>%
group_by_at(cols2group) %>%
summarize_at(.vars = col2summarize, .funs = min)
Edit (2021-06-09):
Please see Ronak Shah's answer, using
summarise(across(all_of(col2summarize), min))
which is now the preferred option.
From dplyr 1.0.0 you can use across :
library(dplyr)
cols2group <- c("x","y")
col2summarize <- "z"
df %>%
group_by(across(all_of(cols2group))) %>%
summarise(across(all_of(col2summarize), min)) %>%
ungroup()
# x y z
# <chr> <dbl> <int>
#1 a 1 1
#2 a 2 3
#3 b 2 4
#4 b 3 5
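If the column names arrive as function arguments, the same across(all_of()) pattern can be wrapped up as follows. This is a minimal sketch: min_by_group() is a hypothetical helper name, and .groups = "drop" is used in place of a trailing ungroup() call.

```r
library(dplyr)

# Hypothetical helper: group by the columns named in `group_cols` and take
# the minimum of every column named in `value_cols`.
min_by_group <- function(data, group_cols, value_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(across(all_of(value_cols), min), .groups = "drop")
}

df <- data.frame(x = c("a", "a", "a", "b", "b", "b"),
                 y = c(1, 1, 2, 2, 3, 3),
                 z = 1:6)

result <- min_by_group(df, c("x", "y"), "z")
```

all_of() raises an error if one of the named columns does not exist, which is usually what you want when the names come from user input.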
Another option is to use non-standard evaluation (NSE), and have R interpret the string as quoted names of objects:
cols2group <- c("x","y")
col2summarize <- "z"
df %>%
group_by(!!!rlang::syms(cols2group)) %>%
summarize(min(!!rlang::sym(col2summarize)))
rlang::sym() takes a single string and turns it into a symbol, which is in turn unquoted by !! and used as a name in the context of df, where it refers to the relevant column. Because cols2group holds more than one name, it needs the plural rlang::syms(), which returns a list of symbols, together with the splice operator !!!. There are different ways of doing the same thing, as always, and this is the shorthand I tend to use!
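To make the difference between one string and several concrete: rlang::sym() converts exactly one string, while rlang::syms() converts a character vector into a list of symbols that has to be spliced in with !!!. The sketch below inspects both; the object names and the min_z column name are chosen only for illustration.

```r
library(dplyr)
library(rlang)

df <- data.frame(x = c("a", "a", "a", "b", "b", "b"),
                 y = c(1, 1, 2, 2, 3, 3),
                 z = 1:6)
cols2group <- c("x", "y")
col2summarize <- "z"

group_syms <- syms(cols2group)    # list of two symbols: x and y
value_sym  <- sym(col2summarize)  # a single symbol: z

result <- df %>%
  group_by(!!!group_syms) %>%                             # splice both symbols
  summarize(min_z = min(!!value_sym), .groups = "drop")   # inject one symbol
```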
See ?dplyr::across for the updated way to do this, since group_by_at and summarize_at are now superseded.
I have a variable with the same name as a column in a dataframe:
df <- data.frame(a=c(1,2,3), b=c(4,5,6))
b <- 5
I want to get the rows where df$b == b, but dplyr interprets this as df$b == df$b:
df %>% filter(b == b) # interpreted as df$b == df$b
# a b
# 1 1 4
# 2 2 5
# 3 3 6
If I change the variable name, it works:
B <- 5
df %>% filter(b == B) # interpreted as df$b == B
# a b
# 1 2 5
I'm wondering if there is a better way to tell filter that b refers to an outside variable.
Recently I have found this to be an elegant solution to this problem, although I'm just starting to wrap my head around how it works.
df %>% filter(b == !!b)
which is syntactic sugar for
df %>% filter(b == UQ(b))
A high-level sense of this is that the UQ (un-quote) operation causes its contents to be evaluated before the filter operation, so that it's not evaluated within the data.frame.
This is described in this chapter of Advanced R, on 'quasi-quotation'. This chapter also includes a few solutions to similar problems related to non-standard evaluation (NSE).
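The effect is easy to check on the example data. In this small sketch (the result names are made up), the first filter compares the column with itself, while the second injects the value of the outside variable before the comparison is evaluated.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
b <- 5

masked   <- df %>% filter(b == b)    # both sides are the column: keeps all rows
injected <- df %>% filter(b == !!b)  # right-hand side becomes the constant 5
```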
You could use the get function to fetch the value of the variable from the environment.
df %>% filter(b == get("b")) # Note the "" around b
As a general solution, you can use the SE (standard evaluation) version of filter, which is filter_. In this case, things get a bit confusing because you are mixing a variable and an 'external' constant in a single expression. Here is how you do that with the interp function:
library(lazyeval)
df %>% filter_(interp(~ b == x, x = b))
If you would like to use more values in b you can write:
df %>% filter_(interp(~ b == x, .values = list(x = b)))
rlang, which is imported with dplyr, has the .env and .data pronouns for exactly this situation when you need to be explicit because of data-masking. To explicitly reference columns in your data frame use .data and to explicitly reference your environment use .env:
library(dplyr)
df %>%
filter(.data$b == .env$b) # b == .env$b works the same here
# a b
# 1 2 5
From the documentation:
Note that .data is only a pronoun, it is not a real data frame. This means that you can't take its names or map a function over the contents of .data. Similarly, .env is not an actual R environment.
You do not necessarily need to use .data$b here because the evaluation searches the data frame for a column with that name first (as you found out).
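The pronouns are particularly handy inside functions, where an argument may share its name with a column. A minimal sketch, with rows_where_b_is() as a hypothetical helper name:

```r
library(dplyr)

# Hypothetical helper whose argument name deliberately collides with column `b`.
rows_where_b_is <- function(data, b) {
  # .data$b is the column; .env$b is the function argument
  data %>% filter(.data$b == .env$b)
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

hit  <- rows_where_b_is(df, 5)  # one matching row
miss <- rows_where_b_is(df, 7)  # no matching rows
```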