How to avoid ellipsis ... in dplyr? - r

I want to create a function that takes a grouping argument, which can be a single variable or multiple variables. I want it to look like this:
wanted <- function(data, groups, other_params){
data %>% group_by( {{groups}} ) %>% count()
}
This works only when a single group is given but breaks when there are multiple groups. I know it's possible to use the following with the ellipsis ... (but I want the syntax groups = something):
not_wanted <- function(data, ..., other_params){
data %>% group_by( ... ) %>% count()
}
Here is the entire code:
library(dplyr)
library(magrittr)
iris$group2 <- rep(1:5, 30)
wanted <- function(data, groups, other_params){
data %>% group_by( {{groups}} ) %>% count()
}
not_wanted <- function(data, ..., other_params){
data %>% group_by( ... ) %>% count()
}
# works
wanted(iris, groups = Species )
not_wanted(iris, Species, group2)
# doesn't work
wanted(iris, groups = vars(Species, group2) )
wanted(iris, groups = c(Species, group2) )
wanted(iris, groups = vars("Species", "group2") )
# Error: Column `vars(Species, group2)` must be length 150 (the number of rows) or one, not 2

You guys are overcomplicating things; this works just fine:
library(tidyverse)
wanted <- function(data, groups){
data %>% count(!!!groups)
}
mtcars %>% wanted(groups = vars(mpg,disp,hp))
# A tibble: 31 x 4
mpg disp hp n
<dbl> <dbl> <dbl> <int>
1 10.4 460 215 1
2 10.4 472 205 1
3 13.3 350 245 1
4 14.3 360 245 1
5 14.7 440 230 1
6 15 301 335 1
7 15.2 276. 180 1
8 15.2 304 150 1
9 15.5 318 150 1
10 15.8 351 264 1
# … with 21 more rows

The triple bang operator and parse_quos from the rlang package will do the trick. For more info, see e.g. https://stackoverflow.com/a/49941635/6086135
library(dplyr)
library(magrittr)
iris$group2 <- rep(1:5, 30)
vec <- c("Species", "group2")
wanted <- function(data, groups){
data %>% count(!!!rlang::parse_quos(groups, rlang::current_env()))
}
wanted(iris, vec)
#> # A tibble: 15 x 3
#> Species group2 n
#> <fct> <int> <int>
#> 1 setosa 1 10
#> 2 setosa 2 10
#> 3 setosa 3 10
#> 4 setosa 4 10
#> 5 setosa 5 10
#> 6 versicolor 1 10
#> 7 versicolor 2 10
#> 8 versicolor 3 10
#> 9 versicolor 4 10
#> 10 versicolor 5 10
#> 11 virginica 1 10
#> 12 virginica 2 10
#> 13 virginica 3 10
#> 14 virginica 4 10
#> 15 virginica 5 10
Created on 2020-01-06 by the reprex package (v0.3.0)

Here is another option to avoid quotations in the function call. I admit it's not very pretty, though.
library(tidyverse)
wanted <- function(data, groups){
grouping <- gsub(x = rlang::quo_get_expr(enquo(groups)), pattern = "\\((.*)?\\)", replacement = "\\1")[-1]
data %>% group_by_at(grouping) %>% count()
}
iris$group2 <- rep(1:5, 30)
wanted(iris, groups = c(Species, group2) )
#> # A tibble: 15 x 3
#> # Groups: Species, group2 [15]
#> Species group2 n
#> <fct> <int> <int>
#> 1 setosa 1 10
#> 2 setosa 2 10
#> 3 setosa 3 10
#> 4 setosa 4 10
#> 5 setosa 5 10
#> 6 versicolor 1 10
#> 7 versicolor 2 10
#> 8 versicolor 3 10
#> 9 versicolor 4 10
#> 10 versicolor 5 10
#> 11 virginica 1 10
#> 12 virginica 2 10
#> 13 virginica 3 10
#> 14 virginica 4 10
#> 15 virginica 5 10
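With dplyr 1.0.0 or later, another route that keeps the groups = something syntax is to wrap the embraced argument in across(), which accepts tidyselect expressions such as c(Species, group2). The following is only a minimal sketch under that version assumption; the names wanted_across and iris2 are made up for illustration and do not come from the answers above.

```r
library(dplyr)

# Work on a copy so the built-in iris data set is left untouched
iris2 <- iris
iris2$group2 <- rep(1:5, 30)

# Hypothetical helper: `groups` may be one column or c(col1, col2, ...)
wanted_across <- function(data, groups) {
  data %>%
    group_by(across({{ groups }})) %>%
    count() %>%
    ungroup()
}

one <- wanted_across(iris2, groups = Species)
two <- wanted_across(iris2, groups = c(Species, group2))
```

Because across() goes through tidyselect, the same call style works for a single column and for several, without vars() or !!! at the call site.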

Related

using lapply with list of arguments on dplyr functions that uses data masking

When programming with dplyr, variables passed as function arguments need to be referenced with {{var}} inside dplyr verbs.
This works well, but I would like to use lapply with the var argument supplied in a list, and that throws an error. I have tried going back and forth with substitute and rlang helpers like sym, but to no avail.
Any suggestions? Thanks!
library(tidyverse)
tb <- tibble(a = 1:10, b = 10:1)
foo <- function(var, scalar){
tb %>% mutate(new_var = {{var}}*scalar)
}
foo(a, pi) #works
lapply(X = list(
list(sym("a"), pi),
list(substitute(b), exp(1))), FUN = function(ll) foo(var = ll$a, scalar = ll$pi) ) #err
You can get round the non-standard evaluation by naming the list elements and using do.call:
lapply(X = list(
list(var = sym("a"), scalar = pi),
list(var = substitute(b), scalar = exp(1))),
FUN = function(ll) do.call(foo, ll))
#> [[1]]
#> # A tibble: 10 x 3
#> a b new_var
#> <int> <int> <dbl>
#> 1 1 10 3.14
#> 2 2 9 6.28
#> 3 3 8 9.42
#> 4 4 7 12.6
#> 5 5 6 15.7
#> 6 6 5 18.8
#> 7 7 4 22.0
#> 8 8 3 25.1
#> 9 9 2 28.3
#> 10 10 1 31.4
#>
#> [[2]]
#> # A tibble: 10 x 3
#> a b new_var
#> <int> <int> <dbl>
#> 1 1 10 27.2
#> 2 2 9 24.5
#> 3 3 8 21.7
#> 4 4 7 19.0
#> 5 5 6 16.3
#> 6 6 5 13.6
#> 7 7 4 10.9
#> 8 8 3 8.15
#> 9 9 2 5.44
#> 10 10 1 2.72
Created on 2022-11-03 with reprex v2.0.2
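If the column names can be supplied as strings, a possible alternative is to skip {{ }} and look the column up through the .data pronoun, which avoids symbols and do.call altogether. This is a sketch, not the answerer's method; foo_chr and args are hypothetical names introduced here.

```r
library(dplyr)

tb <- tibble(a = 1:10, b = 10:1)

# Hypothetical variant of foo() that takes the column name as a string
foo_chr <- function(var, scalar) {
  tb %>% mutate(new_var = .data[[var]] * scalar)
}

# Each inner list holds one set of named arguments
args <- list(
  list(var = "a", scalar = pi),
  list(var = "b", scalar = exp(1))
)

res <- lapply(args, function(ll) foo_chr(ll$var, ll$scalar))
```

The trade-off is that callers pass "a" rather than a bare a, but plain strings are easy to store in a list and to iterate over.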

How to make multi layer cross tabs in R

I am attempting to create a multi-layered cross tab in R. Currently, when using this code:
NewMexico_DEM_xtab_ <- NewMexico_DEM_Voterfile %>%
group_by(Sex, CountyName) %>%
tally() %>%
spread(Sex, n)
I receive this output:
My goal is to add a layer for age using the Age column and for R to output a tab like this:
Is there a way I can do this with my current code or a package that would make this easier?
Do either of these approaches solve your problem?
library(tidyverse)
# Create sample data
iris_df <- iris
iris_df$Sample <- sample(c("M","F"), 150, replace = TRUE)
# crosstabs
iris_df %>%
group_by(Species, Sample) %>%
tally() %>%
spread(Sample, n)
#> # A tibble: 3 × 3
#> # Groups: Species [3]
#> Species F M
#> <fct> <int> <int>
#> 1 setosa 26 24
#> 2 versicolor 25 25
#> 3 virginica 27 23
# Add in 'Age'
iris_df$Age <- sample(c("18-24", "25-35", "36-45", "45+"), 150, replace = TRUE)
# crosstabs
iris_df %>%
group_by(Species, Sample, Age) %>%
tally() %>%
spread(Age, n)
#> # A tibble: 6 × 6
#> # Groups: Species, Sample [6]
#> Species Sample `18-24` `25-35` `36-45` `45+`
#> <fct> <chr> <int> <int> <int> <int>
#> 1 setosa F 2 4 14 6
#> 2 setosa M 11 4 5 4
#> 3 versicolor F 3 8 8 6
#> 4 versicolor M 5 8 2 10
#> 5 virginica F 5 8 7 7
#> 6 virginica M 6 10 3 4
# Using janitor::tabyl()
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
iris_df %>%
tabyl(Species, Sample, Age)
#> $`18-24`
#> Species F M
#> setosa 2 11
#> versicolor 3 5
#> virginica 5 6
#>
#> $`25-35`
#> Species F M
#> setosa 4 4
#> versicolor 8 8
#> virginica 8 10
#>
#> $`36-45`
#> Species F M
#> setosa 14 5
#> versicolor 8 2
#> virginica 7 3
#>
#> $`45+`
#> Species F M
#> setosa 6 4
#> versicolor 6 10
#> virginica 7 4
Created on 2022-08-24 by the reprex package (v2.0.1)
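In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same layered table can also be sketched as below. The seed value and the sample columns are arbitrary stand-ins, as in the answer above, and xtab is just an illustrative name.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed so the made-up columns are reproducible
iris_df <- iris
iris_df$Sample <- sample(c("M", "F"), 150, replace = TRUE)
iris_df$Age <- sample(c("18-24", "25-35", "36-45", "45+"), 150, replace = TRUE)

# One row per Species/Sample pair, one column per Age band;
# combinations that never occur are filled with 0 instead of NA
xtab <- iris_df %>%
  count(Species, Sample, Age) %>%
  pivot_wider(names_from = Age, values_from = n, values_fill = list(n = 0L))
```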

How to summarize by sets of grouping variables in R and dplyr?

I want to group a data frame using different sets of grouping variables. For each group I want to count the number of observations (or summarize in any other way) and then collect all results in one data frame.
Important: I want to define the sets of grouping variables programmatically, for example as a list.
How do I achieve this in the tidyverse?
Here is my attempt:
library(tidyverse)
count_by_group <- function(...) {
mtcars %>%
count(...) %>%
mutate(
grouping_variable = paste(ensyms(...), collapse = "."),
group = paste(!!!enquos(...), sep = ".")
) %>%
select(grouping_variable, group, n)
}
# I want this ...
bind_rows(
count_by_group(cyl),
count_by_group(gear),
count_by_group(cyl, gear)
)
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
# ... but without the repetition of "count_by_group(var)".
# The following does not work:
map_dfr(
list(
cyl,
gear,
c(cyl, gear)
),
count_by_group
)
#> Error in map(.x, .f, ...): object 'cyl' not found
Created on 2020-09-17 by the reprex package (v0.3.0)
Update (2020-10-12): More transparent solution (thanks to @LionelHenry)
library(tidyverse)
count_by_group <- function(...) {
dots <- enquos(..., .named = TRUE)
names <- names(dots)
counted <- count(mtcars, !!!dots)
group <- counted %>%
select(-n) %>%
rowwise() %>%
mutate(paste(c_across(), collapse = ".")) %>%
pull()
# # Equivalently:
# group <- counted %>%
# select(-n) %>%
# pmap_chr(paste, sep = ".")
counted %>%
mutate(
grouping_variable = paste(names, collapse = "."),
group = group
) %>%
select(grouping_variable, group, n)
}
grouping_variables <- list(
vars(cyl),
vars(gear),
vars(cyl, gear)
)
map_dfr(grouping_variables, ~ count_by_group(!!! .x))
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
Created on 2020-10-12 by the reprex package (v0.3.0)
I just found that this works!
library(tidyverse)
count_by_group <- function(...) {
mtcars %>%
count(...) %>%
mutate(
grouping_variable = paste(ensyms(...), collapse = "."),
group = paste(!!!enquos(...), sep = ".")
) %>%
select(grouping_variable, group, n)
}
grouping_variables <- list(
vars(cyl),
vars(gear),
vars(cyl, gear)
)
map_dfr(grouping_variables, ~count_by_group(!!! .))
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
Created on 2020-10-12 by the reprex package (v0.3.0)
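If the sets of grouping variables can be defined as character vectors rather than with vars(), the quoting machinery can be dropped. The sketch below assumes dplyr 1.0.0 or later for across() and all_of(); count_by_group_chr and grouping_sets are hypothetical names, not part of the answers above.

```r
library(dplyr)

# Hypothetical helper: `vars` is a character vector of column names
count_by_group_chr <- function(vars) {
  counted <- count(mtcars, across(all_of(vars)))
  # Paste the grouping columns row by row, e.g. cyl = 4, gear = 3 -> "4.3"
  group <- do.call(paste, c(unname(as.list(counted[vars])), sep = "."))
  tibble(
    grouping_variable = paste(vars, collapse = "."),
    group = group,
    n = counted$n
  )
}

grouping_sets <- list("cyl", "gear", c("cyl", "gear"))
result <- bind_rows(lapply(grouping_sets, count_by_group_chr))
```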

For loop to iterate through dplyr pipe

I'm trying to get the total number of entries for each column in a dataframe in order to compress on these fields later.
However, the dataframe has over 60 columns, and writing the below 60 times is extremely inefficient
df %>%
group_by(colname) %>%
count() %>%
arrange(desc(n))
Is there a way I can write a for loop to loop through all the names in the dataframe and produce the pipe function result for each? I tried
for (i in colnames(df)) {
df %>%
group_by(colname) %>%
count() %>%
arrange(desc(n))
}
But I'm getting an 'i is unknown' error. Any help would be appreciated, thanks.
If I understand correctly you want to count the number of occurrences of the unique elements in every single column or did I get that completely wrong? Why are you not just using a combination of some apply function and table?
set.seed(101)
df <- data.frame("x" = 1:20, "y" = LETTERS[sample(1:26, 20, replace = TRUE)], "z" = letters[sample(1:26, 20, replace = TRUE)])
# tabulate every column, then sort each table by decreasing frequency
l <- lapply(df, table)
lapply(l, sort, decreasing = TRUE)
You can try this:
#Data
df <- iris
#Create list
List <- list()
#Compute
for (colname in colnames(df)) {
List[[colname]]<- df %>%
group_by(df[,colname]) %>%
count() %>%
arrange(desc(n))
}
#Print
List
$Sepal.Length
# A tibble: 35 x 2
# Groups: df[, colname] [35]
`df[, colname]` n
<dbl> <int>
1 5 10
2 5.1 9
3 6.3 9
4 5.7 8
5 6.7 8
6 5.5 7
7 5.8 7
8 6.4 7
9 4.9 6
10 5.4 6
# ... with 25 more rows
$Sepal.Width
# A tibble: 23 x 2
# Groups: df[, colname] [23]
`df[, colname]` n
<dbl> <int>
1 3 26
2 2.8 14
3 3.2 13
4 3.4 12
5 3.1 11
6 2.9 10
7 2.7 9
8 2.5 8
9 3.3 6
10 3.5 6
# ... with 13 more rows
$Petal.Length
# A tibble: 43 x 2
# Groups: df[, colname] [43]
`df[, colname]` n
<dbl> <int>
1 1.4 13
2 1.5 13
3 4.5 8
4 5.1 8
5 1.3 7
6 1.6 7
7 5.6 6
8 4 5
9 4.7 5
10 4.9 5
# ... with 33 more rows
$Petal.Width
# A tibble: 22 x 2
# Groups: df[, colname] [22]
`df[, colname]` n
<dbl> <int>
1 0.2 29
2 1.3 13
3 1.5 12
4 1.8 12
5 1.4 8
6 2.3 8
7 0.3 7
8 0.4 7
9 1 7
10 2 6
# ... with 12 more rows
$Species
# A tibble: 3 x 2
# Groups: df[, colname] [3]
`df[, colname]` n
<fct> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
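A variation on the loop above, sketched under the assumption that a named list of count tables is what is wanted: the .data pronoun lets the column be picked by its name as a string, and count(sort = TRUE) replaces the group_by, count and arrange steps. The names counts and value are illustrative only.

```r
library(dplyr)

df <- iris

# Named list with one count table per column, most frequent value first
counts <- lapply(setNames(names(df), names(df)), function(col) {
  df %>% count(value = .data[[col]], sort = TRUE)
})

counts$Species
```

Naming the counted column value also avoids the awkward `df[, colname]` header seen in the output above.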

How to group by multiple values in a function with dplyr

I wonder how to modify the code below
xxx<-function(df,groupbys){
groupbys<-enquo(groupbys)
df%>%group_by_(groupbys)%>%summarise(count=n())
}
zzz<-xxx(iris,Species)
to have the option to feed in either one column or more than one column to group by? For example, group_by_ both Species and Petal.Length with the iris dataset.
When using enquo (single argument) or enquos (multiple), you should use the !! and !!! operators, respectively.
xxx <- function(df, ...) {
grps <- enquos(...)
df %>%
group_by(!!!grps) %>%
tally() %>%
ungroup()
}
mtcars %>% xxx(cyl, am)
# # A tibble: 6 x 3
# cyl am n
# <dbl> <dbl> <int>
# 1 4 0 3
# 2 4 1 8
# 3 6 0 4
# 4 6 1 3
# 5 8 0 12
# 6 8 1 2
or if you want to keep a single argument in the function formals for one or more column names, I think you'll need to use vars() in the call. (Perhaps there's another way suggested in the Programming with dplyr vignette.)
xxx <- function(df, groups) {
df %>%
group_by(!!!groups) %>%
tally() %>%
ungroup()
}
xxx(mtcars, vars(cyl, am))
This is a case where you just need the .dots argument of the group_by function. Just make sure groupbys is a character vector, i.e.
xxx<-function(df,groupbys){
df%>%group_by(.dots = groupbys)%>%summarise(count=n())
}
xxx(iris,"Species")
# A tibble: 3 x 2
Species count
<fct> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
xxx(iris,c("Species","Petal.Length"))
# A tibble: 48 x 3
# Groups: Species [3]
Species Petal.Length count
<fct> <dbl> <int>
1 setosa 1 1
2 setosa 1.1 1
3 setosa 1.2 2
4 setosa 1.3 7
5 setosa 1.4 13
6 setosa 1.5 13
7 setosa 1.6 7
8 setosa 1.7 4
9 setosa 1.9 2
10 versicolor 3 1
Here are two approaches to the problem. If you want to pass column names as unquoted variables, you can use ... and pass it to count instead of group_by + summarise.
xxx<-function(df,...){
df %>% count(...)
}
xxx(mtcars, cyl)
# A tibble: 3 x 2
# cyl n
# <dbl> <int>
#1 4 11
#2 6 7
#3 8 14
xxx(mtcars, cyl, am)
# A tibble: 6 x 3
# cyl am n
# <dbl> <dbl> <int>
#1 4 0 3
#2 4 1 8
#3 6 0 4
#4 6 1 3
#5 8 0 12
#6 8 1 2
The second approach: if you want to pass column names as quoted variables (strings), you can use group_by_at, which accepts string inputs.
xxx<-function(df,groupbys){
df %>% group_by_at(groupbys) %>% summarise(n = n())
}
xxx(mtcars, c("cyl", "am"))
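Since dplyr 1.0.0 the scoped verbs such as group_by_at have been superseded by across(), so the string-based version can also be sketched as follows. xxx_chr is a hypothetical name used here to keep it apart from the functions above.

```r
library(dplyr)

# Hypothetical helper: `groupbys` is a character vector of one or more columns
xxx_chr <- function(df, groupbys) {
  df %>%
    group_by(across(all_of(groupbys))) %>%
    summarise(count = n(), .groups = "drop")
}

by_one <- xxx_chr(iris, "Species")
by_two <- xxx_chr(mtcars, c("cyl", "am"))
```

Setting .groups = "drop" returns an ungrouped tibble, so no trailing ungroup() is needed.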
