I want to create a function that takes a grouping argument, which can be a single variable or several. I want it to look like this:
wanted <- function(data, groups, other_params){
  data %>% group_by( {{groups}} ) %>% count()
}
This works only when a single group is given and breaks when there are multiple groups. I know it's possible to use the ellipsis (...), as below, but I want the syntax groups = something:
not_wanted <- function(data, ..., other_params){
  data %>% group_by( ... ) %>% count()
}
Here is the entire code:
library(dplyr)
library(magrittr)
iris$group2 <- rep(1:5, 30)
wanted <- function(data, groups, other_params){
  data %>% group_by( {{groups}} ) %>% count()
}
not_wanted <- function(data, ..., other_params){
  data %>% group_by( ... ) %>% count()
}
# works
wanted(iris, groups = Species )
not_wanted(iris, Species, group2)
# doesn't work
wanted(iris, groups = vars(Species, group2) )
wanted(iris, groups = c(Species, group2) )
wanted(iris, groups = vars("Species", "group2") )
# Error: Column `vars(Species, group2)` must be length 150 (the number of rows) or one, not 2
You guys are overcomplicating things; this works just fine:
library(tidyverse)
wanted <- function(data, groups){
data %>% count(!!!groups)
}
mtcars %>% wanted(groups = vars(mpg,disp,hp))
# A tibble: 31 x 4
mpg disp hp n
<dbl> <dbl> <dbl> <int>
1 10.4 460 215 1
2 10.4 472 205 1
3 13.3 350 245 1
4 14.3 360 245 1
5 14.7 440 230 1
6 15 301 335 1
7 15.2 276. 180 1
8 15.2 304 150 1
9 15.5 318 150 1
10 15.8 351 264 1
# … with 21 more rows
The triple bang operator (!!!) and parse_quos() from the rlang package will do the trick. For more info, see e.g. https://stackoverflow.com/a/49941635/6086135
library(dplyr)
library(magrittr)
iris$group2 <- rep(1:5, 30)
vec <- c("Species", "group2")
wanted <- function(data, groups){
data %>% count(!!!rlang::parse_quos(groups, rlang::current_env()))
}
wanted(iris, vec)
#> # A tibble: 15 x 3
#> Species group2 n
#> <fct> <int> <int>
#> 1 setosa 1 10
#> 2 setosa 2 10
#> 3 setosa 3 10
#> 4 setosa 4 10
#> 5 setosa 5 10
#> 6 versicolor 1 10
#> 7 versicolor 2 10
#> 8 versicolor 3 10
#> 9 versicolor 4 10
#> 10 versicolor 5 10
#> 11 virginica 1 10
#> 12 virginica 2 10
#> 13 virginica 3 10
#> 14 virginica 4 10
#> 15 virginica 5 10
Created on 2020-01-06 by the reprex package (v0.3.0)
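If the column names arrive as plain strings, converting them to symbols is a narrower alternative to parsing them as code. The following is a minimal sketch, not the answer's own method: the helper name wanted_syms and the copy iris2 are hypothetical, and it relies on rlang::syms() together with the same !!! operator.

```r
library(dplyr)

iris2 <- iris
iris2$group2 <- rep(1:5, 30)

# Hypothetical helper: `groups` is a character vector of column names.
wanted_syms <- function(data, groups) {
  # syms() turns each string into a symbol; !!! splices them into count().
  data %>% count(!!!rlang::syms(groups))
}

res_syms <- wanted_syms(iris2, c("Species", "group2"))
```

Unlike parsing, symbols can only refer to columns, so a string such as "Sepal.Length > 5" would not be evaluated as an expression here.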
Here is another option to avoid quotations in the function call. I admit it's not very pretty though.
library(tidyverse)
wanted <- function(data, groups){
grouping <- gsub(x = rlang::quo_get_expr(enquo(groups)), pattern = "\\((.*)?\\)", replacement = "\\1")[-1]
data %>% group_by_at(grouping) %>% count()
}
iris$group2 <- rep(1:5, 30)
wanted(iris, groups = c(Species, group2) )
#> # A tibble: 15 x 3
#> # Groups: Species, group2 [15]
#> Species group2 n
#> <fct> <int> <int>
#> 1 setosa 1 10
#> 2 setosa 2 10
#> 3 setosa 3 10
#> 4 setosa 4 10
#> 5 setosa 5 10
#> 6 versicolor 1 10
#> 7 versicolor 2 10
#> 8 versicolor 3 10
#> 9 versicolor 4 10
#> 10 versicolor 5 10
#> 11 virginica 1 10
#> 12 virginica 2 10
#> 13 virginica 3 10
#> 14 virginica 4 10
#> 15 virginica 5 10
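For the groups = c(Species, group2) syntax asked for in the question, newer dplyr versions (1.0.0 and later, as far as I know) let you wrap the embraced argument in across(). This is only a sketch under that assumption; the names wanted_across and iris2 are made up for the example.

```r
library(dplyr)

iris2 <- iris
iris2$group2 <- rep(1:5, 30)

# Hypothetical helper: `groups` is a tidyselect expression, so a bare
# column name and c(col1, col2) should both be accepted.
wanted_across <- function(data, groups) {
  data %>%
    group_by(across({{ groups }})) %>%
    count() %>%
    ungroup()
}

one_group  <- wanted_across(iris2, groups = Species)
two_groups <- wanted_across(iris2, groups = c(Species, group2))
```

Because across() uses tidyselect, the argument is treated as a column selection rather than an arbitrary expression, which avoids the string manipulation above.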
When programming with dplyr, variables passed as function arguments need to be referenced with {{var}} to be used inside dplyr verbs.
This works well, but I would like to use lapply with the var argument supplied in a list, and that throws an error. I have gone back and forth with substitute and rlang functions like sym, but to no avail.
Any suggestions? Thanks!
library(tidyverse)
tb <- tibble(a = 1:10, b = 10:1)
foo <- function(var, scalar){
tb %>% mutate(new_var = {{var}}*scalar)
}
foo(a, pi) #works
lapply(X = list(
list(sym("a"), pi),
list(substitute(b), exp(1))), FUN = function(ll) foo(var = ll$a, scalar = ll$pi) ) #err
You can get round the non-standard evaluation by naming the list elements and using do.call:
lapply(X = list(
list(var = sym("a"), scalar = pi),
list(var = substitute(b), scalar = exp(1))),
FUN = function(ll) do.call(foo, ll))
#> [[1]]
#> # A tibble: 10 x 3
#> a b new_var
#> <int> <int> <dbl>
#> 1 1 10 3.14
#> 2 2 9 6.28
#> 3 3 8 9.42
#> 4 4 7 12.6
#> 5 5 6 15.7
#> 6 6 5 18.8
#> 7 7 4 22.0
#> 8 8 3 25.1
#> 9 9 2 28.3
#> 10 10 1 31.4
#>
#> [[2]]
#> # A tibble: 10 x 3
#> a b new_var
#> <int> <int> <dbl>
#> 1 1 10 27.2
#> 2 2 9 24.5
#> 3 3 8 21.7
#> 4 4 7 19.0
#> 5 5 6 16.3
#> 6 6 5 13.6
#> 7 7 4 10.9
#> 8 8 3 8.15
#> 9 9 2 5.44
#> 10 10 1 2.72
Created on 2022-11-03 with reprex v2.0.2
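Another way to reach the same result, sketched here under the assumption that the column names are stored as strings, is to inject a symbol with !! at the call site. The list name params and the result name results are hypothetical; foo and tb are as in the question.

```r
library(dplyr)

tb <- tibble(a = 1:10, b = 10:1)

foo <- function(var, scalar) {
  tb %>% mutate(new_var = {{ var }} * scalar)
}

# Hypothetical parameter list: column names as strings, scalars as numbers.
params <- list(
  list(var = "a", scalar = pi),
  list(var = "b", scalar = exp(1))
)

# sym() builds a symbol from the string; !! injects it so that
# {{ var }} inside foo() sees the column rather than `p$var`.
results <- lapply(params, function(p) foo(!!rlang::sym(p$var), p$scalar))
```

This keeps foo() unchanged and avoids do.call, at the cost of using the injection operator in the calling code.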
I am attempting to create a multi-layered cross tab in R. Currently, when using this code:
NewMexico_DEM_xtab_ <- NewMexico_DEM_Voterfile %>%
group_by(Sex, CountyName) %>%
tally() %>%
spread(Sex, n)
I receive this output:
My goal is to add a layer for age using the Age column and for R to output a tab like this:
Is there a way I can do this with my current code or a package that would make this easier?
Do either of these approaches solve your problem?
library(tidyverse)
# Create sample data
iris_df <- iris
iris_df$Sample <- sample(c("M","F"), 150, replace = TRUE)
# crosstabs
iris_df %>%
group_by(Species, Sample) %>%
tally() %>%
spread(Sample, n)
#> # A tibble: 3 × 3
#> # Groups: Species [3]
#> Species F M
#> <fct> <int> <int>
#> 1 setosa 26 24
#> 2 versicolor 25 25
#> 3 virginica 27 23
# Add in 'Age'
iris_df$Age <- sample(c("18-24", "25-35", "36-45", "45+"), 150, replace = TRUE)
# crosstabs
iris_df %>%
group_by(Species, Sample, Age) %>%
tally() %>%
spread(Age, n)
#> # A tibble: 6 × 6
#> # Groups: Species, Sample [6]
#> Species Sample `18-24` `25-35` `36-45` `45+`
#> <fct> <chr> <int> <int> <int> <int>
#> 1 setosa F 2 4 14 6
#> 2 setosa M 11 4 5 4
#> 3 versicolor F 3 8 8 6
#> 4 versicolor M 5 8 2 10
#> 5 virginica F 5 8 7 7
#> 6 virginica M 6 10 3 4
# Using janitor::tabyl()
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
iris_df %>%
tabyl(Species, Sample, Age)
#> $`18-24`
#> Species F M
#> setosa 2 11
#> versicolor 3 5
#> virginica 5 6
#>
#> $`25-35`
#> Species F M
#> setosa 4 4
#> versicolor 8 8
#> virginica 8 10
#>
#> $`36-45`
#> Species F M
#> setosa 14 5
#> versicolor 8 2
#> virginica 7 3
#>
#> $`45+`
#> Species F M
#> setosa 6 4
#> versicolor 6 10
#> virginica 7 4
Created on 2022-08-24 by the reprex package (v2.0.1)
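Since spread() has been superseded in tidyr by pivot_wider(), the three-way crosstab can also be sketched as below. The seed value and the object name wide are arbitrary choices for the example, and the sample columns are simulated exactly as above, not taken from the voter file.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sample data is reproducible
iris_df <- iris
iris_df$Sample <- sample(c("M", "F"), 150, replace = TRUE)
iris_df$Age <- sample(c("18-24", "25-35", "36-45", "45+"), 150, replace = TRUE)

# count() replaces group_by() + tally(); pivot_wider() spreads the age
# bands into columns and fills missing combinations with zero.
wide <- iris_df %>%
  count(Species, Sample, Age) %>%
  pivot_wider(names_from = Age, values_from = n, values_fill = 0L)
```

The result has one row per Species and Sample combination and one count column per age band.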
I want to group a data frame using different sets of grouping variables. For each group I want to count the number of observations (or summarize in any other way) and then collect all results in one data frame.
Important: I want to define the sets of grouping variables programmatically, for example as a list.
How do I achieve this in the tidyverse?
Here is my attempt:
library(tidyverse)
count_by_group <- function(...) {
  mtcars %>%
    count(...) %>%
    mutate(
      grouping_variable = paste(ensyms(...), collapse = "."),
      group = paste(!!!enquos(...), sep = ".")
    ) %>%
    select(grouping_variable, group, n)
}
# I want this ...
bind_rows(
count_by_group(cyl),
count_by_group(gear),
count_by_group(cyl, gear)
)
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
# ... but without the repetition of "count_by_group(var)".
# The following does not work:
map_dfr(
list(
cyl,
gear,
c(cyl, gear)
),
count_by_group
)
#> Error in map(.x, .f, ...): object 'cyl' not found
Created on 2020-09-17 by the reprex package (v0.3.0)
Update (2020-10-12): More transparent solution (thanks to @LionelHenry)
library(tidyverse)
count_by_group <- function(...) {
  dots <- enquos(..., .named = TRUE)
  names <- names(dots)
  counted <- count(mtcars, !!!dots)
  group <- counted %>%
    select(-n) %>%
    rowwise() %>%
    mutate(paste(c_across(), collapse = ".")) %>%
    pull()
  # # Equivalently:
  # group <- counted %>%
  #   select(-n) %>%
  #   pmap_chr(paste, sep = ".")
  counted %>%
    mutate(
      grouping_variable = paste(names, collapse = "."),
      group = group
    ) %>%
    select(grouping_variable, group, n)
}
grouping_variables <- list(
vars(cyl),
vars(gear),
vars(cyl, gear)
)
map_dfr(grouping_variables, ~ count_by_group(!!! .x))
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
Created on 2020-10-12 by the reprex package (v0.3.0)
I just found that this works!
library(tidyverse)
count_by_group <- function(...) {
  mtcars %>%
    count(...) %>%
    mutate(
      grouping_variable = paste(ensyms(...), collapse = "."),
      group = paste(!!!enquos(...), sep = ".")
    ) %>%
    select(grouping_variable, group, n)
}
grouping_variables <- list(
vars(cyl),
vars(gear),
vars(cyl, gear)
)
map_dfr(grouping_variables, ~count_by_group(!!! .))
#> grouping_variable group n
#> 1 cyl 4 11
#> 2 cyl 6 7
#> 3 cyl 8 14
#> 4 gear 3 15
#> 5 gear 4 12
#> 6 gear 5 5
#> 7 cyl.gear 4.3 1
#> 8 cyl.gear 4.4 8
#> 9 cyl.gear 4.5 2
#> 10 cyl.gear 6.3 2
#> 11 cyl.gear 6.4 4
#> 12 cyl.gear 6.5 1
#> 13 cyl.gear 8.3 12
#> 14 cyl.gear 8.5 2
Created on 2020-10-12 by the reprex package (v0.3.0)
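If the grouping sets can be stored as character vectors instead of vars() calls, the same table can be built without any quoting or splicing. This is a rough sketch under that assumption; count_by_names, grouping_sets and result are hypothetical names, and across(all_of()) needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical input: each grouping set is a character vector of columns.
grouping_sets <- list("cyl", "gear", c("cyl", "gear"))

count_by_names <- function(data, vars) {
  counted <- count(data, across(all_of(vars)))
  # Paste the key columns row by row, e.g. cyl = 4, gear = 3 gives "4.3".
  keys <- unname(as.list(counted[vars]))
  tibble(
    grouping_variable = paste(vars, collapse = "."),
    group = do.call(paste, c(keys, sep = ".")),
    n = counted$n
  )
}

result <- bind_rows(lapply(grouping_sets, count_by_names, data = mtcars))
```

Storing the sets as strings makes them easy to generate programmatically, for example with combn() over a vector of column names.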
I'm trying to count the entries in each column of a dataframe in order to compress on these fields later.
However, the dataframe has over 60 columns, and writing the below 60 times is extremely inefficient:
df %>%
group_by(colname) %>%
count() %>%
arrange(desc(n))
Is there a way I can write a for loop to loop through all the names in the dataframe and produce the pipe function result for each? I tried
for (i in colnames(df)) {
df %>%
group_by(colname) %>%
count() %>%
arrange(desc(n))
}
But I'm getting an 'i is unknown' error. Any help would be appreciated thanks.
If I understand correctly, you want to count the number of occurrences of each unique element in every single column, or did I get that completely wrong? Why not just use a combination of an apply function and table?
set.seed(101)
df <- data.frame("x" = 1:20, "y" = LETTERS[sample(1:26, 20, replace = TRUE)], "z" = letters[sample(1:26, 20, replace = TRUE)])
l <- lapply(df, table)
lapply(l, sort, decreasing = TRUE)
You can try this:
#Data
df <- iris
#Create list
List <- list()
#Compute
for (colname in colnames(df)) {
  List[[colname]] <- df %>%
    group_by(df[,colname]) %>%
    count() %>%
    arrange(desc(n))
}
#Print
List
$Sepal.Length
# A tibble: 35 x 2
# Groups: df[, colname] [35]
`df[, colname]` n
<dbl> <int>
1 5 10
2 5.1 9
3 6.3 9
4 5.7 8
5 6.7 8
6 5.5 7
7 5.8 7
8 6.4 7
9 4.9 6
10 5.4 6
# ... with 25 more rows
$Sepal.Width
# A tibble: 23 x 2
# Groups: df[, colname] [23]
`df[, colname]` n
<dbl> <int>
1 3 26
2 2.8 14
3 3.2 13
4 3.4 12
5 3.1 11
6 2.9 10
7 2.7 9
8 2.5 8
9 3.3 6
10 3.5 6
# ... with 13 more rows
$Petal.Length
# A tibble: 43 x 2
# Groups: df[, colname] [43]
`df[, colname]` n
<dbl> <int>
1 1.4 13
2 1.5 13
3 4.5 8
4 5.1 8
5 1.3 7
6 1.6 7
7 5.6 6
8 4 5
9 4.7 5
10 4.9 5
# ... with 33 more rows
$Petal.Width
# A tibble: 22 x 2
# Groups: df[, colname] [22]
`df[, colname]` n
<dbl> <int>
1 0.2 29
2 1.3 13
3 1.5 12
4 1.8 12
5 1.4 8
6 2.3 8
7 0.3 7
8 0.4 7
9 1 7
10 2 6
# ... with 12 more rows
$Species
# A tibble: 3 x 2
# Groups: df[, colname] [3]
`df[, colname]` n
<fct> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
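The loop above labels every key column `df[, colname]`. A possible variation that keeps the real column names is sketched below; it assumes dplyr 1.0.0 or later for across(all_of()), and the names cols and counts are made up for the example.

```r
library(dplyr)

df <- iris
cols <- names(df)

# One sorted frequency table per column; naming the input vector
# makes lapply() return a list named after the columns.
counts <- lapply(setNames(cols, cols), function(colname) {
  count(df, across(all_of(colname)), sort = TRUE)
})

counts$Species
```

Here count(..., sort = TRUE) stands in for group_by() + count() + arrange(desc(n)).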
I wonder how to modify the code below
xxx<-function(df,groupbys){
groupbys<-enquo(groupbys)
df%>%group_by_(groupbys)%>%summarise(count=n())
}
zzz<-xxx(iris,Species)
to have the option to feed in either one column or more than one column to group by? For example, group_by_ both Species and Petal.Length with the iris dataset.
When using enquo (single argument) or enquos (multiple), you should use the !! and !!! operators, respectively.
xxx <- function(df, ...) {
  grps <- enquos(...)
  df %>%
    group_by(!!!grps) %>%
    tally() %>%
    ungroup()
}
mtcars %>% xxx(cyl, am)
# # A tibble: 6 x 3
# cyl am n
# <dbl> <dbl> <int>
# 1 4 0 3
# 2 4 1 8
# 3 6 0 4
# 4 6 1 3
# 5 8 0 12
# 6 8 1 2
Or, if you want to keep a single argument in the function formals for one or more column names, I think you'll need to use vars() in the call. (Perhaps there's another way suggested in the Programming with dplyr vignette.)
xxx <- function(df, groups) {
  df %>%
    group_by(!!!groups) %>%
    tally() %>%
    ungroup()
}
xxx(mtcars, vars(cyl, am))
Here you just need the .dots argument of group_by. Just make sure groupbys is a character vector, i.e.:
xxx<-function(df,groupbys){
df%>%group_by(.dots = groupbys)%>%summarise(count=n())
}
xxx(iris,"Species")
# A tibble: 3 x 2
Species count
<fct> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
xxx(iris,c("Species","Petal.Length"))
# A tibble: 48 x 3
# Groups: Species [3]
Species Petal.Length count
<fct> <dbl> <int>
1 setosa 1 1
2 setosa 1.1 1
3 setosa 1.2 2
4 setosa 1.3 7
5 setosa 1.4 13
6 setosa 1.5 13
7 setosa 1.6 7
8 setosa 1.7 4
9 setosa 1.9 2
10 versicolor 3 1
Here are two approaches to the problem. If you want to pass column names as unquoted variables, you can use ... and pass it to count instead of group_by + summarise.
xxx<-function(df,...){
df %>% count(...)
}
xxx(mtcars, cyl)
# A tibble: 3 x 2
# cyl n
# <dbl> <int>
#1 4 11
#2 6 7
#3 8 14
xxx(mtcars, cyl, am)
# A tibble: 6 x 3
# cyl am n
# <dbl> <dbl> <int>
#1 4 0 3
#2 4 1 8
#3 6 0 4
#4 6 1 3
#5 8 0 12
#6 8 1 2
For the second approach, if you want to pass column names as strings, you can use group_by_at, which accepts string inputs.
xxx<-function(df,groupbys){
df %>% group_by_at(groupbys) %>% summarise(n = n())
}
xxx(mtcars, c("cyl", "am"))
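The scoped verbs such as group_by_at have since been superseded in dplyr, so on version 1.0.0 or later the string-based variant might be written with across(all_of()) instead. This is a sketch under that assumption; the function name count_by is hypothetical.

```r
library(dplyr)

# Hypothetical helper: `groupbys` is a character vector of column names.
count_by <- function(df, groupbys) {
  df %>%
    group_by(across(all_of(groupbys))) %>%
    summarise(n = n(), .groups = "drop")
}

by_species <- count_by(iris, "Species")
by_cyl_am  <- count_by(mtcars, c("cyl", "am"))
```

all_of() raises an error when a name in groupbys is not a column of df, which tends to surface typos early.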