Get Frequency using a loop in dplyr - r

I am trying to get an individual frequency table for each variable using a loop and the dplyr package. An example of my code, using the mtcars data, is below:
library(dplyr)
var= c("vs", "am", "gear")
for (i in var){
mtcars %>%
group_by(carb) %>%
count(i)
}
Unfortunately, I only get:
Error: Column `i` is unknown
I also tried with
for (i in var){
mtcars %>%
group_by(carb) %>%
summarise_each(funs(n()), i)
}
But without success.
Any advice would be greatly appreciated.

We can use !!sym() for the variable names. I would also recommend to save the results to a list as follows.
var <- c("vs", "am", "gear")
library(dplyr)
count_tables <- list()
for (i in var){
temp <- mtcars %>%
group_by(carb) %>%
count(!!sym(i))
count_tables[[i]] <- temp
}
count_tables
# $vs
# # A tibble: 8 x 3
# # Groups: carb [6]
# carb vs n
# <dbl> <dbl> <int>
# 1 1 1 7
# 2 2 0 5
# 3 2 1 5
# 4 3 0 3
# 5 4 0 8
# 6 4 1 2
# 7 6 0 1
# 8 8 0 1
#
# $am
# # A tibble: 9 x 3
# # Groups: carb [6]
# carb am n
# <dbl> <dbl> <int>
# 1 1 0 3
# 2 1 1 4
# 3 2 0 6
# 4 2 1 4
# 5 3 0 3
# 6 4 0 7
# 7 4 1 3
# 8 6 1 1
# 9 8 1 1
#
# $gear
# # A tibble: 11 x 3
# # Groups: carb [6]
# carb gear n
# <dbl> <dbl> <int>
# 1 1 3 3
# 2 1 4 4
# 3 2 3 4
# 4 2 4 4
# 5 2 5 2
# 6 3 3 3
# 7 4 3 5
# 8 4 4 4
# 9 4 5 1
# 10 6 5 1
# 11 8 5 1
It is also common to use lapply to loop through a vector or a list to apply a function and return the objects as a list. The following generates the same output as the for-loop.
count_tables <- lapply(var, function(x) {
mtcars %>%
group_by(carb) %>%
count(!!sym(x))
})
names(count_tables) <- var
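As a further variation, string column names can also be selected with across(all_of()) instead of !!sym(). This is only a sketch, assuming dplyr 1.0.0 or later; the helper name count_by_carb is made up for illustration and is not part of the answer above. Unlike the loop, it returns ungrouped tibbles because of .groups = "drop".

```r
library(dplyr)

var <- c("vs", "am", "gear")

# Hypothetical helper: counts the rows of mtcars for each combination of
# carb and the column whose name is given by the string `x`.
count_by_carb <- function(x) {
  mtcars %>%
    group_by(across(all_of(c("carb", x)))) %>%
    summarise(n = n(), .groups = "drop")
}

# lapply() keeps the names of its input, so naming `var` with itself
# yields a list named after the variables.
count_tables <- lapply(setNames(var, var), count_by_carb)
count_tables$vs
```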

To pass the variable programmatically as a string, you can use the versions of those functions with an underscore at the end, such as count_, group_by_, etc. (note that these underscore verbs have been deprecated since dplyr 0.7 in favour of tidy evaluation).
In this case it would be:
for (i in var){
mtcars %>%
group_by(carb) %>%
count_(i) %>%
print()
}
You specifically asked for a for loop, but for your consideration, here is an lapply alternative, which makes it easier to store the different results in one place for later access:
lapply(var, FUN = function(i) mtcars %>% group_by(carb) %>% count_(i))

Related

Is there a base version of tidyr::expand?

Is there an easy way or a built-in based function that is equivalent to tidyr::expand?
To elaborate on the comment made by @onyambu, you could do
mtcars |> with(expand.grid(cyl=unique(cyl), am=unique(am)))
# cyl am
# 1 6 1
# 2 4 1
# 3 8 1
# 4 6 0
# 5 4 0
# 6 8 0
whereas tidyr returns this:
library(magrittr)
mtcars %>% tidyr::expand(cyl, am)
# # A tibble: 6 × 2
# cyl am
# <dbl> <dbl>
# 1 4 0
# 2 4 1
# 3 6 0
# 4 6 1
# 5 8 0
# 6 8 1
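The two outputs above differ only in row order. If the sorted order of tidyr::expand matters, a base R sketch along the following lines should reproduce it; the object name grid is just illustrative.

```r
# expand.grid() varies its first argument fastest, so the fastest-varying
# column (am) goes first and the column order is restored afterwards.
grid <- expand.grid(
  am  = sort(unique(mtcars$am)),
  cyl = sort(unique(mtcars$cyl))
)[, c("cyl", "am")]
grid
```

The result is a plain data.frame rather than a tibble.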

Group By counts of zero are missing using dplyr [duplicate]

Sometimes it is desirable to have a complete dataframe with observations for all combinations of grouping factors, even when these are absent in the original data (i.e. by filling these gaps with NA data).
Consider the following example with mtcars:
mtcars %>% group_by(cyl, gear) %>% dplyr::summarise(N = n())
# A tibble: 8 x 3
# Groups: cyl [3]
cyl gear N
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
When grouping by cyl and gear, no row is returned for the combination cyl=8 and gear=4. Is it possible to obtain this summary table in a straightforward, preferably tidyverse-based, way that includes a row with NA for combinations of factors that are absent from the data? E.g. the desired output would be:
# A tibble: 9 x 3
# Groups: cyl [3]
cyl gear N
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 4 NA
9 8 5 2
We can use complete after removing the group attributes with ungroup
library(dplyr)
library(tidyr)
mtcars %>%
group_by(cyl, gear) %>%
dplyr::summarise(N = n()) %>%
ungroup %>%
complete(cyl, gear)
# A tibble: 9 x 3
# cyl gear N
# <dbl> <dbl> <int>
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 4 NA
#9 8 5 2
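If a count of 0 is preferred over NA for the missing combination, complete() also takes a fill argument. A minimal sketch, assuming dplyr 1.0.0 or later for the .groups argument; the object name counts is illustrative:

```r
library(dplyr)
library(tidyr)

counts <- mtcars %>%
  group_by(cyl, gear) %>%
  # .groups = "drop" makes the separate ungroup() step unnecessary
  summarise(N = n(), .groups = "drop") %>%
  # 0L (rather than 0) keeps N an integer column
  complete(cyl, gear, fill = list(N = 0L))
counts
```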
Or another option is to create a combination dataset with unique elements of the columns and then do a left_join (not as straightforward as the previous one)
crossing(cyl = unique(mtcars$cyl), gear = unique(mtcars$gear)) %>%
left_join(mtcars %>%
group_by(cyl, gear) %>%
dplyr::summarise(N = n()))
If you convert the groups to factor and use count (alternative for group_by with summarise n()) with .drop = FALSE it will complete missing observations.
library(dplyr)
mtcars %>% mutate_at(vars(cyl, gear), factor) %>% count(cyl, gear, .drop = FALSE)
# cyl gear n
# <fct> <fct> <int>
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 4 0
#9 8 5 2
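For comparison, base R's table() also keeps empty combinations, so a similar result is possible without any packages. This is a sketch rather than a drop-in replacement: cyl and gear come back as factors, and the name freq is illustrative.

```r
# table() cross-tabulates every cyl/gear pair, including empty cells;
# as.data.frame() then reshapes the table into long format.
freq <- as.data.frame(
  table(cyl = mtcars$cyl, gear = mtcars$gear),
  responseName = "N"
)
freq
```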

How to get specific values out of a list of values passed to one argument of a UDF with tidyeval

I used tidyeval to write a short function which takes grouping variables as an input, groups the mtcars dataset and counts the number of occurrences per group:
test_function <- function(grps){
mtcars %>%
group_by(across({{grps}})) %>%
summarise(Count = n())
}
test_function(grps = c(cyl, gear))
---
cyl gear Count
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
Now imagine that for this example I want a subtotal row for each cyl group, i.e. how many cars have 4 (6, 8) cylinders? This is what the result should look like:
test_function(grps = c(cyl, gear), subtotalrows = TRUE) ### example function execution
---
cyl gear Count
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 4 total 11
5 6 3 2
6 6 4 4
7 6 5 1
8 6 total 7
9 8 3 12
10 8 5 2
11 8 total 14
In this case the subtotal rows I am looking for can simply be produced with the same function but with one less grouping variable:
test_function(grps = cyl)
---
cyl Count
<dbl> <int>
1 4 11
2 6 7
3 8 14
But since I don't want to call the function from within itself (I am not even sure whether this is possible in R), I would like to go for a different approach. As far as I know, the best (and only) way to create subtotal rows so far is to calculate them independently and then bind them row-wise to the grouped table (e.g. with rbind or bind_rows). In my case that means taking only the first grouping variable, creating the subtotal rows and later binding them to the table. But this is where I have problems with the tidyeval syntax. Here, in pseudocode, is what I would like to do in the function:
test_function <- function(grps, subtotalrows = TRUE){
grouped_result <- mtcars %>%
group_by(across({{grps}})) %>%
summarise(Count = n())
if(subtotalrows == FALSE){
return(grouped_result)
} else {
#pseudocode
group_for_subcalculation <- grps[[1]] #I want the first element of the grps argument
subtotal_result <- mtcars %>%
group_by(across({{group_for_subcalculation}})) %>%
summarise(Count = n()) %>%
mutate(grps[[2]] := "total") %>%
arrange(grps[[1]], grps[[2]], Count)
return(rbind(grouped_result, subtotal_result))
}
}
So, two questions: First, how can I extract the first column name passed via grps and work with it in the subsequent code? Second, this pseudocode example is specific to two columns passed via grps. Imagine I want to pass three or more. How would you do that (with loops)?
Try this function -
library(dplyr)
test_function <- function(grps, subtotalrows = TRUE){
grouped_data <- mtcars %>% group_by(across({{grps}}))
groups <- group_vars(grouped_data)
col_to_change <- groups[length(groups)] #Last value in grps
grouped_result <- grouped_data %>% summarise(Count = n())
if(!subtotalrows) return(grouped_result)
else {
result <- grouped_result %>%
summarise(Count = sum(Count),
!!col_to_change := 'Total') %>%
bind_rows(grouped_result %>%
mutate(!!col_to_change := as.character(.data[[col_to_change]]))) %>%
select(all_of(groups), Count) %>%
arrange(across(all_of(groups)))
}
return(result)
}
Test the function -
test_function(grps = c(cyl, gear))
# A tibble: 11 x 3
# cyl gear Count
# <dbl> <chr> <int>
# 1 4 3 1
# 2 4 4 8
# 3 4 5 2
# 4 4 Total 11
# 5 6 3 2
# 6 6 4 4
# 7 6 5 1
# 8 6 Total 7
# 9 8 3 12
#10 8 5 2
#11 8 Total 14
test_function(grps = c(cyl, gear), FALSE)
# cyl gear Count
# <dbl> <dbl> <int>
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
For 3 variables -
test_function(grps = c(cyl, gear, carb))
# cyl gear carb Count
# <dbl> <dbl> <chr> <int>
# 1 4 3 1 1
# 2 4 3 Total 1
# 3 4 4 1 4
# 4 4 4 2 4
# 5 4 4 Total 8
# 6 4 5 2 2
# 7 4 5 Total 2
# 8 6 3 1 2
# 9 6 3 Total 2
#10 6 4 4 4
#11 6 4 Total 4
#12 6 5 6 1
#13 6 5 Total 1
#14 8 3 2 4
#15 8 3 3 3
#16 8 3 4 5
#17 8 3 Total 12
#18 8 5 4 1
#19 8 5 8 1
#20 8 5 Total 2
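The function above adds subtotals for the last grouping column only. If subtotals are wanted at every grouping level, one possible sketch is to loop over the leading subsets of the grouping columns. It assumes the columns are passed as a character vector instead of bare names, and the names subtotal_all_levels and count_by are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: `groups` is a character vector of column names.
subtotal_all_levels <- function(data, groups) {
  # Grouping columns become character so "Total" can sit next to the values
  data <- data %>% mutate(across(all_of(groups), as.character))

  count_by <- function(cols) {
    data %>%
      group_by(across(all_of(cols))) %>%
      summarise(Count = n(), .groups = "drop")
  }

  detail <- count_by(groups)

  # One subtotal table per leading subset of the grouping columns;
  # the next column is labelled "Total", deeper columns stay NA.
  subtotals <- lapply(seq_len(length(groups) - 1), function(k) {
    next_col <- groups[k + 1]
    count_by(groups[seq_len(k)]) %>%
      mutate(!!next_col := "Total")
  })

  bind_rows(detail, subtotals) %>%
    arrange(across(all_of(groups)))
}

subtotal_all_levels(mtcars, c("cyl", "gear", "carb"))
```

Because the grouping columns are sorted as text, values of 10 or more would sort before single digits; that does not occur in mtcars but may matter for other data.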

Adding an incremental count of sub-groups using dplyr

If I have a grouping:
mtcars %>% group_by(cyl,carb)
How can I add a column that counts the number of unique group combinations; so carb groups within cyl groups? This would be something like:
cyl carb combination
6 2 1
6 4 2
6 6 3
4 2 1
4 4 2
4 6 3
Maybe there's a better way to avoid the n column, but below should be a good start:
mtcars %>% count(cyl,carb) %>% group_by(cyl) %>% mutate(combination=1:n())
# A tibble: 9 x 4
# Groups: cyl [3]
cyl carb n combination
<dbl> <dbl> <int> <int>
1 4 1 5 1
2 4 2 6 2
3 6 1 2 1
4 6 4 4 2
5 6 6 1 3
6 8 2 4 1
7 8 3 3 2
8 8 4 6 3
9 8 8 1 4
There are many ways to do this; this is how I did it:
library(dplyr)
mtcars %>% group_by(cyl,carb) %>% summarize("count" = length(carb))
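The helper n column can be avoided altogether. The sketch below shows two possible variants, one with a single row per combination and one that keeps every original row; the object names combos and numbered are illustrative.

```r
library(dplyr)

# One row per (cyl, carb) combination, numbered within each cyl
combos <- mtcars %>%
  distinct(cyl, carb) %>%
  arrange(cyl, carb) %>%
  group_by(cyl) %>%
  mutate(combination = row_number()) %>%
  ungroup()

# The same numbering attached to every original row:
# dense_rank() gives equal carb values within a cyl the same index
numbered <- mtcars %>%
  group_by(cyl) %>%
  mutate(combination = dense_rank(carb)) %>%
  ungroup()
```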

Programmatically rename data frame columns using lookup data frame

What is the best way to batch rename columns using a lookup data frame?
Can I do it as part of a pipe?
library(tidyverse)
df <- tibble(
a = seq(1, 10)
, b = seq(10, 1)
, c = rep(1, 10)
)
df_lookup <- tibble(
old_name = c("b", "c", "a")
, new_name = c("y", "z", "x")
)
I know how to do it manually
df %>%
rename(x = a
, y = b
, z = c)
I am seeking a solution in tidyverse / dplyr packages.
Use rlang: first build a named list of symbols with syms, then splice it into rename with the !!! operator (formerly UQS):
library(rlang); library(dplyr)
df %>% rename(!!!syms(with(df_lookup, setNames(old_name, new_name))))
# A tibble: 10 x 3
# x y z
# <int> <int> <dbl>
# 1 1 10 1
# 2 2 9 1
# 3 3 8 1
# 4 4 7 1
# 5 5 6 1
# 6 6 5 1
# 7 7 4 1
# 8 8 3 1
# 9 9 2 1
#10 10 1 1
You could write your own helper to make it easier
rename_to <- function(data, old, new) {
data %>% rename_at(old, function(x) new[old==x])
}
df %>% rename_to(df_lookup$old_name, df_lookup$new_name)
In base-R:
names(df)[match(df_lookup$old_name,names(df))] <- df_lookup$new_name
# # A tibble: 10 x 3
# x y z
# <int> <int> <dbl>
# 1 1 10 1
# 2 2 9 1
# 3 3 8 1
# 4 4 7 1
# 5 5 6 1
# 6 6 5 1
# 7 7 4 1
# 8 8 3 1
# 9 9 2 1
# 10 10 1 1
Using data.table:
library(data.table)
setnames(setDT(df), old = df_lookup$old_name, new = df_lookup$new_name)
# x y z
# 1: 1 10 1
# 2: 2 9 1
# 3: 3 8 1
# 4: 4 7 1
# 5: 5 6 1
# 6: 6 5 1
# 7: 7 4 1
# 8: 8 3 1
# 9: 9 2 1
# 10: 10 1 1
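With more recent dplyr versions (assumed here: 1.0.0 or later), rename() should also accept a named character vector wrapped in all_of(), which works inside a pipe without rlang. A minimal sketch using the data from the question; the name lookup is illustrative:

```r
library(dplyr)

df <- tibble(a = 1:10, b = 10:1, c = rep(1, 10))
df_lookup <- tibble(
  old_name = c("b", "c", "a"),
  new_name = c("y", "z", "x")
)

# Named vector of the form c(new_name = "old_name")
lookup <- setNames(df_lookup$old_name, df_lookup$new_name)

# Columns keep their positions; only the names change
renamed <- df %>% rename(all_of(lookup))
names(renamed)
```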
