dplyr summarise and group_by for unique values

dplyr summarise and group_by for unique values - r

Here's a representative example:
DF <- as.data.frame(matrix(data = 0, nrow = 9, ncol = 3))
colnames(DF) <- c("code", "actual", "expected")
DF$code <- letters[rep(1:3, each = 3)]
DF$actual <- runif(9, 3,5)
DF$expected <- rep(1:3, each = 3)
The following crashes:
DF %>%
group_by(code) %>%
summarise(Exp = expected)
Error: expecting a single value
However, the following works:
DF %>%
group_by(code) %>%
summarise(Exp = unique(expected))
However, the unique value by code is just one value. Why doesn't returnign the value work? Why do I need to wrap it up in a "unique"?
Thanks!

This is a common mistake. One way to debug it is to use paste() in the summarise call.
> DF %>%
group_by(code) %>%
summarise(Exp=paste(expected, collapse='-'))
Source: local data frame [3 x 2]
code Exp
(chr) (chr)
1 a 1-1-1
2 b 2-2-2
3 c 3-3-3
Do you see what is going on? You are trying to assign multiple values to a single group.
One solution is to use unique as you describe. In alternative, if you know that all the rows with the same code have always the same expected value, you can group_by directly:
> DF%>% group_by(code, expected) %>% summarise()
Source: local data frame [3 x 2]
Groups: code [?]
code expected
(chr) (int)
1 a 1
2 b 2
3 c 3
If the dataframe is big, group_by will be much faster than the solution based on unique()

Related

Loop through specific columns of dataframe keeping some columns as fixed

I have a large dataset with the two first columns that serve as ID (one is an ID and the other one is a year variable). I would like to compute a count by group and to loop over each variable that is not an ID one. This code below shows what I want to achieve for one variable:
library(tidyverse)
df <- tibble(
ID1 = c(rep("a", 10), rep("b", 10)),
year = c(2001:2020),
var1 = rnorm(20),
var2 = rnorm(20))
df %>%
select(ID1, year, var1) %>%
filter(if_any(starts_with("var"), ~!is.na(.))) %>%
group_by(year) %>%
count() %>%
print(n = Inf)
I cannot use a loop that starts with for(i in names(df)) since I want to keep the variables "ID1" and "year". How can I run this piece of code for all the columns that start with "var"? I tried using quosures but it did not work as I receive the error select() doesn't handle lists. I also tried to work with select(starts_with("var") but with no success.
Many thanks!

Another possible solution:
library(tidyverse)
df %>%
group_by(ID1) %>%
summarise(across(starts_with("var"), ~ length(na.omit(.x))))
#> # A tibble: 2 × 3
#> ID1 var1 var2
#> <chr> <int> <int>
#> 1 a 10 10
#> 2 b 10 10

for(i in names(df)[grepl('var',names(df))])

issues calculating rowwise maximum

suppose I have a tibble dat below, what I would like to do is to calculate maximum of (x 2, x 3) and then minus x 1, where x can be either a or b. In my real data I have more than 3 columns, so something like 2:n (e.g., 2:3) would be great. tried many things, seems not working as I wanted them to, still struggling with the string vs column name thing..
dat <- tibble(`a 1` = c(0, 0, 0), `a 2` = 1:3, `a 3` = 3:1,
`b 1` = rep(1, 3), `b 2` = 4:6, `b 3` = 6:4)
foo <- function(x = 'a')
{
???
}
end result:
if x == `a`
c(3, 2, 3)
if x == `b`
c(5, 4, 5)

Solution 1
This solution uses only base R. The idea is to define a function (max_minus_first) to calculate the answer. The max_minus_first function has two arguments. The first argument, dat, is a data frame for analysis with the same format as the OP provided. group is the name of the group for analysis. The end product is a vector with the answer.
max_minus_first <- function(dat, group){
# Get all column names with starting string "group"
col_names <- colnames(dat)
dat2 <- dat[, col_names[grepl(paste0("^", group), col_names)]]
# Get the maximum values from all columns except the first column
max_value <- apply(dat2[, -1], 1, max, na.rm = TRUE)
# Calculate max_value minus the values from the first column
final_value <- max_value - unlist(dat2[, 1], use.names = FALSE)
return(final_value)
}
max_minus_first(dat, "a")
# [1] 3 2 3
max_minus_first(dat, "b")
# [1] 5 4 5
Solution 2
A solution using the tidyverse. The end product (dat2) is a tibble with the output from each group (a, b, ...)
library(tidyverse)
dat2 <- dat %>%
rowid_to_column() %>%
gather(Column, Value, -rowid, -ends_with(" 1")) %>%
separate(Column, into = c("Group", "Column_Number")) %>%
gather(Column_1, Value_1, ends_with(" 1")) %>%
separate(Column_1, into = c("Group_1", "Column_Number_1")) %>%
filter(Group == Group_1) %>%
group_by(rowid, Group, Value_1) %>%
summarise(Value = max(Value, na.rm = TRUE)) %>%
mutate(Final = Value - Value_1) %>%
ungroup() %>%
select(-starts_with("Value")) %>%
spread(Group, Final)
dat2
# # A tibble: 3 x 3
# rowid a b
# * <int> <dbl> <dbl>
# 1 1 3 5
# 2 2 2 4
# 3 3 3 5
Explanation
rowid_to_column() is from the tibble package, a way to create a new column based on row ID.
gather is from the tidyr package to convert the data frame from the wide format to long format. I used gather twice because the first column of each group is different than other columns in the same group. ends_with(" 1") is a select helper function from the dplyr, which select the column with a name ending in " 1". Notice that the space in " 1" is important because "1" may select other columns like a 11 if such columns exist.
separate is from the tidyr package to separate a column into two columns. I used it to separate the Group name and column numbers in each Group.
filter(Group == Group_1) is to filter rows with Group == Group_1.
group_by(rowid, Group, Value_1) and then summarise(Value = max(Value, na.rm = TRUE)) make sure the maximum from each Group is calculated.
mutate(Final = Value - Value_1) is to calculate the difference between maximum from each Group and the value from the first column. The results are stored in the Final column.
select(-starts_with("Value")) removes any columns with a name beginning with "Value".
spread from the tidyr package converts the data frame from long format to wide format.
Solution 3
Another tidyverse solution, which similar to Solution 2. It uses do to conduct operation to each Group hence making the code more concise.
dat2 <- dat %>%
rowid_to_column() %>%
gather(Column, Value, -rowid) %>%
separate(Column, into = c("Group", "Column_Number")) %>%
group_by(rowid, Group) %>%
do(data_frame(Max = max(.$Value[.$Column_Number != 1]),
First = .$Value[.$Column_Number == 1])) %>%
mutate(Final = Max - First) %>%
select(-Max, -First) %>%
spread(Group, Final) %>%
ungroup()
dat2
# # A tibble: 3 x 3
# rowid a b
# * <int> <dbl> <dbl>
# 1 1 3 5
# 2 2 2 4
# 3 3 3 5

stringr str_locate_all not returning the proper index in a dplyr string

I'm trying to use str_locate_all to find the index of the third occurrence of '/' in a dplyr chain but it's not returning the correct index.
ga.categoryViews.2016 <- ga.data %>%
mutate(province = str_sub(pagePath,2,3),
index = str_locate_all(pagePath, '/')[[1]][,"start"][3],
category = str_sub(pagePath,
str_locate_all(pagePath, '/')[[1]][,"start"][3] + 1,
ifelse(str_detect(pagePath,'\\?'), str_locate(pagePath, '\\?') - 1, str_length(pagePath))
)
)
an example of what it's returning is
The first column is pagePath, the fourth is the index
It seems to be always returning an index of 12.
Any help is appreciated.
Thanks,

You need to use rowwise(), i.e.
library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(new = str_locate_all(v1, '/')[[1]][,2][3])
Source: local data frame [2 x 2]
Groups: <by row>
# A tibble: 2 x 2
# v1 new
# <chr> <int>
#1 /on/srgsfsfs-gfdgdg/dfgsdfg-df 20
#2 /on/sgsddg-dfgsd/dfg-dg 17
DATA
x <- c('/on/srgsfsfs-gfdgdg/dfgsdfg-df', '/on/sgsddg-dfgsd/dfg-dg')
df <- data.frame(v1 = x, stringsAsFactors = F)
df
# v1
#1 /on/srgsfsfs-gfdgdg/dfgsdfg-df
#2 /on/sgsddg-dfgsd/dfg-dg

Reference Column Names in R Functions [duplicate]

I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns.
# make data with weird column names that can't be hard coded
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr - raises error
data %.%
group_by(columns) %.%
summarise(Value = mean(value))
#> Error in eval(expr, envir, enclos) : index out of bounds
What am I missing to translate the plyr example into a dplyr-esque syntax?
Edit 2017: Dplyr has been updated, so a simpler solution is available. See the currently selected answer.

Just so as to write the code in full, here's an update on Hadley's answer with the new syntax:
library(dplyr)
df <- data.frame(
asihckhdoydk = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkgh = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# Columns you want to group by
grp_cols <- names(df)[-3]
# Convert character vector to list of symbols
dots <- lapply(grp_cols, as.symbol)
# Perform frequency counts
df %>%
group_by_(.dots=dots) %>%
summarise(n = n())
output:
Source: local data frame [9 x 3]
Groups: asihckhdoydk
asihckhdoydk a30mvxigxkgh n
1 A A 10
2 A B 10
3 A C 13
4 B A 14
5 B B 10
6 B C 12
7 C A 9
8 C B 12
9 C C 10

Since this question was posted, dplyr added scoped versions of group_by (documentation here). This lets you use the same functions you would use with select, like so:
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
library(dplyr)
df1 <- data %>%
group_by_at(vars(one_of(columns))) %>%
summarize(Value = mean(value))
#compare plyr for reference
df2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(df1 == df2, useNA = 'ifany')
## TRUE
## 27
The output from your example question is as expected (see comparison to plyr above and output below):
# A tibble: 9 x 3
# Groups: asihckhdoydkhxiydfgfTgdsx [?]
asihckhdoydkhxiydfgfTgdsx a30mvxigxkghc5cdsvxvyv0ja Value
<fctr> <fctr> <dbl>
1 A A 0.04095002
2 A B 0.24943935
3 A C -0.25783892
4 B A 0.15161805
5 B B 0.27189974
6 B C 0.20858897
7 C A 0.19502221
8 C B 0.56837548
9 C C -0.22682998
Note that since dplyr::summarize only strips off one layer of grouping at a time, you've still got some grouping going on in the resultant tibble (which can sometime catch people by suprise later down the line). If you want to be absolutely safe from unexpected grouping behavior, you can always add %>% ungroup to your pipeline after you summarize.

The support for this in dplyr is currently pretty weak, eventually I think the syntax will be something like:
df %.% group_by(.groups = c("asdfgfTgdsx", "asdfk30v0ja"))
But that probably won't be there for a while (because I need to think through all the consequences).
In the meantime, you can use regroup(), which takes a list of symbols:
library(dplyr)
df <- data.frame(
asihckhdoydk = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkgh = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
df %.%
regroup(list(quote(asihckhdoydk), quote(a30mvxigxkgh))) %.%
summarise(n = n())
If you have have a character vector of column names, you can convert them to the right structure with lapply() and as.symbol():
vars <- setdiff(names(df), "value")
vars2 <- lapply(vars, as.symbol)
df %.% regroup(vars2) %.% summarise(n = n())

String specification of columns in dplyr are now supported through variants of the dplyr functions with names finishing in an underscore. For example, corresponding to the group_by function there is a group_by_ function that may take string arguments. This vignette describes the syntax of these functions in detail.
The following snippet cleanly solves the problem that #sharoz originally posed (note the need to write out the .dots argument):
# Given data and columns from the OP
data %>%
group_by_(.dots = columns) %>%
summarise(Value = mean(value))
(Note that dplyr now uses the %>% operator, and %.% is deprecated).

Update with across() from dplyr 1.0.0
All the answers above are still working, and the solutions with the .dots argument are intruiging.
BUT if you look for a solution that is easier to remember, the new across() comes in handy. It was published 2020-04-03 by Hadley Wickham and can be used in mutate() and summarise() and replace the scoped variants like _at or _all. Above all, it replaces very elegantly the cumbersome non-standard evaluation (NSE) with quoting/unquoting such as !!! rlang::syms().
So the solution with across looks very readable:
data %>%
group_by(across(all_of(columns))) %>%
summarize(Value = mean(value))

Until dplyr has full support for string arguments, perhaps this gist is useful:
https://gist.github.com/skranz/9681509
It contains bunch of wrapper functions like s_group_by, s_mutate, s_filter, etc that use string arguments. You can mix them with the normal dplyr functions. For example
cols = c("cyl","gear")
mtcars %.%
s_group_by(cols) %.%
s_summarise("avdisp=mean(disp), max(disp)") %.%
arrange(avdisp)

It works if you pass it the objects (well, you aren't, but...) rather than as a character vector:
df %.%
group_by(asdfgfTgdsx, asdfk30v0ja) %.%
summarise(Value = mean(value))
> df %.%
+ group_by(asdfgfTgdsx, asdfk30v0ja) %.%
+ summarise(Value = mean(value))
Source: local data frame [9 x 3]
Groups: asdfgfTgdsx
asdfgfTgdsx asdfk30v0ja Value
1 A C 0.046538002
2 C B -0.286359899
3 B A -0.305159419
4 C A -0.004741504
5 B B 0.520126476
6 C C 0.086805492
7 B C -0.052613078
8 A A 0.368410146
9 A B 0.088462212
where df was your data.
?group_by says:
...: variables to group by. All tbls accept variable names, some
will also accept functons of variables. Duplicated groups
will be silently dropped.
which I interpret to mean not the character versions of the names, but how you would refer to them in foo$bar; bar is not quoted here. Or how you'd refer to variables in a formula: foo ~ bar.
#Arun also mentions that you can do:
df %.%
group_by("asdfgfTgdsx", "asdfk30v0ja") %.%
summarise(Value = mean(value))
But you can't pass in something that unevaluated is not a name of a variable in the data object.
I presume this is due to the internal methods Hadley is using to look up the things you pass in via the ... argument.

data = data.frame(
my.a = sample(LETTERS[1:3], 100, replace=TRUE),
my.b = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
group_by(data,newcol=paste(my.a,my.b,sep="_")) %>% summarise(Value=mean(value))

One (tiny) case that is missing from the answers here, that I wanted to make explicit, is when the variables to group by are generated dynamically midstream in a pipeline:
library(wakefield)
df_foo = r_series(rnorm, 10, 1000)
df_foo %>%
# 1. create quantized versions of base variables
mutate_each(
funs(Quantized = . > 0)
) %>%
# 2. group_by the indicator variables
group_by_(
.dots = grep("Quantized", names(.), value = TRUE)
) %>%
# 3. summarize the base variables
summarize_each(
funs(sum(., na.rm = TRUE)), contains("X_")
)
This basically shows how to use grep in conjunction with group_by_(.dots = ...) to achieve this.

General example on using the .dots argument as character vector input to the dplyr::group_by function :
iris %>%
group_by(.dots ="Species") %>%
summarise(meanpetallength = mean(Petal.Length))
Or without a hard coded name for the grouping variable (as asked by the OP):
iris %>%
group_by(.dots = names(iris)[5]) %>%
summarise_at("Petal.Length", mean)
With the example of the OP:
data %>%
group_by(.dots =names(data)[-3]) %>%
summarise_at("value", mean)
See also the dplyr vignette on programming which explains pronouns, quasiquotation, quosures, and tidyeval.

Group by multiple columns in dplyr, using string vector input

I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns.
# make data with weird column names that can't be hard coded
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr - raises error
data %.%
group_by(columns) %.%
summarise(Value = mean(value))
#> Error in eval(expr, envir, enclos) : index out of bounds
What am I missing to translate the plyr example into a dplyr-esque syntax?
Edit 2017: Dplyr has been updated, so a simpler solution is available. See the currently selected answer.

Just so as to write the code in full, here's an update on Hadley's answer with the new syntax:
library(dplyr)
df <- data.frame(
asihckhdoydk = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkgh = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# Columns you want to group by
grp_cols <- names(df)[-3]
# Convert character vector to list of symbols
dots <- lapply(grp_cols, as.symbol)
# Perform frequency counts
df %>%
group_by_(.dots=dots) %>%
summarise(n = n())
output:
Source: local data frame [9 x 3]
Groups: asihckhdoydk
asihckhdoydk a30mvxigxkgh n
1 A A 10
2 A B 10
3 A C 13
4 B A 14
5 B B 10
6 B C 12
7 C A 9
8 C B 12
9 C C 10

Since this question was posted, dplyr added scoped versions of group_by (documentation here). This lets you use the same functions you would use with select, like so:
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
library(dplyr)
df1 <- data %>%
group_by_at(vars(one_of(columns))) %>%
summarize(Value = mean(value))
#compare plyr for reference
df2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(df1 == df2, useNA = 'ifany')
## TRUE
## 27
The output from your example question is as expected (see comparison to plyr above and output below):
# A tibble: 9 x 3
# Groups: asihckhdoydkhxiydfgfTgdsx [?]
asihckhdoydkhxiydfgfTgdsx a30mvxigxkghc5cdsvxvyv0ja Value
<fctr> <fctr> <dbl>
1 A A 0.04095002
2 A B 0.24943935
3 A C -0.25783892
4 B A 0.15161805
5 B B 0.27189974
6 B C 0.20858897
7 C A 0.19502221
8 C B 0.56837548
9 C C -0.22682998
Note that since dplyr::summarize only strips off one layer of grouping at a time, you've still got some grouping going on in the resultant tibble (which can sometime catch people by suprise later down the line). If you want to be absolutely safe from unexpected grouping behavior, you can always add %>% ungroup to your pipeline after you summarize.

The support for this in dplyr is currently pretty weak, eventually I think the syntax will be something like:
df %.% group_by(.groups = c("asdfgfTgdsx", "asdfk30v0ja"))
But that probably won't be there for a while (because I need to think through all the consequences).
In the meantime, you can use regroup(), which takes a list of symbols:
library(dplyr)
df <- data.frame(
asihckhdoydk = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkgh = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
df %.%
regroup(list(quote(asihckhdoydk), quote(a30mvxigxkgh))) %.%
summarise(n = n())
If you have have a character vector of column names, you can convert them to the right structure with lapply() and as.symbol():
vars <- setdiff(names(df), "value")
vars2 <- lapply(vars, as.symbol)
df %.% regroup(vars2) %.% summarise(n = n())

String specification of columns in dplyr are now supported through variants of the dplyr functions with names finishing in an underscore. For example, corresponding to the group_by function there is a group_by_ function that may take string arguments. This vignette describes the syntax of these functions in detail.
The following snippet cleanly solves the problem that #sharoz originally posed (note the need to write out the .dots argument):
# Given data and columns from the OP
data %>%
group_by_(.dots = columns) %>%
summarise(Value = mean(value))
(Note that dplyr now uses the %>% operator, and %.% is deprecated).

Update with across() from dplyr 1.0.0
All the answers above are still working, and the solutions with the .dots argument are intruiging.
BUT if you look for a solution that is easier to remember, the new across() comes in handy. It was published 2020-04-03 by Hadley Wickham and can be used in mutate() and summarise() and replace the scoped variants like _at or _all. Above all, it replaces very elegantly the cumbersome non-standard evaluation (NSE) with quoting/unquoting such as !!! rlang::syms().
So the solution with across looks very readable:
data %>%
group_by(across(all_of(columns))) %>%
summarize(Value = mean(value))

Until dplyr has full support for string arguments, perhaps this gist is useful:
https://gist.github.com/skranz/9681509
It contains bunch of wrapper functions like s_group_by, s_mutate, s_filter, etc that use string arguments. You can mix them with the normal dplyr functions. For example
cols = c("cyl","gear")
mtcars %.%
s_group_by(cols) %.%
s_summarise("avdisp=mean(disp), max(disp)") %.%
arrange(avdisp)

It works if you pass it the objects (well, you aren't, but...) rather than as a character vector:
df %.%
group_by(asdfgfTgdsx, asdfk30v0ja) %.%
summarise(Value = mean(value))
> df %.%
+ group_by(asdfgfTgdsx, asdfk30v0ja) %.%
+ summarise(Value = mean(value))
Source: local data frame [9 x 3]
Groups: asdfgfTgdsx
asdfgfTgdsx asdfk30v0ja Value
1 A C 0.046538002
2 C B -0.286359899
3 B A -0.305159419
4 C A -0.004741504
5 B B 0.520126476
6 C C 0.086805492
7 B C -0.052613078
8 A A 0.368410146
9 A B 0.088462212
where df was your data.
?group_by says:
...: variables to group by. All tbls accept variable names, some
will also accept functons of variables. Duplicated groups
will be silently dropped.
which I interpret to mean not the character versions of the names, but how you would refer to them in foo$bar; bar is not quoted here. Or how you'd refer to variables in a formula: foo ~ bar.
#Arun also mentions that you can do:
df %.%
group_by("asdfgfTgdsx", "asdfk30v0ja") %.%
summarise(Value = mean(value))
But you can't pass in something that unevaluated is not a name of a variable in the data object.
I presume this is due to the internal methods Hadley is using to look up the things you pass in via the ... argument.

data = data.frame(
my.a = sample(LETTERS[1:3], 100, replace=TRUE),
my.b = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
group_by(data,newcol=paste(my.a,my.b,sep="_")) %>% summarise(Value=mean(value))

One (tiny) case that is missing from the answers here, that I wanted to make explicit, is when the variables to group by are generated dynamically midstream in a pipeline:
library(wakefield)
df_foo = r_series(rnorm, 10, 1000)
df_foo %>%
# 1. create quantized versions of base variables
mutate_each(
funs(Quantized = . > 0)
) %>%
# 2. group_by the indicator variables
group_by_(
.dots = grep("Quantized", names(.), value = TRUE)
) %>%
# 3. summarize the base variables
summarize_each(
funs(sum(., na.rm = TRUE)), contains("X_")
)
This basically shows how to use grep in conjunction with group_by_(.dots = ...) to achieve this.

General example on using the .dots argument as character vector input to the dplyr::group_by function :
iris %>%
group_by(.dots ="Species") %>%
summarise(meanpetallength = mean(Petal.Length))
Or without a hard coded name for the grouping variable (as asked by the OP):
iris %>%
group_by(.dots = names(iris)[5]) %>%
summarise_at("Petal.Length", mean)
With the example of the OP:
data %>%
group_by(.dots =names(data)[-3]) %>%
summarise_at("value", mean)
See also the dplyr vignette on programming which explains pronouns, quasiquotation, quosures, and tidyeval.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

dplyr summarise and group_by for unique values - r

Related

Loop through specific columns of dataframe keeping some columns as fixed

issues calculating rowwise maximum

stringr str_locate_all not returning the proper index in a dplyr string

Reference Column Names in R Functions [duplicate]

Group by multiple columns in dplyr, using string vector input

Categories

Resources