group_by by a vector of characters using tidy evaluation semantics - r

I used to do this with group_by_:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_(.dots = group_by) %>% summarise(gear = mean(gear))
But group_by_ is now deprecated, and I don't know how to do this with the tidy evaluation framework.

New answer
With dplyr 1.0, you can now use selection helpers like all_of() inside across():
df |>
group_by(
across(all_of(my_vars))
)
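As a minimal end-to-end sketch of the across(all_of()) approach, here it is applied to the mtcars data from the question. The name my_vars is only a placeholder for the character vector of grouping columns, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)

# Placeholder name: the grouping columns, given as strings
my_vars <- c("cyl", "vs")

result <- mtcars |>
  group_by(across(all_of(my_vars))) |>
  summarise(gear = mean(gear), .groups = "drop")

result
```

all_of() raises an error if any name in my_vars is missing from the data, which is usually what you want when the names come from user input.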
Old answer
Transform the character vector into a list of symbols and splice it in
df %>% group_by(!!!syms(group_by))
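For completeness, a runnable sketch of the splicing approach on mtcars. The vector is named group_cols here (an illustrative name) rather than group_by, purely to avoid confusion with the function of the same name.

```r
library(dplyr)
library(rlang)  # provides syms()

# Illustrative name for the character vector of grouping columns
group_cols <- c("cyl", "vs")

grouped <- mtcars %>%
  group_by(!!!syms(group_cols))   # syms() makes a list of symbols, !!! splices it in

by_syms <- grouped %>%
  summarise(gear = mean(gear), .groups = "drop")

group_vars(grouped)
```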

There is a group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars() you can use any of the dplyr ways of selecting variables:
mtcars %>%
group_by_at(vars(
one_of(group_by) # columns from the predefined set
,starts_with("a") # add columns whose names start with "a"
,-hp # but omit this one
,vs # always include this one
,contains("_gr_") # and columns whose names contain "_gr_"
)) %>%
summarise(gear = mean(gear))

Related

Changing factors order inside a function [duplicate]

I have been reading from this SO post on how to work with string references to variables in dplyr.
I would like to mutate an existing column based on string input:
var <- 'vs'
my_mtcars <- mtcars %>%
mutate(get(var) = factor(get(var)))
Error: unexpected '=' in:
"my_mtcars <- mtcars %>%
mutate(get(var) ="
Also tried:
my_mtcars <- mtcars %>%
mutate(!! rlang::sym(var) = factor(!! rlang::symget(var)))
This resulted in the exact same error message.
How can I do the following based on passing string 'vs' within var variable to mutate?
# works
my_mtcars <- mtcars %>%
mutate(vs = factor(vs))
This can be done with the := operator: unquote the string with !! on the left-hand side, and on the right-hand side convert it to a symbol with rlang::sym() and unquote that:
library(dplyr)
my_mtcars <- mtcars %>%
mutate(!! var := factor(!! rlang::sym(var)))
class(my_mtcars$vs)
#[1] "factor"
Or without thinking too much, use mutate_at, which can take strings in vars and apply the function of interest
my_mtcars2 <- mtcars %>%
mutate_at(vars(var), factor)
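With dplyr 1.0 or later, the same result can be sketched without mutate_at. The two variants below are illustrations, not part of the original answer: one uses across() with all_of(), the other uses the .data pronoun together with a glue-style name on the left of :=.

```r
library(dplyr)

var <- "vs"

# Variant 1: across() + all_of() selects the column named by the string
a <- mtcars %>%
  mutate(across(all_of(var), factor))

# Variant 2: "{var}" names the output column, .data[[var]] reads the input column
b <- mtcars %>%
  mutate("{var}" := factor(.data[[var]]))

class(a$vs)
class(b$vs)
```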

Replacing group_by_ with group_by when the argument is a string in dplyr

I have some code that specifies a grouping variable as a string.
group_var <- "cyl"
My current code for using this grouping variable in a dplyr pipeline is:
mtcars %>%
group_by_(group_var) %>%
summarize(mean_mpg = mean(mpg))
My best guess as to how to replace the deprecated group_by_ function with group_by is:
mtcars %>%
group_by(!!as.name(group_var)) %>%
summarize(mean_mpg = mean(mpg))
This works but is not explicitly mentioned in the programming with dplyr vignette.
Is using !!as.name() the preferred way to replace group_by_() with group_by()?
Is this within a function? If not, I think the !!as.name() part is unnecessary, and I would stick with the group_by_at(group_var) suggestion by @aosmith for simplicity's sake. Otherwise, I would set it up like so:
examplr <- function(data, group_var){
group_var <- as.name(group_var)
data %>%
group_by(!!group_var) %>%
summarize(mean_mpg = mean(mpg))
}
examplr(data = mtcars,
group_var = "cyl")
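An alternative sketch, assuming dplyr 1.0 or later, uses the .data pronoun to look the column up by its string name instead of converting the string to a symbol. The function name mean_mpg_by is hypothetical.

```r
library(dplyr)

# Hypothetical helper: group_var is a single column name given as a string
mean_mpg_by <- function(data, group_var) {
  data %>%
    group_by(.data[[group_var]]) %>%   # subset the data by name, no unquoting needed
    summarize(mean_mpg = mean(mpg))
}

res <- mean_mpg_by(mtcars, "cyl")
res
```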

dplyr::mutate multiple functions on one column

I am trying to figure out how to mutate a single column of data by several functions using dplyr. I can do every column:
library(dplyr)
iris %>%
group_by(Species) %>%
mutate_all(funs(min, max))
But I don't know how to select one column. I can imagine something like this though this obviously does not run:
iris %>%
group_by(Species) %>%
mutate(Sepal.Length, funs(min, max))
I can sort of accomplish this task using do() and a custom function like this:
summary_func = function(x){
tibble(max_out = max(x),
min_out = min(x)
)
}
iris %>%
group_by(Species) %>%
do(summary_func(.$Sepal.Length))
However, this doesn't really do what I want either, because it isn't adding to the existing tibble a la mutate.
Any ideas?
Use mutate_at
iris %>%
group_by(Species) %>%
mutate_at("Sepal.Length", funs(min, max))
It takes a character so watch the quotes
Use mutate
iris %>%
group_by(Species) %>%
mutate(min = min(Sepal.Length),
max = max(Sepal.Length))
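Since funs() has been deprecated in later dplyr releases, here is a sketch of the same idea with across() and a named list of functions, assuming dplyr 1.0 or later. The object name iris_stats is just for illustration.

```r
library(dplyr)

iris_stats <- iris %>%
  group_by(Species) %>%
  # The named list gives the suffixes; new columns follow the "{.col}_{.fn}" pattern
  mutate(across(Sepal.Length, list(min = min, max = max))) %>%
  ungroup()

names(iris_stats)
```

The original Sepal.Length column is kept, and the per-species minimum and maximum are added as Sepal.Length_min and Sepal.Length_max.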

Multiple manipulations to the same variable using dplyr

How can I make several sequential manipulations of the same variable using dplyr, but more elegantly than the code below?
Specifically, I would like to remove the multiple calls to car_names = without having to nest any of the functions.
mtcars2 <- mtcars %>% mutate(car_names = row.names(.)) %>%
mutate(car_names=stri_extract_first_words(car_names)) %>%
mutate(car_names=as.factor(car_names))
If you want to type less and not nest the functions, you can use the pipe inside the mutate call:
library(dplyr)
library(stringi)
# What you did
mtcars2 <- mtcars %>%
mutate(car_names = row.names(.)) %>%
mutate(car_names = stri_extract_first_words(car_names)) %>%
mutate(car_names = as.factor(car_names))
# Another way with less typing and no nesting
mtcars3 <- mtcars %>%
mutate(car_names = rownames(.) %>%
stri_extract_first_words(.) %>%
as.factor(.))
identical(mtcars2, mtcars3)
#[1] TRUE

dplyr: apply function table() to each column of a data.frame

Apply function table() to each column of a data.frame using dplyr
I often apply the table-function on each column of a data frame using plyr, like this:
library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) )
Is it possible to do this in dplyr also?
My attempts fail:
mtcars %>% do( table %>% data.frame() )
melt( mtcars ) %>% do( table %>% data.frame() )
You can try the following which does not rely on the tidyr package.
mtcars %>%
lapply(table) %>%
lapply(as.data.frame) %>%
Map(cbind,var = names(mtcars),.) %>%
bind_rows() %>% # rbind_all() was removed from dplyr; bind_rows() replaces it
group_by(var) %>%
mutate(pct = Freq / sum(Freq))
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
Or:
mtcars %>%
map(~ table(.x) )
Or simply:
library(tidyverse)
mtcars %>%
map( table )
In general you probably would not want to run table() on every column of a data frame because at least one of the variables will be unique (an id field) and produce a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count() which does the group_by() for you.
> mtcars %>%
group_by(cyl) %>%
tally()
> # mtcars %>% count(cyl)
Source: local data frame [3 x 2]
cyl n
1 4 11
2 6 7
3 8 14
If you want to do a two-way frequency table, group by more than one variable.
> mtcars %>%
group_by(gear, cyl) %>%
tally()
> # mtcars %>% count(gear, cyl)
You can use spread() of the tidyr package to turn that two-way output into the output one is used to receiving with table() when two variables are input.
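A short sketch of that last step, using count() as the shorthand for group_by() plus tally(). The fill = 0 argument is a choice made here so that missing gear/cyl combinations show as zero, as they would in table(); the object name two_way is illustrative.

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%          # long format: one row per gear/cyl combination
  spread(cyl, n, fill = 0)      # wide format: one column per cyl value

two_way
```

In current tidyr, spread() is superseded by pivot_wider(), which takes names_from and values_from arguments for the same reshaping.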
The solution by Caner did not work for me, but the one from commenter akrun (credit goes to him) worked great. I demo it here on a much larger tibble, and I also added an ordering by percent, descending.
library(dplyr)
library(tidyr)
library(nycflights13)
dim(flights)
tte <- gather(flights, Var, Val) %>%
  group_by(Var) %>% dplyr::mutate(n = n()) %>%
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
  arrange(Var, desc(n1)) %>% unique()

Resources