I am trying to figure out how to mutate a single column of data by several functions using dplyr. I can do every column:
library(dplyr)
iris %>%
group_by(Species) %>%
mutate_all(funs(min, max))
But I don't know how to select one column. I can imagine something like this though this obviously does not run:
iris %>%
group_by(Species) %>%
mutate(Sepal.Length, funs(min, max))
I can sort of accomplish this task using do() and a custom function like this:
summary_func <- function(x) {
  tibble(max_out = max(x),
         min_out = min(x))
}
iris %>%
group_by(Species) %>%
do(summary_func(.$Sepal.Length))
However, this doesn't really do what I want either, because it doesn't add the new columns to the existing tibble a la mutate.
Any ideas?
Use mutate_at
iris %>%
group_by(Species) %>%
mutate_at("Sepal.Length", funs(min, max))
It takes the column names as a character vector, so watch the quotes.
Use mutate
iris %>%
group_by(Species) %>%
mutate(min = min(Sepal.Length),
max = max(Sepal.Length))
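For reference, funs() has been deprecated in more recent dplyr releases. A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later (the name result is only illustrative):

```r
library(dplyr)

# Apply several functions to a single column within each group.
# A named list of functions yields columns named <column>_<function>,
# here Sepal.Length_min and Sepal.Length_max.
result <- iris %>%
  group_by(Species) %>%
  mutate(across(Sepal.Length, list(min = min, max = max))) %>%
  ungroup()

head(result)
```

All original columns are kept, and the two new columns repeat the group minimum and maximum on every row of the group.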
I have created this function that quickly does some summarization operations (mean, median, geometric mean and arranges them in descending order). This is the function:
summarize_values <- function(tbl, variable) {
  tbl %>%
    summarize(summarized_mean = mean({{ variable }}),
              summarized_median = median({{ variable }}),
              geom_mean = exp(mean(log({{ variable }}))),
              n = n()) %>%
    arrange(desc(n))
}
I can do this and it works:
summarize_values(data, lifeExp)
However, I would like to be able to do this:
data %>%
select(year, lifeExp) %>%
summarize_values()
or something like this
data %>%
summarize_values(year, lifeExp)
What am I missing to make this work?
thanks
With the pipe, we don't need to specify the first argument (tbl), because the piped data is passed in for us:
library(dplyr)
data %>%
summarize_values(lifeExp)
A reproducible example:
> mtcars %>%
summarize_values(gear)
summarized_mean summarized_median geom_mean n
1 3.6875 4 3.619405 32
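If the intent behind summarize_values(year, lifeExp) was to summarize lifeExp within each year, one option is an extra grouping argument. This is only a sketch under that assumption; summarize_values_by and its group argument are hypothetical names, and mtcars stands in for the original data:

```r
library(dplyr)

# Hypothetical variant: group by one column, then summarize another.
summarize_values_by <- function(tbl, group, variable) {
  tbl %>%
    group_by({{ group }}) %>%
    summarize(summarized_mean = mean({{ variable }}),
              summarized_median = median({{ variable }}),
              geom_mean = exp(mean(log({{ variable }}))),
              n = n(),
              .groups = "drop") %>%
    arrange(desc(n))
}

by_cyl <- mtcars %>%
  summarize_values_by(cyl, mpg)

by_cyl
```

Here arrange(desc(n)) becomes meaningful, because there is now one row per group to sort.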
I would like to use a tidy approach to produce correlograms by group.
My attempt with iris and libraries dplyr and corrplot:
library(corrplot)
library(dplyr)
par(mfrow=c(2,2))
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(.x,use = "complete.obs"),tl.cex=0.7,title =""))
It works but I would like to add the Species name on each plot.
Also, any other tidy approaches/ functions are very welcome!
We could use cur_group()
library(dplyr)
library(corrplot)
out <- iris %>%
group_by(Species) %>%
summarise(outr = list( corrplot::corrplot(cor(cur_data(),
use = "complete.obs"),tl.cex=0.7,title = cur_group()[[1]])))
Or, if we are using group_map, note that .keep = FALSE by default, so the grouping column is dropped from .x. Set it to TRUE and extract the group value from the Species column:
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(select(.x, where(is.numeric)),
use = "complete.obs"),tl.cex=0.7,title = first(.x$Species)), .keep = TRUE)
You can also use a split-and-map approach with imap, which passes the list name (the species) as .y:
library(dplyr)
library(purrr)
iris %>%
split(.$Species) %>%
imap(~corrplot::corrplot(cor(.x[-5],use ="complete.obs"),tl.cex=0.7,title =.y))
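Another possible variant relies on the second argument that group_map passes to its function: .y is a one-row tibble holding the group key, so the label is available without .keep = TRUE. A sketch, assuming corrplot is installed for the plotting step (cors is an illustrative name):

```r
library(dplyr)

# One list element per species: the group label plus its correlation matrix.
# With the default .keep = FALSE, .x contains only the four numeric columns,
# and .y is a one-row tibble carrying the group key.
cors <- iris %>%
  group_by(Species) %>%
  group_map(~ list(title = as.character(.y$Species),
                   cor = cor(.x, use = "complete.obs")))

# Draw the plots only if corrplot is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  par(mfrow = c(2, 2))
  for (g in cors) {
    corrplot::corrplot(g$cor, tl.cex = 0.7, title = g$title)
  }
}
```

Separating the computation from the plotting also leaves the correlation matrices available for later inspection.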
I have a set of chains of pipe operators (%>%) doing different things with different datasets.
For instance:
dataset %>%
mutate(...) %>%
filter(...) %>%
rowwise() %>%
summarise() %>%
etc...
If I want to reuse some parts of these chains, is there a way to do it, without just wrapping it into a function?
For instance (in pseudocode obviously):
subchain <- filter(...) %>%
rowwise() %>%
summarise()
# and then instead of the chain above it would be:
dataset %>%
mutate(...) %>%
subchain() %>%
etc...
This is similar in syntax to the desired pseudocode:
library(dplyr)
subchain <- . %>%
filter(mass > mean(mass, na.rm = TRUE)) %>%
select(name, gender, homeworld)
all.equal(
starwars %>%
group_by(gender) %>%
filter(mass > mean(mass, na.rm = TRUE)) %>%
select(name, gender, homeworld),
starwars %>%
group_by(gender) %>%
subchain()
)
This uses a dot (.) as the start of a piping sequence. In effect it is close to wrapping the steps in a function; magrittr calls the result a functional sequence. See ?functions and try magrittr::functions(subchain) to inspect its steps.
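Since a functional sequence is an ordinary function, it can be called directly as well as piped into. A small sketch with mtcars (heavy_summary is a made-up name for illustration):

```r
library(dplyr)

# A reusable piece of a pipeline, defined once.
heavy_summary <- . %>%
  filter(wt > 3) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Used as a step inside a longer chain ...
by_pipe <- mtcars %>%
  group_by(cyl) %>%
  heavy_summary()

# ... or called like any other function.
by_call <- heavy_summary(group_by(mtcars, cyl))

all.equal(by_pipe, by_call)
```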
Thanks again for allowing me to be a part of the community. I appreciate it immensely and I've learned a lot.
I would like to aggregate two columns as their means (by group) and keep the other columns.
transmute_at has done a nice job with the mean, but has dropped the other columns.
Also, I saw that this is a sort of deprecated function. Any thoughts on how to do it with dplyr 1.0?
This is the code
prod <- iris
prod_avg <- iris %>% filter(!is.na(Species) | Species != "") %>%
  group_by(Species) %>%
  transmute_at(
    c("Sepal.Length", "Sepal.Width"), ~ mean(.x, na.rm = TRUE))
Use mutate_at instead of transmute_at; it keeps the other columns:
library(dplyr)
iris %>%
filter(!is.na(Species) | Species != "") %>%
# Note: Species in iris has no NA or empty values, so this filter keeps every row
group_by(Species) %>%
mutate_at(vars(c("Sepal.Length","Sepal.Width")), ~ mean(.x, na.rm=TRUE))
In dplyr 1.0.0 and later, use across:
iris %>%
filter(!is.na(Species) | Species != "") %>%
group_by(Species) %>%
mutate(across(c(Sepal.Length,Sepal.Width), ~ mean(.x, na.rm=TRUE)))
I used to group by a character vector of column names using group_by_:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_(.dots = group_by) %>% summarise(gear = mean(gear))
but now group_by_ is deprecated. I don't know how to do it using the tidy evaluation framework.
New answer
With dplyr 1.0, you can now use selection helpers like all_of() inside across():
df |>
group_by(
across(all_of(my_vars))
)
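As a runnable version of the snippet above, here is a sketch using mtcars and the columns from the question; group_vars is an illustrative name standing in for my_vars:

```r
library(dplyr)

# Column names to group by, stored as strings.
group_vars <- c("cyl", "vs")

result <- mtcars %>%
  group_by(across(all_of(group_vars))) %>%
  summarise(gear = mean(gear), .groups = "drop")

result
```

all_of() raises an error if any of the named columns is missing, which makes typos in the character vector easy to spot.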
Old answer
Transform the character vector into a list of symbols and splice it in
df %>% group_by(!!!syms(group_by))
There is a group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars you can use any of dplyr's ways of selecting variables:
mtcars %>%
  group_by_at(vars(
    one_of(group_by)     # columns from the predefined set
    , starts_with("a")   # add those whose names start with "a"
    , -hp                # but omit this one
    , vs                 # always include this one
    , contains("_gr_")   # and those containing the string "_gr_"
  )) %>%
  summarise(gear = mean(gear))