How can I make several sequential manipulations of the same variable using dplyr, but more elegantly than the code below?
Specifically, I would like to remove the multiple calls to car_names = without having to nest any of the functions.
mtcars2 <- mtcars %>% mutate(car_names = row.names(.)) %>%
mutate(car_names=stri_extract_first_words(car_names)) %>%
mutate(car_names=as.factor(car_names))
If you want to type less and not nest the functions, you can use the pipe inside the mutate call:
library(dplyr)
library(stringi)
# What you did
mtcars2 <- mtcars %>%
mutate(car_names = row.names(.)) %>%
mutate(car_names = stri_extract_first_words(car_names)) %>%
mutate(car_names = as.factor(car_names))
# Another way with less typing and no nesting
mtcars3 <- mtcars %>%
mutate(car_names = rownames(.) %>%
stri_extract_first_words(.) %>%
as.factor(.))
identical(mtcars2, mtcars3)
[1] TRUE
I would like to use a tidy approach to produce correlograms by group.
My attempt with iris and libraries dplyr and corrplot:
library(corrplot)
library(dplyr)
par(mfrow=c(2,2))
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(.x,use = "complete.obs"),tl.cex=0.7,title =""))
It works but I would like to add the Species name on each plot.
Also, any other tidy approaches/ functions are very welcome!
We could use cur_group()
library(dplyr)
library(corrplot)
out <- iris %>%
group_by(Species) %>%
summarise(outr = list( corrplot::corrplot(cor(cur_data(),
use = "complete.obs"),tl.cex=0.7,title = cur_group()[[1]])))
Or, if we are using group_map, note that .keep is FALSE by default, so the grouping column is dropped from .x. Set it to TRUE and extract the group value from the data:
iris %>%
group_by(Species) %>%
group_map(~ corrplot::corrplot(cor(select(.x, where(is.numeric)),
use = "complete.obs"),tl.cex=0.7,title = first(.x$Species)), .keep = TRUE)
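As a further option, the function passed to group_map also receives the group keys as a second argument, .y (a one-row tibble), so the title can be taken from there without setting .keep = TRUE. A minimal sketch, assuming only dplyr is loaded; it collects a title and a correlation matrix per group instead of plotting, and the element names title and cor are purely illustrative:

```r
library(dplyr)

# .x holds the group's rows without the grouping column;
# .y is a one-row tibble with the grouping key (here: Species)
cors <- iris %>%
  group_by(Species) %>%
  group_map(~ list(
    title = as.character(.y$Species),
    cor   = cor(.x, use = "complete.obs")
  ))

# Collect the per-group titles to check that each element carries its own label
titles <- vapply(cors, function(m) m$title, character(1))
titles
```

Each element could then be passed to a plotting call such as corrplot::corrplot(m$cor, title = m$title).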
You can use a split-and-map approach with imap:
library(dplyr)
library(purrr)
iris %>%
split(.$Species) %>%
imap(~corrplot::corrplot(cor(.x[-5],use ="complete.obs"),tl.cex=0.7,title =.y))
I have some code that specifies a grouping variable as a string.
group_var <- "cyl"
My current code for using this grouping variable in a dplyr pipeline is:
mtcars %>%
group_by_(group_var) %>%
summarize(mean_mpg = mean(mpg))
My best guess as to how to replace the deprecated group_by_ function with group_by is:
mtcars %>%
group_by(!!as.name(group_var)) %>%
summarize(mean_mpg = mean(mpg))
This works but is not explicitly mentioned in the programming with dplyr vignette.
Is using !!as.name() the preferred way to replace group_by_() with group_by()?
Is this within a function? If not, I think the !!as.name() part is unnecessary, and I would stick with the group_by_at(group_var) suggestion by @aosmith for simplicity's sake. If it is, I would set it up like so:
examplr <- function(data, group_var){
group_var <- as.name(group_var)
data %>%
group_by(!!group_var) %>%
summarize(mean_mpg = mean(mpg))
}
examplr(data = mtcars,
group_var = "cyl")
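For a column name held in a string, the programming with dplyr vignette describes indexing the .data pronoun with [[. A minimal sketch of that alternative, compared against the !!as.name() form from the question; the object names by_string and by_symbol are illustrative only:

```r
library(dplyr)

group_var <- "cyl"

# .data[[group_var]] looks up the column whose name is stored in the string
by_string <- mtcars %>%
  group_by(.data[[group_var]]) %>%
  summarize(mean_mpg = mean(mpg))

# Same computation via the !!as.name() form from the question, for comparison
by_symbol <- mtcars %>%
  group_by(!!as.name(group_var)) %>%
  summarize(mean_mpg = mean(mpg))

all.equal(by_string$mean_mpg, by_symbol$mean_mpg)
```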
Let's say I want to split mtcars into 3 CSV files based on the cyl grouping. I can use mutate to do this, but it will create a NULL column in the output.
library(tidyverse)
by_cyl = mtcars %>%
group_by(cyl) %>%
nest()
by_cyl %>%
mutate(unused = map2(data, cyl, function(x, y) write.csv(x, paste0(y, '.csv'))))
Is there a way to do this on the by_cyl object without calling mutate?
Here is an option using purrr without mutate from dplyr.
library(tidyverse)
mtcars %>%
split(.$cyl) %>%
walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))
Update
This drops the cyl column before saving the output.
library(tidyverse)
mtcars %>%
split(.$cyl) %>%
map(~ .x %>% select(-cyl)) %>%
walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))
Update2
library(tidyverse)
by_cyl <- mtcars %>%
group_by(cyl) %>%
nest()
by_cyl %>%
split(.$cyl) %>%
walk2(names(.), ~write_csv(.x[["data"]][[1]], paste0(.y, '.csv')))
Here's a solution with do and group_by, so if your data is already grouped as it should be, you save one line:
mtcars %>%
group_by(cyl) %>%
do(data.frame(write.csv(.,paste0(.$cyl[1],".csv"))))
data.frame is only used here because do needs to return a data.frame, so it's a little hack.
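If the data.frame wrapper feels too hacky, dplyr's group_walk() is meant for calling a function on each group for its side effects. A minimal sketch, assuming a dplyr version that provides group_walk(); out_dir is a hypothetical output location chosen here so the example does not write into the working directory:

```r
library(dplyr)

out_dir <- tempdir()  # hypothetical output location; swap in your own folder

mtcars %>%
  group_by(cyl) %>%
  # .x is the group's rows (without cyl); .y is a one-row tibble holding the key
  group_walk(~ write.csv(.x, file.path(out_dir, paste0(.y$cyl, ".csv"))))

# Check that one file per cyl value was written
file.exists(file.path(out_dir, c("4.csv", "6.csv", "8.csv")))
```

Note that the cyl column is not part of .x here, so it is not written to the files.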
I used to do this with group_by_:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_(.dots = group_by) %>% summarise(gear = mean(gear))
but now group_by_ is deprecated. I don't know how to do it using the tidy evaluation framework.
New answer
With dplyr 1.0, you can now use selection helpers like all_of() inside across():
df |>
group_by(
across(all_of(my_vars))
)
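The snippet above uses placeholder names (df, my_vars). A runnable sketch of the same pattern on mtcars, assuming dplyr 1.0 or later and R 4.1 or later for the native pipe; the summary column and .groups = "drop" are choices made for this illustration:

```r
library(dplyr)

my_vars <- c("cyl", "vs")  # grouping columns given as strings

res <- mtcars |>
  group_by(across(all_of(my_vars))) |>
  summarise(gear = mean(gear), .groups = "drop")

res
```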
Old answer
Transform the character vector into a list of symbols and splice it in
df %>% group_by(!!!syms(group_by))
There is a group_by_at variant of group_by:
library(dplyr)
group_by <- c('cyl', 'vs')
mtcars %>% group_by_at(group_by) %>% summarise(gear = mean(gear))
The above is a simplified version of the more general form:
mtcars %>% group_by_at(vars(one_of(group_by))) %>% summarise(gear = mean(gear))
Inside vars you can use any of dplyr's ways of selecting variables:
mtcars %>%
group_by_at(vars(
one_of(group_by) # columns from the predefined set
,starts_with("a") # add those starting with "a"
,-hp # but omit this one
,vs # always include this one
,contains("_gr_") # and those containing the string "_gr_"
)) %>%
summarise(gear = mean(gear))
Apply function table() to each column of a data.frame using dplyr
I often apply the table-function on each column of a data frame using plyr, like this:
library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) )
Is it possible to do this in dplyr also?
My attempts fail:
mtcars %>% do( table %>% data.frame() )
melt( mtcars ) %>% do( table %>% data.frame() )
You can try the following which does not rely on the tidyr package.
mtcars %>%
lapply(table) %>%
lapply(as.data.frame) %>%
Map(cbind,var = names(mtcars),.) %>%
bind_rows() %>%
group_by(var) %>%
mutate(pct = Freq / sum(Freq))
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
Or:
mtcars %>%
map(~ table(.x) )
Or simply:
library(tidyverse)
mtcars %>%
map( table )
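The map(table) calls above return a list of table objects. To get a single data frame with counts and proportions, closer to the ldply output in the question, one possible sketch reshapes the data to long form first. This assumes tidyr's pivot_longer() is available and that all columns share a type (true for mtcars, where everything is numeric); the names freq, var, value and pct are illustrative:

```r
library(dplyr)
library(tidyr)

freq <- mtcars %>%
  pivot_longer(everything(), names_to = "var", values_to = "value") %>%
  count(var, value) %>%          # one row per column/value pair
  group_by(var) %>%
  mutate(pct = n / sum(n)) %>%   # proportion within each original column
  ungroup()

filter(freq, var == "cyl")
```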
In general you probably would not want to run table() on every column of a data frame because at least one of the variables will be unique (an id field) and produce a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count() which does the group_by() for you.
> mtcars %>%
group_by(cyl) %>%
tally()
> # mtcars %>% count(cyl)
Source: local data frame [3 x 2]
cyl n
1 4 11
2 6 7
3 8 14
If you want to do a two-way frequency table, group by more than one variable.
> mtcars %>%
group_by(gear, cyl) %>%
tally()
> # mtcars %>% count(gear, cyl)
You can use spread() of the tidyr package to turn that two-way output into the output one is used to receiving with table() when two variables are input.
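A minimal sketch of that reshaping step, assuming tidyr is installed; fill = 0 is a choice made here so that combinations absent from the data show up as zero counts, as they would in table():

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%
  spread(cyl, n, fill = 0)  # one column per cyl value; missing combinations become 0

two_way
```

In current tidyr, spread() is superseded by pivot_wider(), which could be used in the same place.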
The solution by Caner did not work for me, but this one from commenter akrun (credit goes to him) worked great. I also demo it on a much larger tibble and sort by percent, descending.
library(dplyr); library(tidyr)
library(nycflights13); dim(flights)
tte <- gather(flights, Var, Val) %>%
group_by(Var) %>% dplyr::mutate(n = n()) %>%
group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
arrange(Var, desc(n1)) %>% unique()