Using map on specific column in list? - r

I'm trying to split a dataframe into a list of dataframes and then sort each dataframe by a specific variable using map(). I thought my approach would work, but I'm evidently not passing something to the function correctly, and I'm unsure how to fix it. For instance, using lapply() I could do this:
library(tidyverse)
df = iris
df %>%
group_split(Species) %>%
{lapply(.,function(x) {x %>% arrange(desc(Sepal.Length))})}
Using map(), I've tried this approach but it's not working:
df %>%
group_split(Species) %>%
map(.,arrange(Sepal.Length),desc)
How can I structure this so it works? I only want to apply the map() to one of the columns as in the lapply() example.

df %>%
group_split(Species) %>%
map(~arrange(.data = .x, desc(Sepal.Length)))
or
df %>%
group_split(Species) %>%
map(~.x %>% arrange(desc(Sepal.Length)))
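For context on why the attempt in the question fails: in map(., arrange(Sepal.Length), desc), the call arrange(Sepal.Length) is evaluated on its own, before map() has handed it any data frame, so Sepal.Length has nothing to refer to. A minimal sketch, assuming dplyr and purrr are installed (sorted_lapply and sorted_map are illustrative names only), that puts the lapply() version next to the formula-style map() call:

```r
library(dplyr)
library(purrr)

# lapply() with an explicit anonymous function
sorted_lapply <- iris %>%
  group_split(Species) %>%
  lapply(function(x) arrange(x, desc(Sepal.Length)))

# map() with a purrr formula; .x stands for each data frame in the list
sorted_map <- iris %>%
  group_split(Species) %>%
  map(~ arrange(.x, desc(Sepal.Length)))
```

Both results are lists with one tibble per species, each sorted by descending Sepal.Length.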

Related

Filter the first group after group_by

Sometimes it is handy to take a test case out of your data when working with group_by() from the dplyr library. I was wondering if there is any fast way to just grab the first group of a grouped dataframe and cast it to a new dataframe.
All I could come up with was this workaround:
library(dplyr)
smalldf <- mtcars %>% group_by(gear) %>% group_split(.) %>% .[[1]]
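Two possible alternatives, sketched under the assumption that dplyr 1.0.0 or later is installed (cur_group_id() is only available from that release on). The names first_group and first_group_split are illustrative only:

```r
library(dplyr)

# Keep only the rows belonging to the first group, then drop the grouping
first_group <- mtcars %>%
  group_by(gear) %>%
  filter(cur_group_id() == 1) %>%
  ungroup()

# group_split() accepts the grouping variable directly, so group_by() can be skipped
first_group_split <- mtcars %>%
  group_split(gear) %>%
  .[[1]]
```

In both cases the result is a tibble holding the rows of the first gear group.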

tabulate using tabyl by grouping variable using group_split and group_map

To get a quick frequency table (tabulation) of one column, or of several columns at once, I use the tabyl function like so:
library(janitor)
library(tidyverse)
#tabulate one column at a time
iris %>%
tabyl(Petal.Width)
#tabulate multiple columns at once using map
iris %>%
select(Petal.Width, Petal.Length) %>%
map(tabyl)
I'm trying to replicate these two cases but have the output by a grouping variable, Species in this example. I would like the simplest solution and I would like to try the newer group_split and group_map commands for this.
I have been able to produce a similar type output in a dataframe format (although a simple list that tabyl produces is what I want for the case of more than one variable):
#works
iris %>%
group_by(Species) %>%
nest() %>%
mutate(out = map(data, ~ tabyl(.x$Petal.Width) %>%
as_tibble)) %>%
select(-data) %>%
unnest
This works, but I would have thought it could be simpler, like my column-wise approach. I was thinking of something like this for one column per grouping variable:
#by group for one column
iris %>%
group_by(Species) %>%
group_split() %>%
map(~tabyl(Petal.Width))
For multiple columns I'm not sure I need the select row here? Maybe group_map could simplify it in one line?
#by group for multiple columns
iris %>%
#do i need to select grouping variable and variables of interest?
select(Species, Petal.Width, Petal.Length) %>%
group_by(Species) %>%
group_split() %>%
map(~tabyl()) #could I use group_map and select the columns at once?
Any suggestions please?
iris %>%
  # use split(.$Species) if you need a list with names
  group_split(Species) %>%
  map(~ imap(.x %>% select(Species, Petal.Width, Petal.Length),
             function(x, y) {
               out <- tabyl(x)
               colnames(out)[1] <- y
               out
             }))
If you just need the default column name for the first column, then you can do iris %>% group_split(Species) %>% map(~map(.x, tabyl))
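Regarding the group_map() part of the question, here is a minimal sketch for the single-column case. It assumes dplyr and janitor are installed and that tabyl() is called with a data frame followed by a column name; by_group is an illustrative name:

```r
library(dplyr)
library(janitor)

# One frequency table of Petal.Width per species, returned as a list
by_group <- iris %>%
  group_by(Species) %>%
  group_map(~ tabyl(.x, Petal.Width))
```

The resulting list is unnamed and follows the order of the groups, so the species labels have to be looked up separately if they are needed.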

Can I use mutate() to mimic a value i would join from a summarize() with dplyr?

I feel like there is a more elegant way with dplyr to recreate the following result of joining the results of a summarize call with mutate.
inner_join(iris,
iris %>% group_by(Species) %>% summarize(n = length(Species),
Mean.Sepal.Length = mean(Sepal.Length)),
by = "Species")
When I feel there may be a way to use mutate in this way...
#iris %>% mutate(???)
There's no need for the inner_join(). You can just use group_by() with mutate():
iris %>%
group_by(Species) %>%
mutate(n=n(), Mean.Sepal.Length=mean(Sepal.Length))
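A small sketch, assuming dplyr is installed, that compares the two approaches. The result of group_by() plus mutate() stays grouped, so ungroup() is added here before any further ungrouped work (joined and mutated are illustrative names):

```r
library(dplyr)

# Approach from the question: summarise per group, then join back
joined <- inner_join(
  iris,
  iris %>%
    group_by(Species) %>%
    summarize(n = n(), Mean.Sepal.Length = mean(Sepal.Length)),
  by = "Species"
)

# Approach from the answer: grouped mutate, no join needed
mutated <- iris %>%
  group_by(Species) %>%
  mutate(n = n(), Mean.Sepal.Length = mean(Sepal.Length)) %>%
  ungroup()

# Compare the per-row group means produced by the two approaches
all.equal(joined$Mean.Sepal.Length, mutated$Mean.Sepal.Length)
```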

Subset a dplyr result

I'm trying to subset the result of a dplyr call. Can someone explain why this doesn't work?
library(dplyr)
df<-data.frame(name=c("bob","ann"),age=c(22,24),random=c(1,2))
View(df%>%filter(name=="bob")) #works fine
#Now to avoid showing the random column I tried:
View(df%>%filter(name="bob")[,c(1,2)]) #standard subset notation to remove column 3 doesnt work here
I think if you're going to use dplyr to filter the df, you should also use dplyr to select from it. I'm not sure whether there are any performance differences.
df %>%
filter(name == "bob") %>%
select(1,2)
df %>%
filter(name == "bob") %>%
select(name, age)
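As for why the bracket version fails: [ binds more tightly than %>%, so [, c(1, 2)] is applied to the call filter(name == "bob") itself rather than to the piped result. A minimal sketch of two ways to keep the bracket notation, assuming dplyr is installed (res_parens and res_dot are illustrative names):

```r
library(dplyr)

df <- data.frame(name = c("bob", "ann"), age = c(22, 24), random = c(1, 2))

# Wrap the whole pipe in parentheses so it is evaluated before subsetting
res_parens <- (df %>% filter(name == "bob"))[, c(1, 2)]

# Or subset inside the pipe, using the dot placeholder for the piped value
res_dot <- df %>%
  filter(name == "bob") %>%
  .[, c(1, 2)]
```

Either result can then be passed to View() as in the question.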

group_by and global mean within a single dplyr pipe

Is there a way using dplyr to summarise using group_by() then take a global mean, then add that to the same data frame without having to create a second dataframe?
Right now I am doing it like this:
library(dplyr)
speciesiris <- iris %>%
group_by(Species) %>%
summarise(mpw=mean(Petal.Width))
iris %>%
summarise(mpw=mean(Petal.Width)) %>%
mutate(Species="All Species") %>%
bind_rows(speciesiris)
One potential pitfall here is that I want not the mean of means but rather a global mean or at least the option of both. So is there a better way of doing this hopefully all in one pipe?
Everything in one pipe (but not recommended):
iris %>%
  summarise(mpw = mean(Petal.Width)) %>%   # Global mean
  mutate(Species = "All Species") %>%
  bind_rows(
    iris %>%
      group_by(Species) %>%                 # Mean by Species
      summarise(mpw = mean(Petal.Width))
  )
