preserve dataframe name using dplyr group_split [duplicate] - r

I am using group_split from dplyr, but I need every data frame in the list to keep its name.
Example from the dplyr documentation (notice that the data frames are only numbered). The ideal output would have every data frame named after the value of the grouping variable (setosa, versicolor, ...):
ir <- iris %>%
group_by(Species)
group_split(ir)
#> [[1]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # … with 40 more rows
#>
#> [[2]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 7 3.2 4.7 1.4 versicolor
#> 2 6.4 3.2 4.5 1.5 versicolor
#> 3 6.9 3.1 4.9 1.5 versicolor
#> 4 5.5 2.3 4 1.3 versicolor
#> 5 6.5 2.8 4.6 1.5 versicolor
#> 6 5.7 2.8 4.5 1.3 versicolor
#> 7 6.3 3.3 4.7 1.6 versicolor
#> 8 4.9 2.4 3.3 1 versicolor
#> 9 6.6 2.9 4.6 1.3 versicolor
#> 10 5.2 2.7 3.9 1.4 versicolor
#> # … with 40 more rows
#>
#> [[3]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 6.3 3.3 6 2.5 virginica
#> 2 5.8 2.7 5.1 1.9 virginica
#> 3 7.1 3 5.9 2.1 virginica
#> 4 6.3 2.9 5.6 1.8 virginica
#> 5 6.5 3 5.8 2.2 virginica
#> 6 7.6 3 6.6 2.1 virginica
#> 7 4.9 2.5 4.5 1.7 virginica
#> 8 7.3 2.9 6.3 1.8 virginica
#> 9 6.7 2.5 5.8 1.8 virginica
#> 10 7.2 3.6 6.1 2.5 virginica
#> # … with 40 more rows
#>
#> attr(,"ptype")
#> # A tibble: 0 x 5
#> # … with 5 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
#> # Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>

group_split does not preserve names. From ?group_split:
it does not name the elements of the list based on the grouping as this typically loses information and is confusing.
You could use base::split for that:
split(iris, iris$Species)
Or name the list of tibbles separately using setNames.
library(dplyr)
group_split(ir) %>% setNames(unique(iris$Species))
group_split splits the data by the factor levels, in level order, so if we want the pieces in the order in which the values first occur in the data, we have to reorder the factor levels first. In the iris dataset the factor levels are in the same order as they occur in the data, hence the above works.
More generally, we should use:
iris %>%
mutate(Species= factor(Species, levels = unique(Species))) %>%
group_split(Species) %>%
setNames(unique(iris$Species))

We can use set_names from purrr (loaded with the tidyverse):
library(tidyverse)
ir %>%
group_split() %>%
set_names(levels(iris$Species))
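As a further option, the names can be taken from group_keys(), which lists the groups in the same order in which group_split() returns them, so nothing depends on the factor levels matching the order of appearance. A minimal sketch, assuming a dplyr version that provides group_keys(); the names keys and named are only illustrative:

```r
library(dplyr)

ir <- iris %>% group_by(Species)

# group_keys() returns one row per group, in the same order as group_split()
keys <- group_keys(ir)

named <- ir %>%
  group_split() %>%
  setNames(as.character(keys$Species))

names(named)
```

Each element can then be accessed by name, for example named[["versicolor"]].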

Related

Dplyr filter behaviour when using vector [duplicate]

I use dplyr quite a lot for data wrangling, but I never figured out how filter behaves when called as filter(df, variable == c(value1, value2)).
Let's use the iris data set as an example.
library(dplyr)
data(iris)
# I want to filter by Species 'setosa' and 'versicolor'
# Solution 1
filter1 <- filter(iris, Species == 'setosa' | Species == 'versicolor')
nrow(filter1)
[1] 100 # expected result
# Solution 2
filter2 <- filter(iris, Species %in% c('setosa', 'versicolor'))
nrow(filter2)
[1] 100 # expected result
filter1 == filter2 # both solutions return the exact same result
#Solution 3
filter3 <- filter(iris, Species == c('setosa', 'versicolor'))
nrow(filter3)
[1] 50 # unexpected result
unique(filter3$Species)
[1] setosa versicolor
Levels: setosa versicolor virginica
Although Solution 3 filters for the intended species, as shown by unique(filter3$Species), it returns only half of the rows (50 compared to 100 in Solution 1 and Solution 2). I would appreciate some guidance on what is actually going on in Solution 3.
filter(iris, Species == c("versicolor", "setosa")) does not make sense in an intuitive way, because one Species is not a 2-tuple:
> "setosa" == c("setosa", "versicolor")
[1] TRUE FALSE
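Inside filter, the comparison is element-wise with recycling: the length-2 vector is repeated along the 150 rows, so rows in odd positions are compared with the first value and rows in even positions with the second. Either order of the two values therefore returns 50 rows, but not the same ones, and sorting the data first changes which rows match (descending sorting puts versicolor before setosa):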
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris %>%
as_tibble()
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # … with 140 more rows
iris %>%
filter(Species == c('versicolor', 'setosa')) %>%
as_tibble()
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 4.9 3 1.4 0.2 setosa
#> 2 4.6 3.1 1.5 0.2 setosa
#> 3 5.4 3.9 1.7 0.4 setosa
#> 4 5 3.4 1.5 0.2 setosa
#> 5 4.9 3.1 1.5 0.1 setosa
#> 6 4.8 3.4 1.6 0.2 setosa
#> 7 4.3 3 1.1 0.1 setosa
#> 8 5.7 4.4 1.5 0.4 setosa
#> 9 5.1 3.5 1.4 0.3 setosa
#> 10 5.1 3.8 1.5 0.3 setosa
#> # … with 40 more rows
iris %>%
arrange(Species) %>%
filter(Species == c('versicolor', 'setosa')) %>%
as_tibble()
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 4.9 3 1.4 0.2 setosa
#> 2 4.6 3.1 1.5 0.2 setosa
#> 3 5.4 3.9 1.7 0.4 setosa
#> 4 5 3.4 1.5 0.2 setosa
#> 5 4.9 3.1 1.5 0.1 setosa
#> 6 4.8 3.4 1.6 0.2 setosa
#> 7 4.3 3 1.1 0.1 setosa
#> 8 5.7 4.4 1.5 0.4 setosa
#> 9 5.1 3.5 1.4 0.3 setosa
#> 10 5.1 3.8 1.5 0.3 setosa
#> # … with 40 more rows
iris %>%
arrange(desc(Species)) %>%
filter(Species == c('setosa', 'versicolor')) %>%
as_tibble()
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 6.4 3.2 4.5 1.5 versicolor
#> 2 5.5 2.3 4 1.3 versicolor
#> 3 5.7 2.8 4.5 1.3 versicolor
#> 4 4.9 2.4 3.3 1 versicolor
#> 5 5.2 2.7 3.9 1.4 versicolor
#> 6 5.9 3 4.2 1.5 versicolor
#> 7 6.1 2.9 4.7 1.4 versicolor
#> 8 6.7 3.1 4.4 1.4 versicolor
#> 9 5.8 2.7 4.1 1 versicolor
#> 10 5.6 2.5 3.9 1.1 versicolor
#> # … with 40 more rows
iris %>%
arrange(desc(Species)) %>%
filter(Species == c('versicolor', 'setosa')) %>%
as_tibble()
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 7 3.2 4.7 1.4 versicolor
#> 2 6.9 3.1 4.9 1.5 versicolor
#> 3 6.5 2.8 4.6 1.5 versicolor
#> 4 6.3 3.3 4.7 1.6 versicolor
#> 5 6.6 2.9 4.6 1.3 versicolor
#> 6 5 2 3.5 1 versicolor
#> 7 6 2.2 4 1 versicolor
#> 8 5.6 2.9 3.6 1.3 versicolor
#> 9 5.6 3 4.5 1.5 versicolor
#> 10 6.2 2.2 4.5 1.5 versicolor
#> # … with 40 more rows
Created on 2022-02-11 by the reprex package (v2.0.0)
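What happens in Solution 3 can be reproduced without dplyr. A minimal sketch, assuming only base R: == recycles the two-element vector along the 150 rows, so each row is compared with a single value, whereas %in% checks every row against both values. The names species, target, recycled, keep_eq and keep_in are illustrative:

```r
species <- as.character(iris$Species)
target <- c("setosa", "versicolor")

# What `==` effectively compares against: target repeated to 150 elements
recycled <- rep(target, length.out = length(species))

keep_eq <- species == recycled   # row-by-row comparison with one value each
keep_in <- species %in% target   # membership test against both values

sum(keep_eq)  # rows kept by Solution 3
sum(keep_in)  # rows kept by Solution 2

# The explicit recycling matches what `==` does on its own
all(keep_eq == (iris$Species == target))
```

If the number of rows were not a multiple of the vector length, R would additionally warn about the mismatch in object lengths.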

R {dplyr}: `rename` or `mutate` data.frames in `rowwise` list-column with different column names on LHS

I'm playing around with list-columns of data.frames with {dplyr} 1.0.0 and I'm wondering whether it is possible to rename() and mutate() columns in each data.frame without leaving the pipe when the nested data.frame is grouped rowwise.
Why do I want to know / do this? As far as I understand the philosophy of {dplyr} 1.0.0, it recommends rowwise() instead of using {purrr}'s map family on columns. Below I first show what I did before {dplyr} 1.0.0 and then show a couple of examples (most of them not working) for {dplyr} 1.0.0.
While {rlang} supports glue strings on the left hand side (LHS) which can be used when writing {dplyr} custom functions, the LHS of {dplyr} functions in a rowwise tibble seems not to be supported yet (at least my examples below are not working).
For rename I found a way using rename_with(), but I have no idea how to get it working with mutate.
I also do not understand most of the error messages I get. They more or less say that I'm not using a string on the LHS of :=, but in rowwise mode my referenced column (new) is actually a character vector of length 1.
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(purrr)
myiris <- iris %>%
nest_by(Species, .key = "mydat") %>%
ungroup %>%
mutate(new = letters[1:3])
# our data looks like this
# we want to use the strings in column `new` on the LHS of `rename` and `mutate`
myiris
#> # A tibble: 3 x 3
#> Species mydat new
#> <fct> <list<tbl_df[,4]>> <chr>
#> 1 setosa [50 x 4] a
#> 2 versicolor [50 x 4] b
#> 3 virginica [50 x 4] c
# For reference: under dplyr < 1.0 I did the following:
# rename in pipe
# working
myiris %>%
mutate(mydat = map2(mydat, new,
~ rename_at(.x, "Sepal.Length", function(z) paste(.y)))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#> a Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # ... with 46 more rows
#>
#> [[2]]
#> # A tibble: 50 x 4
#> b Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 7 3.2 4.7 1.4
#> 2 6.4 3.2 4.5 1.5
#> 3 6.9 3.1 4.9 1.5
#> 4 5.5 2.3 4 1.3
#> # ... with 46 more rows
#>
#> [[3]]
#> # A tibble: 50 x 4
#> c Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 6.3 3.3 6 2.5
#> 2 5.8 2.7 5.1 1.9
#> 3 7.1 3 5.9 2.1
#> 4 6.3 2.9 5.6 1.8
#> # ... with 46 more rows
# mutate in pipe
# was never working even under dplyr < 1.0.0
myiris %>%
mutate(mydat = map2(mydat, new,
~ mutate(.x, eval(.y) := .y))) %>%
pull(mydat)
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `map2(mydat, new, ~mutate(.x, `:=`(eval(.y), .y)))`.
# mutate with custom function
# working
mymutate <- function(df, y) {
mutate(df, !! y := y)
}
myiris %>%
mutate(mydat = map2(mydat, new,
~ mymutate(.x, .y))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width a
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 a
#> 2 4.9 3 1.4 0.2 a
#> 3 4.7 3.2 1.3 0.2 a
#> 4 4.6 3.1 1.5 0.2 a
#> # ... with 46 more rows
#>
#> [[2]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width b
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 7 3.2 4.7 1.4 b
#> 2 6.4 3.2 4.5 1.5 b
#> 3 6.9 3.1 4.9 1.5 b
#> 4 5.5 2.3 4 1.3 b
#> # ... with 46 more rows
#>
#> [[3]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width c
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 6.3 3.3 6 2.5 c
#> 2 5.8 2.7 5.1 1.9 c
#> 3 7.1 3 5.9 2.1 c
#> 4 6.3 2.9 5.6 1.8 c
#> # ... with 46 more rows
# dplyr >= 1.0.0
# objective: `rename()` or `mutate()` in pipe on list-column of data.frames
# while using different column names on LHS coming from another
# column (here `new`)
myiris_row <- myiris %>% rowwise
# rename --------
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% rename({{new}} := "Sepal.Length")))
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% rename(!! new := "Sepal.Length")))
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% rename(!! sym(new) := "Sepal.Length")))
#> Error: Only strings can be converted to symbols
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% rename(all_of(new) := "Sepal.Length")))
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% rename(`:=`(all_of(new), "Sepal.Length")))`.
#> i The error occured in row 1.
# working, but only with `rename_with()`
myiris_row %>%
mutate(mydat = list(mydat %>% rename_with(~ new, "Sepal.Length"))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#> a Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # ... with 46 more rows
#>
#> [[2]]
#> # A tibble: 50 x 4
#> b Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 7 3.2 4.7 1.4
#> 2 6.4 3.2 4.5 1.5
#> 3 6.9 3.1 4.9 1.5
#> 4 5.5 2.3 4 1.3
#> # ... with 46 more rows
#>
#> [[3]]
#> # A tibble: 50 x 4
#> c Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 6.3 3.3 6 2.5
#> 2 5.8 2.7 5.1 1.9
#> 3 7.1 3 5.9 2.1
#> 4 6.3 2.9 5.6 1.8
#> # ... with 46 more rows
# mutate ------
# the values of the new column don't matter
# here we just use the same input as the name, to show that RHS evaluation is easier.
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% mutate(!! new := new)))
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(...)`.
#> i The error occured in row 1.
# not working
myiris %>%
mutate(mydat = list(mydat %>% mutate(!! sym(new) := new)))
#> Error: Only strings can be converted to symbols
# not working
myiris_row %>%
mutate(mydat = list(mydat %>% mutate(all_of(new) := new)))
#> Error: Problem with `mutate()` input `mydat`.
#> x The LHS of `:=` must be a string or a symbol
#> i Input `mydat` is `list(mydat %>% mutate(`:=`(all_of(new), new)))`.
#> i The error occured in row 1.
# almost working (what's going on in the data[[1]] btw!)
myiris_row %>%
mutate(mydat = list(mydat %>% mutate("{{new}}" := new))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width `promise_fn(3L)`
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 a
#> 2 4.9 3 1.4 0.2 a
#> 3 4.7 3.2 1.3 0.2 a
#> 4 4.6 3.1 1.5 0.2 a
#> # ... with 46 more rows
#>
#> [[2]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width `"b"`
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 7 3.2 4.7 1.4 b
#> 2 6.4 3.2 4.5 1.5 b
#> 3 6.9 3.1 4.9 1.5 b
#> 4 5.5 2.3 4 1.3 b
#> # ... with 46 more rows
#>
#> [[3]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width `"c"`
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 6.3 3.3 6 2.5 c
#> 2 5.8 2.7 5.1 1.9 c
#> 3 7.1 3 5.9 2.1 c
#> 4 6.3 2.9 5.6 1.8 c
#> # ... with 46 more rows
Created on 2020-12-22 by the reprex package (v0.3.0)
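The outer mutate() processes every !! it sees before any row is evaluated, which is why the attempts above fail. You can protect the inner !! from the outer call with quote(): the outer call unquotes quote(!!new) into the expression !!new, and the nested call then unquotes that expression once new holds the value for the current row: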
myiris_row %>%
mutate(mydat = list(mydat %>% mutate(!! quote(!!new) := new))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width a
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 a
#> 2 4.9 3 1.4 0.2 a
#> 3 4.7 3.2 1.3 0.2 a
#> 4 4.6 3.1 1.5 0.2 a
#> 5 5 3.6 1.4 0.2 a
#> 6 5.4 3.9 1.7 0.4 a
#> 7 4.6 3.4 1.4 0.3 a
#> 8 5 3.4 1.5 0.2 a
#> 9 4.4 2.9 1.4 0.2 a
#> 10 4.9 3.1 1.5 0.1 a
#> # ... with 40 more rows
#>
#> [[2]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width b
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 7 3.2 4.7 1.4 b
#> 2 6.4 3.2 4.5 1.5 b
#> 3 6.9 3.1 4.9 1.5 b
#> 4 5.5 2.3 4 1.3 b
#> 5 6.5 2.8 4.6 1.5 b
#> 6 5.7 2.8 4.5 1.3 b
#> 7 6.3 3.3 4.7 1.6 b
#> 8 4.9 2.4 3.3 1 b
#> 9 6.6 2.9 4.6 1.3 b
#> 10 5.2 2.7 3.9 1.4 b
#> # ... with 40 more rows
#>
#> [[3]]
#> # A tibble: 50 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width c
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 6.3 3.3 6 2.5 c
#> 2 5.8 2.7 5.1 1.9 c
#> 3 7.1 3 5.9 2.1 c
#> 4 6.3 2.9 5.6 1.8 c
#> 5 6.5 3 5.8 2.2 c
#> 6 7.6 3 6.6 2.1 c
#> 7 4.9 2.5 4.5 1.7 c
#> 8 7.3 2.9 6.3 1.8 c
#> 9 6.7 2.5 5.8 1.8 c
#> 10 7.2 3.6 6.1 2.5 c
#> # ... with 40 more rows
myiris_row %>%
mutate(mydat = list(mydat %>% rename(!! quote(!!new) := "Sepal.Length"))) %>%
pull(mydat)
#> [[1]]
#> # A tibble: 50 x 4
#> a Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
#> 7 4.6 3.4 1.4 0.3
#> 8 5 3.4 1.5 0.2
#> 9 4.4 2.9 1.4 0.2
#> 10 4.9 3.1 1.5 0.1
#> # ... with 40 more rows
#>
#> [[2]]
#> # A tibble: 50 x 4
#> b Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 7 3.2 4.7 1.4
#> 2 6.4 3.2 4.5 1.5
#> 3 6.9 3.1 4.9 1.5
#> 4 5.5 2.3 4 1.3
#> 5 6.5 2.8 4.6 1.5
#> 6 5.7 2.8 4.5 1.3
#> 7 6.3 3.3 4.7 1.6
#> 8 4.9 2.4 3.3 1
#> 9 6.6 2.9 4.6 1.3
#> 10 5.2 2.7 3.9 1.4
#> # ... with 40 more rows
#>
#> [[3]]
#> # A tibble: 50 x 4
#> c Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 6.3 3.3 6 2.5
#> 2 5.8 2.7 5.1 1.9
#> 3 7.1 3 5.9 2.1
#> 4 6.3 2.9 5.6 1.8
#> 5 6.5 3 5.8 2.2
#> 6 7.6 3 6.6 2.1
#> 7 4.9 2.5 4.5 1.7
#> 8 7.3 2.9 6.3 1.8
#> 9 6.7 2.5 5.8 1.8
#> 10 7.2 3.6 6.1 2.5
#> # ... with 40 more rows
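If the double unquoting feels too subtle, the helper-function approach from the question also works in rowwise mode, because the !! then sits inside the helper and is never seen by the outer mutate(). A sketch under that assumption; add_named_col is a hypothetical helper name, and the data are rebuilt here so that the example is self-contained:

```r
library(dplyr)

# Hypothetical helper: adds a column whose name and value are both `nm`
add_named_col <- function(df, nm) {
  mutate(df, !!nm := nm)
}

myiris_row <- iris %>%
  nest_by(Species, .key = "mydat") %>%
  ungroup() %>%
  mutate(new = letters[1:3]) %>%
  rowwise()

# In rowwise mode `mydat` is the nested data frame and `new` a single string
res <- myiris_row %>%
  mutate(mydat = list(add_named_col(mydat, new)))

names(res$mydat[[1]])
```

The trade-off is one extra function definition in exchange for not having to reason about which call consumes which !!.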

Can you list an exception to tidyselect `everything()`

library(tidyverse)
iris %>% as_tibble() %>% select(everything())
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
Say I want to select everything in the iris data frame except Species. How do I list this one exception while utilizing tidyselect::everything()?
My actual pipe is below. When I run
... %>%
group_by(`ID`) %>%
fill(everything(), .direction = "updown") %>%
... %>%
I get the following error:
Error: Column ID can't be modified because it's a grouping variable
You would do
iris %>% as_tibble() %>% select(-Species)
but assuming you have good reason not to want that, here's a way using everything()
iris %>% as_tibble() %>% select(setdiff(everything(), one_of("Species")))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> 5 5 3.6 1.4 0.2
#> 6 5.4 3.9 1.7 0.4
#> 7 4.6 3.4 1.4 0.3
#> 8 5 3.4 1.5 0.2
#> 9 4.4 2.9 1.4 0.2
#> 10 4.9 3.1 1.5 0.1
#> # ... with 140 more rows
(or just iris %>% as_tibble() %>% select(setdiff(everything(), 5)) if it's acceptable)
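With recent versions of dplyr and tidyselect the exception can also be written directly next to everything(). A short sketch, assuming a version that supports the ! operator in selections; the names a and b are illustrative:

```r
library(dplyr)

# everything() followed by a negative selection drops the named column
a <- iris %>% as_tibble() %>% select(everything(), -Species)

# the complement operator gives the same columns
b <- iris %>% as_tibble() %>% select(!Species)

identical(names(a), names(b))
```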

R - Selecting Top Records but With a Grouping

Using the Iris dataframe I can pretty easily pull the first n = 100 records with:
m_data<-iris
m_data[1:100,]
But I am also interested in pulling the first 100 records based on a nice split of the Species. Assume for the moment that the first 100 records are all the same species - I would like to pull the data with a "first sampling" based on the varying Species instead.
Any suggestions are welcome. Thank you.
You can also do this with dplyr, here selecting the first 10 from each species:
library(dplyr)
iris %>%
group_by(Species) %>%
filter(row_number() <= 10) # or slice(1:10)
#> # A tibble: 30 x 5
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 20 more rows
Created on 2018-08-13 by the reprex package (v0.2.0).
Here's an alternative:
do.call(rbind, lapply(split(iris, iris$Species), head, 100))
This pulls up to the first 100 records of each Species; since iris has only 50 rows per species, it returns all 150 rows here.
You can use by instead of lapply
do.call(rbind, by(iris, iris$Species, head, 100))
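In dplyr 1.0.0 and later, slice_head() expresses the same idea directly. A sketch, assuming 10 rows per species are wanted; first_per_species is an illustrative name:

```r
library(dplyr)

first_per_species <- iris %>%
  group_by(Species) %>%
  slice_head(n = 10) %>%
  ungroup()

# one count per species
table(first_per_species$Species)
```

Changing n adjusts how many rows are taken from each group; groups with fewer rows simply contribute all of them.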

How to replicate a ddply behavior that uses a custom function with dplyr?

I'm trying to replace all my plyr calls with dplyr. There are still a few snags and one of them is with the group_by function. I imagine it acts the same way as the second ddply argument and does a split, apply and combine based on the grouping variables I list. But that doesn't appear to be the case. Here is a rather trivial example.
Let's define a silly function
mm <- function(x) return(x[1:5, ])
Now we can split the iris dataset by species like so and apply this function to each piece.
ddply(iris, .(Species), mm)
This works as intended. However, when I try the same with dplyr, it doesn't work as expected.
iris %>% group_by(Species) %>% mm
What am I doing wrong?
As shown in ?do, you can refer to a group with . in your expression. The following will replicate your ddply output:
iris %>% group_by(Species) %>% do(.[1:5, ])
# Source: local data frame [15 x 5]
# Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 7.0 3.2 4.7 1.4 versicolor
# 7 6.4 3.2 4.5 1.5 versicolor
# 8 6.9 3.1 4.9 1.5 versicolor
# 9 5.5 2.3 4.0 1.3 versicolor
# 10 6.5 2.8 4.6 1.5 versicolor
# 11 6.3 3.3 6.0 2.5 virginica
# 12 5.8 2.7 5.1 1.9 virginica
# 13 7.1 3.0 5.9 2.1 virginica
# 14 6.3 2.9 5.6 1.8 virginica
# 15 6.5 3.0 5.8 2.2 virginica
More generally, to apply a custom function to groups with dplyr, you can do something like the following (thanks @docendodiscimus):
iris %>% group_by(Species) %>% do(mm(.))
slice() was created for this:
library(dplyr)
iris %>% group_by(Species) %>% slice(1:5)
#> # A tibble: 15 x 5
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 7 3.2 4.7 1.4 versicolor
#> 7 6.4 3.2 4.5 1.5 versicolor
#> 8 6.9 3.1 4.9 1.5 versicolor
#> 9 5.5 2.3 4 1.3 versicolor
#> 10 6.5 2.8 4.6 1.5 versicolor
#> 11 6.3 3.3 6 2.5 virginica
#> 12 5.8 2.7 5.1 1.9 virginica
#> 13 7.1 3 5.9 2.1 virginica
#> 14 6.3 2.9 5.6 1.8 virginica
#> 15 6.5 3 5.8 2.2 virginica
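do() has since been superseded in dplyr. A sketch of the same split-apply-combine step with group_modify(), assuming a dplyr version that provides it; mm is the function from the question, which here receives each group's rows without the grouping column:

```r
library(dplyr)

mm <- function(x) x[1:5, ]

res <- iris %>%
  group_by(Species) %>%
  group_modify(~ mm(.x)) %>%
  ungroup()

dim(res)
```

Piping the grouped data straight into mm, as in the question, applies the function once to the whole data frame, which is why only the first five rows overall come back.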

Resources