Unnest one column list to many columns in tidyr

For example, I have a tidy data frame like this:
df <- tibble(id=1:2,
ctn=list(list(a="x",b=1),
list(a="y",b=2)))
# A tibble: 2 x 2
id ctn
<int> <list>
1 1 <list [2]>
2 2 <list [2]>
How can I unnest the ctn column to the right so that the data frame looks like this:
# A tibble: 2 x 3
id a b
<int> <chr> <dbl>
1 1 x 1
2 2 y 2

With dplyr, purrr and tidyr:
df %>%
  mutate(ctn = map(ctn, as_tibble)) %>%
  unnest(ctn)
# A tibble: 2 x 3
id a b
<int> <chr> <dbl>
1 1 x 1
2 2 y 2
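In tidyr 1.0.0 and later, the same reshaping can probably be done in a single step with unnest_wider(). The sketch below rebuilds the df from the question and assumes each element of ctn is a named list with the same names:

```r
library(tibble)
library(tidyr)

df <- tibble(
  id  = 1:2,
  ctn = list(list(a = "x", b = 1),
             list(a = "y", b = 2))
)

# Each named element of ctn becomes its own column
res <- unnest_wider(df, ctn)
res
```

Unlike the map() + unnest() route, this needs no intermediate conversion to tibbles.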

One option is
library(data.table)
setDT(df)[, unlist(ctn, recursive = FALSE), id]
# id a b
#1: 1 x 1
#2: 2 y 2
Or with dplyr's bind_rows (note that id comes back as a character column):
library(tidyverse)
df$ctn %>%
setNames(., df$id) %>%
bind_rows(., .id = 'id')
# A tibble: 2 x 3
# id a b
# <chr> <chr> <dbl>
#1 1 x 1
#2 2 y 2

In a tidy way we can now (dplyr 1.0.2 and above) do this using rowwise():
df %>% rowwise() %>% mutate(as_tibble(ctn))
# A tibble: 2 x 4
# Rowwise:
id ctn a b
<int> <list> <chr> <dbl>
1 1 <named list [2]> x 1
2 2 <named list [2]> y 2
And sticking to purrr we can also:
df %>% mutate(map_dfr(ctn, as_tibble))
# A tibble: 2 x 4
id ctn a b
<int> <list> <chr> <dbl>
1 1 <named list [2]> x 1
2 2 <named list [2]> y 2

Related

How to join tibbles while merging non-matched columns into lists

I'm looking for a way to perform a full join on 2+ tibbles, by a column with unique indices, in a way that would preserve the original column names and merge (non-identical) values into a vector or list. The tibbles have the same column names.
Example input tibbles
> a
# A tibble: 3 × 3
id name location
<dbl> <chr> <chr>
1 1 Caspar NL
2 2 Monica USA
3 3 Martin DE
> b
# A tibble: 3 × 3
id name location
<dbl> <chr> <chr>
1 1 Caspar WWW
2 2 Monique USA
3 4 Francis FR
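Desired output: one row per id, with name and location each holding the merged values, either as a vector or as a list.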
The ability to handle more than just 2 tibbles at the same time would be ideal.
All I know is dplyr's full_join(), which doesn't give me the desired result:
> dplyr::full_join(a,b, by='id')
# A tibble: 4 × 5
id name.x location.x name.y location.y
<dbl> <chr> <chr> <chr> <chr>
1 1 Caspar NL Caspar WWW
2 2 Monica USA Monique USA
3 3 Martin DE NA NA
4 4 NA NA Francis FR
Reprex
a <- tibble::tribble(~id, ~name, ~location, 1, 'Caspar', 'NL', 2, 'Monica', 'USA', 3, 'Martin', 'DE')
b <- tibble::tribble(~id, ~name, ~location, 1, 'Caspar', 'WWW', 2, 'Monique', 'USA', 4, 'Francis', 'FR')
It may be better to bind the rows first and then do a grouped summarise:
library(dplyr)
bind_rows(a, b) %>%
group_by(id) %>%
summarise(across(c('name', 'location'), list), .groups = 'drop')
Output:
# A tibble: 4 × 3
id name location
<dbl> <list> <list>
1 1 <chr [2]> <chr [2]>
2 2 <chr [2]> <chr [2]>
3 3 <chr [1]> <chr [1]>
4 4 <chr [1]> <chr [1]>
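If only the non-identical values should be kept per id, one possible variation is to wrap unique() inside the summarise. This is a sketch using the a and b tibbles from the reprex; passing a list to bind_rows() means it extends to more than two tibbles:

```r
library(dplyr)

a <- tibble::tribble(~id, ~name, ~location,
                     1, "Caspar", "NL", 2, "Monica", "USA", 3, "Martin", "DE")
b <- tibble::tribble(~id, ~name, ~location,
                     1, "Caspar", "WWW", 2, "Monique", "USA", 4, "Francis", "FR")

# Stack all tibbles, then collapse each column to its distinct values per id
merged <- bind_rows(list(a, b)) %>%
  group_by(id) %>%
  summarise(across(c(name, location), ~ list(unique(.x))), .groups = "drop")

merged$name[[1]]      # "Caspar"
merged$location[[1]]  # "NL" "WWW"
```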

R unnest multiple columns

Any functional approach to unnest multiple columns of different sizes?
Example:
library(tidyr)
library(dplyr)
my_list <- list(year = 2018:2020, period = 1, id = c(17,35))
expand_grid(my_list) %>%
pivot_wider(
names_from = my_list,
values_from = my_list
) %>%
rename_at(., names(.), ~ names(my_list))
# A tibble: 1 x 3
year period id
<named list> <named list> <named list>
1 <int [3]> <dbl [1]> <dbl [2]>
expand_grid(my_list) %>%
pivot_wider(
names_from = my_list,
values_from = my_list
) %>%
rename_at(., names(.), ~ names(my_list)) %>%
unnest(cols = names(my_list))
Error: Incompatible lengths: 3, 2.
unnest requires column names; is it possible to pass a string vector?
Expected:
# A tibble: 6 x 3
year period id
<int> <int> <int>
1 2018 1 17
2 2019 1 17
3 2020 1 17
4 2018 1 35
5 2019 1 35
6 2020 1 35
We can use cross_df from purrr (deprecated since purrr 1.0.0 in favour of tidyr::expand_grid):
purrr::cross_df(my_list)
# year period id
# <int> <dbl> <dbl>
#1 2018 1 17
#2 2019 1 17
#3 2020 1 17
#4 2018 1 35
#5 2019 1 35
#6 2020 1 35
Or in base R use expand.grid with do.call:
do.call(expand.grid, my_list)
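With current tidyr, the list can also be spliced straight into expand_grid(). This is a sketch; note that expand_grid() varies the first column slowest, so the row order differs from the expected output above, and the arrange() call is only there to reproduce that order:

```r
library(tidyr)
library(dplyr)

my_list <- list(year = 2018:2020, period = 1, id = c(17, 35))

# !!! splices the named list elements in as separate arguments
grid <- expand_grid(!!!my_list) %>%
  arrange(id, year)
grid
```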

how to "spread" a list-column?

Consider this simple example
mydf <- tibble(regular_col = c(1,2),
normal_col = c('a','b'),
weird_col = list(list('hakuna', 'matata'),
list('squash', 'banana')))
> mydf
# A tibble: 2 x 3
regular_col normal_col weird_col
<dbl> <chr> <list>
1 1 a <list [2]>
2 2 b <list [2]>
I would like to extract the elements of weird_col (programmatically, the number of elements may change) so that each element is placed on a different column. That is, I expect the following output
> tibble(regular_col = c(1,2),
+ normal_col = c('a','b'),
+ weirdo_one = c('hakuna', 'squash'),
+ weirdo_two = c('matata', 'banana'))
# A tibble: 2 x 4
regular_col normal_col weirdo_one weirdo_two
<dbl> <chr> <chr> <chr>
1 1 a hakuna matata
2 2 b squash banana
However, I am unable to do so in simple terms. For instance, using the classic unnest fails here, as it expands the dataframe instead of placing each element of the list in a different column.
> mydf %>% unnest(weird_col)
# A tibble: 4 x 3
regular_col normal_col weird_col
<dbl> <chr> <list>
1 1 a <chr [1]>
2 1 a <chr [1]>
3 2 b <chr [1]>
4 2 b <chr [1]>
Is there any solution in the tidyverse for that?
You can extract the values from the output of unnest, do a little processing to build the column names, and then spread back out. Note that I use flatten_chr because your list-column is only one level deep; if it is nested more deeply you can use flatten instead, and spread works just as well on list-cols.
library(tidyverse)
#> Warning: package 'dplyr' was built under R version 3.5.1
mydf <- tibble(
regular_col = c(1, 2),
normal_col = c("a", "b"),
weird_col = list(
list("hakuna", "matata"),
list("squash", "banana")
)
)
mydf %>%
  unnest(weird_col) %>%
  group_by(regular_col, normal_col) %>%
  mutate(
    weird_col = flatten_chr(weird_col), # or just as.character
    weird_colname = str_c("weirdo_", row_number())
  ) %>%
  spread(weird_colname, weird_col)
#> # A tibble: 2 x 4
#> # Groups: regular_col, normal_col [2]
#> regular_col normal_col weirdo_1 weirdo_2
#> <dbl> <chr> <chr> <chr>
#> 1 1 a hakuna matata
#> 2 2 b squash banana
Created on 2018-08-12 by the reprex package (v0.2.0).
unnest expands lists and vectors vertically, and one-row data frames horizontally. So what we can do is turn your lists into data frames (with suitable column names) and unnest afterwards.
mydf %>%
  mutate(weird_col = map(weird_col, ~ as_tibble(
    setNames(., paste0("weirdo_", seq_along(.)))
  ))) %>%
  unnest(weird_col)
# # A tibble: 2 x 4
# regular_col normal_col weirdo_1 weirdo_2
# <dbl> <chr> <chr> <chr>
# 1 1 a hakuna matata
# 2 2 b squash banana
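In tidyr 1.0.0 and later, unnest_wider() may be a shorter route to the same shape. A sketch, assuming the inner lists are unnamed as in the question, in which case names_sep has to be supplied and the new columns are numbered by position (weird_col_1, weird_col_2):

```r
library(tibble)
library(tidyr)

mydf <- tibble(
  regular_col = c(1, 2),
  normal_col  = c("a", "b"),
  weird_col   = list(list("hakuna", "matata"),
                     list("squash", "banana"))
)

# names_sep is needed because the inner lists carry no names
wide <- unnest_wider(mydf, weird_col, names_sep = "_")
wide
```

If the weirdo_ prefix matters, the columns can be renamed afterwards.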

Creating tibble or data frame of tibbles or data frames and other class

Is it possible to create a tibble or data.frame, which has columns that are integers and other columns that are tibbles or data.frames?
E.g.:
library(tibble)
set.seed(1)
df.1 <- tibble(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))
df.2 <- tibble(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))
And then:
df <- tibble(id=1,rank=2,data=df.1)
which gives this error:
Error: Column `data` must be a 1d atomic vector or a list
I guess df.1 has to be a list for this to work?
Is this what you are looking for? I think the key is that each column should have the same length, and that we need to use list() to create a list column that stores df.1 and df.2.
df <- tibble(id = 1:2,
rank = 2,
data = list(df.1, df.2))
df
# # A tibble: 2 x 3
# id rank data
# <int> <dbl> <list>
# 1 1 2 <tibble [20 x 2]>
# 2 2 2 <tibble [20 x 2]>
head(df$data[[1]])
# # A tibble: 6 x 2
# name score
# <chr> <int>
# 1 G 94
# 2 J 22
# 3 N 64
# 4 U 13
# 5 E 26
# 6 S 37
head(df$data[[2]])
# # A tibble: 6 x 2
# name score
# <chr> <int>
# 1 V 92
# 2 Q 30
# 3 S 45
# 4 M 33
# 5 L 63
# 6 Y 25
And since every tibble in the data column has the same structure, we can use tidyr::unnest to expand it.
library(tidyr)
df_un <- unnest(df, data)
# # A tibble: 40 x 4
# id rank name score
# <int> <dbl> <chr> <int>
# 1 1 2 G 94
# 2 1 2 J 22
# 3 1 2 N 64
# 4 1 2 U 13
# 5 1 2 E 26
# 6 1 2 S 37
# 7 1 2 W 2
# 8 1 2 M 36
# 9 1 2 L 81
# 10 1 2 B 31
# # ... with 30 more rows
And we can also nest the tibble, turning it back into the original format with a list column.
library(dplyr)
df_n <- df_un %>%
group_by(id, rank) %>%
nest() %>%
ungroup()
df_n
# # A tibble: 2 x 3
# id rank data
# <int> <dbl> <list>
# 1 1 2 <tibble [20 x 2]>
# 2 2 2 <tibble [20 x 2]>
# Check if df and df_n are the same
identical(df_n, df)
# [1] TRUE
Using tidyr's nest:
set.seed(1)
df.1 <- data.frame(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))
df.2 <- data.frame(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))
I can create a tibble where df.1 is nested under id and rank:
library(dplyr)
library(tidyr)
data.frame(id=1,rank=2,data=df.1) %>% nest(data = -c(id, rank))
# A tibble: 1 × 3
id rank data
<dbl> <dbl> <list>
1 1 2 <tibble [20 × 2]>
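For having both df.1 and df.2 in a tibble, I'd simply do the following. Note that c(df.1, df.2) concatenates the columns of both data frames, and id and rank are recycled along the 20 rows, so each nested tibble is 10 × 4 rather than one of the original 20 × 2 data frames:
data.frame(id=c(1,2),rank=c(2,1),data=c(df.1,df.2)) %>% nest(data = -c(id, rank))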
# A tibble: 2 × 3
id rank data
<dbl> <dbl> <list>
1 1 2 <tibble [10 × 4]>
2 2 1 <tibble [10 × 4]>
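If the goal is one row per original data frame, with each nested tibble keeping its 20 × 2 shape, a sketch along these lines may be closer. It stacks df.1 and df.2 with an id column and nests afterwards; the id values here are simply the positions in the list, which is an assumption about how the frames should be labelled:

```r
library(dplyr)
library(tidyr)

set.seed(1)
df.1 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))
df.2 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))

# Stack the frames with an id marking their origin, then nest per id
nested <- bind_rows(list(df.1, df.2), .id = "id") %>%
  mutate(id = as.integer(id)) %>%
  nest(data = c(name, score))
nested
```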

How to separate a column list of fixed size X to X different columns?

I have a tibble with one column being a list column, always having two numeric values named a and b (e.g. as a result of calling purrr::map on a function which returns a list), say:
df <- tibble(x = 1:3, y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6)))
df
# A tibble: 3 × 2
x y
<int> <list>
1 1 <list [2]>
2 2 <list [2]>
3 3 <list [2]>
How do I separate the list column y into two columns a and b, and get:
df_res <- tibble(x = 1:3, a = c(1,3,5), b = c(2,4,6))
df_res
# A tibble: 3 × 3
x a b
<int> <dbl> <dbl>
1 1 1 2
2 2 3 4
3 3 5 6
Looking for something like tidyr::separate to deal with a list instead of a string.
Using dplyr (current release: 0.7.0):
bind_cols(df[1], bind_rows(df$y))
# # A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
# 1 1 1 2
# 2 2 3 4
# 3 3 5 6
edit based on OP's comment:
To embed this in a pipe and in case you have many non-list columns, we can try:
df %>% select(-y) %>% bind_cols(bind_rows(df$y))
We could also make use of map_df from purrr:
library(tidyverse)
df %>%
  summarise(x = list(x), new = list(map_df(.$y, bind_rows))) %>%
  unnest(c(x, new))
# A tibble: 3 x 3
# x a b
# <int> <dbl> <dbl>
#1 1 1 2
#2 2 3 4
#3 3 5 6
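Since the element names are known in advance here, tidyr::hoist() (tidyr 1.0.0 and later) is another option that reads much like separate() for lists. A sketch using the df from the question; the names on the left of each = are the new column names, the strings on the right are the list elements to pull out:

```r
library(tibble)
library(tidyr)

df <- tibble(
  x = 1:3,
  y = list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6))
)

# Pull the named components a and b out of the list column y
res <- hoist(df, y, a = "a", b = "b")
res
```

Only the components named in the call are extracted, so this stays stable even if some elements of y carry extra fields.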

Resources