R: Join two tables (tibbles) by *list* columns

Seems like there should be a simple answer for this but I haven't been able to find one:
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib1
# A tibble: 3 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [1]>
3 <dbl [1]> <dbl [1]>
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
tib2
# A tibble: 4 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [2]>
3 <dbl [1]> <dbl [1]>
4 <dbl [1]> <dbl [1]>
dplyr::inner_join(tib1, tib2)
Joining, by = c("x", "y")
Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
Can't join on 'x' x 'x' because of incompatible types (list / list)
So is there a way to perform a join based on list columns (before I start writing my own)?
Basically, a row should be included in the final table only if the list values of all key variables are identical in both tibbles. In the above example there are two key variables, x and y, and the result should be only the first row, since it is the only row whose x and y are identical in both tibbles:
tibble(x = list(1), y = list(4))
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>

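We could use hashes from digest. Each list element is hashed to a string, and the join is then done on those ordinary character columns:
library(dplyr)
library(purrr)
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
# Add x_hash and y_hash columns holding a digest of each list element
tib1 <- mutate(tib1, across(everything(), ~ map_chr(.x, digest::digest), .names = "{.col}_hash"))
tib2 <- mutate(tib2, across(everything(), ~ map_chr(.x, digest::digest), .names = "{.col}_hash"))
# Join on the hashes, then keep the original list columns from tib1
inner_join(tib1, tib2, by = c('x_hash', 'y_hash')) %>%
select(x = x.x, y = y.x)
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>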
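The hashing idea can also be sketched with a single key per row instead of one hash column per key variable. This is only an illustration: row_key() and the .key column are hypothetical names introduced here, not part of dplyr or digest.

```r
library(dplyr)
library(purrr)
library(digest)

tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))

# Hypothetical helper: hash all key columns of a row into one string.
row_key <- function(df, keys) {
  pmap_chr(df[keys], function(...) digest(list(...)))
}

keys <- c("x", "y")

# Keep the rows of tib1 whose combined key also occurs in tib2.
matched <- tib1 %>%
  mutate(.key = row_key(tib1, keys)) %>%
  semi_join(tibble(.key = row_key(tib2, keys)), by = ".key") %>%
  select(-.key)

matched
```

Because digest() hashes the serialized object, the match is type-sensitive: a key stored as the double 1 would not match the integer 1L, so it may be worth normalizing types before hashing.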

Related

Remove empty lists from a tibble in R

I am trying to remove any list from my tibble that has "<chr [0]>"
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df
#> # A tibble: 3 × 3
#> x y z
#> <int> <list> <list>
#> 1 1 <chr [0]> <chr [1]>
#> 2 2 <chr [0]> <chr [1]>
#> 3 3 <chr [0]> <chr [0]>
Created on 2022-02-15 by the reprex package (v2.0.1)
I want my tibble to look like this
#> # A tibble: 3 × 3
#> x z
#> <int> <list>
#> 1 1 <chr [1]>
#> 2 2 <chr [1]>
#> 3 3 NA
Any help is appreciated.
You can do:
df %>%
select(where(~!all(lengths(.) == 0))) %>%
# if/else keeps the whole element; ifelse() would return only its first value
mutate(z = lapply(z, function(x) if (length(x) == 0) NA else x))
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <lgl [1]>
Note that your z column can't hold list elements in rows 1 and 2 and a bare logical NA in row 3. The whole column needs to be a list.
If every element of z has exactly one element, you can add another line of code with mutate(z = unlist(z)).
The OP asked for a more dynamic solution that handles several columns.
Here is an example where I simply created another variable, z2. Generally, you can repeat the recoding for several columns using across.
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()),
z2 = z)
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(across(starts_with('z'), ~ lapply(., function(x) if (length(x) == 0) NA else x)))
Which gives:
# A tibble: 3 x 3
x z z2
<int> <list> <list>
1 1 <chr [1]> <chr [1]>
2 2 <chr [1]> <chr [1]>
3 3 <lgl [1]> <lgl [1]>
A two-step way using base R:
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df <- df[apply(df, 2, function(x) any(lapply(x, length) > 0))] #Remove empty columns
df[apply(df, 2, function(x) lapply(x, length) == 0)] <- NA #Replace empty lists with NA
df
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <NULL>
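If every non-empty element of z holds exactly one string, as in the example, the list column can be flattened into a plain character vector in the same step. The following is a minimal sketch under that assumption; the name flat is only illustrative.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(
  x = 1:3,
  y = list(character()),
  z = list("ATC", "TAC", character())
)

flat <- df %>%
  # Drop list columns whose elements are all empty (here: y).
  select(where(~ !(is.list(.x) && all(lengths(.x) == 0)))) %>%
  # Replace empty elements with a typed NA and flatten to a character vector.
  mutate(z = map_chr(z, ~ if (length(.x) == 0) NA_character_ else .x))

flat
```

map_chr() raises an error if any element has more than one value, so this variant only fits data where each element is empty or of length one.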

Unnesting a combination variable (combn) as a vector

With the following code, I manage to get a fine combination :
tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2
) %>%
map(.f = t)
) %>%
unnest(combination)
# A tibble: 6 x 2
x combination[,1] [,2]
<list> <dbl> <dbl>
1 <dbl [3]> 1 2
2 <dbl [3]> 1 3
3 <dbl [3]> 2 3
4 <dbl [3]> 4 5
5 <dbl [3]> 4 6
6 <dbl [3]> 5 6
However, when observed with the View() function, I get:
How can I proceed to get combination displayed as a vector? i.e.:
We can specify simplify = FALSE in combn to return a list instead of coercing to a matrix:
library(purrr)
library(dplyr)
library(tidyr)
tbl1 <- tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2, simplify = FALSE
))
Now, do the unnest
out <- tbl1 %>%
unnest(combination)
out
# A tibble: 6 x 2
# x combination
# <list> <list>
#1 <dbl [3]> <dbl [2]>
#2 <dbl [3]> <dbl [2]>
#3 <dbl [3]> <dbl [2]>
#4 <dbl [3]> <dbl [2]>
#5 <dbl [3]> <dbl [2]>
#6 <dbl [3]> <dbl [2]>
check the View
Here is a data.table option that might help (it assumes the tibble from the question is stored in df):
library(data.table)
library(tidyr)
unnest(setDT(df)[, combination := lapply(x, function(v) combn(v, 2, simplify = FALSE))], combination)
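Another way to avoid the matrix column, sketched here as an alternative to the answers above, is to turn each set of combinations into a small tibble with two ordinary columns before unnesting. pair_table() and the column names first and second are hypothetical names chosen for this illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: all pairs of v as a two-column tibble.
pair_table <- function(v) {
  m <- t(combn(v, 2))  # one row per pair
  tibble(first = m[, 1], second = m[, 2])
}

pairs <- tibble(x = list(c(1, 2, 3), c(4, 5, 6))) %>%
  mutate(combination = map(x, pair_table)) %>%
  unnest(combination)

pairs
```

After unnest(), first and second are plain numeric columns, so View() shows one value per cell.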

Concatenate two columns with different vector sizes in R [duplicate]

I have a data frame with two columns, a and b, whose rows contain either a single value or a vector of values. I want to combine the two columns so that the values of both columns in each row are concatenated into a single vector. However, when I use the paste function, I am unable to concatenate the values in each row into a single vector.
The following is a reproducible example of this problem:
library(tibble)
library(tidyverse)
data_frame <-
tribble(
~a, ~b,
50, 3,
17, 50,
c("21", "19"), 50,
c("1", "10"), c("50", "51")
)
data_frame %>%
mutate(new_column = paste(a, b))
#> # A tibble: 4 x 3
#> a b new_column
#> <list> <list> <chr>
#> 1 <dbl [1]> <dbl [1]> "50 3"
#> 2 <dbl [1]> <dbl [1]> "17 50"
#> 3 <chr [2]> <dbl [1]> "c(\"21\", \"19\") 50"
#> 4 <chr [2]> <chr [2]> "c(\"1\", \"10\") c(\"50\", \"51\")"
In the new_column column, I want the results to be as following:
c("50", "3")
c("17", "50")
c("21", "19", "50")
c("1", "10", "50", "51")
Is there a way that I can combine the columns a and b to get the result in the above format? Thank you.
To combine two columns you can use c. In base R, you can do this with Map :
data_frame$new_col <- Map(c, data_frame$a, data_frame$b)
Or in tidyverse use map2 :
library(dplyr)
library(purrr)
data_frame %>% mutate(new_col = map2(a, b, c))
# A tibble: 4 x 3
# a b new_col
# <list> <list> <list>
#1 <dbl [1]> <dbl [1]> <dbl [2]>
#2 <dbl [1]> <dbl [1]> <dbl [2]>
#3 <chr [2]> <dbl [1]> <chr [3]>
#4 <chr [2]> <chr [2]> <chr [4]>
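Note that c() keeps rows 1 and 2 as doubles because both inputs are numeric. If character vectors are wanted in every row, as in the question's desired output, a conversion can be added. The sketch below assumes that goal; new_string is a hypothetical extra column showing the same values collapsed into one string.

```r
library(tibble)
library(dplyr)
library(purrr)

data_frame <- tribble(
  ~a, ~b,
  50, 3,
  17, 50,
  c("21", "19"), 50,
  c("1", "10"), c("50", "51")
)

combined <- data_frame %>%
  mutate(
    # Concatenate row-wise, then coerce every result to character.
    new_col = map2(a, b, ~ as.character(c(.x, .y))),
    # Optional: the same values as one space-separated string per row.
    new_string = map_chr(new_col, paste, collapse = " ")
  )

combined$new_col
```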

Subset a vector of lists in R

Let's say I have a vector of lists:
library(tidyverse)
d <- tribble(
~x,
c(10, 20, 64),
c(22, 11),
c(5, 9, 99),
c(55, 67),
c(76, 65)
)
How can I subset this vector so that, for example, I keep only the rows whose lists have a length greater than 2? Here is my unsuccessful attempt using the tidyverse:
filter(d, length(x) > 2)
# A tibble: 5 x 1
x
<list>
1 <dbl [3]>
2 <dbl [2]>
3 <dbl [3]>
4 <dbl [2]>
5 <dbl [2]>
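Use lengths() rather than length(), because x is a list column. length(x) returns the number of elements in the whole column (5 here), so the condition is a single TRUE that is recycled across all rows, whereas lengths(x) returns the length of each list element: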
library(dplyr)
d %>%
filter(lengths(x) > 2)
You can use subset() + lengths()
subset(d,lengths(x)>2)
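The difference between the two functions can be checked directly on the column. This is a small illustrative sketch; long_rows is just a name chosen here.

```r
library(tibble)
library(dplyr)

d <- tribble(
  ~x,
  c(10, 20, 64),
  c(22, 11),
  c(5, 9, 99),
  c(55, 67),
  c(76, 65)
)

n_rows <- length(d$x)    # one number: how many elements the list has
per_row <- lengths(d$x)  # one number per row: the size of each element

# Keep only the rows whose element has more than two values.
long_rows <- filter(d, lengths(x) > 2)
long_rows
```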

Error unnesting list-columns in R even though all have equal lengths

I'm getting this error "Error: All nested columns must have the same number of elements." but I can't figure out why as my list columns have equal lengths.
Here is an example:
> df <- tibble(x = c("a", "b", "c"), y = list(a = 1:3, b = 4:6, c = 7:9), z = list(as.character(1:3)))
> df
# A tibble: 3 x 3
x y z
<chr> <list> <list>
1 a <int [3]> <chr [3]>
2 b <int [3]> <chr [3]>
3 c <int [3]> <chr [3]>
> unnest(df)
Error: All nested columns must have the same number of elements.
EDIT: unnest(slice(df,1:n())) works.
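A possible workaround is sketched below. It assumes a recent tidyr (1.0 or later), where unnest() takes the columns to expand explicitly. Dropping the names from y is a precaution taken for this sketch, not a confirmed diagnosis of the error above.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  x = c("a", "b", "c"),
  y = list(a = 1:3, b = 4:6, c = 7:9),
  z = list(as.character(1:3))
)

out <- df %>%
  mutate(y = unname(y)) %>%  # assumption: remove the list names before unnesting
  unnest(c(y, z))            # expand both list columns in parallel

out
```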
