Unnesting a combination variable (combn) as a vector - r

With the following code, I manage to get the combinations I want:
tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2
) %>%
map(.f = t)
) %>%
unnest(combination)
# A tibble: 6 x 2
x combination[,1] [,2]
<list> <dbl> <dbl>
1 <dbl [3]> 1 2
2 <dbl [3]> 1 3
3 <dbl [3]> 2 3
4 <dbl [3]> 4 5
5 <dbl [3]> 4 6
6 <dbl [3]> 5 6
However, when observed with the View() function, I get:
How can I get combination displayed as a vector, i.e.:

We can specify simplify = FALSE in combn to return a list instead of coercing the result to a matrix.
library(purrr)
library(dplyr)
library(tidyr)
tbl1 <- tibble(
x = list(c(1, 2, 3), c(4,5,6))
) %>%
mutate(
combination =
x %>%
map(
.f = combn
, 2, simplify = FALSE
))
Now, unnest the column:
out <- tbl1 %>%
unnest(combination)
out
# A tibble: 6 x 2
# x combination
# <list> <list>
#1 <dbl [3]> <dbl [2]>
#2 <dbl [3]> <dbl [2]>
#3 <dbl [3]> <dbl [2]>
#4 <dbl [3]> <dbl [2]>
#5 <dbl [3]> <dbl [2]>
#6 <dbl [3]> <dbl [2]>
Then check the result with View().
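As a minimal base R sketch of what simplify = FALSE changes (the names v, as_matrix, and as_list are illustrative and not from the question): by default combn returns a matrix with one combination per column, whereas simplify = FALSE returns a list with one vector per combination, which is what allows unnest to produce a list-column of vectors.

```r
v <- c(1, 2, 3)

# Default behaviour: a 2 x 3 matrix, one combination per column
as_matrix <- combn(v, 2)

# simplify = FALSE: a list of three length-2 vectors
as_list <- combn(v, 2, simplify = FALSE)

dim(as_matrix)    # dimensions of the matrix result
lengths(as_list)  # length of each vector in the list result
```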

Here is a data.table option that might help (df is assumed to be the input table with the list column x):
library(data.table)
library(tidyr)
unnest(setDT(df)[, combination := lapply(x, function(v) combn(v, 2, simplify = FALSE))], combination)

Related

Remove empty lists from a tibble in R

I am trying to remove any list from my tibble that has "<chr [0]>"
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df
#> # A tibble: 3 × 3
#> x y z
#> <int> <list> <list>
#> 1 1 <chr [0]> <chr [1]>
#> 2 2 <chr [0]> <chr [1]>
#> 3 3 <chr [0]> <chr [0]>
Created on 2022-02-15 by the reprex package (v2.0.1)
I want my tibble to look like this
#> # A tibble: 3 × 3
#> x z
#> <int> <list>
#> 1 1 <chr [1]>
#> 2 2 <chr [1]>
#> 3 3 NA
Any help is appreciated.
You can do:
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(z = lapply(z, function(x) ifelse(length(x) == 0, NA, x)))
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <lgl [1]>
Note that your z column can't hold list elements in rows 1 and 2 and a bare logical NA in row 3. The whole column needs to be a list.
If all elements of z have only one element, you can add another line of code with mutate(z = unlist(z)).
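To illustrate the unlist step in isolation, here is a small base R sketch. It assumes a plain list z shaped like the column in the question, and it swaps ifelse for an if/else that fills with NA_character_ so the flattened result stays a character vector; z_filled and z_flat are illustrative names.

```r
z <- list("ATC", "TAC", character())

# Replace zero-length elements with a typed NA so every element has length 1
z_filled <- lapply(z, function(x) if (length(x) == 0) NA_character_ else x)

# With exactly one value per element, the list flattens to an atomic vector
z_flat <- unlist(z_filled)
z_flat
```

If any element held more than one value, unlist would return more values than there are rows, so this only applies when every element has length one.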
The OP asked for a more dynamic solution that handles several columns.
Here is an example where I simply created another z2 variable. Generally, you can repeat the recoding for several columns using across.
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()),
z2 = z)
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(across(starts_with('z'), ~ lapply(., function(x) ifelse(length(x) == 0, NA, x))))
Which gives:
# A tibble: 3 x 3
x z z2
<int> <list> <list>
1 1 <chr [1]> <chr [1]>
2 2 <chr [1]> <chr [1]>
3 3 <lgl [1]> <lgl [1]>
A two-step way using base R:
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df <- df[apply(df, 2, function(x) any(lapply(x, length) > 0))] #Remove empty columns
df[apply(df, 2, function(x) lapply(x, length) == 0)] <- NA #Replace empty lists with NA
df
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <NULL>
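The same two steps can also be sketched with lengths(), which reports the length of each list element directly. This is only an illustration under stated assumptions: it uses a plain data.frame rather than a tibble, and all_empty is a hypothetical helper name.

```r
df <- data.frame(x = 1:3)
df$y <- list(character(), character(), character())
df$z <- list("ATC", "TAC", character())

# Step 1: flag list columns whose elements are all empty, then drop them
all_empty <- vapply(
  df,
  function(col) is.list(col) && all(lengths(col) == 0),
  logical(1)
)
df <- df[!all_empty]

# Step 2: replace the remaining empty elements with NA (still inside the list)
df$z[lengths(df$z) == 0] <- list(NA_character_)

names(df)
```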

Find differences in character column in R

I have a dataframe with ICPM codes before and after recoding of an operation.
df1 <- tibble::tribble(~ops, ~opsalt,
"8-915, 5-847.32", "5-847.32, 5-852.f3, 8-915",
"8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81", "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915",
"5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e", "5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1",
"8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d", "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915")
I want to compute two columns that contain the codes that differ between the two columns.
For the first row the difference between ops and opsalt would be character(0).
The difference between opsalt and ops would be 5-852.f3.
Tried:
df <- df %>% mutate(ops = strsplit(ops, ",")) %>%
mutate(opsalt = strsplit(opsalt, ","))
df <- df %>% rowwise() %>% mutate(neu_alt = list(setdiff(ops, opsalt))) %>% mutate(alt_neu = list(setdiff(opsalt, ops)))
This didn't work, because I want to compare parts of the respective strings and not the whole string.
It should work if you use ", " in strsplit and df1 in your first mutate call.
library(dplyr)
df1 %>%
mutate(across(everything(), ~ strsplit(.x, ", "))) %>%
rowwise() %>%
mutate(neu_alt = list(setdiff(ops, opsalt)),
alt_neu = list(setdiff(opsalt, ops)))
#> # A tibble: 4 x 4
#> # Rowwise:
#> ops opsalt neu_alt alt_neu
#> <list> <list> <list> <list>
#> 1 <chr [2]> <chr [3]> <chr [0]> <chr [1]>
#> 2 <chr [6]> <chr [7]> <chr [0]> <chr [1]>
#> 3 <chr [5]> <chr [7]> <chr [0]> <chr [2]>
#> 4 <chr [10]> <chr [10]> <chr [1]> <chr [1]>
Created on 2022-01-04 by the reprex package (v0.3.0)
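To see why the separator matters, here is a base R sketch on the first row only (the variable names are illustrative). Splitting on "," alone leaves a leading space on every code after the first, so setdiff treats " 8-915" and "8-915" as different strings.

```r
ops    <- "8-915, 5-847.32"
opsalt <- "5-847.32, 5-852.f3, 8-915"

# Splitting on "," keeps the leading spaces, so nothing matches as expected
bad_a <- strsplit(ops, ",", fixed = TRUE)[[1]]
bad_b <- strsplit(opsalt, ",", fixed = TRUE)[[1]]
bad_diff <- setdiff(bad_a, bad_b)

# Splitting on ", " yields clean codes
a <- strsplit(ops, ", ", fixed = TRUE)[[1]]
b <- strsplit(opsalt, ", ", fixed = TRUE)[[1]]
neu_alt <- setdiff(a, b)  # codes only in ops
alt_neu <- setdiff(b, a)  # codes only in opsalt
```

Here neu_alt is character(0) and alt_neu is "5-852.f3", matching the expectation stated in the question, while bad_diff wrongly reports both codes of ops as differing.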
If you want to keep them as strings, you can try this method. If you intend to perform similar operations repeatedly, I suggest retaining the list-columns instead of calling strsplit on them each time.
df1 %>%
mutate(
d = mapply(function(...) toString(setdiff(...)),
strsplit(ops, "[ ,]+"), strsplit(opsalt, "[ ,]+"))
)
# # A tibble: 4 x 3
# ops opsalt d
# <chr> <chr> <chr>
# 1 8-915, 5-847.32 5-847.32, 5-852.f3, 8-915 ""
# 2 8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81 5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915 ""
# 3 5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e 5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1 ""
# 4 8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d 5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915 "5-783.2d"
(I recommend using list-columns, though, as demonstrated in TimTeaFan's answer.)

Concatenate two columns with different vector sizes in R [duplicate]

I have a data frame that has two columns, a and b, that contain either single values or a vector of values in specific rows. I want to combine the two columns so that the values of both columns are concatenated into a single vector per row. However, when I use the paste function, I am unable to concatenate the values in each row into a single vector.
The following is a reproducible example of this problem:
library(tibble)
library(tidyverse)
data_frame <-
tribble(
~a, ~b,
50, 3,
17, 50,
c("21", "19"), 50,
c("1", "10"), c("50", "51")
)
data_frame %>%
mutate(new_column = paste(a, b))
#> # A tibble: 4 x 3
#> a b new_column
#> <list> <list> <chr>
#> 1 <dbl [1]> <dbl [1]> "50 3"
#> 2 <dbl [1]> <dbl [1]> "17 50"
#> 3 <chr [2]> <dbl [1]> "c(\"21\", \"19\") 50"
#> 4 <chr [2]> <chr [2]> "c(\"1\", \"10\") c(\"50\", \"51\")"
In the new_column column, I want the results to be as follows:
c("50", "3")
c("17", "50")
c("21", "19", "50")
c("1", "10", "50", "51")
Is there a way that I can combine the columns a and b to get the result in the above format? Thank you.
To combine two columns you can use c. In base R, you can do this with Map :
data_frame$new_col <- Map(c, data_frame$a, data_frame$b)
Or in tidyverse use map2 :
library(dplyr)
library(purrr)
data_frame %>% mutate(new_col = map2(a, b, c))
# A tibble: 4 x 3
# a b new_col
# <list> <list> <list>
#1 <dbl [1]> <dbl [1]> <dbl [2]>
#2 <dbl [1]> <dbl [1]> <dbl [2]>
#3 <chr [2]> <dbl [1]> <chr [3]>
#4 <chr [2]> <chr [2]> <chr [4]>
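A small base R sketch of the same idea on plain lists (a, b, combined, and combined_chr are illustrative names). Note that c() only coerces to character when at least one input is character, so the first two rows stay numeric; the extra as.character step is an assumption about the desired output, based on the quoted strings in the question.

```r
a <- list(50, 17, c("21", "19"), c("1", "10"))
b <- list(3, 50, 50, c("50", "51"))

# Element-wise concatenation: one combined vector per row
combined <- Map(c, a, b)

# Optional: force every combined vector to character
combined_chr <- lapply(combined, as.character)

str(combined_chr)
```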

How to deal with lists of lists when the first index represents rows?

How can I convert a list of lists to a data frame, where the first "layer" of lists represents the rows?
myList = list(
list(name="name1",num=20,dogs=list("dog1")),
list(name="name2",num=13,dogs = list()),
list(name="name3",num=5,dogs=list("dog2","dog4"))
)
My first idea was to unlist the elements in the "third layer"
myUnList = sapply(myList,function(x){y=x;y$dogs = unlist(y$dogs);y})
I can create a tibble
tibble(myUnList)
# A tibble: 3 x 1
myUnList
<list>
1 <list [3]>
2 <list [2]>
3 <list [3]>
Note that if myList[[1]] represented the vector of names, this would be simple, but I'm having trouble tidying data presented the other way around. I thought about using purrr to "invert" the order.
Expected result:
# A tibble: 3 x 3
names num dogs
<list> <list> <list>
1 <chr [1]> <dbl [1]> <list [1]>
2 <chr [1]> <dbl [1]> <list [0]>
3 <chr [1]> <dbl [1]> <list [2]>
Are there other types of data structures that support varying-length entries?
We can extract the list elements with the map functions from the purrr package and then create a new tibble using tibble() (data_frame() is deprecated).
library(tidyverse)
dat <- tibble(name = map_chr(myList, "name"),
num = map_dbl(myList, "num"),
dogs = map(myList, "dogs"))
dat
# # A tibble: 3 x 3
# name num dogs
# <chr> <dbl> <list>
# 1 name1 20.0 <list [1]>
# 2 name2 13.0 <NULL>
# 3 name3 5.00 <list [2]>
And if you prefer everything to be a list column, replace map_chr and map_dbl with map.
dat <- tibble(name = map(myList, "name"),
num = map(myList, "num"),
dogs = map(myList, "dogs"))
dat
# name num dogs
# <list> <list> <list>
# 1 <chr [1]> <dbl [1]> <list [1]>
# 2 <chr [1]> <dbl [1]> <NULL>
# 3 <chr [1]> <dbl [1]> <list [2]>
After some time playing around with purrr, I found another solution that doesn't require typing the names (which could be troublesome for really large lists).
myList %>% transpose() %>% simplify_all() %>% as_tibble()
Results in
# A tibble: 3 x 3
name num dogs
<chr> <dbl> <list>
1 name1 20 <list [1]>
2 name2 13 <list [0]>
3 name3 5 <list [2]>
The transpose function from purrr makes this type of conversion automatically.
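For intuition, the "inversion" can be sketched in base R without purrr. This is an illustration only, assuming every row has the same field names as the first one; fields and columns are hypothetical names.

```r
myList <- list(
  list(name = "name1", num = 20, dogs = list("dog1")),
  list(name = "name2", num = 13, dogs = list()),
  list(name = "name3", num = 5,  dogs = list("dog2", "dog4"))
)

# Take the field names from the first row (assumed identical across rows)
fields <- names(myList[[1]])

# For each field, collect that field from every row: rows become columns
columns <- lapply(setNames(fields, fields), function(field) {
  lapply(myList, function(row) row[[field]])
})

# Flatten the columns that hold exactly one value per row
columns$name <- unlist(columns$name)
columns$num  <- unlist(columns$num)

str(columns, max.level = 1)
```

The dogs column stays a list because its entries vary in length, which is the case list-columns are meant for.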

R: Join two tables (tibbles) by *list* columns

Seems like there should be a simple answer for this but I haven't been able to find one:
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib1
# A tibble: 3 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [1]>
3 <dbl [1]> <dbl [1]>
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
tib2
# A tibble: 4 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
2 <dbl [1]> <dbl [2]>
3 <dbl [1]> <dbl [1]>
4 <dbl [1]> <dbl [1]>
dplyr::inner_join(tib1, tib2)
Joining, by = c("x", "y")
Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
Can't join on 'x' x 'x' because of incompatible types (list / list)
So is there a way to perform a join based on list columns (before I start writing my own)?
Basically if the list of both key variables is identical, I want the row to be included in the final table, and if not - not. In the above example there are two key variables x and y and the result should be only the first row in the two tibbles since it's the only identical one in both key variables:
tibble(x = list(1), y = list(4))
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
We could use hashes from digest:
library(dplyr)
library(purrr)
tib1 <- tibble(x = list(1, 2, 3), y = list(4, 5, 6))
tib2 <- tibble(x = list(1, 2, 4, 5), y = list(4, c(5, 10), 6, 7))
# Add one hash column per key column (x_hash, y_hash)
add_hashes <- function(tb) mutate(tb, across(everything(), ~ map_chr(.x, digest::digest), .names = "{.col}_hash"))
inner_join(add_hashes(tib1), add_hashes(tib2), by = c("x_hash", "y_hash")) %>%
select(x = x.x, y = y.x)
# A tibble: 1 × 2
x y
<list> <list>
1 <dbl [1]> <dbl [1]>
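As a dependency-free illustration of the matching rule itself (a row is kept only if both key elements are identical), here is a base R sketch on plain lists. It is a semi-join check rather than a full join, compares every pair of rows (so it is quadratic), and uses illustrative names throughout.

```r
x1 <- list(1, 2, 3)
y1 <- list(4, 5, 6)
x2 <- list(1, 2, 4, 5)
y2 <- list(4, c(5, 10), 6, 7)

# For each row i of the first table, look for a row j of the second table
# whose x and y elements are both identical
has_match <- vapply(seq_along(x1), function(i) {
  any(vapply(seq_along(x2), function(j) {
    identical(x1[[i]], x2[[j]]) && identical(y1[[i]], y2[[j]])
  }, logical(1)))
}, logical(1))

has_match
```

Only the first row matches: row 2 fails because 5 is not identical to c(5, 10). Hashing each element, as in the answer above, expresses the same rule through an ordinary join on character keys.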

Resources