Join 2 nested lists - r

I want to combine two lists
list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))
Desired Output:
combined_list <- list()
combined_list[[1]] <- c("a", "1")
combined_list[[2]] <- c("b", "2")
combined_list[[3]] <- c("c", "3")
I have a nasty for loop way of doing this but I'd like to clean it up using purrr maybe? Any help appreciated!!

Here's a variant that recursively concatenates two nested lists of the same structure and preserves that structure
# Add additional checks if you expect the structures of .x and .y may differ
f <- function(.x, .y)
if(is.list(.x)) purrr::map2(.x, .y, f) else c(.x, .y)
res <- f( list_1, list_2 )
# ...is identical to...
# list(LIST1 = list(list(c("a","1")), list(c("b","2")), list(c("c","3"))))
You can then unroll the structure as needed. For example, to get the desired output, you can do
purrr::flatten(purrr::flatten(res))
# [[1]]
# [1] "a" "1"
#
# [[2]]
# [1] "b" "2"
#
# [[3]]
# [1] "c" "3"
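If you would rather avoid the purrr dependency, the same recursion can be sketched in base R. This is only an illustration: combine_nested is a hypothetical name, and the length check is just one possible form of the "additional checks" mentioned in the comment above.

```r
# Hypothetical base R counterpart of f(): recurse while both inputs are lists,
# concatenate once the leaves are reached.
combine_nested <- function(x, y) {
  if (is.list(x)) {
    # Assumed check: both sides must be lists of the same length
    stopifnot(is.list(y), length(x) == length(y))
    Map(combine_nested, x, y)
  } else {
    c(x, y)
  }
}

list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))

res <- combine_nested(list_1, list_2)

# Drop two levels of nesting (and the names unlist() generates along the way)
flat <- unname(unlist(unlist(res, recursive = FALSE), recursive = FALSE))
```

As with map2, the names of the result come from the first argument, so the outer element is called LIST1.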

There are a few odd things about your input, so I am not sure this will fully generalize to your real situation. If it does not, please expand your example. For one, each list has only one element, and the individual letters are each wrapped in a list of their own. I get around that by indexing the input lists with [[1]] and flattening the output with as.character.
list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))
library(purrr)
combined_list <- map2(list_1[[1]], list_2[[1]], c) %>%
map(as.character)
str(combined_list)
#> List of 3
#> $ : chr [1:2] "a" "1"
#> $ : chr [1:2] "b" "2"
#> $ : chr [1:2] "c" "3"
Created on 2019-11-07 by the reprex package (v0.3.0)

You can actually use this one line:
map2(list_1,list_2,map2,~paste(c(..1,..2)))[[1]]
Output:
[[1]]
[1] "a" "1"
[[2]]
[1] "b" "2"
[[3]]
[1] "c" "3"

Related

Combining lists, possibly with mapply

I have a list of lists - a simple example is given below:
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
I would like to combine a subset of these lists based on their indices in a vector. For example, if
my_indices <- c(1,2,3), I would like to combine the first three lists and eliminate duplicates to get
c(1, 2, 3, 6, "A")
I can do this manually as follows:
c(my_list[[1]], my_list[[2]], my_list[[3]]) %>%
unique()
[1] "1" "2" "3" "6" "A"
but when I try to simplify / generalize this to
my_indices <- c(1, 2, 3)
c(my_list[[my_indices ]]) %>%
unique()
I get an error message:
error in my_list[[my_indices]] : recursive indexing failed at level 2
How can I combine lists in this setting? I do want a general solution, as my list of lists is large and I want to be able to extract any subset of it. I have seen posts that use mapply in a related setting, but have not managed to get it to work.
Many thanks in advance for your help
Thomas Philips
c(1, 2, 3, 6, "A") is not what you think: it will be coerced to c("1", "2", "3", "6", "A"). If you want mixed classes, you cannot unlist; it must stay a list.
Some thoughts:
my_list[my_indices]
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 1 2 6
# [[3]]
# [1] "A"
unlist(my_list[my_indices])
# [1] "1" "2" "3" "1" "2" "6" "A"
unique(unlist(my_list[my_indices]))
# [1] "1" "2" "3" "6" "A"
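To see why the [[ attempt in the question fails while [ works, here is a small sketch using the same my_list as in the question. With a vector index, [[ indexes recursively instead of selecting several elements, so c(1, 2, 3) asks it to go deeper than this data allows.

```r
my_list <- list(c(1, 2, 3), c(1, 2, 6), "A", c("A", "B"))

# `[` with an index vector returns a sub-list of the selected elements
sub <- my_list[c(1, 2, 3)]

# `[[` with an index vector drills down: element 1, then its 2nd entry
drilled <- my_list[[c(1, 2)]]

# c() coerces mixed numeric/character input to character
mixed <- c(1, 2, 3, 6, "A")
```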
To preserve class and ensure uniqueness, you can do
func <- function(a, b) {
a_chrs <- as.character(a)
b_chrs <- as.character(b)
b[ match(setdiff(b_chrs, a_chrs), b_chrs) ]
}
Reduce(func, my_list[my_indices], accumulate = TRUE)
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 6
# [[3]]
# [1] "A"
The _chrs fancy footwork is because setdiff by itself will not reduce correctly:
out <- Reduce(setdiff, my_list[my_indices], accumulate = TRUE)
out
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 3
# [[3]]
# [1] 3
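If you need that with individually-indexable values, store the result of the Reduce(func, ...) call above (not the setdiff one) and then
out <- Reduce(func, my_list[my_indices], accumulate = TRUE)
unlist(lapply(out, as.list), recursive = FALSE)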
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
# [[4]]
# [1] 6
# [[5]]
# [1] "A"
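One caveat, as far as I can tell: with accumulate = TRUE, func compares each vector only with the previous result, so a value from an earlier vector could reappear later. A minimal sketch that tracks everything seen so far is below; unique_keep_class is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: keep each vector's class, drop values already seen
# in any earlier vector (compared via their character representation).
unique_keep_class <- function(lst) {
  seen <- character(0)
  out <- vector("list", length(lst))
  for (i in seq_along(lst)) {
    chrs <- as.character(lst[[i]])
    is_new <- !(chrs %in% seen) & !duplicated(chrs)
    out[[i]] <- lst[[i]][is_new]
    seen <- c(seen, chrs[is_new])
  }
  out
}

# The third vector repeats "1" from the first one
res <- unique_keep_class(list(c(1, 2, 3), c(1, 2, 6), c("1", "A")))
```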
Here's a tidyverse solution using reduce.
library(tidyverse)
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
to_merge <- c(1,2,3)
unique(reduce(my_list[to_merge], c))
#> [1] "1" "2" "3" "6" "A"
Created on 2021-01-08 by the reprex package (v0.3.0)

Indexing named list with vector in R

How would you index the second element of a vector which is stored as a value in a named list?
I start with this:
hi <- list("1" = c("a","b"),
"2" = c("dog","cat"),
"3" = c("sister","brother")
)
and would like to end up with a named list with the key plus the 2nd element of the vector i.e:
list("1" = "b",
"2" = "cat",
"3" = "brother"
)
You can do:
lapply(hi, `[`, 2)
$`1`
[1] "b"
$`2`
[1] "cat"
$`3`
[1] "brother"
We can use map
library(purrr)
map(hi, pluck, 2)
#$`1`
#[1] "b"
#$`2`
#[1] "cat"
#$`3`
#[1] "brother"
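If a plain named character vector is enough, a base R sketch with vapply is below; wrapping it in as.list gives the named list from the question. This assumes every vector has at least two elements (a shorter one would yield NA).

```r
hi <- list("1" = c("a", "b"),
           "2" = c("dog", "cat"),
           "3" = c("sister", "brother"))

# vapply checks that each result is a single character value
second <- vapply(hi, function(v) v[2], character(1))

# Back to a named list, if that shape is required
second_list <- as.list(second)
```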

Creating several new vectors from an original vector with separators

I'm trying to create several vectors from an original vector.
I read some posts but couldn't find something to solve my problem.
My original vector is looking like this:
> orig_vec
[1] "A" "B" "C" "D;" "1" "2;" "a1" "a2" "a3"
I want vectors that look like this:
> vector1
[1] "A" "B" "C" "D"
> vector2
[1] "1" "2"
> vector3
[1] "a1" "a2" "a3"
So what I need is a code which recognizes the semicolons as separators and creates new vectors depending on the number of separated values in "orig_vec".
I also have the problem that the "orig_vec" can change.
When it looks like this:
> orig_vec
[1] "A" "B" "C" "D" "E;" "1" "2;" "a1" "a2" "a3;" "b1"
I need to get automatically these vectors:
> vector1
[1] "A" "B" "C" "D" "E"
> vector2
[1] "1" "2"
> vector3
[1] "a1" "a2" "a3"
> vector4
[1] "b1"
I'm sorry that I can't provide more code or any idea of a solution.
This should work:
x <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3")
sapply(split(x, c(0, cumsum(grepl(";", x))[-length(x)])), function(x) gsub(";", "", x))
$`0`
[1] "A" "B" "C" "D"
$`1`
[1] "1" "2"
$`2`
[1] "a1" "a2" "a3"
We use the cumsum() of condition grepl(";", x) to create a vector for subsetting with split(), then remove the semicolons by sapply()ing gsub().
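The steps described above can be sketched one at a time. The intermediate names (ends, group, parts) are only for illustration, and lapply is used here instead of sapply so the result stays a list even if all groups happen to have the same length.

```r
x <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3")

# TRUE where an element closes a group
ends <- grepl(";", x, fixed = TRUE)

# Shift the running count by one so the ";" element stays in its own group
group <- c(0, cumsum(ends)[-length(x)])
# group is 0 0 0 0 1 1 2 2 2

# Split by group, then strip the separators
parts <- lapply(split(x, group), function(v) gsub(";", "", v, fixed = TRUE))
```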
I like @LAP's as well, here's another option:
vec <- c("A", "B", "C", "D;", "1", "2;", "a1", "a2", "a3;", "b1")
ix <- grep(";", vec)
mapply(function(x, ix1, ix2) x[ix1:ix2],
x = list(sub(";", "", vec)),
ix1 = c(1, ix + 1),
ix2 = c(ix, length(vec)))
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "1" "2"
[[3]]
[1] "a1" "a2" "a3"
[[4]]
[1] "b1"
You'll notice most people are giving you answers that result in a list of vectors, rather than a handful of vectors assigned to variable names. It's generally much cleaner and easier to work with lists of objects rather than objects scattered around in your namespace. Just an added $.02.
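To illustrate that point, a list can carry the names you would otherwise give to separate variables. A small sketch, with the list contents typed in by hand and the vector1, vector2, ... names taken from the question:

```r
parts <- list(c("A", "B", "C", "D"), c("1", "2"), c("a1", "a2", "a3"))

# Name the elements instead of creating vector1, vector2, ... by hand
names(parts) <- paste0("vector", seq_along(parts))

second <- parts$vector2   # access one vector by name
sizes  <- lengths(parts)  # or work on all of them at once
```

This keeps working unchanged when the number of groups changes.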
Here is one way, based on the idea of first joining on a space then successively splitting, first on ; and then on a space:
s <- c("A", "B", "C", "D;", "1" , "2;" ,"a1", "a2", "a3")
s <- paste0(s,collapse = ' ')
s <- unlist(strsplit(s, ';'))
vectors <- lapply(s,function(x) unlist(strsplit(trimws(x),' ')))
> vectors
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "1" "2"
[[3]]
[1] "a1" "a2" "a3"
Just throwing in a tidyverse approach that works in a single pipe.
Similar to other answers, collapse the vector into a single string, then split that string on each ;. I'm using a space as the collapse so I can use str_trim easily later on.
library(tidyverse)
x <- c("A", "B", "C", "D", "E;", "1", "2;", "a1", "a2", "a3;", "b1")
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE)
#> [[1]]
#> [1] "A B C D E" " 1 2" " a1 a2 a3" " b1"
Since strsplit gives you a list and, at least in this scenario, you're only interested in the first list entry, pull it out with [[ and trim the leading and trailing spaces from each string. The map gives you a list of vectors of one string each.
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE) %>%
`[[`(1) %>%
map(str_trim)
#> [[1]]
#> [1] "A B C D E"
#>
#> [[2]]
#> [1] "1 2"
#>
#> [[3]]
#> [1] "a1 a2 a3"
#>
#> [[4]]
#> [1] "b1"
Then split each vector by the spaces, and flatten into one list of vectors.
All in one pipe:
x %>%
paste(collapse = " ") %>%
strsplit(split = ";", fixed = TRUE) %>%
`[[`(1) %>%
map(str_trim) %>%
map(str_split, " ") %>%
flatten()
#> [[1]]
#> [1] "A" "B" "C" "D" "E"
#>
#> [[2]]
#> [1] "1" "2"
#>
#> [[3]]
#> [1] "a1" "a2" "a3"
#>
#> [[4]]
#> [1] "b1"
Created on 2019-02-13 by the reprex package (v0.2.1)

R - How do I check if an element is in a list of vectors?

Ok, my question might be a bit weirder than what the title suggests.
I have this list:
x <- list(
c("a", "d"),
c("a", "c"),
c("d", "e"),
c("e", "f"),
c("b", "c"),
c("f", "c"), # row 6
c("c", "e"),
c("f", "b"),
c("b", "a")
)
And I need to copy this stuff in another list called T. The only condition is that both letters of the pair must not be in T already. If one of them is already in T and the other isn't it's fine.
Basically in this example I would take the first 5 positions and copy them in T one after another because either one or both letters are new to T.
Then I would skip the 6th position because the letter "f" was already in the 4th position of T and the letter "c" is already in the 2nd and 5th positions of T.
Then I would skip the remaining 3 positions for the same reason (the letters "c", "e", "f", "b", "a" are already in T at this point)
I tried doing this
for(i in 1:length(T)){
if (!( *first letter* %in% T && *second letter* %in% T)) {
T[[i]] <- c(*first letter*, *second letter*)
}
}
But it's like the "if" isn't even there, and I'm pretty sure I'm using %in% in the wrong way.
Any suggestions? I hope what I wrote makes sense, I'm new to R and to this site in general.
Thanks for your time
Effectively, for each element of the list, you want to lose it if both of its elements exist in earlier elements. A logical index is helpful here.
# Make a logical vector the length of x.
lose <- logical(length(x))
Now you can loop over the length of lose and compare each element of x against all previous elements of x. Using seq_len saves us the headache of having to guard against the special case of i = 1: seq_len(0) returns a zero-length integer vector, whereas 1:0 would return c(1, 0).
for (i in seq_along(lose)){
lose[i] <- all(x[[i]] %in% unique(unlist(x[seq_len(i - 1)])))
}
Now let's use the logical vector to subset x to T
T <- x[!lose]
T
#> [[1]]
#> [1] "a" "d"
#>
#> [[2]]
#> [1] "a" "c"
#>
#> [[3]]
#> [1] "d" "e"
#>
#> [[4]]
#> [1] "e" "f"
#>
#> [[5]]
#> [1] "b" "c"
# Created on 2018-07-19 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
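The seq_len point in this answer can be checked directly. A short sketch with a two-element list; the names safe and unsafe are only for illustration:

```r
x <- list(c("a", "d"), c("a", "c"))

# For i = 1 there are no earlier elements
empty_idx <- seq_len(0)   # zero-length integer vector
countdown <- 1:0          # counts down instead of being empty

# With seq_len(), the first pair is compared against nothing
safe <- all(x[[1]] %in% unlist(x[seq_len(0)]))

# With 1:0, the index 0 is ignored and the first pair is compared with itself
unsafe <- all(x[[1]] %in% unlist(x[1:0]))
```

Here safe is FALSE (the first pair is kept), while unsafe is TRUE (the first pair would wrongly be dropped).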
You can put the set of all previous elements in a list cum.sets, then use Map to check if all elements of the current vector are in the lagged cumulative set.
cum.sets <- lapply(seq_along(x), function(y) unlist(x[1:y]))
keep <- unlist(
Map(function(x, y) !all(x %in% y)
, x
, c(NA, cum.sets[-length(cum.sets)])))
x[keep]
# [[1]]
# [1] "a" "d"
#
# [[2]]
# [1] "a" "c"
#
# [[3]]
# [1] "d" "e"
#
# [[4]]
# [1] "e" "f"
#
# [[5]]
# [1] "b" "c"
tidyverse version (same output)
library(tidyverse)
cum.sets <- imap(x, ~ unlist(x[1:.y]))
keep <- map2_lgl(x, lag(cum.sets), ~!all(.x %in% .y))
x[keep]
You can use Reduce. In this case, if not all of the new values are already in the list, the vector is appended to the list; otherwise it is dropped. The initial value is the first element of the list:
Reduce(function(i, y) c(i, if(!all(y %in% unlist(i))) list(y)), x[-1],init = x[1])
[[1]]
[1] "a" "d"
[[2]]
[1] "a" "c"
[[3]]
[1] "d" "e"
[[4]]
[1] "e" "f"
[[5]]
[1] "b" "c"
The most straightforward option is that you could store unique entries in another vector as you're looping through your input data.
Here's a solution that does not consider the positions (1 or 2) of the letters in your output list or the order of your input list.
dat <- list(c('a','d'),c('a','c'),c('d','e'),c('e','f'),c('b','c'),
c('f','c'),c('c','e'),c('f','b'),c('b','a'))
Dat <- list()
idx <- list()
for(i in dat){
if(!all(i %in% idx)){
Dat <- append(Dat, list(i))
## append to idx if not previously observed
if(! i[1] %in% idx) idx <- append(idx, i[1])
if(! i[2] %in% idx) idx <- append(idx, i[2])
}
}
print(Dat)
#> [[1]]
#> [1] "a" "d"
#>
#> [[2]]
#> [1] "a" "c"
#>
#> [[3]]
#> [1] "d" "e"
#>
#> [[4]]
#> [1] "e" "f"
#>
#> [[5]]
#> [1] "b" "c"
On another note, I'd advise against using T as your vector name as it's used as TRUE in R.
We can unlist, check duplicated values with duplicated, reformat as a matrix and filter out pairs of TRUE values:
x[colSums(matrix(duplicated(unlist(x)), nrow = 2)) != 2]
# [[1]]
# [1] "a" "d"
#
# [[2]]
# [1] "a" "c"
#
# [[3]]
# [1] "d" "e"
#
# [[4]]
# [1] "e" "f"
#
# [[5]]
# [1] "b" "c"
#
And I recommend you don't use T as a variable name: it means TRUE by default (though using it as such is discouraged), so this could lead to unpleasant debugging.
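The duplicated/matrix one-liner above can be unpacked step by step. This sketch uses illustrative intermediate names and assumes, as the one-liner does, that every vector in x has exactly two elements:

```r
x <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
          c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

# TRUE for every letter that already appeared earlier in the flattened list
seen_before <- duplicated(unlist(x))

# One column per pair, one row per position within the pair
m <- matrix(seen_before, nrow = 2)

# A pair is dropped only if both of its letters were seen before
both_seen <- colSums(m) == 2
kept <- x[!both_seen]
```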

Select nested sublist of a list based on condition in R

I do have the following simple example of a nested list:
t1 <- list(list(structure(list(group = "a", def = "control"), .Names = c("group",
"def"))), list(structure(list(group = "b", def = "disease1"), .Names = c("group",
"def"))))
The structure is as follows:
str(t1)
List of 2
$ :List of 1
..$ :List of 2
.. ..$ group: chr "a"
.. ..$ def : chr "control"
$ :List of 1
..$ :List of 2
.. ..$ group: chr "b"
.. ..$ def : chr "disease1"
Is there an easy way of getting only the nested list that satisfies a specific condition. As an example, if I knew only the name of the group, e.g., "a", how would I get the according sublist; in the example, this would be the first nested list:
[[1]]
[[1]]$group
[1] "a"
[[1]]$def
[1] "control"
So essentially I am looking for a way to apply group == "a" in this nested list structure.
We can extract a sublist with lapply and wrap it in a function (lst is the nested list from the question):
get_sublist <- function(group_name) {
lst[lapply(lst, function(x) x[[1]][[1]]) == group_name]
}
get_sublist("a")
#[[1]]
#[[1]][[1]]
#[[1]][[1]]$group
#[1] "a"
#[[1]][[1]]$def
#[1] "control"
get_sublist("b")
#[[1]]
#[[1]][[1]]
#[[1]][[1]]$group
#[1] "b"
#[[1]][[1]]$def
#[1] "disease1"
We can convert to tibble and then with map create a logical vector to subset the 'lst'
library(purrr)
library(magrittr)
library(tibble)
lst %>%
map_lgl(., ~map_lgl(., ~as_tibble(.) %>%
.$group=='a')) %>%
extract(lst, .) %>%
.[[1]]
#[[1]]
#[[1]]$group
#[1] "a"
#[[1]]$def
#[1] "control"
Or use the modify_depth
lst %>%
modify_depth(., 2, ~as_tibble(.)[['group']]=='a') %>%
unlist %>%
extract(lst, .)
Here, we assume that the position of 'group' can change in the list.
On top of the answers already provided, I also managed to get the correct result using keep from the purrr package (l is the nested list from the question):
library(purrr)
get_sublist <- function(group_name) {
keep(l, function(x) x[[1]][[1]] == group_name)
}
get_sublist("b")
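For comparison, a base R sketch with Filter that looks up group by name rather than by position and takes the list as an argument instead of reading a global. get_sublist_by_name is a hypothetical name, and t1 is rebuilt here in a simplified form of the question's structure.

```r
t1 <- list(
  list(list(group = "a", def = "control")),
  list(list(group = "b", def = "disease1"))
)

# Hypothetical helper: keep the outer elements whose inner list has the given group
get_sublist_by_name <- function(lst, group_name) {
  Filter(function(x) identical(x[[1]][["group"]], group_name), lst)
}

hit <- get_sublist_by_name(t1, "a")
```

An unknown group name simply returns an empty list.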
