Combining lists, possibly with mapply - r

I have a list of lists - a simple example is given below:
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
I would like to combine a subset of these lists based on their indices in a vector. For example, if
my_indices <- c(1,2,3), I would like to combine the first three lists and eliminate duplicates to get
c(1, 2, 3, 6, "A")
I can do this manually as follows:
c(my_list[[1]], my_list[[2]], my_list[[3]]) %>%
unique()
[1] "1" "2" "3" "6" "A"
but when I try to simplify / generalize this to
my_indices <- c(1, 2, 3)
c(my_list[[my_indices ]]) %>%
unique()
I get an error message:
error in my_list[[my_indices]] : recursive indexing failed at level 2
How can I combine lists in this setting? I do want a general solution, as my list of lists is large, and I want to be able to extract any subset of it. I have seen posts that use mapply in a related setting, but have not managed to get it to work.
Many thanks in advance for your help
Thomas Philips

c(1, 2, 3, 6, "A") is not what you think: it will be coerced to c("1", "2", "3", "6", "A"), because an atomic vector can hold only one type. If you want mixed classes, you cannot unlist; it must stay a list.
Some thoughts:
my_list[my_indices]
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 1 2 6
# [[3]]
# [1] "A"
unlist(my_list[my_indices])
# [1] "1" "2" "3" "1" "2" "6" "A"
unique(unlist(my_list[my_indices]))
# [1] "1" "2" "3" "6" "A"
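The error in the question comes from passing a vector to [[: my_list[[c(1, 2, 3)]] is treated as nested indexing, roughly my_list[[1]][[2]][[3]], whereas single-bracket [ selects several elements at once. A minimal sketch that wraps this in a reusable helper (combine_unique is a hypothetical name, not from the original post):

```r
my_list <- list(c(1, 2, 3), c(1, 2, 6), "A", c("A", "B"))

# Hypothetical helper: select elements with `[`, flatten, then de-duplicate.
# Mixed numeric/character input is coerced to character by unlist().
combine_unique <- function(lst, idx) {
  unique(unlist(lst[idx]))
}

first_three <- combine_unique(my_list, c(1, 2, 3))
last_two <- combine_unique(my_list, c(3, 4))
```

Any index vector works the same way, so the subset can be chosen freely.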
To preserve class and ensure uniqueness, you can do
func <- function(a, b) {
a_chrs <- as.character(a)
b_chrs <- as.character(b)
b[ match(setdiff(b_chrs, a_chrs), b_chrs) ]
}
Reduce(func, my_list[my_indices], accumulate = TRUE)
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 6
# [[3]]
# [1] "A"
The _chrs fancy footwork is because setdiff by itself will not reduce correctly:
out <- Reduce(setdiff, my_list[my_indices], accumulate = TRUE)
out
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 3
# [[3]]
# [1] 3
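If you need that with individually-indexable values, take the result of the Reduce(func, ...) call (not the setdiff one) and flatten it by one level:
out <- Reduce(func, my_list[my_indices], accumulate = TRUE)
unlist(lapply(out, as.list), recursive = FALSE)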
# [[1]]
# [1] 1
# [[2]]
# [1] 2
# [[3]]
# [1] 3
# [[4]]
# [1] 6
# [[5]]
# [1] "A"
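One caveat, as far as I can tell: with accumulate = TRUE, func only compares each vector against the previous accumulated result, not against everything seen so far, so a value from the first vector could reappear in the third. A sketch that tracks all seen values instead (dedupe_keep_class is a hypothetical name; comparing values via as.character is an assumption carried over from the answer above):

```r
# Hypothetical helper: keep each vector's class, drop values already seen
# in any earlier vector (values are compared by their character form).
dedupe_keep_class <- function(lst) {
  seen <- character(0)
  lapply(lst, function(v) {
    keys <- as.character(v)
    keep <- !(keys %in% seen) & !duplicated(keys)
    seen <<- c(seen, keys[keep])  # update the running set of seen values
    v[keep]
  })
}

# The third vector repeats 1, which only the first vector contains.
res <- dedupe_keep_class(list(c(1, 2, 3), c(1, 2, 6), c(1, "A")))
```

The numeric vectors stay numeric and only "A" survives from the third vector.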

Here's a tidyverse solution using reduce.
library(tidyverse)
my_list <- vector(mode = "list", length = 4)
my_list[[1]] <- c(1, 2, 3)
my_list[[2]] <- c(1, 2, 6)
my_list[[3]] <- c("A")
my_list[[4]] <- c("A", "B")
to_merge <- c(1,2,3)
unique(reduce(my_list[to_merge], c))
#> [1] "1" "2" "3" "6" "A"
Created on 2021-01-08 by the reprex package (v0.3.0)

Related

Extract colnames from a nested list of data.frames

I have a nested list of data.frames, what is the easiest way to get the column names of all data.frames?
Example:
d = data.frame(a = 1:3, b = 1:3, c = 1:3)
l = list(a = d, list(b = d, c = d))
Result:
$a
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c
[1] "a" "b" "c"
There are already a couple of answers. But let me leave another approach. I used rapply2() in the rawr package.
devtools::install_github('raredd/rawr')
library(rawr)
library(purrr)
rapply2(l = l, FUN = colnames) %>%
flatten
$a
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c
[1] "a" "b" "c"
Here is a base R solution.
You can define a custom function to flatten your nested list (it can deal with nested lists of any depth, e.g., more than 2 levels), i.e.,
flatten <- function(x){
islist <- sapply(x, class) %in% "list"
r <- c(x[!islist], unlist(x[islist],recursive = F))
if(!sum(islist))return(r)
flatten(r)
}
and then use the following code to achieve the colnames
out <- Map(colnames,flatten(l))
such that
> out
$a
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c
[1] "a" "b" "c"
Example with a deeper nested list
l <- list(a = d, list(b = d, list(c = list(e = list(f= list(g = d))))))
> l
$a
a b c
1 1 1 1
2 2 2 2
3 3 3 3
[[2]]
[[2]]$b
a b c
1 1 1 1
2 2 2 2
3 3 3 3
[[2]][[2]]
[[2]][[2]]$c
[[2]][[2]]$c$e
[[2]][[2]]$c$e$f
[[2]][[2]]$c$e$f$g
a b c
1 1 1 1
2 2 2 2
3 3 3 3
and you will get
> out
$a
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c.e.f.g
[1] "a" "b" "c"
Here is an attempt to do this in as vectorized a way as possible:
i1 <- names(unlist(l, TRUE, TRUE))
#[1] "a.a1" "a.a2" "a.a3" "a.b1" "a.b2" "a.b3" "a.c1" "a.c2" "a.c3" "b.a1" "b.a2" "b.a3" "b.b1" "b.b2" "b.b3" "b.c1" "b.c2" "b.c3" "c.a1" "c.a2" "c.a3" "c.b1" "c.b2" "c.b3" "c.c1" "c.c2" "c.c3"
i2 <- names(split(i1, gsub('\\d+', '', i1)))
#[1] "a.a" "a.b" "a.c" "b.a" "b.b" "b.c" "c.a" "c.b" "c.c"
We can now split i2 on everything before the dot, which will give,
split(i2, sub('\\..*', '', i2))
# $a
# [1] "a.a" "a.b" "a.c"
# $b
# [1] "b.a" "b.b" "b.c"
# $c
# [1] "c.a" "c.b" "c.c"
To get them fully cleaned, we need to loop over and apply a simple regex,
lapply(split(i2, sub('\\..*', '', i2)), function(i)sub('.*\\.', '', i))
which gives,
$a
[1] "a" "b" "c"
$b
[1] "a" "b" "c"
$c
[1] "a" "b" "c"
The Code compacted
i1 <- names(unlist(l, TRUE, TRUE))
i2 <- names(split(i1, gsub('\\d+', '', i1)))
final_res <- lapply(split(i2, sub('\\..*', '', i2)), function(i)sub('.*\\.', '', i))
Try this
d = data.frame(a = 1:3, b = 1:3, c = 1:3)
l = list(a = d, list(b = d, c = d))
foo <- function(x, f){
if (is.data.frame(x)) return(f(x))
lapply(x, foo, f = f)
}
foo(l, names)
The crux here is that data.frames are actually a special kind of list, so it's important what you test for.
Small explanation: what needs to be done here is a recursion, since every element you look at may be either a data frame or a list. So at each step you decide whether to apply names or to go deeper into the recursion and call foo again.
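The recursion above keeps the nesting of the input. If a flat named list is wanted instead, one possible variation passes each element's name down the recursion and concatenates the pieces. This is only a sketch: collect_colnames is a hypothetical name, and it assumes each data frame sits under a name of its own.

```r
d <- data.frame(a = 1:3, b = 1:3, c = 1:3)
l <- list(a = d, list(b = d, c = d))

# Hypothetical helper: recurse until a data frame is found, then return its
# column names as a one-element list named after the data frame.
collect_colnames <- function(x, name = NULL) {
  if (is.data.frame(x)) {
    return(setNames(list(colnames(x)), name))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  pieces <- Map(collect_colnames, x, nms)
  do.call(c, unname(pieces))  # splice the per-element lists together
}

flat <- collect_colnames(l)
```

The is.data.frame() test comes first for the same reason as in foo: a data frame is itself a list, so it must be caught before recursing.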
First create l1, a nested list with only the colnames
l1 <- lapply(l, function(x) if(is.data.frame(x)){
list(colnames(x)) #necessary to list it for the unlist() step afterwards
}else{
lapply(x, colnames)
})
Then unlist l1
unlist(l1, recursive=F)
Here is one way using the purrr functions map_depth and vec_depth (vec_depth was renamed pluck_depth in purrr 1.0.0)
library(purrr)
return_names <- function(x) {
if(inherits(x, "list"))
return(map_depth(x, vec_depth(x) - 2, names))
else return(names(x))
}
map(l, return_names)
#$a
#[1] "a" "b" "c"
#[[2]]
#[[2]]$b
#[1] "a" "b" "c"
#[[2]]$c
#[1] "a" "b" "c"
Using an external package, this is also straightforward with rrapply() in the rrapply-package (and works for arbitrary levels of nesting):
library(rrapply)
rrapply(l, classes = "data.frame", f = colnames, how = "flatten")
#> $a
#> [1] "a" "b" "c"
#>
#> $b
#> [1] "a" "b" "c"
#>
#> $c
#> [1] "a" "b" "c"
## deeply nested list
l2 <- list(a = d, list(b = d, list(c = list(e = list(f = list(g = d))))))
rrapply(l2, classes = "data.frame", f = colnames, how = "flatten")
#> $a
#> [1] "a" "b" "c"
#>
#> $b
#> [1] "a" "b" "c"
#>
#> $g
#> [1] "a" "b" "c"

Join 2 nested lists

I want to combine two lists
list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))
Desired Output:
combined_list <- list()
combined_list[[1]] <- c("a", "1")
combined_list[[2]] <- c("b", "2")
combined_list[[3]] <- c("c", "3")
I have a nasty for loop way of doing this but I'd like to clean it up using purrr maybe? Any help appreciated!!
Here's a variant that recursively concatenates two nested lists of the same structure and preserves that structure
# Add additional checks if you expect the structures of .x and .y may differ
f <- function(.x, .y)
if(is.list(.x)) purrr::map2(.x, .y, f) else c(.x, .y)
res <- f( list_1, list_2 )
# ...is identical to...
# list(LIST1 = list(list(c("a","1")), list(c("b","2")), list(c("c","3"))))
You can then unroll the structure as needed. For example, to get the desired output, you can do
purrr::flatten(purrr::flatten(res))
# [[1]]
# [1] "a" "1"
#
# [[2]]
# [1] "b" "2"
#
# [[3]]
# [1] "c" "3"
There are a few odd things about your input, so I am not sure whether this will fully generalize to your real situation. If it does not, please expand your example. For one, each list has only one element, and the individual letters are also wrapped in lists of their own. I get around that by indexing the input lists with [[1]] and flattening the output with as.character.
list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))
library(purrr)
combined_list <- map2(list_1[[1]], list_2[[1]], c) %>%
map(as.character)
str(combined_list)
#> List of 3
#> $ : chr [1:2] "a" "1"
#> $ : chr [1:2] "b" "2"
#> $ : chr [1:2] "c" "3"
Created on 2019-11-07 by the reprex package (v0.3.0)
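For comparison, the same pairing can be sketched in base R with Map, under the same assumption that each input list has a single top-level element wrapping one-element lists:

```r
list_1 <- list(LIST1 = list(list("a"), list("b"), list("c")))
list_2 <- list(LIST2 = list(list("1"), list("2"), list("3")))

# Walk both inner lists in parallel; unlist() strips the extra list layer
# around each value before the two values are concatenated.
combined_list <- Map(function(a, b) c(unlist(a), unlist(b)),
                     list_1[[1]], list_2[[1]])
```

Map returns a plain list here, so no further flattening is needed.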
You can actually use this one line:
map2(list_1,list_2,map2,~paste(c(..1,..2)))[[1]]
Output:
[[1]]
[1] "a" "1"
[[2]]
[1] "b" "2"
[[3]]
[1] "c" "3"

R - How do I check if an element is in a list of vectors?

Ok, my question might be a bit weirder than what the title suggests.
I have this list:
x <- list(
c("a", "d"),
c("a", "c"),
c("d", "e"),
c("e", "f"),
c("b", "c"),
c("f", "c"), # row 6
c("c", "e"),
c("f", "b"),
c("b", "a")
)
And I need to copy this stuff in another list called T. The only condition is that both letters of the pair must not be in T already. If one of them is already in T and the other isn't it's fine.
Basically in this example I would take the first 5 positions and copy them in T one after another because either one or both letters are new to T.
Then I would skip the 6th position because the letter "f" was already in the 4th position of T and the letter "c" is already in the 2nd and 5th positions of T.
Then I would skip the remaining 3 positions for the same reason (the letters "c", "e", "f", "b", "a" are already in T at this point)
I tried doing this
for(i in 1:length(T){
if (!( *first letter* %in% T && *second letter* %in% T)) {
T[[i]] <- c(*first letter*, *second letter*)
}
}
But it's like the "if" isn't even there, and I'm pretty sure I'm using %in% in the wrong way.
Any suggestions? I hope what I wrote makes sense, I'm new to R and to this site in general.
Thanks for your time
Effectively, for each element of the list, you want to lose it if both of its elements exist in earlier elements. A logical index is helpful here.
# Make a logical vector the length of x.
lose <- logical(length(x))
Now you can run a loop over the length of lose and compare each element of x against all previous elements of x. Using seq_len saves us the headache of having to guard against the special case of i = 1: seq_len(0) returns a zero-length integer, whereas 1:(i - 1) would give c(1, 0).
for (i in seq_along(lose)){
lose[i] <- all(x[[i]] %in% unique(unlist(x[seq_len(i - 1)])))
}
Now let's use the logical vector to subset x to T
T <- x[!lose]
T
#> [[1]]
#> [1] "a" "d"
#>
#> [[2]]
#> [1] "a" "c"
#>
#> [[3]]
#> [1] "d" "e"
#>
#> [[4]]
#> [1] "e" "f"
#>
#> [[5]]
#> [1] "b" "c"
# Created on 2018-07-19 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
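Comparing against all previous elements of x (rather than only the kept ones) gives the same result, because a skipped pair never contributes a new letter. The same logic can be sketched as a function that carries a running set of seen letters, which avoids re-flattening the list on every iteration (keep_new_pairs is a hypothetical name):

```r
x <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
          c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

# Hypothetical helper: keep a pair unless all of its letters were seen before.
keep_new_pairs <- function(pairs) {
  seen <- character(0)
  keep <- logical(length(pairs))
  for (i in seq_along(pairs)) {
    keep[i] <- !all(pairs[[i]] %in% seen)
    seen <- union(seen, pairs[[i]])  # extend the set of seen letters
  }
  pairs[keep]
}

kept <- keep_new_pairs(x)
```

For the example data this keeps the first five pairs.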
You can put the set of all previous elements in a list cum.sets, then use Map to check if all elements of the current vector are in the lagged cumulative set.
cum.sets <- lapply(seq_along(x), function(y) unlist(x[1:y]))
keep <- unlist(
Map(function(x, y) !all(x %in% y)
, x
, c(NA, cum.sets[-length(cum.sets)])))
x[keep]
# [[1]]
# [1] "a" "d"
#
# [[2]]
# [1] "a" "c"
#
# [[3]]
# [1] "d" "e"
#
# [[4]]
# [1] "e" "f"
#
# [[5]]
# [1] "b" "c"
tidyverse version (same output)
library(tidyverse)
cum.sets <- imap(x, ~ unlist(x[1:.y]))
keep <- map2_lgl(x, lag(cum.sets), ~!all(.x %in% .y))
x[keep]
You can use Reduce. If not all of the new values are already in the accumulated list, append the new vector to it; otherwise drop it. The initial value is the first element of the list:
Reduce(function(i, y) c(i, if(!all(y %in% unlist(i))) list(y)), x[-1],init = x[1])
[[1]]
[1] "a" "d"
[[2]]
[1] "a" "c"
[[3]]
[1] "d" "e"
[[4]]
[1] "e" "f"
[[5]]
[1] "b" "c"
The most straightforward option is that you could store unique entries in another vector as you're looping through your input data.
Here's a solution that does not consider the positions (1 or 2) of the letters in your output list or the order of your input list.
dat <- list(c('a','d'),c('a','c'),c('d','e'),c('e','f'),c('b','c'),
c('f','c'),c('c','e'),c('f','b'),c('b','a'))
Dat <- list()
idx <- list()
for(i in dat){
if(!all(i %in% idx)){
Dat <- append(Dat, list(i))
## append to idx if not previously observed
if(! i[1] %in% idx) idx <- append(idx, i[1])
if(! i[2] %in% idx) idx <- append(idx, i[2])
}
}
print(Dat)
#> [[1]]
#> [1] "a" "d"
#>
#> [[2]]
#> [1] "a" "c"
#>
#> [[3]]
#> [1] "d" "e"
#>
#> [[4]]
#> [1] "e" "f"
#>
#> [[5]]
#> [1] "b" "c"
On another note, I'd advise against using T as your vector name as it's used as TRUE in R.
We can unlist, check duplicated values with duplicated, reformat as a matrix and filter out pairs of TRUE values:
x[colSums(matrix(duplicated(unlist(x)), nrow = 2)) != 2]
# [[1]]
# [1] "a" "d"
#
# [[2]]
# [1] "a" "c"
#
# [[3]]
# [1] "d" "e"
#
# [[4]]
# [1] "e" "f"
#
# [[5]]
# [1] "b" "c"
#
And I recommend you don't use T as a variable name: it means TRUE by default (though it's discouraged to use it as such), and this could lead to unpleasant debugging.
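The matrix trick above relies on every vector having exactly two elements. A sketch of the same duplicated() idea for vectors of varying length groups the flags by their source element instead (the variable names here are illustrative only):

```r
x <- list(c("a", "d"), c("a", "c"), c("d", "e"), c("e", "f"), c("b", "c"),
          c("f", "c"), c("c", "e"), c("f", "b"), c("b", "a"))

# Flag every value that already occurred earlier in the flattened list.
dup <- duplicated(unlist(x))

# Record which list element each flattened value came from.
grp <- factor(rep(seq_along(x), lengths(x)), levels = seq_along(x))

# An element is dropped only if all of its values were seen before.
all_dup <- vapply(split(dup, grp), all, logical(1))
kept <- x[!unname(all_dup)]
```

Note that all() of an empty group is TRUE, so a zero-length vector would be dropped by this sketch.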

Creating strings from dataframe

My dataframe
x1 <- data.frame(C1 = letters[1:4], C3=1:4, C3=letters[11:14])
I need a list where each list element holds two values from a row
x2 <- list(c("a", "1"), c("b", "2"), c("c", "3"), c("d", "4"))
Basically, the two values from each row need to form one list element so that I can process them later on!
I tried
lapply(X = x2, MARGIN = 1, FUN = paste, collapse = "")
But that did not give me the desired output!
Is this what you want?
paste0(x1[,1], x1[,2])
# [1] "a1" "b2" "c3" "d4"
How about:
as.list(paste0(x1[,1], x1[,2]))
# [[1]]
# [1] "a1"
#
# [[2]]
# [1] "b2"
#
# [[3]]
# [1] "c3"
#
# [[4]]
# [1] "d4"
It doesn't matter how many rows you have. You just need to specify the columns you want pasted into a string.
Here is a method using lapply:
lapply(1:nrow(x1), function(i) c(x1[i,1], x1[i,2]))
The result is
[[1]]
[1] "a" "1"
[[2]]
[1] "b" "2"
[[3]]
[1] "c" "3"
[[4]]
[1] "d" "4"
data
x1 <- data.frame(C1 = letters[1:4], C3=1:4, C3=letters[11:14],
stringsAsFactors = F)
Note that I used the stringsAsFactors = F argument to construct the data. Without it, in R versions before 4.0.0 (where stringsAsFactors defaulted to TRUE), C1 and C3 would be factors, so I'd have to wrap x1[i, 1] in as.character.
If there are multiple columns, we can use do.call
as.list(do.call(paste0, x1[-3]))
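If the goal is the list of pairs from the question rather than pasted strings, the two columns can also be walked in parallel with Map. This is a sketch on a simplified data frame; the column name C2 is an assumption made here to avoid the duplicated C3 in the original data.

```r
# Simplified data (C2 is a hypothetical column name).
x1 <- data.frame(C1 = letters[1:4], C2 = 1:4, stringsAsFactors = FALSE)

# Pair up the two columns element by element. Map names the result after
# the first (character) argument, so unname() drops those names.
pairs <- unname(Map(c, x1$C1, as.character(x1$C2)))
```

Each element of pairs is a character vector of length two, such as c("a", "1").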

How to extract the non-empty elements of list in R?

I have a very big list, but some of its elements (positions) are NULL, meaning there is nothing inside them.
I just want to extract the part of my list that is non-empty. Here is my attempt, but I got an error:
ind<-sapply(mylist, function() which(x)!=NULL)
list<-mylist[ind]
#Error in which(x) : argument to 'which' is not logical
Could someone help me implement it?
You can use the logical negation of is.null here. That can be applied over the list with vapply, and we can return the non-null elements with [
(mylist <- list(1:5, NULL, letters[1:5]))
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# NULL
# [[3]]
# [1] "a" "b" "c" "d" "e"
mylist[vapply(mylist, Negate(is.null), NA)]
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# [1] "a" "b" "c" "d" "e"
Try:
myList <- list(NULL, c(5,4,3), NULL, 25)
Filter(Negate(is.null), myList)
If you don't care about the structure of the result, you can just unlist:
unlist(mylist)
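The error arises because which() is applied to x itself rather than to a logical condition. Moving the comparison inside which() does not help either, since comparing with NULL yields a zero-length logical vector:
which(x != NULL)
# integer(0)
Test for NULL with is.null() instead.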
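One can find the indices of the NULL entries with which() and exclude them from the new list with a negative index. Note that is.null() has to be applied to each element (for example with sapply), and that a negative index built from an empty vector returns an empty list, so the case of no NULL entries needs a guard:
null_idx <- which(sapply(mylist, is.null))
new_list <- if (length(null_idx)) mylist[-null_idx] else mylist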
should do the job :)
Try this:
list(NULL, 1, 2, 3, NULL, 5) %>%
purrr::map_if(is.null, ~ NA_character_) %>% #convert NULL into NA
is.na() %>% #find NA
`!` %>% #Negate
which() #get index of Non-NULLs
or even this:
list(NULL, 1, 2, 3, NULL, 5) %>%
purrr::map_lgl(is.null) %>%
`!` %>% #Negate
which()
MyList <- list(NULL, c(5, 4, 3), NULL, NULL)
[[1]]
NULL
[[2]]
[1] 5 4 3
[[3]]
NULL
[[4]]
NULL
MyList[!unlist(lapply(MyList,is.null))]
[[1]]
[1] 5 4 3
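The answers above drop NULL entries only. If zero-length vectors such as character(0) should also count as empty, lengths() gives a compact test. A small sketch comparing the two (the example list is made up for illustration):

```r
mylist <- list(1:5, NULL, letters[1:5], character(0))

# Drop only the NULL entries; character(0) is kept.
non_null <- mylist[!vapply(mylist, is.null, logical(1))]

# Drop everything of length zero, which covers NULL and character(0).
non_empty <- mylist[lengths(mylist) > 0]
```

Which of the two is appropriate depends on whether a zero-length vector is meaningful in your data.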
