Chaining a list of vectors - R

If I have a list of vectors such as below
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
Is there a way to replace each list element with an element that includes all the other values that the element can be paired with?
For example, the first element (list.x[[1]]) would be replaced with 1:4, because element 2 (list.x[[2]]) shows that 2 is also paired with 3, and element 3 shows that 3 is also paired with 4.
The final result I would like to achieve would be this list
final.list <- list(1:4, 1:4, 1:4, 5:6, 5:6)

I needed a change of pace today, so I decided to try to answer the question using base R. Here it goes:
First, I created a function that unions two vectors if they intersect, and if not, simply returns the first vector:
expand.if.touching <- function(vector1, vector2) {
  i <- intersect(vector1, vector2)
  if (NROW(i) > 0)
    union(vector1, vector2)
  else
    vector1
}
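A quick sanity check of the helper (not part of the original answer): overlapping vectors are merged, disjoint ones are left alone:
expand.if.touching(1:2, 2:3)
# [1] 1 2 3
expand.if.touching(1:2, 5:6)
# [1] 1 2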
Then I made a function that merges one element in the list of vectors with another:
list.reduce <- function(lst) {
  for (v1 in 1:NROW(lst))
    for (v2 in 1:NROW(lst)) {
      if (v1 == v2)
        next
      prevLength <- NROW(lst[[v1]])
      lst[[v1]] <- expand.if.touching(lst[[v1]], lst[[v2]])
      newLength <- NROW(lst[[v1]])
      if (newLength == prevLength)
        next
      return(lst[-v2])
    }
  lst
}
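One pass over the example data merges the first overlapping pair it finds and drops the absorbed vector:
list.reduce(list(1:2, 1:3, 3:4, 5, 5:6))
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 3 4
#
# [[3]]
# [1] 5
#
# [[4]]
# [1] 5 6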
After this, I made a function that merges all vectors in the list that can be merged. This is sort of a proto cluster analysis, so I called it clusterize:
clusterize <- function(lst) {
  reduced <- TRUE
  while (reduced) {
    prevLength <- NROW(lst)
    lst <- list.reduce(lst)
    newLength <- NROW(lst)
    reduced <- prevLength != newLength
  }
  lst
}
Now it's just a matter of replacing each element in the original list with its associated cluster:
replace.with.clusters <- function(lst, clusters) {
  for (l in 1:NROW(lst))
    for (c in 1:NROW(clusters)) {
      lst[[l]] <- expand.if.touching(lst[[l]], clusters[[c]])
    }
  lst
}
You're good to go. The two main functions are clusterize and replace.with.clusters. Use them like this:
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
clusters <- clusterize(list.x)
replace.with.clusters(list.x, clusters)
# Outputs the following:
#
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 3 4 1 2
#
# [[4]]
# [1] 5 6
#
# [[5]]
# [1] 5 6
The third element is in a different order than your list, but from the way you describe the problem, order is not truly relevant.
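If a consistent ordering is wanted anyway, a final pass with sort (a minimal extra step, not in the original answer) normalizes each element:
lapply(replace.with.clusters(list.x, clusters), sort)
# each element now matches final.list: 1 2 3 4, 1 2 3 4, 1 2 3 4, 5 6, 5 6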

Related

Concatenating sublists of two different lists [duplicate]

I have two lists
first = list(a = 1, b = 2, c = 3)
second = list(a = 2, b = 3, c = 4)
I want to merge these two lists so the final product is
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Is there a simple function to do this?
If the lists always have the same structure, as in the example, then a simpler solution is
mapply(c, first, second, SIMPLIFY=FALSE)
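For the example lists this returns the desired result directly:
mapply(c, first, second, SIMPLIFY = FALSE)
# $a
# [1] 1 2
#
# $b
# [1] 2 3
#
# $c
# [1] 3 4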
This is a very simple adaptation of the modifyList function by Deepayan Sarkar. Because it is recursive, it will handle more complex situations than mapply would, and it handles mismatched names: items in 'second' whose names do not appear in 'first' are appended as new elements.
appendList <- function (x, val)
{
  stopifnot(is.list(x), is.list(val))
  xnames <- names(x)
  for (v in names(val)) {
    x[[v]] <- if (v %in% xnames && is.list(x[[v]]) && is.list(val[[v]]))
      appendList(x[[v]], val[[v]])
    else
      c(x[[v]], val[[v]])
  }
  x
}
> appendList(first,second)
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
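A quick sketch with hypothetical nested inputs (not from the original answer) showing the recursive and mismatched-name behaviour:
appendList(list(a = 1, sub = list(x = 1)),
           list(a = 2, sub = list(x = 2), d = 9))
# $a
# [1] 1 2
#
# $sub
# $sub$x
# [1] 1 2
#
# $d
# [1] 9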
Here are two options, the first:
both <- list(first, second)
n <- unique(unlist(lapply(both, names)))
names(n) <- n
lapply(n, function(ni) unlist(lapply(both, `[[`, ni)))
and the second, which works only if they have the same structure:
apply(cbind(first, second),1,function(x) unname(unlist(x)))
Both give the desired result.
Here's some code that I ended up writing, based upon @Andrei's answer but without the elegance/simplicity. The advantage is that it allows a more complex recursive merge and also distinguishes between elements that should be combined with rbind and those that are just combined with c:
# Decided to move this outside the mapply, not sure this is
# that important for speed but I imagine redefining the function
# might be somewhat time-consuming
mergeLists_internal <- function(o_element, n_element){
  if (is.list(n_element)){
    # Fill in non-existent elements with NA elements
    if (length(n_element) != length(o_element)){
      n_unique <- names(n_element)[! names(n_element) %in% names(o_element)]
      if (length(n_unique) > 0){
        for (n in n_unique){
          if (is.matrix(n_element[[n]])){
            o_element[[n]] <- matrix(NA,
                                     nrow=nrow(n_element[[n]]),
                                     ncol=ncol(n_element[[n]]))
          }else{
            o_element[[n]] <- rep(NA,
                                  times=length(n_element[[n]]))
          }
        }
      }
      o_unique <- names(o_element)[! names(o_element) %in% names(n_element)]
      if (length(o_unique) > 0){
        for (n in o_unique){
          if (is.matrix(n_element[[n]])){
            n_element[[n]] <- matrix(NA,
                                     nrow=nrow(o_element[[n]]),
                                     ncol=ncol(o_element[[n]]))
          }else{
            n_element[[n]] <- rep(NA,
                                  times=length(o_element[[n]]))
          }
        }
      }
    }
    # Now merge the two lists
    return(mergeLists(o_element,
                      n_element))
  }
  if (length(n_element) > 1){
    new_cols <- ifelse(is.matrix(n_element), ncol(n_element), length(n_element))
    old_cols <- ifelse(is.matrix(o_element), ncol(o_element), length(o_element))
    if (new_cols != old_cols)
      stop("Your length doesn't match on the elements,",
           " new element (", new_cols, ") !=",
           " old element (", old_cols, ")")
  }
  return(rbind(o_element,
               n_element,
               deparse.level=0))
  return(c(o_element,
           n_element))
}
mergeLists <- function(old, new){
  if (is.null(old))
    return(new)
  m <- mapply(mergeLists_internal, old, new, SIMPLIFY=FALSE)
  return(m)
}
Here's my example:
v1 <- list("a"=c(1,2), b="test 1", sublist=list(one=20:21, two=21:22))
v2 <- list("a"=c(3,4), b="test 2", sublist=list(one=10:11, two=11:12, three=1:2))
mergeLists(v1, v2)
This results in:
$a
[,1] [,2]
[1,] 1 2
[2,] 3 4
$b
[1] "test 1" "test 2"
$sublist
$sublist$one
[,1] [,2]
[1,] 20 21
[2,] 10 11
$sublist$two
[,1] [,2]
[1,] 21 22
[2,] 11 12
$sublist$three
[,1] [,2]
[1,] NA NA
[2,] 1 2
Yeah, I know - perhaps not the most logical merge but I have a complex parallel loop that I had to generate a more customized .combine function for, and therefore I wrote this monster :-)
library(purrr)
merged <- map(names(first), ~ c(first[[.x]], second[[.x]]))
merged <- set_names(merged, names(first))
Using purrr. Indexing by name also solves the problem of the two lists not being in the same order.
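An equivalent one-liner with purrr::imap, a sketch assuming both lists share the same names:
merged <- imap(first, ~ c(.x, second[[.y]]))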
In general one could do:
merge_list <- function(...) by(v <- unlist(c(...)), names(v), base::c)
Note that the by() solution returns an attributed list, so it will print differently, but it is still a list. You can strip the extra attributes with attr(x, "attribute.name") <- NULL. You can probably also use aggregate().
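A quick usage sketch: the 'by' object prints with extra attributes, but its elements can still be extracted by name:
merged <- merge_list(first, second)
merged[["a"]]
# [1] 1 2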
We can do an lapply with c() and use setNames to assign the original names to the output.
setNames(lapply(1:length(first), function(x) c(first[[x]], second[[x]])), names(first))
$a
[1] 1 2
$b
[1] 2 3
$c
[1] 3 4
Following @Aaron left Stack Overflow's and @Theo's answers, the merged list's elements are combined as vectors with c.
But if you want to bind rows or columns, use rbind or cbind instead:
merged <- map(names(first), ~ rbind(first[[.x]], second[[.x]]))
merged <- set_names(merged, names(first))
Using dplyr, I found that this line works for named lists using the same names:
as.list(bind_rows(first, second))
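For the example lists this should give:
library(dplyr)
as.list(bind_rows(first, second))
# $a
# [1] 1 2
#
# $b
# [1] 2 3
#
# $c
# [1] 3 4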

Use paste0 to create multiple object names with a for loop

I would like to create multiple object names with a for loop. I have tried the following which fails horribly:
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in length(1:3)) {
  x <- as.name(paste0("somevar_",[i]))
  x[2]
}
The desired result is x being somevar_1, somevar_2, somevar_3 for the respective iterations, and x[2] being 2, 5 and 8 respectively.
How should I do this?
somevar_1 = c(1,2,3)
somevar_2 = c(4,5,6)
somevar_3 = c(7,8,9)
for (n in 1:3) {
  x <- get(paste0("somevar_", n))
  print(x[2])
}
Result
[1] 2
[1] 5
[1] 8
We can use mget to get all the required objects in a list and use sapply to extract the 2nd element from each of them.
sapply(mget(paste0("somevar_", 1:3)), `[`, 2)
#somevar_1 somevar_2 somevar_3
# 2 5 8

Filter list in R based on criteria within list objects

This is a trivial question, but I'm stumped. How can I filter a list of dataframes based on their length? The list is nested -- meaning there are lists of lists of dataframes of different lengths. Here is an example. I'd like to filter or subset the list to include only those objects that are length n, say 3.
Here is an example and my current approach.
library(tidyverse)
# list of lists with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
                     list(starwars[1:8], starwars[4:6]),
                     starwars[1:2],
                     list(starwars[1:7], starwars[2:6]),
                     starwars[1:3])
# I want to filter the list by data frames that are 3 variables long (i.e. length(df) == 3).
# Here is my attempt; I'm stuck at how to obtain
# the number of variables in each data frame and then filter by it.
map(star.wars_ls, function(x){
  map(x, function(x){ ## Incorrectly returns 20 for all
    length(y)
  })
})
We can do
map(star.wars_ls, ~ if(is.data.frame(.x)) .x[length(.x) == 3] else map(.x, ~ .x[length(.x) == 3]))
You should be able to check whether the item in the star.wars_ls is a list or a data frame. Then, check the number of columns within each item. Try using:
library(tidyverse)
# list of lists with arbitrary lengths
star.wars_ls <- list(starwars[1:5],
                     list(starwars[1:8], starwars[4:6]),
                     starwars[1:2],
                     list(starwars[1:7], starwars[2:6]),
                     starwars[1:3])
# I want to filter the list by data frames that are 3 variables long (i.e. length(df) == 3).
datacols <- map(star.wars_ls, function(X) {
  if (is.data.frame(X)) {
    ncol(X)
  } else {
    map(X, function(Y) {
      ncol(Y)
    })
  }
})
# > datacols
# [[1]]
# [1] 5
#
# [[2]]
# [[2]][[1]]
# [1] 8
#
# [[2]][[2]]
# [1] 3
#
#
# [[3]]
# [1] 2
#
# [[4]]
# [[4]][[1]]
# [1] 7
#
# [[4]][[2]]
# [1] 5
#
#
# [[5]]
# [1] 3
This will only give you the length (number of columns) of each data frame within the list. To get the indices (I'm sure there's a more efficient way to do this -- maybe someone else can help with that):
indexlist <- c()
for (i in 1:length(datacols)) {
  if (length(datacols[[i]]) == 1) {
    if (datacols[[i]][1] == 3) {
      index <- i
      indexlist <- c(indexlist, as.character(index))
    }
  } else {
    for (j in 1:length(datacols[[i]])) {
      if (datacols[[i]][[j]][1] == 3) {
        index <- str_c(i, ",", j)
        indexlist <- c(indexlist, index)
      }
    }
  }
}
# > indexlist
# [1] "2,2" "5"
You could use recursion; it doesn't matter how deeply nested the list is:
ff = function(x)map(x,~if(is.data.frame(.x)){if(length(.x)==3) .x} else ff(.x))
ff(star.wars_ls)

Matching in lists of vectors with different lengths

I have two lists like this:
List1 <- list(c("toto", "titi"), c("tata"), c("toto", "tz", "tutu"))
List2 <- list(c("titi", "toto", "tutu"),
c("tata", "trtr", "to", "tututu"),
c("to", "titi", "tutu", "tyty"),
c("titi", "tu", "tyty", "tete"),
c("tktk", "ta"))
I want to build a list of the matchings that has the same structure as the List1 object, except that each string of each character vector is replaced by a list of the indices of the first-level elements of List2 that contain it.
The matching list I would like to obtain for the List1 and List2 examples is thus:
Matchings <- list(list(list(1), list(1, 3, 4)),
                  list(list(2)),
                  list(list(1), list(), list(1, 3)))
I've built the following solution with loops (it works, but it's too slow):
Matching_list <- lapply(List1, function(x) sapply(x, function(y) return(list())))
for (i in 1:length(List1)) {
  for (j in 1:length(List1[[i]])) {
    Matchings <- list()
    for (k in 1:length(List2)) {
      if (any(List1[[i]][j] %in% List2[[k]])) {
        Matchings <- c(Matchings, k)
      }
      if (length(Matchings) != 0) {
        Matching_list[[i]][[j]] <- Matchings
      }
    }
  }
}
... but it's definitely too slow for large lists. Thus, I'm looking for a solution that avoids loops as far as possible.
Could you help me?
How about this:
inds <- rep(seq_along(List2), sapply(List2, length))
#[1] 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5
ls <- unlist(List2)
res <- relist(sapply(unlist(List1), function(a) as.list(inds[which(ls %in% a)])),
              skeleton=List1)
all.equal(Matchings, res)
#[1] TRUE
This gives your desired output. I doubt that it's possible without, at least, looping over List1.

How to get counts of intersections of six or more sets?

I am running an analysis of a number of sets and I have been using the package VennDiagram, which has been working just fine, but it only handles up to 5 sets, and now it turns out that I need to look at 6 or more sets.
Ideally, I'm looking for something that can do this with 6 or more sets, but it doesn't necessarily have to have a plot function as long as the counts can be retrieved:
Any ideas of what I can do to add one or more sets to these five and still get the counts?
Thanks!
Here is a recursive solution to find all of the intersections in the Venn diagram. sets can be a list containing any number of sets to find the intersections of. For some reason, the code in the package you are using is hard-coded for each set size, so it doesn't scale to arbitrary intersections.
## Build intersections, 'out' accumulates the result
intersects <- function(sets, out=NULL) {
  if (length(sets) < 2) return( out )  # return result
  len <- seq(length(sets))
  if (missing(out)) out <- list()      # initialize accumulator
  for (idx in split((inds <- combn(length(sets), 2)), col(inds))) {  # 2-way combinations
    ii <- len > idx[2] & !(len %in% idx)  # indices to keep for next intersect
    out[[(n <- paste(names(sets[idx]), collapse="."))]] <- intersect(sets[[idx[1]]], sets[[idx[2]]])
    out <- intersects(append(out[n], sets[ii]), out=out)
  }
  out
}
The function builds pairwise intersections. To avoid building repeated solutions it only calls itself on components of the set with indices greater than those that were joined (ii in the code). The result is a list of all the intersections. If you pass named components, then the result will be named by the convention "set1.set2" etc.
Results
## Some sample data
set.seed(0)
sets <- setNames(lapply(1:3, function(.) sample(letters, 10)), letters[1:3])
## Manually check intersections
a.b <- intersect(sets[[1]], sets[[2]])
b.c <- intersect(sets[[2]], sets[[3]])
a.c <- intersect(sets[[1]], sets[[3]])
a.b.c <- intersect(a.b, sets[[3]])
## Compare
res <- intersects(sets)
all.equal(res[c("a.b","a.c","b.c","a.b.c")], list(a.b=a.b, a.c=a.c, b.c=b.c, a.b.c=a.b.c))
# TRUE
res
# $a.b
# [1] "g" "i" "n" "e" "r"
#
# $a.b.c
# [1] "g"
#
# $a.c
# [1] "x" "g"
#
# $b.c
# [1] "f" "g"
## Get the counts of intersections
lengths(res)
# a.b a.b.c a.c b.c
# 5 1 2 2
Or, with numbers
intersects(list(a=1:10, b=c(1, 5, 10), c=9:20))
# $a.b
# [1] 1 5 10
# $a.b.c
# [1] 10
# $a.c
# [1] 9 10
# $b.c
# [1] 10
Here's an attempt:
list1 <- c("a","b","c","e")
list2 <- c("a","b","c","e")
list3 <- c("a","b")
list4 <- c("a","b","g","h")
list_names <- c("list1","list2","list3","list4")
lapply(1:length(list_names), function(y){
  combinations <- combn(list_names, y)
  res <- as.list(apply(combinations, 2, function(x){
    if (length(x) == 1) {
      p <- setdiff(get(x), unlist(sapply(setdiff(list_names, x), get)))
    }
    else if (length(x) < length(list_names)) {
      p <- setdiff(Reduce(intersect, lapply(x, get)), Reduce(union, sapply(setdiff(list_names, x), get)))
    }
    else p <- Reduce(intersect, lapply(x, get))
    if (!identical(p, character(0))) p
    else NA
  }))
  if (y == length(list_names)) {
    res[[1]] <- unlist(res)
    res <- res[1]
  }
  names(res) <- apply(combinations, 2, paste, collapse="-")
  res
})
The first lapply is used to loop from 1 to the number of sets you have. Then I took all possible combinations of list names, taken y at a time. This essentially generates all of the different subareas in the Venn diagram.
For each combination, the output is the difference between the intersection of the lists in the current combination to the union of the other lists that are not in the combination.
The final result is a list whose length is the number of sets input. The first element of that list holds the elements unique to each list, the second element the elements unique to each combination of two lists, and so on.
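To turn that output into counts per region (a sketch, not part of the original answer; it assumes the result of the lapply above was assigned to a variable, here called venn_regions, and treats the NA placeholder for an empty region as a count of zero):
# 'venn_regions' is assumed to hold the result of the lapply above
region_counts <- lapply(venn_regions, function(level)
  sapply(level, function(p) if (all(is.na(p))) 0L else length(p)))
region_counts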
OK, here's one way, assuming you represent sets as a list of vectors, and the items to be searched in those sets also as a vector:
# Example data format
sets <- list(v1 = 1:6, v2 = 1:8, v3 = 3:8)
items <- c(2:7)
# Search for items in each set
result <- data.frame(searched = items)
for (set in names(sets)) {
  result <- cbind(result, items %in% sets[[set]])
  names(result)[length(names(result))] <- set
}
# Count
library(plyr)
ddply(result, names(sets), function(i) {
  data.frame(count = nrow(i))
})
This gives you all combinations actually existing in the itemset:
     v1   v2    v3 count
1 FALSE TRUE  TRUE     1
2  TRUE TRUE FALSE     1
3  TRUE TRUE  TRUE     4
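The same counts can also be had without the plyr dependency; a minimal sketch using base R's aggregate on the result data frame built above (row order may differ from the ddply output):
aggregate(searched ~ v1 + v2 + v3, data = result, FUN = length)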
