Matching in a list of vectors with different lengths (R)

I have two lists like this:
List1 <- list(c("toto", "titi"), c("tata"), c("toto", "tz", "tutu"))
List2 <- list(c("titi", "toto", "tutu"),
              c("tata", "trtr", "to", "tututu"),
              c("to", "titi", "tutu", "tyty"),
              c("titi", "tu", "tyty", "tete"),
              c("tktk", "ta"))
I want to build a list of matches with the same structure as List1, except that each string in each character vector is replaced by a list of the indices of the first-level elements of List2 that contain that string.
For the List1 and List2 examples above, the list I would like to obtain is:
Matchings <- list(list(list(1), list(1,3,4)),
                  list(list(2)),
                  list(list(1), list(), list(1,3)))
I wrote the following loop-based solution, which works but is too slow:
Matching_list <- lapply(List1, function(x) sapply(x, function(y) return(list())))
for (i in 1:length(List1)) {
  for (j in 1:length(List1[[i]])) {
    Matchings = list()
    for (k in 1:length(List2)) {
      if(any(List1[[i]][j] %in% List2[[k]])) {
        Matchings <- c(Matchings, k)
      }
      if(length(Matchings) != 0) {
        Matching_list[[i]][[j]] <- Matchings
      }
    }
  }
}
... but it's definitely too slow for large lists. So I'm looking for a solution that avoids loops as far as possible.
Could you help me?

How about this:
inds <- rep(seq_along(List2), sapply(List2, length))
#[1] 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5
ls <- unlist(List2)
res <- relist(sapply(unlist(List1), function(a) as.list(inds[which(ls %in% a)])),
              skeleton = List1)
all.equal(Matchings, res)
#[1] TRUE
This gives your desired output. I doubt that it's possible without at least looping over List1.
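If the repeated %in% scans over the whole of unlist(List2) turn out to be the bottleneck, one possible variation (a sketch, not part of the answer above) is to build the string-to-index lookup once with split() and then index into it by name. The name lookup is just an illustrative choice, and this has only been checked against the small example from the question:

```r
List1 <- list(c("toto", "titi"), c("tata"), c("toto", "tz", "tutu"))
List2 <- list(c("titi", "toto", "tutu"),
              c("tata", "trtr", "to", "tututu"),
              c("to", "titi", "tutu", "tyty"),
              c("titi", "tu", "tyty", "tete"),
              c("tktk", "ta"))

# Map every distinct string in List2 to the indices of the vectors containing it
lookup <- split(rep(seq_along(List2), lengths(List2)), unlist(List2))

# Look up each string of List1 by exact name; strings that do not occur
# in List2 give NULL, which as.list() turns into an empty list
res <- lapply(List1, function(v) {
  lapply(v, function(s) as.list(lookup[[s]]))
})
```

This still loops over List1 through lapply, but each string is resolved with a single name lookup instead of a comparison against every string in List2.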

Related

Why is there a difference between appending to lists and to vectors?

My question is why the following code
list1 <- list()
list1[[1]] <- c()
list1[[1]] <- c(list1[[1]], 7)
shows the error
Error in list1[[1]] : subscript out of bounds
and why the following code
vector1 <- c()
vector1 <- c(vector1, 7)
works? I want to do something like this:
for (i in c(1,2,3)){
  for (j in c(1,2,3)){
    list1[[i]].append(list3[[i]], j)
  }
}

Are you looking for something like this?
Using a for loop
You can append numbers to each vector of a list by doing this:
list1 = vector("list",3)
for(i in 1:3)
{
  for(j in 1:3)
  {
    list1[[i]] = c(list1[[i]],j)
  }
}
And you get the following output:
> list1
[[1]]
[1] 1 2 3
[[2]]
[1] 1 2 3
[[3]]
[1] 1 2 3
Using lapply
You can do the same thing without a for loop by using lapply instead:
list3 = vector("list",3)
list3 = lapply(list3,function(x){1:3})
and you get a similar output.
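As for why the original code fails: c() with no arguments is NULL, and assigning NULL with [[<- does not create an element, so the list stays empty and list1[[1]] is out of bounds. A vector has no such problem because c(NULL, 7) is simply 7. A small sketch of the behaviour and two ways around it (the names prealloc and explicit are only illustrative):

```r
list1 <- list()
list1[[1]] <- c()        # c() is NULL; assigning NULL this way stores nothing
length(list1)            # still 0, hence "subscript out of bounds" on list1[[1]]

# Option 1: pre-allocate the slots; each slot holds NULL but does exist
prealloc <- vector("list", 1)
prealloc[[1]] <- c(prealloc[[1]], 7)

# Option 2: store NULL explicitly using single brackets and a list
explicit <- list()
explicit[1] <- list(NULL)
explicit[[1]] <- c(explicit[[1]], 7)
```

Option 1 is what the vector("list", 3) call in the answer above does.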

add rows to the shorter matrices in a list to match longest one in list

I have a list of matrices that I want to cbind into one matrix, but I run into a problem when they have different numbers of rows. To fix this I am trying to add empty rows to the bottom of the shorter ones; however, the second-to-last step isn't quite working.
## LIST OF MATRICES
lst = list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))
## FIND LONGEST ONE
mrow = lapply(lst, function(x) nrow(x))
mrow = max(unlist(lst))
## CREATE MATRIX LIST TO RBIND
tempM = lapply(1:length(lst), function(x) matrix(nrow = mrow - nrow(lst[x][[1]]), ncol = ncol(lst[x][[1]])))
## ADD ROWS TO SHORTER MATRICES TO MAKE LENGTHS LINE UP
## THIS IS WHERE THINGS GO WRONG
lst = lapply(1:length(tempM), function(x) rbind(lst[x][[1]], tempM[x]))
## GOAL TO BE ABLE TO:
rlist::list.cbind(lst) ## ERROR: Different number of rows

I'm double stealing a great function from here which should do exactly what you're looking for:
cbind.fill <- function(...){
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function (x)
    rbind(x, matrix(, n-nrow(x), ncol(x)))))
}
You can easily apply it to your list lst using do.call:
lst <- list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))
do.call(cbind.fill,lst)
#      X1.3 X1.3.1 X1.2 X1.2.1
# [1,]    1      1    1      1
# [2,]    2      2    2      2
# [3,]    3      3   NA     NA

Another way to skin this cat:
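library(tidyverse)
lst <- list(as.matrix(data.frame(1:3, 1:3)),
            as.matrix(data.frame(1:2, 1:2)))
# Target height: the largest row count (this also works for more than two matrices)
targheight <- max(map_int(lst, nrow))
# Pad each matrix with NA rows up to the target height, then bind the columns
lst <- reduce(map(lst, function(x) rbind(x, matrix(nrow = targheight - nrow(x), ncol = ncol(x)))), cbind)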
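For completeness, the approach in the question can also be made to work with two small changes, as far as I can tell: take the maximum over the row counts rather than over the matrix values, and index tempM with double brackets so that rbind() receives a matrix instead of a one-element list. The sketch below uses base do.call(cbind, ...) in place of rlist::list.cbind, and the names padded and out are just illustrative:

```r
lst <- list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))

# Longest matrix: max over the row counts, not over the values in the matrices
mrow <- max(vapply(lst, nrow, integer(1)))

# One NA-filled padding matrix per element (zero rows for the longest one)
tempM <- lapply(lst, function(m) matrix(nrow = mrow - nrow(m), ncol = ncol(m)))

# [[ ]] extracts the padding matrix itself; tempM[i] would be a list of length 1
padded <- lapply(seq_along(lst), function(i) rbind(lst[[i]], tempM[[i]]))

out <- do.call(cbind, padded)
```

After this, every element of padded has mrow rows, so the column bind no longer complains about differing row counts.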

R Apply functions to lists with lapply

I have three lists: List1 contains identifiers, List2 contains comma-separated strings whose items may appear in List1, and List3 contains numbers (some measured scores).
List1=c("Object1","Object2",......,"Objectn")
List2=c("Object1","Object2,Object3","Object4","Object5","Object6", .... )
List3=c("0.90","0,80",....)
All lists have the same length.
For each item in List1 and each item in List2, I want to check whether the intersection is non-empty and, if so, add the corresponding List3 value to a score.
I can do this iteratively, but since my lists are very long I wanted to do it with lapply, and failed. Any help would be appreciated.
FinalScoreList=c()
for(i in 1:length(List1)){
  score=0
  for(j in 1:length(List2)){
    if(length(intersect(List1[[i]],
                        as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
      score=score+as.double(List3[j])
    }
  }
  FinalScoreList=c(FinalScoreList,score)
}

Here is something that I think is along the lines of what you're after:
List1=c("Object1","Object2", "0.70")
List2=c("Object1","Object2", "Object3")
List3=c("0.90","0,80", "0.70")
# Make a list of lists
All_Lists = list(
"List1" = List1,
"List2" = List2,
"List3" = List3
)
# Create a dataframe listing all pairwise combinations of the lists
intersect_df <- data.frame(t(combn(names(All_Lists), 2)))
# Add a new column to this dataframe indicating the length of the intersection
# between each pair of lists
intersect_df$count <- apply(intersect_df, 1, function(r) length(intersect(All_Lists[[r[1]]], All_Lists[[r[2]]])))
Output:
> intersect_df
     X1    X2 count
1 List1 List2     2
2 List1 List3     1
3 List2 List3     0
So each row in the output specifies a combination of two lists (X1 and X2), and the column count indicates the length of the intersection between those two lists.

First, I would not recommend giving the name "List" (List1, List2, List3, ...) to objects that are not lists.
Second, since you want the "List3" elements to be numeric, make them numeric from the beginning. I created the following example:
library(dplyr)
List1=c("Object1","Object2","Object3","Object4","Object5","Object6","Object7","Object8")
List2=c("Object3","Object4","Object5","Object6","Object7","Object8","Object9","Object10")
List3=c("0.90","0.80","0.70","0.60","0.50","0.40","0.30","0.20")%>%as.numeric
Now, with a few alterations to your code, we get the FinalScoreList:
FinalScoreList=c()
for(i in 1:length(List1)){
  score=0
  for(j in 1:length(List2)){
    if(length(intersect(List1[[i]], as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
      score=score+List3[j]
    }
  }
  FinalScoreList=c(FinalScoreList,score)
}
> FinalScoreList
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
We can get the same result without looping with the code below:
df=data.frame(List1,List2,List3)
df$Matches<-0
matches0<-grep(List1,pattern=paste(intersect(List2,List1),collapse="|"))
matches1<-grep(List2,pattern=paste(intersect(List2,List1),collapse="|"))
df$Matches[matches0]<-List3[matches1]
> df$Matches
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4

You can perform the split of List2 before your loops, which already speeds things up. Also, because you start with an empty vector FinalScoreList, R has to grow it at each step, which makes it slower too.
This is a solution with nested lapply/sapply calls:
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
  indicator <- sapply(List2, function(y) x %in% y)
  sum(List3[indicator])
})
unlist(FinalScoreList)
As @Antonis already said, you should store your List3 vector as a numeric vector from the start.
Data
List1 <- paste0("Object", 1:10)
List2 <- c("Object1", "Object6,Object5", "Object2,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object6", "Object10,Object3")
List3 <- runif(10)

Thank you guys.
Now suppose that List1 is of the same nature as List2, i.e. its items can be concatenated strings, and that it can also have a different length.
I applied lapply with strsplit to List1, but I still obtain NA in FinalScoreList.
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- runif(10)
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
List1 <- lapply(List1, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
  indicator <- sapply(List2, function(y) {x %in% y})
  sum(List3[indicator])
})
unlist(FinalScoreList)
[1] 1.595639 NA NA
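A likely cause of the NA values: once x holds several strings, x %in% y returns one logical per string, so sapply builds a logical matrix rather than one TRUE/FALSE per List2 entry, and indexing List3 with it reaches past the end of the vector. Collapsing each comparison with any() gives a single indicator per entry. The sketch below assumes that "any shared item" is the intended notion of a match, and replaces runif(10) with the fixed scores 1 to 10 so the result is reproducible; split1 and split2 are illustrative names:

```r
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
           "Object6,Object8", "Object5,Object9", "Object4,Object2",
           "Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- as.numeric(1:10)   # assumption: fixed scores instead of runif(10)

# strsplit() is vectorised, so it returns one character vector per entry
split1 <- strsplit(List1, ",", fixed = TRUE)
split2 <- strsplit(List2, ",", fixed = TRUE)

FinalScoreList <- vapply(split1, function(x) {
  # exactly one TRUE/FALSE per List2 entry: do x and y share any item?
  hit <- vapply(split2, function(y) any(x %in% y), logical(1))
  sum(List3[hit])
}, numeric(1))

FinalScoreList
# [1]  4 12 20
```

vapply is used instead of sapply so that a result of the wrong shape raises an error rather than silently turning into a matrix.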

Chaining list of vectors

If I have a list of vectors such as below
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
Is there a way to replace each list element with an element that includes all the other values that the element can be paired with?
For example, the first element (list.x[[1]]) would be replaced with 1:4, because element 2 (list.x[[2]]) shows that 2 is also paired with 3, and element 3 shows that 3 is also paired with 4.
The final result I would like to achieve is this list:
final.list <- list(1:4, 1:4, 1:4, 5:6, 5:6)

I needed a change of pace today, so I decided to try to answer the question using base R. Here it goes:
First, I created a function that unions two vectors if they intersect, and if not, simply returns the first vector:
expand.if.toucing <- function(vector1, vector2) {
  i <- intersect(vector1, vector2)
  if (NROW(i) > 0)
    union(vector1, vector2)
  else
    vector1
}
Then I made a function that merges one element in the list of vectors with another:
list.reduce <- function (lst) {
  for (v1 in 1:NROW(lst))
    for (v2 in 1:NROW(lst)) {
      if (v1 == v2)
        next
      prevLength <- NROW(lst[[v1]])
      lst[[v1]] <- expand.if.toucing(lst[[v1]], lst[[v2]])
      newLength <- NROW(lst[[v1]])
      if (newLength == prevLength)
        next
      return(lst[-v2])
    }
  lst
}
After this, I made a function that merges all vectors in the list that can be merged. This is sort of a proto cluster analysis, so I called it clusterize:
clusterize <- function (lst) {
  reduced <- TRUE
  while (reduced) {
    prevLength <- NROW(lst)
    lst <- list.reduce(lst)
    newLength <- NROW(lst)
    reduced <- prevLength != newLength
  }
  lst
}
Now it's just a matter of replacing each element in the original list with its associated cluster:
replace.with.clusters <- function(lst, clusters) {
  for (l in 1:NROW(lst))
    for (c in 1:NROW(clusters)) {
      lst[[l]] <- expand.if.toucing(lst[[l]], clusters[[c]])
      next
    }
  lst
}
You're good to go. The two main functions are clusterize and replace.with.clusters. Use them like this:
list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
clusters <- clusterize(list.x);
replace.with.clusters(list.x, clusters);
# Outputs the following:
#
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 3 4 1 2
#
# [[4]]
# [1] 5 6
#
# [[5]]
# [1] 5 6
The third element is in a different order than your list, but from the way you describe the problem, order is not truly relevant.
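If sorted output matters, a more compact variation on the same idea might look like the sketch below. It is not the code above, just an alternative under the assumption that "paired with" means "shares at least one value, directly or through a chain". The function name chain_groups is made up for illustration. Each pass replaces every element by the sorted union of all elements it touches, and the loop stops once a pass changes nothing:

```r
chain_groups <- function(lst) {
  repeat {
    merged <- lapply(lst, function(v) {
      # which elements share at least one value with v (v itself included)?
      touching <- vapply(lst, function(w) length(intersect(v, w)) > 0, logical(1))
      sort(unique(unlist(lst[touching])))
    })
    # the sets can only grow, so a pass without changes means we are done
    if (identical(merged, lst)) return(merged)
    lst <- merged
  }
}

list.x <- list(1:2, 1:3, 3:4, 5, 5:6)
result <- chain_groups(list.x)
```

On list.x this gives 1 2 3 4 for the first three elements and 5 6 for the last two. It compares every pair of elements on each pass, so it is only meant for lists of modest size.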

How do you append 2 subnames to a variable in R?

It has to be possible but I can't find an answer (or think of the proper search terms).
Basically I'm stuck in a double loop and need to append 2 different subnames (1 for each separate iteration) to a variable.
Basic example:
var <- list()
i1 <- 0
i2 <- 0
while (i1 < 3) {
i1 <- i1 - 1
while (i2 < 3) {
i2 <- i2 - 1
var[[i1]][[i2]] <- c(1, 5, 8)
}
}
However, stringing two subnames together like that doesn't seem to work. I'd like to get 9 results (based on 3x3 iterations) named var11, var12, var13, var21, etc.
Thanks!

In R, you can do this in a simple one liner, but I'll initialize some variables to be explicit about it.
i1start<-1
i1end<-3
i2start<-1
i2end<-3
result<-lapply(i1start:i1end, function(i1) lapply(i2start:i2end, function(i2) c(1,5,8)))
Which should return a list of lists, where each entry at position (i1,i2) is the vector (1,5,8).
Let's break this down. It'd be useful if you look up lapply (via the command ?lapply) to brush up on the function.
If we were to just run the inner lapply, what would happen?
lapply(i2start:i2end, function(i2) c(1,5,8))
#[[1]]
#[1] 1 5 8
#
#[[2]]
#[1] 1 5 8
#
#[[3]]
#[1] 1 5 8
lapply applies the function to each element of the vector 1,2,3 (i2start:i2end), which is essentially the following
#lapply does this
(function(i2) c(1,5,8))(1)
(function(i2) c(1,5,8))(2)
(function(i2) c(1,5,8))(3)
and then stores all these results in a list (since it is l-apply).
We then use the same concept in the outer lapply call, except our function has changed. It is no longer
function(i2) c(1,5,8)
but actually
#substituting 1:3 for i2start:i2end
function(i1) lapply(1:3, function(i2) c(1,5,8))
so now we are calling
lapply(1:3, function(i1) lapply(1:3, function(i2) c(1,5,8)))
which essentially calls
(function(i1) lapply(1:3, function(i2) c(1,5,8)))(1)
(function(i1) lapply(1:3, function(i2) c(1,5,8)))(2)
(function(i1) lapply(1:3, function(i2) c(1,5,8)))(3)
and stores the results of those calls in a list. Each of those calls then runs the inner lapply, which I explained just prior, and when you put it all together, it leads to your result! There is a lot going on in one line.

It seems you need to initialize your i1 and i2 variables properly and increase them rather than decrease them. Then, on each pass of the outer loop, you initialize a sublist, fill it in the inner loop, and assign it to your var list. Try this:
var <- list()
i1 <- 0
while (i1 < 3) {
  i2<-0
  i1 <- i1 + 1
  var2<-list()
  while (i2 < 3) {
    i2 <- i2 + 1
    var2[[i2]] <- sample(1:10,3)
  }
  var[[i1]]<-var2
}
Another approach could be to initialize your main list and all the sublists:
var<-vector("list",3)
for (i in 1:3) var[[i]]<-vector("list",3)
At this point, you can assign values with the double subscript:
var[[i]][[j]]<-c(1,5,8)
assuming that both i and j range from 1 to 3.
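If what you are really after is a flat set of nine results named var11, var12, ..., var33, one way that might work is to build the names from the index pairs and attach them to a single list. This is only a sketch: the names idx and flat are illustrative, and c(1, 5, 8) stands in for whatever each iteration actually computes:

```r
# All (i1, i2) pairs; the first column of expand.grid varies fastest,
# so putting i2 first gives the order 11, 12, 13, 21, ...
idx <- expand.grid(i2 = 1:3, i1 = 1:3)

flat <- setNames(
  lapply(seq_len(nrow(idx)), function(k) c(1, 5, 8)),
  paste0("var", idx$i1, idx$i2)
)

names(flat)
# [1] "var11" "var12" "var13" "var21" "var22" "var23" "var31" "var32" "var33"
flat$var23
# [1] 1 5 8
```

Keeping the results in one named list is usually easier to work with afterwards than creating nine separate variables.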
