I have three lists: List1 contains identifiers, List2 contains comma-separated strings whose items may appear in List1, and List3 contains numbers (some measured scores).
List1=c("Object1","Object2",......,"Objectn")
List2=c("Object1","Object2,Object3","Object4","Object5","Object6", .... )
List3=c("0.90","0.80",....)
All lists have same length.
For each item in List1, I want to go through every item in List2, check whether the two have a non-empty intersection, and if so add the corresponding List3 score to a running total.
I can do this iteratively, but since my lists are very long I wanted to do it with lapply and failed. Any help would be appreciated.
FinalScoreList=c()
for(i in 1:length(List1)){
score=0
for(j in 1:length(List2)){
if(length(intersect(List1[[i]],
as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
score=score+as.double(List3[j])
}
}
FinalScoreList=c(FinalScoreList,score)
}
Here is something that I think is along the lines of what you're after:
List1=c("Object1","Object2", "0.70")
List2=c("Object1","Object2", "Object3")
List3=c("0.90","0.80", "0.70")
# Make a list of lists
All_Lists = list(
"List1" = List1,
"List2" = List2,
"List3" = List3
)
# Create a dataframe listing all pairwise combinations of the lists
intersect_df <- data.frame(t(combn(names(All_Lists), 2)))
# Add a new column to this dataframe indicating the length of the intersection
# between each pair of lists
intersect_df$count <- apply(intersect_df, 1, function(r) length(intersect(All_Lists[[r[1]]], All_Lists[[r[2]]])))
Output:
> intersect_df
     X1    X2 count
1 List1 List2     2
2 List1 List3     1
3 List2 List3     0
So each row in the output specifies a combination of two lists (X1 and X2), and the column count indicates the length of the intersection between those two lists.
First, I would not recommend giving the name "List" (List1, List2, List3, ...) to objects that are not lists.
Second, since you want the "List3" elements to be numeric, make them numeric from the beginning. I created the following example:
library(dplyr)
List1=c("Object1","Object2","Object3","Object4","Object5","Object6","Object7","Object8")
List2=c("Object3","Object4","Object5","Object6","Object7","Object8","Object9","Object10")
List3=c("0.90","0.80","0.70","0.60","0.50","0.40","0.30","0.20")%>%as.numeric
Now, with a few alterations to your code, we get the FinalScoreList:
FinalScoreList=c()
for(i in 1:length(List1)){
score=0
for(j in 1:length(List2)){
if(length(intersect(List1[[i]], as.list(unlist(strsplit(as.character(List2[j]),',')))))>0) {
score=score+List3[j]
}
}
FinalScoreList=c(FinalScoreList,score)
}
> FinalScoreList
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
For this example, we can get the same result without looping using the code below:
df=data.frame(List1,List2,List3)
df$Matches<-0
matches0<-grep(List1,pattern=paste(intersect(List2,List1),collapse="|"))
matches1<-grep(List2,pattern=paste(intersect(List2,List1),collapse="|"))
df$Matches[matches0]<-List3[matches1]
> df$Matches
[1] 0.0 0.0 0.9 0.8 0.7 0.6 0.5 0.4
You can split List2 once before your loops, which already speeds things up. Also, because you start with an empty vector FinalScoreList, R has to grow it at every step, which makes the loop slower still.
This is a solution with nested lapply/sapply-calls:
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
indicator <- sapply(List2, function(y) x %in% y)
sum(List3[indicator])
})
unlist(FinalScoreList)
As @Antonis already said, you should store List3 as a numeric vector from the start.
Data
List1 <- paste0("Object", 1:10)
List2 <- c("Object1", "Object6,Object5", "Object2,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object6", "Object10,Object3")
List3 <- runif(10)
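The two points made above (split List2 once, and avoid growing the result vector) can also be sketched with an explicit loop. This is only an illustration under assumptions: the names split2, scores and hit are made up here, and List3 is given fixed values instead of runif(10) so the result is reproducible.

```r
# Same shape of data as above; List3 uses fixed illustrative values
List1 <- paste0("Object", 1:10)
List2 <- c("Object1", "Object6,Object5", "Object2,Object1", "Object7",
           "Object6,Object8", "Object5,Object9", "Object4,Object2",
           "Object3,Object8", "Object2,Object6", "Object10,Object3")
List3 <- (1:10) / 10

# Split the comma-separated strings once, outside the loop
split2 <- strsplit(List2, ",", fixed = TRUE)

# Preallocate the result instead of growing it with c()
scores <- numeric(length(List1))
for (i in seq_along(List1)) {
  # TRUE where the i-th identifier appears among the items of a List2 entry
  hit <- vapply(split2, function(parts) List1[i] %in% parts, logical(1))
  scores[i] <- sum(List3[hit])
}
scores
```

Because %in% compares whole strings, "Object1" does not match "Object10" here, which a pattern-based match would do.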
Thank you guys.
Now suppose that List1 is of the same nature as List2, i.e., its items can be comma-separated strings, and that it can have a different length.
I applied lapply with strsplit to List1 as well, but I still get NA in FinalScoreList.
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
"Object6,Object8", "Object5,Object9", "Object4,Object2",
"Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- runif(10)
List2 <- lapply(List2, function(x) unlist(strsplit(x, split = ",")))
List1 <- lapply(List1, function(x) unlist(strsplit(x, split = ",")))
FinalScoreList <- lapply(List1, function(x) {
indicator <- sapply(List2, function(y) {x %in% y})
sum(List3[indicator])
})
unlist(FinalScoreList)
[1] 1.595639 NA NA
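A likely cause of the NA values: when x holds more than one string, x %in% y returns one logical per string, so sapply builds a logical matrix instead of a vector, and indexing List3 with it reaches past the end of List3. A minimal sketch of one possible fix, assuming that "match" means "at least one item in common", is to collapse each comparison with any(). The names split1, split2 and scores are illustrative, and List3 uses fixed values so the output is reproducible.

```r
List1 <- c("Object1", "Object7,Object5", "Object2,Object1")
List2 <- c("Object1", "Object6,Object5", "Object0,Object1", "Object7",
           "Object6,Object8", "Object5,Object9", "Object4,Object2",
           "Object3,Object8", "Object2,Object3", "Object10,Object3")
List3 <- (1:10) / 10  # fixed illustrative scores instead of runif(10)

split1 <- strsplit(List1, ",", fixed = TRUE)
split2 <- strsplit(List2, ",", fixed = TRUE)

scores <- vapply(split1, function(x) {
  # any() reduces the per-string comparison to a single TRUE/FALSE per List2 entry
  indicator <- vapply(split2, function(y) any(x %in% y), logical(1))
  sum(List3[indicator])
}, numeric(1))
scores
```

With these values the three scores are 0.4, 1.2 and 2.0, and the lengths of List1 and List2 no longer need to agree.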
I have a list of matrices that I want to cbind into one matrix, but I run into a problem when they have different numbers of rows. To fix this I am trying to add empty rows to the bottom of the shorter ones; however, the second-to-last step isn't quite working.
## LIST OF MATRICES
lst = list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))
## FIND LONGEST ONE
mrow = lapply(lst, function(x) nrow(x))
mrow = max(unlist(mrow))
## CREATE MATRIX LIST TO RBIND
tempM = lapply(1:length(lst), function(x) matrix(nrow = mrow - nrow(lst[x][[1]]), ncol = ncol(lst[x][[1]])))
## ADD ROWS TO SHORTER MATRICES TO MAKE LENGTHS LINE UP
## THIS IS WHERE THINGS GO WRONG
lst = lapply(1:length(tempM), function(x) rbind(lst[x][[1]], tempM[x]))
## GOAL TO BE ABLE TO:
rlist::list.cbind(lst) ## ERROR: Different number of rows
I'm double stealing a great function from here which should do exactly what you're looking for:
cbind.fill <- function(...){
nm <- list(...)
nm <- lapply(nm, as.matrix)
n <- max(sapply(nm, nrow))
do.call(cbind, lapply(nm, function (x)
rbind(x, matrix(, n-nrow(x), ncol(x)))))
}
You can easily apply it to your list lst using do.call:
lst <- list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))
do.call(cbind.fill,lst)
#      X1.3 X1.3.1 X1.2 X1.2.1
# [1,]    1      1    1      1
# [2,]    2      2    2      2
# [3,]    3      3   NA     NA
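For comparison, the padding approach from the question can be written directly, without a helper function. One thing worth noting is that tempM[x] returns a one-element list, whereas tempM[[x]] returns the matrix itself, which may be why the rbind step in the question misbehaved. The sketch below builds the filler inside the same lapply call; the names padded, filler and result are illustrative.

```r
lst <- list(as.matrix(data.frame(1:3, 1:3)), as.matrix(data.frame(1:2, 1:2)))

# Number of rows of the tallest matrix
mrow <- max(vapply(lst, nrow, integer(1)))

# Pad each matrix with NA rows up to mrow (a 0-row filler is harmless)
padded <- lapply(lst, function(m) {
  filler <- matrix(NA, nrow = mrow - nrow(m), ncol = ncol(m))
  rbind(m, filler)
})

# All matrices now have the same number of rows, so cbind works
result <- do.call(cbind, padded)
result
```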
Another way to skin this cat:
library(tidyverse)
lst = list(as.matrix(data.frame(1:3, 1:3)),
as.matrix(data.frame(1:2, 1:2))
)
targheight <- max(map_int(lst, nrow)) # tallest matrix; also works for more than two matrices
lst <- reduce(map(lst,function(x){rbind(x,matrix(nrow=targheight-dim(x)[1],ncol=dim(x)[2]))}),cbind)
I have two lists like this:
List1 <- list(c("toto", "titi"), c("tata"), c("toto", "tz", "tutu"))
List2 <- list(c("titi", "toto", "tutu"),
c("tata", "trtr", "to", "tututu"),
c("to", "titi", "tutu", "tyty"),
c("titi", "tu", "tyty", "tete"),
c("tktk", "ta"))
I want to build a list of matchings that has the same structure as List1, except that each string in each character vector is replaced by a list of the indices of the first-level elements of List2 that contain it.
The matching list that I would like to obtain for the List1 and List2 examples is thus:
Matchings <- list(list(list(1), list(1,3,4)),
list(list(2)),
list(list(1), list(), list(1,3)))
I've built the following solution with loops (it works, but it is too slow):
Matching_list <- lapply(List1, function(x) sapply(x, function(y) return(list())))
for (i in 1:length(List1)) {
for (j in 1: length(List1[[i]])) {
Matchings = list()
for (k in 1: length(List2)) {
if(any(List1[[i]][j] %in% List2[[k]])) {
Matchings <- c(Matchings, k)
}
if(length(Matchings) != 0) {
Matching_list[[i]][[j]] <- Matchings
}
}
}
}
... but it's definitely too slow for large lists. So I am looking for a solution that avoids loops as far as possible.
Could you help me?
How about this:
inds <- rep(seq_along(List2), sapply(List2, length))
#[1] 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5
ls <- unlist(List2)
res <-
relist(sapply(unlist(List1), function(a) as.list(inds[which(ls %in% a)])), skeleton=List1)
all.equal(Matchings, res)
#[1] TRUE
This gives your desired output. I doubt that it's possible without at least looping over List1.
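A variant of the same idea, sketched here as an alternative and not as the answer's own method, builds a lookup table from each string to the List2 indices that contain it, then walks List1 with nested lapply calls. The names lookup and res2 are illustrative. It assumes that no string is repeated within a single List2 element; otherwise an index would appear more than once.

```r
List1 <- list(c("toto", "titi"), c("tata"), c("toto", "tz", "tutu"))
List2 <- list(c("titi", "toto", "tutu"),
              c("tata", "trtr", "to", "tututu"),
              c("to", "titi", "tutu", "tyty"),
              c("titi", "tu", "tyty", "tete"),
              c("tktk", "ta"))
Matchings <- list(list(list(1), list(1, 3, 4)),
                  list(list(2)),
                  list(list(1), list(), list(1, 3)))

# For every string in List2, the indices of the elements that contain it
inds <- rep(seq_along(List2), lengths(List2))
lookup <- split(inds, unlist(List2))

# lookup[[s]] is NULL for strings that never occur, and as.list(NULL) is list()
res2 <- lapply(List1, function(v) lapply(v, function(s) as.list(lookup[[s]])))

all.equal(Matchings, res2)
```

The lookup table is built once, so each string in List1 costs a single named lookup instead of a scan over all of List2.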
I did not find an answer in other posts, nor did I understand the ones that handled similar topics, since I am relatively new to R and to programming in general. I have the following survey output X that I am working with (extract):
A1B1 A1B2 A1B3 A1B4 A2B1 A2B2 A2B3 ...
-0.37014356 1.08841141 -0.126574243 -0.59169360 1.682673457 -0.427706432 -0.76091938 ...
3.03017573 1.39812421 0.243516558 -4.67181650 -0.378640756 2.039940436 -0.40785893 ...
3.50183121 1.51249433 -0.775449944 -4.23887560 -0.456911873 0.431838943 0.91108052 ...
...
I want to compute the width of the range, diff(range(X[i,n:m])), i.e. maximum minus minimum, over the first 4 columns (n:m equals 1:4), the second 4 (5:8) and the third 4 (9:12) for every row i of X, and put the results into a second matrix with as many rows as X and 3 columns.
E.g. for the first row and the first 4 cols, it would be 1.08841141+0.59169360=1.68010501.
For this purpose I created a new matrix and tried to fill it with the values:
newmatrix <- matrix(0,nrow(X),3)
newmatrix[1:nrow(X),1] <- for (i in (1:nrow(X))) {diff(range(X[i,1:4]))}
newmatrix[1:nrow(X),2] <- for (i in (1:nrow(X))) {diff(range(X[i,5:8]))}
newmatrix[1:nrow(X),3] <- for (i in (1:nrow(X))) {diff(range(X[i,9:12]))}
I get the following error:
Error in newmatrix[1:nrow(RBetas), 1] <- for (i in (1:nrow(RBetas))) { :
number of items to replace is not a multiple of replacement length
Thank you for your help!
Assuming that the blocks of columns are defined by the first two characters of the column names, i.e. A1, A2, we can use substr to extract those two characters and use them as an index to split the columns into blocks. Then we can either use apply with range and diff to get the result, or use pmax and pmin.
indx <- substr(colnames(df), 1,2)
If the grouping is not based on the column names but on the position, this should also work
indx <- (1:ncol(df)-1)%/%4 +1
res1 <- sapply(split(seq_len(ncol(df)), indx),
function(i) do.call(pmax,df[,i, drop=FALSE])-
do.call(pmin, df[,i, drop=FALSE]))
Or
res2 <- sapply(split(seq_len(ncol(df)), indx),
function(i) apply(df[,i, drop=FALSE], 1,
function(x) diff(range(x))) )
identical(res1, res2)
#[1] TRUE
res1
# A1 A2
#[1,] 1.680105 2.443593
#[2,] 7.701992 2.447799
#[3,] 7.740707 1.367992
Or using your code
newmatrix <- matrix(0, nrow(df), 2) #here the example dataset is only 7 columns
for(i in (1:nrow(df))) newmatrix[i,1] <- diff(range(df[i,1:4]))
for(i in (1:nrow(df))) newmatrix[i,2] <- diff(range(df[i,5:7]))
newmatrix
# [,1] [,2]
#[1,] 1.680105 2.443593
#[2,] 7.701992 2.447799
#[3,] 7.740707 1.367992
If you have many blocks of columns, you can try a double for loop
lst <- split(seq_len(ncol(df)), indx) #keep the columns to group in a `list`
newmatrix <- matrix(0, nrow(df), length(lst)) #one column per block of columns
for(i in 1:nrow(df)){
for(j in seq_along(lst)){
newmatrix[i,j] <- diff(range(df[i, lst[[j]]]))
}
}
newmatrix
# [,1] [,2]
#[1,] 1.680105 2.443593
#[2,] 7.701992 2.447799
#[3,] 7.740707 1.367992
data
df <- structure(list(A1B1 = c(-0.37014356, 3.03017573, 3.50183121),
A1B2 = c(1.08841141, 1.39812421, 1.51249433), A1B3 = c(-0.126574243,
0.243516558, -0.775449944), A1B4 = c(-0.5916936, -4.6718165,
-4.2388756), A2B1 = c(1.682673457, -0.378640756, -0.456911873
), A2B2 = c(-0.427706432, 2.039940436, 0.431838943), A2B3 = c(-0.76091938,
-0.40785893, 0.91108052)), .Names = c("A1B1", "A1B2", "A1B3",
"A1B4", "A2B1", "A2B2", "A2B3"), class = "data.frame", row.names = c(NA,
-3L))
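As for the error in the question itself: in R a for loop evaluates to NULL, so assigning the loop to a column of newmatrix tries to put a zero-length value into nrow(X) cells. The sketch below shows this and one way to fill the result by column position. The matrix X here is a small made-up example with 8 columns (values loosely rounded from the question), and the names loop_value and blocks are illustrative.

```r
# Hypothetical 2 x 8 matrix, two blocks of 4 columns each
X <- matrix(c(-0.37, 1.09, -0.13, -0.59, 1.68, -0.43, -0.76, 0.20,
               3.03, 1.40,  0.24, -4.67, -0.38,  2.04, -0.41, 0.10),
            nrow = 2, byrow = TRUE)

# A for loop returns NULL, which is why assigning it to a column fails
loop_value <- for (i in 1:2) i
is.null(loop_value)

# Column positions grouped in blocks of 4: columns 1-4, 5-8, ...
blocks <- split(seq_len(ncol(X)), (seq_len(ncol(X)) - 1) %/% 4)

# One column per block, one row per row of X: max minus min within the block
newmatrix <- vapply(blocks, function(cols) {
  apply(X[, cols, drop = FALSE], 1, function(r) diff(range(r)))
}, numeric(nrow(X)))
newmatrix
```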
My limited knowledge of lists is getting me in trouble. I have a list containing multiple data frames of differing lengths at each level, with a parallel structure, that looks like this:
list.fun <- function(y) {
x1 <- data.frame(x = rnorm(20, mean=y))
x2 <- data.frame(x = rnorm(10, mean=y))
return(list(x1=x1, x2=x2))
}
## make list
foo <- lapply(1:3, list.fun)
I would like to extract all the data frames with the same name (in this case x1), assign labels as a new factor, and combine them into one single data frame. This manual approach works, but I am looking for something more generic.
## split results into separate data frames
b1 <- foo[[1]]$x1
b2 <- foo[[2]]$x1
b3 <- foo[[3]]$x1
b1$trial <- "t1"
b2$trial <- "t2"
b3$trial <- "t3"
## combine
bar <- rbind(b1, b2, b3)
Apologies if this has been asked already!
You can select all the elements named x1 using the [[ operator:
lapply(foo, "[[", "x1")
You can then bind them together using do.call
do.call(rbind, lapply(foo, '[[', "x1"))
> head(do.call(rbind, lapply(foo, '[[', "x1")))
x
1 1.3599227
2 0.7760733
3 0.9852219
4 0.5447365
5 2.1185779
6 0.5419102
> nrow(do.call(rbind, lapply(foo, '[[', "x1")))
[1] 60
For your more general case:
res <- do.call(rbind, lapply(seq_along(foo), function(x) {
out <- foo[[x]][['x1']]
out$trial <- paste0("t", x)
out
}))
> head(res)
x trial
1 1.3599227 t1
2 0.7760733 t1
3 0.9852219 t1
4 0.5447365 t1
5 2.1185779 t1
6 0.5419102 t1
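If the same extraction is needed for several names (x1, x2, ...), the code above can be wrapped in a small helper. This is only a sketch: combine_by_name is a hypothetical name, and converting trial to a factor after binding is an assumption based on the question asking for the labels as a factor.

```r
set.seed(42)  # only to make the example reproducible
list.fun <- function(y) {
  x1 <- data.frame(x = rnorm(20, mean = y))
  x2 <- data.frame(x = rnorm(10, mean = y))
  list(x1 = x1, x2 = x2)
}
foo <- lapply(1:3, list.fun)

# Extract the data frame called `name` from every element of `lst`,
# label each with its position, and stack them into one data frame
combine_by_name <- function(lst, name) {
  pieces <- lapply(seq_along(lst), function(i) {
    out <- lst[[i]][[name]]
    out$trial <- paste0("t", i)
    out
  })
  res <- do.call(rbind, pieces)
  res$trial <- factor(res$trial)
  res
}

bar_x1 <- combine_by_name(foo, "x1")
bar_x2 <- combine_by_name(foo, "x2")
table(bar_x1$trial)
```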
I have two lists looking like this:
mylist <- list(a=c(1:5),
b = c(5:12),
c = c(2:8))
list.id <- list(a=2, b=8, c=5)
I want to count the number of elements in each vector of mylist that are higher than the corresponding element in list.id and divide the result by the length of that vector. I have written this function:
perm.fun <- function(x,y){length(which(x[[i]] < y[[i]]))/length(x[[i]])}
However, when I do: lapply(mylist, perm.fun, list.id) I do not obtain the expected result.
Thanks
Using lapply, you would need to loop on the indices (1, 2, 3) so they can be used to extract the elements from both mylist and list.id:
perm.fun <- function(i, x, y) mean(x[[i]] > y[[i]])
lapply(seq_along(mylist), perm.fun, mylist, list.id)
But mapply is a much better tool for that task. From the doc:
mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on.
So your code can just be:
mapply(function(x, y) mean(x > y), mylist, list.id)
# a b c
# 0.6000000 0.5000000 0.4285714
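Note that mapply pairs the elements by position. If the two lists might not be in the same order, a sketch that pairs them by name instead could look like the following; list.id is deliberately shuffled here to show the difference, and prop_above is an illustrative name.

```r
mylist <- list(a = 1:5, b = 5:12, c = 2:8)
list.id <- list(c = 5, a = 2, b = 8)  # same thresholds, different order

# Look up each threshold by name, so the order of list.id does not matter
prop_above <- vapply(names(mylist), function(nm) {
  mean(mylist[[nm]] > list.id[[nm]])
}, numeric(1))
prop_above
```

This returns the same proportions as the mapply call above (0.6, 0.5 and about 0.43 for a, b and c).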