Intersection of vectors within a same list [duplicate] - r

This question already has answers here:
How to find common elements from multiple vectors?
(3 answers)
Closed 5 years ago.
I have a list of vectors with character strings:
lst <- list(v1 = c("A", "B", "C", "D"), v2 = c("B", "C", "D", "E"), v3 = c("C", "D", "E", "F"))
How can I get a new vector that includes only those character strings that appear in every vector? E.g. in this case:
out <- c("C", "D")
A solution with lapply would be great.

The easiest function to use would be Reduce. Try
Reduce(intersect, lst)
That's basically the same as
# data
x <- list(A, B, C)
# these are equivalent
Reduce(intersect, x)
intersect(intersect(A, B), C)
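As a minimal sketch of the folding described above, applied to the list from the question: Reduce calls intersect on the first two vectors, then on that result and the third. The names common and steps are only illustrative.

```r
# Sample list from the question.
lst <- list(v1 = c("A", "B", "C", "D"),
            v2 = c("B", "C", "D", "E"),
            v3 = c("C", "D", "E", "F"))

# Fold intersect() over the list, left to right.
common <- Reduce(intersect, lst)

# accumulate = TRUE keeps every intermediate result as a list element.
steps <- Reduce(intersect, lst, accumulate = TRUE)

print(common)
print(steps)
```

Looking at steps can help when the final intersection is unexpectedly empty, since it shows at which vector the common elements disappear.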

Related

Return index numbers in R where object lengths are not multiples [duplicate]

This question already has answers here:
Is there an R function for finding the index of an element in a vector?
(4 answers)
Closed 2 years ago.
I have a vector like so:
foo = c("A", "B", "C", "D")
And I want a vector of selected index numbers, which I imagined I could do like so:
which(foo == c("A", "B", "D"))
But apparently this only works if the lengths of the two vectors are multiples, as otherwise you get an incomplete result followed by a warning message:
"longer object length is not a multiple of shorter object length".
So how do I get what I'm after, which is "1 2 4"?
Use match:
match(c('A', 'B', 'D'), foo)
Using %in% is one option here:
foo <- c("A", "B", "C", "D")
x <- c("A", "B", "D")
seq_along(foo)[foo %in% x] # [1] 1 2 4
The quantity foo %in% x returns a logical vector which can then be used to subset the indices you want to see.
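The two approaches above differ when the lookup values are unordered or missing. A small sketch, where the lookup vector x (including the absent value "Z") is made up for illustration:

```r
foo <- c("A", "B", "C", "D")
x <- c("D", "A", "Z")   # hypothetical lookup values; "Z" is not in foo

# match() answers "where is each element of x in foo?":
# one result per element of x, in the order of x, NA if absent.
by_match <- match(x, foo)

# which(foo %in% x) answers "which positions of foo hold a value from x?":
# positions in increasing order, absent values silently ignored.
by_in <- which(foo %in% x)

print(by_match)
print(by_in)
```

Which one fits depends on whether the order of the lookup values and the missing entries matter for the result.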

R add all combinations of three values of a vector to a three-dimensional array

I have a data frame with two columns. The first one "V1" indicates the objects on which the different items of the second column "V2" are found, e.g.:
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a","b","c","d","a","c","d","a","b","d","e")
df <- data.frame(V1, V2)
"A" for example contains "a", "b", "c", and "d". What I am looking for is a three dimensional array with dimensions of length(unique(V2)) (and the names "a" to "e" as dimnames).
For each unique value of V1 I want all possible combinations of three V2 items (e.g. for "A" it would be c("a", "b", "c"), c("a", "b", "d"), c("a", "c", "d"), and c("b", "c", "d")).
Each of these "three-item-co-occurrences" should be regarded as a coordinate in the three-dimensional array and therefore be added to the frequency count which the values in the array should display. The outcome should be the following array
ar <- array(data = c(0,0,0,0,0,0,0,1,2,1,0,1,0,2,0,0,2,2,0,1,0,1,0,1,0,
0,0,1,2,1,0,0,0,0,0,1,0,0,1,0,2,0,1,0,1,1,0,0,1,0,
0,1,0,2,0,1,0,0,1,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,
0,2,2,0,1,2,0,1,0,1,2,1,0,0,0,0,0,0,0,0,1,1,0,0,0,
0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0),
dim = c(5, 5, 5),
dimnames = list(c("a", "b", "c", "d", "e"),
c("a", "b", "c", "d", "e"),
c("a", "b", "c", "d", "e")))
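To make the three-item combinations described above concrete, here is a short sketch using base R's combn on the items of object "A". The names items_A and triples are only illustrative.

```r
# Items found on object "A" in the example data.
items_A <- c("a", "b", "c", "d")

# All unordered combinations of three items, one character vector each.
triples <- combn(items_A, 3, simplify = FALSE)

print(length(triples))  # equals choose(4, 3)
print(triples)
```

Each element of triples is one unordered three-item co-occurrence for that object.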
I was wondering about the 3D symmetry of your result. It took me a while to understand that you want to have all permutations of all combinations.
library(gtools)  # for permutations()
foo <- function(x) {
  # all combinations of three items:
  combs <- combn(x, 3, simplify = FALSE)
  # all permutations of each combination:
  combs <- do.call(rbind, lapply(combs, permutations, n = 3, r = 3))
  # tabulate into a 5 x 5 x 5 array:
  do.call(table, lapply(asplit(combs, 2), factor, levels = letters[1:5]))
}
# apply grouped by V1, then sum the results
res <- Reduce("+", tapply(df$V2, df$V1, foo))
# check
all((res - ar)^2 == 0)
# [1] TRUE
I used to use the cross join CJ() from data.table to retain the pairwise count of all combinations of two different V2 items:
library(data.table)
res <- setDT(df)[, CJ(unique(V2), unique(V2)), V1][V1 != V2,
                 .N, .(V1, V2)][order(V1, V2)]
This code creates a data frame res with three columns. V1 and V2 contain the respective items of V2 from the original data frame df, and N contains the count (how many times V1 and V2 appear with the same value of V1 from the original data frame df).
Now, I found that I could perform this crossjoin with three 'dimensions' as well by just adding another unique(V2) and adapting the rest of the code accordingly.
The result is a data frame with four columns. V1, V2, and V3 indicate the original V2 items and N again shows the number of mutual appearances with the same original V1 objects.
res <- setDT(df)[,CJ(unique(V2), unique(V2), unique(V2)), V1][V1!=V2 & V1 != V3 & V2 != V3,
.N, .(V1,V2,V3)][order(V1,V2,V3)]
The advantage of this code is that empty combinations (those which do not appear at all) are not stored. It worked with 1,000,000 unique values in V1 and over 600 unique items in V2, which would otherwise have required an extremely large array of 600 x 600 x 600.
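For comparison, the pairwise counts mentioned above can also be sketched in base R with crossprod on an incidence table. This is a different technique from the CJ() code, it produces a dense matrix, and it is therefore only reasonable when the number of distinct items is small. The names inc and pair_counts are illustrative.

```r
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")
df <- data.frame(V1, V2)

# Object-by-item incidence table (every cell is 0 or 1 for this data).
inc <- table(df$V1, df$V2)

# t(inc) %*% inc: cell [i, j] counts the objects that contain both i and j.
# The diagonal holds the number of objects containing each single item.
pair_counts <- crossprod(inc)

print(pair_counts)
```

If an object could list the same item more than once, the table cells would exceed 1 and the products would no longer be plain co-occurrence counts, so this sketch assumes unique object-item pairs.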

Exclude common rows in tibbles [duplicate]

This question already has an answer here:
Using anti_join() from the dplyr on two tables from two different databases
(1 answer)
Closed 2 years ago.
I'm looking for a way to join two tibbles so that only the rows unique to the first tibble, or unique to either tibble, are kept: simply those that do not have any matching key.
Here is an example:
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
With the usual dplyr joins I am not able to get this:
A
1 d
2 e
Is there some way to do this within dplyr, or within the tidyverse in general?
Use the setdiff() function from the dplyr package:
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
C <- setdiff(A,B)
Just to add: setdiff(A, B) returns those elements present in A but not in B.
dplyr::anti_join will keep only the rows that are unique to the tibble/data.frame of the first argument.
A <- tibble( A = c("a", "b", "c", "d", "e"))
B <- tibble( A = c("a", "b", "c"))
dplyr::anti_join(A, B, by = "A")
# A
# <chr>
# 1 d
# 2 e
A base R possibility (well, except for the tibble):
A[!A$A %in% B$A,]
returns
# A tibble: 2 x 1
A
<chr>
1 d
2 e
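The question also asks about rows that are unmatched in either table. A minimal base R sketch of that case follows, using plain data frames and a modified B that contains an extra value "x" (an assumption added here so that both sides have unmatched rows):

```r
A <- data.frame(A = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
# Hypothetical variant of B with one key ("x") that is missing from A.
B <- data.frame(A = c("a", "b", "c", "x"), stringsAsFactors = FALSE)

# Rows of each table whose key has no match in the other table.
only_in_A <- A[!A$A %in% B$A, , drop = FALSE]
only_in_B <- B[!B$A %in% A$A, , drop = FALSE]

# Stack both sets of unmatched rows.
unmatched <- rbind(only_in_A, only_in_B)
print(unmatched)
```

The drop = FALSE argument keeps the one-column result a data frame instead of collapsing it to a vector.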

Find vector of strings in list (R)

I have a list, in which each element is a vector of strings, as:
l <- list(c("a", "b"), c("c", "d"))
I want to find the index of the element in l that contains a specific vector of strings, as c("a", "b"). How do I do that? I thought which(l %in% c("a", "b")) should work, but it returns integer(0) instead of 1.
%in% checks presence of elements of the LHS among elements of the RHS. To treat c("a", "b") as a single element of the RHS, it needs to be in a list:
which(l %in% list(c("a", "b")))
Other possibilities are to go element-by-element through l with sapply, such as
which(sapply(l, function(x) all(c("a", "b") %in% x))) # order doesn't matter, other elements allowed
which(sapply(l, identical, c("a", "b"))) # exact match, in order
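The difference between the two element-by-element checks shows up once the list holds a vector with the same strings in another order. A small sketch, where the third list element is added here purely for illustration and vapply is used in place of sapply so the result type is fixed:

```r
# The list from the question plus one hypothetical extra element.
l <- list(c("a", "b"), c("c", "d"), c("b", "a", "z"))
target <- c("a", "b")

# Elements that contain every string of target, in any order.
contains_all <- which(vapply(l, function(x) all(target %in% x), logical(1)))

# Elements that are exactly target: same strings, same order, nothing else.
exact <- which(vapply(l, identical, logical(1), target))

print(contains_all)
print(exact)
```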

Joining lists into a vector [duplicate]

This question already has answers here:
Create sequence of repeated values, in sequence?
(3 answers)
Closed 6 years ago.
I want to create the following vector using a, b, c repeating each letter thrice:
BB<-c("a","a","a","b","b","b","c","c","c")
This is my code:
Alphabet<-c("a","b","c")
AA<-list()
for(i in 1:3){
AA[[i]]<-rep(Alphabet[i],each=3)
}
BB<-do.call(rbind,AA)
But I am getting a dataframe:
dput(BB)
structure(c("a", "b", "c", "a", "b", "c", "a", "b", "c"), .Dim = c(3L,
3L))
What am I doing wrong?
As Akrun mentioned, we can use the rep function directly.
Create a vector which consists of the letters:
A <- c("A","B","C")
Apply rep to that vector, using the each argument:
AA <- rep(A,each=3)
print(AA)
[1] "A" "A" "A" "B" "B" "B" "C" "C" "C"
You should use the c function to concatenate, not rbind. This will give you a vector.
Alphabet<-c("a","b","c")
AA<-list()
for(i in 1:3){
AA[[i]]<-rep(Alphabet[i],each=3)
}
BB<-do.call(c,AA)
Akrun's comment is also correct, if that's what you want.
You can also concatenate the rep function like so:
BB <- c(rep("a", 3), rep("b", 3), rep("c", 3))
Here is a solution, but note that this form of appending is not efficient for large inputs:
Alphabet <- c("a","b","c")
bb <- c()
for (i in seq_along(Alphabet)) {
  bb <- c(bb, rep(Alphabet[i], 3))
}
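To see why the original attempt did not give a vector, it may help to compare rbind with unlist on the same list. A minimal sketch, where the names m and v are illustrative:

```r
Alphabet <- c("a", "b", "c")

# Build the list of repeated letters without an explicit loop.
AA <- lapply(Alphabet, rep, times = 3)

# rbind stacks the list elements as rows, giving a 3 x 3 character matrix.
m <- do.call(rbind, AA)

# unlist flattens the list into a single character vector.
v <- unlist(AA)

print(dim(m))
print(v)
```

Because lapply builds the whole list in one pass, this avoids growing a vector inside a loop.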
