Find vector of strings in list (R)

I have a list, in which each element is a vector of strings, as:
l <- list(c("a", "b"), c("c", "d"))
I want to find the index of the element in l that contains a specific vector of strings, as c("a", "b"). How do I do that? I thought which(l %in% c("a", "b")) should work, but it returns integer(0) instead of 1.

%in% checks presence of elements of the LHS among elements of the RHS. To treat c("a", "b") as a single element of the RHS, it needs to be in a list:
which(l %in% list(c("a", "b")))
Other possibilities are to go element-by-element through l with sapply, such as
which(sapply(l, function(x) all(c("a", "b") %in% x))) # order doesn't matter, other elements allowed
which(sapply(l, identical, c("a", "b"))) # exact match, in order
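As a minimal sketch of the exact-match approach, the identical() call can be wrapped in a small helper. The name find_vector is hypothetical (it is not a base R function), and vapply is swapped in for sapply only so that the result is always a logical vector, even for an empty list.

```r
l <- list(c("a", "b"), c("c", "d"))

# Hypothetical helper: positions of the list elements that are exactly `target`
find_vector <- function(lst, target) {
  which(vapply(lst, identical, logical(1), target))
}

find_vector(l, c("a", "b")) # [1] 1
find_vector(l, c("b", "a")) # integer(0), because identical() is order-sensitive
```

If order should not matter, the all(... %in% x) variant is the one to wrap instead.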

Related

Return index numbers in R where object lengths are not multiples [duplicate]

I have a vector like so:
foo = c("A", "B", "C", "D")
And I want a vector of selected index numbers, which I imagined I could do like so:
which(foo == c("A", "B", "D"))
But apparently this only works if the lengths of the two vectors are multiples, as otherwise you get an incomplete result followed by a warning message:
"longer object length is not a multiple of shorter object length".
So how do I get what I'm after, which is "1 2 4"?
Use match:
match(c('A', 'B', 'D'), foo) # [1] 1 2 4
Using %in% is one option here:
foo <- c("A", "B", "C", "D")
x <- c("A", "B", "D")
seq_along(foo)[foo %in% x] # [1] 1 2 4
The expression foo %in% x returns a logical vector, which can then be used to subset the indices you want.
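The two answers differ when the lookup values are out of order or missing. This small sketch uses a made-up lookup vector (query, which includes a value "Z" that is not in foo) to show the difference: match() follows the order of the lookup values and returns NA for misses, while which() on %in% returns positions in the order of foo and silently skips misses.

```r
foo <- c("A", "B", "C", "D")
query <- c("D", "A", "Z") # illustrative lookup values; "Z" is not in foo

by_match <- match(query, foo)  # [1]  4  1 NA
by_in <- which(foo %in% query) # [1] 1 4
```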

remove a list of specific values from another list

I am trying to remove a list of specific values from another list but I cannot find any resources to help me do so.
list1 <- list("a","b", "c", "d", "e", "f", "g","h", "i", "j", "k")
list2 <- list("a","b","c","d")
list3 <- list1[-list2]
I would hope to get an output of the first list without a,b,c, or d. Instead I get
Error in -list2 : invalid argument to unary operator
We can use setdiff as the list elements have length 1
setdiff(list1, list2)
Or use %in% and negate (!)
list1[!list1 %in% list2]
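One difference between the two answers is worth knowing: setdiff() also drops duplicates from its result, while the %in% subset keeps them. A small sketch, using made-up character vectors x and drop rather than the lists from the question:

```r
x <- c("a", "e", "e", "f") # illustrative data with a repeated "e"
drop <- c("a", "b")

by_setdiff <- setdiff(x, drop) # [1] "e" "f"
by_in <- x[!(x %in% drop)]     # [1] "e" "e" "f"
```

If the values are all single strings, as in the question, storing them in a character vector rather than a list (for example via unlist(list1)) keeps both forms simple.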

Intersection of vectors within a same list [duplicate]

I have a list of vectors with character strings:
lst <- list(v1 = c("A", "B", "C", "D"), v2 = c("B", "C", "D", "E"), v3 = c("C", "D", "E", "F"))
How can I get a new vector that includes only those character strings that appear in every vector? E.g. in this case
out <- c("C", "D")
A solution with lapply would be great.
The easiest function to use would be Reduce. Try
Reduce(intersect, lst)
That's basically the same as
# data: A, B and C stand for any three vectors
x <- list(A, B, C)
# these are equivalent
Reduce(intersect, x)
intersect(intersect(A, B), C)
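To see the pairwise folding that Reduce performs, the accumulate = TRUE argument keeps each intermediate result. This is only an illustration on the list from the question; the variable name steps is made up.

```r
lst <- list(v1 = c("A", "B", "C", "D"),
            v2 = c("B", "C", "D", "E"),
            v3 = c("C", "D", "E", "F"))

# Keep every intermediate intersection instead of only the final one
steps <- Reduce(intersect, lst, accumulate = TRUE)
steps[[2]] # [1] "B" "C" "D"  (v1 and v2)
steps[[3]] # [1] "C" "D"      (the previous result and v3)

out <- Reduce(intersect, lst) # [1] "C" "D"
```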

How to merge only specific elements of a vector in R

This is probably pretty straightforward but I'm really stuck: Let's say that I have a vector=c("a", "b", "c","d","e"). How can I concatenate only some specific elements? For example, how do I merge "b" and "c", which will lead me to the vector=c("a","bc","d","e") ?
Thank you
We can do
i1 <- vector %in% c("b", "c")
c(vector[!i1], paste(vector[i1], collapse=""))
# [1] "a"  "d"  "e"  "bc" (the merged element is appended at the end)
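If the merged string should stay where the first of the merged elements was, as in the c("a","bc","d","e") result the question asks for, one possible sketch is below. The function name merge_elements is hypothetical, and it assumes the merged pieces should be pasted in the order they occur in the vector.

```r
# Hypothetical helper: collapse the elements of x found in `targets` into one
# string, placed at the position of the first match
merge_elements <- function(x, targets) {
  hit <- x %in% targets
  if (!any(hit)) return(x)           # nothing to merge
  first <- which(hit)[1]
  x[first] <- paste(x[hit], collapse = "")
  keep <- !hit
  keep[first] <- TRUE                # keep the slot that now holds the merged string
  x[keep]
}

merge_elements(c("a", "b", "c", "d", "e"), c("b", "c"))
# [1] "a"  "bc" "d"  "e"
```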

R how to find the intersection of a subset of vectors in a list

I have a list of vectors (characters). For example:
my_list <- list(c("a", "b", "c"),
c("a", "b", "c", "d"),
c("e", "d"))
For the intersection of all these three vectors, I could use: Reduce(intersect, my_list). But as you can see, there is no common element in all three vectors.
Then, what if I want to find the elements that appear at least a certain number of times in the list? For example, somefunction(my_list, time=2) would give me c("a", "b", "c", "d") because each of those elements appears in two of the vectors.
Thanks.
We can convert this to a data.table and group by value, keeping the values that occur in at least two of the vectors
library(data.table)
setDT(stack(setNames(my_list, seq_along(my_list))))[,
if(uniqueN(ind) >= 2) values, values]$values
#[1] "a" "b" "c" "d"
A base R option is to unlist my_list, tabulate its values against an index of the vector each value came from (built with rep and lengths), take the column sums of that table, and keep the names whose count is at least 2.
tblCount <- colSums(table(rep(seq_along(my_list), lengths(my_list)), unlist(my_list)))
names(tblCount)[tblCount >= 2]
#[1] "a" "b" "c" "d"
If you assume that each element will appear no more than once in a vector, you can "unlist" your vectors and count the frequency.
Here, using dplyr functions
library(dplyr)
my_list %>% unlist %>% tibble(v = .) %>% count(v) %>% filter(n >= 2) %>% pull(v)
Or base functions
subset(as.data.frame(table(unlist(my_list))), Freq>=2)$Var1
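The counting answers above rely on each element appearing at most once per vector. A base R sketch that drops that assumption deduplicates each vector before counting, so the count is the number of vectors containing the value. The name common_at_least is made up to stand in for the somefunction from the question.

```r
my_list <- list(c("a", "b", "c"),
                c("a", "b", "c", "d"),
                c("e", "d"))

# Hypothetical helper: values that occur in at least n of the vectors in lst
common_at_least <- function(lst, n) {
  counts <- table(unlist(lapply(lst, unique))) # one vote per vector per value
  names(counts)[counts >= n]
}

common_at_least(my_list, 2) # [1] "a" "b" "c" "d"
common_at_least(my_list, 3) # character(0)
```

The result is a character vector in the sorted order that table() produces, not in order of first appearance.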
With purrr, the intersection across all the vectors (the same result as Reduce(intersect, my_list), so it does not cover the "at least n times" case) is:
purrr::reduce(my_list, intersect)
