How to generate all permutations of lists of strings in R?

I have character data like this:
[[1]]
[1] "F" "S"
[[2]]
[1] "Y" "Q" "Q"
[[3]]
[1] "C" "T"
[[4]]
[1] "G" "M"
[[5]]
[1] "A" "M"
I want to generate all permutations of each individual list (not mixed between lists) and combine them into one big list.
For example, for the first and second lists, which are "F" "S" and "Y" "Q" "Q", I want to get the permutation lists as c("FS", "SF"), and c("YQQ", "QYQ", "QQY"), and then combine them into one.

Here's an approach with combinat::permn:
library(combinat)
lapply(data,function(x)unique(sapply(combinat::permn(x),paste,collapse = "")))
#[[1]]
#[1] "FS" "SF"
#
#[[2]]
#[1] "YQQ" "QYQ" "QQY"
#
#[[3]]
#[1] "CT" "TC"
#
#[[4]]
#[1] "GM" "MG"
#
#[[5]]
#[1] "AM" "MA"
Or, to combine everything into one vector, wrap the call in unlist:
unlist(lapply(data,function(x)unique(sapply(combinat::permn(x),paste,collapse = ""))))
# [1] "FS" "SF" "YQQ" "QYQ" "QQY" "CT" "TC" "GM" "MG" "AM" "MA"
Data:
data <- list(c("F", "S"), c("Y", "Q", "Q"), c("C", "T"), c("G", "M"),
c("A", "M"))
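If you would rather not depend on combinat, here is a minimal base-R sketch; the helper name distinct_perms is hypothetical, not from any package. Instead of generating every ordering and calling unique() afterwards, it picks each distinct letter only once per position, so repeated letters such as "Q" "Q" never produce duplicate strings in the first place.

```r
# Hypothetical helper: distinct orderings of a character vector, built
# recursively. Each distinct letter is tried once per position, so repeated
# letters never yield duplicate strings.
distinct_perms <- function(v) {
  if (length(v) <= 1) return(paste(v, collapse = ""))
  out <- character(0)
  for (ch in unique(v)) {
    rest <- v[-match(ch, v)]                      # drop one occurrence of ch
    out <- c(out, paste0(ch, distinct_perms(rest)))
  }
  out
}

data <- list(c("F", "S"), c("Y", "Q", "Q"), c("C", "T"), c("G", "M"),
             c("A", "M"))
result <- lapply(data, distinct_perms)   # one character vector per list
unlist(result)                           # all strings combined into one vector
```

With the data above this should give the same 11 strings as the unlist version. The recursion is fine for short vectors like these; the number of orderings grows factorially with vector length.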

It looks like your desired output is not exactly the same as this related post (Generating all distinct permutations of a list in R). But we can build on the answer there.
library(combinat)
# example data, based on your description
X <- list(c("F","S"), c("Y", "Q", "Q"))
result <- lapply(X, function(x1) {
unique(sapply(permn(x1), function(x2) paste(x2, collapse = "")))
})
print(result)
Output
[[1]]
[1] "FS" "SF"
[[2]]
[1] "YQQ" "QYQ" "QQY"
The outer lapply iterates over each element of the list, i.e. each vector of individual letters. On each iteration, permn takes those letters (e.g. "F" and "S") and returns a list of all possible orderings (e.g. c("F", "S") and c("S", "F")). To format the output as you described, the inner sapply collapses each of those orderings into a single string with paste(..., collapse = ""), and unique() then keeps only the distinct strings.
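To see what each step produces, here is a trace for the second vector (it assumes combinat is installed). permn returns every ordering, including the identical ones caused by the repeated "Q", so there are 3! = 6 of them before unique() is applied.

```r
library(combinat)

x1 <- c("Y", "Q", "Q")

p <- permn(x1)                        # list with one character vector per ordering
length(p)                             # 6, i.e. 3!, duplicates included

s <- sapply(p, paste, collapse = "")  # collapse each ordering to one string
s                                     # 6 strings, some of them repeated

unique(s)                             # the 3 distinct orderings
```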

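library(combinat)
# unique() drops duplicates such as "YQQ" that permn() yields for the repeated "Q"s
final <- unlist(lapply(X, function(test_X)
  unique(unlist(lapply(permn(test_X), function(x) paste(x, collapse = ""))))))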

Related

Combine list components to a vector

Suppose I have a list:
l = list(c("a", "b", "c"), c("d", "e", "f"))
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "d" "e" "f"
I want to get this vector:
"ad" "be" "cf"
I can convert the list to a matrix, e.g. with sapply(l, c), and then paste the columns together row by row, but perhaps there is an easier way.
We can use Reduce with paste0
Reduce(paste0, l)
[1] "ad" "be" "cf"
Or with do.call
do.call(paste0, l)
[1] "ad" "be" "cf"
Here is another option (list2DF() requires R 4.0.0 or later):
> apply(list2DF(l), 1, paste0, collapse = "")
[1] "ad" "be" "cf"
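For comparison, the matrix route mentioned in the question looks like this: sapply(l, c) puts each list element in its own column, so each row holds the letters that should be joined.

```r
l <- list(c("a", "b", "c"), c("d", "e", "f"))

m <- sapply(l, c)                       # 3 x 2 character matrix, one column per list element
res <- apply(m, 1, paste0, collapse = "")  # join the letters in each row
res                                     # "ad" "be" "cf"
```

Reduce(paste0, l) and do.call(paste0, l) give the same result without building the intermediate matrix, and all three versions work for any number of equal-length vectors in l.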

Remove duplicates in a nested list

I have a large list of vectors where I want to remove duplicated elements within each vector. Example:
x <- list(c("A", "A", "B", "C"), c("O", "C", "A", "Z", "O"))
x
[[1]]
[1] "A" "A" "B" "C"
[[2]]
[1] "O" "C" "A" "Z" "O"
I want the result to be a list that looks like this, where duplicates within a list are removed, but the structure of the list remains.
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "O" "C" "A" "Z"
My main strategy has been to use rapply (also tried lapply) to identify duplicates and remove them. I tried:
x[rapply(x, duplicated) == T]
but received the following error:
"Error: (list) object cannot be coerced to type 'logical'"
Does anyone know a way to solve this issue?
Thanks!
We can use lapply with unique
lapply(x, unique)
#[[1]]
#[1] "A" "B" "C"
#[[2]]
#[1] "O" "C" "A" "Z"
The issue with rapply is that it applies duplicated recursively to each vector and, with its default how = "unlist", returns a single flattened vector instead of a list of logical vectors:
rapply(x, duplicated)
#[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Alternatively, use duplicated inside lapply:
lapply(x, function(u) u[!duplicated(u)])
#[[1]]
#[1] "A" "B" "C"
#[[2]]
#[1] "O" "C" "A" "Z"
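If you want to keep the rapply idea from the question, here is a sketch: rapply(..., how = "list") preserves the nested structure instead of flattening it, and Map() can then pair each vector with its own logical index. rapply(x, unique, how = "list") also works directly.

```r
x <- list(c("A", "A", "B", "C"), c("O", "C", "A", "Z", "O"))

# how = "list" returns a list of logical vectors, one per element of x
dups <- rapply(x, duplicated, how = "list")

# Map() walks x and dups in parallel, keeping only the non-duplicated entries
dedup <- Map(function(v, d) v[!d], x, dups)
dedup
```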

How do I apply an index vector over a list of vectors?

I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[[1]]
[1] "b"
[[2]]
[1] "f"
##or
[1] "b"
[1] "f"
##or
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
##or
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[[1]]
[1] "b" "c"
[[2]]
[1] "e" "f"
##or
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just one value from each element (element my.index[i] from my.list[[i]]).
I've been looking for a few days and haven't found any examples of this, but it seems fairly straightforward. Am I missing something obvious here?
Thank you,
Scott
Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
Map("[",my.list,my.index)
#[[1]]
#[1] "b"
#
#[[2]]
#[1] "f"
Or:
mapply("[",my.list,my.index)
#[1] "b" "f"
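A vapply() variant makes the "one index per vector" pairing explicit and guarantees a character result, which can help with 50+ vectors where a silent change in the result's type would be hard to spot. This is a sketch that assumes my.index has exactly one entry per vector in my.list.

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

stopifnot(length(my.index) == length(my.list))  # one index per vector

# element my.index[i] of vector my.list[[i]]; vapply errors unless each
# result is a single character value
picked <- vapply(seq_along(my.list),
                 function(i) my.list[[i]][my.index[i]],
                 character(1))
picked
```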

How to split a certain element in a vector by letters?

For example, I have an element "computer" in a vector. I need to get a vector consisting of "c", "o", "m", "p", "u", "t", "e", "r".
The second part of my question is optional: how can I create a vector of letter combinations from that vector, where the letters in each combination keep the order they have in the original word? For instance, I want to get strings like "puter" or "mpu" in this vector, but not "tumpo".
You can use
strsplit("computer", "\\b")
and
library("RWeka")
gsub(" ", "",
     NGramTokenizer(paste(strsplit("computer", "\\b")[[1]], collapse = " "),
                    Weka_control(min = 2, max = 5)),
     fixed = TRUE)
# [1] "compu" "omput" "mpute" "puter" "comp"
# [6] "ompu" "mput" "pute" "uter" "com"
# [11] "omp" "mpu" "put" "ute" "ter"
# [16] "co" "om" "mp" "pu" "ut"
# [21] "te" "er"
to create n-grams with 2 <= n <= 5.
The first part of the question is easy:
splits <- unlist(strsplit("computer",split=""))
> splits
[1] "c" "o" "m" "p" "u" "t" "e" "r"
For the second part you can use the following code:
subseqs <-
  unlist(
    lapply(1:length(splits), FUN = function(x) {
      lapply(1:(length(splits) + 1 - x), FUN = function(y) {
        paste(splits[y:(y + x - 1)], collapse = "")
      })
    })
  )
> subseqs
[1] "c" "o" "m" "p" "u" "t" "e"
[8] "r" "co" "om" "mp" "pu" "ut" "te"
[15] "er" "com" "omp" "mpu" "put" "ute" "ter"
[22] "comp" "ompu" "mput" "pute" "uter" "compu" "omput"
[29] "mpute" "puter" "comput" "ompute" "mputer" "compute" "omputer"
[36] "computer"
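A base-R sketch for the second part that skips splitting altogether: substring() is vectorised over start and stop positions, so every run of consecutive letters can be extracted in one call. The function name ordered_substrings and its min_n/max_n arguments are hypothetical. With the defaults it should return the same 36 strings as subseqs, in the same order; with min_n = 2 and max_n = 5 it covers the same 22 n-grams as the RWeka output, in a different order.

```r
# Hypothetical helper: every run of consecutive letters in `word` whose length
# lies between min_n and max_n, shortest runs first
ordered_substrings <- function(word, min_n = 1, max_n = nchar(word)) {
  n <- nchar(word)
  lens <- min_n:max_n
  starts <- unlist(lapply(lens, function(k) seq_len(n - k + 1)))  # start positions for each length
  widths <- rep(lens, times = n - lens + 1)                       # run length matching each start
  substring(word, starts, starts + widths - 1)
}

ordered_substrings("computer")         # all 36 ordered substrings
ordered_substrings("computer", 2, 5)   # 2- to 5-letter runs only
```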
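For three consecutive letter combinations (x has to be the character vector of letters, not the list that strsplit() returns, otherwise combn() fails with "n < m"):
x <- strsplit("computer", "")[[1]]
y <- combn(seq(x), 3)                        # all index triples, in lexicographic order
m <- match(seq_len(length(x) - 2), y[1, ])   # first triple starting at i is (i, i+1, i+2)
combn(x, 3)[, m]                             # one column per run of 3 consecutive letters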

R - generate all combinations from 2 vectors given constraints

I would like to generate all combinations of two vectors, given two constraints: there can never be more than 3 characters from the first vector, and there must always be at least one character from the second vector. I would also like to vary the total number of characters in the combination.
For instance, here are two vectors:
vec1=c("A","B","C","D")
vec2=c("W","X","Y","Z")
Say I wanted 3 characters in the combination. Possible acceptable permutations would be: "A" "B" "X" or "A" "Y" "Z". An unacceptable permutation would be: "A" "B" "C", since there is not at least one character from vec2.
Now say I wanted 5 characters in the combination. Possible acceptable permutations would be: "A" "C" "Z" "Y" or "A" "Y" "Z" "X". An unacceptable permutation would be: "A" "C" "D" "B" "X", since there are more than 3 characters from vec1.
I suppose I could use expand.grid to generate all combinations and then somehow subset, but there must be an easier way. Thanks in advance!
I'm not sure whether this is easier, but with this strategy you avoid generating permutations that do not satisfy your conditions:
1. Generate all acceptable combinations from vec1 (0 to 3 letters).
2. Generate all acceptable combinations from vec2 (at least 1 letter).
3. Generate all combinations taking one solution from 1. and one solution from 2. Here I'd filter on the total length (your third condition) afterwards.
4. If you're looking for combinations, you're done; otherwise, produce all permutations of the letters within each result.
Now, let's have
vec1 <- LETTERS [1:4]
vec2 <- LETTERS [23:26]
## lists can eat up lots of memory, so use character vectors instead.
combine <- function (x, y)
combn (y, x, paste, collapse = "")
res1 <- unlist (lapply (0:3, combine, vec1))
res2 <- unlist (lapply (1:length (vec2), combine, vec2))
Now we have:
> res1
[1] "" "A" "B" "C" "D" "AB" "AC" "AD" "BC" "BD" "CD" "ABC"
[13] "ABD" "ACD" "BCD"
> res2
[1] "W" "X" "Y" "Z" "WX" "WY" "WZ" "XY" "XZ" "YZ"
[11] "WXY" "WXZ" "WYZ" "XYZ" "WXYZ"
res3 <- outer (res1, res2, paste0)
res3 <- res3 [nchar (res3) == 5]
So here you are:
> res3
[1] "ABCWX" "ABDWX" "ACDWX" "BCDWX" "ABCWY" "ABDWY" "ACDWY" "BCDWY" "ABCWZ"
[10] "ABDWZ" "ACDWZ" "BCDWZ" "ABCXY" "ABDXY" "ACDXY" "BCDXY" "ABCXZ" "ABDXZ"
[19] "ACDXZ" "BCDXZ" "ABCYZ" "ABDYZ" "ACDYZ" "BCDYZ" "ABWXY" "ACWXY" "ADWXY"
[28] "BCWXY" "BDWXY" "CDWXY" "ABWXZ" "ACWXZ" "ADWXZ" "BCWXZ" "BDWXZ" "CDWXZ"
[37] "ABWYZ" "ACWYZ" "ADWYZ" "BCWYZ" "BDWYZ" "CDWYZ" "ABXYZ" "ACXYZ" "ADXYZ"
[46] "BCXYZ" "BDXYZ" "CDXYZ" "AWXYZ" "BWXYZ" "CWXYZ" "DWXYZ"
If you prefer the results split into single letters:
res <- matrix (unlist (strsplit (res3, "")), nrow = length (res3), byrow = TRUE)
> res
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "B" "C" "W" "X"
[2,] "A" "B" "D" "W" "X"
[3,] "A" "C" "D" "W" "X"
[4,] "B" "C" "D" "W" "X"
(snip)
[51,] "C" "W" "X" "Y" "Z"
[52,] "D" "W" "X" "Y" "Z"
These are your combinations.
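The answer describes step 4 and a variable total length but does not show them in code. Here is a minimal base-R sketch, assuming the constraints from the question (at most 3 letters from vec1, at least 1 from vec2); constrained_combos and all_orderings are hypothetical names. It only builds the feasible splits k1 + k2 = n, so nothing has to be filtered out afterwards, and for n = 5 it should yield the same 52 combinations as res3, in a different order.

```r
vec1 <- LETTERS[1:4]    # "A" "B" "C" "D"
vec2 <- LETTERS[23:26]  # "W" "X" "Y" "Z"

# Hypothetical helper: combinations of total length n that use at most max1
# letters from vec1 and at least one letter from vec2
constrained_combos <- function(n, vec1, vec2, max1 = 3) {
  stopifnot(n >= 1)
  out <- character(0)
  for (k1 in 0:min(max1, length(vec1), n - 1)) {  # n - 1 leaves room for vec2
    k2 <- n - k1                                  # letters taken from vec2
    if (k2 > length(vec2)) next                   # not enough letters in vec2
    # k1 = 0 is handled explicitly instead of relying on combn(x, 0)
    part1 <- if (k1 == 0) "" else combn(vec1, k1, paste, collapse = "")
    part2 <- combn(vec2, k2, paste, collapse = "")
    out <- c(out, as.vector(outer(part1, part2, paste0)))
  }
  out
}

# Hypothetical helper for step 4: all orderings of the letters in each combination
all_orderings <- function(combos) {
  perms <- function(v) {
    if (length(v) <= 1) return(list(v))
    res <- list()
    for (i in seq_along(v))
      for (p in perms(v[-i])) res[[length(res) + 1]] <- c(v[i], p)
    res
  }
  unlist(lapply(strsplit(combos, ""), function(ch)
    vapply(perms(ch), paste, character(1), collapse = "")))
}

combos5 <- constrained_combos(5, vec1, vec2)   # 52 combinations, like res3
combos3 <- constrained_combos(3, vec1, vec2)
perms3 <- all_orderings(combos3)               # every ordering of each combination
```

Each combination of n distinct letters expands into n! orderings, so step 4 gets expensive quickly as n grows.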
