I am trying to get all the possible combinations of length 3 of the elements of a variable. Although it partly worked with combn() I did not quite get the output I was looking for. Here's my example
x <- c("a","b","c","d","e")
t(combn(c(x,x), 3))
The output I get looks like this
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "a" "b" "d"
[3,] "a" "b" "e"
I am not really happy with this command, for two reasons. First, I wanted output of the form "a+b+c" "a+b+b" ..., but I wasn't able to reshape the output with paste() or anything similar.
Second, I want only one combination for each set of letters; that is, I should get either "a+b+c" or "b+a+c", but not both.
Try something like:
x <- c("a","b","c","d","e")
d1 <- combn(x,3) # All combinations
d1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "a" "a" "a" "a" "a" "a" "b" "b" "b" "c"
# [2,] "b" "b" "b" "c" "c" "d" "c" "c" "d" "d"
# [3,] "c" "d" "e" "d" "e" "e" "d" "e" "e" "e"
nrow(unique(t(d1))) == nrow(t(d1))
# [1] TRUE
d2 <- expand.grid(x,x,x) # All ordered triples, repetition allowed (5^3 = 125 rows)
d2
# Var1 Var2 Var3
# 1 a a a
# 2 b a a
# 3 c a a
# 4 d a a
# 5 e a a
# 6 a b a
# 7 b b a
# 8 c b a
# 9 d b a
# ...
nrow(unique(d2)) == nrow(d2)
# [1] TRUE
Try this:
x <- c("a","b","c","d","e")
expand.grid(rep(list(x), 3))
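Neither answer above produces the "a+b+c" strings the question asks for, so here is a minimal sketch of one way to get them. It assumes base R only; the names no_rep, idx, and with_rep are made up for illustration. The second part allows repeated letters (such as "a+b+b") while still keeping a single ordering per set of letters, by filtering index triples to the non-decreasing ones.

```r
x <- c("a", "b", "c", "d", "e")

# Without repetition: combn() applies FUN to each combination,
# and extra arguments (collapse) are passed on to paste().
no_rep <- combn(x, 3, FUN = paste, collapse = "+")

# With repetition, one ordering per set: enumerate index triples and
# keep only the non-decreasing ones (i <= j <= k).
idx <- expand.grid(rep(list(seq_along(x)), 3))
idx <- idx[idx$Var1 <= idx$Var2 & idx$Var2 <= idx$Var3, ]
with_rep <- unname(apply(idx, 1, function(i) paste(x[i], collapse = "+")))

head(no_rep, 3)   # "a+b+c" "a+b+d" "a+b+e"
length(with_rep)  # 35, i.e. choose(5 + 3 - 1, 3)
```

Working on integer indices rather than on the letters themselves avoids any dependence on how strings are ordered in the current locale.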
I don't even really know how to describe what I want to do, so hopefully the title makes at least some sense.
Better if I show you:
I have a simple 3x5 matrix of letters a to e:
matrix(data = rep(letters[1:5], 3), nrow = 3, ncol = 5, byrow = TRUE)
It gives this:
[,1] [,2] [,3] [,4] [,5]
[1,] "a" "b" "c" "d" "e"
[2,] "a" "b" "c" "d" "e"
[3,] "a" "b" "c" "d" "e"
I would like to change it to this without typing it manually:
[,1] [,2] [,3] [,4] [,5]
[1,] "a" "b" "c" "d" "e"
[2,] "e" "a" "b" "c" "d"
[3,] "d" "e" "a" "b" "c"
I'm thinking some kind of loop system or similar, but I have no idea where to start.
For the simple case you might try this for loop.
n <- dim(m3)[2]
for (i in seq_len(nrow(m3))[-1]) {
m3[i, ] <- c(m3[i, (n - i + 2):n], m3[i, 1:(n - i + 1)])
}
m3
# [,1] [,2] [,3] [,4] [,5]
# [1,] "a" "b" "c" "d" "e"
# [2,] "e" "a" "b" "c" "d"
# [3,] "d" "e" "a" "b" "c"
To let the pattern repeat for a longer matrix, we might generalize:
n <- dim(m7)[2]
for (i in seq_len(nrow(m7))[-1]) {
j <- i %% n  # position within the repeating cycle of n rows
if (j == 0) j <- n
if (j > 1) m7[i, ] <- c(m7[i, (n - j + 2):n], m7[i, 1:(n - j + 1)])
}
m7
# [,1] [,2] [,3] [,4] [,5]
# [1,] "a" "b" "c" "d" "e"
# [2,] "e" "a" "b" "c" "d"
# [3,] "d" "e" "a" "b" "c"
# [4,] "c" "d" "e" "a" "b"
# [5,] "b" "c" "d" "e" "a"
# [6,] "a" "b" "c" "d" "e"
# [7,] "e" "a" "b" "c" "d"
Data:
m3 <- matrix(data=letters[1:5], nrow=3, ncol=5, byrow=TRUE)
m7 <- matrix(data=letters[1:5], nrow=7, ncol=5, byrow=TRUE)
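As a loop-free alternative to the for loops above, the same cyclic shift can be sketched with modular arithmetic on the column indices. This is only an illustration, not the answerer's method: rotate_rows is a hypothetical helper name, and it assumes that row i should be shifted right by (i - 1) positions, wrapping around every ncol(m) rows.

```r
# Hypothetical helper: shift row i of m to the right by (i - 1) %% ncol(m).
rotate_rows <- function(m) {
  n <- ncol(m)
  shift <- (row(m) - 1) %% n                 # per-cell shift, depends on row
  src_col <- (col(m) - shift - 1) %% n + 1   # column each cell is read from
  # Matrix indexing with (row, col) pairs returns values in column-major
  # order, which is the order matrix() fills in.
  matrix(m[cbind(c(row(m)), c(src_col))], nrow = nrow(m))
}

m7 <- matrix(data = letters[1:5], nrow = 7, ncol = 5, byrow = TRUE)
rotate_rows(m7)
```

Because the shift is computed from row(m) and col(m), the function does not hard-code the number of columns and also handles matrices with a single row.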
You can create an index vector called ord:
ord <- seq_len(ncol(m))
Inside map2(), use ord and max(ord) to build the column indices that rotate one row of the matrix.
Then bind the resulting rows together with do.call(rbind, ...).
Here m is the matrix.
library(purrr)
do.call(rbind, map2(ord, nrow(m), \(x,y)
m[y, c(x:max(ord),
ord[- (x:max(ord))])]
)[c(1,rev(ord))]
)
[,1] [,2] [,3] [,4] [,5]
[1,] "a" "b" "c" "d" "e"
[2,] "e" "a" "b" "c" "d"
[3,] "d" "e" "a" "b" "c"
[4,] "c" "d" "e" "a" "b"
[5,] "b" "c" "d" "e" "a"
[6,] "a" "b" "c" "d" "e"
Assume we have the following permutations of the letters, "a", "b", and "c":
library(combinat)
do.call(rbind, permn(letters[1:3]))
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "a" "c" "b"
# [3,] "c" "a" "b"
# [4,] "c" "b" "a"
# [5,] "b" "c" "a"
# [6,] "b" "a" "c"
Is it possible to perform some function on a given permutation "on-the-fly" (i.e., a particular row) without storing the result?
That is, if the row == "a" "c" "b" or row == "b" "c" "a", do not store the result. The desired result in this case would be:
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "c" "a" "b"
# [3,] "c" "b" "a"
# [4,] "b" "a" "c"
I know I can apply a function to all the permutations on the fly within combinat::permn with the fun argument such as:
permn(letters[1:3], fun = function(x) {
res <- paste0(x, collapse = "")
if (res == "acb" | res == "bca") {
return(NA)
} else {
return(res)
}
})
But this still stores an NA, and the returned list has 6 elements instead of the desired 4:
# [[1]]
# [1] "abc"
#
# [[2]]
# [1] NA
#
# [[3]]
# [1] "cab"
#
# [[4]]
# [1] "cba"
#
# [[5]]
# [1] NA
#
# [[6]]
# [1] "bac"
Note, I am not interested in subsequently removing the NA values; I am specifically interested in not appending to the result list "on-the-fly" for a given permutation.
We could use a magrittr pipeline where we rbind the input matrix to the Rows to be checked and omit the duplicate rows.
library(combinat)
library(magrittr)
Rows <- rbind(c("a", "c", "b"), c("b", "c", "a"))
do.call(rbind, permn(letters[1:3])) %>%
subset(tail(!duplicated(rbind(Rows, .)), -nrow(Rows)))
giving:
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "c" "a" "b"
[3,] "c" "b" "a"
[4,] "b" "a" "c"
You can return NULL for the permutations you want to skip and then rbind the result. rbind ignores the NULL elements and binds only the permutations you need.
do.call(rbind, combinat::permn(letters[1:3], function(x)
if(!(all(x == c("a", "c", "b")) || all(x == c("b", "c", "a"))))
return(x)
))
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "c" "a" "b"
#[3,] "c" "b" "a"
#[4,] "b" "a" "c"
Similarly,
do.call(rbind, permn(letters[1:3],function(x) {
res <- paste0(x, collapse = "")
if (!res %in% c("acb","bca"))
return(res)
}))
# [,1]
#[1,] "abc"
#[2,] "cab"
#[3,] "cba"
#[4,] "bac"
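The answers above still let permn() store a NULL for each skipped permutation and drop it afterwards. If the goal is to never append a skipped permutation at all, one option is to generate the permutations with a callback that decides whether to store anything. The following is a minimal sketch under that assumption; visit_perms is a hypothetical helper written here for illustration, not part of combinat, and it visits permutations in lexicographic order rather than in permn()'s order.

```r
# Hypothetical helper: call visit() once per permutation of v,
# without collecting the permutations anywhere.
visit_perms <- function(v, visit, prefix = character(0)) {
  if (length(v) == 0) {
    visit(prefix)
    return(invisible(NULL))
  }
  for (i in seq_along(v)) {
    visit_perms(v[-i], visit, c(prefix, v[i]))
  }
  invisible(NULL)
}

kept <- character(0)
visit_perms(letters[1:3], function(p) {
  res <- paste0(p, collapse = "")
  # Append only when the permutation is wanted; nothing is stored otherwise.
  if (!res %in% c("acb", "bca")) kept <<- c(kept, res)
})
kept  # "abc" "bac" "cab" "cba"
```

Growing kept with c() is fine for small inputs; for many permutations a preallocated vector or a list would be a more efficient accumulator.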
Suppose I have two vectors like this:
l1 = c('C','D','E','F')
l2 = c('G','C','D','F')
I generate all combinations of two elements using the combn function:
l1_vector = t(combn(l1,2))
l2_vector = t(combn(l2,2))
> l1_vector
[,1] [,2]
[1,] "C" "D"
[2,] "C" "E"
[3,] "C" "F"
[4,] "D" "E"
[5,] "D" "F"
[6,] "E" "F"
> l2_vector
[,1] [,2]
[1,] "G" "C"
[2,] "G" "D"
[3,] "G" "F"
[4,] "C" "D"
[5,] "C" "F"
[6,] "D" "F"
Now I want to count the rows that appear in both l1_vector and l2_vector. In the example I gave, the count should be 3 (["C","D"], ["C","F"], ["D","F"]).
How can I do that without using a loop?
As mentioned in the comments, you can use the merge function for this. Since the default behavior of merge is to use all of the available columns, it will return only those rows that are perfect matches.
> merge(l1_vector, l2_vector)
V1 V2
1 C D
2 C F
3 D F
>
> nrow(merge(l1_vector, l2_vector))
[1] 3
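While merge is perfectly fine for your case, there are alternatives that work directly on the original vectors. Note that they treat each pair as unordered, so they agree with merge only when the shared elements appear in the same relative order in both vectors, as they do here.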
If you just need the number of repeated elements:
choose(length(intersect(l1, l2)), 2)
[1] 3
If you need the repeated elements:
t(combn(intersect(l1, l2), 2))
[,1] [,2]
[1,] "C" "D"
[2,] "C" "F"
[3,] "D" "F"
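If the pairs should be compared regardless of the order of their two elements, one possible sketch is to turn each pair into a sorted key and intersect the keys. The helper name pair_keys and the "|" separator are assumptions made for this illustration.

```r
# Hypothetical helper: one order-independent key per 2-element combination.
pair_keys <- function(v) {
  combn(v, 2, FUN = function(p) paste(sort(p), collapse = "|"))
}

l1 <- c("C", "D", "E", "F")
l2 <- c("G", "C", "D", "F")

common <- intersect(pair_keys(l1), pair_keys(l2))
common          # "C|D" "C|F" "D|F"
length(common)  # 3
```

Sorting inside each pair means that c("G", "C") and c("C", "G") map to the same key, which a row-wise merge would treat as different rows.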
I need help with processing lists iteratively.
I have the following list, which is composed of several data frames with the same columns but different numbers of rows. Example:
[[1]]
id InpatientDays ERVisits OfficeVisits Narcotics
1 a 0 0 18 1
2 b 1 1 6 1
3 c 0 0 5 3
4 d 0 1 19 0
5 e 8 2 19 3
6 f 2 0 9 2
[[2]]
id InpatientDays ERVisits OfficeVisits Narcotics
7 a 16 1 8 1
8 b 2 0 8 0
9 c 2 1 4 3
10 d 4 2 0 2
11 e 6 5 20 2
12 a 0 0 7 4
I would like to apply a function to get all the possible combinations for the id for each "data frame" in the list.
I tried something like lapply(list1, function(x) combn(unique(list1[x]$id))), which of course does not work. I expect to get something like:
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] "a" "a" "a" "a" "a" "b" "b" "b" "b" "c" "c" "c" "d" "d" "e"
[2,] "b" "c" "d" "e" "f" "c" "d" "e" "f" "d" "e" "f" "e" "f" "f"
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "a" "a" "a" "a" "b" "b" "b" "c" "c" "d"
[2,] "b" "c" "d" "e" "c" "d" "e" "d" "e" "e"
Is this possible? I know for sure this works for a single dataframe df
combn(unique(df$id),2)
We need to use unique(x$id)
lapply(list1, function(x) combn(unique(x$id),2))
The OP's code loops over 'list1' with lapply. The anonymous function (function(x)) receives each 'data.frame' in the list in turn, i.e. 'x' is the 'data.frame' itself, not an index. So we just need x$id (or x[['id']]) to extract the 'id' column. If we do need to subset by index, we have to loop over the sequence of 'list1' (or over its names, if the list elements are named):
lapply(seq_along(list1), function(i) combn(unique(list1[[i]]$id), 2))
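To try both forms, here is a small self-contained sketch. The list below is made up to mirror only the id columns from the question; the other columns are omitted because they do not affect the result.

```r
# Made-up stand-in for list1, keeping only the id column of each data frame.
list1 <- list(
  data.frame(id = c("a", "b", "c", "d", "e", "f"), stringsAsFactors = FALSE),
  data.frame(id = c("a", "b", "c", "d", "e", "a"), stringsAsFactors = FALSE)
)

# x is each data frame in turn.
by_element <- lapply(list1, function(x) combn(unique(x$id), 2))

# i is a position in the list.
by_index <- lapply(seq_along(list1), function(i) combn(unique(list1[[i]]$id), 2))

sapply(by_element, ncol)  # 15 10
identical(by_element, by_index)
```

The second data frame yields only 10 columns because its duplicated id "a" is removed by unique() before the pairs are formed.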
I have the following input data:
# [,1] [,2]
#[1,] "A" "B"
#[2,] "A" "C"
#[3,] "A" "D"
#[4,] "B" "C"
#[5,] "B" "D"
#[6,] "C" "D"
Next I want to exclude rows in which the first or second element has already appeared N times in earlier rows. For example, if N = 2, the following rows need to be excluded:
#[3,] "A" "D" - element "A" has already appeared 2 times
#[5,] "B" "D" - element "B" has already appeared 2 times
#[6,] "C" "D" - element "C" has already appeared 2 times
Note: exclusions must take effect immediately, so only rows that were kept count towards N. For example, if an element occurs 5 times in the input but only 1 of those rows has been kept so far, the next row containing that element should be kept, because the element then appears 2 times.
Example (N=2):
Input data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[3,] "A" "D"
[4,] "A" "E"
[5,] "B" "C"
[6,] "B" "D"
[7,] "B" "E"
[8,] "C" "D"
[9,] "C" "E"
[10,] "D" "E"
Output data:
[,1] [,2]
[1,] "A" "B"
[2,] "A" "C"
[5,] "B" "C"
[10,] "D" "E"
There are possibly more elegant solutions... but this seems to work:
v <- c("A", "B", "C", "D", "E")
cmb <- t(combn(v, 2))
n <- 2
# Go through each letter
for (l in v)
{
# Find the combinations using that letter
rows <- apply(cmb, 1, function(x) l %in% x)
rows.2 <- which(rows)
if (length(rows.2) > n)
rows.2 <- rows.2[1:n]
# Take the first n rows containing the letter,
# then append all the ones not containing it
# (drop = FALSE keeps single rows as matrices)
cmb <- rbind(cmb[rows.2, , drop = FALSE], cmb[!rows, , drop = FALSE])
}
cmb
which outputs:
[,1] [,2]
[1,] "D" "E"
[2,] "B" "C"
[3,] "A" "C"
[4,] "A" "B"
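The loop above returns the right rows but reorders them. If the original row order matters, a single pass with a running count per element is one possible alternative. This is a sketch, not the answerer's method: filter_pairs and seen are hypothetical names, and it assumes the two elements of each row are distinct, as they are for combn() output.

```r
# Hypothetical helper: keep a row only if neither element has already
# been kept n times; counts are updated as soon as a row is kept.
filter_pairs <- function(cmb, n) {
  elems <- unique(c(cmb))
  seen <- setNames(integer(length(elems)), elems)  # kept-count per element
  keep <- logical(nrow(cmb))
  for (i in seq_len(nrow(cmb))) {
    pair <- cmb[i, ]
    if (all(seen[pair] < n)) {
      keep[i] <- TRUE
      seen[pair] <- seen[pair] + 1L
    }
  }
  cmb[keep, , drop = FALSE]
}

cmb <- t(combn(c("A", "B", "C", "D", "E"), 2))
filter_pairs(cmb, 2)
```

For the 10-row example in the question with N = 2, this keeps rows 1, 2, 5, and 10 in their original order.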