Creating a full diallel using R

I am relatively new to R, so excuse me if this question is too basic.
I am wondering whether there is a good and fast way to create a full diallel using R?
I have a matrix that looks like this:
M1 M2 M3
Line1 A B A
Line2 A A B
Line3 B A A
From this matrix I would like to create the following data frame:
X Y M1 M2 M3
Line1 Line1 AA BB AA
Line1 Line2 AA BA AB
Line1 Line3 AB BA AA
Line2 Line1 AA AB BA
Line2 Line2 AA AA BB
Line2 Line3 AB AA BA
Line3 Line1 BA AB AA
Line3 Line2 BA AA AB
Line3 Line3 BB AA AA
I think this might be possible by creating a couple of nested loops and using paste to combine the A and B lettercodes. But probably there are better and more "R-like" options (using cbind()?).

One approach is to think of the indices of the rows of your data that make up each line of the desired output. Using your data:
mat <- matrix(c("A","B","A",
"A","A","B",
"B","A","A"), ncol = 3, byrow = TRUE)
I create those indices using expand.grid(). The first row of your output is formed by the concatenation of row 1 of mat with row 1 of mat, and so on. These indices are produced as follows
> ind <- expand.grid(r1 = 1:3, r2 = 1:3)
> ind
r1 r2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
Note that to get the row order your output shows, we need to take column r2 first and then r1, rather than the other way round.
Now I index mat with the second column of ind and with the first column of ind, and supply both to paste0(). Its output is a vector, so we need to reshape it into a matrix.
> matrix(paste0(mat[ind[,2], ], mat[ind[,1], ]), ncol = 3)
[,1] [,2] [,3]
[1,] "AA" "BB" "AA"
[2,] "AA" "BA" "AB"
[3,] "AB" "BA" "AA"
[4,] "AA" "AB" "BA"
[5,] "AA" "AA" "BB"
[6,] "AB" "AA" "BA"
[7,] "BA" "AB" "AA"
[8,] "BA" "AA" "AB"
[9,] "BB" "AA" "AA"
The paste0() step returns a vector of the pasted strings:
> paste0(mat[ind[,2], ], mat[ind[,1], ])
[1] "AA" "AA" "AB" "AA" "AA" "AB" "BA" "BA" "BB" "BB" "BA" "BA" "AB" "AA" "AA"
[16] "AB" "AA" "AA" "AA" "AB" "AA" "BA" "BB" "BA" "AA" "AB" "AA"
The matrix restructuring shown above works because the entries in the output from paste0() are in column-major order. Essentially the two arguments passed to paste0() are:
> mat[ind[,2], ]
[,1] [,2] [,3]
[1,] "A" "B" "A"
[2,] "A" "B" "A"
[3,] "A" "B" "A"
[4,] "A" "A" "B"
[5,] "A" "A" "B"
[6,] "A" "A" "B"
[7,] "B" "A" "A"
[8,] "B" "A" "A"
[9,] "B" "A" "A"
> mat[ind[,1], ]
[,1] [,2] [,3]
[1,] "A" "B" "A"
[2,] "A" "A" "B"
[3,] "B" "A" "A"
[4,] "A" "B" "A"
[5,] "A" "A" "B"
[6,] "B" "A" "A"
[7,] "A" "B" "A"
[8,] "A" "A" "B"
[9,] "B" "A" "A"
R treats each as a vector and hence the output is a vector, but because R stores matrices by columns, we fill our output matrix with the pasted strings by columns also.
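To get from the pasted matrix to the data frame asked for in the question, the same indices can also be used to look up the row names. This is only a sketch: it assumes mat carries row and column names (added here via dimnames, which the code above does not do), and the names geno and diallel are made up for illustration.

```r
# Same data as above, but with row and column names attached (an assumption)
mat <- matrix(c("A","B","A",
                "A","A","B",
                "B","A","A"), ncol = 3, byrow = TRUE,
              dimnames = list(paste0("Line", 1:3), paste0("M", 1:3)))

# All ordered pairs of row indices; r1 varies fastest
ind <- expand.grid(r1 = seq_len(nrow(mat)), r2 = seq_len(nrow(mat)))

# Paste the two parents marker by marker; paste0() returns a vector in
# column-major order, so reshaping with ncol = ncol(mat) restores the layout
geno <- matrix(paste0(mat[ind$r2, ], mat[ind$r1, ]), ncol = ncol(mat),
               dimnames = list(NULL, colnames(mat)))

# X comes from r2 and Y from r1, matching the row order in the question
diallel <- data.frame(X = rownames(mat)[ind$r2],
                      Y = rownames(mat)[ind$r1],
                      geno, stringsAsFactors = FALSE)
diallel
```

Because nothing here depends on the matrix being 3 x 3, the same lines should work for other numbers of lines and markers.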

You might not need a couple of loops to get your output. Here is a suggestion.
To start with, let's generate your sample matrix:
M <- matrix(c("A","B","A","A","A","B","B","A","A"), ncol = 3, byrow = TRUE)
rownames(M) <- c("Line1","Line2","Line3")
colnames(M) <- c("M1","M2","M3")
An easy way to generate all possible pairs between items in a vector is to use expand.grid():
d <- expand.grid(rownames(M), rownames(M))
This generates the columns X and Y in your desired output:
Var1 Var2
1 Line1 Line1
2 Line2 Line1
3 Line3 Line1
4 Line1 Line2
5 Line2 Line2
6 Line3 Line2
7 Line1 Line3
8 Line2 Line3
9 Line3 Line3
Then, what you could do is to apply() a function to each row that pastes together the corresponding M1,M2,M3 values:
apply(d, 1, function(x) { paste(M[x[1],], paste(M[x[2],]), sep="")} )
It will generate the right combinations, but not with the right format (yet):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "AA" "AA" "BA" "AA" "AA" "BA" "AB" "AB" "BB"
[2,] "BB" "AB" "AB" "BA" "AA" "AA" "BA" "AA" "AA"
[3,] "AA" "BA" "AA" "AB" "BB" "AB" "AA" "BA" "AA"
To flip the matrix in the right direction, you simply have to transpose it.
From there, you can wrap everything into a data frame, in one go:
df <- data.frame( d, t(apply(d, 1, function(x) { paste(M[x[1],], paste(M[x[2],]), sep="")} )))
colnames(df) <- c("X","Y","M1","M2", "M3")
and here it is.
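To make this reusable, you can finally write a little function to which you submit any M matrix that has row and column names (and two or more marker columns).
get.it <- function(M){
  # all ordered pairs of row names
  d <- expand.grid(rownames(M), rownames(M))
  # paste the marker values of the two lines in each pair
  e <- t(apply(d, 1, function(x) { paste(M[x[1],], M[x[2],], sep="")} ))
  output <- data.frame(d, e)
  # take the marker names from M instead of hard-coding M1, M2, M3
  colnames(output) <- c("X","Y",colnames(M))
  return(output)
}
and get.it(M) should work!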

Related

Split string with n repetitive elements into n sub-strings

I have a string that is a concatenation of m possible types of elements - for the sake of simplicity m = 4 with A, B, C and D.
Whenever there are single elements more than once, I would have to split the string so that there are no repetitions left. However, I would like to generate all possible strings without repetitions.
To make this a little bit clearer, here is an example:
For A B A C D
String: A B C D
String: B A C D
This gets more complicated when there are several different elements that show up more than once:
For A B A C B D
String: A B C D
String: A C B D
String: B A C D
String: A C B D
Is there a smart way to compute this in R?
vec <- c("A","B","A","C","B","D")
combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
eg <- do.call(expand.grid, combs)
out <- t(apply(eg, 1, function(r) names(eg)[order(r)]))
out
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "D"
# [2,] "B" "A" "C" "D"
# [3,] "A" "C" "B" "D"
# [4,] "A" "C" "B" "D"
First vector:
vec <- c("A","B","A","C","D")
# ...
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "D"
# [2,] "B" "A" "C" "D"
If you are starting and ending with strings rather than vectors, you can wrap the above with:
strsplit("ABACBD", "")[[1]]
# [1] "A" "B" "A" "C" "B" "D"
apply(out, 1, paste, collapse = "")
# [1] "ABCD" "BACD" "ACBD" "ACBD"
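Putting those pieces together, here is a rough sketch of a string-in, strings-out helper. The function name unique_orderings is made up for this example, and dropping repeated results with unique() is my own assumption about what is wanted (the output above contains "ACBD" twice).

```r
# Hypothetical helper: every string that keeps exactly one occurrence
# of each distinct character, in the order the kept occurrences appear
unique_orderings <- function(s) {
  vec <- strsplit(s, "")[[1]]
  # positions of each distinct element, named by the element
  pos <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
  # one row per way of choosing a single position for every element
  eg <- do.call(expand.grid, pos)
  # order the element names by their chosen positions and collapse
  out <- apply(eg, 1, function(r) paste(names(eg)[order(r)], collapse = ""))
  unique(unname(out))
}

unique_orderings("ABACBD")
unique_orderings("ABACD")
```

Note that the number of rows in eg is the product of the repeat counts, so this can grow quickly for long strings with many repeated elements.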

Make r ignore the order at which values appear in a column (created from pasting multiple columns)

Given a variable x that can take values A,B,C,D
And three columns for variable x:
df1<-
rbind(c("A","B","C"),c("A","D","C"),c("B","A","C"),c("A","C","B"), c("B","C","A"), c("D","A","B"), c("A","B","D"), c("A","D","C"), c("A",NA,NA),c("D","A",NA),c("A","D",NA))
How do I make a column indicating the combination of values in the three preceding columns, such that permutations (ABC, ACB, BAC) are treated as the same combination ABC, and (AD, DA) are treated as the same combination AD?
Pasting the three columns with apply(df1,1,function(x) paste(x[!is.na(x)], collapse=", "))->df1$x4 and using df1%>%group(x4)%>%summarize(c=count(x4)) would count AD and DA as different instead of the same.
Edited title
My desired result would be to get
a<-cbind(c("ABC",4),c("ACD",2),c("ABD",2),c("A",1),c("AD",2))
Someone already solved my question. Thanks
You can apply function paste after sorting each row vector.
df1 <-
cbind(df1, apply(df1, 1, function(x) paste(sort(x), collapse = "")))
df1
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "ABC"
# [2,] "A" "D" "C" "ACD"
# [3,] "B" "A" "C" "ABC"
# [4,] "A" "C" "B" "ABC"
# [5,] "B" "C" "A" "ABC"
# [6,] "D" "A" "B" "ABD"
# [7,] "A" "B" "D" "ABD"
# [8,] "A" "D" "C" "ACD"
# [9,] "A" NA NA "A"
#[10,] "D" "A" NA "AD"
#[11,] "A" "D" NA "AD"
You can now simply table the column, with no need for an external package to be loaded and more complex pipes.
table(df1[, 4])
#A ABC ABD ACD AD
#1 4 2 2 2
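If the counts are needed as a data frame rather than a printed table, something along these lines might do. It is a sketch only: the names combo and counts are made up here, and it relies on sort() dropping NA values by default, which is also why the rows containing NA collapse to "A" and "AD" above.

```r
df1 <- rbind(c("A","B","C"), c("A","D","C"), c("B","A","C"), c("A","C","B"),
             c("B","C","A"), c("D","A","B"), c("A","B","D"), c("A","D","C"),
             c("A",NA,NA), c("D","A",NA), c("A","D",NA))

# sort() removes NAs by default, so "D","A",NA becomes "AD"
combo <- apply(df1, 1, function(x) paste(sort(x), collapse = ""))

# one row per distinct combination, with its frequency in column Freq
counts <- as.data.frame(table(combo), stringsAsFactors = FALSE)
counts
```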

Generate all combinations of length 2 using 3 letters

The simplest way to generate such combinations would be to use combn function as follows:
print(combn(letters[1:3],2))
The output that it generates is:
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "b" "c" "c"
This doesn't generate combinations like aa, bb (same character repeated).
It doesn't even generate ba if ab is generated.
I want to generate all such combinations of length 2 for a vector [a,b,c].
Is there a simple way to do so in R?
The paste0 function is vectorized and so succeeds with outer:
outer(c('a','b','c'), c('a','b','c'), paste0)
[,1] [,2] [,3]
[1,] "aa" "ab" "ac"
[2,] "ba" "bb" "bc"
[3,] "ca" "cb" "cc"
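If a flat vector is preferred over a matrix, or if strings longer than 2 are needed, one possible extension is sketched below. The variable names (v, k, pairs, words) are only for illustration, and the expand.grid() route is a swapped-in alternative to outer(), not part of the answer above.

```r
v <- c("a", "b", "c")

# Flatten the outer() result; the elements come out in column-major order
pairs <- as.vector(outer(v, v, paste0))
length(pairs)   # 9

# For length k, build the k-fold Cartesian product and paste across columns
k <- 3
grid <- expand.grid(rep(list(v), k), stringsAsFactors = FALSE)
words <- do.call(paste0, unname(as.list(grid)))
length(words)   # 27
```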

Creating an edgelist from Patent data in R

I am trying to create an edgelist out of patent data of the form:
PatentID InventorIDs CoinventorIDs
1 A ; B C,D,E ; F,G,H,C
2 J ; K ; L M,O ; N ; P, Q
What I would like is the edgelist below showing the connections between inventors and patents. (the semicolons separate the coinventors associated with each primary inventor):
1 A B
1 A C
1 A D
1 A E
1 B F
1 B G
1 B H
1 B C
2 J K
2 J L
2 J M
2 J O
2 K N
2 L P
2 L Q
Is there an easy way to do this with igraph in R?
I'm confused by the edges going between the inventorIds. But here is a kind of brute-force function that you could just apply by row. There may be a better way with igraph, it being a massive library, but once you have the data in this form it should be simple to convert to an igraph data structure.
Note that this leaves out the edges between primary inventors.
## A function to make the edges for each row
rowFunc <- function(row) {
  tmp <- lapply(row[2:3], strsplit, '\\s*;\\s*')
  tmp2 <- lapply(tmp[[2]], strsplit, ',')
  do.call(rbind, mapply(cbind, row[[1]], unlist(tmp[[1]]), unlist(tmp2, recursive=FALSE)))
}
## Apply the function by row
do.call(rbind, apply(dat, 1, rowFunc))
# [,1] [,2] [,3]
# [1,] "1" "A" "C"
# [2,] "1" "A" "D"
# [3,] "1" "A" "E"
# [4,] "1" "B" "F"
# [5,] "1" "B" "G"
# [6,] "1" "B" "H"
# [7,] "1" "B" "C"
# [8,] "2" "J" "M"
# [9,] "2" "J" "O"
# [10,] "2" "K" "N"
# [11,] "2" "L" "P"
# [12,] "2" "L" " Q"
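For anyone who wants to run this end to end, here is a self-contained variant. It is a sketch under assumptions: dat is reconstructed from the table in the question (the column names are taken from it, the object itself is not shown there), the helper name edges_for_patent is made up, and whitespace is trimmed so that " Q" becomes "Q". Like the function above, it leaves out the edges between primary inventors.

```r
# Example data rebuilt from the question (assumed layout)
dat <- data.frame(
  PatentID      = c(1, 2),
  InventorIDs   = c("A ; B", "J ; K ; L"),
  CoinventorIDs = c("C,D,E ; F,G,H,C", "M,O ; N ; P, Q"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per (primary inventor, coinventor) pair
edges_for_patent <- function(id, inventors, coinventors) {
  inv <- trimws(strsplit(inventors, ";", fixed = TRUE)[[1]])
  groups <- strsplit(coinventors, ";", fixed = TRUE)[[1]]
  co <- lapply(strsplit(groups, ",", fixed = TRUE), trimws)
  # each primary inventor must have exactly one coinventor group
  stopifnot(length(inv) == length(co))
  data.frame(PatentID = id,
             from = rep(inv, lengths(co)),
             to = unlist(co),
             stringsAsFactors = FALSE)
}

edges <- do.call(rbind, Map(edges_for_patent,
                            dat$PatentID, dat$InventorIDs, dat$CoinventorIDs))
rownames(edges) <- NULL
edges
```

If igraph is installed, a data frame whose first two columns are the endpoints, for example edges[, c("from", "to", "PatentID")], is the shape that igraph::graph_from_data_frame() expects.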

Non-redundant version of expand.grid

The R function expand.grid returns all possible combinations of the elements of the supplied parameters, e.g.
> expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc"))
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
4 aa ab
5 ab ab
6 cc ab
7 aa cc
8 ab cc
9 cc cc
Do you know an efficient way to get directly (so without any row comparison after expand.grid) only the 'unique' combinations between the supplied vectors? The output will be
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
5 ab ab
6 cc ab
9 cc cc
EDIT: the combination of each element with itself could be discarded from the answer. I don't actually need it in my program, even though (mathematically) aa aa would be one (regular) unique combination between one element of Var1 and another of Var2.
The solution needs to produce pairs of elements from both vectors (i.e. one from each of the input vectors - so that it could be applied to more than 2 inputs)
How about using outer? But this particular function concatenates them into one character string.
outer( c("aa", "ab", "cc"), c("aa", "ab", "cc") , "paste" )
# [,1] [,2] [,3]
#[1,] "aa aa" "aa ab" "aa cc"
#[2,] "ab aa" "ab ab" "ab cc"
#[3,] "cc aa" "cc ab" "cc cc"
You can also use combn on the unique elements of the two vectors if you don't want the repeating elements (e.g. aa aa)
vals <- c( c("aa", "ab", "cc"), c("aa", "ab", "cc") )
vals <- unique( vals )
combn( vals , 2 )
# [,1] [,2] [,3]
#[1,] "aa" "aa" "ab"
#[2,] "ab" "cc" "cc"
In base R, you can use this:
expand.grid.unique <- function(x, y, include.equals=FALSE)
{
    x <- unique(x)
    y <- unique(y)
    g <- function(i)
    {
        z <- setdiff(y, x[seq_len(i-include.equals)])
        if(length(z)) cbind(x[i], z, deparse.level=0)
    }
    do.call(rbind, lapply(seq_along(x), g))
}
Results:
> x <- c("aa", "ab", "cc")
> y <- c("aa", "ab", "cc")
> expand.grid.unique(x, y)
[,1] [,2]
[1,] "aa" "ab"
[2,] "aa" "cc"
[3,] "ab" "cc"
> expand.grid.unique(x, y, include.equals=TRUE)
[,1] [,2]
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"
If the two vectors are the same, there's the combinations function in the gtools package:
library(gtools)
combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = TRUE)
# [,1] [,2]
# [1,] "aa" "aa"
# [2,] "aa" "ab"
# [3,] "aa" "cc"
# [4,] "ab" "ab"
# [5,] "ab" "cc"
# [6,] "cc" "cc"
And without "aa" "aa", etc.
combinations(n = 3, r = 2, v = c("aa", "ab", "cc"), repeats.allowed = FALSE)
The previous answers were lacking a way to get a specific result, namely to keep the self-pairs but remove the ones with different orders. The gtools package has two functions for these purposes, combinations and permutations. According to this website:
When the order doesn't matter, it is a Combination.
When the order does matter it is a Permutation.
In both cases, we have the decision to make of whether repetitions are allowed or not, and correspondingly, both functions have a repeats.allowed argument, yielding 4 combinations (deliciously meta!). It's worth going over each of these. I simplified the vector to single letters for ease of understanding.
Permutations with repetition
The most expansive option is to allow both self-relations and differently ordered options:
> permutations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "a"
[5,] "b" "b"
[6,] "b" "c"
[7,] "c" "a"
[8,] "c" "b"
[9,] "c" "c"
which gives us 9 options. This value can be found from the simple formula n^r i.e. 3^2=9. This is the Cartesian product/join for users familiar with SQL.
There are two ways to limit this: 1) remove self-relations (disallow repetitions), or 2) remove differently ordered options (i.e. combinations).
Combinations with repetitions
If we want to remove differently ordered options, we use:
> combinations(n = 3, r = 2, repeats.allowed = T, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "b" "b"
[5,] "b" "c"
[6,] "c" "c"
which gives us 6 options. The formula for this value is (r+n-1)!/(r!*(n-1)!) i.e. (2+3-1)!/(2!*(3-1)!)=4!/(2*2!)=24/4=6.
Permutations without repetition
If instead we want to disallow repetitions, we use:
> permutations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "a"
[4,] "b" "c"
[5,] "c" "a"
[6,] "c" "b"
which also gives us 6 options, but different ones! The number of options is the same as above but it's a coincidence. The value can be found from the formula n!/(n-r)! i.e. (3*2*1)/(3-2)!=6/1!=6.
Combinations without repetitions
The most limiting is when we want neither self-relations/repetitions nor differently ordered options, in which case we use:
> combinations(n = 3, r = 2, repeats.allowed = F, v = c("a", "b", "c"))
[,1] [,2]
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "c"
which gives us only 3 options. The number of options can be calculated from the rather complex formula n!/(r!(n-r)!) i.e. 3*2*1/(2*1*(3-2)!)=6/(2*1!)=6/2=3.
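The four counts can be double-checked in base R without gtools. This is just a sanity check: the variable names are made up, and the second half counts rows of the full grid from expand.grid() rather than calling combinations() or permutations().

```r
n <- 3
r <- 2

perm_with_rep    <- n^r                              # n^r
comb_with_rep    <- choose(n + r - 1, r)             # (r+n-1)!/(r!*(n-1)!)
perm_without_rep <- factorial(n) / factorial(n - r)  # n!/(n-r)!
comb_without_rep <- choose(n, r)                     # n!/(r!(n-r)!)
c(perm_with_rep, comb_with_rep, perm_without_rep, comb_without_rep)

# Cross-check for r = 2 by filtering the full grid of ordered pairs
v <- c("a", "b", "c")
g <- expand.grid(first = v, second = v, stringsAsFactors = FALSE)
c(nrow(g),                    # permutations with repetition
  sum(g$first <= g$second),   # combinations with repetition
  sum(g$first != g$second),   # permutations without repetition
  sum(g$first <  g$second))   # combinations without repetition
```

Both vectors should come out as 9, 6, 6, 3, matching the four cases above.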
Try:
factors <- c("a", "b", "c")
all.combos <- t(combn(factors,2))
[,1] [,2]
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "c"
This will not include duplicates of each factor (e.g. "a" "a"), but you can add those on easily if needed.
dup.combos <- cbind(factors,factors)
factors factors
[1,] "a" "a"
[2,] "b" "b"
[3,] "c" "c"
all.combos <- rbind(all.combos,dup.combos)
factors factors
[1,] "a" "b"
[2,] "a" "c"
[3,] "b" "c"
[4,] "a" "a"
[5,] "b" "b"
[6,] "c" "c"
You can use a "greater than" operation to filter redundant combinations. This works with both numeric and character vectors.
> grid <- expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc"), stringsAsFactors = F)
> grid[grid$Var1 >= grid$Var2, ]
Var1 Var2
1 aa aa
2 ab aa
3 cc aa
5 ab ab
6 cc ab
9 cc cc
This shouldn't slow down your code too much. If you're expanding vectors containing larger elements (e.g. two lists of dataframes), I recommend using numeric indices that refer to the original vectors.
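As a sketch of the index-based variant: the list items and the names idx and pairs below are invented for illustration, and the example assumes both inputs are the same list, so that comparing positions is enough to drop mirrored pairs.

```r
# Three small data frames standing in for "larger elements"
items <- list(a = data.frame(x = 1), b = data.frame(x = 2), c = data.frame(x = 3))

# Expand positions instead of the objects themselves
idx <- expand.grid(i = seq_along(items), j = seq_along(items))

# Keep one of each mirrored pair by comparing the integer positions
idx <- idx[idx$i >= idx$j, ]

# Look the objects up only for the pairs that survive
pairs <- Map(function(i, j) list(first = items[[i]], second = items[[j]]),
             idx$i, idx$j)
length(pairs)   # 6
```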
TL;DR
Use comboGrid from RcppAlgos:
library(RcppAlgos)
comboGrid(c("aa", "ab", "cc"), c("aa", "ab", "cc"))
Var1 Var2
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"
The Details
I recently came across the question "R - Expand Grid Without Duplicates", and while searching for duplicates I found this question. The question there isn't exactly a duplicate, as it is a bit more general and has additional restrictions, which @Ferdinand.kraft shed some light on.
It should be noted that many of the solutions here make use of some sort of combination function. The expand.grid function returns the Cartesian product which is fundamentally different.
The Cartesian product operates on multiple objects which may or may not be the same. Generally speaking, combination functions are applied to a single vector. The same can be said about permutation functions.
Using combination/permutation functions will only produce comparable results to expand.grid if the vectors supplied are identical. As a very simple example, consider v1 = 1:3, v2 = 2:4.
With expand.grid, we see that rows 3 and 5 are duplicates:
expand.grid(1:3, 2:4)
Var1 Var2
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4
Using combn doesn't quite get us to the solution:
t(combn(unique(c(1:3, 2:4)), 2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 3
[5,] 2 4
[6,] 3 4
And with repeats using gtools, we generate too many:
gtools::combinations(4, 2, v = unique(c(1:3, 2:4)), repeats.allowed = TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
[5,] 2 2
[6,] 2 3
[7,] 2 4
[8,] 3 3
[9,] 3 4
[10,] 4 4
In fact we generate results that are not even in the cartesian product (i.e. expand.grid solution).
We need a solution that creates the following:
Var1 Var2
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 2
[5,] 2 3
[6,] 2 4
[7,] 3 3
[8,] 3 4
I authored the package RcppAlgos and in the latest release v2.4.3, there is a function comboGrid which addresses this very problem. It is very general, flexible, and is fast.
First, to answer the specific question raised by the OP:
library(RcppAlgos)
comboGrid(c("aa", "ab", "cc"), c("aa", "ab", "cc"))
Var1 Var2
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"
And, as @Ferdinand.kraft points out, sometimes the output may need to have duplicates excluded in a given row. For that, we use repetition = FALSE:
comboGrid(c("aa", "ab", "cc"), c("aa", "ab", "cc"), repetition = FALSE)
Var1 Var2
[1,] "aa" "ab"
[2,] "aa" "cc"
[3,] "ab" "cc"
comboGrid is also very general. It can be applied to multiple vectors:
comboGrid(rep(list(c("aa", "ab", "cc")), 3))
Var1 Var2 Var3
[1,] "aa" "aa" "aa"
[2,] "aa" "aa" "ab"
[3,] "aa" "aa" "cc"
[4,] "aa" "ab" "ab"
[5,] "aa" "ab" "cc"
[6,] "aa" "cc" "cc"
[7,] "ab" "ab" "ab"
[8,] "ab" "ab" "cc"
[9,] "ab" "cc" "cc"
[10,] "cc" "cc" "cc"
Doesn't need the vectors to be identical:
comboGrid(1:3, 2:4)
Var1 Var2
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 2
[5,] 2 3
[6,] 2 4
[7,] 3 3
[8,] 3 4
And can be applied to vectors of various types:
set.seed(123)
my_range <- 3:15
mixed_types <- list(
int1 = sample(15, sample(my_range, 1)),
int2 = sample(15, sample(my_range, 1)),
char1 = sample(LETTERS, sample(my_range, 1)),
char2 = sample(LETTERS, sample(my_range, 1))
)
dim(expand.grid(mixed_types))
[1] 1950 4
dim(comboGrid(mixed_types, repetition = FALSE))
[1] 1595 4
dim(comboGrid(mixed_types, repetition = TRUE))
[1] 1770 4
The algorithm employed avoids generating the entirety of the Cartesian product and subsequently removing dupes. Ultimately, we create a hash table using the Fundamental theorem of arithmetic along with deduplication as pointed out by user2357112 supports Monica in the answer to Picking unordered combinations from pools with overlap. All of this together with the fact that it is written in C++ means that it is fast and memory efficient:
pools = list(c(1, 10, 14, 6),
c(7, 2, 4, 8, 3, 11, 12),
c(11, 3, 13, 4, 15, 8, 6, 5),
c(10, 1, 3, 2, 9, 5, 7),
c(1, 5, 10, 3, 8, 14),
c(15, 3, 7, 10, 4, 5, 8, 6),
c(14, 9, 11, 15),
c(7, 6, 13, 14, 10, 11, 9, 4),
c(6, 3, 2, 14, 7, 12, 9),
c(6, 11, 2, 5, 15, 7))
system.time(combCarts <- comboGrid(pools))
user system elapsed
0.929 0.062 0.992
nrow(combCarts)
[1] 1205740
## Small object created
print(object.size(combCarts), unit = "Mb")
92 Mb
system.time(cartProd <- expand.grid(pools))
user system elapsed
8.477 2.895 11.461
prod(lengths(pools))
[1] 101154816
## Very large object created
print(object.size(cartProd), unit = "Mb")
7717.5 Mb
Here's a very ugly version that worked for me on a similar problem.
AHP_code = letters[1:10]
temp. <- expand.grid(AHP_code, AHP_code, stringsAsFactors = FALSE)
temp. <- temp.[temp.$Var1 != temp.$Var2, ] # remove AA, BB, CC, etc.
temp.$combo <- NA
for(i in 1:nrow(temp.)){ # vectorizing this gave me weird results, loop worked fine.
  temp.$combo[i] <- paste0(sort(as.character(temp.[i, 1:2])), collapse = "")
}
temp. <- temp.[!duplicated(temp.$combo),]
temp.
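For what it's worth, the loop can probably be avoided with pmin() and pmax(), which compare two vectors element by element and also accept character vectors. This is a sketch, not a drop-in replacement: the names codes and g are made up, and it only covers the two-column case.

```r
codes <- letters[1:10]
g <- expand.grid(Var1 = codes, Var2 = codes, stringsAsFactors = FALSE)
g <- g[g$Var1 != g$Var2, ]   # drop self-pairs such as "aa"

# Put the smaller element first, so "ba" and "ab" get the same key
g$combo <- paste0(pmin(g$Var1, g$Var2), pmax(g$Var1, g$Var2))

# Keep the first row seen for each key
g <- g[!duplicated(g$combo), ]
nrow(g)   # choose(10, 2) = 45
```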
USING SORT
Just for fun, one can in principle also remove duplicates from expand.grid by combining sort and unique.
unique(t(apply(expand.grid(c("aa", "ab", "cc"), c("aa", "ab", "cc")), 1, sort)))
This gives:
[,1] [,2]
[1,] "aa" "aa"
[2,] "aa" "ab"
[3,] "aa" "cc"
[4,] "ab" "ab"
[5,] "ab" "cc"
[6,] "cc" "cc"
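The same idea also seems to handle vectors that differ. As a quick check, here it is on 1:3 and 2:4 (the names g, sorted and res are just for this example). One caveat I would assume: sorting each row discards which input a value came from, so this only makes sense when the column order does not matter.

```r
g <- expand.grid(1:3, 2:4)

# Sort within each row so that (3, 2) and (2, 3) become identical
sorted <- t(apply(g, 1, sort))
dimnames(sorted) <- NULL

# Drop the rows that are now duplicates
res <- unique(sorted)
nrow(res)   # 8 of the original 9 rows remain
```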
With repetitions (this won't work if you specify different vectors for different columns and for example values in the first column are always bigger than values in the second column):
> v=c("aa","ab","cc")
> e=expand.grid(v,v,stringsAsFactors=F)
> e[!apply(e,1,is.unsorted),]
Var1 Var2
1 aa aa
4 aa ab
5 ab ab
7 aa cc
8 ab cc
9 cc cc
Without repetitions (this requires using the same vector for each column):
> t(combn(c("aa","ab","cc"),2))
[,1] [,2]
[1,] "aa" "ab"
[2,] "aa" "cc"
[3,] "ab" "cc"
With repetitions and with different vectors for different columns:
> e=expand.grid(letters[25:26],letters[1:3],letters[2:3],stringsAsFactors=F)
> e[!duplicated(t(apply(e,1,sort))),]
Var1 Var2 Var3
1 y a b
2 z a b
3 y b b
4 z b b
5 y c b
6 z c b
7 y a c
8 z a c
11 y c c
12 z c c
Without repetitions and with different vectors for different columns:
> e=expand.grid(letters[25:26],letters[1:3],letters[2:3],stringsAsFactors=F)
> e=e[!duplicated(t(apply(e,1,sort))),]
> e[!apply(apply(e,1,duplicated),2,any),]
Var1 Var2 Var3
1 y a b
2 z a b
5 y c b
6 z c b
7 y a c
8 z a c
