Permutations taken 2 at a time from a sample with duplicate elements - r

How do I get all possible permutations of a list with duplicate elements?
For example, taking 2 elements at a time from the vector x = c(1,2,2), I want the permutations with repetition:
1 1
1 2
1 2
2 1
2 2
2 2
2 1
2 2
2 2

This is easily achieved with one of the many packages for generating permutations with repetition.
library(gtools)
gtools::permutations(3, 2, c(1, 2, 2), set = FALSE, repeats.allowed = TRUE)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 2
[4,] 2 1
[5,] 2 2
[6,] 2 2
[7,] 2 1
[8,] 2 2
[9,] 2 2
library(arrangements)
arrangements::permutations(x = c(1,2,2), k = 2, replace = TRUE)
## output same as above
library(RcppAlgos) ### I am the author
RcppAlgos::permuteGeneral(c(1,2,2), 2, TRUE)
## output same as above

You can use the built-in function rep() as follows (with x = c(1,2,2)):
data.frame(V1 = rep(x, each = length(x)), V2 = rep(x, length(x)))
V1 V2
1 1 1
2 1 2
3 1 2
4 2 1
5 2 2
6 2 2
7 2 1
8 2 2
9 2 2
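The same idea generalizes beyond pairs. Here is a minimal base R sketch, assuming x = c(1, 2, 2) as in the question and a tuple length k (a name introduced here for illustration). expand.grid() builds every k-tuple, and reversing the column order makes the first position vary slowest, which reproduces the row order shown above.

```r
x <- c(1, 2, 2)
k <- 2  # tuple length (illustrative name, not from the original answers)

# Every k-tuple drawn from x; duplicated values count as distinct elements
grid <- expand.grid(rep(list(x), k))

# expand.grid varies the first column fastest; reverse so it varies slowest
perms <- unname(as.matrix(grid[, k:1, drop = FALSE]))
perms
```

Like the rep() version, this materializes all length(x)^k rows at once, so it is only practical for small inputs.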

Related

R: list all directionless circular permutations/arrangements (i.e. where clockwise/anti-clockwise are the same)

How do I list all the circular permutations in R where direction does not matter? I have a vector 1:4 for illustration (however, I would like a general solution).
I use
gtools::permutations(n = 4, r = 4)
which gives me a listing of all possible permutations as follows:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 2 4 3
[3,] 1 3 2 4
[4,] 1 3 4 2
[5,] 1 4 2 3
[6,] 1 4 3 2
[7,] 2 1 3 4
[8,] 2 1 4 3
[9,] 2 3 1 4
[10,] 2 3 4 1
[11,] 2 4 1 3
[12,] 2 4 3 1
[13,] 3 1 2 4
[14,] 3 1 4 2
[15,] 3 2 1 4
[16,] 3 2 4 1
[17,] 3 4 1 2
[18,] 3 4 2 1
[19,] 4 1 2 3
[20,] 4 1 3 2
[21,] 4 2 1 3
[22,] 4 2 3 1
[23,] 4 3 1 2
[24,] 4 3 2 1
However, what I would like is the listing of the six circular permutations. So, I think that this is:
cbind(gtools::permutations(n = 3, r = 3),4)
that gives me:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 1 3 2 4
[3,] 2 1 3 4
[4,] 2 3 1 4
[5,] 3 1 2 4
[6,] 3 2 1 4
However, I would also like to ignore listings that are the same except for direction. For example, I do not want to distinguish c(1, 2, 3, 4) from c(3, 2, 1, 4), which is c(4, 3, 2, 1) rotated (i.e. the 1st and the 6th entry), or c(1, 3, 2, 4) from c(2, 3, 1, 4) (i.e. the 2nd and the 4th entry), or c(2, 1, 3, 4) from c(3, 1, 2, 4) (i.e. the 3rd and the 5th entry in the output). Is it simply a case of taking the first half of the results?
Is there a surer way of doing this? Many thanks for answering my question or providing suggestions.
We can't simply take the first half of the results of permutations(n-1, n-1) and append n. This is easy to see for n = 5.
I suggest the following approach:
Set the first element to always be 1. This way we take only one permutation from each set of rotations of the same circle. It is basically what you did by making 4 always the last element in your example.
Consider only those permutations in which element #2 is less than element #n. For each permutation allowed by the first rule there is exactly one other that is its reverse (the same circle read in the opposite direction), with the values at positions #2 and #n exchanged. This way we take only one permutation from each such pair.
And this is the algorithm we are going to use to construct this set of permutations:
Find all pairs of elements #2 and #n where #2 is less than #n. This is combinations(n-1, 2, v = 2:n).
For each such combination, find all permutations of the remaining n-3 elements. This is permutations(n - 3, n - 3, v = rest_elements), where rest_elements is a vector listing the n-3 elements that are left when we remove 1, #2 and #n.
library(gtools)
get_perms <- function(n) {
  # 1 is always first, #2 and #n have to be such that #2 < #n
  # all combinations of elements #2 and #n:
  combs_2n <- combinations(n-1, 2, v = 2:n)
  # for each such combination we have n-3 elements left to place
  # it's (n-3)! variants
  n_perms_rest <- factorial(n - 3)
  # resulting matrix with placeholders for these element combinations
  res <-
    cbind(
      1, # element 1
      rep(combs_2n[,1], each = n_perms_rest), # element 2
      matrix(nrow = n_perms_rest*nrow(combs_2n), ncol = n-3), # elements 3 to (n-1)
      rep(combs_2n[,2], each = n_perms_rest) # element n
    )
  # fill placeholders
  for (i in 1:nrow(combs_2n)) {
    rest_elements <- setdiff(2:n, combs_2n[i,])
    rest_perms <- permutations(n - 3, n - 3, v = rest_elements)
    res[1:n_perms_rest + (i-1)*n_perms_rest, 3:(n - 1)] <- rest_perms
  }
  res
}
get_perms(5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 4 5 3
#> [2,] 1 2 5 4 3
#> [3,] 1 2 3 5 4
#> [4,] 1 2 5 3 4
#> [5,] 1 2 3 4 5
#> [6,] 1 2 4 3 5
#> [7,] 1 3 2 5 4
#> [8,] 1 3 5 2 4
#> [9,] 1 3 2 4 5
#> [10,] 1 3 4 2 5
#> [11,] 1 4 2 3 5
#> [12,] 1 4 3 2 5
Created on 2021-08-28 by the reprex package (v2.0.1)
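As a cross-check on the two rules (first element fixed at 1, element #2 smaller than element #n), here is a brute-force sketch in base R. It is not the answer's algorithm: it enumerates all n! permutations and filters them, so it only suits small n. The helper names all_perms and circular_perms are made up for this illustration.

```r
# All permutations of a vector, one per row (hypothetical helper, base R only)
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

# Keep rows that start with 1 and whose 2nd element is smaller than the last
circular_perms <- function(n) {
  stopifnot(n >= 3)
  p <- all_perms(seq_len(n))
  p[p[, 1] == 1 & p[, 2] < p[, n], , drop = FALSE]
}

res <- circular_perms(5)
nrow(res)  # (5 - 1)! / 2 = 12, the same count as get_perms(5)
```

The rows come out in a different order than get_perms(5), but the count (n - 1)! / 2 should agree.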

sample with replacement but constrain the max frequency of each member to be drawn

Is it possible to extend the sample function in R to not return more than say 2 of the same element when replace = TRUE?
Suppose I have a list:
l = c(1,1,2,3,4,5)
To sample 3 elements with replacement, I would do:
sample(l, 3, replace = TRUE)
Is there a way to constrain its output so that only a maximum of 2 of the same elements are returned? So (1,1,2) or (1,3,3) is allowed, but (1,1,1) or (3,3,3) is excluded?
The basic idea is to convert sampling with replacement to sampling without replacement.
set.seed(0)
ll <- unique(l) ## unique values
#[1] 1 2 3 4 5
pool <- rep.int(ll, 2) ## replicate each unique so they each appear twice
#[1] 1 2 3 4 5 1 2 3 4 5
sample(pool, 3) ## draw 3 samples without replacement
#[1] 4 3 5
## replicate it a few times
## each column is one sample, after "simplification" of the output by `replicate`
replicate(5, sample(pool, 3))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 4 2 2 3
#[2,] 4 5 1 2 5
#[3,] 2 1 2 4 1
If you want different values to appear up to different numbers of times, you can do, for example:
pool <- rep.int(ll, c(2, 3, 3, 4, 1))
#[1] 1 1 2 2 2 3 3 3 4 4 4 4 5
## draw 9 samples; replicate 5 times
oo <- replicate(5, sample(pool, 9))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 1 4 3 2
# [2,] 2 2 4 4 1
# [3,] 4 4 1 1 1
# [4,] 4 2 3 2 5
# [5,] 1 4 2 5 2
# [6,] 3 4 3 3 3
# [7,] 1 4 2 2 2
# [8,] 4 1 4 3 3
# [9,] 3 3 2 2 4
We can call tabulate on each column to count the frequency of 1, 2, 3, 4, 5:
## set `nbins` in `tabulate` so frequency table of each column has the same length
apply(oo, 2L, tabulate, nbins = 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 2 1 1 2
#[2,] 1 2 3 3 3
#[3,] 2 1 2 3 2
#[4,] 3 4 3 1 1
#[5,] 1 0 0 1 1
The counts in all columns meet the frequency upper bound c(2, 3, 3, 4, 1) we have set.
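The pool trick can be wrapped in a small helper. This is only a sketch: the function name sample_capped and its arguments are invented here, and it indexes the pool with sample.int() so that a pool of length one is not misread by sample() as the range 1:n.

```r
# Draw `size` values such that no value appears more than `max_freq` times.
# `max_freq` is a single cap or one cap per unique value (hypothetical helper).
sample_capped <- function(values, size, max_freq = 2) {
  u <- unique(values)
  pool <- rep.int(u, rep_len(max_freq, length(u)))
  stopifnot(size <= length(pool))
  pool[sample.int(length(pool), size)]
}

set.seed(0)
l <- c(1, 1, 2, 3, 4, 5)
draws <- replicate(1000, sample_capped(l, 3))

# Largest number of repeats seen in any single draw; never exceeds 2
max(apply(draws, 2, function(s) max(table(s))))
```

Two caveats: unique() discards the extra weight that the duplicated 1 has in l, and sampling the pool without replacement does not give the same distribution as sampling with replacement and rejecting draws that break the cap.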
Would you explain the difference between rep and rep.int?
rep.int is not the "integer" method for rep. It is just a faster primitive function with less functionality than rep. You can get more details of rep, rep.int and rep_len from the doc page ?rep.
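A short illustration of the difference, using only base R: rep.int() takes just x and times, while rep() also accepts each and length.out.

```r
x <- 1:3

rep.int(x, 2)            # whole vector twice:        1 2 3 1 2 3
rep.int(x, c(2, 1, 3))   # per-element counts:        1 1 2 3 3 3
rep(x, each = 2)         # `each` exists only in rep: 1 1 2 2 3 3
rep_len(x, 5)            # recycle to a given length: 1 2 3 1 2

# With a plain `times` argument the two functions agree
identical(rep(x, times = c(2, 1, 3)), rep.int(x, c(2, 1, 3)))
```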

Indices of matching of rows between matrices

Let n be a positive integer. We have a matrix B that has n columns, whose entries are integers between 1 and n. The aim is to match the rows of B with the rows of permutations(n), storing the matching row indices in a vector v.
For example, let us consider the following. If
permutations(3)=
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
and
B=
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 3 1 2
[4,] 2 3 1
[5,] 3 1 2
Then the vector v is
1 1 5 4 5
because the first two rows of B are equal to the row number 1 of permutations(3), the third row of B is the row number 5 of permutations(3), and so on.
I tried to apply the command
row.match
but the latter returns the error:
Error in do.call("paste", c(x[, , drop = FALSE], sep = "\r")) :
second argument must be a list
One way is to use match, where m1 is the permutations(3) matrix:
match(do.call(paste, data.frame(B)), do.call(paste, data.frame(m1)))
#[1] 1 1 5 4 5
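For a version that runs without any packages, the sketch below types both matrices in by hand (P stands in for permutations(3)) and turns each row into a single string key before calling match. The helper name row_key is made up for this example.

```r
# P plays the role of permutations(3); B is the matrix from the question
P <- matrix(c(1, 2, 3,
              1, 3, 2,
              2, 1, 3,
              2, 3, 1,
              3, 1, 2,
              3, 2, 1), ncol = 3, byrow = TRUE)
B <- matrix(c(1, 2, 3,
              1, 2, 3,
              3, 1, 2,
              2, 3, 1,
              3, 1, 2), ncol = 3, byrow = TRUE)

# Collapse each row into one string so that rows can be compared as scalars
row_key <- function(m) apply(m, 1, paste, collapse = " ")

v <- match(row_key(B), row_key(P))
v
#> [1] 1 1 5 4 5
```

Rows of B that do not occur in P would come back as NA.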
One possible way is to turn your matrices into dataframes and join them:
A = read.table(text = "
1 2 3
1 3 2
2 1 3
2 3 1
3 1 2
3 2 1
")
B = read.table(text = "
1 2 3
1 2 3
3 1 2
2 3 1
3 1 2
")
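library(dplyr)
# Number the rows of A, then look up each row of B in A.
# left_join() keeps the rows of B in their original order.
B %>%
  left_join(mutate(A, row_id = row_number()), by = c("V1", "V2", "V3")) %>%
  pull(row_id)
# [1] 1 1 5 4 5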

What is the best way to tidy a matrix in R

Is there a best practice means of "tidying" a matrix/array? By "tidy" in this context I mean
one row per element of the matrix
one column per dimension; the elements of these columns give the "coordinates" of the matrix element stored on that row
I have an example here for a 2d matrix, but ideally this would work with an array also (This example works for mm <- array(1:18, c(3,3,3)), but I thought that would be too much to paste in here)
mm <- matrix(1:9, nrow = 3)
mm
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
inds <- which(mm > -Inf, arr.ind = TRUE)
cbind(inds, value = mm[inds])
#> row col value
#> [1,] 1 1 1
#> [2,] 2 1 2
#> [3,] 3 1 3
#> [4,] 1 2 4
#> [5,] 2 2 5
#> [6,] 3 2 6
#> [7,] 1 3 7
#> [8,] 2 3 8
#> [9,] 3 3 9
as.data.frame.table
One way to convert from wide to long is the following. See ?as.data.frame.table for more information. No packages are used.
mm <- matrix(1:9, 3)
long <- as.data.frame.table(mm)
The code gives this data.frame:
> long
Var1 Var2 Freq
1 A A 1
2 B A 2
3 C A 3
4 A B 4
5 B B 5
6 C B 6
7 A C 7
8 B C 8
9 C C 9
numbers
If you prefer row and column numbers:
long[1:2] <- lapply(long[1:2], as.numeric)
giving:
> long
Var1 Var2 Freq
1 1 1 1
2 2 1 2
3 3 1 3
4 1 2 4
5 2 2 5
6 3 2 6
7 1 3 7
8 2 3 8
9 3 3 9
names
Note that the output above uses A, B, C, ... because there were no row or column names. They would have been used if present. That is, had there been row and column names and dimension names, the output would look like this:
mm2 <- array(1:9, c(3, 3), dimnames = list(A = c("a", "b", "c"), B = c("x", "y", "z")))
as.data.frame.table(mm2, responseName = "Val")
giving:
A B Val
1 a x 1
2 b x 2
3 c x 3
4 a y 4
5 b y 5
6 c y 6
7 a z 7
8 b z 8
9 c z 9
3d
Here is a 3d example:
as.data.frame.table(array(1:8, c(2,2,2)))
giving:
Var1 Var2 Var3 Freq
1 A A A 1
2 B A A 2
3 A B A 3
4 B B A 4
5 A A B 5
6 B A B 6
7 A B B 7
8 B B B 8
2d only
For 2d one can alternatively use row and col:
sapply(list(row(mm), col(mm), mm), c)
or
cbind(c(row(mm)), c(col(mm)), c(mm))
Either of these gives this matrix:
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 1 2
[3,] 3 1 3
[4,] 1 2 4
[5,] 2 2 5
[6,] 3 2 6
[7,] 1 3 7
[8,] 2 3 8
[9,] 3 3 9
Another method is to use arrayInd together with cbind like this.
# a 3 X 3 X 2 array
mm <- array(1:18, dim=c(3,3,2))
Similar to your code, but with the more natural arrayInd function, we have
# get array in desired format
myMat <- cbind(c(mm), arrayInd(seq_along(mm), .dim=dim(mm)))
# add column names
colnames(myMat) <- c("values", letters[24:26])
which returns
myMat
values x y z
[1,] 1 1 1 1
[2,] 2 2 1 1
[3,] 3 3 1 1
[4,] 4 1 2 1
[5,] 5 2 2 1
[6,] 6 3 2 1
[7,] 7 1 3 1
[8,] 8 2 3 1
[9,] 9 3 3 1
[10,] 10 1 1 2
[11,] 11 2 1 2
[12,] 12 3 1 2
[13,] 13 1 2 2
[14,] 14 2 2 2
[15,] 15 3 2 2
[16,] 16 1 3 2
[17,] 17 2 3 2
[18,] 18 3 3 2
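The arrayInd approach can be packaged so that it works for any number of dimensions. The sketch below returns a data frame rather than a matrix; the function name tidy_array and the column names dim1, dim2, ... are choices made for this illustration.

```r
# One row per element, one index column per dimension, plus the value
# (hypothetical helper; expects an object with a dim attribute)
tidy_array <- function(a) {
  stopifnot(!is.null(dim(a)))
  idx <- arrayInd(seq_along(a), .dim = dim(a))
  colnames(idx) <- paste0("dim", seq_len(ncol(idx)))
  data.frame(idx, value = c(a))
}

tidy_array(matrix(1:9, nrow = 3))       # columns dim1, dim2, value
tidy_array(array(1:18, dim = c(3, 3, 2)))  # columns dim1, dim2, dim3, value
```

Unlike which(mm > -Inf, arr.ind = TRUE), this does not drop elements that are NA.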

Create dataframe of all array indices in R

Using R, I'm trying to construct a dataframe of the row and col numbers of a given matrix. E.g., if
a <- matrix(c(1:15), nrow=5, ncol=3)
then I'm looking to construct a dataframe that gives:
row col
1 1
1 2
1 3
. .
5 1
5 2
5 3
What I've tried:
row <- matrix(row(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
col <- matrix(col(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
out <- cbind(row, col)
colnames(out) <- c("row", "col")
results in:
row col
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 1 2
[7,] 2 2
[8,] 3 2
[9,] 4 2
[10,] 5 2
[11,] 1 3
[12,] 2 3
[13,] 3 3
[14,] 4 3
[15,] 5 3
This isn't what I'm looking for, as the sequence of rows and cols is suddenly reversed, even though I specified "byrow=T". I don't see if and where I'm making a mistake, but I would hugely appreciate suggestions to overcome this problem. Thanks in advance!
I'd use expand.grid on the vectors 1:ncol and 1:nrow, then flip the columns with [,2:1] to get them in the order you want:
> expand.grid(seq(ncol(a)),seq(nrow(a)))[,2:1]
Var2 Var1
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Use row and col, but reorder their output yourself: both return a matrix of indices with the same shape as the input, and as.vector reads a matrix column by column. Transpose with t first to get the row-by-row order you want:
data.frame(row = as.vector(t(row(a))), col = as.vector(t(col(a))))
row col
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Or, as a matrix not a data.frame:
cbind(as.vector(t(row(a))), as.vector(t(col(a))))
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 2 1
[5,] 2 2
[6,] 2 3
[7,] 3 1
[8,] 3 2
[9,] 3 3
[10,] 4 1
[11,] 4 2
[12,] 4 3
[13,] 5 1
[14,] 5 2
[15,] 5 3
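To see why the transpose matters, it helps to look at a small case. In this sketch (a 2 x 3 matrix chosen for brevity), as.vector flattens column by column, so transposing first is what yields the row-major order.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)

row(a)                 # each entry holds its own row number
as.vector(row(a))      # column-major flattening: 1 2 1 2 1 2
as.vector(t(row(a)))   # after transposing:       1 1 1 2 2 2
as.vector(t(col(a)))   # matching column indices: 1 2 3 1 2 3
```

This also explains the question's result: byrow only controls how matrix() fills the new matrix, and with a single column filling by row and by column are the same thing.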
You may want to have a look at ?expand.grid, which does just about exactly what you want to achieve.
Since there are many ways to skin a cat, I'll chip in with yet another variant based on rep:
data.frame(row=rep(seq(nrow(a)), each=ncol(a)), col=rep(seq(ncol(a)), nrow(a)))
...but to announce a "winner", I think you need to time the solutions:
# Make up a huge matrix...
a <- matrix(runif(1e7), 1e4)
system.time( a1<-data.frame(row = as.vector(t(row(a))),
col = as.vector(t(col(a)))) ) # 0.68 secs
system.time( a2<-expand.grid(col = seq(ncol(a)),
row = seq(nrow(a)))[,2:1] ) # 0.49 secs
system.time( a3<-data.frame(row=rep(seq(nrow(a)), each=ncol(a)),
col=rep(seq(ncol(a)), nrow(a))) ) # 0.59 secs
identical(a1, a2) && identical(a1, a3) # TRUE
...so it seems @Spacedman's expand.grid solution is the speediest!

Resources