List of all combinations of a minimum value using combn - r

Here is my data:
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 2 3 5
[3,] 2 3 6
[4,] 2 4 5
[5,] 2 4 6
[6,] 2 4 2
[7,] 2 4 4
[8,] 2 4 9
[9,] 2 4 10
[10,] 2 4 3
How would I find all combinations of the values in column 3 whose sum is at least 25? I am struggling with how to use the combn function, as its help page doesn't seem very intuitive.

If you want a non-loop version:
x <- read.table(text="2 3 4
2 3 5
2 3 6
2 4 5
2 4 6
2 4 2
2 4 4
2 4 9
2 4 10
2 4 3",stringsAsFactors=FALSE, header=FALSE)
res <- Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE)
unlist(res, recursive = FALSE)[lapply(unlist(res, recursive = FALSE),sum)>=25]
[[1]]
[1] 6 9 10
[[2]]
[1] 6 9 10
[[3]]
[1] 4 5 6 10
...
[[613]]
[1] 4 6 5 6 2 4 9 10 3
[[614]]
[1] 5 6 5 6 2 4 9 10 3
[[615]]
[1] 4 5 6 5 6 2 4 9 10 3
EDIT
To return rownames instead of the number vector:
rownames(x) <- paste0("row",1:10)
res <- list(Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE),
Map(combn, list(rownames(x)), seq_along(rownames(x)), simplify = FALSE))
unlist(res[[2]], recursive = FALSE)[lapply(unlist(res[[1]], recursive = FALSE),sum)>=25]
[[1]]
[1] "row3" "row8" "row9"
[[2]]
[1] "row5" "row8" "row9"
[[3]]
[1] "row1" "row2" "row3" "row9"
...
[[613]]
[1] "row1" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
[[614]]
[1] "row2" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
[[615]]
[1] "row1" "row2" "row3" "row4" "row5" "row6" "row7" "row8" "row9" "row10"
EDIT2 To get the elements of the list that match the minimum sum, in this case 25. This gives you the 42 combinations that sum to 25.
res <- Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE)
res3 <- unlist(res, recursive = FALSE)[lapply(unlist(res, recursive = FALSE),sum)>=25]
res3[which(rapply(res3,sum)==min(rapply(res3,sum)))]
To get the corresponding rownames as asked before:
rownames(x) <- paste0("row",1:10)
res4 <- list(Map(combn, list(x[,3]), seq_along(x[,3]), simplify = FALSE),
Map(combn, list(rownames(x)), seq_along(rownames(x)), simplify = FALSE))
unlist(res4[[2]], recursive = FALSE)[lapply(unlist(res4[[1]], recursive = FALSE),sum)>=25][which(rapply(res3,sum)==min(rapply(res3,sum)))]
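The two parallel `Map(combn, ...)` calls above can probably be collapsed by enumerating combinations of row positions once and deriving both the values and the row names from them. The following is only a sketch under that assumption; the names `v`, `idx`, `sums`, `keep`, `values_ok`, `rows_ok` and `rows_min` are illustrative and not part of the original answer.

```r
# Column 3 of the example data, with illustrative row names.
v <- c(4, 5, 6, 5, 6, 2, 4, 9, 10, 3)
names(v) <- paste0("row", seq_along(v))

# Enumerate combinations of row positions for every subset size.
idx <- unlist(
  lapply(seq_along(v), function(k) combn(seq_along(v), k, simplify = FALSE)),
  recursive = FALSE
)

# Sum of the column-3 values for each combination of positions.
sums <- vapply(idx, function(i) sum(v[i]), numeric(1))

# Combinations reaching the threshold, reported as values and as row names.
keep <- sums >= 25
values_ok <- lapply(idx[keep], function(i) unname(v[i]))
rows_ok   <- lapply(idx[keep], function(i) names(v)[i])

# Combinations whose sum equals the smallest qualifying total.
rows_min <- rows_ok[sums[keep] == min(sums[keep])]
```

Because the values and the row names come from the same index list, they cannot get out of step with each other.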

The following should work for a fixed length; for all combinations with variable length one would need something more advanced (EDIT: see @PLapointe's post (which should be the accepted answer) or just a simple loop):
x <- c(4, 5, 6, 5, 6, 2, 4, 9, 10, 3)
res <- combn(x, 3)
This will return a matrix that looks like this (I only show the first entries):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
[1,] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[2,] 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 5 5 5 5 5 5 6 6
[3,] 6 5 6 2 4 9 10 3 5 6 2 4 9 10 3 6 2 4 9 10 3 2 4
From there, you can then just select the combinations where the column sum is larger than your threshold:
res[, colSums(res) >= 25]
This will then give
[,1] [,2]
[1,] 6 6
[2,] 9 9
[3,] 10 10
As you now have duplicate entries (not sure if they are desired or not), you can simply do the following (or a simple loop):
res2 <- combn(unique(x), 3)
res2[, colSums(res2) >= 25]
which would then return
[1] 6 9 10
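The "simple loop" mentioned above for variable-length combinations might look like the following sketch. It assumes the same vector `x` and a threshold of 25; `threshold` and `hits` are illustrative names.

```r
x <- c(4, 5, 6, 5, 6, 2, 4, 9, 10, 3)
threshold <- 25

hits <- list()
for (k in seq_along(x)) {
  cmb <- combn(x, k)                 # one combination per column
  ok  <- colSums(cmb) >= threshold   # which columns reach the threshold
  # Store each qualifying column as its own vector.
  for (j in which(ok)) {
    hits[[length(hits) + 1L]] <- cmb[, j]
  }
}

hits[[1]]
```

Like `combn(x, 3)` above, this treats repeated values at different positions as distinct, so duplicates appear in `hits`.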

Related

Combinations of vector with sub-vector length n

Given a vector, 1:4, and a sequence length, 2, I would like to separate the vector into 'sub-vectors', each with a length of 2, and generate a matrix of all possible combinations of these sub-vectors.
Output would look like this:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 3 4 1 2
Another example. With vector 1:8 and sub-vector length of 4, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 5 6 7 8 1 2 3 4
With a vector 1:9 and sub-vector length of 3, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 3 4 5 6 7 8 9
[2,] 1 2 3 7 8 9 4 5 6
[3,] 4 5 6 1 2 3 7 8 9
[4,] 4 5 6 7 8 9 1 2 3
[5,] 7 8 9 4 5 6 1 2 3
[6,] 7 8 9 1 2 3 4 5 6
It's a given that the vector length must be divisible by the sub-vector length.
I can answer the whole question, but it will take a bit longer. This should give you the flavour of the answer.
The package combinat has a function called permn which gives you all the permutations of a vector. You want this, but not quite. What you need is the permutations of all the blocks. So in your first example you have two blocks of length two, and in your third example you have three blocks of length three. If we look at the first, and think about ordering the blocks:
> library(combinat)
> numBlocks = 2
> permn(1:numBlocks)
[[1]]
[1] 1 2
[[2]]
[1] 2 1
So I hope you can see that the first permutation would take the blocks b1 = c(1,2), and b2 = c(3,4) and order them c(b1,b2), and the second would order them c(b2,b1).
Equally if you had three blocks, b1 = 1:3; b2 = 4:6; b3 = 7:9 then
permn(1:3)
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 3 1 2
[[4]]
[1] 3 2 1
[[5]]
[1] 2 3 1
[[6]]
[1] 2 1 3
gives you the ordering of these blocks. The more general solution is figuring out how to move the blocks around, but that isn't too hard.
Update: Using my multicool package. Note co-lexical ordering (coolex) isn't the order you'd come up with by yourself.
library(multicool)
combs = function(v, blockLength){
if(length(v) %% blockLength != 0){
stop("vector length must be divisible by blockLength")
}
numBlocks = length(v) / blockLength
blockWise = matrix(v, nc = blockLength, byrow = TRUE)
m = initMC(1:numBlocks)
Perms = allPerm(m)
t(apply(Perms, 1, function(p)as.vector(t(blockWise[p,]))))
}
> combs(1:4, 2)
[,1] [,2] [,3] [,4]
[1,] 3 4 1 2
[2,] 1 2 3 4
> combs(1:9, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 7 8 9 4 5 6 1 2 3
[2,] 1 2 3 7 8 9 4 5 6
[3,] 7 8 9 1 2 3 4 5 6
[4,] 4 5 6 7 8 9 1 2 3
[5,] 1 2 3 4 5 6 7 8 9
[6,] 4 5 6 1 2 3 7 8 9
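If installing a package is not an option, the same idea (permute the block order, then paste the blocks together) can be sketched in base R. The helper names `perms` and `block_perms` are hypothetical, and the recursive permutation helper is only meant for a small number of blocks.

```r
# Recursively list all permutations of a vector (fine for a handful of blocks).
perms <- function(v) {
  if (length(v) <= 1L) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) {
      out[[length(out) + 1L]] <- c(v[i], p)
    }
  }
  out
}

block_perms <- function(v, blockLength) {
  stopifnot(length(v) %% blockLength == 0)
  # One list element per block: the first blockLength values, the next, ...
  blocks <- split(v, ceiling(seq_along(v) / blockLength))
  orders <- perms(seq_along(blocks))
  # Concatenate the blocks in each order; one arrangement per row.
  do.call(rbind, lapply(orders, function(o) unlist(blocks[o], use.names = FALSE)))
}

block_perms(1:4, 2)
block_perms(1:9, 3)
```

The rows come out in the order in which `perms` generates the block orderings, which need not match the order shown in the question.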

How to calculate the number of triplets in the rows of matrix in r?

I have this matrix:
m
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 3 1 6 8 8 8
[2,] 2 2 5 7 9 7 4
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
and I want to calculate the number of triplets in each row.
So I expect a vector such as: [1,0,0,0] as the result since only the first row contains three adjacent identical values.
Is there any function in R to accomplish this, and that doesn't involve writing a long function?
OK, I am risking here, but, reflecting the comments, and also because it doesn't make much sense to split the question in two (debatable), let me ask what I am really after: Detecting 4 triplets (or the absence thereof) in each row of a matrix such as:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 0 1 2 3 8 4 4 5 6 7 7 7 8 8 8 9 9 9
[2,] 0 1 2 0 2 3 3 3 4 5 5 5 6 7 7 7 8 9
[3,] 0 1 1 1 2 7 2 3 4 4 4 5 6 7 7 7 8 9
[4,] 0 1 1 1 2 3 4 9 4 5 5 5 6 6 6 7 8 9
[5,] 0 0 0 1 1 1 2 3 4 5 6 6 6 7 8 8 8 9
[6,] 0 1 2 3 4 5 5 5 6 6 6 7 8 8 8 9 9 9
[7,] 0 1 2 3 3 3 4 5 5 5 6 6 6 7 8 9 9 9
[8,] 0 1 2 3 4 5 5 5 6 6 6 7 7 7 8 9 9 9
We can use data.table
library(data.table)
apply(m, 1, function(x) any(tabulate(rleid(x))==3))
#[1] TRUE FALSE FALSE FALSE
If we need to find whether there are 4 triplets in a row (based on the new dataset)
apply(m1, 1, function(x) sum(tabulate(rleid(x))==3))==4
#[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
data
library(psych)
m <- `dimnames<-`(as.matrix(read.clipboard()), NULL)
m1 <- `dimnames<-`(as.matrix(read.clipboard()), NULL)
NOTE: The datasets were read after copying each of the data outputs shown in the OP's post and using read.clipboard from psych.
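One solution is to use the lag function from the dplyr package as follows (na.rm = TRUE is needed because the comparisons at the start of each row are NA):
apply(m, 1, function(x) any((x == dplyr::lag(x)) & (x == dplyr::lag(x, 2)), na.rm = TRUE))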
A more general sequence of numbers perhaps can be calculated as follows:
apply(m, 1, function(x) all(diff(which(diff(x) == 0)) == 1) & (length(which(diff(x) == 0)) == 2))
Where that last 2 is the (n - 1) where n = 3 in this case. You can also optimize it some by not computing that which(diff(x....) part twice.
Output for your example is:
[1] TRUE FALSE FALSE FALSE
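A sketch of the optimisation hinted at above, computing `which(diff(x) == 0)` only once. The function name `has_single_run` and the re-typed matrix `m` are illustrative. Like the original expression, it returns TRUE only when the row's repeated neighbours form one run of length n and nothing else.

```r
# The first example matrix, typed in by hand.
m <- rbind(c(1, 3, 1, 6, 8, 8, 8),
           c(2, 2, 5, 7, 9, 7, 4),
           1:7,
           1:7)

has_single_run <- function(x, n = 3) {
  same <- which(diff(x) == 0)   # positions where a value equals its right neighbour
  length(same) == n - 1 && all(diff(same) == 1)
}

apply(m, 1, has_single_run)
```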
Seems like there is this function in base called rle that computes run lengths of each value in a vector. You can use it as follows:
apply(m, 1, function(x) any(rle(x)$lengths == 3))
Giving you the same output:
[1] TRUE FALSE FALSE FALSE
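To see why this works, it may help to look at what `rle` returns for a single row, and how the same idea counts triplets instead of only detecting them. This is a small sketch; `count_runs` is a hypothetical helper name, and `row5` is row 5 of the larger matrix in the question.

```r
x <- c(1, 3, 1, 6, 8, 8, 8)
r <- rle(x)
r$values    # 1 3 1 6 8
r$lengths   # 1 1 1 1 3

# Count runs of exactly n identical adjacent values in a vector.
count_runs <- function(x, n = 3) sum(rle(x)$lengths == n)

row5 <- c(0, 0, 0, 1, 1, 1, 2, 3, 4, 5, 6, 6, 6, 7, 8, 8, 8, 9)
count_runs(row5)   # 4
```

Applied row-wise, `apply(m1, 1, count_runs) == 4` would then flag the rows that contain four triplets.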

R Equivalent of "end" in MatLab [duplicate]

Is it possible in R to say - I want all indices from position i to the end of vector/matrix?
Say I want a submatrix from 3rd column onwards. I currently only know this way:
A = matrix(rep(1:8, each = 5), nrow = 5) # just generate some example matrix...
A[,3:ncol(A)] # get submatrix from 3rd column onwards
But do I really need to write ncol(A)? Isn't there any elegant way how to say "from the 3rd column onwards"? Something like A[,3:]? (or A[,3:...])?
Sometimes it's easier to tell R what you don't want. In other words, exclude columns from the matrix using negative indexing:
Here are two alternative ways that both produce the same results:
A[, -(1:2)]
A[, -seq_len(2)]
Results:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
But to answer your question as asked: Use ncol to find the number of columns. (Similarly there is nrow to find the number of rows.)
A[, 3:ncol(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
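One thing to watch with `3:ncol(A)` is that `:` counts downwards when the start exceeds the end. A defensive variant could look like the sketch below; `cols_from` is a hypothetical helper, not a base R function.

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

# Hypothetical helper: columns i..ncol(x), empty when i is past the last column.
cols_from <- function(x, i) {
  keep <- seq_len(ncol(x))
  x[, keep[keep >= i], drop = FALSE]   # drop = FALSE keeps the result a matrix
}

cols_from(A, 3)        # same columns as A[, 3:ncol(A)]
dim(cols_from(A, 9))   # 5 0, whereas 9:ncol(A) would count down as 9, 8
```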
For rows (not columns as per your example) then head() and tail() could be utilised.
A <- matrix(rep(1:8, each = 5), nrow = 5)
tail(A, 3)
returns the last three rows, which for this 5-row matrix are the same rows as
A[3:dim(A)[1],]
(only the printed row names/indices differ).
Those work for vectors and data frames too:
> tail(1:10, 4)
[1] 7 8 9 10
> tail(data.frame(A = 1:5, B = 1:5), 3)
A B
3 3 3
4 4 4
5 5 5
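`tail()` also accepts a negative `n`, which drops the first `|n|` elements or rows rather than keeping the last `n`. That is arguably closer to "from position i to the end". A short sketch, with `from_row3` as an illustrative name:

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

# A negative n drops the first |n| rows, i.e. "from row 3 to the end".
from_row3 <- tail(A, -2)

# The same works for plain vectors.
tail(1:10, -2)   # elements 3 to 10
```

Unlike `tail(A, 3)`, this does not depend on how many rows the matrix happens to have.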
For the column versions, you could adapt tail(), but it is a bit trickier. I wonder if NROW() and NCOL() might be useful here, rather than dim()?:
> A[, 3:NCOL(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
Or flip this on its head and instead of asking R for things, ask it to drop things instead. Here is a function that encapsulates this:
give <- function(x, i, dimen = 1L) {
if (i <= 1L) return(x) ## nothing to drop; x[-integer(0), ] would select nothing
ind <- seq_len(i-1)
if(isTRUE(all.equal(dimen, 1L))) { ## rows
out <- x[-ind, ]
} else if(isTRUE(all.equal(dimen, 2L))) { ## cols
out <- x[, -ind]
} else {
stop("Only for 2d objects")
}
out
}
> give(A, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 2 3 4 5 6 7 8
[3,] 1 2 3 4 5 6 7 8
> give(A, 3, dimen = 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
You can use the following instruction:
A[, 3:length(A[1, ])]
A more readable dplyr approach to the same thing (after library(dplyr)):
A %>% as_tibble() %>%
select(-c(V1,V2))
A %>% as_tibble() %>%
select(V3:ncol(A))

Creating matrix from random block rows of another matrix without loop (in R)?

I am trying to create a matrix by drawing random block rows from another matrix. I have managed to do so with a loop.
set.seed(1)
a_matrix <- matrix(1:10,10,5) # the matrix with original sample
b_matrix <- matrix(NA,10, 5) # a matrix to store the bootstrap sample
S2<- seq(from =1 , to = 10, by =2) #[1] 1 3 5 7 9
m <- 2 # block size of m
for (r in S2){ start_point<-sample(1:(nrow(a_matrix)-1), 1, replace=T)
#randomly choose a number 1 to length of a_matrix -1
b_block <- a_matrix[start_point:(start_point+(m-1)), 1:ncol(a_matrix)]
# randomly select blocks from matrix a
b_matrix[r,]<-as.matrix((b_block)[1,])
b_matrix[(r+1),]<-as.matrix((b_block)[2,]) # put the blocks into matrix b
}
b_matrix
#we now have a b_matrix that is made of random blocks (size m=2)
#of the original a_matrix
The loop method works, but it is clearly not very efficient and it cannot be extended to other block sizes (e.g. a block size of 3). What is a cleaner and more extensible approach? Thanks in advance.
Here I tried to clean it up a bit and generalize the use of m:
random_block_sample <- function(a_matrix, m = 2L) {
N <- nrow(a_matrix)
stopifnot(m <= N)
n <- ceiling(N / m)
s <- sample(N - m + 1L, n, TRUE) # start_point
i <- unlist(lapply(s, seq, length.out = m))
b_matrix <- a_matrix[i, , drop = FALSE]
head(b_matrix, N)
}
set.seed(1L)
random_block_sample(a_matrix, m = 2L)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 3 3 3 3
# [2,] 4 4 4 4 4
# [3,] 4 4 4 4 4
# [4,] 5 5 5 5 5
# [5,] 6 6 6 6 6
# [6,] 7 7 7 7 7
# [7,] 9 9 9 9 9
# [8,] 10 10 10 10 10
# [9,] 2 2 2 2 2
# [10,] 3 3 3 3 3
set.seed(1L)
random_block_sample(a_matrix, m = 5L)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 2 2 2 2
# [2,] 3 3 3 3 3
# [3,] 4 4 4 4 4
# [4,] 5 5 5 5 5
# [5,] 6 6 6 6 6
# [6,] 3 3 3 3 3
# [7,] 4 4 4 4 4
# [8,] 5 5 5 5 5
# [9,] 6 6 6 6 6
# [10,] 7 7 7 7 7
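The core of the function is the step that turns each random start row into `m` consecutive row numbers. The sketch below shows that step in isolation, with start rows fixed by hand instead of drawn by `sample()`, so the values of `s` are purely illustrative.

```r
m <- 3L
s <- c(2L, 7L, 4L)   # fixed start rows, chosen by hand for illustration

# Each start row expands to m consecutive row numbers.
i <- unlist(lapply(s, seq, length.out = m))
i   # 2 3 4 7 8 9 4 5 6

# Indexing with i stacks the corresponding blocks of the original matrix.
a_matrix <- matrix(1:10, 10, 5)
a_matrix[i, 1]   # first column of the stacked blocks
```

Drawing the start rows from `1:(N - m + 1)` in the function guarantees that no block runs past the last row.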

