Optimise row wise matrix comparison in R


I've googled extensively and can't seem to find an answer to my problem. Apologies if this has been asked before. I have two matrices, a & b, each with the same dimensions. What I am trying to do is iterate over the rows of a (from i = 1 to number of rows in a) and check if any elements found in row i of matrix a appear in the corresponding row in matrix b. I have a solution using sapply but this becomes quite slow with very large matrices. I wondered if it is possible to vectorise my solution somehow? Examples below:
# create example matrices
a = matrix(
1:9,
nrow = 3
)
b = matrix(
4:12,
nrow = 3
)
# iterate over rows in a....
# returns TRUE for each row of a where any element in ith row is found in the corresponding row i of matrix b
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
# however, for large matrices this performs quite poorly. is it possible to vectorise?
a = matrix(
runif(14000000),
nrow = 7000000
)
b = matrix(
runif(14000000),
nrow = 7000000
)
system.time({
sapply(1:nrow(a), function(x){ any(a[x,] %in% b[x,])})
})

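Use apply to find zero differences. Note that this compares elements at the same position only, so it misses a value that appears in both rows but in different columns: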
a <- sample(1:3, 9, replace = TRUE)
b <- sample(1:3, 9, replace = TRUE)
a <- matrix(a, ncol = 3)
b <- matrix(b, ncol = 3)
diff <- (a - b)
apply(diff, 1, function(x) which(x == 0)) # actual indexes = 0
apply(diff, 1, function(x) any(x == 0)) # row check only

You can try intersect with asplit, as below:
lengths(Map(intersect, asplit(a, 1), asplit(b, 1))) > 0
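When the matrices have many rows but few columns, as in the 7,000,000 x 2 example from the question, one possible way to vectorise is to loop over the columns of b instead of the rows. The sketch below is only an illustration under that assumption; the function name row_any_match is made up here and does not come from any package. Like %in%, it relies on exact equality.

```r
# Vectorised row-wise membership test:
# does any element of a[i, ] occur anywhere in b[i, ]?
# (row_any_match is a hypothetical helper name.)
row_any_match <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  hit <- matrix(FALSE, nrow(a), ncol(a))
  for (j in seq_len(ncol(b))) {
    # b[, j] is recycled down every column of a,
    # so a[i, k] is compared with b[i, j] for all i and k at once
    hit <- hit | (a == b[, j])
  }
  rowSums(hit) > 0
}

a <- matrix(1:9, nrow = 3)
b <- matrix(4:12, nrow = 3)
row_any_match(a, b)
# [1] TRUE TRUE TRUE
```

The loop runs ncol(b) times rather than nrow(a) times, so the cost per iteration is one whole-matrix comparison. For wide matrices the Map/intersect approach above may be the better choice.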

Related

list to dataframe without unique columns

I have this loop to generate some values
for (j in 1:2) {
table <- rep(data.frame(
matrix(c(letters[1:2],
sample(c(rep(1,100),0), size = 1),
sample(c(rep(0,100),1), size = 1)), ncol = 2) ), j)
}
I would like to get output like this:
X1 X2
a 1
b 0
a 1
b 1
That is, a table with the letters in the first column and the numbers in the second.
I tried:
do.call(rbind, table)
data.frame(matrix(unlist(table), nrow=length(table), byrow=TRUE))
But I am not able to get the values into the right columns of the data table.

The table is overwritten in each iteration of the loop, so only the last result survives. Instead, we may use replicate to create a list:
lst1 <- replicate(2, data.frame(
matrix(c(letters[1:2],
sample(c(rep(1,100),0), size = 1),
sample(c(rep(0,100),1), size = 1)), ncol = 2) ), simplify = FALSE)
do.call(rbind, lst1)

How to quantify the frequency of all possible row combinations of a binary matrix in R in a more efficient way?

Let's assume I have a binary matrix with 24 columns and 5000 rows.
The columns are parameters (P1 - P24) of 5000 subjects. The parameters are binary (0 or 1).
(Note: my real data can contain as many as 40,000 subjects.)
m <- matrix(, nrow = 5000, ncol = 24)
m <- apply(m, c(1,2), function(x) sample(c(0,1),1))
colnames(m) <- paste("P", c(1:24), sep = "")
Now I would like to determine all possible combinations of the 24 measured parameters:
comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")
The final question is: How often does each of the possible row combinations from comb appear in matrix m?
I managed to write code for this that adds a count column to comb. But my code is really slow and would take an estimated 328 days to run to completion. Therefore the code below only considers the first 20 combinations:
comb$count <- 0
for (k in 1:20){ # considers only the first 20 combinations of comb
for (i in 1:nrow(m)){
if (all(m[i,] == comb[k,1:24])){
comb$count[k] <- comb$count[k] + 1
}
}
}
Is there a computationally more efficient way to do this, so I can count all combinations in a short time?
Thank you very much for your help in advance.

data.table is fast at this type of operation:
m <- matrix(, nrow = 5000, ncol = 24)
m <- apply(m, c(1,2), function(x) sample(c(0,1),1))
colnames(m) <- paste("P", c(1:24), sep = "")
comb <- expand.grid(rep(list(0:1), 24))
colnames(comb) <- paste("P", c(1:24), sep = "")
library(data.table)
data_t = data.table(m)
ans = data_t[, .N, by = P1:P24]
dim(ans)
head(ans)
The core of the call is by = P1:P24, which groups by all the columns from P1 to P24, and .N, which gives the number of rows in each group.
I used this as inspiration - How does one aggregate and summarize data quickly?
and the data.table introduction vignette https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

If all you need is the combinations that occur in the data and how many times, this will do it:
m2 <- apply(m, 1, paste0, collapse="")
m2.tbl <- xtabs(~m2)
head(m2.tbl)
m2
# 000000000001000101010010 000000000010001000100100 000000000010001110001100 000000000100001000010111 000000000100010110101010 000000000100101000101100
# 1 1 1 1 1 1

You can use apply to paste the values of each row into a single string and use table to count the frequency of each string.
table(apply(m, 1, paste0, collapse = '-'))
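If the counts are needed as a column of comb, one option is to turn each row into a number instead of a string. With 24 parameters comb has 2^24 = 16,777,216 rows, but 5000 subjects can occupy at most 5000 of them, so almost every count is zero. The sketch below uses a small 3-column matrix made up for illustration; it assumes comb was built with expand.grid as in the question, where the first column varies fastest, so a row coded with weights 1, 2, 4, ... lands at position code + 1 in comb.

```r
# Toy data (hypothetical): 4 subjects, 3 binary parameters
m <- rbind(c(0, 1, 0),
           c(1, 1, 0),
           c(0, 1, 0),
           c(1, 0, 1))
comb <- expand.grid(rep(list(0:1), ncol(m)))

# Treat each row as a binary number; column 1 is the least significant bit
weights <- 2^(seq_len(ncol(m)) - 1)
key <- as.vector(m %*% weights)          # 0-based code of each row of m

# Row (key + 1) of comb is the matching combination, so tabulate fills the counts
comb$count <- tabulate(key + 1, nbins = nrow(comb))
comb[comb$count > 0, ]
```

Here rows 3, 4 and 6 of comb receive counts 2, 1 and 1. The same code should scale to 24 columns, since the codes stay well below the range where doubles stop representing integers exactly.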

Assign the same index if two vectors have a common intersection

I need help with a question closely related to some other question of mine.
How to merge two different groupings if they are not disjoint with dplyr
As the title of the question says, I want to generate an index in a vector that links different vectors in a list if they have an intersection or, if not, if both intersect with some other vector in a list, and so on...
This is a question involving graph theory/networks - I want to find indirectly connected vectors.
The question above solved my problem for two columns of a data frame, but I don't know how to generalize this to a list whose elements may have different lengths.
This is an example: list(1:3, 3:5, 5, 6) should give me c(1, 1, 1, 2)
EDIT:
I've tried using the fact that the powers of an adjacency matrix count the possible walks from one vertex to another.
library(dplyr) # %>%, group_by, across, group_indices
library(purrr) # set_names, accumulate, reduce
find_connections <- function(list_vectors){
list_vectors <- list_vectors %>%
set_names(paste0("x", 1:length(list_vectors)))
x <- crossprod(table(stack(list_vectors)))
power <- nrow(x) - 2
x <- ifelse(x >= 1, 1, 0)
if(power > 0){
z <- accumulate(replicate(power, x, simplify = FALSE),
`%*%`, .init = x) %>%
reduce(`+`)
} else{
z <- x
}
z <- ifelse(z >= 1, 1, 0)
w <- z %>%
as.data.frame() %>%
group_by(across()) %>%
group_indices()
return(w)
}
The problem is that my code takes too long to run. Each matrix is not very large, but I do need to run the function on a large number of them.
Is it possible to improve this?

This is one way to do it. It links the elements of each vector into a cycle of edges (each element to the next, and the last back to the first) and then uses the same trick as the previous answer to find clusters.
library(data.table)
library(igraph)
x <- list(1:3, 3:5, 5, 6)
dt <- rbindlist(lapply(x,
function(r) data.table(from = r, to = shift(r, -1, fill = r[1]))))
dg <- graph_from_data_frame(dt, directed = FALSE)
unname(sapply(x, function(v) components(dg)$membership[as.character(v[1])]))
#> [1] 1 1 1 2
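For comparison, the same grouping can be sketched in base R without building a graph object, using a small union-find structure over the list positions. This is only an illustrative alternative, not the method used in the answer above; the name group_by_overlap is hypothetical.

```r
# Give the same index to list elements that share a value,
# directly or through a chain of other elements.
# (group_by_overlap is a hypothetical helper name.)
group_by_overlap <- function(vectors) {
  n <- length(vectors)
  parent <- seq_len(n)                 # every vector starts as its own root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  # Long format: one row per (value, position of the vector holding it)
  long <- data.frame(value = unlist(vectors),
                     id = rep(seq_len(n), lengths(vectors)))
  # All vectors containing the same value are merged under one root
  for (ids in split(long$id, long$value)) {
    roots <- vapply(ids, find_root, integer(1))
    parent[roots] <- min(roots)
  }
  roots <- vapply(seq_len(n), find_root, integer(1))
  match(roots, unique(roots))          # relabel roots as 1, 2, ...
}

group_by_overlap(list(1:3, 3:5, 5, 6))
# [1] 1 1 1 2
```

Since the function has no package dependencies, it may be convenient when it has to be called on a large number of small lists.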

construct new supermatrix from block matrices

How can I construct (in R) a matrix made of subcomponents that are matrices?
For example, starting from matrices
A <- matrix(1:9,nrow=3,ncol=3)
B <- matrix(5:10,nrow=2,ncol=3)
C <- matrix(11:20,nrow=2,ncol=5)
I want to construct a block matrix
A 0
B C
where 0 represents a zero-filled block with the appropriate dimensions.
There are other questions on SO about constructing block-diagonal matrices
(Matrix::bdiag is very good for this), but I can't find one that answers this question.
(I'm posting this question because I had just about finished answering it when it was deleted by its original poster ...)

I tried writing a general purpose function. The usage is similar to matrix() but the first argument is a list of matrices (or vectors that will be recycled). It does not have all the bells and whistles (dimnames, byrow) but it is a decent start. I wouldn't be surprised to find out a better and more complete function already exists in a package but at least it was a fun exercise:
supermatrix <- function(list.of.mat, nrow = 1L, ncol = 1L) {
stopifnot(length(list.of.mat) == nrow * ncol)
is.mat <- vapply(list.of.mat, is.matrix, logical(1L))
is.vec <- vapply(list.of.mat, is.vector, logical(1L))
if (any(!is.mat & !is.vec)) stop("the list items must be matrices or vectors")
is.mat.mat <- matrix(is.mat, nrow, ncol)
if (any(rowSums(is.mat.mat) == 0L))
stop("we need at least one matrix per super row")
if (any(colSums(is.mat.mat) == 0L))
stop("we need at least one matrix per super column")
na.mat <- matrix(NA, nrow, ncol)
nrow.mat <- replace(na.mat, is.mat, vapply(list.of.mat[is.mat], nrow, integer(1L)))
ncol.mat <- replace(na.mat, is.mat, vapply(list.of.mat[is.mat], ncol, integer(1L)))
is.not.uniq <- function(x) length(table(x)) > 1L
if (any(apply(nrow.mat, 1, is.not.uniq))) stop("row dim mismatch")
if (any(apply(ncol.mat, 2, is.not.uniq))) stop("col dim mismatch")
nrow.vec <- rowMeans(nrow.mat, na.rm = TRUE)
ncol.vec <- colMeans(ncol.mat, na.rm = TRUE)
nrow.mat <- matrix(nrow.vec, nrow, ncol, byrow = FALSE)
ncol.mat <- matrix(ncol.vec, nrow, ncol, byrow = TRUE)
all.mat <- Map(matrix, list.of.mat, nrow.mat, ncol.mat)
i1.idx <- unlist(Map(rep, row(na.mat), lapply(all.mat, length)))
j1.idx <- unlist(Map(rep, col(na.mat), lapply(all.mat, length)))
i2.idx <- unlist(lapply(all.mat, row))
j2.idx <- unlist(lapply(all.mat, col))
o.idx <- order(j1.idx, j2.idx, i1.idx, i2.idx)
matrix(unlist(all.mat)[o.idx], sum(nrow.vec), sum(ncol.vec))
}
Example usage:
A <- matrix(1:9,nrow=3,ncol=3)
B <- matrix(5:10,nrow=2,ncol=3)
C <- matrix(11:20,nrow=2,ncol=5)
supermatrix(list(A, B, 0, C), 2, 2)
supermatrix(list(A, B, A, 1, 0, C, 2, C), 4, 2)

We need a zero matrix that will have compatible dimensions with A and C:
z <- matrix(0,nrow=nrow(A),ncol=ncol(C))
Now we just use rbind() and cbind():
rbind(cbind(A,z),cbind(B,C))
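Put together with the matrices from the question, the construction can be checked as follows. This is a minimal sketch; the result name M is chosen here for illustration.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(5:10, nrow = 2, ncol = 3)
C <- matrix(11:20, nrow = 2, ncol = 5)

# Zero block: as many rows as A (same block row), as many columns as C (same block column)
z <- matrix(0, nrow = nrow(A), ncol = ncol(C))

# cbind builds each block row, rbind stacks the block rows
M <- rbind(cbind(A, z), cbind(B, C))
dim(M)
# [1] 5 8
```

The approach only works when A and B have the same number of columns and B and C have the same number of rows; otherwise cbind or rbind will stop with an error.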

Creating block matrix via loop

I'm trying to create a block matrix using a loop in R, which depends on a variable I call T. The two matrices used to construct the block matrix could look like this:
A=matrix(c(1,0.3,0.3,1.5),nrow=2)
B=matrix(c(0.5,0.3,0.3,1.5),nrow=2)
So depending on what I set T to, I need different results. For T=2:
C=rbind(cbind(A,B),cbind(B,A))
For T=3:
C=rbind(cbind(A,B,B),cbind(B,A,B),cbind(B,B,A))
For T=5:
C=rbind(cbind(A,B,B,B,B),cbind(B,A,B,B,B),cbind(B,B,A,B,B),cbind(B,B,B,A,B),cbind(B,B,B,B,A))
So basically, I'm just trying to create a loop or something similar, where I can just specify my T and it will create the block matrix for me depending on T.
Thanks

You can do that with a nested loop (T must be set beforehand):
N <- nrow(A)
C <- matrix(NA,N*T,N*T)
for (i in 1:T){
for (j in 1:T){
if (i == j)
C[(i-1)*N+1:N, (j-1)*N+1:N] <- A
else
C[(i-1)*N+1:N, (j-1)*N+1:N] <- B
}
}
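The same layout can also be sketched without a loop by using Kronecker products: place B in every block, then add A - B on the diagonal blocks. The function name block_diag_offdiag is made up for this illustration, and n is used instead of T to avoid clashing with T as shorthand for TRUE.

```r
A <- matrix(c(1, 0.3, 0.3, 1.5), nrow = 2)
B <- matrix(c(0.5, 0.3, 0.3, 1.5), nrow = 2)

# n x n grid of blocks with A on the diagonal and B elsewhere
# (block_diag_offdiag is a hypothetical helper name)
block_diag_offdiag <- function(A, B, n) {
  stopifnot(identical(dim(A), dim(B)))
  kronecker(matrix(1, n, n), B) +   # B in every block
    kronecker(diag(n), A - B)       # A - B added on diagonal blocks only
}

C3 <- block_diag_offdiag(A, B, 3)
all.equal(C3, rbind(cbind(A, B, B), cbind(B, A, B), cbind(B, B, A)))
# [1] TRUE
```

Because the diagonal blocks are computed as B + (A - B), they can differ from A by floating-point rounding for some inputs, which is why the comparison uses all.equal rather than identical.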

From your explanation I suppose that you want a single A and T-1 Bs in each block row of your final matrix.
If that is correct, then here is a quick try using the permn function from the combinat package. All I am doing is generating the expression from the permutations and then evaluating it.
library(combinat) # provides permn
A = matrix(c(1,0.3,0.3,1.5),nrow=2)
B = matrix(c(0.5,0.3,0.3,1.5),nrow=2)
T = 5
x = c("A", rep("B",T-1))
perms = unique(permn(x)) #permn generates non-unique permutations
perms = lapply(perms, function(xx) {xx=paste(xx,collapse=","); xx=paste("cbind(",xx,")")})
perms = paste(perms, collapse=",")
perms = paste("C = rbind(",perms,")",collapse=",")
eval(parse(text=perms))

With the blockmatrix package this is pretty straightforward.
library(blockmatrix)
# create toy matrices (block matrix elements)
# with values which makes it easier to track them in the block matrix in the example here
A <- matrix("a", nrow = 2, ncol = 2)
B <- matrix("b", nrow = 2, ncol = 2)
# function for creating the block matrix
# n: number of repeating blocks in each dimension
# (I use n instead of T, to avoid confusion with T as in TRUE)
# m_list: the two matrices in a list
block <- function(n, m_list){
# create a 'layout matrix' of the block matrix elements
m <- matrix("B", nrow = n, ncol = n)
diag(m) <- "A"
# build block matrix
as.matrix(blockmatrix(dim = dim(m_list[[1]]), value = m, list = m_list))
}
# try with different n
block(n = 2, m_list = list(A = A, B = B))
block(n = 3, m_list = list(A = A, B = B))
block(n = 5, m_list = list(A = A, B = B))
