Match list to rows of matrix in R

"a" is a list and "b" is a matrix.
a<-list(matrix(c(0,2,0,1,0,2,0,0,1,0,0,0,0,0,2,2),4),
matrix(c(0,1,0,0,0,1,1,0,0,0,0,0),3),
matrix(c(0,0,0,0,2,0,1,0,0,0,0,0,2,0,2,1,0,1,1,0),5))
b<-matrix(c(2,2,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,2,2,2,1,2,1,1),6)
> a
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0 0 1 0
[2,] 2 2 0 0
[3,] 0 0 0 2
[4,] 1 0 0 2
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0 0 1 0
[2,] 1 0 0 0
[3,] 0 1 0 0
[[3]]
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 1 0 0
[3,] 0 0 2 1
[4,] 0 0 0 1
[5,] 2 0 2 0
> b
[,1] [,2] [,3] [,4]
[1,] 2 1 1 2
[2,] 2 2 1 2
[3,] 1 1 1 1
[4,] 1 1 1 2
[5,] 1 2 1 1
[6,] 2 1 2 1
There are 3 objects in list "a". For each object, I want to find the rows of matrix "b" that agree with every row of the object at the positions of that row's non-zero elements (zeros are ignored). The output is the matching row numbers of "b".
For example, the second object is
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0 0 1 0
[2,] 1 0 0 0
[3,] 0 1 0 0
The non-zero number in the 1st row is 1, in the third position, so it matches rows 1-5 of matrix "b". The non-zero number in the 2nd row is 1, in the first position, so it matches rows 3-5 of matrix "b". The non-zero number in the 3rd row is 1, in the second position, so it matches rows 1, 3, 4 and 6 of matrix "b". Only the 3rd and 4th rows of matrix "b" match all the rows of this object, so the output is "3 4".
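The reasoning above can be reproduced step by step. This is only a minimal sketch for checking the example by hand, not a faster solution; the names x, per_row and matched are made up for illustration.

```r
# Data from the question
b <- matrix(c(2,2,1,1,1,2,1,2,1,1,2,1,1,1,1,1,1,2,2,2,1,2,1,1), 6)
x <- matrix(c(0,1,0,0,0,1,1,0,0,0,0,0), 3)  # second object of "a"

# For each row of x, find the rows of b that agree at every non-zero position
per_row <- lapply(seq_len(nrow(x)), function(r) {
  nz <- x[r, ] != 0
  which(apply(b[, nz, drop = FALSE], 1, function(y) all(y == x[r, nz])))
})
per_row
# [[1]] 1 2 3 4 5
# [[2]] 3 4 5
# [[3]] 1 3 4 6

# Rows of b that match every row of x
matched <- Reduce(intersect, per_row)
matched
# [1] 3 4
```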
My attempt is as follows:
temp<-Map(function(y) t(y), Map(function(a)
apply(a,1,function(x){
apply(b,1, function(y) identical(x[x!=0],y[x!=0]))}),a))
lapply(temp, function(a) which(apply(a,2,prod)==1))
The result is as follows:
[[1]]
integer(0)
[[2]]
[1] 3 4
[[3]]
[1] 6
The result is correct, but I wonder whether there is a faster way to do this?

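With only a few columns, you can work column by column and skip unnecessary work: a column of "a" with more than one distinct non-zero value can never match any row of "b", and a column with no non-zero values needs no check at all: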
ff = function(a, b)
{
i = seq_len(nrow(b)) #starting candidate matches
for(j in seq_len(ncol(a))) {
aj = a[, j]
nzaj = aj[aj != 0L]
if(!length(nzaj)) next #if all(a[, j] == 0) save some operations
if(sum(tabulate(nzaj) > 0L) > 1L) return(integer()) #more than one distinct non-zero value in this column: no row of b can match, stop early
i = i[b[i, j] == nzaj[[1L]]] #update candidate matches
}
return(i)
}
lapply(a, function(x) ff(x, b))
#[[1]]
#integer(0)
#
#[[2]]
#[1] 3 4
#
#[[3]]
#[1] 6
With data of your actual size:
set.seed(911)
a2 = replicate(300L, matrix(sample(0:3, 20 * 5, TRUE, c(0.97, 0.01, 0.01, 0.01)), 20, 5), simplify = FALSE)
b2 = matrix(sample(1:3, 15 * 5, TRUE), 15, 5)
identical(OP(a2, b2), lapply(a2, function(x) ff(x, b2)))
#[1] TRUE
microbenchmark::microbenchmark(OP(a2, b2), lapply(a2, function(x) ff(x, b2)), times = 50)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# OP(a2, b2) 686.961815 730.840732 760.029859 753.790094 785.310056 863.04577 50 b
# lapply(a2, function(x) ff(x, b2)) 8.110542 8.450888 9.381802 8.949924 9.872826 15.51568 50 a
OP is:
OP = function (a, b)
{
temp = Map(function(y) t(y), Map(function(a) apply(a, 1,
function(x) {
apply(b, 1, function(y) identical(x[x != 0], y[x !=
0]))
}), a))
lapply(temp, function(x) which(apply(x, 2, prod) == 1))
}

Your explanations of what you want and what your possible matrices look like are really not clear. From what I can deduce, you want to match the row number in b that matches the unique non-zero number in each column of a matrix in a. If so, here's a simpler option:
lapply(a, function(x){ # loop across the matrices in a
x[x == 0] <- NA # replace 0s with NA
which(apply(b, 1, function(y){ # loop across the rows of b, trying to match
all(y == colMeans(x, na.rm = TRUE)) # the rows of b with the colmeans of x
}))
})
# [[1]]
# [1] 2
#
# [[2]]
# [1] 5
#
# [[3]]
# [1] 6

Related

Transform adjacency lists to binary matrix in R

Given a list of the locations of 1s in each row, I'm trying to find an efficient way to construct a binary matrix. Here's a small example, although I’m trying to find something that scales well -
Given a binary matrix:
> M <- matrix(rbinom(25,1,0.5),5,5)
> M
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 1 1 0
[2,] 0 1 1 1 1
[3,] 1 1 0 1 1
[4,] 1 0 0 1 0
[5,] 0 1 1 0 0
I can transform M into an adjacency list using:
> Mlist <- apply(M==1, 1, which, simplify = FALSE)
> Mlist
[[1]]
[1] 2 3 4
[[2]]
[1] 2 3 4 5
[[3]]
[1] 1 2 4 5
[[4]]
[1] 1 4
[[5]]
[1] 2 3
I'd like to transform Mlist back into M. One possibility is:
M.new <- matrix(0,5,5)
for (row in 1:5){M.new[row,Mlist[[row]]] <- 1}
But, it seems like there should be a more efficient way.
Thanks!
1) Using M and Mlist defined in the Note at the end, sapply over its components replacing a vector of zeros with ones at the needed locations. Transpose at the end.
M2 <- t(sapply(Mlist, replace, x = integer(length(Mlist)), 1L))
identical(M, M2) # check that M2 equals M
## [1] TRUE
2) A variation with slightly more keystrokes, but faster, would be
M3 <- do.call("rbind", lapply(Mlist, replace, x = integer(length(Mlist)), 1L))
identical(M, M3)
## [1] TRUE
Benchmark
Here ex1 and ex2 are (1) and (2) above and ex0 is the for loop in the question, except that we used integer instead of double. Note that (2) is about 100x faster than the loop in the question.
library(microbenchmark)
microbenchmark(
ex0 = { M.new <- matrix(0L,5,5); for (row in 1:5){M.new[row,Mlist[[row]]] <- 1L} },
ex1 = t(sapply(Mlist, replace, x = integer(length(Mlist)), 1L)),
ex2 = do.call("rbind", lapply(Mlist, replace, x = integer(length(Mlist)), 1L))
)
giving:
Unit: microseconds
expr min lq mean median uq max neval cld
ex0 4454.4 4504.15 4639.111 4564.1 4670.10 8450.2 100 b
ex1 73.1 84.75 98.220 94.3 111.75 130.8 100 a
ex2 32.0 36.20 43.866 42.7 51.85 82.5 100 a
Note
set.seed(123)
M <- matrix(rbinom(25,1,0.5),5,5)
Mlist <- apply(M==1, 1, which, simplify = FALSE)
Using vectorized row/column indexing: replicate the sequence along 'Mlist' by the lengths of its elements and cbind it with the unlisted 'Mlist'. The result is a two-column (row, column) index matrix that can be used to set the corresponding elements of 'M.new' (initialised to zeros, as in the question) to 1
ind <- cbind(rep(seq_along(Mlist), lengths(Mlist)), unlist(Mlist))
M.new[ind] <- 1
-checking
> all.equal(M, M.new)
[1] TRUE
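For reference, a self-contained sketch of the index-matrix approach, using the Mlist printed in the question. It assumes the matrix is square with one list element per row, so the dimension n is taken from length(Mlist); that is an assumption, not something the list itself guarantees.

```r
# Adjacency list as printed in the question
Mlist <- list(c(2L, 3L, 4L), c(2L, 3L, 4L, 5L), c(1L, 2L, 4L, 5L),
              c(1L, 4L), c(2L, 3L))

n <- length(Mlist)            # assumption: square matrix, one element per row
M.new <- matrix(0L, n, n)

# Two-column matrix of (row, column) pairs, one pair per 1 in the result
ind <- cbind(rep(seq_along(Mlist), lengths(Mlist)), unlist(Mlist))
M.new[ind] <- 1L
M.new
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    1    1    1    0
# [2,]    0    1    1    1    1
# [3,]    1    1    0    1    1
# [4,]    1    0    0    1    0
# [5,]    0    1    1    0    0
```

Rows with no 1s contribute zero pairs to ind, so they are simply left as all zeros.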
Or another option is sparseMatrix
library(Matrix)
as.matrix(sparseMatrix(i = rep(seq_along(Mlist), lengths(Mlist)),
j = unlist(Mlist), x = 1))
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 1 1 1
[2,] 0 1 0 1 0
[3,] 1 0 0 1 0
[4,] 0 1 0 1 0
[5,] 1 0 1 1 1

Count number of occurrence of zero between non-zero value in R

I have a matrix
mat <- matrix(c(64,76,0,0,78,35,45,0,0,4,37,0,66,46,0,0,0,0,3,0,71,0,28,97,0,30,55,65,116,30,18,0,0,143,99,0,0,0,0,0), nrow=4, byrow=T)
mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 64 76 0 0 78 35 45 0 0 4
[2,] 37 0 66 46 0 0 0 0 3 0
[3,] 71 0 28 97 0 30 55 65 116 30
[4,] 18 0 0 143 99 0 0 0 0 0
I want to create a list that, for each row, counts the lengths of the runs of zeros between the non-zero values:
[[1]]
[1] 2 2
[[2]]
[1] 1 4 1
[[3]]
[1] 1 1
[[4]]
[1] 2 5
All you need is rle
> apply(mat, 1, function(x) {
r <- rle(x)
r$lengths[r$values == 0]
})
[[1]]
[1] 2 2
[[2]]
[1] 1 4 1
[[3]]
[1] 1 1
[[4]]
[1] 2 5
You can use rle, which computes the lengths of runs of consecutive equal values
mat <- matrix(c(64,76,0,0,78,35,45,0,0,4,37,0,66,46,0,0,0,0,3,0,71,0,28,97,0,30,55,65,116,30,18,0,0,143,99,0,0,0,0,0), nrow=4, byrow=T)
apply(mat,1,function(x) {
value = rle(x==0)
value$lengths[value$values]
})
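To see what rle returns, here is a small sketch on the second row of mat alone (x, r and zero_runs are illustrative names). Note that trailing and leading runs of zeros are counted too, which is what the expected output in the question shows.

```r
x <- c(37, 0, 66, 46, 0, 0, 0, 0, 3, 0)   # second row of mat

r <- rle(x == 0)
r$lengths
# [1] 1 1 2 4 1 1
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# Keep only the runs where the value is TRUE, i.e. the runs of zeros
zero_runs <- r$lengths[r$values]
zero_runs
# [1] 1 4 1
```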
One more
setNames(object = lapply(X = data.frame(t(mat)),
FUN = function(x)
with(rle(x == 0), lengths[values])),
nm = NULL)
#[[1]]
#[1] 2 2
#[[2]]
#[1] 1 4 1
#[[3]]
#[1] 1 1
#[[4]]
#[1] 2 5
If for some reason you have a matrix with many rows and you need to do this a few seconds faster (unlikely I know), you can use the method below
library(dplyr)
rle(c(t(mat))) %>%
do.call(what = data.frame) %>%
mutate(mrow = (cumsum(lengths) - 1) %/% ncol(mat)) %>%
{split(.$lengths[!.$values], .$mrow[!.$values])}
# $`0`
# [1] 2 2
#
# $`1`
# [1] 1 4 1
#
# $`2`
# [1] 1 1
#
# $`3`
# [1] 2 5
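One caveat worth checking before relying on the flattened approach: because all rows are concatenated into one vector, a run of zeros that ends one row and continues at the start of the next is counted as a single run. In the example matrix every row starts with a non-zero value, so this does not arise there. A minimal sketch with a made-up 2 x 2 matrix m:

```r
m <- matrix(c(1, 0,
              0, 2), nrow = 2, byrow = TRUE)

# Flattened: the zero ending row 1 and the zero starting row 2 merge
flat <- rle(c(t(m)))
flat_zero_runs <- flat$lengths[flat$values == 0]
flat_zero_runs
# [1] 2

# Row by row: each row has its own run of length 1
per_row <- lapply(seq_len(nrow(m)), function(i) {
  with(rle(m[i, ] == 0), lengths[values])
})
unlist(per_row)
# [1] 1 1
```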
Benchmark
mat <- mat[sample(nrow(mat), 1e6, T),]
f1 <- function(mat){
apply(mat, 1, function(x) {
with(rle(x), lengths[values == 0])
})
}
f2 <- function(mat){
rle(c(t(mat))) %>%
do.call(what = data.frame) %>%
mutate(mrow = (cumsum(lengths) - 1) %/% ncol(mat)) %>%
{split(.$lengths[!.$values], .$mrow[!.$values])}
}
microbenchmark::microbenchmark(f1(mat), f2(mat), times = 10)
# Unit: seconds
# expr min lq mean median uq max neval
# f1(mat) 28.346335 28.978307 30.633423 30.720702 31.504075 35.049800 10
# f2(mat) 3.683452 3.916681 4.099936 4.086634 4.250613 4.482668 10

Apply function on each element of a list of matrices

I have a list of matrices.
(below is a simplified example, I actually have a list of 3 matrices, the first one being in 2D, while the second and third ones are in 3D)
> a <- matrix(-1:2, ncol = 2)
> b <- array(c(-2:5), dim=c(2, 2, 2))
> c_list <- list(a,b)
> c_list
[[1]]
[,1] [,2]
[1,] -1 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] -2 0
[2,] -1 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I'd like to apply the function max(0,c_list) to each and every element (without a loop), in order to have the same type of object as "c_list" but with the negative values replaced by zeros.
> output
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I've managed to do it for a single matrix or for a list with mapply or lapply, but not for a list of matrices.
Answer : either Sotos' answer
output <- lapply(c_list, function(i)replace(i, i < 0, 0))
or Moody_Mudskipper's answer
output <- lapply(c_list,pmax,0)
You can use pmax. It preserves the shape of the source matrix and is vectorized, so it is faster than looping with max.
lapply(c_list,pmax,0)
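A short sketch of the difference between max and pmax on the data from the question: max collapses everything to one number, while pmax works elementwise and keeps the dimensions of its first argument (which is why the matrix, not the 0, goes first).

```r
a <- matrix(-1:2, ncol = 2)
b <- array(-2:5, dim = c(2, 2, 2))
c_list <- list(a, b)

max(0, a)     # a single number, not a matrix
# [1] 2

pmax(a, 0)    # elementwise maximum, still 2 x 2
#      [,1] [,2]
# [1,]    0    1
# [2,]    0    2

# lapply passes each list element as the first argument of pmax
output <- lapply(c_list, pmax, 0)
dim(output[[2]])
# [1] 2 2 2
```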
Using apply and lapply:
a <- matrix(-1:2, ncol = 2)
b <- matrix(-3:0, ncol = 2)
c <- list(a,b)
d <- lapply(c, function(m) {
apply(m, c(1, 2), function(x) max(0, x))
})
Output:
> d
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
[,1] [,2]
[1,] 0 0
[2,] 0 0

Write a value for maximum/minimum between two values

I have a two-column matrix and I want to produce a new matrix/data.frame in which each column holds 1 where it has the row maximum and 0 otherwise (the two values are never equal). This is my attempt:
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
>testM
V1 V2
1 1 2
2 3 1
3 1 5
apply(data.frame(testM), 1, function(row) ifelse(max(row[1],row[2]),1,0))
I expect to have:
0 1
1 0
0 1
because of the 0,1 parameters in max() function, but I just get
[1] 1 1 1
Any ideas?
Or using pmax
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
--(testM==pmax(testM[,1],testM[,2]))
V1 V2
[1,] 0 1
[2,] 1 0
[3,] 0 1
You can perform arithmetic on Booleans in R! Just check whether each element in a row is equal to its row's max value and multiply by 1.
t(apply(testM, 1, function(row) 1*(row == max(row))))
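A sketch of why the original attempt returns all 1s, followed by the same comparison written without transposing. The names row_max and is_max are illustrative only.

```r
testM <- matrix(c(1, 2, 3, 1, 1, 5), ncol = 2, byrow = TRUE)

# max() returns a number; any non-zero number is treated as TRUE by ifelse(),
# so the attempt in the question always yields 1
ifelse(max(1, 2), 1, 0)
# [1] 1

row_max <- apply(testM, 1, max)   # 2 3 5
# A length-3 vector is recycled down the columns of a 3-row matrix,
# so every element is compared with the maximum of its own row
is_max <- testM == row_max
is_max * 1
#      [,1] [,2]
# [1,]    0    1
# [2,]    1    0
# [3,]    0    1
```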
You can use max.col and col to produce a logical matrix:
res <- col(testM) == max.col(testM)
res
[,1] [,2]
[1,] FALSE TRUE
[2,] TRUE FALSE
[3,] FALSE TRUE
If you want it as 0/1, you can do:
res <- as.integer(col(testM) == max.col(testM)) # this removes the dimension
dim(res) <- dim(testM) # puts the dimension back
res
[,1] [,2]
[1,] 0 1
[2,] 1 0
[3,] 0 1

NA won't omit in R when 0 is between 1's

I am supposed to convert a square matrix that represents a graph (the vertices-and-edges kind) into a list that represents the same graph:
square matrix: element (i,j) = 1 means there is an edge i -> j
list: element i is a vector (possibly empty, coded as NA) of all j s.t. there is an edge i -> j
My problem is that if there is a zero in the middle of a row, the result contains an NA, which is only supposed to happen when the vector is empty (no edges). It only happens when a zero is between two 1's. I don't know why, and NA.omit doesn't work.
This is my first time programming in R.
squaretolist <- function(m){
ml <- list() #creates an empty list that we will return at the end
for(i in 1:ncol(m)){ #loop through columns
b1 <- c()
for(j in 1:nrow(m)){ #loop through rows
ifelse(m[i,j] %in% 1, b1[j] <- j, next)
}
ifelse(length(b1) == 0, ml[[i]]<- NA, ml[[i]] <- b1 )
}
return(ml)
}
In your function, if you have a zero in between two 1s, for example 1 in the 1st position and in the 3rd position, you're assigning b1[1] to 1, b1[3] to 3 but, as you have a 0 in the 2nd position, you're not assigning b1[2] to anything so it becomes NA.
To avoid that, you can replace ifelse(m[i,j] %in% 1, b1[j] <- j, next)
by ifelse(m[i,j] %in% 1, b1 <- c(b1,j), next).
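The padding behaviour and the fix can be sketched as follows. This is one possible rewrite, not the only one: it swaps ifelse/next for a plain if, loops over rows with seq_len(nrow(m)), and uses a small made-up test matrix.

```r
# Assigning past the end of a vector pads the gap with NA
b1 <- c()
b1[1] <- 1
b1[3] <- 3
b1
# [1]  1 NA  3

# Sketch of the function with the append fix and plain if instead of ifelse
squaretolist <- function(m) {
  ml <- vector("list", nrow(m))
  for (i in seq_len(nrow(m))) {          # one list element per row (vertex)
    b1 <- integer(0)
    for (j in seq_len(ncol(m))) {
      if (m[i, j] %in% 1) b1 <- c(b1, j) # append instead of b1[j] <- j
    }
    ml[[i]] <- if (length(b1) == 0) NA else b1
  }
  ml
}

m <- matrix(c(1, 0, 1,
              0, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)
res <- squaretolist(m)
res
# [[1]] 1 3
# [[2]] NA
# [[3]] 2
```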
You can also get what you want with the use of grep and apply functions :
ml <- apply(m, 1, function(i) {if(any(i==1)) grep(1, i) else NA})
This instruction tells R to apply, for each row of the matrix m, a function that returns, if there is at least one 1, the position of the 1(s), else NA.
Example:
set.seed(123)
m<-matrix(sample(c(0,1),25,replace=T),nrow=5)
m[4,]<-rep(0,5)
# > m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0 1 1 1
# [2,] 1 1 0 0 1
# [3,] 0 1 1 0 1
# [4,] 0 0 0 0 0
# [5,] 1 0 0 1 1
ml<-apply(m,1,function(i){if(any(i==1)) grep(1,i) else NA})
# > ml
# [[1]]
# [1] 3 4 5
# [[2]]
# [1] 1 2 5
# [[3]]
# [1] 2 3 5
# [[4]]
# [1] NA
# [[5]]
# [1] 1 4 5
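Two hedged remarks on the one-liner. grep works on character strings, so it is safe for a strict 0/1 matrix but would also match values such as 10; which(i == 1) compares numbers instead. Also, apply returns a matrix rather than a list when every row yields a result of the same length, so the sketch below uses lapply over the row indices to always get a list. The test matrix is made up for illustration.

```r
# grep coerces to character, so the pattern "1" also matches "10"
row <- c(0, 10, 1)
grep(1, row)
# [1] 2 3
which(row == 1)
# [1] 3

m <- matrix(c(1, 0, 1,
              0, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)

# Always returns a list, with NA for vertices that have no outgoing edge
ml <- lapply(seq_len(nrow(m)), function(i) {
  idx <- which(m[i, ] == 1)
  if (length(idx)) idx else NA
})
ml
# [[1]] 1 3
# [[2]] NA
# [[3]] 2
```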
