Matching matrix rows with list elements in R

Reproducible data:
dat1 <- matrix(0, nrow = 9, ncol = 2)
dat1[,1] <- rep(1:3,3)
dat1[,2] <- c(1,1,1,2,2,2,3,3,3)
dat2 <- list()
dat2[[1]] <- matrix(c(1,2,1,3), nrow = 2, ncol = 2)
dat2[[2]] <- matrix(c(1,1,2,3,1,3), nrow = 3, ncol = 2 )
> dat1
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 1 2
[5,] 2 2
[6,] 3 2
[7,] 1 3
[8,] 2 3
[9,] 3 3
> dat2
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 2 3
[[2]]
[,1] [,2]
[1,] 1 3
[2,] 1 1
[3,] 2 3
I have a matrix (dat1) and a list (dat2).
Some rows of dat1 are the same as rows of the matrices stored in dat2. My objective is to find the row numbers of dat1 that match the rows in each element of dat2 and store them in a list. An example of the output:
> ex.result
[[1]]
[,1]
[1,] 1
[2,] 8
[[2]]
[,1]
[1,] 7
[2,] 1
[3,] 8
I am looking for a fast way to do this without using time-consuming loops.

A slightly different approach:
library(magrittr) # provides %>%
lapply( dat2, function(m) {
apply( m, 1, function(r)
which( apply( sweep( dat1, 2, r, "==" ), 1, all ) ) ) %>% as.matrix })
Output:
[[1]]
[,1]
[1,] 1
[2,] 8
[[2]]
[,1]
[1,] 7
[2,] 1
[3,] 8

Here is an option:
lapply(dat2, function(mat)
apply(mat, 1, function(row)
match(toString(row), apply(dat1, 1, toString))))
#[[1]]
#[1] 1 8
#
#[[2]]
#[1] 7 1 8
This returns a list with integer vectors instead of a list with array/matrix entries though.
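Note that the call above rebuilds the string keys for dat1 once per row of every matrix in dat2. Below is a minimal sketch of the same match() idea that builds those keys only once. The helper name row_key is hypothetical (it is not part of the answer above), and the sketch assumes that equal row values paste to identical strings in both objects, which holds for the whole numbers used here.

```r
# Data from the question
dat1 <- matrix(0, nrow = 9, ncol = 2)
dat1[, 1] <- rep(1:3, 3)
dat1[, 2] <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
dat2 <- list(
  matrix(c(1, 2, 1, 3), nrow = 2, ncol = 2),
  matrix(c(1, 1, 2, 3, 1, 3), nrow = 3, ncol = 2)
)

# Hypothetical helper: collapse each row of a matrix into one string key
row_key <- function(m) do.call(paste, c(as.data.frame(m), sep = "_"))

# Build the keys for dat1 once, then look up every row of every list element
keys1 <- row_key(dat1)
result <- lapply(dat2, function(m) match(row_key(m), keys1))
result
```

As with the answer above, this returns integer vectors; wrap the match() call in matrix() if one-column matrices are needed.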

In the same vein as above, using Map() and vector recycling:
# Coercing to a data.frame to recycle the vector that is used to search:
setNames(
Map(function(x, y){
matrix(
match(
apply(y, 1, paste, collapse = ", "),
x
)
)
},
data.frame(apply(dat1, 1, paste, collapse = ", ")),
dat2),
seq_len(length(dat2)))

Related

Create a symmetric matrix from circular shifts of a vector

I'm struggling with the creation of a symmetric matrix.
Let's say a vector v <- c(1,2,3)
I want to create a matrix like this:
matrix(ncol = 3, nrow = 3, c(1,2,3,2,3,1,3,1,2), byrow = FALSE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 1
[3,] 3 1 2
(This is just a reprex; I have many vectors with different lengths.)
Notice this is a symmetric matrix with diagonal c(1,3,2) (different from vector v) and the manual process to create the matrix would be like this:
Using the first row as base (vector v) the process is to fill the empty spaces with the remaining values on the left side.
Any help is appreciated. Thanks!
Let me answer my own question in order to close it properly, using the incredibly simple and easy solution from Henrik's comment:
matrix(v, nrow = 3, ncol = 4, byrow = TRUE)[ , 1:3]
Maybe the byrow = TRUE matches the three steps of the illustration best conceptually, but the output is the same with:
matrix(v, nrow = 4, ncol = 3)[1:3, ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
Because there may be "many vectors with different lengths", it could be convenient to make a simple function and apply it to the vectors stored in a list:
cycle = function(x){
len = length(x)
matrix(x, nrow = len + 1, ncol = len)[1:len , ]
}
l = list(v1 = 1:3, v2 = letters[1:4])
lapply(l, cycle)
# $v1
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
#
# $v2
# [,1] [,2] [,3] [,4]
# [1,] "a" "b" "c" "d"
# [2,] "b" "c" "d" "a"
# [3,] "c" "d" "a" "b"
# [4,] "d" "a" "b" "c"
Another option is to use Reduce with accumulate = TRUE, so that the shift c(x[-1], x[1]) is applied repeatedly and every intermediate result is kept.
do.call(rbind, Reduce(function(x, y) c(x[-1], x[1]), v[-1], v, accumulate = TRUE))
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 2 3 1
#[3,] 3 1 2
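The same matrix can also be built with index arithmetic instead of recycling. This is only a sketch, and cycle_idx is a hypothetical name: entry (i, j) takes element ((i + j - 2) mod n) + 1 of the vector, and because that index depends only on i + j, the result is symmetric by construction.

```r
# Hypothetical helper: circular-shift matrix via modular indexing
cycle_idx <- function(x) {
  n <- length(x)
  # idx[i, j] = ((i + j - 2) mod n) + 1, so row i is x shifted left by i - 1
  idx <- (outer(seq_len(n), seq_len(n), "+") - 2) %% n + 1
  matrix(x[idx], nrow = n, ncol = n)
}

m3 <- cycle_idx(1:3)
m4 <- cycle_idx(letters[1:4])
m3
```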

Find common values between matrices and return matrix with row-col position

I'd like to find the values shared between two matrices and return their locations (row-col) in a matrix.
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
Expected output:
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Bonus if this could be generalized to non-identical matrices (different dim).
Equal sizes
One option would be
replace(m * NA, m == m2, paste(row(m), col(m), sep = "-")[m == m2])
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Different sizes
I believe that in this case, regardless of the approach, you will first need to trim both matrices to be of equal size.
set.seed(12)
(m <- matrix(sample(6), 2, 3, byrow = TRUE))
# [,1] [,2] [,3]
# [1,] 1 5 4
# [2,] 6 3 2
(m2 <- matrix(sample(6), 3, 2, byrow = FALSE))
# [,1] [,2]
# [1,] 2 5
# [2,] 4 3
# [3,] 1 6
out <- matrix(NA, max(nrow(m), nrow(m2)), max(ncol(m), ncol(m2)))
mrow <- min(nrow(m), nrow(m2))
mcol <- min(ncol(m), ncol(m2))
mTrim <- m[1:mrow, 1:mcol]
m2Trim <- m2[1:mrow, 1:mcol]
out[1:mrow, 1:mcol][mTrim == m2Trim] <- paste(row(mTrim), col(mTrim), sep = "-")[mTrim == m2Trim]
out
# [,1] [,2] [,3]
# [1,] NA "1-2" NA
# [2,] NA "2-2" NA
# [3,] NA NA NA
The function below gives the desired output, but only works if dim() is the same for the two matrices.
To generalize it to matrices of different sizes, one solution would be to subset the bigger matrix first.
The key is which(mat1==mat2, arr.ind=T) to get row-col index:
which(m==m2, arr.ind=T)
row col
[1,] 2 1
Inside a function:
find_in_matr <- function(mat1, mat2) {
if (!all(dim(mat1) == dim(mat2))) {
stop("mat1 and mat2 need to have the same dim()!")
}
m <- mat1
m[] <- NA # copy mat1 dim, and empty values
loc <- which(mat1==mat2, arr.ind=T) # find positions (both indxs)
m[loc] <- mapply(paste, sep="-", loc[, 1], loc[, 2]) # paste indxs
return(m)
}
Example:
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
find_in_matr(m, m2)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
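The subsetting idea mentioned in this answer could be sketched as follows. The name find_common is hypothetical, and the sketch assumes that "shared" means equal values at the same row-col position within the overlapping top-left block of the two matrices; cells outside that block stay NA.

```r
# Hypothetical generalisation of find_in_matr() to different dimensions
find_common <- function(mat1, mat2) {
  nr <- min(nrow(mat1), nrow(mat2))
  nc <- min(ncol(mat1), ncol(mat2))
  # Crop both matrices to the block they have in common
  a <- mat1[seq_len(nr), seq_len(nc), drop = FALSE]
  b <- mat2[seq_len(nr), seq_len(nc), drop = FALSE]
  # Result has the larger of the two sizes in each direction, filled with NA
  out <- matrix(NA_character_,
                max(nrow(mat1), nrow(mat2)),
                max(ncol(mat1), ncol(mat2)))
  loc <- which(a == b, arr.ind = TRUE)   # two-column matrix of (row, col)
  out[loc] <- paste(loc[, 1], loc[, 2], sep = "-")
  out
}

# Fixed inputs (no random sampling), one 2 x 3 and one 3 x 2
ma <- matrix(c(1, 5, 4, 6, 3, 2), 2, 3, byrow = TRUE)
mb <- matrix(c(2, 4, 1, 5, 3, 6), 3, 2)
res <- find_common(ma, mb)
res
```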
Silly piped version
library(magrittr)
(m == m2) %>%
`[<-`(!., NA) %>%
`[<-`((w <- which(., arr = T)), apply(w, 1, paste, collapse = '-'))
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
I tried to do it with ifelse(), building the "row-col" label for every cell and keeping it only where the two matrices agree:
ifelse(m == m2, paste(row(m), col(m), sep = "-"), NA)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
This method works for any dimensions, as long as both matrices have the same dim().
e.g.
set.seed(999)
m1 <- matrix(sample(1:3, 12, replace = T), 3, 4)
m2 <- matrix(sample(1:3, 12, replace = T), 3, 4)
ifelse(m1 == m2, paste(row(m1), col(m1), sep = "-"), NA)
# [,1] [,2] [,3] [,4]
# [1,] NA "1-2" NA "1-4"
# [2,] NA NA "2-3" NA
# [3,] "3-1" NA NA "3-4"

Apply function on each element of a list of matrices

I have a list of matrices.
(below is a simplified example, I actually have a list of 3 matrices, the first one being in 2D, while the second and third ones are in 3D)
> a <- matrix(-1:2, ncol = 2)
> b <- array(c(-2:5), dim=c(2, 2, 2))
> c_list <- list(a,b)
> c_list
[[1]]
[,1] [,2]
[1,] -1 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] -2 0
[2,] -1 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I'd like to apply the function max(0,c_list) to each and every element (without a loop), in order to have the same type of object as "c_list" but with the negative values replaced by zeros.
> output
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
, , 1
[,1] [,2]
[1,] 0 0
[2,] 0 1
, , 2
[,1] [,2]
[1,] 2 4
[2,] 3 5
I've managed to do it for a single matrix or for a plain list with mapply or lapply, but not for a list of matrices.
Answer : either Sotos' answer
output <- lapply(c_list, function(i)replace(i, i < 0, 0))
or Moody_Mudskipper's answer
output <- lapply(c_list,pmax,0)
You can use pmax: it preserves the dimensions of the source matrix or array and is vectorized, so it is faster than looping with max.
lapply(c_list,pmax,0)
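A short sketch of why max() alone does not do the job while pmax() does, using the objects from the question: max() reduces all of its arguments to one number, whereas pmax() compares element by element and copies the dim attribute from its first argument.

```r
a <- matrix(-1:2, ncol = 2)
b <- array(-2:5, dim = c(2, 2, 2))
c_list <- list(a, b)

# max() collapses everything to a single value
max(0, a)
# 2

# pmax() works element-wise and keeps the shape of its first argument
out <- lapply(c_list, pmax, 0)
lapply(out, dim)   # dimensions of each element are unchanged
```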
Using apply and lapply:
a <- matrix(-1:2, ncol = 2)
b <- matrix(-3:0, ncol = 2)
c <- list(a,b)
d <- lapply(c, function(m) {
apply(m, c(1, 2), function(x) max(0, x))
})
Output:
> d
[[1]]
[,1] [,2]
[1,] 0 1
[2,] 0 2
[[2]]
[,1] [,2]
[1,] 0 0
[2,] 0 0

Output converted from matrix to vector in apply

I want to apply a function over one margin (columns in my example) of a matrix. The problem is that the function returns a matrix, and apply flattens each result to a vector, so the overall result is again a matrix. My goal is to get a three-dimensional array. Here is the example (note that matrix() is not the function of interest, just an example):
x <- matrix(1:12, 4, 3)
apply(x, 2, matrix, nrow = 2, ncol = 2)
The output is exactly the same as the input. I have a pretty dull solution to this:
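library(abind)
library(magrittr) # provides %>%
# bind along a new last dimension (one more than the inputs have)
abind2 <- function (x, ...)
abind(x, ..., along = length(dim(x)) + 1)
apply(x, 2, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')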
I believe there must exist something better than this. Something that does not include list()ing and unlist()ing columns.
Edit:
Also, the solution should be easily applicable to an array of any dimension with any choice of MARGIN, which my solution is not.
For example, here I want it to return a 4-dimensional array.
x <- array(1:24, c(4,3,2))
apply(x, 2:3, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
Not that complicated at all. Simply use
array(x, dim = c(2, 2, ncol(x)))
Matrices and general arrays are stored in column-major order as one long 1D vector in memory, so you can simply reassign the dimensions.
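As a sketch of how this carries over to the 3D input from the question's edit (assuming, as in the examples, that each column of length 4 should become a 2 x 2 matrix): only the first dimension is split in two, and the remaining dimensions are kept as they are.

```r
# 2D case: each column of length 4 becomes a 2 x 2 slice
x <- matrix(1:12, 4, 3)
y <- array(x, dim = c(2, 2, ncol(x)))

# 3D case: split the first dimension, keep the others unchanged
x3 <- array(1:24, c(4, 3, 2))
y3 <- array(x3, dim = c(2, 2, dim(x3)[-1]))
dim(y3)   # 2 2 3 2

# Slice [, , j, k] is the column x3[, j, k] laid out as a 2 x 2 matrix
y3[, , 3, 2]
```

Note that the trailing dimensions keep their original order here (2 2 3 2), so an aperm() call would be needed if a different ordering is wanted.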
OK, here is possibly what you want to do in general:
tapply(x, col(x), FUN = matrix, nrow = 2, ncol = 2)
#$`1`
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#
#$`2`
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
#
#$`3`
# [,1] [,2]
#[1,] 9 11
#[2,] 10 12
You can try to convert your matrix into a data.frame and use lapply to apply your function on the columns (as a data.frame is a list), it will return a list, where each element represents the function result for a column:
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
EDIT with the second definition of x:
x <- array(1:24, c(4,3,2))
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
# $V4
# [,1] [,2]
# [1,] 13 15
# [2,] 14 16
# $V5
# [,1] [,2]
# [1,] 17 19
# [2,] 18 20
# $V6
# [,1] [,2]
# [1,] 21 23
# [2,] 22 24
EDIT2: an attempt to get an array as the result
Based on this similar question, you may try this code:
x <- array(1:24, c(4,3,2))
sapply(1:3,
function(y) sapply(1:ncol(x[, y, ]),
function(z) matrix(x[,y,z], ncol=2, nrow=2),
simplify="array"),
simplify="array")
Dimension of the result is 2 2 2 3.
Actually, the problem here is that two nested apply-style calls are needed when x is an array with more than two dimensions. In the last example of the question (with x <- array(1:24, c(4,3,2))), the outer call loops over the second dimension and, for each of its indices, the inner call applies the matrix function to every slice along the third dimension.

How to keep certain values in an array in R?

Suppose I have a data array,
dat <- array(NA, c(115,45,248))
Q1: What should I do if I want to get a new data array,
datnew <- array(NA, c(115,45,248))
in which all the positive values remain and the negative values are changed to NA?
Q2: What should I do if I want to get a new data array,
datnew <- array(NA,c(115,45,31))
by averaging along the third dimension, but only averaging every 8 values?
Thanks a lot.
For question 2,
you can reverse the order of the dimensions, then add a dimension representing the groups to average over, then use apply:
tmp <- array( 1:32, c(2,2,8) )
tmp2 <- array( aperm(tmp), c(4,2,2,2) )
apply( tmp2, 2:4, mean )
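The result above has the group index first and the original dimensions reversed. If the original orientation should be kept (for example 115 x 45 x 31 for the array in the question), one possible sketch is to split the third dimension in place instead of using aperm(). The name block_mean is hypothetical, and the sketch assumes the length of the third dimension is a multiple of the block size.

```r
# Hypothetical helper: mean over consecutive blocks of k along dimension 3
block_mean <- function(a, k) {
  d <- dim(a)
  stopifnot(length(d) == 3, d[3] %% k == 0)
  # Column-major storage: dimension 3 splits into (position in block, block)
  dim(a) <- c(d[1], d[2], k, d[3] / k)
  # Average over the within-block position, keep the other three dimensions
  apply(a, c(1, 2, 4), mean)
}

tmp <- array(1:32, c(2, 2, 8))
res <- block_mean(tmp, 4)
dim(res)   # 2 2 2
```

For the question's data this would be called as block_mean(dat, 8); passing na.rm = TRUE through to mean() inside apply() is an option if the array contains NA values.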
Answer to Q1:
dat[dat < 0] <- NA
We treat dat as if it were a vector (which it is, just with a dim attribute).
Answer to Q2:
Following Greg's nice, succinct solution, the solution I had in mind when posting my comment earlier was this (using Greg's tmp)
foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
apply(tmp, 2:1, foo, grp = gl(2,4))
Examples:
Q1
> dat <- array(rnorm(3*3*3), c(3,3,3))
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 -0.6876034
[2,] 0.6791252 2.1420478 -0.7073936
[3,] -0.9695173 -1.1050933 -0.3068230
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.5428203
[2,] -0.4328711 0.9080648 -0.1231653
[3,] -0.7798170 -1.1160706 -0.9237559
, , 3
[,1] [,2] [,3]
[1,] -0.79505298 0.8795420 0.4520150
[2,] 0.04154077 -1.0422061 0.4657002
[3,] -0.67168971 0.7925304 -0.5461143
> dat[dat < 0] <- NA
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 NA
[2,] 0.6791252 2.1420478 NA
[3,] NA NA NA
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.542820
[2,] NA 0.9080648 NA
[3,] NA NA NA
, , 3
[,1] [,2] [,3]
[1,] NA 0.8795420 0.4520150
[2,] 0.04154077 NA 0.4657002
[3,] NA 0.7925304 NA
Q2
> foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
> apply(tmp, 2:1, foo, grp = gl(2,4))
, , 1
[,1] [,2]
[1,] 7 9
[2,] 23 25
, , 2
[,1] [,2]
[1,] 8 10
[2,] 24 26
> all.equal(apply(tmp, 2:1, foo, grp = gl(2,4)), apply( tmp2, 2:4, mean ))
[1] TRUE
For question 1:
tmp2 <- ifelse(tmp1 < 0, NA, tmp1) # negative values become NA, the rest are kept
For question 2 see Greg's solution.

Resources