Suppose I have an m x n matrix M1 and a k x l matrix M2 with l <= n. I want to find all rows of M1 that contain every element of some row of M2.
For example, consider the following situation:
> M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
> M1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> M2
[,1] [,2]
[1,] 1 3
[2,] 8 9
Then rows one and three of M1 fulfill the condition, as row one contains 1 and 3 and the last row 8 and 9.
So how can I achieve this efficiently? I have written code using loops, but as I am working with very large matrices, that solution takes too much time.
This method checks each row of M2 and returns the index of the M1 row that contains it, or NA if no row does.
M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)
> M2
[,1] [,2]
[1,] 1 3
[2,] 8 5
[3,] 4 5
[4,] 1 2
y = apply(M2,1,function(x){
z = unique(which(M1 %in% x)%%nrow(M1))
ifelse(length(z)==1,ifelse(z==0,nrow(M1),z),NA)
})
> y
[1] 1 NA 2 1
This means that row 2 from M2 is not in M1, and that rows 1 and 4 from M2 are in row 1 in M1. Also row 3 in M2 is in row 2 in M1.
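A minimal sketch of a stricter check, assuming "contains" means that every value of an M2 row occurs somewhere in the M1 row. The helper name rows_containing is made up for illustration. The modulo approach above appears to report a row even when only one value of an M2 row is found (for example c(1, 100)); this version does not, at the cost of nested apply calls.

```r
# Hypothetical helper: for each row of M2, list the rows of M1
# that contain all of its values.
rows_containing <- function(M1, M2) {
  lapply(seq_len(nrow(M2)), function(j) {
    which(apply(M1, 1, function(r) all(M2[j, ] %in% r)))
  })
}

M1 <- matrix(1:9, nrow = 3, byrow = TRUE)
M2 <- matrix(c(1, 3, 8, 5, 4, 5, 1, 2), nrow = 4, byrow = TRUE)

hits <- rows_containing(M1, M2)

# Rows of M1 that contain at least one complete row of M2
m1_rows <- sort(unique(unlist(hits)))
m1_rows
```

Each list element of hits can hold zero, one or several row numbers, so an M2 row that fits into more than one row of M1 is not lost.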
More general example:
M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol =2, byrow = TRUE)
First, use match to find indexes of the matching values in M1.
ind <- match(M1, M2)
Now apply the modulo operator %% to the indexes, using the number of rows of M2, to recover the M2 row of each match. This works because R stores matrices column by column, so the linear index of an element is its row number plus a multiple of the number of rows; elements in the same row therefore give the same remainder (the last row gives 0).
rows <- ind %% nrow(M2)
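To illustrate the index arithmetic: the element in row i and column j of a matrix with n rows has linear index i + (j - 1) * n. The sketch below uses a few hand-picked index values (an assumption for illustration) and compares the plain modulo with arrayInd() from base R.

```r
M2 <- matrix(c(1, 2, 6, 9, 10, 11, 16, 17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)
n <- nrow(M2)

idx <- c(1, 6, 3, 8, 5, 10)        # linear (column-major) indexes into M2

r0 <- idx %% n                     # plain modulo: the last row comes out as 0
r1 <- (idx - 1) %% n + 1           # shifted modulo: row numbers in 1..n
r2 <- arrayInd(idx, dim(M2))[, 1]  # row numbers as computed by base R
```

The plain modulo is enough for grouping elements by row; the shifted form is only needed when the actual row number matters.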
Then m is a matrix with the same shape as M1 that holds, for each element, the M2 row it matched (or NA). A row of M1 is selected only if the same M2 row number appears in it twice (more generally, as many times as M2 has columns). This ensures that a row of M1 is only considered if it contains all elements of a row of M2.
m <- matrix(rows, nrow = nrow(M1))
matchRows <- apply(m, 1, duplicated, incomparables = NA)
M1rows <- which(colSums(matchRows)==ncol(M2)-1)
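Run on the example matrices, these steps select rows 1, 2, 5 and 6 of M1. The sketch below repeats the steps and cross-checks them against a direct, slower containment test. Because match() returns only the first position of a value in M2, and a value repeated within a row of M1 is counted twice, treat this as a sketch for data without such repeats.

```r
M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)

ind  <- match(M1, M2)                  # position in M2 of each element of M1
rows <- ind %% nrow(M2)                # M2 row of each match (last row -> 0)
m    <- matrix(rows, nrow = nrow(M1))  # same shape as M1
matchRows <- apply(m, 1, duplicated, incomparables = NA)
M1rows <- which(colSums(matchRows) == ncol(M2) - 1)

# Cross-check: does any row of M2 fit entirely into the given row of M1?
direct <- which(apply(M1, 1, function(r) {
  any(apply(M2, 1, function(x) all(x %in% r)))
}))
```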
Related
I have two matrices that look similar to this example:
> matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(rexp(9), 3) < 1
[,1] [,2] [,3]
[1,] TRUE TRUE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
I want to sum the entries of each row of the first matrix, but only where the logical matrix of the same size is TRUE; all other elements should be left out of the sum. Every row of the logical matrix has at least one TRUE.
The result should look like this
[,1]
[1,] 5
[2,] 5
[3,] 9
Thanks for the help.
Multiplying your T/F matrix by your other one will zero out all the elements where FALSE. You can then sum by row.
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) < 1
matrix(rowSums(m1 * m2), ncol = 1)
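Since rexp() makes the mask random, here is a reproducible sketch that hard-codes the logical matrix printed in the question (the name mask is chosen for illustration). TRUE behaves as 1 and FALSE as 0 in the product.

```r
m1 <- matrix(1:9, nrow = 3, ncol = 3)

# Logical matrix from the question, entered column by column
mask <- matrix(c(TRUE,  FALSE, FALSE,
                 TRUE,  TRUE,  FALSE,
                 FALSE, FALSE, TRUE), nrow = 3)

res <- matrix(rowSums(m1 * mask), ncol = 1)
res
```

For this mask the row sums are 5, 5 and 9. One caveat: an NA in m1 stays NA after multiplication even where the mask is FALSE, so data with missing values needs extra handling.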
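We can replace the excluded elements with NA and use rowSums with na.rm = TRUE. Note that m2 is defined here with the opposite condition (see data below), so TRUE marks the elements to drop.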
matrix(rowSums(replace(m1, m2, NA), na.rm = TRUE))
# [,1]
#[1,] 12
#[2,] 5
#[3,] 9
Or use ifelse
matrix(rowSums(ifelse(m2, 0, m1)))
data
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) >= 1
Let's say we have two matrices, M1 and M2, of dimensions n1 x m and n2 x m, respectively.
How can we find which rows of M1 are identical to rows of M2 (and, optionally, vice versa)?
The preferred output is a two-column matrix with one row per pair of identical rows: the first column contains the row number in M1 and the second the row number in M2.
There might be a slicker way, but this seems to work...
#dummy data
M1 <- matrix(1:8,ncol=2)
M2 <- matrix(c(1,3,4,5,6,8),ncol=2)
M1
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8
M2
[,1] [,2]
[1,] 1 5
[2,] 3 6
[3,] 4 8
which(apply(M2, 1, function(v)
apply(M1, 1, function(w) sum(abs(w-v))))==0,
arr.ind = TRUE)
row col
[1,] 1 1
[2,] 4 3
The row column is the row index of M1, the col column is the index of matching rows in M2.
Create example matrices with 4 matching rows
set.seed(0)
M1 <- matrix(runif(100), 10)
M2 <- rbind(M1[sample(10, 4),], matrix(runif(60), 6))
Create output
splits <- lapply(list(M1, M2), function(x) split(x, row(x)))
out <- cbind(M1 = seq(nrow(M1)), M2 = do.call(match, splits))
out[!is.na(out[,2]),]
# M1 M2
# [1,] 2 4
# [2,] 3 3
# [3,] 6 2
# [4,] 7 1
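An alternative sketch on the small example from the first answer: collapse each row into a single string key and match the keys. The helper name row_key and the separator are illustrative choices. Comparing via strings assumes the values print identically in both matrices (safe for whole numbers, less so for floating-point data), and match() reports only the first matching row of M2.

```r
M1 <- matrix(1:8, ncol = 2)
M2 <- matrix(c(1, 3, 4, 5, 6, 8), ncol = 2)

# Hypothetical helper: one string per row, e.g. "1\r5"
row_key <- function(M) apply(M, 1, paste, collapse = "\r")

pos <- match(row_key(M1), row_key(M2))   # M2 row for each M1 row, or NA
out <- cbind(M1 = which(!is.na(pos)), M2 = pos[!is.na(pos)])
out
```

Here out pairs row 1 of M1 with row 1 of M2, and row 4 of M1 with row 3 of M2, which agrees with the which(..., arr.ind = TRUE) result shown earlier.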
I'm a programming beginner and I'm not able to solve this problem:
I have a vector of length 132 and two matrices A and B of size 132x24. I would like to take each value of the vector and look it up in the corresponding row of matrix A. If the value occurs in that row, I want to use its column index to pick the value at the same position (same row and column) in matrix B. The results should be returned as a vector of the same length, 132.
How to do this? Do I need a for loop or are there some smart ways to work with packages?
Unfortunately I can not give example data.
Thank you for your help!
# vector v contains values that I want to compare with matrix A
> v
[1] 5 1 10 1 7
# each value v[i] occurs exactly once in row i of A
# I want to have the position of this value in matrix A
> A
[,1] [,2] [,3] [,4]
[1,] 5 7 4 1
[2,] 14 1 3 3
[3,] 13 3 1 10
[4,] 2 1 5 8
[5,] 13 2 5 7
# the position in matrix A equals the position in matrix B
# now the values of B have to be returned as a vector
> B
[,1] [,2] [,3] [,4]
[1,] 6 3 4 3
[2,] 5 2 5 5
[3,] 4 6 3 1
[4,] 3 6 1 5
[5,] 2 4 6 3
# vector with fitting values of B
> x
[1] 6 2 1 6 3
v <- c(5, 1, 10, 1, 7)
A <- matrix(c(
5, 7, 4, 1,
14, 1, 3, 3,
13, 3, 1, 10,
2, 1, 5, 8,
13, 2, 5, 7), 5, byrow = TRUE)
B <- matrix(c(
6, 3, 4, 3,
5, 2, 5, 5,
4, 6, 3, 1,
3, 6, 1, 5,
2, 4, 6, 3), 5, byrow = TRUE)
myfun <- function(i) which(v[i]==A[i,])
ii <- seq_along(v)
B[cbind(ii, sapply(ii, myfun))]
The function myfun() is quick'n'dirty.
To test if your data are ok you can calculate how often the value v[i] is found in the row A[i,]
countv <- function(i) sum(v[i]==A[i,])
all(sapply(ii, countv)==1) ### should be TRUE
If you get FALSE then inspect:
which(sapply(ii, countv)!=1)
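A loop-free sketch of the same lookup, assuming length(v) equals nrow(A) and each v[i] occurs exactly once in row i. It relies on R recycling v down the columns of A, so A == v compares A[i, j] with v[i].

```r
v <- c(5, 1, 10, 1, 7)
A <- matrix(c(
   5, 7, 4,  1,
  14, 1, 3,  3,
  13, 3, 1, 10,
   2, 1, 5,  8,
  13, 2, 5,  7), 5, byrow = TRUE)
B <- matrix(c(
  6, 3, 4, 3,
  5, 2, 5, 5,
  4, 6, 3, 1,
  3, 6, 1, 5,
  2, 4, 6, 3), 5, byrow = TRUE)

# Row/column positions of the matches, sorted so that row i comes i-th
hits <- which(A == v, arr.ind = TRUE)
hits <- hits[order(hits[, "row"]), , drop = FALSE]

# Guard for the assumption: exactly one match in every row
stopifnot(nrow(hits) == length(v), all(hits[, "row"] == seq_along(v)))

x <- B[hits]   # index B with the two-column position matrix
x
```

For the example data this gives 6 2 1 6 3, the vector asked for in the question.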
Alright, I'm not sure how you pictured your output, but I've got something that comes near.
Example data:
x <- 1:132
set.seed(123)
A <- matrix(sample(1:1000, size = 132*24, replace = TRUE), nrow = 132, ncol = 24)
B <- matrix(rnorm(132*24), nrow = 132, ncol = 24)
Now we check for every value of vector x if and where it occurs in every row of matrix A:
x.vs.A <- sapply(x, function(x){
apply(A, 1, function(y) {
match(x, y)
})
})
This gives us a matrix x.vs.A with 132 rows (the rows of A) and 132 columns (the values of x). Within the cells of this matrix, we will find either NA, if the combination of one value of x and one row of A was unsuccessful, or the column position within A of the FIRST match of the value of x.
Now we extract the row positions of the matches and bind them together with the cell values, which give the second (column) dimension of each match. This creates, for every value of x, a matrix of row/column positions of its matches in matrix A:
x.in.A <- apply(x.vs.A, 2, function(x) cbind(which(!is.na(x)), x[!is.na(x)]))
Example:
> x.in.A[[1]]
[,1] [,2]
[1,] 12 17
[2,] 42 17
[3,] 73 12
[4,] 123 21
This would show that the first value in vector x can be found in A[12, 17], in A[42, 17] and so on.
Now access these values in B, returning vectors for each value of x, and bind them to the matrices in the list:
x.in.B <- lapply(x.in.A, function(x){
apply(x, 1, function(y){
B[y[1], y[2]]
})
})
x.in.AB <- mapply(function(x, y) cbind(x, y),
x.in.A, x.in.B, SIMPLIFY = FALSE)
> x.in.AB[[1]]
y
[1,] 12 17 -0.2492526
[2,] 42 17 -0.7985330
[3,] 73 12 0.1253824
[4,] 123 21 -0.9704919
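For a single value, the row/column positions and the corresponding entries of B can also be collected in one step with which(..., arr.ind = TRUE). This is a sketch on small made-up matrices; the helper name lookup is hypothetical. Unlike the match()-based code above, it returns every occurrence of the value, not only the first one per row.

```r
A <- matrix(c(1, 2, 3,
              3, 1, 1,
              2, 2, 3), nrow = 3, byrow = TRUE)
B <- matrix(1:9 * 10, nrow = 3, byrow = TRUE)

# Hypothetical helper: positions of val in A plus the values of B there
lookup <- function(val, A, B) {
  hits <- which(A == val, arr.ind = TRUE)
  cbind(hits, value = B[hits])
}

res <- lookup(3, A, B)
res
```

Applying it to a whole vector, for example with lapply(x, lookup, A = A, B = B), yields a list comparable to x.in.AB.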
I have a very simple question. I have a matrix y (1 2 3) and want to access the elements that are greater than 1. I do not want to count them; I want to get the values themselves, 2 and 3.
Do you mean something like this:
A = matrix(
c(1, 2, 3), # the data elements
nrow=1, # number of rows
ncol=3, # number of columns
byrow = TRUE)
A
[,1] [,2] [,3]
[1,] 1 2 3
Indices of the elements greater than 1:
which(A > 1)
which() returns the positions, not the values (here they happen to coincide):
[1] 2 3
To get the values themselves:
A[A>1]
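A small sketch with made-up numbers, where positions and values differ, makes the distinction between which() and logical indexing easier to see:

```r
A <- matrix(c(10, 20, 30, 40), nrow = 2)

pos  <- which(A > 15)                   # linear positions: 2 3 4
vals <- A[A > 15]                       # the values:       20 30 40
rc   <- which(A > 15, arr.ind = TRUE)   # row/column position of each hit
```

Logical indexing returns the values in column-major order, the same order in which which() lists the positions.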
I would like to quickly find the top k largest values in each row of a matrix and set all other entries to zero. Currently I use the solution below. Can somebody improve it? When the matrix has a very large number of rows, it is not very fast.
Thanks.
mat <- matrix(c(5, 1, 6, 4, 9, 1, 8, 9, 10), nrow = 3, byrow = TRUE)
sortedMat <- t(apply(mat, 1, function(x) sort(x, decreasing = TRUE, method = "quick")))
topK <- 2
sortedMat <- sortedMat[, 1:topK, drop = FALSE]
lmat <- mat
for (i in 1:nrow(mat)) {
lmat[i, ] <- mat[i, ] %in% sortedMat[i, ]
}
kMat <- mat * lmat
> mat
[,1] [,2] [,3]
[1,] 5 1 6
[2,] 4 9 1
[3,] 8 9 10
> kMat
[,1] [,2] [,3]
[1,] 5 0 6
[2,] 4 9 0
[3,] 0 9 10
In Rfast, the command sort_mat sorts the columns of a matrix, colOrder gives the order for each column, colRanks gives the ranks for each column and colnth gives the nth value for each column. I believe at least one of them suits you.
You could use rank to speed this up. In case there are ties, you would have to decide on a method to break these (e.g. ties.method = "random").
kmat <- function(mat, k){
mat[t(apply(mat, 1, rank)) <= (ncol(mat)-k)] <- 0
mat
}
kmat(mat, 2)
## [,1] [,2] [,3]
## [1,] 5 0 6
## [2,] 4 9 0
## [3,] 0 9 10
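If exactly k entries per row should survive even when there are ties, a variant based on order() is one option. This is a sketch, not a benchmarked replacement; the helper name top_k_rows is made up, and ties are resolved however order() resolves them.

```r
# Hypothetical helper: keep the k largest entries of each row, zero the rest
top_k_rows <- function(mat, k) {
  # Column positions of the k largest entries in each row
  ord <- t(apply(mat, 1, order, decreasing = TRUE))[, seq_len(k), drop = FALSE]
  # Two-column (row, column) index matrix of the entries to keep
  idx <- cbind(rep(seq_len(nrow(mat)), times = k), as.vector(ord))
  out <- mat
  out[] <- 0
  out[idx] <- mat[idx]
  out
}

mat <- matrix(c(5, 1, 6, 4, 9, 1, 8, 9, 10), nrow = 3, byrow = TRUE)
res <- top_k_rows(mat, 2)
res
```

On the example matrix this reproduces kMat. With the default rank() ties method, tied values at the cutoff can leave more than k entries in a row, which is the main behavioural difference.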