R: fast determine top k maximum value in a matrix

I would like to quickly find the top k values in each row of a matrix and set every other entry to zero. I currently use the solution below. Can somebody improve it? It becomes slow when the matrix has many rows.
Thanks.
mat <- matrix(c(5, 1, 6, 4, 9, 1, 8, 9, 10), nrow = 3, byrow = TRUE)
sortedMat <- t(apply(mat, 1, function(x) sort(x, decreasing = TRUE, method = "quick")))
topK <- 2
sortedMat <- sortedMat[, 1:topK, drop = FALSE]
lmat <- mat
for (i in 1:nrow(mat)) {
lmat[i, ] <- mat[i, ] %in% sortedMat[i, ]
}
kMat <- mat * lmat
> mat
[,1] [,2] [,3]
[1,] 5 1 6
[2,] 4 9 1
[3,] 8 9 10
> kMat
[,1] [,2] [,3]
[1,] 5 0 6
[2,] 4 9 0
[3,] 0 9 10

In Rfast, sort_mat sorts the columns of a matrix, colOrder returns the order for each column, colRanks gives the ranks for each column, and colnth gives the nth value for each column. I believe at least one of them will suit you.

You could use rank to speed this up. In case there are ties, you would have to decide on a method to break these (e.g. ties.method = "random").
kmat <- function(mat, k){
mat[t(apply(mat, 1, rank)) <= (ncol(mat)-k)] <- 0
mat
}
kmat(mat, 2)
## [,1] [,2] [,3]
## [1,] 5 0 6
## [2,] 4 9 0
## [3,] 0 9 10
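
The tie-breaking point above can be made concrete. This is only a sketch: top_k_rows is a hypothetical helper name, and ties.method = "first" is one possible choice among several. With the default ties.method = "average", tied values share a rank, so a row with ties can keep more than k entries.

```r
# Hypothetical helper: keep exactly k entries per row, zero the rest.
top_k_rows <- function(mat, k) {
  # ties.method = "first" gives every entry a unique rank within its row
  r <- t(apply(mat, 1, rank, ties.method = "first"))
  mat[r <= ncol(mat) - k] <- 0
  mat
}

mat <- matrix(c(5, 1, 6, 4, 9, 1, 8, 9, 10), nrow = 3, byrow = TRUE)
res <- top_k_rows(mat, 2)

# A row with ties: exactly two of the three 2s survive
tied <- matrix(c(2, 2, 2, 1, 3, 3), nrow = 2, byrow = TRUE)
res_tied <- top_k_rows(tied, 2)
```

For the question's matrix this gives the same result as kmat(mat, 2); the two only differ when a row contains ties.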

Related

row sum based on matrix with logicals

I have two dataframes that look similar to this example:
> matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(rexp(9), 3) < 1
[,1] [,2] [,3]
[1,] TRUE TRUE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
I want to sum individual entries of a row, but only when the logical matrix of the same size is TRUE, else this row element should not be in the sum. All rows have at least one case where one element of matrix 2 is TRUE.
The result should look like this
[,1]
[1,] 12
[2,] 5
[3,] 9
Thanks for the help.

Multiplying your T/F matrix by your other one will zero out all the elements where FALSE. You can then sum by row.
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) < 1
as.matrix(rowSums(m1 * m2), ncol = 1)
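
Because rexp() is random, the snippet above gives different sums on every run. Here is a reproducible sketch with a fixed logical matrix; keep is an assumed example mask, chosen only so that the row sums come out as 12, 5 and 9.

```r
m1 <- matrix(1:9, nrow = 3, ncol = 3)

# Assumed mask (not from the question): TRUE marks the entries to include
keep <- matrix(c(TRUE, FALSE, FALSE,
                 TRUE, TRUE,  FALSE,
                 TRUE, FALSE, TRUE), nrow = 3)

# TRUE/FALSE are coerced to 1/0, so excluded entries contribute nothing
res <- matrix(rowSums(m1 * keep), ncol = 1)
```

Note that the multiplication trick assumes m1 has no NA or infinite values in the excluded positions, since NA * 0 is still NA.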

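We can replace the unwanted elements with NA and use rowSums with na.rm = TRUE. Note that m2 as defined under "data" below is TRUE for the elements to drop (>= 1), the opposite of the question's logical matrix.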
matrix(rowSums(replace(m1, m2, NA), na.rm = TRUE))
# [,1]
#[1,] 12
#[2,] 5
#[3,] 9
Or use ifelse
matrix(rowSums(ifelse(m2, 0, m1)))
data
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) >= 1

Compare columns of 2 matrices to determine the column values in one matrix that should be paired with the other matrix

I have matrix M and N given by
> M
[,1] [,2] [,3] [,4] [,5]
[1,] 5 1 1 7 7
[2,] 4 7 4 2 7
[3,] 11 19 20 50 30
> N
[,1] [,2]
[1,] 7 1
[2,] 7 7
I want to find the column values in M that should be paired with N to get
[,1] [,2]
7 1
7 7
30 19
I tried the code below. Is there a more efficient way of doing this, ideally without the for loops?
E=numeric()
for (i in 1:2){
for (j in 1:5) {
if (N[1,i]==M[1,j] & N[2,i]==M[2,j]){
E[i]= M[3,j]
}
}
}
E
rbind(N,E)

Well, here is your loop rewritten:
E <- vapply(seq_len(ncol(N)), function(i) M[3, M[1, ] == N[1, i] & M[2, ] == N[2, i]], numeric(1))
# with
> rbind(N,E)
[,1] [,2]
7 1
7 7
E 30 19
There is only one loop (vapply is a wrapper for a loop), which runs through the columns of N.

Here's a way using multiple calls to apply. We iterate over the columns of M and N to find which column in M matches the first column in N and then which matches the second column in N.
logicals <- apply(M[-3,], # exclude third row
2, # iterate over columns
FUN = function(x)
apply(N, 2, #then iterate over columns of N
FUN = function(y) all(x == y)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] FALSE FALSE FALSE FALSE TRUE
# [2,] FALSE TRUE FALSE FALSE FALSE
M[,apply(logicals, 1, which)]
[,1] [,2]
[1,] 7 1
[2,] 7 7
[3,] 30 19
data
M <- structure(c(5, 4, 11, 1, 7, 19,
1, 4, 20, 7, 2, 50,
7, 7, 30),
.Dim = c(3L, 5L))
N <- structure(c(7, 7, 1, 7), .Dim = c(2L, 2L))
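
Another possibility, offered as a sketch rather than a drop-in replacement: build one key per column and let match() do the lookup. The names key_M, key_N and idx are made up for this example, and the approach assumes every column of N occurs in M (match returns NA otherwise, and only the first matching column of M is used).

```r
M <- structure(c(5, 4, 11, 1, 7, 19,
                 1, 4, 20, 7, 2, 50,
                 7, 7, 30),
               .Dim = c(3L, 5L))
N <- structure(c(7, 7, 1, 7), .Dim = c(2L, 2L))

# One string key per column, built from the first two rows
key_M <- paste(M[1, ], M[2, ])
key_N <- paste(N[1, ], N[2, ])

# Position in M of each column of N
idx <- match(key_N, key_M)
E <- M[3, idx]
out <- rbind(N, E)
```

Pasting numbers into strings is fine for whole numbers like these; for non-integer values a tolerance-based comparison would be safer.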

Selecting rows conditional on column value from multiple matrices

I have
mat1 = matrix(c(2, 4, 3, 6, 7, 8), nrow=2, ncol=3)
mat2 = matrix(c(5, 6, 7, 1, 2, 3), nrow=2, ncol=3)
mat3 = matrix(c(8, 5, 8, 6, 7, 9), nrow=2, ncol=3)
which gives me 3 matrices:
[,1] [,2] [,3]
[1,] 2 3 7
[2,] 4 6 8
[,1] [,2] [,3]
[1,] 5 7 2
[2,] 6 1 3
[,1] [,2] [,3]
[1,] 8 8 7
[2,] 5 6 9
What I would like to do is compare the three matrices per row per first column, and select the row of the matrix that has the highest value on the first column.
For example: in row 1 column 1, matrix3 has the highest value (8) compared to matrix1 (2) and matrix2 (5). In row 2 column 1, matrix2 has the highest value (6). I would like to create a new matrix that copies the row of the matrix that has that highest value, resulting in:
[,1] [,2] [,3]
[1,] 8 8 7 <- From mat3
[2,] 6 1 3 <- From mat2
I know how to get a vector with the highest values from column 1, but I cannot get the whole row of the matrix copied into a new matrix. I have:
mat <- (mat1[1,])
which just copies the first row of the first matrix
[1] 2 3 7
I can select which number is the maximum number:
max(mat1[,1],mat2[,1],mat3[,1])
[1] 8
But I cannot seem to combine the two to return a matrix with the whole row.
Getting the code to loop for each row will be no problem, but I cannot seem to get it to work for the first row and as such, I am missing the essential code. Any help would be greatly appreciated. Thank you.

Are you working interactively? Do you manipulate multiple matrices spread in your workspace? A straightforward answer to your problem could be:
#which matrices have the largest element of column 1 in each row?
max.col(cbind(mat1[, 1], mat2[, 1], mat3[, 1]))
#[1] 3 2
rbind(mat3[1, ], mat2[2, ]) #use the above information to get your matrix
# [,1] [,2] [,3]
#[1,] 8 8 7
#[2,] 6 1 3
For a more general use case, one way could be:
mat_ls = list(mat1, mat2, mat3) #put your matrices in a "list"
which_col = 1 #compare column 1
which_mats = max.col(do.call(cbind, lapply(mat_ls, function(x) x[, which_col])))
do.call(rbind, lapply(seq_along(which_mats),
function(i) mat_ls[[which_mats[i]]][i, ]))
# [,1] [,2] [,3]
#[1,] 8 8 7
#[2,] 6 1 3
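
One thing to watch with max.col: its default is ties.method = "random", so when two matrices share the largest first-column value the chosen row can change between runs. A sketch of the same idea with deterministic tie-breaking (first_cols, winner and res are names made up for this example):

```r
mat1 <- matrix(c(2, 4, 3, 6, 7, 8), nrow = 2, ncol = 3)
mat2 <- matrix(c(5, 6, 7, 1, 2, 3), nrow = 2, ncol = 3)
mat3 <- matrix(c(8, 5, 8, 6, 7, 9), nrow = 2, ncol = 3)
mat_ls <- list(mat1, mat2, mat3)

# One column per matrix, holding that matrix's first column
first_cols <- sapply(mat_ls, function(m) m[, 1])

# Index of the winning matrix for each row; ties go to the earliest matrix
winner <- max.col(first_cols, ties.method = "first")

# Pull row i from the matrix that won row i
res <- t(sapply(seq_along(winner), function(i) mat_ls[[winner[i]]][i, ]))
```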

Probably not the prettiest solution
temp <- rbind(mat1, mat2, mat3)
rbind(temp[c(T,F),][which.max(temp[c(T,F),][, 1]),],
temp[c(F,T),][which.max(temp[c(F,T),][, 1]),])
## [,1] [,2] [,3]
## [1,] 8 8 7
## [2,] 6 1 3

You may also try:
a2 <- aperm(simplify2array(mget(ls(pattern = "^mat[0-9]+$"))), c(3,2,1)) # collects mat1, mat2, ...; the anchored pattern skips other objects such as `mat` or `mat_ls`
t(sapply(1:(dim(a2)[3]), function(i) {x1 <- a2[,,i]; x1[which.max(x1[,1]),]}))
# [,1] [,2] [,3]
#[1,] 8 8 7
#[2,] 6 1 3

Can I vectorise/vectorize this simple cohort retention model in R?

I am creating a simple cohort-based user retention model, based on the number of new users that appear each day, and the likelihood of a user reappearing on day 0 (100%), day 1, day 2, etc. I want to know the number of users active on each day. I am trying to vectorise this and getting in a right muddle. Here is a toy mockup.
rvec <- c(1, .8, .4); #retention for day 0, 1,2 (day 0 = 100%, and so forth)
newvec <- c(10, 10, 10); #new joiners for day 0, 1, 2 (might be different)
playernumbers <- matrix(0, nrow = 3, ncol = 3);
# I want to fill matrix playernumbers such that sum of each row gives
# the total playernumbers on day rownumber-1
# here is a brute force method (could be simplified via a loop or two)
# but what I am puzzled about is whether there is a way to fully vectorise it
playernumbers[1,1] <- rvec[1] * newvec[1];
playernumbers[2,1] <- rvec[2] * newvec[1];
playernumbers[3,1] <- rvec[3] * newvec[1];
playernumbers[2,2] <- rvec[1] * newvec[2];
playernumbers[3,2] <- rvec[2] * newvec[2];
playernumbers[3,3] <- rvec[1] * newvec[3];
playernumbers
I can't figure out how to vectorise this fully. I can see how I might do it column-wise, successively using each column number to indicate (a) which rows to update (column number:nrows), and (b) which newvec index value to multiply by. But I'm not sure this is worth doing, as to me the loop is clearer. Is there a fully vectorised form that I am missing? Thanks!

If you don't insist on your weird indexing logic, you could simply calculate the outer product:
outer(rvec, newvec)
# [,1] [,2] [,3]
#[1,] 10 10 10
#[2,] 8 8 8
#[3,] 4 4 4
In the outer product the product of the second element of vector 1 and the second element of vector 2 is placed at [2,2]. You place it at [3,2]. Why?
Your result:
playernumbers
# [,1] [,2] [,3]
#[1,] 10 0 0
#[2,] 8 10 0
#[3,] 4 8 10
Edit:
This should do the same as your loop:
rvec <- c(1, .8, .4)
newvec <- c(10, 20, 30)
tmp <- outer(rvec, newvec)
tmp <- tmp[, ncol(tmp):1]
tmp[lower.tri(tmp)] <- 0
tmp <- tmp[, ncol(tmp):1]
res <- tmp*0
res[lower.tri(res, diag=TRUE)] <- tmp[tmp!=0]
# [,1] [,2] [,3]
#[1,] 10 0 0
#[2,] 8 20 0
#[3,] 4 16 30
rowSums(res)
#[1] 10 28 50
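
A possibly more direct sketch of the same result, built from index arithmetic instead of flipping and masking. The names n, idx and ok are made up for this example; the idea is that the entry for day i and cohort j uses retention index i - j + 1, which is only valid on and below the diagonal.

```r
rvec <- c(1, .8, .4)
newvec <- c(10, 20, 30)
n <- length(newvec)

# idx[i, j] = i - j + 1: position in rvec for cohort j on day i
idx <- outer(seq_len(n), seq_len(n), "-") + 1
ok <- idx >= 1                      # cohort j has joined by day i

res <- matrix(0, n, n)
# Both right-hand vectors are subset by the same mask, so they line up
res[ok] <- rvec[idx[ok]] * newvec[col(res)[ok]]
totals <- rowSums(res)
```

Unlike the tmp != 0 step above, this does not rely on the products being non-zero, so a retention rate or cohort size of 0 should not shift values into the wrong cells.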

R apply on a matrix a function of columns and row index

I would like to apply on a matrix a function of both the value, the row index and the column index for every value in the matrix and get the transformed matrix.
For example
mat<-matrix(c(1,2,3,4),2,2)
mat
[,1] [,2]
[1,] 1 3
[2,] 2 4
f<-function(x,i,j){x+i+j}
mat2 <- my.apply(f,mat)
mat2
[,1] [,2]
[1,] 3 6
[2,] 5 8
The example above is for illustration purposes, f can be much more complex.
apply does not do the job, because of the way the extra arguments are handled.
apply(mat,1:2,f,seq_along(mat[,1]),seq_along(mat[1,]))
, , 1
[,1] [,2]
[1,] 3 4
[2,] 5 6
, , 2
[,1] [,2]
[1,] 5 6
[2,] 7 8
I cannot find a way with the lapply family either. A for loop can do the job, but it would be neither efficient nor elegant.
Any suggestions?
Thanks

Try mapply
mat <- matrix(c(1, 2, 3, 4), 2, 2)
mat
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
matrix(mapply(function(x, i, j) x + i + j, mat, row(mat), col(mat)), nrow = nrow(mat))
## [,1] [,2]
## [1,] 3 6
## [2,] 5 8
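
Two variations that may be worth trying, as a sketch. If f happens to be vectorised (as x + i + j is), it can be called once on the whole matrix together with row(mat) and col(mat). If it only works on scalars, mapply is still needed, but assigning into mat3[] keeps the dimensions without a matrix() call. g is a hypothetical scalar-only function invented for this example.

```r
mat <- matrix(c(1, 2, 3, 4), 2, 2)

# Vectorised f: a single call, the result keeps the matrix shape
f <- function(x, i, j) x + i + j
mat2 <- f(mat, row(mat), col(mat))

# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE
g <- function(x, i, j) if (i == j) x else x + i + j

mat3 <- mat
mat3[] <- mapply(g, mat, row(mat), col(mat))  # fills in place, dims preserved
```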

Here is an ugly use of apply, just for some quick and dirty job. The trick is adding an additional column (or row) for row (or column) indices.
mat <- matrix(c(1, 2, 3, 4), 2, 2)
t(apply(cbind(mat, 1:nrow(mat)), 1, function(x){x[1:ncol(mat)] + 1:ncol(mat) + x[ncol(mat)+1]}))
## [,1] [,2]
##[1,] 3 6
##[2,] 5 8
If you already have a function f(x, i, j), you can also try the following (the outer t() is needed because apply returns one column per row of its input):
t(apply(cbind(mat, 1:nrow(mat)), 1, function(x){a = numeric(); for(j in 1:ncol(mat)){a[j] = f(x[j], x[ncol(mat)+1], j)}; a}))
