Finding which matrix row has the highest column sum - r

If I have multiple matrices like
mymat <- matrix(c(1,2,3,8,6,1,1,3,1) , nrow=3, ncol=3)
matrix_list <- replicate(2, mymat)
column_sums <- colSums(matrix_list)
This shows me that row two has the highest sum, but how would I write code that tells me that row two has the highest sum in both matrices? The result should just be the vector 2, 2. I tried this:
for(i in 1:2) {
S <- max(matrix_list[i, ])
}
However this is not doing the job. Any help would be appreciated.

Since column_sums has one column per matrix, apply which.max to each column:
apply(column_sums, 2, which.max)
#[1] 2 2

We can also use max.col, which returns the column index of the maximum in each row (hence the transpose):
max.col(t(column_sums), "first")
#[1] 2 2
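
To see why both one-liners work, it can help to look at the shapes involved. The sketch below is only an illustration: it assumes the same mymat as in the question, and the list-based variant at the end (mats, res_list) is a hypothetical alternative for when the matrices are stored in a list rather than a 3-D array.

```r
mymat <- matrix(c(1, 2, 3, 8, 6, 1, 1, 3, 1), nrow = 3, ncol = 3)

# replicate() stacks the copies into a 3 x 3 x 2 array
matrix_list <- replicate(2, mymat)

# colSums() on that array sums over the first dimension,
# giving a 3 x 2 matrix: one column of column sums per matrix
column_sums <- colSums(matrix_list)

# Position of the largest column sum within each matrix
res_apply  <- apply(column_sums, 2, which.max)
res_maxcol <- max.col(t(column_sums), ties.method = "first")

# Hypothetical variant: the matrices live in a plain list instead
mats <- list(mymat, mymat)
res_list <- vapply(mats, function(m) which.max(colSums(m)), integer(1))
```

Note that the 2 in the result is the index of the column of mymat with the largest sum, which shows up as row two of column_sums.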

Related

Values of matrix based on their column and row number in R

I have a 30 x 30 matrix in R and I want each value to be the product of its row and column number. For example, the first value is 1 * 1 = 1.
mat2 <- matrix(nrow= 30, ncol = 30)
We can use row() and col() to get the row and column index of every cell, then multiply the two:
mat2 <- row(mat2) *col(mat2)
If you know the size of the matrix, you can also use outer() and construct the matrix directly in one step.
mat2 <- outer(seq(30), seq(30))
# other simple variations:
outer(1:30, 1:30)
seq(30) %o% seq(30)
1:30 %o% 1:30
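
As a quick check that the two approaches agree, here is a small sketch on a 4 x 4 matrix instead of 30 x 30 (the size n and the names by_index and by_outer are chosen only for illustration).

```r
n <- 4
m <- matrix(nrow = n, ncol = n)   # contents are irrelevant, only the shape matters

by_index <- row(m) * col(m)       # cell [i, j] becomes i * j
by_outer <- outer(seq_len(n), seq_len(n))

by_index
```

row() and col() need an existing matrix to read the shape from, while outer() builds the result directly from the two index vectors.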

Create a vector with the sum of the positive elements of each column of a m*n numeric matrix in R

I need to create an R function which takes a numeric matrix A of arbitrary format n*m as input and returns a vector that is as long as A's number of columns that contains the sum of the positive elements of each column.
I must do this in 2 ways - the first in a nested loop and the second as a one liner using vector/matrix operations.
So far I have come up with the following code which creates the vector the size of the amounts of columns of matrix A but I can only seem to get it to give me the sum of all positive elements of the matrix instead of each column:
colSumPos(A){
columns <- ncol(A)
v1 <- vector("numeric", columns)
for(i in 1:columns)
{
v1[i] <- sum(A[which(A>0)])
}
}
Could someone please explain how I get the sum of each column separately and then how I can simplify the code to dispose of the nested loop?
Thanks in advance for the help!
We can use apply with MARGIN=2 to loop through the columns and get the sum of elements that are greater than 0
apply(A, 2, function(x) sum(x[x >0], na.rm = TRUE))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or another option is colSums after replacing the values less than or equal to 0 with NA
colSums(A*NA^(A<=0), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or, more directly:
colSums(replace(A, A<=0, NA), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or, if there are no NA elements (so na.rm = TRUE is not needed), we can multiply by the logical matrix A > 0, which turns every non-positive value into 0, and keep it compact (as @ikop commented)
colSums(A*(A>0))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
data
set.seed(24)
A <- matrix(rnorm(25), 5, 5)
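
The options above can be compared on a matrix small enough to check by hand. This is a sketch only; the matrix B and the variable names are assumptions made for illustration and are not part of the original data.

```r
# Small hand-checkable matrix (an assumption for illustration, not the data above)
B <- matrix(c( 1, -2,  3,
              -1, -1, -1,
               0,  5,  2), nrow = 3)   # filled column by column

via_apply   <- apply(B, 2, function(x) sum(x[x > 0], na.rm = TRUE))
via_replace <- colSums(replace(B, B <= 0, NA), na.rm = TRUE)
via_mask    <- colSums(B * (B > 0))

# With an NA present, the masking version returns NA for that column
B_na <- B
B_na[1, 1] <- NA
mask_with_na <- colSums(B_na * (B_na > 0))
```

The columns are (1, -2, 3), (-1, -1, -1) and (0, 5, 2), so the positive sums are 4, 0 and 7. The last two lines show why the compact version is only safe when the matrix has no NA values.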
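If you need the nested for loop version, try the following code:
sumColum <- function(A){
  out <- numeric(ncol(A))            # one running total per column
  for(j in 1:ncol(A)){
    for(i in 1:nrow(A)){
      # add the element only if it is present and positive
      if(!is.na(A[i, j]) && A[i, j] > 0) out[j] <- out[j] + A[i, j]
    }
  }
  out
}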

Generate a random matrix in R with m columns and n rows where rows sum to 1

I want to generate an n x m matrix. Suppose it's 100x3. I want each row to sum to 1 (so two "0"s and one "1").
sample(c(0,0,1),3)
will give me 1 row but is there a very fast way to generate the whole matrix without an rbind?
Thank you!
No loops, no transposition. Just create a matrix of zeros and replace one entry per row with 1 by sampling the rows.
m <- matrix(0, 100, 3)
nr <- nrow(m)
m[cbind(1:nr, sample(ncol(m), nr, TRUE))] <- 1
all(rowSums(m) == 1)
# [1] TRUE
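Another option: draw random numbers and mark the largest value in each row.
mat <- matrix(runif(300),ncol=3)
mat[] <- as.numeric(t(apply(mat, 1, function(r) r == max(r))))
Or shuffle c(0,0,1) independently for each of the 100 rows:
t(apply(t(matrix(rep(c(0,0,1),100),nrow = 3)), 1, function(x) sample(x)))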
Since you want a single 1 in each row, the problem can be restated as randomly choosing, for each row, the column that holds the 1.
So you can do something like:
m <- 3; n<-100
rand_v <- floor(runif(n)*m)+1
mat <- matrix(0,n,m)
idx <- cbind(1:n,rand_v)
mat[idx] <- 1
Hope this helps.
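
The matrix-indexing idea from the answers above can be wrapped in a small function. This is a minimal sketch: one_hot_rows is a hypothetical helper name, the seed is arbitrary, and the identity-matrix variant at the end is a different technique added for comparison, not something from the answers.

```r
# Hypothetical helper: n rows, m columns, exactly one 1 per row
one_hot_rows <- function(n, m) {
  out <- matrix(0, n, m)
  hot <- sample.int(m, n, replace = TRUE)   # column of the 1 for each row
  out[cbind(seq_len(n), hot)] <- 1          # one (row, col) index pair per row
  out
}

set.seed(42)                                # arbitrary seed, only for reproducibility
x <- one_hot_rows(100, 3)

# Swapped-in alternative: pick random rows of an identity matrix
y <- diag(3)[sample.int(3, 100, replace = TRUE), ]
```

Both x and y are 100 x 3 matrices of zeros and ones whose rows each sum to 1.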

Divide each cell of large matrix by sum of its row

I have a site by species matrix. The dimensions are 375 x 360. Each value represents the frequency of a species in samples of that site.
I am trying to convert this matrix from frequencies to relative abundances at each site.
I've tried a few ways to achieve this and the only one that has worked is using a for loop. However, this takes an incredibly long time or simply never finishes.
Is there a function or a vectorised method of achieving this? I've included my for-loop as an example of what I am trying to do.
relative_abundance <- matrix(0, nrow= nrow(data_wide),
ncol=ncol(data), dimnames = dimnames(data))
i=0
j=0
for(i in 1:nrow(relative_abundance)){
for(j in 1:ncol(relative_abundance)){
species_freq <- data[i,j]
row_sum <- sum(data[i,])
relative_abundance[i,j] <- species_freq/row_sum
}
}
You could do this using apply, but scale makes things even simpler in this case. Assuming you want to divide columns by their sums:
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
freqs <- scale(relative_abundance, center = FALSE,
scale = colSums(relative_abundance))
The matrix is too big to output here, but here's what it should look like:
> head(freqs[, 1:5])
[,1] [,2] [,3] [,4] [,5]
[1,] 0.004409603 0.0014231499 0.003439803 0.004052685 0.0024026910
[2,] 0.001469868 0.0023719165 0.002457002 0.005065856 0.0004805382
[3,] 0.001959824 0.0018975332 0.004914005 0.001519757 0.0043248438
[4,] 0.002939735 0.0042694497 0.002948403 0.002532928 0.0009610764
[5,] 0.004899559 0.0009487666 0.000982801 0.001519757 0.0028832292
[6,] 0.001469868 0.0023719165 0.002457002 0.002026342 0.0009610764
And a sanity check:
> head(colSums(freqs))
[1] 1 1 1 1 1 1
Using apply:
freqs2 <- apply(relative_abundance, 2, function(i) i/sum(i))
This has the advantage of being easily changed to run by rows, but the results will be joined as columns anyway, so you'd have to transpose the result.
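
To make the transposition point concrete, here is a sketch on a toy 2 x 3 matrix (X and the variable names are assumed for illustration only).

```r
X <- matrix(c(1, 3, 2, 2, 1, 5), nrow = 2)   # rows are (1, 2, 1) and (3, 2, 5)

# apply() over rows returns each row's result as a COLUMN, so the output is 3 x 2
by_rows_raw <- apply(X, 1, function(i) i / sum(i))

# Transposing restores the original 2 x 3 orientation
by_rows <- t(by_rows_raw)
```

After the transpose, each row of by_rows sums to 1, which is the row-wise relative abundance the question asks for.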
Firstly, you could just do
relative_abundance[i,j] <- data[i,j]/sum(data[i,])
so you don't create the intermediate variables.
But to vectorise it, I suggest computing the row sums once with the rowSums function (fast), then using apply over the columns and dividing each column by the row sums:
rs <- rowSums(data)
relative_freq <- apply(data, 2, function(x) x / rs)
Using some simple linear algebra we can produce faster results. Simply multiply on the left by a diagonal matrix with the scaling factors you need, like this:
library(Matrix)
set.seed(0)
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
Diagonal_Matrix <- diag(1/rowSums(relative_abundance))
And then we multiply from the left:
row_normalized_matrix <- Diagonal_Matrix %*% relative_abundance
If you want to normalize columnwise simply make:
Diagonal_Matrix <- diag(1/colSums(relative_abundance))
and multiply from the right.
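
A small sketch of the left and right multiplication, on a toy matrix assumed only for illustration, with a cross-check against plain division:

```r
X <- matrix(c(1, 3, 2, 2, 1, 5), nrow = 2)   # toy 2 x 3 matrix

# Left-multiplying by diag(1 / rowSums) rescales each ROW
row_norm <- diag(1 / rowSums(X)) %*% X

# Right-multiplying by diag(1 / colSums) rescales each COLUMN
col_norm <- X %*% diag(1 / colSums(X))

# Same row-normalized result via element-wise division
same_as_division <- isTRUE(all.equal(row_norm, X / rowSums(X)))
```

Note that base diag() builds a dense square matrix, so for a large matrix the element-wise division does far less arithmetic than the matrix product.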
You can do something like this
relative_abundance <- matrix(sample(1:10, 360*375, TRUE), nrow= 375)
datnorm <- relative_abundance/rowSums(relative_abundance)
This will be faster if relative_abundance is a matrix rather than a data.frame.
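
The division works because R stores matrices column by column and recycles the vector of row sums down each column. The sketch below illustrates this on a toy matrix and adds two equivalent base R spellings; the names X, df and the species columns are hypothetical.

```r
X <- matrix(c(1, 3, 2, 2, 1, 5), nrow = 2)

# rowSums(X) has one entry per row; R recycles it down each column,
# so every cell is divided by the sum of its own row
by_recycling <- X / rowSums(X)

# Equivalent, more explicit spellings from base R
by_sweep <- sweep(X, 1, rowSums(X), "/")
by_prop  <- prop.table(X, margin = 1)

# A data.frame can be converted first (hypothetical site-by-species data)
df <- data.frame(sp1 = c(1, 3), sp2 = c(2, 2), sp3 = c(1, 5))
from_df <- as.matrix(df) / rowSums(df)
```

The same trick does not work for columns: X / colSums(X) would recycle along the wrong direction, so use sweep(X, 2, colSums(X), "/") or prop.table(X, margin = 2) there.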

Efficiently Extract Minimum Value AND Index for Each Column and Row in a Dataframe, then Rank by Value

I have a JxK dataframe M and I want to calculate the following.
1. For each row j, the value k that minimizes M[j,k]
2. For each column k, the value j that minimizes M[j,k]
Then, let the values satisfying the first be vector A_j and the second be vector A_k. Then, I need two vectors. Let vector C be the vector sort(c(A_j, A_k)).
3. A vector of length equal to A_j where element i is the index of element A_j[i] in the combined and sorted vector C.
4. A vector of length equal to A_k where element i is the index of element A_k[i] in the combined and sorted vector C.
For both of the two sorted vectors mentioned above, all ties should be given the first index at which that value appeared in vector C. That is, if A_j[i] and A_j[i+1] are equal, then element i and element i + 1 in the vector that satisfies condition #3 should both equal A_j[i]'s position in the sorted vector C.
As always, this is not hard to do inefficiently. However, in practice, the dataframe is very big, so inefficient solutions fail.
As a proof of concept, one solution would be as follows.
# Create the dataframe
set.seed(1)
df <- data.frame(matrix(rnorm(50, 8, 2), 10)) # A 10x5 matrix
# Calculate 1 and 2
A.j <- apply(df, 1, min)
A.k <- apply(df, 2, min)
# Calculate 3 and 4
C <- sort(unname(c(A.j, A.k)))
A.j.indices <- apply(df, 1, function(x) which(x == min(x)))
A.k.indices <- apply(df, 2, function(x) which(x == min(x)))
vec3out <- c()
vec4out <- c()
for(j in 1:nrow(df)){
rank <- which(C == A.j[j])[1]
vec3out <- c(vec3out, rank)
}
for(k in 1:ncol(df)){
rank <- which(C == A.k[k])[1]
vec4out <- c(vec4out, rank)
}
For starters, you should use a matrix. Data frames are less efficient (see "Should I use a data.frame or a matrix?"). Then, we should use the apply functions.
Let M be your data.frame coerced to a matrix.
M <- as.matrix(M)
minByRow <- apply(M, MARGIN=1, FUN=which.min)
minByCol <- apply(M, MARGIN=2, FUN=which.min)
combinedSorted <- sort(c(minByRow, minByCol))
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)
Here are the results for 1 million observations of 100 variables:
M <- matrix(data=rnorm(100000000), nrow=1000000, ncol=100)
system.time({
minByRow <- apply(M, MARGIN=1, FUN=which.min)
minByCol <- apply(M, MARGIN=2, FUN=which.min)
combinedSorted <- sort(c(minByRow, minByCol))
byRowOutput <- match(minByRow, combinedSorted)
byColOutput <- match(minByCol, combinedSorted)
})
user system elapsed
7.37 0.46 7.93
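
Note that the answer above ranks the positions returned by which.min, while the proof of concept in the question ranks the minimum values themselves. If the values are what you need, the same match() idea still applies. The following is a sketch under that assumption, reusing the question's seed and data as a matrix; loop3 exists only to cross-check against the question's loop.

```r
set.seed(1)
M <- matrix(rnorm(50, 8, 2), 10)        # same 10 x 5 data as the question, kept as a matrix

A.j <- apply(M, 1, min)                 # row minima (values)
A.k <- apply(M, 2, min)                 # column minima (values)
C   <- sort(c(A.j, A.k))

# match() returns the FIRST position of each value in C,
# which is the tie rule described in the question
vec3out <- match(A.j, C)
vec4out <- match(A.k, C)

# Loop from the question, kept only to cross-check the vectorized result
loop3 <- vapply(A.j, function(v) which(C == v)[1], integer(1))
```

Because the overall minimum of M is both a row minimum and a column minimum, it appears twice in C, and match() gives both occurrences the rank 1.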
