find the best combination of rows in a matrix - r

I need help writing a function in R that, given a matrix of 0s and 1s, selects as many rows as possible such that each selected row contains a single 1, no column receives more than one 1 from the selected rows, and as many columns as possible are covered.
For example, given
m <- matrix(c(1,1,0,0,0,0,0,1,0,1,1,0,0,1,1), ncol = 3, nrow = 5)
print(m)
the result would be rows 2, 3 and 4.

Here is a brute-force approach (assuming you always have fewer columns than rows)
Filter(
  length,
  combn(1:nrow(m),
    ncol(m),
    function(k) {
      # Keep the combination only if every row and every column has exactly one 1
      mat <- m[k, , drop = FALSE]
      if (all(rowSums(mat) == 1) && all(colSums(mat) == 1)) {
        return(k)
      }
    },
    simplify = FALSE
  )
)
which gives
[[1]]
[1] 2 3 4
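The brute-force search grows quickly with the number of rows. As a rough sketch of an alternative (not the answerer's method), note that any valid selection can only contain rows with exactly one 1, and at most one such row per column. The helper name `pick_rows` is hypothetical, and unlike the code above it returns a single selection rather than all of them.

```r
# Example matrix from the question (5 rows, 3 columns)
m <- matrix(c(1,1,0,0,0, 0,0,1,0,1, 1,0,0,1,1), nrow = 5, ncol = 3)

# Hypothetical helper: one single-1 row per column, without enumerating combinations
pick_rows <- function(m) {
  # Rows containing exactly one 1
  single <- which(rowSums(m) == 1)
  # Column in which each of those rows has its 1
  cols <- max.col(m[single, , drop = FALSE], ties.method = "first")
  # Keep the first such row for each column
  single[!duplicated(cols)]
}

pick_rows(m)
#> [1] 2 3 4
```

If some column has no single-1 row, this sketch simply leaves that column uncovered instead of returning nothing.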

Related

How to find the number of columns of an R matrix that contains values greater than or equal to a given number?

I have a matrix with 52 columns, and 5,000 rows. I want to find the number of columns that contain a value less than or equal to a value (for example, how many columns out of 52 contain a number less than or equal to 10)
I was trying rowSum but I cannot remember / find a way to make this work.
Thanks!
A possible solution:
m <- matrix(1:9, 3, 3)
sum(colSums(m <= 5) != 0)
#> [1] 2
How about writing your own function?
Here's the code.
count_cols <- function(df, val) {
  checks <- 0
  for (i in seq_len(ncol(df))) {
    # Count the column if any of its values is less than or equal to val
    if (any(df[, i] <= val)) {
      checks <- checks + 1
    }
  }
  return(checks)
}
A <- matrix(runif(100), 10, 10)
count_cols(A, 0.5)
Say the matrix mat has dimensions 5000 x 52:
set.seed(1234)
mat <- matrix(trunc(runif(5000*52)*1e5) , 5000 , 52)
dim(mat)
#> [1] 5000 52
Then we can find how many of the 52 columns contain a number less than or equal to 10 using
sum(apply(mat , 2 , \(x) any(x <= 10)))
#> 24
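As a small sanity check, the `colSums` and `apply` approaches from the answers above can be compared on a matrix that is easy to inspect by hand. The matrix values and the name `threshold` are made up for this sketch.

```r
# Small illustrative matrix (values are made up for this sketch)
mat <- matrix(c(12,  3, 40,
                25, 30, 11,
                 7, 50, 60), nrow = 3, byrow = TRUE)
threshold <- 10

# Count columns with at least one value <= threshold, in two equivalent ways
n_colsums <- sum(colSums(mat <= threshold) > 0)
n_apply   <- sum(apply(mat, 2, function(x) any(x <= threshold)))

c(n_colsums, n_apply)
#> [1] 2 2
```

Columns 1 and 2 contain 7 and 3 respectively, while the smallest value in column 3 is 11, so both approaches report 2.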

Randomly remove some numeric data from a matrix in R?

I have a large data matrix with many numeric values (counts) in it. I would like to remove 10% of all counts. So, for example, a matrix which looks like this:
30 10
0 20
The sum of all counts here is 60. 10% of 60 is 6. So I want to randomly remove 6. A correct output could be:
29 6
0 19
(As you can see it removed 1 from 30, 4 from 10 and 1 from 20). There cannot be negative values.
How could I program this in R?
Here is a way. It repeatedly subtracts 1 from a randomly chosen positive matrix element until the total to remove is reached.
subtract_int <- function(X, n){
  # Row/column positions of the elements that can still be reduced
  inx <- which(X != 0, arr.ind = TRUE)
  N <- nrow(inx)
  while(n > 0){
    i <- sample(N, 1)
    if(X[ inx[i, , drop = FALSE] ] > 0){
      X[ inx[i, , drop = FALSE] ] <- X[ inx[i, , drop = FALSE] ] - 1
      n <- n - 1
    }
    # Refresh the positions once an element has dropped to zero
    if(any(X[inx] == 0)){
      inx <- which(X != 0, arr.ind = TRUE)
      N <- nrow(inx)
    }
  }
  X
}
set.seed(2021)
to_remove <- round(sum(A)*0.10)
subtract_int(A, to_remove)
# [,1] [,2]
#[1,] 30 6
#[2,] 0 18
Data
A <- structure(c(30, 0, 10, 20), .Dim = c(2L, 2L))
Maybe this helps you at least to get on the right track. It's nothing more than a draft though:
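randomlyRemove <- function(matrix, fraction = 0.1) {
  # Amount still to remove: a fraction (10% by default) of the total count
  to_remove <- round(sum(matrix) * fraction)
  while (to_remove > 0) {
    # Pick a random cell (row x, column y)
    x <- round(runif(1, 1, dim(matrix)[1]), digits = 0)
    y <- round(runif(1, 1, dim(matrix)[2]), digits = 0)
    # Remove a random whole amount, capped so the cell never goes negative
    amount <- min(ceiling(runif(1, min = 0, max = to_remove)), matrix[x, y])
    matrix[x, y] <- matrix[x, y] - amount
    to_remove <- to_remove - amount
  }
  return(matrix)
}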
You might want to play with the random number generation to get more evenly distributed subtractions.
edit: added round(digits = 0) to get only integer (dimension) values and modified the random (dimension) value generation to start from 1 (not zero).
I think we can make it work using sample. This solution is a lot more compact.
The data
A <- structure(c(30, 0, 11, 20), .Dim = c(2L, 2L))
sum(A)
#> [1] 61
The logic
UseThese <- which(A > 0) # Indices of the elements that can be reduced (> 0)
Sample <- sample(UseThese, sum(A)*0.1, replace = TRUE) # Draw a sample of indices
A[UseThese] <- A[UseThese] - as.vector(table(factor(Sample, levels = UseThese))) # Subtract per-index counts; indices never drawn count as 0
Check the result
A
#> [,1] [,2]
#> [1,] 28 8
#> [2,] 0 19
sum(A) # should be the value above minus 6
#> [1] 55
One disadvantage of this solution is that it could lead to negative values. So check with:
any(A < 0)
#> [1] FALSE
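One way to rule out negative values altogether, sketched here under the assumption that the counts are non-negative whole numbers, is to treat every single count as its own unit and draw the units to remove without replacement. The names `units`, `drawn` and `B` are only illustrative, and the expanded vector has `sum(A)` entries, so this may be too memory-hungry for very large totals.

```r
set.seed(1)  # only to make the sketch reproducible
A <- matrix(c(30, 0, 11, 20), nrow = 2)

# Number of counts to remove: 10% of the total, rounded down (6 of 61 here)
n_remove <- floor(sum(A) * 0.10)
# One entry per individual count, labelled with the index of its cell
units <- rep(seq_along(A), times = as.vector(A))
# Draw the counts to remove without replacement
drawn <- units[sample.int(length(units), n_remove)]
# Subtract how often each cell was drawn; a cell can never lose more than it holds
B <- A - tabulate(drawn, nbins = length(A))

sum(B)
#> [1] 55
any(B < 0)
#> [1] FALSE
```

Because each unit can be drawn at most once, larger cells are proportionally more likely to lose counts, and no check for negative values is needed afterwards.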

For loop with condition in R

I'm new to programming in R.
I have a list which contains numbers between 0 and 5. I want to count how many times a 1 appears immediately before a 5. My list is stored in results2. I have done this:
counting<-function(lista,n,m){
p=2
for (p in data_list){
if(results2[p]==n && results2[p-1]==m){
length(p)
}
p<-p+1
}
}
counting(results2,5,1)
Can anyone please provide me with some helpful advice to improve my code, since it does not work?
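We loop over the list with sapply. For each element we find the positions of every 5 (which), drop a 5 in the first position because nothing precedes it, and then count how many of the values just before those positions equal 1, by building a logical vector with == and calling sum on it
sapply(data_list, function(x) {
  i1 <- which(x == 5)   # positions of every 5
  i2 <- i1[i1 > 1]      # a 5 in position 1 has no predecessor
  if(length(i2) > 0) {
    sum(x[i2-1] == 1)   # how many of the preceding values are 1
  } else NA_real_
})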
#[1] 3 3
Or in tidyverse, we can make use of lag
library(dplyr)
library(purrr)
map_dbl(data_list, ~ sum(.x == 5 & lag(.x) == 1, na.rm = TRUE))
#[1] 3 3
data
data_list <- list(c(3,4,1,5 ,2,3,1,5,4,1,5),
c(3,4,1,5 ,2,3,1,5,4,1,5))
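The same count can also be sketched in base R by comparing each value with its successor via `head` and `tail`. The helper `count_before` and its argument names are hypothetical; unlike the `sapply` version above it returns 0 rather than NA when a vector contains no usable 5.

```r
data_list <- list(c(3,4,1,5,2,3,1,5,4,1,5),
                  c(3,4,1,5,2,3,1,5,4,1,5))

# Hypothetical helper: how often `first` is immediately followed by `second`
count_before <- function(x, first = 1, second = 5) {
  # head(x, -1) drops the last value, tail(x, -1) drops the first,
  # so position i pairs x[i] with x[i + 1]
  sum(head(x, -1) == first & tail(x, -1) == second)
}

counts <- vapply(data_list, count_before, integer(1))
counts
#> [1] 3 3
```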

Generate a random matrix in R with m columns and n rows where rows sum to 1

I want to generate an nxm matrix. Suppose it's 100x3. I want each row to sum to 1 (so two "0"s and one "1").
sample(c(0,0,1),3)
will give me 1 row but is there a very fast way to generate the whole matrix without an rbind?
Thank you!
No loops, no transposition. Just create a matrix of zeros and replace one entry per row with 1 by sampling a column index for each row.
m <- matrix(0, 100, 3)
nr <- nrow(m)
m[cbind(1:nr, sample(ncol(m), nr, TRUE))] <- 1
all(rowSums(m) == 1)
# [1] TRUE
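# Alternative: draw random numbers and mark each row's maximum with a 1
mat <- matrix(runif(300),ncol=3)
mat[] <- as.numeric(t(apply(mat, 1, function(r) r == max(r))))
# Alternative: build 100 rows of c(0, 0, 1) and shuffle each row independently
t(apply(t(matrix(rep(c(0,0,1),100),nrow = 3)), 1, function(x) sample(x)))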
Since you want a single 1 in each row, the problem can be restated as randomly choosing, for each row, the column that holds the 1.
So you can do something like this:
m <- 3; n<-100
rand_v <- floor(runif(n)*m)+1   # random column index in 1..m for each row
mat <- matrix(0,n,m)
idx <- cbind(1:n,rand_v)        # (row, column) pairs to set to 1
mat[idx] <- 1
Hope this helps.
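Another compact variant, sketched here with the same `n` and `m` as above, indexes the rows of an identity matrix. This is just one more way to express the idea of picking one column per row.

```r
n <- 100; m <- 3
# Row k of diag(m) has its single 1 in column k, so sampling row indices
# with replacement yields n rows that each contain exactly one 1
mat <- diag(m)[sample.int(m, n, replace = TRUE), , drop = FALSE]

dim(mat)
#> [1] 100   3
all(rowSums(mat) == 1)
#> [1] TRUE
```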

How to pairwise compare values referring to distinct elements in two matrices of different formats?

I've got a set of objects, let's say with the IDs 'A' to 'J'. And I've got two data frames which look the following way (as you can see, the second data frame is symmetric):
df1 <- data.frame(ID = LETTERS[1:5], Var = c(9,13,15,11,28))
df2 <- as.data.frame(matrix(data = c(NA,42,83,74,84,42,NA,26,69,9,83,26,NA,67,95,74,69,67,NA,6,84,9,95,6,NA), ncol = 5, nrow = 5, dimnames = list(df1$ID, df1$ID)))
For example, take the objects 'B' and 'E'. I want to know: Is 13+28 (from df1) less than 9 (from df2)? I'd like to know this for all pairs of objects. The output should be
(a) a logical data frame structured like df2 and
(b) the number of "TRUE" values.
Most of the time I will only need result (b), but sometimes I would also need (a). So if (b) can be calculated without (a) and if this would be significantly faster, then I'd like to have both algorithms in order to select the suitable one dependent on which output I need to answer a particular question.
I'm comparing around 2000 objects, so the algorithm should be reasonably fast. So far I've been only able to implement this with two nested for-loops which is awfully slow. I bet there is a much nicer way to do this, maybe exploiting vectorisation.
This is what it currently looks like:
df3 <- as.data.frame(matrix(data = NA, ncol = nrow(df1), nrow = nrow(df1),
                            dimnames = list(df1$ID, df1$ID)))
for (i in 2:nrow(df3)){
  for (j in 1:(i-1)){
    sum.val <- df1[df1$ID == rownames(df3)[i], "Var"] + df1[df1$ID == names(df3)[j], "Var"]
    df3[i,j] <- sum.val <= df2[i,j]
  }
}
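Is this what you want?
df3 <- outer(df1$Var, df1$Var, "+")  # pairwise sums of Var
df3
df4 <- df3 < df2  # logical comparison; the diagonal is NA
df4
sum(df4, na.rm = TRUE)  # counts every pair twice because df4 is symmetric; halve it to count each pair once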
Here's one way to do it...
# Get row and column indices
ind <- t( combn( df1$ID , 2 ) )
# Get totals
tot <- with( df1 , Var[ match( ind[,1] , ID ) ] + Var[ match( ind[,2] , ID ) ] )
# Make df2 a matrix
m <- as.matrix( df2 )
# Number of TRUE values
sum( m[ ind ] > tot )
#[1] 7
# Mark which values in the upper triangle of the matrix exceed the totals from df1 (1 = TRUE)
# Indexing by ind keeps each result with its own pair; upper.tri() would fill in column order instead
m[ ind ] <- m[ ind ] > tot
# A B C D E
#A NA 1 1 1 1
#B 42 NA 0 1 0
#C 83 26 NA 1 1
#D 74 69 67 NA 0
#E 84 9 95 6 NA
This will do what you want.
# Generate the data
df1 <- data.frame(ID = LETTERS[1:5], Var = c(9,13,15,11,28))
df2 <- as.data.frame(matrix(data = c(NA,42,83,74,84,42,NA,26,
                                     69,9,83,26,NA,67,95,74,69,
                                     67,NA,6,84,9,95,6,NA),
                            ncol = 5, nrow = 5,
                            dimnames = list(df1$ID, df1$ID)))
# Define a pairwise comparison index matrix using 'combn'
idx <- combn(nrow(df1), 2)
# Create a results matrix
res <- matrix(NA, ncol = ncol(df2), nrow = nrow(df2))
# Loop through 'idx' for each possible comparison (without repeats)
for(i in 1:ncol(idx)){
  logiTest <- (df1$Var[idx[1,i]] + df1$Var[idx[2,i]]) < df2[idx[1,i], idx[2,i]]
  res[idx[1,i], idx[2, i]] <- logiTest
  res[idx[2,i], idx[1, i]] <- logiTest
}
# Count the number of 'true' comparisons
nTrues <- sum(res, na.rm = TRUE)/2
The code simply uses a pairwise comparison index (idx) to define which elements in both df1 and df2 are to be used in each iteration of the 'for loop'. It then uses this same index to define where in the 'res' matrix the answer to the logical test is to be written.
N.B. This code will break down if the order of elements in df1 and df2 is not the same. In such cases, it would be appropriate to use the actual letters to define which values to compare.
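To make the comparison independent of the row order in df1, the values can be looked up by ID before a vectorised comparison. The following is only a sketch: the names `m2`, `v`, `cmp` and `n_true` are illustrative, and it assumes that the columns of df2 are in the same order as its rows, as in the symmetric example.

```r
df1 <- data.frame(ID = LETTERS[1:5], Var = c(9, 13, 15, 11, 28))
df2 <- as.data.frame(matrix(c(NA,42,83,74,84, 42,NA,26,69,9, 83,26,NA,67,95,
                              74,69,67,NA,6, 84,9,95,6,NA),
                            nrow = 5, dimnames = list(df1$ID, df1$ID)))

m2 <- as.matrix(df2)
# Look up Var by ID in the row order of df2, so df1 may be ordered differently
v <- df1$Var[match(rownames(m2), df1$ID)]
# (a) logical matrix structured like df2; the diagonal stays NA
cmp <- outer(v, v, "+") < m2
dimnames(cmp) <- dimnames(m2)
# (b) number of TRUE values, counting each pair once via the upper triangle
n_true <- sum(cmp[upper.tri(cmp)], na.rm = TRUE)
n_true
#> [1] 7
```

Both outputs come from a single call to `outer`, so no explicit loop over the pairs is needed.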
