Conditionally deleting rows in a matrix in R

I have a matrix with 3 columns. The first column contains either 1 or 0 in each row. I want to delete all the rows in the matrix where the first column is equal to zero (that is, keep the rows containing ones).
Thanks.

So, say that you have this matrix:
A <- matrix(c(1, 2, 3, 0, 3, 5, 1, 3, 8), nrow = 3, ncol = 3, byrow = TRUE)
The following command gives a logical vector with one TRUE/FALSE per row, depending on whether the first column equals 1:
A[,1]==1
You can then select only those rows like this:
FILTERED <- A[A[, 1] == 1, ]
FILTERED then contains only the rows whose first column is 1.
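One caveat worth checking: when exactly one row matches, R's default drop = TRUE collapses the result to a plain vector. The sketch below uses a hypothetical matrix B (not from the question) that has a single matching row, to show the difference; passing drop = FALSE keeps a one-row matrix.

```r
# Hypothetical matrix B: only its first row has a 1 in column 1
B <- matrix(c(1, 2, 3,
              0, 3, 5,
              0, 3, 8), nrow = 3, byrow = TRUE)

# Default indexing drops the single matching row to a plain vector
v <- B[B[, 1] == 1, ]
is.matrix(v)  # FALSE

# drop = FALSE keeps the result as a 1 x 3 matrix
kept <- B[B[, 1] == 1, , drop = FALSE]
dim(kept)     # 1 3
```

This matters mostly when the filtered result is passed on to code that expects a matrix (for example nrow() or further [ , ] indexing).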

Try this:
#dummy matrix
x <- matrix(rep(c(1,0,1),4),ncol=3)
x
#      [,1] [,2] [,3]
# [1,]    1    0    1
# [2,]    0    1    1
# [3,]    1    1    0
# [4,]    1    0    1
# keep rows where the first column equals 1
x[x[,1] == 1,]
#      [,1] [,2] [,3]
# [1,]    1    0    1
# [2,]    1    1    0
# [3,]    1    0    1

Related

How to use which() on a matrix to get unique indices

Suppose I have a symmetric matrix:
> mat <- matrix(c(1,0,1,0,0,0,1,0,1,1,0,0,0,0,0,0), ncol=4, nrow=4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    0    0    1    0
[3,]    1    1    0    0
[4,]    0    0    0    0
which I would like to analyse:
> which(mat==1, arr.ind=T)
     row col
[1,]   1   1
[2,]   3   1
[3,]   3   2
[4,]   1   3
[5,]   2   3
Now the question is: how do I avoid counting duplicated cells? As the resulting index matrix shows, rows 2 and 4 point to (3,1) and (1,3) respectively, which is the same cell mirrored across the diagonal (the same happens with rows 3 and 5).
How do I avoid such a situation? I only need a reference for each cell, even though the matrix is symmetric. Is there an easy way to deal with such situations?
EDIT:
I was thinking about using upper.tri or lower.tri, but then what I get is a vector version of the matrix and I am not able to get back to the (row, col) notation.
> which(mat[upper.tri(mat)]==1, arr.ind=T)
[1] 2 3
EDIT II
The expected output would be something like a unique over the pairs (row, col) and (col, row):
     row col
[1,]   1   1
[2,]   3   1
[3,]   3   2
Since you have a symmetric matrix, you could keep only the upper triangle (including the diagonal) when calling which:
which(mat == 1 & upper.tri(mat, diag = TRUE), arr.ind = TRUE)
#      row col
# [1,]   1   1
# [2,]   1   3
# [3,]   2   3
Or, to keep the lower triangle instead (this matches the expected output above):
which(mat == 1 & lower.tri(mat, diag = TRUE), arr.ind = TRUE)
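If you already have the full arr.ind result, another option is to filter the index matrix itself, keeping a (row, col) pair only when row >= col. For a symmetric matrix this keeps exactly one of each mirrored pair plus the diagonal. A sketch, where idx and uniq are illustrative names:

```r
mat <- matrix(c(1,0,1,0,0,0,1,0,1,1,0,0,0,0,0,0), ncol = 4, nrow = 4)

# All (row, col) positions of the 1s, including both mirrored copies
idx <- which(mat == 1, arr.ind = TRUE)

# Keep only entries on or below the diagonal; drop = FALSE keeps a matrix
uniq <- idx[idx[, "row"] >= idx[, "col"], , drop = FALSE]
uniq
```

This gives the same rows as the lower.tri call above; filtering afterwards is handy when the index matrix comes from somewhere else.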

Write a value for maximum/minimum between two values

I have a two-column matrix and I want to produce a new matrix/data.frame where column N contains 1 if it holds the row maximum and 0 otherwise (the two values are never equal). This is my attempt:
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
> testM
     [,1] [,2]
[1,]    1    2
[2,]    3    1
[3,]    1    5
apply(data.frame(testM), 1, function(row) ifelse(max(row[1],row[2]),1,0))
I expect to have:
0 1
1 0
0 1
because of the 1, 0 arguments in the ifelse() call, but I just get:
[1] 1 1 1
Any ideas?
Or, using pmax:
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = TRUE)
--(testM == pmax(testM[,1], testM[,2]))  # double unary minus turns TRUE/FALSE into 1/0
     [,1] [,2]
[1,]    0    1
[2,]    1    0
[3,]    0    1
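You can perform arithmetic on logical values in R: TRUE counts as 1 and FALSE as 0. Just check whether each element in a row equals that row's maximum and multiply by 1. Because apply() returns one column per row, t() transposes the result back: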
t(apply(testM, 1, function(row) 1*(row == max(row))))
You can use max.col and col to produce a logical matrix:
res <- col(testM) == max.col(testM)
res
      [,1]  [,2]
[1,] FALSE  TRUE
[2,]  TRUE FALSE
[3,] FALSE  TRUE
If you want it as 0/1, you can do:
res <- as.integer(col(testM) == max.col(testM)) # this removes the dimension
dim(res) <- dim(testM) # puts the dimension back
res
     [,1] [,2]
[1,]    0    1
[2,]    1    0
[3,]    0    1
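The question's title also mentions the minimum. A minimal sketch, assuming the same testM: negating the matrix turns each row minimum into a row maximum, so max.col(-testM) gives the column of the row minimum. Multiplying the logical matrix by 1L keeps its dimensions, so the dim() step is not needed. ties.method = "first" is an assumption here; the question says the values are never equal, so it only matters for other data.

```r
testM <- matrix(c(1, 2, 3, 1, 1, 5), ncol = 2, byrow = TRUE)

# 1 where the entry is its row's maximum, 0 otherwise
is_max <- (col(testM) == max.col(testM, ties.method = "first")) * 1L

# Negating the matrix turns each row minimum into a row maximum
is_min <- (col(testM) == max.col(-testM, ties.method = "first")) * 1L

is_max
is_min
```

With two columns and no ties, is_min is simply 1 - is_max, but the max.col(-x) form also works for more columns.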

How to add up values of one column until a condition from another column is reached?

I have a matrix like this:
m <- matrix(c(1,2,1,1,3,1,1,0,0,0,1,1,0,1), ncol = 2,
dimnames = list(NULL, c('var', 'tp')))
     var tp
[1,]   1  0
[2,]   2  0
[3,]   1  0
[4,]   1  1
[5,]   3  1
[6,]   1  0
[7,]   1  1
etc.
I'd like to sum up all rows of var until tp becomes 1, then print the result and stop. In this example, that would mean summing the first four rows.
How would I do this in R?
You can use the cumsum function to identify the row at which tp becomes 1, find that row with which, and sum var up to that point:
sum(m[1:min(which(cumsum(m[,2]) == 1)), 1])
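Another option (it relies on the cumulative sum of tp being 1 only at the stopping row; if a 0 follows the first 1 in tp, it returns more than one value):
cumsum(m[, 1])[cumsum(m[, 2]) == 1]  # or, by column name:
cumsum(m[, 'var'])[cumsum(m[, 'tp']) == 1]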
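A minimal sketch of a more direct alternative, assuming you want the sum up to and including the first row where tp is 1: match() returns the position of the first 1 (or NA if there is none). What to do when tp never becomes 1 is an assumption; here the whole var column is summed.

```r
m <- matrix(c(1,2,1,1,3,1,1,0,0,0,1,1,0,1), ncol = 2,
            dimnames = list(NULL, c('var', 'tp')))

# Position of the first row where tp equals 1 (NA if tp never becomes 1)
stop_row <- match(1, m[, 'tp'])

# Assumption: if tp never becomes 1, sum the whole column
rows <- if (is.na(stop_row)) seq_len(nrow(m)) else seq_len(stop_row)
total <- sum(m[rows, 'var'])
total
```

For the example data this sums rows 1 to 4, giving 5, and it does not depend on what tp does after the first 1.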

Keep one maximum value per row in a matrix in R

I have a matrix like this:
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    1    0
[3,]    0    0    1
The ones in each row mark the maximum value(s) of that row. For example, I had the matrix
     [,1] [,2] [,3]
[1,]   11   32   12
[2,]   16   16   14
[3,]   19   18   27
In the second row of this matrix there were two equal maximum values (16), which became two 1s in the second row of the previous matrix. I want to remove these duplicate maxima, so what I need is something like this:
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    0
[3,]    0    0    1
That is, keep one maximum value per row (ties should be broken at random so that only one maximum is kept) and set all other entries to zero. Can anyone provide a code snippet to solve this problem?
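Or you could use max.col, which should be faster than looping with apply(). Note that "first" as the ties method keeps the first maximum rather than a random one:
ret[cbind(seq_len(nrow(mat2)), max.col(mat2, "first"))] <- 1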
ret
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    1    0    0
# [3,]    0    0    1
Data:
mat1 <- matrix(c(0,1,0, 1,1,0, 0,0,1), ncol=3)
mat2 <- matrix(c(11,16,19, 32,16,18, 12,14,27), ncol=3)
ret <- matrix(0, nrow(mat1), ncol(mat1))
If mat is your original matrix, first create a matrix of zeros with the same dimensions:
ret <- matrix(rep(0, length(mat)), ncol=ncol(mat))
Then set the required entries to 1. Note that which.max breaks ties by choosing the first occurrence.
ret[ cbind(seq(nrow(mat)), apply(mat, 1, which.max)) ] <- 1
ret
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    0
[3,]    0    0    1
Alternatively, if you truly want to break ties at random, you could use something like this as the index to ret:
cbind(seq(nrow(mat)), apply(mat, 1, function(x) {
  idx <- which(x == max(x))
  idx[sample.int(length(idx), 1)]  # sample(idx, 1) would draw from 1:idx when idx has length 1
}))
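max.col() can also do the random tie-breaking directly through its ties.method argument, which avoids the apply() loop. A sketch using the question's second matrix (called mat here); set.seed() is only there to make the random choice reproducible, and the 1 in row 2 may land in column 1 or 2 depending on the seed.

```r
mat <- matrix(c(11, 16, 19, 32, 16, 18, 12, 14, 27), ncol = 3)

set.seed(1)  # only to make the random tie-break reproducible

# Matrix of zeros with the same shape as mat
ret <- matrix(0, nrow = nrow(mat), ncol = ncol(mat))

# One (row, col) index per row; ties are broken at random by max.col
ret[cbind(seq_len(nrow(mat)), max.col(mat, ties.method = "random"))] <- 1
ret
```

Rows 1 and 3 have a unique maximum, so only row 2 is affected by the random choice.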

How to print row index and occurrence count of zeros in rows in R data.frame

I want to print the row index and the number of zeros present in each row of an R data.frame.
The input matrix is like this:
          A B
rowIndex1 0 1
rowIndex2 1 1
I thought to use this:
print(which(rowSums(matrix == 0) != 0))
I want that it prints something like this:
rowIndex1
1
However, it does not print the number of zeros in the row but a different number (I checked it), like this:
rowIndex1
2400
How to achieve it?
Thanks
As mentioned in my comment, perhaps the arr.ind argument of which() would be of use.
Using @bartektartanus's sample data:
m <- diag(5) + c(0:6,0,0)
table(which(m == 0, arr.ind=TRUE)[, "row"])
#
# 2 3 4 5
# 1 2 1 1
The "names" (in this case, 2, 3, 4, and 5) are your row numbers and the values (in this case, 1, 2, 1, 1) are the counts.
Here is the output of which, so you can understand what is going on:
which(m == 0, arr.ind=TRUE)
#      row col
# [1,]   3   2
# [2,]   4   2
# [3,]   5   2
# [4,]   2   4
# [5,]   3   4
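table() only lists rows that contain at least one zero. If you also want rows with no zeros to appear (with a count of 0), one option is tabulate() with nbins set to the number of rows; the result should agree with rowSums(m == 0). The suppressWarnings() call only silences the recycling warning from building the sample matrix.

```r
# Same sample matrix; suppressWarnings() hides the length-recycling warning
m <- suppressWarnings(diag(5) + c(0:6, 0, 0))

# Row index of every zero, then a count for each of the nrow(m) rows
zero_rows <- which(m == 0, arr.ind = TRUE)[, "row"]
counts <- tabulate(zero_rows, nbins = nrow(m))
counts
# [1] 0 1 2 1 1
```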
This works: it gives the numbers of the rows that contain a zero.
> m <- diag(5) + c(0:6,0,0)
Warning message:
In diag(5) + c(0:6, 0, 0) :
longer object length is not a multiple of shorter object length
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    1    6    2
[2,]    1    7    2    0    3
[3,]    2    0    4    0    4
[4,]    3    0    4    1    5
[5,]    4    0    5    1    7
> which(rowSums(m == 0) != 0)
[1] 2 3 4 5
To also get the number of zeros in each of those rows, use this:
> x <- rowSums(m==0)
> cbind(which(x!=0),x[x!=0])
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]    4    1
[4,]    5    1
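Since the question is about a data.frame with row names, note that comparing a data frame with 0 gives a logical matrix that keeps those row names, and rowSums() carries them over, so the names of the nonzero counts are the row labels. A sketch using a hypothetical df built to match the two-row example in the question:

```r
# Hypothetical data frame matching the example in the question
df <- data.frame(A = c(0, 1), B = c(1, 1),
                 row.names = c("rowIndex1", "rowIndex2"))

# df == 0 gives a logical matrix; rowSums() counts the zeros per row
zeros <- rowSums(df == 0)

# Keep only rows with at least one zero; names are the row labels
zeros[zeros > 0]
```

This prints rowIndex1 with a count of 1, which is the output the question asks for.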
