How to extract repeating rows in a matrix - r

In R I have this matrix:
> Y
[,1] [,2] [,3] [,4]
[1,] "0" "2" "9" "5"
[2,] "4" "7" "7" "3"
[3,] "1" "5" "7" "9"
[4,] "7" "8" "7" "4"
[5,] "7" "8" "7" "4"
[6,] "1" "1" "7" "2"
[7,] "7" "8" "7" "4"
...
From this matrix I want to get all the rows that occur once, twice, three times, and so on.
For example,
"7" "8" "7" "4"
occurs 3 times in Y. How do I find all the other cases?
The output should be:
Return all rows that occur 2 times in Y.
Return all rows that occur 3 times in Y.
Return all rows that occur 4 or more times in Y.
I have tried to solve this with the
> duplicated
command, but this is not enough.

Here is a simple solution based on concatenating the rows of your matrix into a string and then tabulating the frequency with which the strings appear.
First we'll generate some simple fake data. I generate random zeros and ones to ensure there will be plenty of duplicates:
Y <- matrix(rbinom(100, 1, .5), ncol = 4)
head(Y)
#> [,1] [,2] [,3] [,4]
#> [1,] 0 0 0 1
#> [2,] 0 0 0 0
#> [3,] 0 0 0 0
#> [4,] 0 0 0 1
#> [5,] 0 1 1 0
#> [6,] 0 0 1 0
# I collapse all the values in each row into a string, so c(0,1,0,1) becomes "0101"
row.ids <- apply(Y, 1, paste, collapse = "")
# Now using table() I can get the frequency with which each pattern appears
row.freqs <- table(row.ids)
# All triply replicated rows
Y[row.ids %in% names(row.freqs[row.freqs==3]),]
#> [,1] [,2] [,3] [,4]
#> [1,] 0 0 0 1
#> [2,] 0 0 0 1
#> [3,] 0 1 1 0
#> [4,] 0 0 0 1
#> [5,] 0 1 1 0
#> [6,] 0 1 1 0
# All quadruply replicated rows
Y[row.ids %in% names(row.freqs[row.freqs==4]),]
#> [,1] [,2] [,3] [,4]
#> [1,] 0 0 0 0
#> [2,] 0 0 0 0
#> [3,] 0 0 1 0
#> [4,] 0 0 1 0
#> [5,] 0 0 0 0
#> [6,] 0 0 1 0
#> [7,] 0 1 1 1
#> [8,] 0 1 1 1
#> [9,] 0 1 1 1
#> [10,] 0 0 0 0
#> [11,] 0 1 1 1
#> [12,] 0 0 1 0
Created on 2019-02-20 by the reprex package (v0.2.1)
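The question also asks for rows that occur "4 or more" times, and pasting with collapse = "" can be ambiguous once values have more than one character. A minimal sketch of one way to cover both, assuming a hypothetical helper rows_by_count (not part of the answer above) and a "|" separator that does not appear in the data:

```r
# Hypothetical helper: return the rows of Y whose pattern occurs a number of
# times n for which keep(n) is TRUE.
rows_by_count <- function(Y, keep) {
  # The separator avoids collisions such as c(1, 11) vs c(11, 1);
  # assumption: "|" never appears in the values themselves.
  ids <- apply(Y, 1, paste, collapse = "|")
  freqs <- table(ids)
  # Look up the frequency of each row's own pattern
  n <- as.vector(freqs[ids])
  Y[keep(n), , drop = FALSE]
}

Y <- matrix(c("0", "2", "9", "5",
              "7", "8", "7", "4",
              "7", "8", "7", "4",
              "1", "1", "7", "2",
              "7", "8", "7", "4"), ncol = 4, byrow = TRUE)

three   <- rows_by_count(Y, function(n) n == 3)  # rows occurring exactly 3 times
four_up <- rows_by_count(Y, function(n) n >= 4)  # rows occurring 4 or more times
```

Passing the condition as a function keeps "exactly k" and "k or more" in one code path; drop = FALSE keeps the result a matrix even when a single row matches.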

Using the test matrix Y in the Note at the end, use aggregate to create a data frame ag whose rows are the unique rows of Y followed by the count of how many times they occur.
ag <- aggregate(cbind(count = apply(Y, 1, toString)) ~ ., as.data.frame(Y),
FUN = length)
nc <- ncol(Y)
subset(ag, count == 2, select = -count) # shows rows which occur twice
split(ag[1:nc], ag$count) # splits unique rows into those that occur once, twice, etc.
Note
Y <- matrix(c(0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1,
0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1), 25, 4)

Related

Delete specific columns

I have
F <- structure(c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0), .Dim = c(3L,
5L))
How can I remove from F the columns that have fewer than 2 consecutive zeros?
Thx!
We may use rle to find the runs of consecutive values (here 0) and build a logical condition on their lengths, looping over the columns with apply (MARGIN = 2). This removes the columns that contain at least 2 consecutive zeros:
F[,!apply(F, 2, function(x) with(rle(!x),
any(lengths >= 2 & values))), drop = FALSE]
Output:
[,1] [,2]
[1,] 0 0
[2,] 1 1
[3,] 1 1
If you want the opposite (keep only the columns with at least 2 consecutive zeros), just remove the !:
F[,apply(F, 2, function(x) with(rle(!x),
any(lengths >= 2 & values))), drop = FALSE]
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 0 0 0
[3,] 0 0 0
A slightly different approach with rle applied over the columns:
F[, apply(F, 2, \(x) with(rle(x), any(lengths[values == 0] >= 2)))]
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 0 0 0
[3,] 0 0 0
Using pure base R, no extra functions, as a one-liner:
U = F[, apply((F[-1,]==0) & (F[-nrow(F),]==0), 2, any)]
Breakdown:
U = F[ # Select...
, # ...from all the rows of the matrix...
apply( # ...the columns that have...
(F[-nrow(F),]==0) & (F[-1,]==0), # ...one value = 0 and the next value = 0...
2, # ...checked column by column (i.e. 2nd dimension)...
any # ...anywhere in the column.
)
]
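The pairwise comparison above is tied to runs of exactly two. As a hedged sketch, the rle idea from the earlier answers can be wrapped in a small helper that takes the run length as a parameter; has_zero_run and Fmat are illustrative names, not from the answers (Fmat holds the question's data and avoids masking R's F shorthand for FALSE):

```r
# Hypothetical helper: TRUE if x contains a run of at least k consecutive zeros
has_zero_run <- function(x, k = 2) {
  r <- rle(x == 0)               # runs of "is zero" / "is not zero"
  any(r$values & r$lengths >= k) # a zero run that is long enough
}

# Same data as F in the question
Fmat <- matrix(c(0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0), nrow = 3)

keep   <- apply(Fmat, 2, has_zero_run)  # one logical per column
F_kept <- Fmat[, keep, drop = FALSE]    # columns with >= 2 consecutive zeros
```

Using drop = FALSE keeps the result a matrix even if only one column qualifies.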

Matrix fill zero rows with one

I have an Adjacency matrix A:
[1] [2] [3]
[1] 1 0 1
[2] 0 0 0
[3] 0 0 0
I want a new matrix B that has ones in every row of A that contains only zeros, and zeros elsewhere, so B is:
[1] [2] [3]
[1] 0 0 0
[2] 1 1 1
[3] 1 1 1
How can I do that in R?
Since an adjacency matrix is nonnegative, you can just test for zero rowSums:
A <- matrix(c(1, 0, 1,
0, 0, 0,
0, 0, 0), byrow = TRUE, nrow = 3)
B <- A * 0 # 0-matrix same dimensions as A
B[rowSums(A) == 0, ] <- 1
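The rowSums test relies on the entries being nonnegative. As a sketch for the more general case (an assumption beyond the question, where A is an adjacency matrix), counting the nonzero entries per row avoids being fooled by rows whose positive and negative values cancel:

```r
# Example with signed entries: row 1 sums to zero but is not all zeros
A <- matrix(c(1, 0, -1,
              0, 0,  0,
              2, 0,  0), byrow = TRUE, nrow = 3)

B <- A * 0                       # zero matrix with the same dimensions as A
B[rowSums(A != 0) == 0, ] <- 1   # rows with no nonzero entry become all ones
```

Here rowSums(A) == 0 would also flag row 1, while rowSums(A != 0) == 0 flags only row 2.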
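apply() can be useful here. For each row, return a vector of ones if the row is all zeros and a vector of zeros otherwise, then transpose the result. Try
t(apply(A, 1, function(x) rep(as.numeric(all(x == 0)), length(x))))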
The apply call below returns a vector with one element per row of A, having the value TRUE if that row is all zeroes and FALSE otherwise. Assign that vector to A using recycling so that each column of A is filled with it. This assignment also converts FALSE and TRUE to 0 and 1. The expression is compact, relatively straightforward to understand, and does not use any packages.
replace(A, TRUE, apply(A == 0, 1, all))
giving:
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 1 1
[3,] 1 1 1
This variation is even shorter but is slightly trickier:
replace(A, TRUE, apply(!A, 1, all))
Note
The input A in reproducible form is:
A <- matrix(c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), 3)
Another way
m <- matrix(c(1, 0, 1, 0, 0, 0, 0, 0, 0), nrow = 3, byrow = TRUE)
> m
[,1] [,2] [,3]
[1,] 1 0 1
[2,] 0 0 0
[3,] 0 0 0
m <- matrix(rep(ifelse(rowSums(m) == 0, 1, 0), ncol(m)), nrow = nrow(m))
> m
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 1 1
[3,] 1 1 1

Extract sub-matrices from binary matrix in R

Say binary matrix m:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 0 0 0 0 0 0 0 0 0
# [2,] 0 0 0 0 0 0 0 0 0
# [3,] 0 0 0 1 1 1 1 0 0
# [4,] 0 0 0 1 1 1 1 0 0
# [5,] 0 0 0 1 1 1 1 0 0
# [6,] 0 0 0 0 0 0 0 0 0
# [7,] 0 1 1 0 0 0 0 1 1
# [8,] 0 1 1 0 1 1 0 1 1
# [9,] 0 0 0 0 1 1 0 1 1
# [10,] 0 0 0 0 1 1 0 0 0
m <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 0), .Dim = c(10L, 9L))
How can we extract those 1-valued sub-matrices? e.g.
m[7:9,8:9]
# [,1] [,2]
#[1,] 1 1
#[2,] 1 1
#[3,] 1 1
The point is that I want to extract them algorithmically, not by indexing them explicitly like m[7:9,8:9].
The input is a binary matrix.
The output is a list of sub-matrices (so a list of four matrices of dim 3*4, 2*2, 3*2 and 3*2).
The sub-matrices are 1-valued rectangles.
The borders of the sub-matrices are secured with zeros.
I'd treat it as a spatial problem where you have a raster and want to detect regions of connected cells.
library(raster)
r <- raster(m)
library(igraph)
rc <- clump(r)
plot(rc, col = rainbow(rc@data@max))
m1 <- as.matrix(rc)
lapply(seq_len(rc@data@max), function(x) {
inds <- which(m1 == x, arr.ind = TRUE)
nrow <- diff(range(inds[, "row"])) + 1
ncol <- diff(range(inds[, "col"])) + 1
matrix(1, ncol = ncol, nrow = nrow)
})
#[[1]]
# [,1] [,2] [,3] [,4]
#[1,] 1 1 1 1
#[2,] 1 1 1 1
#[3,] 1 1 1 1
#
#[[2]]
# [,1] [,2]
#[1,] 1 1
#[2,] 1 1
#
#[[3]]
# [,1] [,2]
#[1,] 1 1
#[2,] 1 1
#[3,] 1 1
#
#[[4]]
# [,1] [,2]
#[1,] 1 1
#[2,] 1 1
#[3,] 1 1
Use focal in the raster package with an appropriate weighting matrix w. It convolves w with m, giving a matrix with the same dimensions as m that has the value big at each upper-left corner and other values elsewhere, so comparing it to big gives a logical matrix which is TRUE at the upper-left corners of the rectangles. Using which, we get rc, which has one row per rectangle and two columns representing the i and j coordinates of the upper-left corner of that rectangle. The Map call iterates over the upper-left coordinates, invoking genmat on each. genmat uses rle (via the rl function) to find the length of the run of ones in each coordinate direction and returns a matrix of ones having those dimensions.
library(raster)
big <- 100
r <- raster(m)
w <- matrix(0, 3, 3); w[1:2, 1:2] <- 1; w[2, 2] <- big
rc <- which(as.matrix(focal(r, w, pad = TRUE, padValue = 0)) == big, arr.ind = TRUE)
rl <- function(x) rle(x)$lengths[1]
genmat <- function(i, j) matrix(1, rl(m[i:nrow(m), j]), rl(m[i, j:ncol(m)]))
Map(genmat, rc[, 1], rc[, 2])
giving:
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[[2]]
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
[[3]]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
[[4]]
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 1 1
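The same upper-left-corner idea can be sketched in base R without the raster package. This is an illustrative alternative, not the answer's method, and it assumes (as the question states) that every block of ones is a filled rectangle bordered by zeros or by the matrix edge. The names above, left, corners, run1 and blocks are hypothetical:

```r
# Rebuild the question's 10 x 9 matrix from its four rectangles
m <- matrix(0, 10, 9)
m[3:5, 4:7] <- 1; m[7:8, 2:3] <- 1; m[8:10, 5:6] <- 1; m[7:9, 8:9] <- 1

# Shift the matrix down and right, padding with zeros, to get each cell's
# neighbour above and neighbour to the left.
above <- rbind(0, m[-nrow(m), , drop = FALSE])
left  <- cbind(0, m[, -ncol(m), drop = FALSE])

# An upper-left corner is a 1 whose upper and left neighbours are both 0
corners <- which(m == 1 & above == 0 & left == 0, arr.ind = TRUE)

# Length of the leading run of equal values in x
run1 <- function(x) rle(x)$lengths[1]

# One matrix of ones per corner, sized by the runs going down and across
blocks <- Map(function(i, j) matrix(1, run1(m[i:nrow(m), j]), run1(m[i, j:ncol(m)])),
              corners[, "row"], corners[, "col"])
```

The corners come back in column-major order, so the blocks are listed by the column of their upper-left cell rather than by row.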
A rather long-winded answer, but you can do this via image labeling as I did in this SO answer. This will extend nicely to non-rectangular blobs of 1's.
find.contiguous <- function(img, x, bg) {
## we need to deal with a single (row,col) matrix index
## versus a collection of them in a two column matrix separately.
if (length(x) > 2) {
lbl <- img[x][1]
img[x] <- bg
xc <- x[,1]
yc <- x[,2]
} else {
lbl <- img[x[1],x[2]]
img[x[1],x[2]] <- bg
xc <- x[1]
yc <- x[2]
}
## clamp the neighbor coordinates to the image bounds
xmin <- ifelse((xc-1) < 1, 1, (xc-1))
xmax <- ifelse((xc+1) > nrow(img), nrow(img), (xc+1))
ymin <- ifelse((yc-1) < 1, 1, (yc-1))
ymax <- ifelse((yc+1) > ncol(img), ncol(img), (yc+1))
## find all neighbors of x
x <- rbind(cbind(xmin, ymin),
cbind(xc , ymin),
cbind(xmax, ymin),
cbind(xmin, yc),
cbind(xmax, yc),
cbind(xmin, ymax),
cbind(xc , ymax),
cbind(xmax, ymax))
## that have the same label as the original x
x <- x[img[x] == lbl,]
## if there is none, we stop and return the updated image
if (length(x)==0) return(img);
## otherwise, we call this function recursively
find.contiguous(img,x,bg)
}
find.contiguous is a recursive function that receives on each call:
A working copy of the image img.
A collection of pixel (matrix) indices x (row,col) that belong to an object in the image img.
The background value bg
find.contiguous then proceeds to:
Set all pixels at x in img to the bg color. This marks that we have visited the pixels.
Find all neighboring pixels of x that have the same label (value) as that in x. This grows the region of the same object. Note that since x is not necessarily a single pixel, x grows geometrically so that, in fact, this function is no slouch.
If there are no more neighbors belonging to the same object, we return the updated image; otherwise, we make the recursive call.
Starting from a single pixel that corresponds to an object, a call to find.contiguous will grow the region to include all the object's pixels and return an updated image where the object is replaced by the background. This process can then be repeated in a loop until there are no more objects in the image, which is what lets us extract all sub-matrices of 1's.
With your data:
m <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1,
1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 0), .Dim = c(10L, 9L))
## make a copy to img which will be converted to all-zeros in the process
## as matrices of 1's are extracted by the process
img <- m
## get all pixel coordinates that are objects
x <- which(img==1, arr.ind=TRUE)
## loop until there are no more pixels that are objects
## the output is in the list out
count <- 0
out <- list()
while (length(x) > 0) {
## choose a single (e.g., first) pixel location. This belongs to the current
## object that we will grow and remove from the image using find.contiguous
## x is always a two-column matrix here, so its first row is a valid seed
x1 <- x[1,]
## make the call to remove the object from img
img <- find.contiguous(img, x1, 0)
## find the remaining pixel locations belonging to objects
xnew <- which(img==1, arr.ind=TRUE)
count <- count + 1
## extract the indices for the 1's found by diffing new with x
out.ind <- x[!(x[,1] %in% xnew[,1] & x[,2] %in% xnew[,2]),]
## set it as a matrix in the output
out[[count]] <- matrix(m[out.ind],nrow=length(unique(out.ind[,1])),ncol=length(unique(out.ind[,2])))
x <- xnew
}
Your output is the list out:
print(out)
##[[1]]
## [,1] [,2]
##[1,] 1 1
##[2,] 1 1
##
##[[2]]
## [,1] [,2] [,3] [,4]
##[1,] 1 1 1 1
##[2,] 1 1 1 1
##[3,] 1 1 1 1
##
##[[3]]
## [,1] [,2]
##[1,] 1 1
##[2,] 1 1
##[3,] 1 1
##
##[[4]]
## [,1] [,2]
##[1,] 1 1
##[2,] 1 1
##[3,] 1 1
Note that you can just as easily output the locations of the extracted 1's from out.ind.

Replacing row-values in matrix by its row index

I use an R matrix (for example [[0,0,0,1],[0,1,0,1],[1,0,0,0],[0,0,1,1]]) representing
a raster. I'd like to replace every value except 0 with its row index. Is there something like
matrix[matrix==1] <- row_index
so that my result would look like [[0,0,0,1],[0,2,0,2],[3,0,0,0],[0,0,4,4]]?
I am using R 2.15.1 on a Mac (10.7.5) and RPY2 2.2.6 to start the R-Methods.
Or is there any other way to get reasonable results for statistical functions like histogram, chi_square etc.?
For a succinct, expressive solution, I'd be likely to use this:
m <- matrix(c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1),
nrow = 4, byrow = TRUE)
m[m!=0] <- row(m)[m!=0]
m
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 1
# [2,] 0 2 0 2
# [3,] 3 0 0 0
# [4,] 0 0 4 4
Hopefully all hell won't break loose for suggesting a for loop, but we'll see:
Here's your matrix
mymat <- matrix(c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1),
nrow = 4, byrow = TRUE)
mymat
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 1
# [2,] 0 1 0 1
# [3,] 1 0 0 0
# [4,] 0 0 1 1
Here's a for loop that uses basic subsetting to identify the cases you want to replace.
for (i in seq_len(nrow(mymat))) {
mymat[i, ][mymat[i, ] != 0] <- i
}
Here's the result.
mymat
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 1
# [2,] 0 2 0 2
# [3,] 3 0 0 0
# [4,] 0 0 4 4
Maybe I'm missing the point of your question, but how about
> m <- matrix(c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1),
+ nrow = 4, byrow = TRUE)
> m * 1:nrow(m)
[,1] [,2] [,3] [,4]
[1,] 0 0 0 1
[2,] 0 2 0 2
[3,] 3 0 0 0
[4,] 0 0 4 4
>
(ETA: R stores matrices by column, and the elementwise product operator recycles the shorter operand, 1:nrow(m), down each column in turn until it matches the length of m. The * operator here therefore winds up multiplying each item by the index of the row to which it belongs. This relies on the nonzero entries all being 1.)
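To make the recycling explicit, here is a small sketch comparing the recycled vector with an explicit matrix of row indices from row(). The names by_recycling, by_row_index and m2 are illustrative only:

```r
m <- matrix(c(0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1),
            nrow = 4, byrow = TRUE)

by_recycling <- m * 1:nrow(m)  # the vector is recycled down each column
by_row_index <- m * row(m)     # row(m) holds each cell's row index

# Multiplication only gives row indices when the nonzero entries are all 1.
# For other nonzero values, assign the indices explicitly:
m2 <- m * 5
m2[m2 != 0] <- row(m2)[m2 != 0]
```

Both products agree because 1:nrow(m) has exactly one element per row, so each recycled copy lines up with one column.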

Tuples in R following Mathematica Tuples[list, n]

I would like to replicate the behavior of Tuples[list, n] of Mathematica in R. For example,
Tuples[{0, 1}, 3] =
{{0, 0, 0}, {0, 0, 1}, {0, 1, 0}, {0, 1, 1}, {1, 0, 0},
{1, 0, 1}, {1, 1, 0}, {1, 1, 1}}.
In R, I would like the following result:
tuples(c(0,1), 3) = list( c(0, 0, 0), c(0, 0, 1), c(0, 1, 0),
c(0, 1, 1), c(1, 0, 0), c(1, 0, 1), c(1, 1, 0), c(1, 1, 1) )
or something similar to this where I can extract the sublists c(0,0,0), etc.
Is there a command for this? Or how can I program this? Thanks in advance.
Here is one way:
tuples <- function(x, n) do.call(expand.grid, rep(list(x), n))
tuples(0:1, 3)
# Var1 Var2 Var3
# 1 0 0 0
# 2 1 0 0
# 3 0 1 0
# 4 1 1 0
# 5 0 0 1
# 6 1 0 1
# 7 0 1 1
# 8 1 1 1
I think it makes more sense to keep it as a matrix or data.frame since all the elements have the same length.
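If a list of vectors in Mathematica's ordering (last position varying fastest) is really needed, the expand.grid result can be post-processed. A minimal sketch, where tuples_list is a hypothetical name and not an existing function:

```r
# Hypothetical helper: all n-tuples of x as a list of vectors, ordered like
# Mathematica's Tuples (the last position varies fastest).
tuples_list <- function(x, n) {
  g <- expand.grid(rep(list(x), n), KEEP.OUT.ATTRS = FALSE)
  # expand.grid varies the first column fastest, so reverse the columns
  g <- g[, n:1, drop = FALSE]
  lapply(seq_len(nrow(g)), function(i) unlist(g[i, ], use.names = FALSE))
}

tl <- tuples_list(c(0, 1), 3)
tl[[2]]  # the second tuple, c(0, 0, 1)
```

Each list element is a plain vector, so individual tuples can be extracted with [[ ]].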
The combinatorics package, ‘combinat’ version 0.0-8, contains a function called hcube, which can also be used to generate tuples like Mathematica, e.g.:
library(combinat)
hcube(c(2,2,2), translation=c(-1,-1,-1))
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 1 1 0
[5,] 0 0 1
[6,] 1 0 1
[7,] 0 1 1
[8,] 1 1 1
