Generating numerical clusters of a minimum size from matrix values - R

I have a matrix of 1's and 0's. I would like to group the cells that contain 1's into clusters, count the number of clusters in the matrix, and report the size of each cluster.
If at least n cells with the value 1 (in this case n = 4) are adjacent to each other (immediately up, down, left, or right of one another), they should be treated as a single cluster, and the output should give the number of clusters and their sizes.
For example, the matrix looks like this:
> m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 0 0 0 0 0 0 0
[2,] 1 1 1 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 0 1 1 0
[6,] 0 0 0 0 0 0 0 0 1 1
This matrix has 2 clusters: one cluster of 7 1's and another cluster of 4 1's. I have had quite a bit of trouble getting this to work and can't seem to figure it out.
The output can be something simple like this:
> output
cluster size
1 7
2 4

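You could use the function ConnCompLabel from the package SDMTools to label the connected components in the binary matrix, then tabulate the labels (dropping label 0, the background) and keep only the clusters with at least 4 cells: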
R> ConnCompLabel(m)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 0 0 0 0 0 0 0
[2,] 1 1 1 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 0 2 2 0
[6,] 0 0 0 0 0 0 0 0 2 2
R> tab <- table(ConnCompLabel(m))[-1]
R> tab[tab >= 4]
1 2
7 4
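If SDMTools is not available in your setup, the same labelling can be sketched in base R with a breadth-first flood fill. `label_clusters` is a hypothetical helper (not part of SDMTools); it assumes 4-connectivity (up, down, left, right) as described in the question, and the matrix `m` is rebuilt from the printed example:

```r
# Hypothetical helper (not from SDMTools): label 4-connected clusters of 1's
# with a breadth-first flood fill. Returns a matrix of labels, 0 = background.
label_clusters <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  lab <- matrix(0L, nr, nc)
  current <- 0L
  for (start in which(m == 1)) {          # linear indices, column-major order
    if (lab[start] != 0L) next            # already part of a cluster
    current <- current + 1L
    lab[start] <- current
    queue <- start
    while (length(queue) > 0) {
      idx <- queue[1]
      queue <- queue[-1]
      rr <- (idx - 1L) %% nr + 1L         # row of idx
      cc <- (idx - 1L) %/% nr + 1L        # column of idx
      # up, down, left and right neighbours that lie inside the matrix
      nb <- rbind(c(rr - 1L, cc), c(rr + 1L, cc), c(rr, cc - 1L), c(rr, cc + 1L))
      nb <- nb[nb[, 1] >= 1 & nb[, 1] <= nr & nb[, 2] >= 1 & nb[, 2] <= nc, , drop = FALSE]
      for (k in seq_len(nrow(nb))) {
        j <- nb[k, 1] + (nb[k, 2] - 1L) * nr
        if (m[j] == 1 && lab[j] == 0L) {
          lab[j] <- current
          queue <- c(queue, j)
        }
      }
    }
  }
  lab
}

# The example matrix from the question
m <- matrix(0, nrow = 6, ncol = 10)
m[cbind(c(1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6),
        c(1, 1, 2, 3, 3, 3, 3, 8, 9, 9, 10))] <- 1

lab <- label_clusters(m)
tab <- table(lab[lab > 0])      # number of cells per cluster label
tab <- tab[tab >= 4]            # keep clusters with at least 4 cells
output <- data.frame(cluster = seq_along(tab), size = as.vector(tab))
output
```

Clusters smaller than 4 cells are dropped before `output` is built, so the `cluster` column is renumbered consecutively.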

Related

r apply function to calculate proportions by row in a matrix

I'm using the apply function to calculate migration transition probabilities from eight states at time t1 to the same eight states at time t2. My data is saved in matrix format and named tmp (as follows). The states at time t1 are my rows, and the states at time t2 are my columns; e.g. 228 persons stayed in state 1 between t1 and t2, and 3 persons moved from state 2 to state 1 between t1 and t2.
> class(tmp)
"matrix"
> tmp
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 228 0 0 0 0 0 0 0
[2,] 3 92 0 0 0 0 0 0
[3,] 0 0 30 0 0 0 0 0
[4,] 0 0 0 20 0 0 0 0
[5,] 0 0 0 0 19 0 0 0
[6,] 0 0 0 0 0 0 0 0
[7,] 0 0 0 3 0 0 0 0
[8,] 0 0 0 0 0 0 0 3
I used the following code to calculate the probabilities of moving to or staying in a certain state, i.e. the proportions of the cells in each row. The results are saved in tmp1.
> tmp1=apply(tmp,1,function(X){if (sum(X)!=0) {X/sum(X)} else {numeric(length(X))}})
The problem is: I expected tmp1[4,7] to be 0 (because tmp[4,7] is 0), but the code returns 1 (bolded). Is there a problem with my apply function?
> tmp1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0.03157895 0 0 0 0 0 0
[2,] 0 0.96842105 0 0 0 0 0 0
[3,] 0 0.00000000 1 0 0 0 0 0
[4,] 0 0.00000000 0 1 0 0 **1** 0
[5,] 0 0.00000000 0 0 1 0 0 0
[6,] 0 0.00000000 0 0 0 0 0 0
[7,] 0 0.00000000 0 0 0 0 0 0
[8,] 0 0.00000000 0 0 0 0 0 1
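A likely cause: when FUN returns a vector, `apply(tmp, 1, ...)` stores each row's result as a column of the output, so tmp1 is the transpose of the intended matrix. The printed tmp1 is consistent with that: its column 2 holds row 2 of tmp (3/95 and 92/95), and tmp1[4,7] is really row 7 of tmp (3/3 = 1). A minimal sketch of two fixes, with tmp rebuilt from the printed values:

```r
# tmp rebuilt from the matrix printed in the question
tmp <- matrix(0, nrow = 8, ncol = 8)
tmp[cbind(c(1, 2, 2, 3, 4, 5, 7, 8),
          c(1, 1, 2, 3, 4, 5, 4, 8))] <- c(228, 3, 92, 30, 20, 19, 3, 3)

# apply(X, 1, FUN) returns each row's result as a column, so transpose it back
tmp1 <- t(apply(tmp, 1, function(x) {
  if (sum(x) != 0) x / sum(x) else numeric(length(x))
}))

# Vectorised alternative: divide every row by its total;
# all-zero rows are divided by 1 and therefore stay 0
rs <- rowSums(tmp)
tmp2 <- tmp / ifelse(rs == 0, 1, rs)

tmp1[4, 7]   # 0
tmp1[7, 4]   # 1
```

`prop.table(tmp, 1)` also computes row proportions, but it returns NaN for the all-zero row 6, which is why the sketch guards against zero totals explicitly.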

Creating a list of similar diagonal block matrices in R

I have a small k x k matrix m from which I want to create multiple larger N x N block-diagonal matrices Q1, Q2, ..., Q(N/k). N is guaranteed to be a multiple of k.
A simple example should illustrate better what I mean:
m <- matrix(c(1,3,2,4),2,2) # the small kxk matrix
m
[,1] [,2]
[1,] 1 2
[2,] 3 4
For, say, a 6x6 matrix, I want to get the following block-diagonal matrices:
Q1
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 0 0 0 0
[2,] 3 4 0 0 0 0
[3,] 0 0 0 0 0 0
[4,] 0 0 0 0 0 0
[5,] 0 0 0 0 0 0
[6,] 0 0 0 0 0 0
Q2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 0 0 1 2 0 0
[4,] 0 0 3 4 0 0
[5,] 0 0 0 0 0 0
[6,] 0 0 0 0 0 0
Q3
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 0 0 0 0 0 0
[4,] 0 0 0 0 0 0
[5,] 0 0 0 0 1 2
[6,] 0 0 0 0 3 4
Any ideas how I could achieve this, e.g. with lapply, so that I can do the same for large matrices?
We can do this with bdiag from the Matrix package:
library(Matrix)
lst <- list(bdiag(m, diag(4)*0), bdiag(0*diag(2), m, 0*diag(2)), bdiag(diag(4)*0, m))
If we want to convert the results to ordinary matrices, use as.matrix:
lapply(lst, as.matrix)
Also, this can be created as a single sparseMatrix
bdiag(list(m, 0*diag(6))[rep(1:2, length.out=5)])
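The answer above writes out each block position by hand. A minimal base-R sketch of the lapply approach the question asks for, assuming N is a multiple of k; it places m in the i-th diagonal block of an N x N zero matrix for i = 1, ..., N/k:

```r
m <- matrix(c(1, 3, 2, 4), 2, 2)   # the small k x k block
N <- 6                             # size of each Q (assumed to be a multiple of k)
k <- nrow(m)

# One N x N zero matrix per diagonal position, with m placed in block i
Q <- lapply(seq_len(N / k), function(i) {
  Qi <- matrix(0, N, N)
  idx <- ((i - 1) * k + 1):(i * k)   # rows and columns covered by block i
  Qi[idx, idx] <- m
  Qi
})
Q[[2]]
```

Each Qi is dense, so for very large N you may prefer to build the same layout as a sparse matrix (e.g. with Matrix::sparseMatrix) to save memory.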

multiple adjacency matrices for one edgelist R

I have the following edge list with a number that associates each edge with a path number. It is given by the following matrix, which I call Totallist:
Begin edge  End edge  Path number
1           3         1
3           4         1
4           5         1
6           3         2
3           2         2
I want to construct adjacency matrices for each of the paths. In this example, I want two matrices, but there could be more. I have written the following but it only finds the matrix for the first path. I am unsure how to write something that will work for any number of paths that I throw at it:
X<-as.data.frame(table(Totallist[,3]))
nlines<-nrow(X)
nlines
freq<-X[1,2]
diameterofmatrix<-max(Totallist)
X1<-get.adjacency(graph.edgelist(as.matrix(Totallist[1:X[1,2],1:2]), directed=FALSE))
X1<-rbind(X1, 0)
X1<-cbind(X1, 0)
X1
I also need all the matrices to have the same dimensions, which is why I added an extra row and column. I could continue with my method, but it seems quite ugly. Thank you very much for any help.
To extract the adjacency matrices into a list you can do the following (I generate some fake data):
set.seed(42)
df <- data.frame(beginEdge = sample(1:10, 10, replace = TRUE),
endEdge = sample(1:10, 10, replace=TRUE),
pathNum = rep(c(1,2), each=5))
df
beginEdge endEdge pathNum
1 10 5 1
2 10 8 1
3 3 10 1
4 9 3 1
5 7 5 1
6 6 10 2
7 8 10 2
8 2 2 2
9 7 5 2
10 8 6 2
paths <- unique(df$pathNum) # get the paths to iterate through
If we make the nodes factors and set the levels of the factors to all the nodes in the population, then the adjacency matrices will be computed for the whole population in your network. I am assuming here that the network consists of ten actors. If your observed data contains all the nodes you want to work with, set the levels to unique(c(df$beginEdge,df$endEdge)), or whatever set of nodes you prefer.
df$beginEdge <- factor(df$beginEdge, levels=1:10)
df$endEdge <- factor(df$endEdge, levels=1:10)
We now go across the list of paths and create matrices storing them as a list:
list.of.adj.mats <- lapply(paths, function(i){
  # count each edge in both directions (undirected), then binarise with > 0
  matrix(as.numeric((
    table(df$beginEdge[df$pathNum==i],
          df$endEdge[df$pathNum==i]) +
    table(df$endEdge[df$pathNum==i],
          df$beginEdge[df$pathNum==i])) > 0),
    nrow=length(levels(df$beginEdge)))
})
list.of.adj.mats
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 1 1
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 1 0 0 1
[6,] 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 1 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 1
[9,] 0 0 1 0 0 0 0 0 0 0
[10,] 0 0 1 0 1 0 0 1 0 0
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 1 0 0 0
[6,] 0 0 0 0 0 0 0 1 0 1
[7,] 0 0 0 0 1 0 0 0 0 0
[8,] 0 0 0 0 0 1 0 0 0 1
[9,] 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 1 0 1 0 0
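A base-R sketch applied to the Totallist from the question, without igraph and not the answerer's method: split the edge list by path number and fill one square matrix per path over every node that appears in any path, so all matrices share the same dimensions. It assumes the first two columns hold node labels and the edges are undirected:

```r
# The edge list from the question: begin node, end node, path number
Totallist <- matrix(c(1, 3, 1,
                      3, 4, 1,
                      4, 5, 1,
                      6, 3, 2,
                      3, 2, 2), ncol = 3, byrow = TRUE)

# Every node that appears in any path, so all matrices have the same dimensions
nodes <- sort(unique(c(Totallist[, 1], Totallist[, 2])))

# One undirected 0/1 adjacency matrix per path number
adj_by_path <- lapply(split(as.data.frame(Totallist), Totallist[, 3]), function(e) {
  A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  i <- match(e[, 1], nodes)
  j <- match(e[, 2], nodes)
  A[cbind(i, j)] <- 1
  A[cbind(j, i)] <- 1   # mirror each edge because the graph is undirected
  A
})
adj_by_path[["2"]]
```

The list is named by path number ("1", "2", ...), and it grows automatically with however many paths the third column contains.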

R programming: How to do replace values by row?

How to do a row-wise replacement of values using R?
I have a Matrix and I would like to replace some of its values using an index vector. The problem is that R automatically does a column-wise extraction of the values as opposed to a row-wise.
You will find my code and results below:
Matrix=matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v=c(1,7,11,16,18)
Matrix[v]=1
Matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 0 0 0 0 0
[2,] 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0
[5,] 0 1 0 0 0 0 0
[6,] 0 0 1 0 0 0 0
What I actually want to get is the row-wise version of this meaning:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 0 0 0 0 0 1
[2,] 0 0 0 1 0 0 0
[3,] 0 1 0 1 0 0 0
[4,] 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0
Apparently R does a column-wise replacement of values by default.
What is the best way to obtain a row-wise replacement of the values?
Thanks!
You could convert the one-dimensional indices to row and column indices. If the row indices are in the first column of a matrix Ind and the column indices are in the second column of Ind, you can do Matrix[Ind] <- 1. Reading v row-wise, the row is (v-1) %/% ncol + 1 and the column is (v-1) %% ncol + 1:
Matrix <- matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v <- c(1,7,11,16,18)
Row <- (v-1) %/% ncol(Matrix) +1
Col <- (v-1) %% ncol(Matrix) +1
Matrix[cbind(Row,Col)] <- 1
Matrix
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 0 0 0 0 0 1
# [2,] 0 0 0 1 0 0 0
# [3,] 0 1 0 1 0 0 0
# [4,] 0 0 0 0 0 0 0
# [5,] 0 0 0 0 0 0 0
# [6,] 0 0 0 0 0 0 0
We can do
+(matrix(seq_along(Matrix) %in% v, ncol=ncol(Matrix), nrow=nrow(Matrix), byrow=TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 1 0 0 0 0 0 1
#[2,] 0 0 0 1 0 0 0
#[3,] 0 1 0 1 0 0 0
#[4,] 0 0 0 0 0 0 0
#[5,] 0 0 0 0 0 0 0
#[6,] 0 0 0 0 0 0 0
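You could recompute your indices so that R's column-wise order hits the cells you want, or you can transpose, assign, and transpose back (column-wise order in the transposed matrix is row-wise order in the original):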
Matrix=matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v=c(1,7,11,16,18)
Matrix<-t(Matrix)
Matrix[v]=1
Matrix<-t(Matrix)
Matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 0 0 0 0 0 1
[2,] 0 0 0 1 0 0 0
[3,] 0 1 0 1 0 0 0
[4,] 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0
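The index arithmetic from the first answer can also be wrapped in a small reusable function. `set_rowwise` is a hypothetical name; the sketch relies on base R's `arrayInd()`, using the fact that a row-major position in an nrow x ncol matrix is the same as a column-major position in an ncol x nrow layout:

```r
# Hypothetical helper: set cells given as row-major (row-wise) linear indices.
# Row-major position v in an nrow x ncol matrix is column-major position v in
# an ncol x nrow layout, so arrayInd() on c(ncol, nrow) returns (col, row) pairs.
set_rowwise <- function(M, v, value = 1) {
  ij <- arrayInd(v, .dim = c(ncol(M), nrow(M)))
  M[ij[, 2:1, drop = FALSE]] <- value   # swap to (row, col) for matrix indexing
  M
}

Matrix <- matrix(0, nrow = 6, ncol = 7)
v <- c(1, 7, 11, 16, 18)
Matrix <- set_rowwise(Matrix, v)
which(Matrix == 1, arr.ind = TRUE)
```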

Matrix to vector minimum comparison without using apply in R

I am trying to create a function that takes a matrix M and a vector v and returns the elementwise minimum between each column of M and v. Therefore, the number of rows of M must equal length(v).
For example, the function below does this for two vectors of equal length. I want it to work for a matrix compared with a vector.
vectorelementwisemin = function(x,y){ #x is a vector, y is a vector (same length)
ind = which(x > y)
z = x
z[ind] <- y[ind]
return(z)
}
For example, the vectorized function could take in:
M
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 2 2 5 4 2 3 4 1 4 4 4 2
[3,] 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 6 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0 0 0
v
0 4 2 1 3 0
And return
minmat(M,v)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 2 2 4 4 2 3 4 1 4 4 4 2
[3,] 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 3 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0 0 0
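You can just use minmat <- function(M, v) pmin(M, v). Because length(v) equals nrow(M), v is recycled down each column, so every element in row i is compared with v[i], and pmin keeps the dim attribute of M. You may want to add a check such as if (nrow(M) != length(v)) stop("length(v) must equal nrow(M)").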
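Putting the pmin one-liner and the length check together, a minimal sketch run on the M and v from the question (rebuilt from the printed values):

```r
minmat <- function(M, v) {
  if (nrow(M) != length(v)) stop("length(v) must equal nrow(M)")
  # v is recycled down each column, so M[i, j] is compared with v[i];
  # pmin() copies the dim attribute from its first argument
  pmin(M, v)
}

M <- matrix(0, nrow = 6, ncol = 12)
M[2, ] <- c(2, 2, 5, 4, 2, 3, 4, 1, 4, 4, 4, 2)
M[5, 2] <- 6
v <- c(0, 4, 2, 1, 3, 0)
minmat(M, v)
```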
