I have a matrix with values 0 or 1, and I would like to obtain a list of the groups of adjacent 1's. Vertical and horizontal neighbors of each 1 are considered when defining the connected groups.
For example, the matrix
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 1 0 0 1 0
[3,] 0 0 1 0 0
[4,] 0 0 0 0 0
[5,] 1 1 1 1 1
should return the following 4 connected components:
C1 = {(1,1);(2,1)}
C2 = {(2,4)}
C3 = {(3,3)}
C4 = {(5,1);(5,2);(5,3);(5,4);(5,5)}
Does anybody have an idea of how to do this fast in R? My real matrix is rather large, around 2000x2000, but I expect the number of connected components to be reasonably small (say, around 200).
You can turn your binary matrix into a raster object and use the raster::clump function to "Detect clumps (patches) of connected cells. Each clump gets a unique ID". Then it is just data management to return the exact format you want. Example below:
library(igraph)
library(raster)
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
Rmat <- raster(mat)
Clumps <- as.matrix(clump(Rmat, directions=4))
#turn the clumps into a list
tot <- max(Clumps, na.rm=TRUE)
res <- vector("list",tot)
for (i in 1:tot){
  res[i] <- list(which(Clumps == i, arr.ind = TRUE))
}
res then prints out at the console as:
> res
[[1]]
row col
[1,] 1 1
[2,] 2 1
[[2]]
row col
[1,] 2 4
[[3]]
row col
[1,] 3 3
[[4]]
row col
[1,] 5 1
[2,] 5 2
[3,] 5 3
[4,] 5 4
[5,] 5 5
I wouldn't be surprised if there is a better way to go from the raster object to your end goal, though. Again, a 2000 by 2000 matrix should not be a big deal for this.
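If you would rather skip the explicit loop, a more compact way to get from the Clumps matrix to the same kind of list is to split the cell indices by their clump ID (a sketch, assuming the Clumps matrix from above; it returns data frames rather than matrices):
idx <- which(!is.na(Clumps), arr.ind = TRUE)              # row/col of every clumped cell, in column-major order
res <- split(as.data.frame(idx), Clumps[!is.na(Clumps)])  # one data frame of coordinates per clump ID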
Old (wrong) answer, but it should still be useful for people who want the connected components of a graph.
You can use the igraph package to turn your adjacency matrix into a network and return the components. Your example graph is one component, so I removed one edge for illustration.
library(igraph)
mat = rbind(c(1,0,0,0,0),
c(1,0,0,1,0),
c(0,0,1,0,0),
c(0,0,0,0,0),
c(1,1,1,1,1))
g <- graph.adjacency(mat) %>% delete_edges("5|3")
plot(g)
clu <- components(g)
groups(clu)
The final line then returns at the prompt:
> groups(clu)
$`1`
[1] 1 2 4 5
$`2`
[1] 3
In my experience this algorithm is pretty fast, so I don't think 2,000 by 2,000 will be a problem.
Suppose I have a symmetric matrix:
> mat <- matrix(c(1,0,1,0,0,0,1,0,1,1,0,0,0,0,0,0), ncol=4, nrow=4)
> mat
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 0 0 1 0
[3,] 1 1 0 0
[4,] 0 0 0 0
which I would like to analyse:
> which(mat==1, arr.ind=T)
row col
[1,] 1 1
[2,] 3 1
[3,] 3 2
[4,] 1 3
[5,] 2 3
Now the question is: how can I avoid considering duplicated cells? As the resulting index matrix shows, rows 2 and 4 point to (3,1) and (1,3) respectively, which refer to the same cell.
How do I get around this? I only need one reference for each cell, even though the matrix is symmetric. Is there an easy way to deal with such situations?
EDIT:
I was thinking about using upper.tri or lower.tri, but in this case what I get is a vector version of the matrix, and I am not able to get back to the (row, col) notation.
> which(mat[upper.tri(mat)]==1, arr.ind=T)
[1] 2 3
EDIT II
The expected output would be something like a unique over the pairs (row, col) and (col, row):
row col
[1,] 1 1
[2,] 3 1
[3,] 3 2
Since you have a symmetric matrix, you could do
which(mat == 1 & upper.tri(mat, diag = TRUE), arr.ind = TRUE)
# row col
#[1,] 1 1
#[2,] 1 3
#[3,] 2 3
OR
which(mat == 1 & lower.tri(mat, diag = TRUE), arr.ind = TRUE)
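An alternative that matches the "unique over the pairs (row, col) and (col, row)" phrasing in EDIT II more literally (a sketch; it does not rely on symmetry) is to sort each index pair and drop the duplicates:
idx <- which(mat == 1, arr.ind = TRUE)
idx[!duplicated(t(apply(idx, 1, sort))), ]   # keep one representative per unordered pair
# row col
#[1,] 1 1
#[2,] 3 1
#[3,] 3 2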
I'm looking to generate all possible binary vectors of length n in R. What is the best way (preferably both computationally efficient and readable code) to do this?
n = 3
expand.grid(replicate(n, 0:1, simplify = FALSE))
# Var1 Var2 Var3
#1 0 0 0
#2 1 0 0
#3 0 1 0
#4 1 1 0
#5 0 0 1
#6 1 0 1
#7 0 1 1
#8 1 1 1
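If you would rather have a plain 0/1 matrix than a data frame with Var* columns, a small follow-up step does it (a sketch based on the same call):
m <- as.matrix(expand.grid(replicate(n, 0:1, simplify = FALSE)))
dimnames(m) <- NULL   # drop the Var1/Var2/Var3 and row names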
Inspired by this question about generating all possible binary vectors of length n containing fewer than m 1s, I've extended that code to produce all possible combinations. It's not pretty, though.
> n <- 3
> z <- rep(0, n)
> do.call(rbind, lapply(0:n, function(i) t(apply(combn(1:n,i), 2, function(k) {z[k]=1;z}))))
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 0 1
[5,] 1 1 0
[6,] 1 0 1
[7,] 0 1 1
[8,] 1 1 1
What is it doing? Once we strip it back, the core of this one-liner is the following:
apply(combn(1:n,i), 2, function(k) {z[k]=1;z})
To understand this, let's step back one level further. The function combn(x,m) generates all possible combinations of x taken m at a time.
> combn(1:n, 1)
[,1] [,2] [,3]
[1,] 1 2 3
> combn(1:n, 2)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 3 3
> combn(1:n, 3)
[,1]
[1,] 1
[2,] 2
[3,] 3
With apply(MARGIN=2), we pass one column of this matrix at a time to our inner function function(k) {z[k]=1;z}, which simply sets the values at the indices k to 1. Since z starts out as all 0s, this gives us each possible binary vector.
The rest is just window dressing. combn gives us a wide, short matrix; we transpose it with t. lapply returns a list; we bind the matrices in each element of the list together with do.call(rbind, .).
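Written out step by step instead of as a one-liner, the same logic looks like this (just unrolled for readability; the result is identical):
n <- 3
z <- rep(0, n)
rows <- lapply(0:n, function(i) {
  sets <- combn(1:n, i)                              # all index sets of size i
  t(apply(sets, 2, function(k) { z[k] <- 1; z }))    # one 0/1 row per index set
})
do.call(rbind, rows)                                 # stack all 2^n rows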
You should define what "the best way" means (fastest? shortest code? etc.).
One way is to use the R.utils package and its intToBin function for converting decimal numbers to binary. See the example:
require(R.utils)
n <- 5
strsplit(intToBin(0:(2 ^ n - 1)), split = "")
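Note that intToBin returns character strings, so strsplit gives lists of "0"/"1" characters; if numeric vectors are needed, one extra step converts them (a sketch):
lapply(strsplit(intToBin(0:(2 ^ n - 1)), split = ""), as.numeric)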
I am working in R and am trying to take a dataframe with 7 columns and create a 2 column dataframe with all the combinations of each row's responses stacked on top of each other.
For example if I had:
1,0,1,0
I would want to transform that row to the rows
1,0
1,1
1,0
0,1
0,0
1,0
And do that to every row in the dataframe and stack them.
I know how to do this for 1 row at a time
df2<-combn(df[1,],2)
That code will get me the combinations of one row like my example above; however, I can't figure out how to apply it to all rows. My best guess would be something along the lines of
df3<-apply(1:nrow(df), 1, function(x) combn(df[x,],2))
However, I am getting the "dim(x) must have a positive length" error. Does anyone know what my problem is, and can you explain what I am doing wrong and why? I am new to coding in R beyond base functions. As far as the data goes, it's just binary data.
Since you didn't provide sample data, I'll use some made-up data. (The error in your attempt comes from giving apply a plain vector, 1:nrow(df), when apply expects a matrix or array; for iterating over an index vector, lapply or sapply is the right tool.) You can use lapply to compute the combinations for each row as a list element and then use do.call with rbind to bind them all together:
set.seed(1)
df <- data.frame(col1 = sample(1:4), col2 = sample(1:4), col3 = sample(1:4), col4 = sample(1:4))
df3 <- do.call(rbind, lapply(1:nrow(df), function(i) t(combn(df[i,],2))))
head(df3)
# [,1] [,2]
#[1,] 2 1
#[2,] 2 3
#[3,] 2 3
#[4,] 1 3
#[5,] 1 3
#[6,] 3 3
Note that df3 will be returned as a matrix.
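If you want the result as the two-column data frame described in the question rather than a matrix, one option (a sketch; it converts each row to a plain vector with unlist before calling combn) is:
df3 <- as.data.frame(do.call(rbind,
         lapply(1:nrow(df), function(i) t(combn(unlist(df[i, ]), 2)))))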
Something like this:
df <- data.frame(a = c(1,0), b = c(0,1), c = c(1,0), d = c(0,1))
matrix(c(apply(df[1:2,],1, function(x) combn(x,2))), ncol = 2, byrow = T)
[,1] [,2]
[1,] 1 0
[2,] 1 1
[3,] 1 0
[4,] 0 1
[5,] 0 0
[6,] 1 0
[7,] 0 1
[8,] 0 0
[9,] 0 1
[10,] 1 0
[11,] 1 1
[12,] 0 1
I am supposed to take a square matrix which represents a graph (the vertices-and-edges kind) and change it into a list that represents the same graph:
square matrix: element (i,j) = 1 means there is an edge i -> j
list: element i is a vector (possibly empty, coded as NA) of all j s.t. there is an edge i -> j
My problem is that if there is a zero in the middle of a row, it returns an NA, and it is only supposed to do that when a vector is empty (no edges). It only does this when a zero is between two 1's. I don't know why, and na.omit doesn't work.
This is my first time programming in R.
squaretolist <- function(m){
  ml <- list()               # empty list that we will return at the end
  for(i in 1:ncol(m)){       # loop over rows (the matrix is square, so ncol == nrow)
    b1 <- c()
    for(j in 1:nrow(m)){     # loop over the columns of row i
      ifelse(m[i,j] %in% 1, b1[j] <- j, next)
    }
    ifelse(length(b1) == 0, ml[[i]] <- NA, ml[[i]] <- b1)
  }
  return(ml)
}
In your function, if you have a zero in between two 1s, for example a 1 in the 1st position and another in the 3rd position, you assign 1 to b1[1] and 3 to b1[3], but, since there is a 0 in the 2nd position, you never assign anything to b1[2], so it becomes NA.
To avoid that, you can replace ifelse(m[i,j] %in% 1, b1[j] <- j, next)
by ifelse(m[i,j] %in% 1, b1 <- c(b1,j), next).
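Putting that single change back into your function, for reference (everything else is left as you wrote it):
squaretolist <- function(m){
  ml <- list()               # empty list that we will return at the end
  for(i in 1:ncol(m)){       # loop over rows (the matrix is square, so ncol == nrow)
    b1 <- c()
    for(j in 1:nrow(m)){     # loop over the columns of row i
      ifelse(m[i,j] %in% 1, b1 <- c(b1,j), next)
    }
    ifelse(length(b1) == 0, ml[[i]] <- NA, ml[[i]] <- b1)
  }
  return(ml)
}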
You can also get what you want using the grep and apply functions:
ml <- apply(m, 1, function(i) {if(any(i==1)) grep(1, i) else NA})
This instruction tells R to apply, to each row of the matrix m, a function that returns the position(s) of the 1s if there is at least one, and NA otherwise.
Example:
set.seed(123)
m<-matrix(sample(c(0,1),25,replace=T),nrow=5)
m[4,]<-rep(0,5)
# > m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0 1 1 1
# [2,] 1 1 0 0 1
# [3,] 0 1 1 0 1
# [4,] 0 0 0 0 0
# [5,] 1 0 0 1 1
ml<-apply(m,1,function(i){if(any(i==1)) grep(1,i) else NA})
# > ml
# [[1]]
# [1] 3 4 5
# [[2]]
# [1] 1 2 5
# [[3]]
# [1] 2 3 5
# [[4]]
# [1] NA
# [[5]]
# [1] 1 4 5
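One small caveat: apply simplifies its output, so if every row happened to contain the same number of 1s, the result would come back as a matrix rather than a list. Looping with lapply over the row indices always returns a list (a sketch of the same logic):
ml <- lapply(seq_len(nrow(m)), function(r) { if(any(m[r,] == 1)) grep(1, m[r,]) else NA })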
I have a single vector (call it t1) with a series of observations. I want to create a set of new vectors by popping the first observation from t1 (and so on for subsequent near-copies). But I want to keep the vectors the same length so I can add them to a data frame later.
I was able to make it work as follows:
t1 <- c(1, 2, 3)
t2 <- t1[-1]
t3 <- t2[-1]
t2[length(t2)+1] <- 0
t3[length(t3)+1] <- 0
t3[length(t3)+1] <- 0
t.all <- cbind(as.data.frame(t1), as.data.frame(t2), as.data.frame(t3))
t.all
t1 t2 t3
1 1 2 3
2 2 3 0
3 3 0 0
But this is clumsy and it's going to be tedious if I want to create a large number of columns. How can I keep the vectors the same length (or solve this problem another way)?
Here is a looping version of what you are trying to do, using do.call and lapply:
cbind(t1, do.call(cbind, lapply(seq_along(t1) - 1,
                                function(x) c(tail(t1, -x), rep(0, x)))))
t1
[1,] 1 2 3
[2,] 2 3 0
[3,] 3 0 0
> t.all <- sapply(0:2, function(x) c( t1[(x+1):3], rep(0,x) ) )
> t.all
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 0
[3,] 3 0 0
If you need it to be a data.frame, it would be a lot more efficient to build it as a matrix first and then wrap as.data.frame around the final result.
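The same idea without the hard-coded length 3, so it works for a t1 of any length (a sketch):
n <- length(t1)
t.all <- sapply(0:(n - 1), function(x) c(t1[(x + 1):n], rep(0, x)))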
Here's another way using vector indexing:
t1 <- c(2,5,3)
mm <- do.call(rbind, lapply(seq_along(t1), function(x) t1[x:length(t1)][1:length(t1)]))
# [,1] [,2] [,3]
# [1,] 2 5 3
# [2,] 5 3 NA
# [3,] 3 NA NA
mm[is.na(mm)] <- 0
# [,1] [,2] [,3]
# [1,] 2 5 3
# [2,] 5 3 0
# [3,] 3 0 0
Another way, without using the apply family:
t1 <- c(2,5,4,6)
len <- length(t1)
matrix(t1[outer(1:len, 0:(len-1), '+')], ncol=len)
# [,1] [,2] [,3] [,4]
# [1,] 2 5 4 6
# [2,] 5 4 6 NA
# [3,] 4 6 NA NA
# [4,] 6 NA NA NA
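As in the previous answer, the NAs can then be replaced by zeros in one step (assuming the matrix above has been stored in a variable, here called m2 just for illustration):
m2 <- matrix(t1[outer(1:len, 0:(len-1), '+')], ncol=len)
m2[is.na(m2)] <- 0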
How about creating a matrix column by column, recycling t1 as desired:
tmat <-cbind(t1,t1,t1,t1,....) # as many as needed
Then just use a matrix triangle function
newmat<- tmat * upper.tri(tmat,diag=TRUE)
That's offset from your sample, but contains the same info per row.
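To avoid typing cbind(t1,t1,t1,...) by hand, the same recycled matrix can be built with matrix() (a sketch of the same idea):
tmat <- matrix(t1, nrow = length(t1), ncol = length(t1))   # each column is a copy of t1
newmat <- tmat * upper.tri(tmat, diag = TRUE)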
Most of the other answers focus on creating the final data.frame. If that is your ultimate goal, then they provide good approaches. This answer instead focuses narrowly on your question of how to take the first element off and preserve the length. In order to keep things tidy, it is best to do the whole thing in one function.
shift <- function(tx) {append(tx[-1],0)}
Then you can have
t1 <- c(1, 2, 3)
t2 <- shift(t1)
t3 <- shift(t2)
t.all <- data.frame(t1, t2, t3)
which gives you the same result you had.
> t.all
t1 t2 t3
1 1 2 3
2 2 3 0
3 3 0 0
If you want to combine this function with a looping construct to create the data.frame, it is easiest to go through a matrix first.
t.all <- matrix(t1, nrow=length(t1), ncol=length(t1))
lapply(seq(length=length(t1))[-1], function(i) {
  t.all[,i] <<- shift(t.all[,(i-1)])
})
t.all <- as.data.frame(t.all)
which gives the same data.frame, but with slightly different column names
> t.all
V1 V2 V3
1 1 2 3
2 2 3 0
3 3 0 0
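If you want the original t1/t2/t3 names back, they can be set afterwards (a minor finishing touch):
names(t.all) <- paste0("t", seq_along(t.all))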