Getting the biggest connected component in R igraph - r

How do I get a subgraph of the biggest component of a graph?
Say for example I have a graph g.
size_components_g <-clusters(g, mode="weak")$csize
size_components_g
#1 2 3 10 25 2 2 1
max_size <- max(size_components_g)
max_size
#25
So 25 is the biggest size.
I want to extract the component that has these 25 vertices. How do I do that?

The documentation of each function describes its return value in detail. In this case igraph::clusters returns a named list: csize holds the size of each cluster, and membership holds the id of the cluster that each vertex belongs to.
g <- igraph::sample_gnp(20, 1/20)
components <- igraph::clusters(g, mode="weak")
biggest_cluster_id <- which.max(components$csize)
# ids
vert_ids <- igraph::V(g)[components$membership == biggest_cluster_id]
# subgraph
igraph::induced_subgraph(g, vert_ids)
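
For a reproducible check, the same steps can be run on a small graph whose components are known in advance. This is only a minimal sketch: it assumes a recent igraph in which components() is the current name for clusters(), and the toy edge list is made up for illustration.

```r
library(igraph)

# Toy graph (illustrative): a path 1-2-3-4, an edge 5-6, and isolated vertex 7
g <- make_graph(c(1, 2,  2, 3,  3, 4,  5, 6), n = 7, directed = FALSE)

comp <- components(g, mode = "weak")

# Index of the largest component (the first one if several tie)
biggest_id <- which.max(comp$csize)

# Keep only the vertices whose membership equals that index
biggest <- induced_subgraph(g, which(comp$membership == biggest_id))
vcount(biggest)   # 4 vertices: the path 1-2-3-4
ecount(biggest)   # 3 edges
```

Note that which.max returns only the first maximum, so if two components share the largest size, only one of them is extracted.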

Related

cutree alternative to extract cluster with given number of objects

While stats::cutree() takes an hclust object and cuts it into a given number of clusters, I'm looking for a function that takes a given number of elements and attempts to set k accordingly. In other words: return the first cluster with n elements.
For example:
Searching for the first cluster with n = 9 objects.
library(psych)
data(bfi)
x <- bfi
hclust.res <- hclust(dist(abs(cor(na.omit(x)))))
cutree.res <- cutree(hclust.res, k = 2)
cutree.table <- table(cutree.res)
# no cluster with n = 9 elements
> cutree.table
cutree.res
 1  2
23  5
while k = 3 yields
cutree.res <- cutree(hclust.res, k = 3)
cutree.table <- table(cutree.res)
# three clusters, of which cluster 2 contains the required number of objects
> cutree.table
cutree.res
 1  2  3
14  9  5
Is there a more convenient way than iterating over this?
Thanks
You can easily write code for this yourself that makes only one pass over the dendrogram rather than calling cutree in a loop.
Just execute the merges one by one and note the cluster sizes. Then keep the one that you "liked" the best.
Note that there might be no such solution. For example, on the one-dimensional data set -11 -10 +10 +11, cutting the dendrogram in merge order will return clusters with 1, 2, or 4 elements only. So you'll have to handle this case, too.
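
The single pass described above could be sketched as follows. This is only an illustration under stated assumptions: first_k_with_size is a hypothetical helper (not part of stats), it expects n >= 2, and it reports the k at which a cluster of size n first forms when merging bottom-up, which is one possible reading of "first".

```r
# Hypothetical helper: returns the k for cutree() at which a cluster with
# exactly n members first appears (walking the merges bottom-up), or NA.
first_k_with_size <- function(hc, n) {
  n_merges <- nrow(hc$merge)
  sizes <- integer(n_merges)          # sizes[i] = size of cluster made by merge i
  for (i in seq_len(n_merges)) {
    parts <- hc$merge[i, ]
    # negative entry = single observation, positive entry = earlier merge row
    part_sizes <- ifelse(parts < 0, 1L, sizes[abs(parts)])
    sizes[i] <- sum(part_sizes)
  }
  step <- which(sizes == n)[1]
  if (is.na(step)) return(NA_integer_)  # no merge ever produces size n
  (n_merges + 1L) - step                # clusters remaining after `step` merges
}

# The one-dimensional example from the answer
hc <- hclust(dist(c(-11, -10, 10, 11)))
k2 <- first_k_with_size(hc, 2)   # a pair forms at the very first merge
k3 <- first_k_with_size(hc, 3)   # NA: sizes jump from 2 to 4
table(cutree(hc, k = k2))
```

The NA return value covers the "no such solution" case, so the caller can decide whether to fall back to the nearest available size.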

output the names/ids list of node for subgraph (subisomorphisms)

I would like to count the number of the full 3-node subgraphs.
Nodes of the original graph have names. The example code is below.
g <- graph.full(n=5, directed = TRUE)
# adjacency matrices
d3<-matrix(c(0,1,1,1,0,1,1,1,0),nrow=3,ncol=3)
# Turn them into a convenient list
sbgDouble.mat<-list(d3)
# And then into a list of graph objects
sbgDouble.graph<-lapply(sbgDouble.mat, graph.adjacency)
# Count the number of the full 3-node subgraph
subgraph.freq.g<-c()
subgraph.freq.g[1]<-
graph.count.subisomorphisms.vf2(g, sbgDouble.graph[[1]])
#> subgraph.freq.g
# [1] 60
Update: I have tried graph.get.isomorphisms.vf2(g, sbgDouble.graph[[1]]) but result is list().
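Could someone please tell me how to output the list of node names/ids for each 3-node subgraph? Thanks.
graph.get.isomorphisms.vf2() returns an empty list here because the two graphs have different numbers of vertices, so no full isomorphism exists. To see the vertex IDs of each match, use graph.get.subisomorphisms.vf2() instead, for example: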
graph.get.subisomorphisms.vf2(g, sbgDouble.graph[[1]])[1]
[[1]]
+ 3/5 vertices:
[1] 1 2 3
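
To get names rather than numeric ids, the graph needs a name vertex attribute. The following is a minimal sketch, assuming a recent igraph where subgraph_isomorphisms() with method = "vf2" is the current spelling of the function above; the names a to e are made up for illustration.

```r
library(igraph)

g <- make_full_graph(5, directed = TRUE)
V(g)$name <- c("a", "b", "c", "d", "e")     # illustrative names
pattern <- make_full_graph(3, directed = TRUE)

# One vertex sequence of g per mapping of the pattern into g
maps <- subgraph_isomorphisms(pattern, g, method = "vf2")
length(maps)

# as_ids() returns the names when the graph is named
name_list <- lapply(maps, as_ids)

# Mappings onto the same three nodes differ only in order; collapse them
node_sets <- unique(lapply(name_list, sort))
length(node_sets)
```

The raw list contains every ordered mapping, which is why the count is 60; collapsing to unordered node sets leaves choose(5, 3) = 10 distinct 3-node subgraphs.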

R, igraph walktrap.community splits all nodes in the case of a fully connected graph

Using the walktrap.community approach for defining communities within my graph works great - of all the algorithms I tested it performs the best. The caveat is that in the case of a fully connected graph with no self linkages (every node connects to each other node, but not itself) each node is assigned its own community.
I am not experienced in network analysis, but this seems like an interesting case and it's certainly not desired behavior. How can I avoid this splitting in my actual data?
library(igraph)
match.mat = matrix(T, nrow=8, ncol=8)
diag(match.mat)[1:8] = T
topology = which(match.mat, arr.ind=T)
g = graph.data.frame(topology, directed=F)
cm = walktrap.community(g)
membership(cm)
# 2 3 4 5 6 7 8 1
# 1 1 1 1 1 1 1 1
plot(cm, g)
diag(match.mat)[1:8] = F
topology = which(match.mat, arr.ind=T)
g = graph.data.frame(topology, directed=F)
cm = walktrap.community(g)
membership(cm)
#2 3 4 5 6 7 8 1
#1 2 3 4 5 6 7 8
plot(cm, g)
Conceptually I'm not sure how the lack of self linkages would lead to every node being split - maybe possible communities are all tied and therefore split? But the case of all self linkages would seem equivalent in that regard.
Thanks!
http://www-rp.lip6.fr/~latapy/Publis/communities.pdf
As the paper explains, Walktrap builds a node distance measure based on the random walk transition matrix. This transition matrix needs to be ergodic, so the underlying graph needs to be connected and non-bipartite. Non-bipartiteness is achieved by adding self loops to the nodes. Therefore, you need to add a self loop to each node in your graph. It might be a good idea to include this correction in a future version of the igraph package, but as far as I know igraph uses the C implementation by Latapy and Pons, and for that one the graph needs to have self loops. Hope this answers your question!
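
Adding the self loops could look like the sketch below. It assumes a recent igraph with the underscore-style names (make_full_graph, add_edges, which_loop, cluster_walktrap) and uses an 8-node complete graph only as a stand-in for the real data; no output is shown for the membership because the resulting grouping may depend on the igraph version.

```r
library(igraph)

g <- make_full_graph(8)                       # complete graph, no self loops

# add_edges() takes a flat vector of endpoint pairs: 1,1, 2,2, ...
g_loops <- add_edges(g, rep(seq_len(vcount(g)), each = 2))
sum(which_loop(g_loops))                      # number of self loops now present

cm <- cluster_walktrap(g_loops)
membership(cm)   # inspect the grouping after adding the loops
```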

How to compare communities in two consecutive graphs

I have the same graph represented at two different times, g.t0 and g.t1. g.t1 differs from g.t0 by having one additional edge, but keeps the same vertices.
I want to compare the communities in g.t0 and g.t1, that is, to test whether the vertices moved to a different community from t0 to t1. I tried the following
library(igraph)
m <- matrix(c(0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0),nrow=4,ncol=4)
g.t0 <- graph.adjacency(m)
memb.t0 <- membership(edge.betweenness.community(g.t0))
V(g.t0)
# Vertex sequence:
# [1] 1 2 3 4
memb.t0
# [1] 1 2 2 3
g.t1 <- add.edges(g.t0,c(1,2))
memb.t1 <- membership(edge.betweenness.community(g.t1))
V(g.t1)
# Vertex sequence:
# [1] 1 2 3 4
memb.t1
# [1] 1 1 1 2
But of course the problem is that the indexing of the communities always starts from 1. So in the example it seems that all the vertices have moved to a different community, but the most intuitive reading is that only vertex 1 changed community, joining 2 and 3.
How could I approach the problem of counting the number of vertices that changed communities from t0 to t1?
Actually this is not an easy question. In general you need to match the communities in the two graphs, using some rule or criterion that the matching optimizes. As the two graphs can have different numbers of communities, the matching is not necessarily bijective.
Several methods and measures have been proposed for this problem, and a number of them are implemented in igraph, see
http://igraph.org/r/doc/compare.html
compare.communities(memb.t1, memb.t0, method="vi")
# [1] 0.4773856
compare.communities(memb.t1, memb.t0, method="nmi")
# [1] 0.7020169
compare.communities(memb.t1, memb.t0, method="rand")
# [1] 0.6666667
See the references in the igraph manual for the details about the methods.
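
The measures above score overall similarity; they do not say which vertices moved. One simple matching rule, shown here only as a heuristic sketch and not as an igraph feature, is to relabel each t1 community with the t0 community it overlaps most, then compare labels vertex by vertex. The variable names are illustrative and ties are broken by taking the first maximum.

```r
# Membership vectors from the question
memb_t0 <- c(1, 2, 2, 3)
memb_t1 <- c(1, 1, 1, 2)

# Contingency table: rows are t1 communities, columns are t0 communities
overlap <- table(t1 = memb_t1, t0 = memb_t0)

# For each t1 community, the t0 label with the largest overlap
best_t0 <- colnames(overlap)[apply(overlap, 1, which.max)]
names(best_t0) <- rownames(overlap)

# Express the t1 membership in t0 labels and find the vertices that differ
relabelled_t1 <- as.integer(best_t0[as.character(memb_t1)])
changed <- which(relabelled_t1 != memb_t0)
changed           # vertex 1
length(changed)   # 1 vertex changed community
```

With this rule the example gives the intuitive reading from the question: only vertex 1 moved, joining vertices 2 and 3. Because the matching is not necessarily bijective, two t1 communities can map to the same t0 label.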

count cycles in network

What is the best way to count both 3-cycles and 4-cycles in networks, and is anything for this implemented in R?
3-cycles are connected groups of three nodes (triangles), to be counted in one-mode networks.
4-cycles are connected groups of four nodes (squares), to be counted in two-mode networks.
If I have networks like this:
onemode <- read.table(text= "start end
1 2
1 3
4 5
4 6
5 6",header=TRUE)
twomode <- read.table(text= "typa typev
aa a
bb b
bb a
aa b",header=TRUE)
I thought
library(igraph)
g <- graph.data.frame(twomode)
E(g)
graph.motifs(g, size = 4)
would count the number of squares in my two-mode network, but I don't understand the output. I thought the result would be 1.
?graph.motifs
graph.motifs searches a graph for motifs of a given size and returns a
numeric vector containing the number of different motifs. The order of
the motifs is defined by their isomorphism class, see graph.isoclass.
So the output is a numeric vector in which each value is the count of one particular motif (of size 3 or 4) in your graph.
graph.motifs(g,size=4)
To get the total number of the motifs, you can use graph.motifs.no
graph.motifs.no(g,size=4)
[1] 1
That single motif sits at position 20 of the vector returned by graph.motifs:
which(graph.motifs(g,size=4) >0)
[1] 20
Another function that might be easier to use for this task is kcycle.census {sna}. Details: http://svitsrv25.epfl.ch/R-doc/library/sna/html/path.census.html
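
Both counts can also be obtained directly, without reading motif vectors. This is a minimal sketch, assuming a recent igraph (count_triangles, as_adjacency_matrix) and simple undirected graphs without multi-edges; the square count uses a shared-neighbour argument rather than any dedicated igraph function.

```r
library(igraph)

onemode <- read.table(text = "start end
1 2
1 3
4 5
4 6
5 6", header = TRUE)
twomode <- read.table(text = "typa typev
aa a
bb b
bb a
aa b", header = TRUE, stringsAsFactors = FALSE)

g1 <- graph_from_data_frame(onemode, directed = FALSE)
# count_triangles() is per vertex, so each triangle is counted three times
n_triangles <- sum(count_triangles(g1)) / 3

g2 <- graph_from_data_frame(twomode, directed = FALSE)
A <- as_adjacency_matrix(g2, sparse = FALSE)
common <- A %*% A            # common[i, j] = number of shared neighbours of i and j
# Two vertices with c shared neighbours are opposite corners of choose(c, 2)
# squares; each square has two such diagonals, hence the division by 2
n_squares <- sum(choose(common[upper.tri(common)], 2)) / 2

n_triangles   # 1 (nodes 4, 5, 6)
n_squares     # 1 (aa, a, bb, b)
```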

Resources