count number of disconnected sub-networks - r

In igraph is there a function that will return the number of sub-networks that are not connected to each other?
For example, it would return 3 for the network below.
I was pretty sure I had used a function like this in the past, but I can't find anything like it now. There are options for communities and for individual isolates, but nothing that I can find for these disconnected sub-networks.

You are looking for components. For instance,
g <- sample_gnp(20, 1/20)
components(g)$no
# [1] 14
gives the number of them in g.

Generating random configuration model graphs in Julia using iGraph

Recently I started to use iGraph in Julia to generate random configuration models, since LightGraphs has a problem with the realization time of these objects (link to a previous question related to this: random_configuration_model(N,E) takes to long on LightGraphs.jl). To generate such graphs, I generate a vector E (1-based indexed) and from it I generate an iGraph object g as follows
using PyCall, Distributions
ig = pyimport("igraph")
α = 0.625; N = 1000; c = 0.01 * N; p = α / (α + c)
E = zeros(Int64, N)
test = false
while test == false
    # draw N degrees; accept only when their sum is even,
    # otherwise no graph can realize the sequence
    s = 0
    for i in 1:N
        E[i] = rand(NegativeBinomial(α, p))
        s += E[i]
    end
    if iseven(s)
        test = true
    end
end
g = ig.Graph.Realize_Degree_Sequence(E)
My first question is related to the fact that Python is 0-based indexed. Comparing the entries of E to the degrees of g, it seems that ig.Graph.Realize_Degree_Sequence(E) automatically converts the index base, generating a 0-based object g from the 1-based vector E. Is this correct?
Secondly, I would like to force the random configuration graph g to be simple, with no self-loops or multi-edges. igraph's documentation (https://igraph.org/c/doc/igraph-Generators.html#igraph_realize_degree_sequence) says that the flag allowed_edge_types: IGRAPH_SIMPLE_SW does the job, but I am not able to find the syntax to use it from Julia. Is it possible at all to use this flag in Julia?
Be careful with LightGraphs' random_configuration_model. Last time I looked, it was broken and did not sample uniformly, yet the authors outright refused to fix it. I don't know if anything has changed since then.
C/igraph's degree_sequence_game() has a correctly implemented method that samples uniformly, called IGRAPH_DEGSEQ_SIMPLE_NO_MULTIPLE_UNIFORM, but for some reason it is not yet exposed in Python ... we'll look into exposing it soon.
Then you have two options:
Use python-igraph's "simple" method, and keep generating graphs until you get a simple one (test it with Graph.is_simple()). This uses the stub-matching method and will sample exactly uniformly. For large degrees, it will take a long time due to many rejections. Note that this rejection method is exactly what IGRAPH_DEGSEQ_SIMPLE_NO_MULTIPLE_UNIFORM implements (albeit a bit faster).
Use igraph's Graph.Realize_Degree_Sequence() to create one graph with the given degree sequence, then rewire it using Graph.rewire() with a sufficiently large number of rewiring steps (at least several times the edge count); a minimal sketch follows below. This method uses degree-preserving edge switches and can be shown to sample uniformly in the limit of a large number of switches.
The "no_multiple" method in python-igraph will again not sample uniformly.
Take a look at section 2.1 of this paper for a gentle explanation of what techniques are available for uniform sampling.
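To make the second option concrete, here is a minimal python-igraph sketch (the degree sequence and the choice of 10x the edge count as the number of rewiring steps are illustrative assumptions, and it assumes a python-igraph version that exposes Realize_Degree_Sequence):
import igraph as ig

degrees = [3, 3, 2, 2, 1, 1]  # an example graphical degree sequence

# Build one simple realization of the degree sequence, then shuffle it
# with degree-preserving edge switches to approximate uniform sampling
# over simple graphs with this degree sequence.
g = ig.Graph.Realize_Degree_Sequence(degrees)
g.rewire(n=10 * g.ecount())  # several times the edge count, as advised above

assert g.is_simple()
assert sorted(g.degree()) == sorted(degrees)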
You are reading the C docs of igraph. You need the Python documentation: https://igraph.org/python/api/latest/igraph._igraph.GraphBase.html#Degree_Sequence. So:
julia> ige = [collect(e .+ 1) for e in ig.Graph.Degree_Sequence(collect(Any, E), method="no_multiple").get_edgelist()];
julia> extrema(reduce(vcat, ige)) # OK check
(1, 1000)
julia> sg = SimpleGraph(1000)
{1000, 0} undirected simple Int64 graph
julia> for (a, b) in ige
           add_edge!(sg, a, b)
       end
julia> sg # OK check
{1000, 5192} undirected simple Int64 graph
julia> length(ige) # OK check
5192
julia> sort(degree(sg)) == sort(E) # OK check
true
I used the "no_multiple" method, as the "vl" method assumes a connected graph, and some of the node degrees in your graph can be 0.

what does "not strongly connected graph" mean in centiserve centroid computation?

As in the question, I have four different networks which I load from 4 different csv files. Each one fails when I compute the centroid using the centiserve library. On the other hand, if I generate a random ER network, the centroid computation works.
I looked into the centroid function and eventually found that it checks whether the network is connected using this igraph call: is.connected(g, mode="strong")
According to Wikipedia, a graph is strongly connected if every node is reachable from every other node in the network. To check this, I calculated the components of my network using igraph's decompose() function, and all the networks have a single connected component: length(decompose(net)) is always equal to 1. But centroid(net) always returns the error.
So the question is: what exactly is this function looking for when it verifies whether the graph is suitable? Why does my network have a single connected component while igraph's is.connected() returns FALSE?
Some code:
#load file
finalNet <- read.csv("net.csv", sep=",", header=T)
#get network
net <- graph_from_data_frame(finalNet[, c(1, 2)])
#decompose says that there is a single connected component
length(decompose(net))
#while centroid does not work!
centroid(net)
the network is available here
ok, I found the answer. The problem is that the function graph_from_data_frame creates a directed network if not specified otherwise, and a directed network can fail the strong-connectivity check even though decompose(), which uses mode="weak" by default, reports a single component.
Hence, the solution to make my example work is to load the network as not directed:
net <- graph_from_data_frame(finalNet[, c(1, 2)], directed=F)
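To see why directedness matters here, consider a minimal sketch (shown in python-igraph, whose connectivity semantics match the R interface): a single directed edge yields a graph that is weakly but not strongly connected.
import igraph as ig

# One directed edge 0 -> 1: both nodes are reachable if direction is
# ignored (weakly connected), but node 1 cannot reach node 0 following
# edge directions, so the graph is not strongly connected.
g = ig.Graph(edges=[(0, 1)], directed=True)
print(g.is_connected(mode="weak"))    # True
print(g.is_connected(mode="strong"))  # False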

get.inducedSubgraph and loop function

I have a list containing a network for each row (sna.list.1).
For each of the networks, I need to extract a subgraph where only women are included, in order to calculate the density of women-only networks.
I have created a loop function to set vertex attributes
female=vector()
for (i in 1:length(sna.list.1))
  set.vertex.attribute(sna.list.1[[i]], "Female", alter.list.1bis[[i]][, "NIDemo1_c4"])
but when I tried to create the subgraph with get.inducedSubgraph I received a warning message saying "Illegal vertex selection in get.inducedSubgraph". The same formula works if I apply it to just one row/network.
subnetwork2=vector()
for (i in 1:length(sna.list.1))
  subnetwork2[[i]] = get.inducedSubgraph(sna.list.1[[i]], v = which(sna.list.1[[i]] %v% "Female" == "1"))
does anyone have suggestions?
Assuming that "get.inducedSubgraph isolated alters" is a continuation of your issue, it sounds like you were trying to induce a subgraph of size zero, i.e. v=which(sna.list.1[[i]]%v%"Female"=="1") was returning integer(0) for some networks.
Ideally, since the network package supports networks of size zero (no vertices), get.inducedSubgraph() should return a network of size zero in this case, but it does not yet do that.

Rejecting isomorphisms from collection of graphs

I have a collection of 15M (million) DAGs (directed acyclic graphs - directed hypercubes actually) that I would like to remove isomorphisms from. What is the common algorithm for this? Each graph is fairly small, a hypercube of dimension N where N is 3 to 6 (for now), resulting in graphs of 64 nodes each for the N=6 case.
Using networkx and Python, I implemented it like this, which works fine for small sets like 300k (thousand) (runs in a few days' time).
def isIsomorphicDuplicate(hcL, hc):
    """checks if hc is an isomorphism of any of the hc's in hcL
    Returns True if hcL contains an isomorphism of hc
    Returns False if it is not found"""
    # for each cube in hcL, check if hc could be isomorphic
    # if it could be isomorphic, then check if it is
    # if it is isomorphic, then return True
    # if all comparisons have been made already, then it is not an isomorphism and return False
    for saved_hc in hcL:
        if nx.faster_could_be_isomorphic(saved_hc, hc):
            if nx.fast_could_be_isomorphic(saved_hc, hc):
                if nx.is_isomorphic(saved_hc, hc):
                    return True
    return False
One better way to do it would be to convert each graph to its canonical ordering, sort the collection, then remove the duplicates. This bypasses checking each of the 15M graphs in a pairwise is_isomorphic() test. I believe the above implementation is something like O(N!·N) (not taking the isomorphism-test time into account), whereas converting everything to a canonical ordering should take O(N), sorting O(N·log(N)), and removing duplicates O(N). O(N!·N) >> O(N·log(N))
I found this paper on Canonical graph labeling, but it is very tersely described with mathematical equations, no pseudocode: "McKay's Canonical Graph Labeling Algorithm" - http://www.math.unl.edu/~aradcliffe1/Papers/Canonical.pdf
tl;dr: I have an impossibly large number of graphs to check via pairwise isomorphism checking. I believe the common way this is done is via canonical ordering. Do any packaged algorithms, or published straightforward-to-implement algorithms (i.e. with pseudocode), exist?
Here is a breakdown of McKay's Canonical Graph Labeling Algorithm, as presented in the paper by Hartke and Radcliffe [link to paper].
I should start by pointing out that an open source implementation is available here: nauty and Traces source code.
Ok, let's do this! Unfortunately this algorithm is heavy in graph theory, so we need some terms. First I will start by defining isomorphic and automorphic.
Isomorphism:
Two graphs are isomorphic if they are the same, except that the vertices are labelled differently. The following two graphs are isomorphic.
Automorphic:
Two graphs are automorphic if they are completely the same, including the vertex labeling. The following two graphs are automorphic. This seems trivial, but turns out to be important for technical reasons.
Graph Hashing:
The core idea of this whole thing is to have a way to hash a graph into a string, then for a given graph you compute the hash strings for all graphs which are isomorphic to it. The isomorphic hash string which is alphabetically (technically lexicographically) largest is called the "Canonical Hash", and the graph which produced it is called the "Canonical Isomorph", or "Canonical Labelling".
With this, to check if any two graphs are isomorphic you just need to check if their canonical isomorphs (or canonical labellings) are equal (i.e. are automorphs of each other). Wow, jargon! Unfortunately this is even more confusing without the jargon :-(
The hash function we are going to use is called i(G) for a graph G: build a binary string by looking at every pair of vertices in G (in order of vertex label) and put a "1" if there is an edge between those two vertices, a "0" if not. This way the j-th bit in i(G) represents the presence or absence of the edge between the j-th pair of vertices.
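To make this concrete, here is a small brute-force sketch in Python (my own illustration, not code from the paper; the function names are made up): it computes i(G) for every relabelling and keeps the lexicographically largest string, which is what McKay's algorithm finds far more cleverly.
from itertools import combinations, permutations

import networkx as nx

def hash_string(G, order):
    # i(G) under one vertex ordering: one bit per vertex pair, "1" if
    # the pair is an edge, "0" if not.
    return "".join(
        "1" if G.has_edge(order[i], order[j]) else "0"
        for i, j in combinations(range(len(order)), 2)
    )

def canonical_hash_bruteforce(G):
    # Try all n! relabellings and keep the largest string. This is
    # exponential and only feasible for tiny graphs; McKay's algorithm
    # prunes this search instead of enumerating it.
    return max(hash_string(G, p) for p in permutations(G.nodes()))

# Two isomorphic graphs produce the same canonical hash:
G1 = nx.path_graph(4)
G2 = nx.relabel_nodes(G1, {0: 3, 1: 1, 2: 0, 3: 2})
assert canonical_hash_bruteforce(G1) == canonical_hash_bruteforce(G2)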
McKay's Canonical Graph Labeling Algorithm
The problem is that for a graph on n vertices, there are O(n!) possible isomorphic hash strings based on how you label the vertices, and many many more if we have to compute the same string multiple times (i.e. automorphs). In general we have to compute every isomorph hash string in order to find the biggest one; there's no magic short-cut. McKay's algorithm is a search algorithm that finds this canonical isomorph faster by pruning all the automorphs out of the search tree, forcing the vertices in the canonical isomorph to be labelled in increasing degree order, and a few other tricks that reduce the number of isomorphs we have to hash.
(1) Sect 4: the first step of McKay's is to sort vertices according to degree, which prunes out the majority of isomorphs to search, but is not guaranteed to be a unique ordering since there may be more than one vertex of a given degree. For example, the following graph has 6 vertices; verts {1,2,3} have degree 1, verts {4,5} have degree 2 and vert {6} has degree 3. Its partial ordering according to vertex degree is {1,2,3|4,5|6} (a small sketch computing this partition appears after step (3)).
(2) Sect 5: Impose artificial symmetry on the vertices which were not distinguished by vertex degree; basically we take one of the groups of vertices with the same degree, and in turn pick one at a time to come first in the total ordering (fig. 2 in the paper). So in our example above, the node {1,2,3|4,5|6} would have children { {1|2,3|4,5|6}, {2|1,3|4,5|6}, {3|1,2|4,5|6} } by expanding the group {1,2,3}, and also children { {1,2,3|4|5|6}, {1,2,3|5|4|6} } by expanding the group {4,5}. This splitting can be done all the way down to the leaf nodes, which are total orderings like {1|2|3|4|5|6} that describe a full isomorph of G. This allows us to take the partial ordering by vertex degree from (1), {1,2,3|4,5|6}, and build a tree listing all candidates for the canonical isomorph -- which is already WAY fewer than n! combinations since, for example, vertex 6 will never come first. Note that McKay evaluates the children in a depth-first way, starting with the smallest group first; this leads to a deeper but narrower tree which is better for online pruning in the next step. Also note that each total ordering leaf node may appear in more than one subtree; that's where the pruning comes in!
(3) Sect. 6: While searching the tree, look for automorphisms and use that to prune the tree. The math here is a bit above me, but I think the idea is that if you discover that two nodes in the tree are automorphisms of each other then you can safely prune one of their subtrees because you know that they will both yield the same leaf nodes.
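To make step (1) concrete, here is a tiny networkx sketch computing the initial degree partition; since the figure is not reproduced here, the edge list is my own choice matching the degrees stated above.
import networkx as nx

# A 6-vertex graph where {1,2,3} have degree 1, {4,5} have degree 2
# and {6} has degree 3, as in the example for step (1).
G = nx.Graph([(1, 6), (2, 6), (3, 4), (4, 5), (5, 6)])

def degree_partition(G):
    # Group vertices into cells by degree, cells ordered by increasing
    # degree -- the partial ordering that seeds McKay's search tree.
    cells = {}
    for v in G:
        cells.setdefault(G.degree(v), []).append(v)
    return [sorted(cells[d]) for d in sorted(cells)]

print(degree_partition(G))  # [[1, 2, 3], [4, 5], [6]], i.e. {1,2,3|4,5|6}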
I have only given a high-level description of McKay's, the paper goes into a lot more depth in the math, and building an implementation will require an understanding of this math. Hopefully I've given you enough context to either go back and re-read the paper, or read the source code of the implementation.
This is indeed an interesting problem.
I would approach it from the adjacency matrix angle. Two isomorphic graphs will have adjacency matrices where the rows / columns are in a different order. So my idea is to compute for each graph several matrix properties which are invariant to row/column swaps, off the top of my head:
numVerts, min, max, sum/mean, trace (probably not useful if there are no reflexive edges), norm, rank, min/max/mean column/row sums, min/max/mean column/row norm
and any pair of isomorphic graphs will be the same on all properties.
You could make a hash function which takes in a graph and spits out a hash string like
string hashstr = str(numVerts)+str(min)+str(max)+str(sum)+...
then sort all graphs by hash string and you only need to do full isomorphism checks for graphs which hash the same.
Given that you have 15 million graphs on 36 nodes, I'm assuming that you're dealing with weighted graphs; for unweighted undirected graphs this technique will be way less effective.
This is an interesting question which I do not have an answer for! Here are my two cents:
By 15M do you mean 15 MILLION undirected graphs? How big is each one? Any properties known about them (trees, planar, k-trees)?
Have you tried minimizing the number of checks by detecting false positives in advance? Something like computing and comparing numbers such as vertex and edge counts, degrees and degree sequences, in addition to other heuristics to test whether two given graphs are NOT isomorphic. Also, check out nauty. It may be your way to check them (and generate canonical orderings).
If all your graphs are hypercubes (like you said), then this is trivial: All hypercubes with the same dimension are isomorphic, hypercubes with different dimension aren't. So run through your collection in linear time and throw each graph in a bucket according to its number of nodes (for hypercubes: different dimension <=> different number of nodes) and be done with it.
since you mentioned that smaller groups of ~300k graphs can be checked for isomorphism, I would try to split the 15M graphs into groups of ~300k graphs and run the test for isomorphism on each group
say each graph is Gi := (Vi, Ei) (vertices, edges)
(1) create buckets of graphs such that the n-th bucket contains only graphs with |V|=n
(2) for each bucket created in (1) create subbuckets such that the (n,m)-th subbucket contains only graphs such that |V|=n and |E|=m
(3) if the groups are still too large, sort the nodes within each graph by their degrees (the number of edges incident to the node), create a vector from it, and distribute the graphs by this vector
example for (3):
assume 4 nodes V = {v1, v2, v3, v4}. Let d(v) be v's degree, with d(v1)=3, d(v2)=1, d(v3)=5, d(v4)=4; then find < := transitive hull( { (v2,v1), (v1,v4), (v4,v3) } ) and create a vector depending on the degrees and the order, which leaves you with
(1,3,4,5) = (d(v2), d(v1), d(v4), d(v3)) = d( {v2, v1, v4, v3} ) = d(<)
now you have divided the 15M graphs into buckets where each bucket has the following characteristics:
n nodes
m edges
each graph in the group has the same 'out-degree-vector'
I assume this to be fine-grained enough if you are not expecting to find too many isomorphisms
cost so far: O(n) + O(n) + O(n*log(n))
(4) now you can assume that members inside each bucket are likely to be isomorphic. You can run your isomorphism check on the bucket and only need to compare the currently tested graph against the representatives you have already found within this bucket; by assumption there shouldn't be too many, so I assume this to be quite cheap.
at step (4) you can also happily distribute the computation to several compute nodes, which should really speed up the process
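A minimal Python sketch of this bucketing scheme using networkx (the invariant choice and names are mine): bucket by cheap relabelling-invariant keys, then run full isomorphism checks only against representatives in the same bucket.
from collections import defaultdict

import networkx as nx

def invariant_key(G):
    # Cheap properties invariant under relabelling, per steps (1)-(3):
    # node count, edge count and the sorted degree sequence.
    degs = tuple(sorted(d for _, d in G.degree()))
    return (G.number_of_nodes(), G.number_of_edges(), degs)

def deduplicate(graphs):
    # Keep one representative per isomorphism class, as in step (4).
    buckets = defaultdict(list)
    for G in graphs:
        reps = buckets[invariant_key(G)]
        if not any(nx.is_isomorphic(G, H) for H in reps):
            reps.append(G)
    return [G for reps in buckets.values() for G in reps]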
Maybe you can just use McKay's implementation? It is found here now: http://pallini.di.uniroma1.it/
You can convert your 15M graphs to the compact graph6 format (or sparse6) which nauty uses and then run the nauty tool labelg to generate the canonical labels (also in graph6 format).
For example - removing isomorphic graphs from a set of random graphs:
#gnp.py
import networkx as nx
for i in range(100000):
    graph = nx.gnp_random_graph(10, 0.1)
    print(nx.generate_graph6(graph, header=False))
[nauty25r9]$ python gnp.py > gnp.g6
[nauty25r9]$ cat gnp.g6 |./labelg |sort |uniq -c |wc -l
>A labelg
>Z 10000 graphs labelled from stdin to stdout in 0.05 sec.
710

Extracting subgraphs (cliques) within graph on Netlogo

I have a NetLogo question. I have some graph structures of nodes connected with (undirected) links. I need to figure out the smallest subgraph within one of these structures. Basically, by subgraph I mean a set of nodes that are all connected to each other. So if I have a structure of 5 nodes and node 1 is connected to 2 and 3, node 2 to 3, 1 and 4, and node 3 to 1, 2 and 5, I need to detect the subgraph of nodes 1, 2 and 3, since they're all interconnected.
Is there an easy way to do this or is it basically not computationally possible?
Edit: I figured out that if I use the netlogo extension nw I can use the nw:maximal-cliques method to calculate what I want. Although now I have another problem. I'm trying to fill up a list of the lists of the cliques this way
let lista-cliques [nw:maximal-cliques] of turtles with [guild = g]
lista-cliques is usually of length two, but the first element, which should be a list of the turtles in the clique, is a list like this
[[[nobody] [nobody] [nobody] [nobody]...etc
with a length of 300, when the graphs made by turtles with guild = g are around 2-8 turtles in size. Is the call to nw:maximal-cliques correct?
Any ideas of what am I doing wrong?
Edit 2: I figured how to fix the length of the list by doing this
let lista-cliques (list ([nw:maximal-cliques] of turtles with [guild = g]))
Now the list is not 300 items long but equal to the number of nodes on the graph with guild = g.
That means that
length item 1 lista-cliques
is equal to
count turtles with [guild = g]
which is also evidently wrong, since I can see graphs with nodes connected to only one or two other nodes. I think I'm getting closer, but I don't know why nw:maximal-cliques is not creating a list of the maximal cliques but a list of all the nodes on the graph.
Any ideas?
Thanks
Your usage of nw:maximal-cliques is not quite correct.
I think that what you are trying to express by specifying of turtles with [guild = g] is something like "taking into account only the turtles that are part of guild g", but what it actually means to NetLogo is "run the reporter that precedes of for each turtle that is part of guild g and make a list out of that". (Just like, e.g., [color] of turtles will run the [color] reporter block once for each turtle and build a list of colors from the results.)
nw:maximal-cliques is a primitive that operates on the whole network, so you don't want to run it once for each turtle. And just like most primitives in the nw extension, you need to tell it which turtles and links to operate on by using the nw:set-snapshot primitive.
I think you can achieve what you want by simply doing:
nw:set-snapshot (turtles with [guild = g]) links
let lista-cliques nw:maximal-cliques
(Note that nw:set-snapshot takes a static "picture" of your network on which further calls to nw primitives operate. If something changes in your network, you need to call nw:set-snapshot to take a new picture. This will probably change in a future version of the extension.)
