Isn't it true that weakly_connected_components in Julia's LightGraphs should return components such that, if the DiGraph is turned into an undirected graph, each component is connected?
I have tried this and I do not get such components. As an example, I tried it on the political blogs data as an undirected network:
data=readdlm(path,',',Int64) #contains edges in each row
N_ = length(unique(vcat(data[:,1],data[:,2]))) ##to get number of vertices
network = LightGraphs.DiGraph(N_)
#construct the network
for i in 1:size(data,1)
add_edge!(network, Edge(data[i,1], data[i,2]))
end
#largest weakly connected component
net = weakly_connected_components(network)[1]
temp_net,vmap = induced_subgraph(network, net)
and after getting the largest weakly connected component, I see the following:
isempty([i for i in vertices(temp_net) if isempty(edges(temp_net).adj[i])])
julia>false
which signifies that some nodes have no incoming or outgoing edges.
What can be the problem? I am using the latest release 6, but the LightGraphs package tests seem to be working.
In addition to what #dan-getz said, I must implore you not to access any internal data fields of structures - we have accessors for everything that's "public". Specifically, edges(temp_net).adj is not guaranteed to be available. It's currently the same as fadj(g), the forward adjacency list of g, for both directed and undirected graphs, but it's not intended to be used except to help keep edge iteration state.
If you use .adj, your code will break on you without warning at some point.
The TL;DR answer is that edges(temp_net).adj[i] contains only the vertices i connects to, and not those connecting to i. And some vertices have no incoming edges.
The longer version is the following, which shows that temp_net, built from a randomly generated network and assigned as in the question, is indeed weakly connected. First, build a random network:
julia> using LightGraphs
julia> N_ = 1000 ;
julia> network = LightGraphs.DiGraph(N_)
{1000, 0} directed simple Int64 graph
julia> using Distributions
julia> for i in 1:N_
add_edge!(network, sample(1:N_,2)...)
end
julia> net = weakly_connected_components(network)[1]
julia> temp_net,vmap = induced_subgraph(network, net)
({814, 978} directed simple Int64 graph, [1, 3, 4, 5, 6, 9, 10, 11, 12, 13 … 989, 990, 991, 993, 995, 996, 997, 998, 999, 1000])
And, now, we have:
julia> length(vertices(temp_net))
814
julia> invertices = union((edges(temp_net).adj[i] for i in vertices(temp_net))...);
julia> outvertices = [i for i in vertices(temp_net) if !isempty(edges(temp_net).adj[i])] ;
julia> length(union(invertices,outvertices))
814
Thus all 814 vertices have edges from them, to them, or both, and are part of the weakly connected component.
I am working with networkx 2.5 in a Jupyter notebook (Python 3.6.5), on an undirected graph. I have a graph on which I would like to perform the following steps iteratively:
Remove one edge,
Perform some calculations over the remaining graph (number of connected components, diameter and nodes)
Put back the removed edge
Remove another edge
Repeat the whole procedure until all the edges were deleted once
For instance, I have the following graph:
'target' : [3, 2 ,4]})
nodes2 = pd.DataFrame({'nodeS' : [1, 2, 3 , 4],
'density' : [1, 2, 4, 2],
'indiv' : [2, 4, 1, 5]})
G1 = nx.from_pandas_edgelist(edges2, 'source', 'target')
nx.set_node_attributes(G1, pd.Series(nodes2.density, index=nodes2.nodeS).to_dict(), "density")
nx.set_node_attributes(G1, pd.Series(nodes2.indiv, index=nodes2.nodeS).to_dict(), 'indiv')
I plotted the first graph (G1)
The result looks like
Then I make the calculations over the graph:
# largest connected component (it will be necessary for the first calculation after first edge removal)
G1_LCC = G1.subgraph(max(nx.connected_components(G1), key=len))
# number of connected components
G1_cc = nx.number_connected_components(G1)
# diameter of largest connected component
G1_diam = nx.diameter(G1_LCC)
# number of connected nodes
G1_nodes = len(nx.nodes(G1_LCC))
Removing an edge:
I tried the next:
e = (1,2)
G2= nx.remove_edge(e)
But it gives me an error:
nx.edges_iter(G)
AttributeError: module 'networkx' has no attribute 'remove_edge'
This should look like (G2)
Then perform operations on G2
Remove an edge which would look like that:
The third G3, (G3)
and so on, until all the edges have been removed once.
Note: I tried with edges iter
nx.edges_iter(G)
but I also got an error
AttributeError: module 'networkx' has no attribute 'edges_iter'
Is there an iterative, efficient way to remove each edge, run the calculations, and put the edge back, rather than doing it by hand one edge at a time? Thank you.
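remove_edge is a method of the Graph class, not a top-level function in the library. It takes the two endpoints as separate arguments, modifies the graph in place, and returns None, so G1.remove_edge(e).copy() would fail. To keep G1 intact, copy first and then remove: G2 = G1.copy() followed by G2.remove_edge(*e) (you may not need the copy if you are adding the edge back in at a later point). edges_iter was also a method of Graph, but it was removed in NetworkX 2.0 in favor of edges. So write for e in G1.edges(), not for e in nx.edges_iter(G1).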
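The remove, compute, restore loop from the question can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the function name edge_removal_stats and the three-edge example graph are made up for illustration (the question's full edge list is not shown), and it assumes an undirected nx.Graph with at least one node.

```python
import networkx as nx


def edge_removal_stats(G):
    """For each edge: remove it, measure the graph, then put it back.

    Hypothetical helper; G is modified during the loop but restored at the end.
    """
    results = {}
    # Snapshot the edges first, because G changes while we iterate.
    for u, v, data in list(G.edges(data=True)):
        G.remove_edge(u, v)
        # Largest connected component of the graph without this edge.
        lcc = G.subgraph(max(nx.connected_components(G), key=len))
        results[(u, v)] = {
            "components": nx.number_connected_components(G),
            "lcc_diameter": nx.diameter(lcc),
            "lcc_nodes": lcc.number_of_nodes(),
        }
        # Restore the edge together with its attributes.
        G.add_edge(u, v, **data)
    return results


# Illustrative graph only: the path 3 - 1 - 2 - 4.
G = nx.Graph([(1, 3), (1, 2), (2, 4)])
stats = edge_removal_stats(G)
for edge, values in stats.items():
    print(edge, values)
```

Node attributes such as density and indiv are untouched, because only edges are removed and re-added. Working in place avoids copying the whole graph once per edge, which matters for larger graphs.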
I've been looking for packages using which I could create subgraphs with overlapping vertices.
From what I understand in Networkx and metis one could partition a graph into two or multi-parts. But I couldn't find how to partition into subgraphs with overlapping nodes.
Suggestions on libraries that support partitioning with overlapping vertices will be really helpful.
EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.
import networkx as nx
from cdlib import algorithms
if __name__ == '__main__':
    g = nx.karate_club_graph()
    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # Clustering parameters
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)
Output:
ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118
From the communities returned, I understand that communities 1 and 3 overlap in at least 4 nodes, but 2 and 3, and 1 and 2, do not.
It is not clear to me how the overlap threshold (4 overlapping nodes) has to be specified in algorithms.angel(g, threshold=4, min_community_size=10). I tried setting threshold=4 to define an overlap size of 4 nodes. However, the documentation available for angel says:
:param threshold: merging threshold in [0,1].
I am not sure how to translate the 4 overlaps to the value that has to be set between the bounds [0, 1]. Suggestions will be really helpful.
You can check out CDLIB:
They have a large number of community detection algorithms applicable to NetworkX graphs, including some overlapping-community algorithms.
On a side note:
The return type of these functions is called NodeClustering, which might be a little confusing at first; its documentation lists the methods applicable to it. Usually you simply want to convert the result to a Python dictionary.
Specifically about the angel algorithm in CDLIB:
According to ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks, the threshold is not the overlapping threshold, but used as follows:
If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.
Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.
Also, don't confuse these "labels" with node labels (as in nx.relabel_nodes(G, labels)). The "labels" referred to here are those of the Label Propagation Algorithm, which is used by ANGEL.
As for the effects of varying this threshold:
[...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.
[based on the comment by #J. M. Arnold]
From ANGEL's github repository you can see that when threshold >= 1 only the min_comsize value is used:
self.threshold = threshold
if self.threshold < 1:
    self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
else:
    self.min_community_size = min_comsize
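Since threshold does not set the overlap size, the overlap between the returned communities can be measured directly after the fact. The following is a minimal sketch using the communities exactly as printed in the question's output; the helper name pairwise_overlaps is made up for illustration and is not part of CDLIB.

```python
from itertools import combinations

# Communities exactly as printed in the question's output.
communities = [
    [14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8],
    [1, 12, 13, 17, 19, 2, 21, 3, 7, 8],
    [14, 15, 18, 2, 20, 22, 30, 31, 33, 8],
]


def pairwise_overlaps(comms):
    """Map each pair of community indices (0-based) to the set of shared nodes."""
    sets = [set(c) for c in comms]
    return {
        (i, j): sets[i] & sets[j]
        for i, j in combinations(range(len(sets)), 2)
    }


overlaps = pairwise_overlaps(communities)
for (i, j), shared in overlaps.items():
    print(f"communities {i + 1} and {j + 1} share {len(shared)} nodes: {sorted(shared)}")
```

For this output, communities 1 and 3 share 8 nodes, communities 2 and 3 share 2, and communities 1 and 2 share 1. Pairs can then be filtered by whatever overlap size is required.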
I am wondering how to compute the distance between a given pair of nodes, say nodes "i" and "j".
This is a minimal example for, say, nodes 2 and 12 of a random regular graph with 100 nodes and degree 3:
julia> using LightGraphs
julia> L = random_regular_graph(100, 3)
julia> paths= dijkstra_shortest_paths(L, 2)
julia> distances = paths.dists
julia> d = distances[12]
The problem with this approach is that I have to calculate the distances from node 2 to all the other nodes just to get the distance between my two nodes of interest.
If you just need the shortest path from a specific source to a specific destination, consider using A*:
julia> g = CycleGraph(10)
{10, 10} undirected simple Int64 graph
julia> a_star(g, 1, 8)
3-element Array{LightGraphs.SimpleGraphs.SimpleEdge{Int64},1}:
Edge 1 => 10
Edge 10 => 9
Edge 9 => 8
If you are JUST interested in the (unweighted, unit) distances, use gdistances:
julia> gdistances(g, 1)[8]
3
In any case, do not access the .dists field from the DijkstraResult. Use the dists() method as the accessor. The internals of LightGraphs structs are not intended to be used directly.
I am a bit confused on how to distinguish a directed graph to be aperiodic or periodic. Wikipedia says this about aperiodic graphs:
'In the mathematical area of graph theory, a directed graph is said to be aperiodic if there is no integer k > 1 that divides the length of every cycle of the graph.'
For example, is the graph below aperiodic or periodic? I believe the graph is not periodic, but by Wikipedia's definition it is periodic, since the integer k = 2 divides the length of every cycle in the graph (AC and ACDB).
It would be great if someone could provide a method to distinguish if a graph is aperiodic or periodic. Maybe provide some examples of periodic and aperiodic graphs to help explain.
Thank you.
Here's a short Python implementation based on NetworkX for finding whether a graph is periodic:
import networkx as nx
from math import gcd
from functools import reduce
G = nx.DiGraph()
G.add_edges_from([('A', 'C'), ('C', 'D'), ('D', 'B'), ('B', 'A'), ('C', 'A')])
cycles = list(nx.algorithms.cycles.simple_cycles(G))
cycles_sizes = [len(c) for c in cycles]
cycles_gcd = reduce(gcd, cycles_sizes)
is_periodic = cycles_gcd > 1
print("is_periodic: {}".format(is_periodic))
The code does the following:
Build the graph from your example (by specifying the edges).
List all cycles (AC and ACDB).
List all cycles sizes [2, 4].
Find the greatest common divisor (GCD) of the cycle sizes.
If the GCD is 1, the graph is aperiodic; otherwise it is periodic by definition.
The graph you have given above is not aperiodic, as it has a period of 2 (i.e. every node can return to itself only in a multiple of 2 steps).
You can play with different examples to get better intuition, and also visualize your graph by adding:
import matplotlib.pyplot as plt
nx.draw_networkx(G, with_labels=True)
plt.show()
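As a cross-check, NetworkX also provides nx.is_aperiodic, which avoids enumerating every simple cycle. The sketch below compares it with the GCD approach on the example graph and on a modified copy; the helper name cycle_gcd and the extra edge D -> A are made up for illustration, and both graphs are assumed to be strongly connected and to contain at least one cycle.

```python
import networkx as nx
from functools import reduce
from math import gcd


def cycle_gcd(G):
    """GCD of all simple cycle lengths.

    Enumerating simple cycles can be exponential in the worst case, and
    reduce() raises TypeError if the graph has no cycle at all.
    """
    return reduce(gcd, (len(c) for c in nx.simple_cycles(G)))


# The graph from the example: cycles A-C (length 2) and A-C-D-B (length 4).
G = nx.DiGraph([('A', 'C'), ('C', 'D'), ('D', 'B'), ('B', 'A'), ('C', 'A')])
g_period = cycle_gcd(G)
g_aperiodic = nx.is_aperiodic(G)

# Adding D -> A creates the 3-cycle A-C-D, so the cycle lengths become 2, 3, 4.
H = G.copy()
H.add_edge('D', 'A')
h_period = cycle_gcd(H)
h_aperiodic = nx.is_aperiodic(H)

print(g_period, g_aperiodic)
print(h_period, h_aperiodic)
```

The first graph has period 2 and is reported as not aperiodic; with the extra edge the GCD drops to 1 and the graph becomes aperiodic.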
The equation for Network Modularity is given on its wikipedia page (and in reputable books). I want to see it working in some code. I have found this is possible using the modularity library for igraph used with R (The R Foundation for Statistical Computing).
I want to see the example below (or a similar one) used in the code to calculate the modularity. The library gives one example, but it isn't really what I want.
Let us have a set of vertices V = {1, 2, 3, 4, 5} and edges E = {(1,5), (2,3), (2,4), (2,5), (3,5)} that form an undirected graph.
Divide these vertices into two communities: c1 = {2,3} and c2 = {1,4,5}. It is the modularity of these two communities that is to be computed.
library(igraph)
g <- graph(c(1,5,2,3,2,4,2,5,3,5), directed = FALSE)
membership <- c(1,2,2,1,1)
modularity(g, membership)
Some explanation here:
The vector I use when creating the graph is the edge list of the graph. (In igraph versions older than 0.6, we had to subtract 1 from the numbers because igraph used zero-based vertex indices at that time, but not any more.)
The i-th element of the membership vector membership gives the index of the community to which vertex i belongs.
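To see the formula itself at work on this example, the modularity can also be computed by hand. The sketch below is in Python rather than R and uses the per-community form Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside c, and d_c the sum of the degrees in c; this is equivalent to the standard definition for an unweighted undirected graph. The function name modularity_by_hand is made up for illustration, and the result is cross-checked against NetworkX's modularity function.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# The example from the question.
edges = [(1, 5), (2, 3), (2, 4), (2, 5), (3, 5)]
communities = [{2, 3}, {1, 4, 5}]


def modularity_by_hand(edges, communities):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    # Degree of every vertex, counted from the edge list.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in communities:
        # L_c: edges with both endpoints inside the community.
        internal = sum(1 for u, v in edges if u in c and v in c)
        # d_c: sum of degrees of the community's vertices.
        degree_sum = sum(degree.get(n, 0) for n in c)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


q_hand = modularity_by_hand(edges, communities)
q_nx = modularity(nx.Graph(edges), communities)
print(q_hand, q_nx)
```

Here m = 5, each community has one internal edge and a degree sum of 5, so Q = 2 * (1/5 - 1/4) = -0.1 (up to floating-point rounding). A negative value means this split has fewer internal edges than expected at random.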