Checking validity of topological sort - graph

Given the following directed graph:
I determined the topological sort to be 0, 1, 2, 3, 7, 6, 5, 4 with the values for each node being:
d[0] = 1
f[0] = 16
d[1] = 2
f[1] = 15
d[2] = 3
f[2] = 14
d[3] = 4
f[3] = 13
d[4] = 7
f[4] = 8
d[5] = 6
f[5] = 9
d[6] = 5
f[6] = 10
d[7] = 11
f[7] = 12
Where d is discovery-time and f is finishing-time.
How can I check whether the topological sort is valid or not?

With python and networkx, you can check it as follows:
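import networkx as nx
# Build a small example DAG with edges 0->2, 1->2, 2->3
G = nx.DiGraph()
G.add_edges_from([(0, 2), (1, 2), (2, 3)])
# Enumerate every valid topological order of G
all_topological_sorts = list(nx.algorithms.dag.all_topological_sorts(G))
# A candidate order is valid exactly when it appears in that list
print([0, 1, 2, 3] in all_topological_sorts) # True
print([2, 3, 1, 0] in all_topological_sorts) # False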
However, note that in order to have a topological ordering, the graph must be a Directed Acyclic Graph (DAG). If G is not directed, NetworkXNotImplemented will be raised. If G is not acyclic (as in your case) NetworkXUnfeasible will be raised.
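See the NetworkX documentation for all_topological_sorts. Note that the number of topological orders can grow exponentially (a graph with n isolated vertices has n! of them), so enumerating them all only suits small graphs.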

If you want a less code-heavy approach to this question (since it looks like your original topological ordering was generated without code), you can go back to the definition of a topological sort. Paraphrased from Emory University:
Topological ordering of nodes = an ordering (label) of the nodes/vertices such that for every edge (u,v) in G, u appears earlier than v in the ordering.
There are two ways you could approach this question: from an edge perspective or from a vertex perspective. I describe a naive implementation of both below ("naive" meaning that, with some additional space and cleverness, you could improve on them).
Edge approach
Iterate through the edges in G. For each edge, retrieve the index of each of its vertices in the ordering. Compare the indices. If the origin vertex isn't earlier than the destination vertex, return false. If you iterate through all of the edges without returning false, return true.
Complexity: O(E*V)
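A minimal Python sketch of the edge approach, assuming the ordering is a list and the graph is given as a list of (u, v) edges (the example edges are the ones from the networkx answer above). It applies one of the improvements hinted at: each vertex's position is stored in a dictionary up front, so every index lookup is O(1) and the whole check runs in O(V + E) instead of O(E*V).

```python
def is_topological_order_edges(order, edges):
    """Edge approach: for every edge (u, v), u must come before v in order."""
    # Position of each vertex in the ordering, computed once up front.
    position = {v: i for i, v in enumerate(order)}
    if len(position) != len(order):
        return False  # some vertex is listed more than once
    for u, v in edges:
        if u not in position or v not in position:
            return False  # an edge endpoint is missing from the ordering
        if position[u] >= position[v]:
            return False  # origin is not earlier than destination
    return True


# Example graph from the networkx answer: edges 0->2, 1->2, 2->3
edges = [(0, 2), (1, 2), (2, 3)]
print(is_topological_order_edges([0, 1, 2, 3], edges))  # True
print(is_topological_order_edges([2, 3, 1, 0], edges))  # False
```

A self-loop (u, u) is rejected automatically, because position[u] >= position[u].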
Vertex approach
Iterate through the vertices in your ordering. For each vertex, retrieve its list of outgoing edges. If any of those edges end in a vertex that precedes the current vertex in the ordering, return false. If you iterate through all the vertices without returning false, return true.
Complexity: O(V^2*E)
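The vertex approach can be sketched the same way. This version assumes the graph is stored as a dictionary mapping each vertex to the list of vertices its outgoing edges point to, and that each vertex appears exactly once in the ordering. Instead of scanning the earlier part of the ordering, it keeps a set of vertices already seen, so "does this edge end in an earlier vertex?" becomes a constant-time lookup.

```python
def is_topological_order_vertices(order, adj):
    """Vertex approach: no outgoing edge may end at an earlier vertex."""
    seen = set()  # vertices that precede the current one in the ordering
    for u in order:
        for v in adj.get(u, []):
            if v in seen or v == u:
                return False  # edge goes backwards (or is a self-loop)
        seen.add(u)
    return True


# Same example graph as an adjacency dict: vertex -> successors
adj = {0: [2], 1: [2], 2: [3], 3: []}
print(is_topological_order_vertices([1, 0, 2, 3], adj))  # True
print(is_topological_order_vertices([0, 2, 1, 3], adj))  # False
```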

First, do a graph traversal to get the in-degree of each vertex. Then go through the vertices in the order of your list, starting from the first. Each time we look at a vertex, we check whether its current in-degree is 0, and then we decrement the in-degree of each of its neighbours, as if we had cut all of its outgoing edges. If some vertex has a non-zero in-degree when we reach it, the list is not a valid topological order; otherwise, it is. (A vertex does not need to be a neighbour of the previous vertex: for the edges (0,2), (1,2), (2,3), the order 0, 1, 2, 3 is valid even though 1 is not a neighbour of 0.) This takes O(V + E) time.
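One way to code this in-degree check in Python, assuming an adjacency dictionary (vertex -> list of successors) and an ordering that should list every vertex exactly once; the function name is illustrative:

```python
def is_topological_order_indegree(order, adj):
    """Walk the ordering, requiring in-degree 0 and then cutting out-edges."""
    if len(set(order)) != len(order):
        return False  # a vertex is listed more than once
    # One pass over all edges to compute each vertex's in-degree.
    indegree = {v: 0 for v in order}
    for u, successors in adj.items():
        for v in successors:
            if v not in indegree:
                return False  # an edge endpoint is missing from the ordering
            indegree[v] += 1
    for u in order:
        # When u is reached, no unprocessed vertex may still point to it.
        if indegree[u] != 0:
            return False
        # "Cut" u's outgoing edges by decrementing its successors' in-degrees.
        for v in adj.get(u, []):
            indegree[v] -= 1
    return True


adj = {0: [2], 1: [2], 2: [3], 3: []}
print(is_topological_order_indegree([0, 1, 2, 3], adj))  # True
print(is_topological_order_indegree([2, 3, 1, 0], adj))  # False
```

If a vertex with outgoing edges is left out of the ordering, its edges are never cut, so its successors keep a non-zero in-degree and the check fails, as it should.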

Related

Why do we have at least three definitions of a graph in math?

Definition 1 - 2 sets and function
Definition 2 - 1 set and 1 family
Definition 3 - 1 relation
Why do we need such a diversity? Are some of these definitions old-fashioned or all of them have their pros and cons?
Undirected graphs and directed graphs
The third definition differs from the first two because it is about directed graphs, while the first two define undirected graphs. We care about directed graphs and undirected graphs because they are adapted to different situations and to solving different problems.
You can think of directed graphs and undirected graphs as two different objects.
In general, undirected graphs are somewhat easier to reason about, and most often if someone mentions a "graph" without further qualification, they mean an undirected graph.
Named edges and incidence function
The first two definitions are pretty much equivalent.
The first definition, with (V, E, ѱ) gives "names" to vertices (elements of V) and names to edges (elements of E), and uses an "incidence function" ѱ to tell you which edge in E corresponds to which pair of vertices of V.
The second definition uses only (V', E') and no ѱ. I am calling them V' and E' instead of V and E to make a distinction from the first definition. Here the vertices have "names", they are the elements of V'; but the edges don't really have individual names, and E' is defined as a subset of the set of unordered pairs of V'. Therefore an edge is an unordered pair of elements of V'.
Here is an example of a graph:
By the first definition:
V = {a, b, c, d};
E = {1, 2, 3};
ѱ : E -> {unordered pairs of V}
1 -> ab
2 -> ac
3 -> cd.
By the second definition:
V' = {a, b, c, d}
E' = {ab, ac, cd}.
As you can see, V' = V, and E' is the image of E by ѱ.
If you don't care about "names" for the edges, the second definition is somewhat shorter. But which one you use really doesn't matter; the theorems you might prove with one definition will be equivalent to the theorems you can prove for the other definition. The difference between the two definitions is just a set-theory nitpick about what "edge" means: is it a pair of elements of V, or an element of another set which is mapped to a pair of elements of V by a function? Note that as long as no two edges are mapped to the same pair (no parallel edges), the function ѱ is a bijection between the two sets E and E', so E and E' are really two different names for the same set.
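The relationship between the two definitions can be mimicked with Python sets, using the example graph above. This is only an illustration: frozenset stands in for an unordered pair, and the check len(E') == len(E) is one simple way to test whether ѱ is a bijection onto its image.

```python
V = {"a", "b", "c", "d"}
E = {1, 2, 3}
# Incidence function psi: edge name -> unordered pair of vertices
psi = {1: frozenset("ab"), 2: frozenset("ac"), 3: frozenset("cd")}

# Second definition: E' is the image of E under psi
E_prime = {psi[e] for e in E}
# psi is a bijection E -> E' exactly when no two edge names share a pair
is_bijection = len(E_prime) == len(E)

print(sorted("".join(sorted(p)) for p in E_prime))  # ['ab', 'ac', 'cd']
print(is_bijection)  # True
```

Adding a parallel edge, e.g. psi[4] = frozenset("ab"), would leave E' unchanged while E grows, so ѱ would no longer be a bijection; that is exactly the case the second definition cannot express.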
Algorithms, programming languages, and representations of graphs
If you ever have to code an algorithm using a graph in your favourite programming language, you will have to decide how to represent the graph using variables and arrays and all the data structures you are used to.
For the vertices, most often, people use V = {0, 1, 2, ..., n-1} where n is the number of vertices. This is convenient because it means you can use the vertices as indices for an array.
For the edges, sometimes we encode E using a vertex-vertex adjacency matrix of size n*n, with a 1 in cell i,j to indicate an edge between vertices i and j and a 0 in cell i,j to indicate no edge. Here is the adjacency matrix for the graph above (I replaced a,b,c,d with 0,1,2,3 as the names for the vertices):
   0 1 2 3
0  0 1 1 0
1  1 0 0 0
2  1 0 0 1
3  0 0 1 0
Sometimes we encode E using an array of lists: an array of size n, where cell i contains the list of indices of vertices which are neighbours of vertex i. Here is the array of lists for the same graph:
0: 1,2
1: 0
2: 0,3
3: 2
Those two representations are closer to the second definition, since the edges don't have names; we just care about whether each pair of vertices is an edge or not.
Recently I had to write a C++ program where it was very important for me to number the edges, because I wanted to be able to use them as indices of a matrix. Thus I had V = {0, 1, 2, ..., n-1}; E = {0, 1, 2, ..., m-1}; and then I used an std::map<int, std::pair<int, int>> to map the edge indices to pairs of vertex indices. This representation was closer to your first definition, with the std::map playing the role of ѱ. Note that I had to choose between mapping edge indices to pairs of vertex indices, or mapping pairs of vertex indices to edge indices. Had I felt the need, I could even have used both. The first definition doesn't care, because ѱ is a bijection, so mathematicians can use ѱ and its inverse ѱ^-1 interchangeably; but the data structure std::map is not a mathematical function, and inverting it might take time.
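For comparison, here are the three representations discussed above for the same 4-vertex example graph, written in Python. Python dictionaries stand in for std::map here; the author's C++ code is not shown, so this is only an analogy. Storing the inverse map explicitly is the "use both" option: it costs extra memory but makes lookups cheap in both directions.

```python
n = 4
edge_list = [(0, 1), (0, 2), (2, 3)]  # a-b, a-c, c-d with a,b,c,d -> 0,1,2,3

# Adjacency matrix: matrix[i][j] == 1 iff {i, j} is an edge
matrix = [[0] * n for _ in range(n)]
for i, j in edge_list:
    matrix[i][j] = matrix[j][i] = 1

# Array of lists: adj[i] holds the neighbours of vertex i
adj = [[] for _ in range(n)]
for i, j in edge_list:
    adj[i].append(j)
    adj[j].append(i)

# Named edges: edge index -> vertex pair (the role of psi) ...
psi = {k: (min(i, j), max(i, j)) for k, (i, j) in enumerate(edge_list)}
# ... plus an explicitly stored inverse, vertex pair -> edge index
psi_inv = {pair: k for k, pair in psi.items()}

for row in matrix:
    print(*row)
print(adj)             # [[1, 2], [0], [0, 3], [2]]
print(psi_inv[(2, 3)])  # 2
```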
Conclusion
Both definitions are equivalent, and it really doesn't matter which one you use. But if you need to code algorithms using graphs, take some time to consider different representations of graphs and which one will make your algorithm the most efficient.

Normalizing graph edit distance to [0,1] (networkx)

I want to have a normalized graph edit distance.
I'm using this function:
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.similarity.graph_edit_distance.html#networkx.algorithms.similarity.graph_edit_distance
I'm trying to understand the graph_edit_distance function in order to be able to normalize it between 0 and 1, but I don't understand it fully.
For example:
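import networkx as nx

def compare_graphs(Ga, Gb):
    draw_graph(Ga)  # user-defined plotting helper (not shown)
    draw_graph(Gb)
    # node_match is called with the two nodes' attribute dicts, not the node IDs
    graph_edit_distance = nx.graph_edit_distance(Ga, Gb, node_match=lambda x, y: x == y)
    return graph_edit_distance

compare_graphs(G1, G3)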
Why is the graph_edit_distance = 4?
Graph construction:
e1 = [(1,2), (2,3)]
e3 = [(1,3), (3,1)]
G1 = nx.DiGraph()
G1.add_edges_from(e1)
G3 = nx.DiGraph()
G3.add_edges_from(e3)
The edit distance is measured by:
nx.graph_edit_distance(Ga, Gb, node_match=lambda x,y : x==y)
The difference between optimize_edit_paths and graph_edit_distance is that optimize_edit_paths also returns the edit paths themselves, expressed in terms of node indexes.
This is the output of optimize_edit_paths:
list(optimize_edit_paths(G1, G2, node_match, edge_match,
node_subst_cost, node_del_cost, node_ins_cost,
edge_subst_cost, edge_del_cost, edge_ins_cost,
upper_bound, True))
Out[3]:
[([(1, 1), (2, None), (3, 3)],
[((1, 2), None), ((2, 3), None), (None, (1, 3)), (None, (3, 1))],
5.0),
([(1, 1), (2, 3), (3, None)],
[((1, 2), (1, 3)), (None, (3, 1)), ((2, 3), None)],
4.0)]
I know it should be the cost of the minimum sequence of node and edge edit operations transforming graph G1 into a graph isomorphic to G2.
When I try to count, I get:
1. Add node 2 to G3
2. Delete e1 = (1,3) from G3
3. Delete e2 = (3,1) from G3
4. Add e3 = (1,2) to G3
5. Add e4 = (2,3) to G3
graph_edit_distance = 5.
What am I missing?
Or alternatively, what can I do in order to normalize the distance I receive?
I thought about dividing by |V1| + |V2| + |E1| + |E2|, or dividing by max(|V1| + |E1|, |V2| + |E2|), but I'm not sure.
Thanks in advance.
I know it's an old post, but I am currently reading about GED and wanted to answer it for anyone looking for it in the future.
Graph edit distance is 4
Reason:
In G3, nodes 1 and 3 are connected by edges in both directions (effectively an undirected edge), while in G1, nodes 1 and 2 are connected by a single directed edge. The graph edit path is then:
Turn undirected edge to directed edge
Change 3 to 2 (substitution)
Add an edge to 2
Finally add a node to that edge
The Graph Edit Distance is unbounded. The more nodes one graph has that the other graph doesn't, the more edits you need to make them the same, so the GED can be arbitrarily large.
I haven't seen this proposed anywhere else, but I have an idea:
Instead of GED(G1,G2), you can compute GED(G1,G2)/[GED(G1,G0) + GED(G2,G0)],
where G0 is the empty graph.
The situation is analogous to the difference between real numbers.
Imagine that I give you |A-B| and |C-D|. They are not on the same footing.
E.g., you could have A=1, B=2 and C=1000, D=1001.
The differences are equal, but the relative differences are very different.
To convey that, you would compute |A-B|/(|A|+|B|) instead of just |A-B|.
This is symmetric to a swapping of A and B, and it's a relative distance.
Since it's relative, it can be compared to the other relative distance, |C-D|/(|C|+|D|). These relative distances are comparable; they express a notion that is universal and applies to all pairs of numbers.
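The numbers in this analogy can be checked directly. A small sketch of the relative distance for real numbers (the zero case is handled by returning 0, an assumption the text does not address):

```python
def relative_distance(a, b):
    """|a - b| / (|a| + |b|): symmetric in a and b, and 0 when a == b."""
    if a == 0 and b == 0:
        return 0.0  # both zero: treat as identical
    return abs(a - b) / (abs(a) + abs(b))


print(relative_distance(1, 2))        # 0.333... (= 1/3)
print(relative_distance(1000, 1001))  # about 0.0005 (= 1/2001)
```

Both pairs differ by 1, but the relative distances differ by a factor of about 667.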
In summary, compute the relative GED, using G0, the null graph, like you would use 0 if you were measuring the relative distance between real numbers.
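A possible networkx sketch of this relative GED, assuming the default unit edit costs (no custom cost functions). Under that assumption, turning a graph into the empty graph G0 means deleting all of its edges and nodes, so GED(G, G0) = |V| + |E| and the denominator equals |V1| + |V2| + |E1| + |E2|, the first normalization proposed in the question. By the triangle inequality, GED(G1, G2) <= GED(G1, G0) + GED(G0, G2), so the ratio lies in [0, 1]. The helper names are illustrative.

```python
import networkx as nx


def ged_to_empty(G):
    # With unit costs, reaching the empty graph G0 means deleting every
    # edge and every node, so GED(G, G0) = |V| + |E|.
    return G.number_of_nodes() + G.number_of_edges()


def relative_ged(G1, G2):
    """GED(G1, G2) / [GED(G1, G0) + GED(G2, G0)], with G0 the empty graph."""
    denominator = ged_to_empty(G1) + ged_to_empty(G2)
    if denominator == 0:
        return 0.0  # both graphs are empty, hence identical
    # Exact GED is exponential in the worst case; fine for small graphs
    return nx.graph_edit_distance(G1, G2) / denominator


G1 = nx.DiGraph([(1, 2), (2, 3)])
G3 = nx.DiGraph([(1, 3), (3, 1)])
print(relative_ged(G1, G3))
print(relative_ged(G1, G1))  # 0.0
```

If you pass custom node or edge costs, GED(G, G0) must be recomputed with the same costs rather than taken as |V| + |E|.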

Acyclic Directed Graphs and the edges

Okay, I know that a directed acyclic graph (DAG) has E=V-1 edges. E = number of edges. V = number of vertices.
So the question is, "In a directed graph G, the number of edges is always less than the number of vertices." True or false?
Thanks for the help.
Assume N vertices/nodes, and let's explore building up a DAG with the maximum number of edges. Consider any given node, say N1. The maximum number of nodes it can point to (i.e., its number of out-edges) at this early stage is N-1. Let's choose a second node N2: it can point to all nodes except itself and N1, which gives N-2 additional edges. Continue for the remaining nodes: each can point to one node fewer than the node before. The last node can point to 0 other nodes.
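Sum of all edges: (N-1) + (N-2) + ... + 1 + 0 = N(N-1)/2
Since N(N-1)/2 >= N for every N >= 3, the statement is false: a DAG can have as many edges as vertices, or more.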
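The construction can be checked in Python: label the nodes 0..N-1 and let each node point to every node with a larger label. Every edge goes "forward", so no cycle can form, and the edge count matches N(N-1)/2. The function name is illustrative.

```python
from itertools import combinations


def max_dag_edges(n):
    """Edges of a DAG on nodes 0..n-1 with as many edges as possible.

    Node i points to every node j > i: the first node has n-1 out-edges,
    the next n-2, ..., the last 0. All edges go from smaller to larger
    labels, so the graph is acyclic.
    """
    return list(combinations(range(n), 2))


for n in range(1, 7):
    print(n, len(max_dag_edges(n)))
# prints 1 0, 2 1, 3 3, 4 6, 5 10, 6 15 (one pair per line)
```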

Undirected graph conversion to tree

Given an undirected graph in which each node has a Cartesian coordinate in space that has the general shape of a tree, is there an algorithm to convert the graph into a tree, and find the appropriate root node?
Note that our definition of a "tree" requires that branches do not diverge from parent nodes at acute angles.
See the example graphs below. How do we find the red node?
here is a suggestion on how to solve your problem.
prerequisites
notation:
g graph, g.v graph vertices
v,w,z: individual vertices
e: individual edge
n: number of vertices
any combination of an undirected tree g and a given node g.v uniquely determines a directed tree with root g.v (provable by induction)
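The fact that a tree plus a chosen root determines a unique directed tree can be illustrated with a breadth-first search that orients every edge child -> parent. This is a sketch assuming g is already a tree stored as an adjacency dict; the example tree is hypothetical. In the result, the root is the only vertex without a parent, i.e. the only vertex with outdegree 0.

```python
from collections import deque


def orient_towards_root(adj, root):
    """Orient every edge of an undirected tree as child -> parent.

    adj maps each vertex to its neighbours; the result maps each vertex to
    its parent, with the root mapped to None (the root's outdegree is 0).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        w = queue.popleft()
        for v in adj[w]:
            if v not in parent:
                parent[v] = w  # edge oriented v -> w: v child, w parent
                queue.append(v)
    return parent


# Hypothetical example tree: 0-1, 0-2, 1-3
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(orient_towards_root(adj, 0))  # {0: None, 1: 0, 2: 0, 3: 1}
```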
idea
complement the edges of g by orientations in the directed tree implied by g and the yet-to-be-found root node by local computations at the nodes of g.
these orientations will represent child-parent relationships between nodes (v -> w: v child, w parent).
the completely marked tree will contain a sole node with outdegree 0, which is the desired root node. you might end up with 0 or more than one root node.
algorithm
assumes standard representation of the graph/tree structure (eg adjacency list)
1. all vertices in g.v are marked initially as not visited, not finished.
2. visit all vertices in arbitrary sequence. skip nodes marked as 'finished'.
let v be the currently visited vertex.
2.1 sweep through all edges linking v clockwise, starting with a randomly chosen e_0, in the order of the edges' angle with e_0.
2.2 orient adjacent edges e_1=(v,w_1), e_2=(v,w_2) that enclose an acute angle.
adjacent: wrt being ordered according to the angle they enclose with e_0.
[ note: the existence of such a pair is not guaranteed, see 2nd comment and last remark. if no angle is acute, proceed at 2. with next node. ]
2.2.1 the orientations of edges e_1, e_2 are known:
w_1 -> v -> w_2: impossible, as a grandparent-child-segment would enclose an acute angle
w_1 <- v <- w_2: impossible, same reason
w_1 <- v -> w_2: impossible, there are no nodes with outdegree >1 in a tree
w_1 -> v <- w_2:
only possible pair of orientations. e_1, e_2 might have been oriented before. if the previous orientation violates the current assignment, the problem instance has no solution.
2.2.2 this assignment implies a tree structure on the subgraphs induced by all vertices reachable from w_1 (w_2) on a path not comprising e_1 (e_2). mark all vertices in both induced subtrees as finished
[ note: the subtree structure might violate the angle constraints. in this case the problem has no solution. ]
2.3 mark v visited. after completing steps 2.2 at vertex v, check the number nc of edges connecting v that have not yet been assigned an orientation.
nc = 0: this is the root you've been searching for - but you must check whether the solution is compatible with your constraints.
nc = 1: let this edge be (v,z).
the orientation of this edge is v->z as you are in a tree. mark v as finished.
2.3.1 check z whether it is marked finished.
if it is not, check the number nc2 of unoriented edges connecting z.
nc2 = 1: repeat step 2.3 by taking z for v.
3. if you have not yet found a root node, your problem instance is ambiguous:
orient the remaining unoriented edges at will.
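Steps 2.1 and 2.2 at a single vertex can be sketched in Python as follows. Assumptions: pos maps each vertex to its (x, y) coordinates, adj is an adjacency dict, and the edges are sorted counter-clockwise with math.atan2 rather than clockwise from a random e_0; the sweep direction and starting edge do not change which edges are angularly adjacent. The function name and example data are hypothetical.

```python
import math


def acute_neighbour_pairs(pos, adj, v):
    """Steps 2.1-2.2 at vertex v: sort incident edges by angle and return
    the pairs of angularly adjacent neighbours enclosing an acute angle."""
    x0, y0 = pos[v]

    def angle(w):
        return math.atan2(pos[w][1] - y0, pos[w][0] - x0)

    nbrs = sorted(adj[v], key=angle)
    k = len(nbrs)
    pairs = []
    if k < 2:
        return pairs
    for i in range(k):
        j = (i + 1) % k  # wrap around: the last edge is adjacent to the first
        gap = (angle(nbrs[j]) - angle(nbrs[i])) % (2 * math.pi)
        if gap < math.pi / 2:
            # By 2.2.1 the only valid orientation is w_1 -> v <- w_2
            pairs.append((nbrs[i], nbrs[j]))
    return pairs


pos = {"v": (0, 0), "a": (1, 0), "b": (1, 0.5), "c": (-1, 0)}
adj = {"v": ["c", "b", "a"], "a": ["v"], "b": ["v"], "c": ["v"]}
print(acute_neighbour_pairs(pos, adj, "v"))  # [('a', 'b')]
```

Sorting costs O(k lg k) for a vertex of degree k, which is the term that dominates the O(n * lg n) bound in the complexity remarks.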
remarks
termination:
each node is visited at max 4 times:
once per step 2
at max twice per step 2.2.2
at max once per step 2.3
correctness:
all edges enclosing an acute angle are oriented per step 2.2.1
complexity (time):
visiting every node: O(n);
the clockwise sweep through all edges connecting a given vertex requires these edges to be sorted.
thus you need O( sum_i=1..m ( k_i * lg k_i ) ) at m <= n vertices under the constraint sum_i=1..m k_i = n.
in total this requires O ( n * lg n), as sum_i=1..m ( k_i * lg k_i ) <= n * lg n given sum_i=1..m k_i = n for any m <= n (provable by applying lagrange optimization).
[ note: if your trees have a degree bounded by a constant, you theoretically sort in constant time at each node affected; grand total in this case: O(n) ]
subtree marking:
each node in the graph is visited at max 2 times by this procedure if implemented as a dfs. thus a grand total of O(n) for the invocation of this subroutine.
in total: O(n * lg n)
complexity (space):
O(n) for sorting (with vertex-degree not constant-bound).
problem is probably ill-defined:
multiple solutions: e.g. steiner tree
no solution: e.g. graph shaped like a double-tipped arrow (<->)
A simple solution would be to define a 2D rectangle around the red node (or around the center of your nodes) and compute each node's position along a Moore curve. A Moore curve is a space-filling curve, more precisely a special version of the Hilbert curve where the start and end vertex is the same and lies in the middle of the 2D rectangle. In general, your problem looks like a discrete addressing-space problem.

Network Modularity Calculations in R

The equation for Network Modularity is given on its Wikipedia page (and in reputable books). I want to see it working in some code. I have found this is possible using the modularity() function of the igraph library for R (The R Foundation for Statistical Computing).
I want to see the example below (or a similar one) used in the code to calculate the modularity. The library gives one example, but it isn't really what I want.
Let us have a set of vertices V = {1, 2, 3, 4, 5} and edges E = {(1,5), (2,3), (2,4), (2,5), (3,5)} that form an undirected graph.
Divide these vertices into two communities: c1 = {2,3} and c2 = {1,4,5}. It is the modularity of these two communities that is to be computed.
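library(igraph)
# Edge list read pairwise: 1-5, 2-3, 2-4, 2-5, 3-5.
# graph() builds a directed graph by default, so request an undirected one.
g <- graph(c(1,5,2,3,2,4,2,5,3,5), directed = FALSE)
# membership[i] is the community of vertex i: c1 = {2,3} -> 2, c2 = {1,4,5} -> 1
membership <- c(1,2,2,1,1)
modularity(g, membership)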
Some explanation here:
The vector I use when creating the graph is the edge list of the graph. (In igraph versions older than 0.6, we had to subtract 1 from the numbers because igraph used zero-based vertex indices at that time; this is no longer necessary.)
The i-th element of the membership vector membership gives the index of the community to which vertex i belongs.
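To check the result by hand, here is a small Python sketch of the standard modularity formula for an undirected graph without self-loops or multi-edges, Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside c, and d_c the sum of the degrees of the vertices in c. For this example, c1 = {2,3} has 1 internal edge and degree sum 5, and c2 = {1,4,5} has 1 internal edge and degree sum 5, so Q = 2 * (1/5 - 1/4) = -0.1. This is a verification aid, not igraph's implementation; for the undirected graph, igraph's modularity() is expected to agree.

```python
edges = [(1, 5), (2, 3), (2, 4), (2, 5), (3, 5)]
membership = {1: 1, 2: 2, 3: 2, 4: 1, 5: 1}  # same as c(1,2,2,1,1) in R


def modularity(edges, membership):
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected simple graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in set(membership.values()):
        # Edges with both endpoints inside community c
        internal = sum(1 for u, v in edges
                       if membership[u] == c and membership[v] == c)
        # Total degree of the vertices in community c
        deg_sum = sum(d for node, d in degree.items() if membership[node] == c)
        q += internal / m - (deg_sum / (2 * m)) ** 2
    return q


print(modularity(edges, membership))  # -0.1, up to floating-point rounding
```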
