I'd like to implement finite, directed graphs in Idris. Also I want to work with vertex maps f : V -> X from the vertex set to some other type / set. Which would be the most natural approach here?
One option is to always choose Fin n for the vertex set. This way, a graph can be straightforwardly implemented as its adjacency matrix, i.e. Graph n = Vect n (Vect n Bool), and any vertex map as a vector of type Vect n a.
On the other hand, I like the idea of defining Graph a = SortedMap (a, a) Bool. This way, a few constructions seem more straightforward, e.g. the graph product of type Graph a -> Graph b -> Graph (a, b).
But it is then possible to construct ill-formed graph instances, where e.g. (x, y) is a key of g but (y, x) is not. Also, if I want to write a function which takes a graph and a compatible vertex map, I don't know how to specify that the domains of both objects have to coincide. Possibly f : (g : Graph a) -> (m : SortedMap a b) -> {auto p : vertexList g = keys m} -> ... where the functions vertexList and keys have to be written first?
A third definition would be Graph a = (a, a) -> Bool where a is supposed to be some finite type like Fin n or (Fin m, Fin n). Then vertex maps would simply be of type a -> b.
Say that I have a big network with 10^4 nodes. And then I want to analyse the neighbourhood associated with a random node, say node 10. I can see which are the nodes connected to that node by looking at the 10th row entries of the adjacency matrix, and then I can repeat this if I want to look at the neighbours of those neighbours (second shell) and so on and so forth.
Is there an efficient way to do this, or even an inefficient way that is still better than writing the whole thing from scratch? The actual network that I have is a Random Regular Graph and I am interested in the tree-like local structure for large networks.
If I understand your use case, there is a good way of doing this: the egonet function. You give it a graph, a starting vertex, and a number of hops, and it will return an induced subgraph of the graph starting at the vertex and going out that number of hops. Here's the docstring:
egonet(g, v, d, distmx=weights(g))
Return the subgraph of g induced by the neighbors of v up to distance d, using weights (optionally) provided by distmx. This is equivalent to
induced_subgraph(g, neighborhood(g, v, d, dir=dir))[1].
Optional Arguments
––––––––––––––––––––
• dir=:out: if g is directed, this argument specifies the edge direction
with respect to v (i.e. :in or :out).
Edited to add: if all you need are the vertex indices, then neighborhood() is what you want:
neighborhood(g, v, d, distmx=weights(g))
Return a vector of each vertex in g at a geodesic distance less than or equal to d, where distances may be specified by distmx.
Optional Arguments
––––––––––––––––––––
• dir=:out: If g is directed, this argument specifies the edge direction
with respect to v of the edges to be considered. Possible values: :in or :out.
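To make the idea of successive shells concrete, here is a minimal Python sketch of a breadth-first search that stops after d hops. It is not the library's implementation of neighborhood() or egonet(); the names bfs_neighborhood, shells and the toy adjacency dict adj are made up for illustration, and edge weights are ignored (every edge counts as one hop).

```python
from collections import deque

def bfs_neighborhood(adj, v, d):
    """Return {vertex: hop distance} for every vertex within d hops of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand past the requested radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def shells(adj, v, d):
    """Group the neighborhood by distance: entry k lists the vertices exactly k hops from v."""
    result = [[] for _ in range(d + 1)]
    for u, k in bfs_neighborhood(adj, v, d).items():
        result[k].append(u)
    return [sorted(shell) for shell in result]

# Hypothetical toy graph: a path 0-1-2-3 with an extra leaf 4 attached to 1.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
first_two_shells = shells(adj, 0, 2)
```

Because each vertex is visited at most once, the cost is proportional to the size of the neighbourhood explored rather than to the whole adjacency matrix.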
Definition 1 - 2 sets and function
Definition 2 - 1 set and 1 family
Definition 3 - 1 relation
Why do we need such diversity? Are some of these definitions old-fashioned, or do all of them have their pros and cons?
Undirected graphs and directed graphs
The third definition differs from the first two because it is about directed graphs, while the first two define undirected graphs. We care about directed graphs and undirected graphs because they are adapted to different situations and to solving different problems.
You can think of directed graphs and undirected graphs as two different objects.
In general, undirected graphs are somewhat easier to reason about, and most often if someone mentions a "graph" without further qualification, they mean an undirected graph.
Named edges and incidence function
The first two definitions are pretty much equivalent.
The first definition, with (V, E, ѱ) gives "names" to vertices (elements of V) and names to edges (elements of E), and uses an "incidence function" ѱ to tell you which edge in E corresponds to which pair of vertices of V.
The second definition uses only (V', E') and no ѱ. I am calling them V' and E' instead of V and E to make a distinction from the first definition. Here the vertices have "names", they are the elements of V'; but the edges don't really have individual names, and E' is defined as a subset of the set of unordered pairs of elements of V'. Therefore an edge is an unordered pair of elements of V'.
Here is an example of a graph:
By the first definition:
V = {a, b, c, d};
E = {1, 2, 3};
ѱ : E -> {unordered pairs of V}
1 -> ab
2 -> ac
3 -> cd.
By the second definition:
V' = {a, b, c, d}
E' = {ab, ac, cd}.
As you can see, V' = V, and E' is the image of E by ѱ.
If you don't care about "names" for the edges, the second definition is somewhat shorter. But which one you use really doesn't matter; the theorems you might prove with one definition will be equivalent to the theorems you can prove for the other definition. The difference between the two definitions is just a set theory nitpick of what "edge" means: is it a pair of elements of V, or an element of another set which is mapped to a pair of elements of V by a function? Note that, as long as no two edges join the same pair of vertices (as in the example above), the function ѱ is a bijection between the two sets E and E', so really E and E' are two different names for the same set.
Algorithms, programming languages, and representations of graphs
If you ever have to code an algorithm using a graph in your favourite programming language, you will have to decide how to represent the graph using variables and arrays and all the data structures you are used to.
For the vertices, most often, people use V = {0, 1, 2, ..., n-1} where n is the number of vertices. This is convenient because it means you can use the vertices as indices for an array.
For the edges, sometimes we encode E using an adjacency matrix (a vertex-vertex matrix) of size n*n, with a 1 in cell i,j to indicate an edge between vertices i and j and a 0 in cell i,j to indicate no edge. Here is the adjacency matrix for the graph above (I replaced a,b,c,d with 0,1,2,3 as the names for the vertices):
  0 1 2 3
0 0 1 1 0
1 1 0 0 0
2 1 0 0 1
3 0 0 1 0
Sometimes we encode E using an array of lists: an array of size n, where cell i contains the list of indices of vertices which are neighbours of vertex i. Here is the array of lists for the same graph:
0: 1,2
1: 0
2: 0,3
3: 2
Those two representations are closer to the second definition, since the edges don't have names; we just care about whether each pair of vertices is an edge or not.
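As a rough illustration, both encodings can be built from the same list of edges. The sketch below is in Python; the function names adjacency_matrix and adjacency_lists are made up for this example, and it assumes an undirected graph whose vertices are numbered 0 to n-1.

```python
def adjacency_matrix(n, edges):
    """Build the n*n 0/1 matrix of an undirected graph."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: the matrix is symmetric
    return matrix

def adjacency_lists(n, edges):
    """Build the array of neighbour lists of an undirected graph."""
    lists = [[] for _ in range(n)]
    for i, j in edges:
        lists[i].append(j)
        lists[j].append(i)
    return [sorted(neighbours) for neighbours in lists]

# The example graph with a,b,c,d renamed 0,1,2,3: edges ab, ac, cd.
edges = [(0, 1), (0, 2), (2, 3)]
matrix = adjacency_matrix(4, edges)
lists = adjacency_lists(4, edges)
```

The matrix answers "is there an edge between i and j?" in constant time but always uses n*n cells, while the lists use space proportional to the number of edges and make it cheap to iterate over the neighbours of one vertex.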
Recently I had to write a C++ program where it was very important for me to number the edges, because I wanted to be able to use them as indices of a matrix. Thus I had V = {0, 1, 2, ..., n-1}; E = {0, 1, 2, ..., m-1}; and then I used an std::map<int, std::pair<int, int>> to map the edge indices to pairs of vertex indices. This representation was closer to your first definition, with an std::map for ѱ. Note that I had to make a choice between mapping edge indices to pairs of vertex indices, or mapping pairs of vertex indices to edge indices. Had I felt the necessity, I could even have used both. The first definition doesn't care, because ѱ is a bijection, so mathematicians can use ѱ and its inverse function ѱ^-1 interchangeably; but the data structure std::map is not a mathematical function, and inverting it might take time.
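A small Python sketch of the "use both directions" option, with dicts standing in for std::map. This is not the C++ program described above; psi, psi_inverse and edge_index are names invented for the example, and it assumes no two edges join the same pair of vertices.

```python
# Edge index -> pair of vertex indices (the role of the incidence function).
psi = {0: (0, 1), 1: (0, 2), 2: (2, 3)}

# Inverse lookup, built once. Pairs are stored with the smaller endpoint first
# so that an undirected pair has a single canonical key.
psi_inverse = {(min(u, v), max(u, v)): e for e, (u, v) in psi.items()}

def edge_index(u, v):
    """Return the index of the edge joining u and v, or None if there is none."""
    return psi_inverse.get((min(u, v), max(u, v)))
```

Keeping both maps doubles the memory for the edge data, but each lookup in either direction is then a single dictionary access instead of a scan over all edges.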
Conclusion
Both definitions are equivalent, and it really doesn't matter which one you use. But if you need to code algorithms using graphs, take some time to consider different representations of graphs and which one will make your algorithm the most efficient.
I have generated a random tree using networkx.
A = nx.random_tree(15)
I am trying to convert it to a directed graph (i.e. tree).
G = nx.to_directed(A)
However, the result is a graph with two directions.
I would like the output to be a tree in which each edge has only one direction.
According to official documentation:
Returns: G – A directed graph with the same name, same nodes, and with each edge (u, v, data) replaced by two directed edges (u, v, data) and (v, u, data).
If you want to delete reversed edges, you can write something like this:
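import networkx as nx
G = nx.random_tree(10)
# Keep each edge once, pointing from the smaller node label to the larger one
H = nx.DiGraph([(min(u, v), max(u, v)) for (u, v) in G.edges()])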
So H will be the tree you need, with each edge kept in one direction only.
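If what is wanted is a rooted tree, with every edge pointing away from a chosen root, comparing labels is not enough. The following is a sketch of a different technique, a breadth-first traversal from the root, written in plain Python so that it does not depend on any networkx version. The function name orient_from_root and the sample edge list are made up for illustration.

```python
from collections import deque

def orient_from_root(edges, root):
    """Orient the edges of an undirected tree so that they all point away from root."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    directed = []
    seen = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in adj.get(parent, []):
            if child not in seen:
                seen.add(child)
                directed.append((parent, child))  # edge leaves the vertex closer to root
                queue.append(child)
    return directed

# Hypothetical small tree, rooted at vertex 3.
tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
arcs = orient_from_root(tree_edges, root=3)
```

With networkx, the edge list could come from G.edges() and the resulting list of pairs could be passed to nx.DiGraph, in the same way as in the answer above.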
Suppose I have a set A ⊆ nat. I want to model in Isabelle a function f : A ⇒ Y. I could use either:
a partial function, i.e. one of type nat ⇒ Y option, or
a total function, i.e. one of type nat ⇒ Y that is unspecified for inputs not in A.
I wonder which is the 'better' option. I see a couple of factors:
The "partial function" approach is better because it is easier to compare partial functions for equality. That is, if I want to see if f is equal to another function, g : A ⇒ Y, then I just say f = g. To compare under-specified total functions f and g, I would have to say ∀x ∈ A. f x = g x.
The "under-specified total function" approach is better because I don't have to faff with constructing and deconstructing option types all the time. For instance, if f is an under-specified total function, and x ∈ A, then I can just say f x, but if f is a partial function I would have to say (the ∘ f) x. For another instance, it's trickier to do function composition on partial functions than on total functions.
For a concrete instance relevant to this question, consider the following attempt at formalising simple graphs.
type_synonym node = nat
record 'a graph =
V :: "node set"
E :: "(node × node) set"
label :: "node ⇒ 'a"
A graph comprises a set of nodes, an edge relation between them, and a label for each node. We only care about the label of nodes that are in V. So, should label be a partial function node ⇒ 'a option with dom label = V, or should it just be a total function that is unspecified outside of V?
It is probably a matter of taste and may also depend on the use you have in mind, so I'll just give you my personal preference, which would be option 2, the total function. The reason is that I think bounded quantification will be unavoidable in both approaches anyway. I think that with approach 1 you will find that the easiest way to handle the option type is to limit the domain (bounded quantification) that you are reasoning about. As for the graph example, graph theorems always say something like "for all nodes in V". But as I said, it is probably a matter of taste.
A friend presented me with a conjecture that seems to be true but neither of us can come up with a proof. Here's the problem:
Given a connected, bipartite graph with disjoint non-empty vertex sets U and V, such that |U|<|V|, all vertices are in either U or V, and there are no edges connecting two vertices within the same set, then there exists at least one edge which connects vertices a∈U and b∈V such that degree(a)>degree(b)
It's trivial to prove that there is at least one vertex in U with degree higher than one in V, but to prove that a pair exists with an edge connecting them is stumping us.
For any edge e=(a,b) with a∈U and b∈V, let w(e)=1/deg(b)-1/deg(a). For any vertex x, the sum of 1/deg(x) over all edges incident with x equals 1, because there are deg(x) such edges. Hence the sum of 1/deg(b) over all edges equals |V|, the sum of 1/deg(a) over all edges equals |U|, and so the sum of w(e) over all edges e equals |V|-|U|. Since |V|-|U|>0, w(e)>0 for some edge e=(a,b), which means that 1/deg(b)>1/deg(a), i.e. deg(a)>deg(b).
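The counting argument can be checked numerically on a small example. The Python sketch below uses exact fractions; the function name edge_weights and the sample graph (U = {u1, u2}, V = {v1, v2, v3}) are made up for illustration, and every edge is assumed to be written as (a, b) with a in U and b in V.

```python
from fractions import Fraction

def edge_weights(edges):
    """Return {edge: 1/deg(b) - 1/deg(a)} for edges (a, b) with a in U and b in V."""
    deg_u, deg_v = {}, {}
    for a, b in edges:
        deg_u[a] = deg_u.get(a, 0) + 1
        deg_v[b] = deg_v.get(b, 0) + 1
    return {(a, b): Fraction(1, deg_v[b]) - Fraction(1, deg_u[a]) for a, b in edges}

# Hypothetical connected bipartite graph with |U| = 2 < |V| = 3.
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u2", "v3")]
weights = edge_weights(edges)
total = sum(weights.values())                        # should equal |V| - |U|
witnesses = [e for e, w in weights.items() if w > 0]  # edges with deg(a) > deg(b)
```

Here the weights sum to 3 - 2 = 1, and the edges with positive weight are exactly those whose endpoint in U has the larger degree.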
Prove it by contradiction, i.e. suppose that deg(a) ≤ deg(b) ∀(a,b)∈E, where E is the edgeset of the graph (with the convention that the first element is in U and the second in V).
For F⊆E, designate by V(F) the subset of V which is reachable through edgeset F, that is:
V(F) = { b | (a,b)∈F }
Now build an edgeset F as follows:
F = empty set
For a ∈ U:
add any edge (a,b)∈E to F
Keep adding arbitrary edges (a,b)∈E to F until |V(F)| = |U|
The set V(F) obtained is connected to all nodes in U, hence by our assumption we must have
∑_{a∈U} deg(a) ≤ ∑_{b∈V(F)} deg(b)
However, since |U|=|V(F)| and |U|<|V| we know that there must be at least one "unreached" node v∈V\V(F), and since the graph is connected, deg(v)>0, so we obtain
∑_{a∈U} deg(a) < ∑_{b∈V} deg(b)
which is impossible: in a bipartite graph every edge has exactly one endpoint in U and one in V, so both sums count each edge once and must be equal.