Algorithm to test whether G contains an arborescence

An arborescence of a directed graph G is a rooted tree such that there is a directed path from the root to every other vertex in the graph. Give an efficient and correct algorithm to test whether G contains an arborescence, and its time complexity.
I could only think of running DFS/BFS from every node until one of the searches covers all the nodes.
I thought of using a minimum spanning tree algorithm, but those only work for undirected graphs.
Is there a more efficient algorithm for this?
I found a follow-up question stating that there is an O(n+m) algorithm for this. Can anybody help with what the solution could be?

What you are looking for is the so-called Edmonds' algorithm. Minimum spanning tree algorithms won't work on directed graphs, but the idea is the same: the MST problem becomes the arborescence problem when the graph is directed, and an arborescence is what you have described above.
The naive complexity is O(EV), just like a naive implementation of Prim's algorithm for the undirected MST problem, but faster implementations exist.
For more information, see the Wikipedia page:
Edmonds' algorithm

First note that the definition of an arborescence of a directed graph given in the question above is a bit different from the one given in, e.g., Wikipedia: your question's definition does not require that the path be unique, nor does it require that the original directed graph G be weighted. So a solution should be simpler than the one handled by Edmonds' algorithm.
How about the following: the first part is to find an adequate root. Once an adequate root is found, running a simple DFS on G starting from that root builds the needed tree, and we're done. So how can we find such a root?
Start by running DFS and "reduce" every cycle found to a single vertex, i.e. contract each strongly connected component (Tarjan's algorithm does this in a single DFS). Inside a cycle it doesn't matter which vertex we use, since any of them can reach any other. If a single vertex is left after this reduction, the entire graph is strongly connected, so any vertex, including the only one left, can serve as the root.
If more than one vertex is left, go over all the remaining vertices and find the ones with an in-degree of zero. If more than one is found, we can't construct the needed tree, because they can't be reached from one another. If exactly one is found, that's our root (any original vertex inside it will do).
The complexity is O(V + E) with, say, an adjacency-list representation of the graph.
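A minimal Python sketch of this approach, assuming vertices numbered 0..n-1 and a list of directed (u, v) edges. For the "reduce every cycle" step it uses Kosaraju's two-pass SCC algorithm (Tarjan's would work equally well); the function name find_arborescence_root is ours. It relies on the fact that every vertex of a DAG is reachable from some vertex with in-degree zero, so a unique source component reaches everything.

```python
def find_arborescence_root(n, edges):
    """Return a vertex from which every vertex is reachable, or None.

    Vertices are 0..n-1 and edges is a list of directed (u, v) pairs.
    Contracts strongly connected components (Kosaraju) and looks for a
    unique component with in-degree zero. Runs in O(n + m).
    """
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS on G, recording vertices by finishing time.
    visited = [False] * n
    order = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]  # (vertex, index of next successor to try)
        while stack:
            v, i = stack[-1]
            if i < len(adj[v]):
                stack[-1] = (v, i + 1)
                w = adj[v][i]
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, 0))
            else:
                stack.pop()
                order.append(v)

    # Pass 2: traverse the reversed graph in decreasing finishing time;
    # each traversal labels exactly one SCC.
    comp = [-1] * n
    c = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    # In-degree of each component in the condensed DAG.
    has_incoming = [False] * c
    for u, v in edges:
        if comp[u] != comp[v]:
            has_incoming[comp[v]] = True
    sources = [k for k in range(c) if not has_incoming[k]]
    if len(sources) != 1:
        return None  # two or more sources cannot reach each other
    # Any vertex of the unique source component reaches everything.
    return comp.index(sources[0])
```

For example, with edges [(2, 1), (1, 0), (0, 1)] the components are {2} and {0, 1}, only {2} has no incoming edge, and the function returns 2. A final DFS from that vertex would then build the tree itself.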

I think this is much simpler than I first thought, and along the lines of what was already mentioned at the beginning of the thread. Start a DFS (or BFS) traversal at any node in the graph and reach whatever you can. Once you are done, take the next unvisited vertex and traverse again. If you encounter a node that has already been processed and has no parent, its subtree has already been explored, and all the nodes reachable through that node can now also be reached through the current node, so make the current node the parent of that subtree.
Simply do a DFS traversal in which each edge is guaranteed to be visited only once, and do the following in the edge callback:
edgeCb()
{
    // y was already processed and has no parent, so it must be the root
    // of an earlier DFS/BFS tree
    if ( g->par[ y ] == -1 && g->prc[ y ] )
        g->par[ y ] = x; // Connect the two previously disconnected trees
    return 1;
}

graphTraverseDfs( g, i )
{
    // The parent of each vertex is updated as and when it is visited.
}

main() {
    .
    .
    for ( i = 0; i < g->nv; i++ )
        if ( !g->vis[ i ] )
            graphTraverseDfs( g, i );
    .
    .
}
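A closely related, well-known technique (often called finding a "mother vertex") also uses one traversal over all unvisited vertices, but it checks the result explicitly. The root of the last tree started is the only possible candidate: if some vertex m reaches everything and lay in an earlier tree, that earlier tree would also have reached the last root. One more traversal from the candidate confirms or rejects it. This Python sketch is ours, not the code above; the graph is assumed to be a list of successor lists for vertices 0..n-1.

```python
def mother_vertex(adj):
    """Return a vertex that reaches every vertex, or None. O(V + E)."""
    n = len(adj)

    def mark_reachable(s, visited):
        # Iterative traversal marking every vertex reachable from s.
        visited[s] = True
        stack = [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if not visited[w]:
                    visited[w] = True
                    stack.append(w)

    visited = [False] * n
    candidate = None
    for v in range(n):
        if not visited[v]:
            mark_reachable(v, visited)
            candidate = v  # root of the most recently started tree
    if candidate is None:
        return None  # empty graph

    # Verify: the candidate must reach every vertex.
    check = [False] * n
    mark_reachable(candidate, check)
    return candidate if all(check) else None
```

For instance, with adj = [[1], [0], [1]] (edges 0->1, 1->0, 2->1) the last tree is started at 2, and 2 does reach every vertex.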

Related

How to find all random paths between two nodes in a graph to construct the initial population in a genetic algorithm

I want to create the initial population in a genetic algorithm. A population consists of paths between two nodes (source and destination). How do I find all possible paths between two nodes in an undirected graph?
Thanks
You could take a recursive approach to this problem. Do something along the lines of the following (be warned: I have not refined this).
Start by selecting a random node from the graph as the start node, and a random node as the end node.
Look at all the connections from the start node to other nodes. Do not return to previous nodes. If there are no possible connections left, stop.
If the node is the end node, stop and record the path. If not, look at all the connections from that node and repeat this step.
Repeat this process with every pair of nodes in the graph.
I'm sure you can see the recursive part to this solution. I'm afraid I cannot write up this solution currently but I hope this might point you in the right direction.
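The recursive idea above can be sketched as a backtracking generator in Python. The adjacency-dict representation and the name all_simple_paths are assumptions; for an undirected graph each edge is listed under both endpoints. The number of simple paths can grow exponentially, so for a GA population you would normally stop after taking as many paths as you need.

```python
def all_simple_paths(adj, start, end):
    """Yield every simple path from start to end as a list of nodes.

    adj: dict mapping each node to an iterable of neighbours.
    """
    path = [start]
    on_path = {start}

    def extend(node):
        if node == end:
            yield list(path)  # record a copy of the current path
            return
        for nxt in adj[node]:
            if nxt not in on_path:  # never return to a node on the path
                path.append(nxt)
                on_path.add(nxt)
                yield from extend(nxt)
                path.pop()  # backtrack
                on_path.remove(nxt)

    yield from extend(start)


# A hypothetical square A-B-C-D-A:
square = {'A': ['B', 'D'], 'B': ['A', 'C'], 'C': ['B', 'D'], 'D': ['C', 'A']}
paths = list(all_simple_paths(square, 'A', 'C'))
```

To take only the first 50 paths, `itertools.islice(all_simple_paths(adj, s, t), 50)` works; shuffling each neighbour list beforehand with `random.shuffle` would make those first paths vary between runs.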

Union-Find algorithm and determining whether an edge belongs to a cycle in a graph

I'm reading a book about algorithms ("Data Structures and Algorithms in C++") and have come across the following exercise:
Ex. 20. Modify cycleDetectionDFS() so that it could determine whether a particular edge is part of a cycle in an undirected graph.
In the chapter about graphs, the book reads:
Let us recall from a preceding section that depth-first search
guaranteed generating a spanning tree in which no elements of edges
used by depthFirstSearch() led to a cycle with other element of edges.
This was due to the fact that if vertices v and u belonged to edges,
then the edge(vu) was disregarded by depthFirstSearch(). A problem
arises when depthFirstSearch() is modified so that it can detect
whether a specific edge(vu) is part of a cycle (see Exercise 20).
Should such a modified depth-first search be applied to each edge
separately, then the total run would be O(E(E+V)), which could turn
into O(V^4) for dense graphs. Hence, a better method needs to be
found.
The task is to determine if two vertices are in the same set. Two
operations are needed to implement this task: finding the set to which
a vertex v belongs and uniting two sets into one if vertex v belongs
to one of them and w to another. This is known as the union-find
problem.
Later on, the author describes how to merge two sets into one when an edge passed to the function union(edge e) connects vertices in distinct sets.
However, I still don't know how to quickly check whether an edge is part of a cycle. Could someone give me a rough explanation of such an algorithm, related to the aforementioned union-find problem?
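One way to connect union-find to the exercise (our own sketch, not the book's method): an edge (u, v) of an undirected graph lies on a cycle exactly when u and v are still connected after that edge is removed. So unite the endpoints of every other edge, then compare find(u) and find(v). This costs O(E α(V)) per queried edge; the function name and the 0..n-1 vertex numbering are assumptions.

```python
def edge_on_cycle(n, edges, target):
    """Return True if edges[target] lies on a cycle of the undirected graph.

    Vertices are 0..n-1; edges is a list of (u, v) pairs.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union every edge except the one being tested.
    for i, (a, b) in enumerate(edges):
        if i != target:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    u, v = edges[target]
    return find(u) == find(v)
```

For a triangle 0-1-2 with a pendant edge 2-3, every triangle edge is on a cycle and the pendant edge is not.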
A rough explanation: check whether an edge is a back edge (backlink). Whenever you have a back edge you have a loop, and whenever you have a loop you have a back edge (this is true for both directed and undirected graphs).
A back edge is an edge that points from a descendant to an ancestor. Remember that traversing a graph with DFS builds a forest, and an ancestor is a node that is marked finished later in the traversal than its descendants.
I gave you some pointers on where to look; let me know if that helps clarify your problem.
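Pushing the back-edge idea one step further answers the exercise for every edge at once: in an undirected DFS, an edge lies on a cycle exactly when it is not a bridge, i.e. when it is a back edge or a tree edge spanned by some back edge. The sketch below uses the usual discovery-time/low-link values in one O(V + E) pass; the names, the 0..n-1 numbering and the edge-list input are assumptions. It is recursive, so very deep graphs would need a higher recursion limit or an iterative rewrite.

```python
def edges_on_cycles(n, edges):
    """Return the set of indices of edges that lie on some cycle."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc = [-1] * n  # discovery time, -1 = unvisited
    low = [0] * n    # lowest discovery time reachable via one back edge
    timer = 0
    bridges = set()

    def dfs(v, parent_edge):
        nonlocal timer
        disc[v] = low[v] = timer
        timer += 1
        for w, i in adj[v]:
            if i == parent_edge:
                continue  # don't reuse the tree edge we arrived by
            if disc[w] == -1:  # tree edge
                dfs(w, i)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:
                    bridges.add(i)  # nothing in w's subtree climbs above v
            else:  # already visited: a back edge
                low[v] = min(low[v], disc[w])

    for v in range(n):
        if disc[v] == -1:
            dfs(v, -1)
    return set(range(len(edges))) - bridges
```

With the triangle 0-1-2 plus the pendant edge 2-3, the result is {0, 1, 2}, the indices of the three triangle edges.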

How do I learn Tarjan's algorithm?

I have been trying to learn Tarjan's algorithm from Wikipedia for 3 hours now, but I just can't make head or tail of it. :(
http://en.wikipedia.org/wiki/Tarjan's_strongly_connected_components_algorithm#cite_note-1
Why is it a subtree of the DFS tree? (Actually, doesn't DFS produce a forest? o_O)
And why does v.lowlink=v.index imply that v is a root?
Can someone please explain this to me / give the intuition or motivation behind this algorithm?
The idea is: When traversing the tree, every time you've searched through a branch and are backtracking, you check whether you've encountered an edge to an 'upper' node in the tree.
If you didn't (i.e. if v.lowlink = v.index), then you've just completed an SCC: it consists of the current node and all nodes above it on the stack. That's exactly a subtree of the DFS tree, except for the nodes in SCCs that were already completed.
If you did, you propagate this information to 'upper' nodes (v.lowlink := min(v.lowlink, w.lowlink)), because combined with the path in DFS tree the edge creates an 'upward' path.
DFS produces a forest, but you always consider one tree at a time. An SCC is always contained in one DFS tree; otherwise (being an SCC) there would be a path in both directions between both (all) trees in question, which is a contradiction.
Just adding to pjotr's answer: v.lowlink is basically the index of the uppermost node that you have found in the tree. Keep in mind that uppermost in this context means minimum, since indices keep increasing as you walk down. Now, after processing all your successors, there are basically three cases:
v.lowlink < v.index: This indicates that you have found a back edge. Note that we haven't just found any back edge, but one that points to a node that is "above" the current one. That's what v.lowlink < v.index implies.
v.lowlink = v.index: What we know in this case is that there is no back edge referring to anything above the current node. There might be a back edge to this node (which means that one of your successor nodes w has a lowlink such that w.lowlink = v.lowlink = v.index). It could also be that there was a back edge referring to something below the current node, which means that there was a strongly-connected component below the current node that has been printed out already. The current node, however, is definitely the root of a strongly-connected component as well.
v.lowlink > v.index: That's actually not possible. I'm just listing it for the sake of completeness. ;)
Hope it helps!
Some intuition about Tarjan's algorithm:
During DFS, when we encounter a back edge from vertex v, we update its lowest reachable ancestor, i.e. we update the value of low[v].
Now, when all the outgoing edges of a vertex v have been processed, i.e. we are about to exit the DFS call for v, we check whether low[v] == disc[v], where disc[v] is the discovery index of v (explanation below). If not, v is not the root of its SCC, and we pass the benefit on to the parent of v, i.e. the lowest reachable ancestor of parent[v] is updated to low[v] if that is lower.
This makes sense: although there may be no direct back edge from parent[v] to the ancestor of v, there is a path (the tree edge from parent[v] to v, followed by v's back edge) via which parent[v] can still reach that ancestor.
Thus we have also updated low[parent[v]] here.
Therefore, we keep updating along this chain, and low[v] keeps being updated for every v on it until, while backtracking, we reach the ancestor itself. For this ancestor, low equals its own discovery index, so it acts as the root of the SCC.
Hope this helps
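The explanations above map onto a short recursive implementation. This Python sketch follows the index/lowlink formulation used on the Wikipedia page; the adjacency-dict input (every vertex present as a key) is an assumption, and for very deep graphs the recursion limit would need raising or an iterative version.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph: dict mapping each vertex to a list of successors.
    """
    index = {}        # discovery index of each visited vertex
    low = {}          # lowest index reachable from v's subtree via the stack
    stack = []        # vertices of SCCs that are not yet completed
    on_stack = set()
    sccs = []
    counter = 0

    def strongconnect(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                # Whatever w can reach "upward", v can reach too.
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                # Edge to a vertex of a not-yet-completed SCC.
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop it and everything above it.
            component = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs
```

For the graph 1->2->3->1, 3->4, 4<->5, the components come out as {4, 5} first (it completes first) and then {1, 2, 3}.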
Path for Mastering Bridges and Articulation Point Algorithms
Watch this video to get the satisfying feeling that you have understood the algorithm very well.
Practice the algorithm from here and here.
Practice problems to master it:
A. Cutting Figure
EC_P - Critical Edges
SUBMERGE - Submerging Islands
POLQUERY - Police Query

Find all possible paths from one vertex in a directed cyclic graph in Erlang

I would like to implement a function which finds all possible paths to all possible vertices from a source vertex V in a directed cyclic graph G.
The performance doesn't matter now, I just would like to understand the algorithm. I have read the definition of the Depth-first search algorithm, but I don't have full comprehension of what to do.
I don't have any completed piece of code to provide here, because I am not sure how to:
store the results (along with A->B->C-> we should also store A->B and A->B->C);
represent the graph (digraph? list of tuples?);
how many recursions to use (work with each adjacent vertex?).
How can I find all possible paths from one given source vertex in a directed cyclic graph in Erlang?
UPD: Based on the answers so far, I have to redefine the graph: it is a non-acyclic graph. I know that if my recursive function hits a cycle, it will loop forever. To avoid that, I can just check whether the current vertex is already in the resulting path; if so, I stop traversing and return the path.
UPD2: Thanks for thought provoking comments! Yes, I need to find all simple paths that do not have loops from one source vertex to all the others.
In a graph like this:
with the source vertex A the algorithm should find the following paths:
A,B
A,B,C
A,B,C,D
A,D
A,D,C
A,D,C,B
The following code does the job, but it is unusable with graphs that have more than 20 vertices (I guess something is wrong with the recursion: it takes too much memory and never ends):
dfs(Graph,Source) ->
    ?DBG("Started to traverse graph~n", []),
    Neighbours = digraph:out_neighbours(Graph,Source),
    ?DBG("Entering recursion for source vertex ~w~n", [Source]),
    dfs(Neighbours,[Source],[],Graph,Source),
    ok.

dfs([],Paths,Result,_Graph,Source) ->
    ?DBG("There are no more neighbours left for vertex ~w~n", [Source]),
    Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source) ->
    ?DBG("///The neighbour to check is ~w, other neighbours are: ~w~n",[Neighbour,Other_neighbours]),
    ?DBG("***Current result: ~w~n",[Result]),
    New_result = relax_neighbours(Neighbour,Paths,Result,Graph,Source),
    dfs(Other_neighbours,Paths,New_result,Graph,Source).

relax_neighbours(Neighbour,Paths,Result,Graph,Source) ->
    case lists:member(Neighbour,Paths) of
        false ->
            ?DBG("Found an unvisited neighbour ~w, path is: ~w~n",[Neighbour,Paths]),
            Neighbours = digraph:out_neighbours(Graph,Neighbour),
            ?DBG("The neighbours of the unvisited vertex ~w are ~w, path is:
~w~n",[Neighbour,Neighbours,[Neighbour|Paths]]),
            dfs(Neighbours,[Neighbour|Paths],Result,Graph,Source);
        true ->
            [Paths|Result]
    end.
UPD3:
The problem is that the regular depth-first search algorithm will go down one of the two paths first, (A,B,C,D) or (A,D,C,B), and will never go down the second one.
In either case it will be the only path - for example, when the regular DFS backtracks from (A,B,C,D) it goes back up to A and checks if D (the second neighbour of A) is visited. And since the regular DFS maintains a global state for each vertex, D would have 'visited' state.
So, we have to introduce a recursion-dependent state - if we backtrack from (A,B,C,D) up to A, we should have (A,B,C,D) in the list of the results and we should have D marked as unvisited as at the very beginning of the algorithm.
I have tried to turn the solution into a tail-recursive one, but the running time of the algorithm is still infeasible: it takes about 4 seconds to traverse a tiny graph of 16 vertices with 3 edges per vertex:
dfs(Graph,Source) ->
    ?DBG("Started to traverse graph~n", []),
    Neighbours = digraph:out_neighbours(Graph,Source),
    ?DBG("Entering recursion for source vertex ~w~n", [Source]),
    Result = ets:new(resulting_paths, [bag]),
    Root = Source,
    dfs(Neighbours,[Source],Result,Graph,Source,[],Root).

dfs([],Paths,Result,_Graph,Source,_,_) ->
    ?DBG("There are no more neighbours left for vertex ~w, paths are ~w, result is ~w~n", [Source,Paths,Result]),
    Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source,Recursion_list,Root) ->
    ?DBG("~w *Current source is ~w~n",[Recursion_list,Source]),
    ?DBG("~w Checking neighbour _~w_ of _~w_, other neighbours are: ~w~n",[Recursion_list,Neighbour,Source,Other_neighbours]),
    ?DBG("~w Ready to check for visited: ~w~n",[Recursion_list,Neighbour]),
    case lists:member(Neighbour,Paths) of
        false ->
            ?DBG("~w Found an unvisited neighbour ~w, path is: ~w~n",[Recursion_list,Neighbour,Paths]),
            New_paths = [Neighbour|Paths],
            ?DBG("~w Added neighbour to paths: ~w~n",[Recursion_list,New_paths]),
            ets:insert(Result,{Root,Paths}),
            Neighbours = digraph:out_neighbours(Graph,Neighbour),
            ?DBG("~w The neighbours of the unvisited vertex ~w are ~w, path is: ~w, recursion:~n",[Recursion_list,Neighbour,Neighbours,[Neighbour|Paths]]),
            dfs(Neighbours,New_paths,Result,Graph,Neighbour,[[[]]|Recursion_list],Root);
        true ->
            ?DBG("~w The neighbour ~w is: already visited, paths: ~w, backtracking to other neighbours:~n",[Recursion_list,Neighbour,Paths]),
            ets:insert(Result,{Root,Paths})
    end,
    dfs(Other_neighbours,Paths,Result,Graph,Source,Recursion_list,Root).
Any ideas to run this in acceptable time?
Edit:
Okay I understand now, you want to find all simple paths from a vertex in a directed graph. So a depth-first search with backtracking would be suitable, as you have realised. The general idea is to go to a neighbour, then go to another one (not one which you've visited), and keep going until you hit a dead end. Then backtrack to the last vertex you were at and pick a different neighbour, etc.
You need to get the fiddly bits right, but it shouldn't be too hard. E.g. at every step you need to label the vertices 'explored' or 'unexplored' depending on whether you've already visited them. Performance shouldn't be an issue; a properly implemented algorithm should take maybe O(n^2) time. So I don't know what you are doing wrong. Perhaps you are visiting too many neighbours, e.g. revisiting neighbours that you've already visited and going round in loops.
I haven't really read your program, but the Wikipedia page on depth-first search has a short, simple pseudocode program which you can try to translate into your language. Store the graphs as adjacency lists to make it easier.
Edit:
Yes, sorry, you are right: the standard DFS won't work as it stands. You need to adjust it slightly so that it does revisit vertices it has visited before. So you are allowed to visit any vertex except the ones already in your current path.
This of course means my running time was completely wrong; the complexity of your algorithm will be through the roof. If the average degree of your graph is d+1, then there will be approximately d*d*d*...*d = d^n possible paths.
So even if every vertex has only 3 neighbours, there are still quite a few paths once you get above 20 vertices.
There's no way around that really, because if you want your program to output all possible paths then indeed you will have to output all d^n of them.
I'm interested to know whether you need this for a specific task, or are just trying to program this out of interest. If the latter, you will just have to be happy with small, sparsely connected graphs.
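The modified DFS described above (only vertices on the current path are off limits, and every prefix is recorded) can be sketched in Python as follows. The dict-of-successor-lists input and the function name are assumptions; the test graph is a hypothetical one chosen to match the six paths listed in the question, since the question's figure is missing.

```python
def simple_paths_from(adj, source):
    """Return every simple path that starts at source and has at least one edge."""
    result = []
    path = [source]
    on_path = {source}

    def extend(v):
        for w in adj.get(v, ()):
            if w in on_path:
                continue  # only vertices on the current path are off limits
            path.append(w)
            on_path.add(w)
            result.append(list(path))  # every prefix is itself a path
            extend(w)
            path.pop()  # backtrack: w is available again for other branches
            on_path.remove(w)

    extend(source)
    return result


# A->B, A->D, B->C, C->B, C->D, D->C
g = {'A': ['B', 'D'], 'B': ['C'], 'C': ['B', 'D'], 'D': ['C']}
paths = simple_paths_from(g, 'A')
```

This returns exactly A,B / A,B,C / A,B,C,D / A,D / A,D,C / A,D,C,B. Because the output itself can be exponential in size, no change of representation will make it fast on dense graphs.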
I don't understand the question. If I have the graph G = (V, E) = ({A,B}, {(A,B),(B,A)}), there are infinitely many paths from A to B: {[A,B], [A,B,A,B], [A,B,A,B,A,B], ...}. How can I find all possible paths to any vertex in a cyclic graph?
Edit:
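Have you tried to compute, or at least estimate, how fast the number of possible paths grows for some graphs? For a fully connected graph you get the following (number of vertices - number of simple paths starting at a given vertex):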
2 - 1
3 - 4
4 - 15
5 - 64
6 - 325
7 - 1956
8 - 13699
9 - 109600
10 - 986409
11 - 9864100
12 - 108505111
13 - 1302061344
14 - 16926797485
15 - 236975164804
16 - 3554627472075
17 - 56874039553216
18 - 966858672404689
19 - 17403456103284420
20 - 330665665962403999
Are you sure you would like to find all paths to all nodes? Even if you computed one million paths per second, it would take roughly 10,500 years to compute all paths from one node to all other nodes in a fully connected graph with 20 nodes. That is an upper bound for your task, so I don't think you really want to do this; I think you want something else.
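The table follows from a short count: a simple path with k edges starting at a fixed vertex of the complete graph K_n visits k further vertices, chosen in order from the remaining n - 1, so there are (n-1)!/(n-1-k)! such paths, and the total is the sum over k = 1, ..., n-1. The helper below (our own, not from the answer) evaluates that sum with the recurrence g(m) = m*g(m-1) + 1, where g(m) is the sum of m!/j! for j = 0..m, and then drops the zero-length path.

```python
def complete_graph_path_counts(max_n):
    """Map n -> number of simple paths (>= 1 edge) from a fixed vertex of K_n."""
    counts = {}
    g = 1  # g(0) = 1
    for m in range(1, max_n):  # m = n - 1 other vertices
        g = m * g + 1          # g(m) = sum_{j=0..m} m!/j!
        counts[m + 1] = g - 1  # remove the j = m term (the empty path)
    return counts


counts = complete_graph_path_counts(20)
seconds_per_year = 365 * 24 * 3600
years_at_1e6_per_second = counts[20] / 1e6 / seconds_per_year
```

Here counts[20] reproduces the last row of the table, and at one million paths per second that is about 10,485 years of 365 days.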
Not an improved algorithmic solution by any means, but you can often improve performance by spawning multiple worker threads, potentially here one for each first level node and then aggregating the results. This can often improve naive brute force algorithms relatively easily.
You can see an example here: Some Erlang Matrix Functions, in the maximise_assignment function (comments starting on line 191 as of today). Again, the underlying algorithm there is fairly naive and brute force, but the parallelisation speeds it up quite well for many forms of matrices.
I have used a similar approach in the past to find the number of Hamiltonian Paths in a graph.

Algorithms to Identify All the Cycle Bases in an Undirected Graph

I have an undirected graph with vertex set V and edge set E. I am looking for an algorithm to identify all the cycle bases in that graph.
I think Tarjan's algorithm is a good start. But the reference I have is about finding all of the cycles, not the cycle base (which, by definition, is a cycle that cannot be constructed as the union of other cycles).
For example, take a look at the below graph:
So, an algorithm would be helpful. If there is an existing implementation (preferably in C#), it's even better!
From what I can tell, not only is Brian's hunch spot on, but an even stronger proposition holds: each edge that's not in the minimum spanning tree adds exactly one new "base cycle".
To see this, let's see what happens when you add an edge E that's not in the MST. Let's do the favorite math way to complicate things and add some notation ;) Call the original graph G, the graph before adding E G', and the graph after adding E G''. So we need to find out how the "base cycle count" changes from G' to G''.
Adding E must close at least one cycle (otherwise E would be in the MST of G in the first place). So obviously it must add at least one "base cycle" to the already existing ones in G'. But does it add more than one?
It can't add more than two, since no edge can be a member of more than two base cycles. But if E is a member of two base cycles, then the "union" of these two base cycles must've been a base cycle in G', so again we get that the change in the number of cycles is still one.
Ergo, for each edge not in MST you get a new base cycle. So the "count" part is simple. Finding all the edges for each base cycle is a little trickier, but following the reasoning above, I think this could do it (in pseudo-Python):
for v in vertices[G]:
    cycles[v] = []
for e in (edges[G] \ mst[G]):
    cycle_to_split = intersect(cycles[e.node1], cycles[e.node2])
    if cycle_to_split is None:
        # we're adding a completely new cycle
        path = find_path(e.node1, e.node2, mst[G])
        for vertex on path:
            cycles[vertex].append(path + [e])
    else:
        # we're splitting an existing base cycle
        cycle1, cycle2 = split(cycle_to_split, e)
        for vertex on cycle_to_split:
            cycles[vertex].remove(cycle_to_split)
            if vertex on cycle1:
                cycles[vertex].append(cycle1)
            if vertex on cycle2:
                cycles[vertex].append(cycle2)
# collect the cycles stored at every vertex
base_cycles = set(c for v in vertices[G] for c in cycles[v])
Edit: the code should find all the base cycles in a graph (the base_cycles set at the bottom). The assumptions are that you know how to:
find the minimum spanning tree of a graph (mst[G])
find the difference between two lists (edges \ mst[G])
find an intersection of two lists
find the path between two vertices on a MST
split a cycle into two by adding an extra edge to it (the split function)
And it mainly follows the discussion above. For each edge not in the MST, you have two cases: either it brings a completely new base cycle, or it splits an existing one in two. To track which of the two is the case, we track all the base cycles that a vertex is a part of (using the cycles dictionary).
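For comparison, the standard and much simpler construction that matches the counting argument above is a fundamental cycle basis: build any spanning tree (here BFS, since for an unweighted graph any spanning tree will do) and, for each non-tree edge, take the tree path between its endpoints plus that edge. This is a swapped-in technique, not the splitting algorithm above; it yields exactly one cycle per edge not in the tree, i.e. E - V + C cycles for C connected components. The sketch assumes a simple undirected graph given as an adjacency dict.

```python
from collections import deque


def fundamental_cycles(adj):
    """Return one cycle (list of vertices) per non-tree edge of a BFS spanning forest."""
    parent, depth = {}, {}
    tree = set()  # spanning-forest edges stored as frozensets
    for root in adj:
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in parent:
                    parent[w], depth[w] = v, depth[v] + 1
                    tree.add(frozenset((v, w)))
                    queue.append(w)

    cycles, seen = [], set()
    for v in adj:
        for w in adj[v]:
            e = frozenset((v, w))
            if e in tree or e in seen:
                continue
            seen.add(e)
            # Walk both endpoints up the tree to their lowest common ancestor.
            a, b = v, w
            left, right = [a], [b]
            while a != b:
                if depth[a] >= depth[b]:
                    a = parent[a]
                    left.append(a)
                else:
                    b = parent[b]
                    right.append(b)
            # Both lists end at the LCA; drop the duplicate from `right`.
            cycles.append(left + right[-2::-1])
    return cycles


# A square 0-1-2-3-0 with the diagonal 0-2: 5 edges, 4 vertices -> 2 cycles.
square = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
basis = fundamental_cycles(square)
```

Each returned list is a closed walk when you add the edge from its last vertex back to its first.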
Off the top of my head, I would start by looking at any minimum spanning tree algorithm (Prim, Kruskal, etc.). There can't be more base cycles (if I understand it correctly) than edges that are NOT in the MST.
The following is my actual untested C# code to find all these "base cycles":
// Needs: using System.Collections.Generic; using System.Linq;
public HashSet<List<EdgeT>> FindBaseCycles(ICollection<VertexT> connectedComponent)
{
    Dictionary<VertexT, HashSet<List<EdgeT>>> cycles =
        new Dictionary<VertexT, HashSet<List<EdgeT>>>();
    // For each vertex, initialize the dictionary with empty sets of lists of
    // edges
    foreach (VertexT vertex in connectedComponent)
        cycles.Add(vertex, new HashSet<List<EdgeT>>());
    HashSet<EdgeT> spanningTree = FindSpanningTree(connectedComponent);
    foreach (EdgeT edgeNotInMST in
             GetIncidentEdges(connectedComponent).Except(spanningTree)) {
        // Find one cycle to split; the set resulting from the intersection
        // will contain at most one cycle. Enumerable.Intersect returns an
        // IEnumerable, so copy one set and intersect it in place instead.
        HashSet<List<EdgeT>> cycleToSplitSet =
            new HashSet<List<EdgeT>>(cycles[(VertexT)edgeNotInMST.StartPoint]);
        cycleToSplitSet.IntersectWith(cycles[(VertexT)edgeNotInMST.EndPoint]);
        if (cycleToSplitSet.Count == 0) {
            // Find the path between the endpoints of the current edge not in
            // the ST, using the spanning tree itself
            List<EdgeT> path =
                FindPath(
                    (VertexT)edgeNotInMST.StartPoint,
                    (VertexT)edgeNotInMST.EndPoint,
                    spanningTree);
            // The found path plus the current edge becomes a cycle
            path.Add(edgeNotInMST);
            foreach (VertexT vertexInCycle in VerticesInPathSet(path))
                cycles[vertexInCycle].Add(path);
        } else {
            // Get the cycle to split from the set produced before
            // (GetEnumerator().Current is undefined before MoveNext())
            List<EdgeT> cycleToSplit = cycleToSplitSet.First();
            List<EdgeT> cycle1 = new List<EdgeT>();
            List<EdgeT> cycle2 = new List<EdgeT>();
            SplitCycle(cycleToSplit, edgeNotInMST, cycle1, cycle2);
            // Remove the cycle that has been split from the vertices in the
            // same cycle and add the results of the split operation to them
            foreach (VertexT vertex in VerticesInPathSet(cycleToSplit)) {
                cycles[vertex].Remove(cycleToSplit);
                if (VerticesInPathSet(cycle1).Contains(vertex))
                    cycles[vertex].Add(cycle1);
                if (VerticesInPathSet(cycle2).Contains(vertex))
                    cycles[vertex].Add(cycle2);
            }
        }
    }
    HashSet<List<EdgeT>> ret = new HashSet<List<EdgeT>>();
    // Collect the cycles remaining at every vertex into a single set
    // (HashSet<T> has no AddAll; UnionWith does the same job)
    foreach (HashSet<List<EdgeT>> remainingCycles in cycles.Values)
        ret.UnionWith(remainingCycles);
    return ret;
}
Oggy's code was very good and clear, but I'm pretty sure it contains an error, or else I don't understand your pseudo-Python code :)
cycles[v] = []
cannot be a vertex-indexed dictionary of lists of edges. In my opinion, it has to be a vertex-indexed dictionary of sets of lists of edges.
And, one more clarification:
for vertex on cycle_to_split:
cycle_to_split is probably an ordered list of edges, so to iterate through its vertices you have to convert it into a set of vertices. Order is irrelevant here, so it's a very simple algorithm.
I repeat, this is untested and incomplete code, but it is a step forward. It still requires a proper graph structure (I use an incidence list) and many graph algorithms you can find in textbooks like Cormen. I wasn't able to find FindPath() and SplitCycle() in textbooks, and they are very hard to code in time linear in the number of edges plus vertices in the graph. I will report them here once I have tested them.
Thanks a lot Oggy!
The standard way to detect a cycle is to use two iterators: on each iteration, one moves forward one step and the other two steps. If there is a cycle, at some point they will point to the same element.
This approach could be extended to record the cycles so found and move on.
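The two-iterator technique described here is Floyd's "tortoise and hare" algorithm. It applies when every element has exactly one successor (a linked list, or a function iterated from a start value), not to general graphs with several neighbours per vertex. A minimal Python sketch, assuming the sequence x0, f(x0), f(f(x0)), ... eventually repeats, that also recovers where the cycle starts and how long it is:

```python
def floyd_cycle(f, x0):
    """Return (mu, lam): index of the first element on the cycle and the cycle length."""
    # Phase 1: the hare moves two steps per iteration, the tortoise one,
    # until they meet somewhere inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))

    # Phase 2: restart the tortoise at x0; moving both one step at a time,
    # they meet at the first element of the cycle.
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1

    # Phase 3: walk once around the cycle to measure its length.
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam


# Successor list: 0 -> 1 -> 2 -> 3 -> 4 -> 2 (cycle 2, 3, 4)
nxt = [1, 2, 3, 4, 2]
result = floyd_cycle(lambda i: nxt[i], 0)
```

Here result is (2, 3): the cycle is entered at index 2 and has length 3. Recording each (mu, lam) found and restarting from unvisited elements is one way to "record the cycles found and move on".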
