BFS, Iterative DFS, and Recursive DFS: When to Mark Node as Visited - graph

After many hours spent googling, I still have not come across an in depth, intuitive as well as solidly proven treatment of this question. The closest article I have found, linked to on some obscure discussion forum, is this: https://11011110.github.io/blog/2013/12/17/stack-based-graph-traversal.html. I have also seen this Stack Overflow question DFS vs BFS .2 differences, but the responses do not arrive at a clear consensus.
So here is the question:
I have seen it stated (in Wikipedia, as well as Algorithms Illuminated by Tim Roughgarden) that, to transform a BFS implementation into an iterative DFS one, the following two changes are made:
The non-recursive implementation is similar to breadth-first search but differs from it in two ways:
it uses a stack instead of a queue, and
it delays checking whether a vertex has been discovered until the vertex is popped from the stack rather than making this check before adding the vertex.
Can anyone help explain, via intuition or example, the reason for the second distinction here? Specifically: what is the differentiating factor between BFS, iterative DFS, and recursive DFS that necessitates postponing the check until after popping off the stack only for iterative DFS?
Here is a basic implementation of BFS:
from collections import deque

def bfs(adjacency_list, source):
    explored = [False] * len(adjacency_list)
    queue = deque()
    queue.append(source)
    explored[source] = True  # marked as soon as it is enqueued
    while queue:
        node = queue.popleft()
        print(node)
        for n in adjacency_list[node]:
            if not explored[n]:
                explored[n] = True
                queue.append(n)
If we simply swap the queue for a stack, we get this implementation of DFS:
def dfs_stack_only(adjacency_list, source):
    explored = [False] * len(adjacency_list)
    stack = deque()
    stack.append(source)
    explored[source] = True
    while stack:
        node = stack.pop()
        print(node)
        for n in adjacency_list[node]:
            if explored[n] == False:
                explored[n] = True
                stack.append(n)
The only difference between these two algorithms is that the queue from BFS was swapped for a stack in DFS. This implementation of DFS actually produces incorrect traversals on non-trivial graphs (on a very simple graph it may still happen to produce a correct traversal).
I believe that this is the 'error' referenced in the article linked above.
However, this can be fixed in one of two ways.
Either of these two implementations produces a correct traversal:
First, the implementation suggested in the sources above, with the check delayed until after popping the node from the stack. This implementation results in many duplicates on the stack.
def dfs_iterative_correct(adjacency_list, source):
    explored = [False] * len(adjacency_list)
    stack = deque()
    stack.append(source)
    while stack:
        node = stack.pop()
        if explored[node] == False:
            explored[node] = True
            print(node)
            for n in adjacency_list[node]:
                stack.append(n)
Alternatively, this is a popular online implementation (this one taken from Geeks for Geeks) which also produces the correct traversal. There are some duplicates on the stack, but hardly as many as the previous implementation.
def dfs_geeks_for_geeks(adjacency_list, source):
    explored = [False] * len(adjacency_list)
    stack = deque()
    stack.append(source)
    while len(stack):
        node = stack.pop()
        if not explored[node]:
            explored[node] = True
            print(node)
            for n in adjacency_list[node]:
                if not explored[n]:
                    stack.append(n)
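To make the difference concrete, here is a small sketch that runs the swapped-only version and the delayed-check version on the same graph. The four-vertex graph and the function names are made up for illustration, and both functions return the visit order instead of printing it.

```python
from collections import deque

def dfs_mark_on_push(adjacency_list, source):
    """Queue swapped for a stack; vertices are still marked when pushed."""
    explored = [False] * len(adjacency_list)
    stack = deque([source])
    explored[source] = True
    order = []
    while stack:
        node = stack.pop()
        order.append(node)
        for n in adjacency_list[node]:
            if not explored[n]:
                explored[n] = True
                stack.append(n)
    return order

def dfs_mark_on_pop(adjacency_list, source):
    """Check and mark only when the vertex is popped."""
    explored = [False] * len(adjacency_list)
    stack = deque([source])
    order = []
    while stack:
        node = stack.pop()
        if explored[node]:
            continue  # stale duplicate left on the stack
        explored[node] = True
        order.append(node)
        for n in adjacency_list[node]:
            stack.append(n)
    return order

# Hypothetical example graph: triangle 0-1-2 plus a pendant vertex 3 on 0.
graph = [[1, 3, 2], [0, 2], [0, 1], [0]]
print(dfs_mark_on_push(graph, 0))  # [0, 2, 3, 1]
print(dfs_mark_on_pop(graph, 0))   # [0, 2, 1, 3]
```

In the first order, 3 comes right after 2 even though 2 still has an unvisited neighbour (1), so no depth-first search could have produced it. Marking on push freezes each vertex at the stack position of its first discovery, whereas DFS needs the most recent discovery to win. BFS is not affected, because in a queue the first discovery is also the earliest position.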
So in summary, it seems that the difference is not solely about when you check the visited status of a node, but more about when you actually mark it as visited. Furthermore, why does marking it as visited immediately work just fine for BFS, but not for DFS? Any insight is greatly appreciated!
Thank you!

I don't see a difference in that respect between BFS and DFS.
I see two requirements to "marking nodes as visited":
It should not prevent pushing a node's neighbors onto the stack or queue.
It should prevent pushing the node itself onto the stack or queue again.
Those requirements apply to DFS as well as BFS, so the sequence for both can be:
fetch node from stack or queue
mark node as visited
get node's neighbors
put any unvisited neighbor into the stack or queue
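The four steps above can be sketched as one function that serves both traversals. The name `traverse` and the `use_stack` flag are made up for this illustration. One assumption goes beyond the list: a node can be pushed more than once before it is fetched, so the sketch skips a fetched node that is already marked.

```python
from collections import deque

def traverse(adjacency_list, source, use_stack):
    """Visit order from `source`: DFS if use_stack is True, otherwise BFS."""
    visited = [False] * len(adjacency_list)
    frontier = deque([source])
    order = []
    while frontier:
        # 1. fetch node from stack or queue
        node = frontier.pop() if use_stack else frontier.popleft()
        if visited[node]:
            continue  # assumption: ignore duplicates pushed earlier
        # 2. mark node as visited
        visited[node] = True
        order.append(node)
        # 3. and 4. push every neighbor that is still unvisited
        for n in adjacency_list[node]:
            if not visited[n]:
                frontier.append(n)
    return order

# Hypothetical example graph: triangle 0-1-2 plus a pendant vertex 3 on 0.
graph = [[1, 3, 2], [0, 2], [0, 1], [0]]
print(traverse(graph, 0, use_stack=True))   # [0, 2, 1, 3]
print(traverse(graph, 0, use_stack=False))  # [0, 1, 3, 2]
```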

Related

Iterative Postorder Traversal of Binary Tree in Python Optimality

I'm brushing up on LeetCode tree questions, and every solution to problems of the "Iterative Postorder Traversal of Binary Tree in Python" type seems to use recursion.
Since Python has no tail-call optimization, I believe an iterative algorithm is faster, because it takes less time to push and pop on an explicit stack than to set up and tear down call frames.
Additionally, I believe the iterative approach uses less memory, because tracking just one stack takes up less space than the call frames created by recursion.
I understand the typical iterative postorder traversal uses two stacks and while loops, so it seems too complicated.
However, there is an algorithm that uses just one stack and a while loop.
Here it is:
def postorderIterative(root):
    current = root
    stack = []
    while current or stack:
        if current:
            # walk down the left spine, remembering each node
            stack.append(current)
            current = current.left
        else:
            tmp = stack[-1].right
            if not tmp:
                # no right subtree: the node on top is finished
                tmp = stack.pop()
                # process postorder here
                # climb while we are returning from a right child
                while stack and tmp is stack[-1].right:
                    tmp = stack.pop()
                    # process postorder here
            else:
                current = tmp
credit and explanation goes to here
Is there any reason why most video solutions use recursive postorder traversal rather than iterative, even though iterative is better? Is it because recursion is easier to understand?
It seems that with a big enough test case the recursive version may fail at some call depth, so I tend to always write the iterative one. Is this a valid approach?
I literally cannot find a video of an iterative solution to
LeetCode 543. Diameter of Binary Tree.
They are all recursive and thus suboptimal, even though they claim to be optimal.
Please let me know if I am wrong, and why. If I am not, please let me know as well. I am here to learn!
Both the iterative and recursive implementations have the same time and space complexity. The difference in running time is insignificant in terms of big O. When speaking of optimal, people usually mean the time/space complexity is optimal.
Secondly, the call stack limitation will only be a problem for a binary tree that has a height that goes into the hundreds. If a tree of such height were balanced, it would have more nodes than there are atoms in the universe. So we would only get into stack limitations when the binary tree is awfully skewed, which would make it rather useless for a real life problem. In short, I don't think the call stack limitation is such an important argument here.
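For completeness, the postorder bookkeeping needed for LeetCode 543 can be done without recursion. The following is only a sketch: `TreeNode` is assumed to look like LeetCode's node class, `diameter_iterative` is a made-up name, and the diameter is counted in edges. It has the same O(n) time and space as the recursive version, which is the point made above.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def diameter_iterative(root):
    """Longest path between any two nodes, counted in edges."""
    height = {None: 0}  # height (in nodes) of every finished subtree
    best = 0
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if node is None:
            continue
        if not children_done:
            # revisit this node after both subtrees have been processed
            stack.append((node, True))
            stack.append((node.right, False))
            stack.append((node.left, False))
        else:
            left_h = height[node.left]
            right_h = height[node.right]
            best = max(best, left_h + right_h)
            height[node] = 1 + max(left_h, right_h)
    return best

root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(diameter_iterative(root))  # 3
```

Unlike a recursive version under Python's default recursion limit, this sketch also handles a badly skewed tree with thousands of levels.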

Union-Find algorithm and determining whether an edge belongs to a cycle in a graph

I'm reading a book about algorithms ("Data Structures and Algorithms in C++") and have come across the following exercise:
Ex. 20. Modify cycleDetectionDFS() so that it could determine whether a particular edge is part of a cycle in an undirected graph.
In the chapter about graphs, the book reads:
Let us recall from a preceding section that depth-first search
guaranteed generating a spanning tree in which no elements of edges
used by depthFirstSearch() led to a cycle with other element of edges.
This was due to the fact that if vertices v and u belonged to edges,
then the edge(vu) was disregarded by depthFirstSearch(). A problem
arises when depthFirstSearch() is modified so that it can detect
whether a specific edge(vu) is part of a cycle (see Exercise 20).
Should such a modified depth-first search be applied to each edge
separately, then the total run would be O(E(E+V)), which could turn
into O(V^4) for dense graphs. Hence, a better method needs to be
found.
The task is to determine if two vertices are in the same set. Two
operations are needed to implement this task: finding the set to which
a vertex v belongs and uniting two sets into one if vertex v belongs
to one of them and w to another. This is known as the union-find
problem.
Later on, the author describes how to merge two sets into one when an edge passed to the function union(edge e) connects vertices in distinct sets.
However, still I don't know how to quickly check whether an edge is part of a cycle. Could someone give me a rough explanation of such algorithm which is related to the aforementioned union-find problem?
A rough explanation: check whether an edge is a back edge (backlink). Whenever you have a back edge you have a cycle, and whenever you have a cycle you have a back edge (this is true for directed and undirected graphs).
A back edge is an edge that points from a descendant to an ancestor. When you traverse a graph with a DFS algorithm you build a forest, and an ancestor is a node that is marked finished later in the traversal than its descendants.
I gave you some pointers to where to look, let me know if that helps you clarify your problems.
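As a rough illustration of the two union-find operations quoted from the book, the sketch below reports every edge whose endpoints are already in the same set when the edge is processed. The function names and the list-based `parent` array are assumptions made for this example and are not the book's code. Note the limitation: it flags the edge that closes a cycle with respect to the edges seen so far, not every edge that lies on some cycle.

```python
def find(parent, v):
    """Return the representative of the set containing v."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving keeps trees shallow
        v = parent[v]
    return v

def edges_closing_cycles(num_vertices, edges):
    """Edges of an undirected graph that join two already connected vertices."""
    parent = list(range(num_vertices))  # every vertex starts in its own set
    closing = []
    for u, v in edges:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u == root_v:
            closing.append((u, v))   # same set: this edge closes a cycle
        else:
            parent[root_u] = root_v  # union the two sets
    return closing

print(edges_closing_cycles(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))  # [(2, 0)]
```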

How do I learn Tarjan's algorithm?

I have been trying to learn Tarjan's algorithm from Wikipedia for 3 hours now, but I just can't make head or tail of it. :(
http://en.wikipedia.org/wiki/Tarjan's_strongly_connected_components_algorithm#cite_note-1
Why is it a subtree of the DFS tree? (actually DFS produces a forest? o_O)
And why does v.lowlink=v.index imply that v is a root?
Can someone please explain this to me / give the intuition or motivation behind this algorithm?
The idea is: When traversing the tree, every time you've searched through a branch and are backtracking, you check whether you've encountered an edge to an 'upper' node in the tree.
If you didn't (if v.lowlink = v.index), then you've just completed an SCC: it consists of the current node and all nodes above it on the stack. That's exactly a subtree of the DFS tree, except for the nodes in SCCs that were already completed.
If you did, you propagate this information to 'upper' nodes (v.lowlink := min(v.lowlink, w.lowlink)), because combined with the path in DFS tree the edge creates an 'upward' path.
DFS produces a forest, but you always consider one tree at a time. An SCC is always contained in one DFS tree; otherwise (being an SCC) there would be a path in both directions between the trees in question, which is a contradiction.
just adding to pjotr's answer: v.lowlink is basically the index of the upmost node that you have found in the tree. Keep in mind that upmost in this context means minimum as you keep increasing indices as you walk down. Now after processing all your successors, there's basically three cases:
v.lowlink < v.index: This indicates that you have found a back edge. Note that we haven't just found any back edge, but one that points to a node that is "above" the current one. That's what v.lowlink < v.index implies.
v.lowlink = v.index: What we know in this case is that there is no back edge referring to anything above the current node. There might be a back edge to this node (which means that one of your successor nodes w has a lowlink such that w.lowlink = v.lowlink = v.index). It could also be that there was a back edge referring to something below the current node, which means that there was a strongly-connected component below the current node that has been printed out already. The current node, however, is definitely the root of a strongly-connected component as well.
v.lowlink > v.index: That's actually not possible. I'm just listing it for the sake of completeness. ;)
Hope it helps!
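The cases above map onto code fairly directly. Below is a compact recursive sketch in Python. The dictionary-based graph and the names `tarjan_scc` and `visit` are choices made for this example, while `index` and `lowlink` follow the Wikipedia pseudocode. Being recursive, it is only suitable for graphs of modest depth.

```python
def tarjan_scc(graph):
    """Return the strongly connected components of {vertex: [successors]}."""
    index_of = {}     # discovery index of each visited vertex
    lowlink = {}      # smallest index known to be reachable via the stack
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index_of[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                # edge to a vertex "above" us in the current component
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is a root: pop v and everything above it on the stack
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index_of:
            visit(v)
    return components

example = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]}
print(tarjan_scc(example))  # [[4, 3], [2, 1, 0]]
```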
Some Intuition about the Tarjan's Algorithm:
During DFS, when we encounter a back edge from vertex v, we update its lowest reachable ancestor, i.e. we update the value of low[v].
Now, when all the outgoing edges of a vertex have been processed, i.e. we are about to exit the DFS call for vertex v, we check whether low[v] == v (explanation below). If not, v is not the root of the SCC, and we pass the benefit on to the parent of v: the lowest reachable ancestor of parent[v] is now changed to low[v] (if that is lower).
This is logical: although there is no direct back edge from parent[v] to the ancestor of v, there is a path (the edge towards v plus the back edge of v) via which parent[v] can still reach that ancestor.
Thus we have also updated low[parent[v]] here.
We keep updating along this chain while backtracking until we reach the ancestor. For this ancestor, low[v] will be equal to v, and thus it acts as the root of the SCC.
Hope this helps
Path for Mastering Bridges and Articulation Algorithms
Watch this video to get an awesome feeling that you have understood the
algorithm very well.
practice the algorithm from here and here.
practice problems to master it:
A. Cutting Figure
EC_P - Critical Edges
SUBMERGE - Submerging Islands
POLQUERY - Police Query

Find all possible paths from one vertex in a directed cyclic graph in Erlang

I would like to implement a function which finds all possible paths to all possible vertices from a source vertex V in a directed cyclic graph G.
The performance doesn't matter now, I just would like to understand the algorithm. I have read the definition of the Depth-first search algorithm, but I don't have full comprehension of what to do.
I don't have any completed piece of code to provide here, because I am not sure how to:
store the results (along with A->B->C-> we should also store A->B and A->B->C);
represent the graph (digraph? list of tuples?);
how many recursions to use (work with each adjacent vertex?).
How can I find all possible paths from one given source vertex in a directed cyclic graph in Erlang?
UPD: Based on the answers so far I have to clarify the graph definition: it is a non-acyclic graph. I know that if my recursive function hits a cycle it becomes an infinite loop. To avoid that, I can just check whether the current vertex is already in the resulting path; if it is, I stop traversing and return the path.
UPD2: Thanks for thought provoking comments! Yes, I need to find all simple paths that do not have loops from one source vertex to all the others.
In a graph like this:
with the source vertex A the algorithm should find the following paths:
A,B
A,B,C
A,B,C,D
A,D
A,D,C
A,D,C,B
The following code does the job, but it is unusable with graphs that have more than 20 vertices (I guess something is wrong with the recursion: it takes too much memory and never ends):
dfs(Graph,Source) ->
    ?DBG("Started to traverse graph~n", []),
    Neighbours = digraph:out_neighbours(Graph,Source),
    ?DBG("Entering recursion for source vertex ~w~n", [Source]),
    dfs(Neighbours,[Source],[],Graph,Source),
    ok.

dfs([],Paths,Result,_Graph,Source) ->
    ?DBG("There are no more neighbours left for vertex ~w~n", [Source]),
    Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source) ->
    ?DBG("///The neighbour to check is ~w, other neighbours are: ~w~n",[Neighbour,Other_neighbours]),
    ?DBG("***Current result: ~w~n",[Result]),
    New_result = relax_neighbours(Neighbour,Paths,Result,Graph,Source),
    dfs(Other_neighbours,Paths,New_result,Graph,Source).

relax_neighbours(Neighbour,Paths,Result,Graph,Source) ->
    case lists:member(Neighbour,Paths) of
        false ->
            ?DBG("Found an unvisited neighbour ~w, path is: ~w~n",[Neighbour,Paths]),
            Neighbours = digraph:out_neighbours(Graph,Neighbour),
            ?DBG("The neighbours of the unvisited vertex ~w are ~w, path is: ~w~n",[Neighbour,Neighbours,[Neighbour|Paths]]),
            dfs(Neighbours,[Neighbour|Paths],Result,Graph,Source);
        true ->
            [Paths|Result]
    end.
UPD3:
The problem is that the regular depth-first search algorithm will take one of the two paths first, (A,B,C,D) or (A,D,C,B), and will never take the second one.
In either case it will be the only path. For example, when the regular DFS backtracks from (A,B,C,D) it goes back up to A and checks whether D (the second neighbour of A) is visited. Since the regular DFS maintains a global state for each vertex, D would have the 'visited' state.
So we have to introduce a recursion-dependent state: if we backtrack from (A,B,C,D) up to A, we should have (A,B,C,D) in the list of results, and D should be marked as unvisited, as at the very beginning of the algorithm.
I have tried to optimize the solution into a tail-recursive one, but the running time of the algorithm is still infeasible: it takes about 4 seconds to traverse a tiny graph of 16 vertices with 3 edges per vertex:
dfs(Graph,Source) ->
    ?DBG("Started to traverse graph~n", []),
    Neighbours = digraph:out_neighbours(Graph,Source),
    ?DBG("Entering recursion for source vertex ~w~n", [Source]),
    Result = ets:new(resulting_paths, [bag]),
    Root = Source,
    dfs(Neighbours,[Source],Result,Graph,Source,[],Root).

dfs([],Paths,Result,_Graph,Source,_,_) ->
    ?DBG("There are no more neighbours left for vertex ~w, paths are ~w, result is ~w~n", [Source,Paths,Result]),
    Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source,Recursion_list,Root) ->
    ?DBG("~w *Current source is ~w~n",[Recursion_list,Source]),
    ?DBG("~w Checking neighbour _~w_ of _~w_, other neighbours are: ~w~n",[Recursion_list,Neighbour,Source,Other_neighbours]),
    ?DBG("~w Ready to check for visited: ~w~n",[Recursion_list,Neighbour]),
    case lists:member(Neighbour,Paths) of
        false ->
            ?DBG("~w Found an unvisited neighbour ~w, path is: ~w~n",[Recursion_list,Neighbour,Paths]),
            New_paths = [Neighbour|Paths],
            ?DBG("~w Added neighbour to paths: ~w~n",[Recursion_list,New_paths]),
            ets:insert(Result,{Root,Paths}),
            Neighbours = digraph:out_neighbours(Graph,Neighbour),
            ?DBG("~w The neighbours of the unvisited vertex ~w are ~w, path is: ~w, recursion:~n",[Recursion_list,Neighbour,Neighbours,[Neighbour|Paths]]),
            dfs(Neighbours,New_paths,Result,Graph,Neighbour,[[[]]|Recursion_list],Root);
        true ->
            ?DBG("~w The neighbour ~w is: already visited, paths: ~w, backtracking to other neighbours:~n",[Recursion_list,Neighbour,Paths]),
            ets:insert(Result,{Root,Paths})
    end,
    dfs(Other_neighbours,Paths,Result,Graph,Source,Recursion_list,Root).
Any ideas to run this in acceptable time?
Edit:
Okay I understand now, you want to find all simple paths from a vertex in a directed graph. So a depth-first search with backtracking would be suitable, as you have realised. The general idea is to go to a neighbour, then go to another one (not one which you've visited), and keep going until you hit a dead end. Then backtrack to the last vertex you were at and pick a different neighbour, etc.
You need to get the fiddly bits right, but it shouldn't be too hard. E.g. at every step you need to label the vertices 'explored' or 'unexplored' depending on whether you've already visited them before. The performance shouldn't be an issue, a properly implemented algorithm should take maybe O(n^2) time. So I don't know what you are doing wrong, perhaps you are visiting too many neighbours? E.g. maybe you are revisiting neighbours that you've already visited, and going round in loops or something.
I haven't really read your program, but the Wiki page on Depth-first Search has a short, simple pseudocode program which you can try to copy in your language. Store the graphs as Adjacency Lists to make it easier.
Edit:
Yes, sorry, you are right, the standard DFS won't work as it stands; you need to adjust it slightly so that it does revisit vertices it has visited before. So you are allowed to visit any vertices except the ones already stored in your current path.
This of course means my running time was completely wrong; the complexity of your algorithm will be through the roof. If the average degree of your graph is d+1, then there will be approximately d*d*d*...*d = d^n possible paths.
So even if every vertex has only 3 neighbours, there's still quite a few paths when you get above 20 vertices.
There's no way around that really, because if you want your program to output all possible paths then indeed you will have to output all d^n of them.
I'm interested to know whether you need this for a specific task, or are just trying to program this out of interest. If the latter, you will just have to be happy with small, sparsely connected graphs.
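The adjustment described in this answer (only the vertices on the current path are off limits, and a vertex becomes available again on backtracking) can be sketched as follows. The sketch is in Python rather than Erlang, the function name is made up, and the edge set is reconstructed from the six paths listed in the question since the original picture is not available, so treat the graph as an assumption.

```python
def simple_paths_from(graph, source):
    """All simple paths that start at `source` and contain at least one edge."""
    paths = []
    path = [source]
    on_path = {source}  # per-branch state, not a global 'visited' set

    def extend(v):
        for w in graph[v]:
            if w in on_path:
                continue  # would close a loop, so skip it
            path.append(w)
            on_path.add(w)
            paths.append(list(path))  # every prefix is itself a result
            extend(w)
            path.pop()                # backtrack: w is available again
            on_path.remove(w)

    extend(source)
    return paths

# Assumed edges: A->B, A->D, B->C, C->D, C->B, D->C
graph = {"A": ["B", "D"], "B": ["C"], "C": ["D", "B"], "D": ["C"]}
for p in simple_paths_from(graph, "A"):
    print(",".join(p))
```

The running time is proportional to the total length of all paths found, so the exponential growth discussed above still applies.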
I don't understand the question. If I have the graph G = (V, E) = ({A,B}, {(A,B),(B,A)}), there are infinitely many paths from A to B: {[A,B], [A,B,A,B], [A,B,A,B,A,B], ...}. How can I find all possible paths to any vertex in a cyclic graph?
Edit:
Did you even try to compute or estimate how the number of possible paths grows for some graphs? For a fully connected graph you get (number of vertices - number of paths):
2 - 1
3 - 4
4 - 15
5 - 64
6 - 325
7 - 1956
8 - 13699
9 - 109600
10 - 986409
11 - 9864100
12 - 108505111
13 - 1302061344
14 - 16926797485
15 - 236975164804
16 - 3554627472075
17 - 56874039553216
18 - 966858672404689
19 - 17403456103284420
20 - 330665665962403999
Are you sure you want to find all paths to all nodes? If you could compute one million paths per second, it would take more than 10,000 years to compute all paths to all nodes in a fully connected graph with 20 nodes. That is an upper bound for your task, so I don't think you want to do that. I think you want something else.
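The figures in the table can be reproduced with a short calculation. In a fully connected graph with n vertices, a simple path from a fixed source that uses k edges is an ordered choice of k of the other n-1 vertices, so the total is the sum of (n-1)!/(n-1-k)! for k from 1 to n-1. The function name below is made up for this sketch.

```python
from math import factorial

def count_simple_paths_complete(n):
    """Number of simple paths from one fixed vertex in a complete graph on n vertices."""
    others = n - 1
    return sum(factorial(others) // factorial(others - k)
               for k in range(1, others + 1))

for n in (2, 5, 10, 20):
    print(n, count_simple_paths_complete(n))

seconds = count_simple_paths_complete(20) / 1_000_000  # at one million paths per second
print(seconds / (365 * 24 * 3600))                     # years, a bit over ten thousand
```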
Not an improved algorithmic solution by any means, but you can often improve performance by spawning multiple worker threads, potentially here one for each first level node and then aggregating the results. This can often improve naive brute force algorithms relatively easily.
You can see an example here: Some Erlang Matrix Functions, in the maximise_assignment function (comments starting on line 191 as of today). Again, the underlying algorithm there is fairly naive and brute force, but the parallelisation speeds it up quite well for many forms of matrices.
I have used a similar approach in the past to find the number of Hamiltonian Paths in a graph.

general idea and maybe example on graph

I'm looking for a general idea (and maybe some code example or at least pseudocode)
Now, this is from a problem that someone gave me, or rather showed me, I don't have to solve it, but I did most of the questions anyway, the problem that I'm having is this:
Let's say you have a directed weighted graph with the following edges:
AB5, BC4, CD8, DC8, DE6, AD5, CE2, EB3, AE7
and the question is:
how many different routes from C to C with a distance of less than x. (say, 10, 20, 30, 40)
The answer of different trips is: CDC, CEBC, CEBCDC, CDCEBC, CDEBC, CEBCEBC, CEBCEBCEBC.
The main problem I'm having is that when I do DFS or BFS, my implementation chooses a node and marks it as visited, so I'm only able to find two paths, CDC and CEBC, and then my algorithm quits. If I don't mark the node as visited, then on the next iteration (or recursive call) it chooses the same node again instead of the next available route. So I always have to mark nodes as visited, but then how can I get, for example, CEBCEBCEBC, which pretty much bounces between nodes?
I've looked at all the algorithms books I have at home, and while every one describes how to do DFS, BFS and find shortest paths (all the good stuff), none shows how to iterate indefinitely and stop only when a path reaches a certain total weight or hits a certain vertex a certain number of times.
So why not just keep branching and branching; at each node you will evaluate two things; has this particular path exceeded the weight limit (if so, terminate the branch) and is this node where I started (in which case log my path history to an 'acceptable solutions' list); then make new branches which each take a step in each possible direction.
You should not mark nodes as visited; as MikeB points out, CDCDC is a valid solution and yet it revisits D.
I'd do it like this:
Start with two lists of paths:
    Solutions (empty) and
    ActivePaths (containing one path, "C").
While ActivePaths is not empty,
    Take a path out of ActivePaths (suppose it's "CD"[8]).
    If its distance is not over the limit,
        see where you are by looking at the last node in the path ("D").
        If you're at "C", add a copy of this path to Solutions.
        Now for each possible next destination ("C", "E")
            make a copy of this path, ("CD"[8])
            append the destination, ("CDC"[8])
            add the weight, ("CDC"[16])
            and put it in ActivePaths
    Discard the path.
Whether this turns out to be a DFS, a BFS or something else depends on where in ActivePaths you insert and remove paths.
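A minimal Python sketch of the ActivePaths procedure above, using the edge list from the question. The function name and the "AB5" parsing are made up for this example. Two assumptions: the limit is treated as strict ("less than x", as in the question), and the trivial path "C" of length zero is not counted as a route. The loop only terminates because every edge weight is positive.

```python
EDGES = ["AB5", "BC4", "CD8", "DC8", "DE6", "AD5", "CE2", "EB3", "AE7"]

def routes_under(edge_specs, start, limit):
    """All routes from `start` back to `start` with total distance < limit."""
    graph = {}
    for spec in edge_specs:  # "AB5" means an edge A -> B with weight 5
        graph.setdefault(spec[0], []).append((spec[1], int(spec[2:])))

    solutions = []
    active_paths = [(start, 0)]  # (path so far, distance so far)
    while active_paths:
        path, distance = active_paths.pop()  # pop() makes it depth-first
        if distance >= limit:
            continue  # over the limit: discard this branch
        if distance > 0 and path[-1] == start:
            solutions.append(path)
        for destination, weight in graph.get(path[-1], []):
            active_paths.append((path + destination, distance + weight))
    return solutions

print(sorted(routes_under(EDGES, "C", 30)))
```

With a limit of 30 this yields the seven trips listed in the question. Taking paths from the front of `active_paths` instead of the back would turn the search into a breadth-first one.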
No offense, but this is pretty simple and you're talking about consulting a lot of books for the answer. I'd suggest playing around with the simple examples until they become more obvious.
In fact you have two different problems:
Find all distinct cycles from C to C, we will call them C_1, C_2, ..., C_n (done with a DFS)
Each C_i has a weight w_i, then you want every combination of cycles with a total weight less than N. This is a combinatorial problem (and seems to be easily solvable with dynamic programming).

Resources