I've read that the BFS algorithm is better suited to parallel implementation than DFS. I'm having trouble grasping the intuition of why this should be true. Can anyone explain?
Thanks
BFS is a procedure of propagating frontiers. To 'propagate' a frontier means to push all of its unvisited adjacent vertices into a queue, and those vertices can be processed independently.
In DFS, the vertices are visited one by one along a path such as A->B->C: visiting B must occur before visiting C. It is a sequential procedure that cannot be parallelized easily.
But the truth is that both BFS and DFS are hard to parallelize, because all the processing nodes have to know global information (above all, which vertices have already been visited). Accessing global state always requires synchronization and communication between the nodes.
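To make the contrast concrete, here is a level-synchronous BFS sketch in Python. The chunking scheme is illustrative only, and CPython's GIL means a real speedup would need a different runtime; the point is simply to show where the parallel and serial parts live:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bfs(graph, source, workers=4):
    """Level-synchronous BFS sketch. `graph` is assumed to be a dict
    mapping each vertex to a list of neighbors."""
    visited = {source}
    frontier = [source]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Expanding the frontier is the independent, parallel part:
            # each worker scans the neighbors of its own slice.
            slices = [frontier[i::workers] for i in range(workers)]
            gathered = pool.map(
                lambda part: [w for v in part for w in graph[v]], slices)
            # Merging into the global `visited` set is the serial,
            # synchronized part that makes BFS (and DFS) hard to scale.
            frontier = []
            for neighbors in gathered:
                for w in neighbors:
                    if w not in visited:
                        visited.add(w)
                        frontier.append(w)
    return visited
```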
There is some good information about DFS and BFS in general:
Breadth first search and depth first search
Difference Between BFS and DFS
Breadth First Search/Depth First Search Animations
What’s the difference between DFS and BFS?
The animations in particular show how BFS lends itself to parallel concepts. I think BFS could be implemented in parallel, but there is no straightforward parallel solution for DFS.
DFS is a linear algorithm that visits every child once.
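A plain recursive DFS makes that sequential dependence visible; a minimal sketch, with the graph assumed to be a dict of adjacency lists:

```python
def dfs(graph, v, visited=None):
    """Each call must finish exploring one neighbor's entire subtree
    before the next neighbor can start: the order is inherently serial."""
    if visited is None:
        visited = set()
    visited.add(v)
    for w in graph[v]:
        if w not in visited:
            dfs(graph, w, visited)
    return visited
```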
I want to know in which cases BFS and DFS produce the same tree from a graph rooted at any node. I know one of these cases is when the graph is already a tree. Is this the only case?
Is it dependent on the way you choose the neighbors of a node? What are some ways that the order of picking the neighbors will make it the same?
First, before delving into a direct answer, I want to take a moment to explain DFS and BFS for anyone reading who may not know what "breadth-first search" and "depth-first search" are.
DFS: Depth-first search is a tree-traversal algorithm that starts at a root node and explores as far as possible along each branch before backtracking.
BFS: Breadth-first search is a tree-traversal algorithm that starts at a root node and explores all vertices at the current depth level before moving on to the vertices at the next depth level. This is a top-down approach to building a tree.
Now, onto the question.
I know one of the cases is when the graph is already a tree. Is this the only case?
You are correct in saying this is one of the cases where the BFS and DFS of a graph produce the same tree. To be more explicit, when you state that the graph is already a tree, you're hopefully referring to a graph with one of the three following properties:
A graph with a maximum breadth of 1
A graph that resembles a singly linked list
Any graph that is already a tree (which may also be what you're referring to in your question), if isomorphic trees are regarded as the "same" by your definition
Of course, there must be no cycles within the graph, or BFS and DFS will produce different trees.
In all three of these instances, the DFS and BFS trees will be the same, given that you start both traversals at the same proper root node. These are the only cases that I know of. However, in some instances there are other ways of obtaining the same tree for the DFS and BFS traversal of a given graph, if you have the freedom to determine the order in which you select the "neighbors" of a given node. This leads to the second half of your question...
Is it dependent on the way you choose the neighbors of a node? What are some ways that the order of picking the neighbors will make it the same?
Yes, it is dependent (if this question means what I think it means), because if you have that freedom, then there are instances in which the order of picking neighbors will make the trees the same. For example, given a tree in which at most one child of any node also has children of its own, you can obtain the same BFS and DFS trees by visiting the childless nodes first in the DFS traversal, as in the sketch below.
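Here is a small sketch that makes this concrete: it builds both traversal trees as child-to-parent maps so they can be compared directly (the example graph and helper names are mine):

```python
from collections import deque

def bfs_tree(graph, root):
    """Build the BFS tree as a {child: parent} map."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return parent

def dfs_tree(graph, root):
    """Build the DFS tree as a {child: parent} map (recursive)."""
    parent = {root: None}
    def visit(v):
        for w in graph[v]:
            if w not in parent:
                parent[w] = v
                visit(w)
    visit(root)
    return parent

# A star with one extra chain: the childless nodes 'B' and 'C' are
# listed before 'D', the only child that has a child of its own.
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"],
         "D": ["A", "E"], "E": ["D"]}
assert bfs_tree(graph, "A") == dfs_tree(graph, "A")
```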
Hopefully this information helps.
I have a problem in which I need to find the longest path in an unweighted, undirected graph. Starting from a given vertex, I need to visit as many vertices as possible and finish at the same one, without visiting any of them more than once.
Most of the algorithms I found are for special cases (acyclic, directed, etc.). One idea is to look for a Hamiltonian cycle in every subset of the vertices (the subsets can be generated with backtracking), but I guess there must be a far better algorithm.
As you've discovered, finding the largest cycle involves finding the Hamiltonian cycles of its subgraphs, and is thus NP-complete: unless you're working with some special class of graphs, any solution is going to be exponential in complexity.
A smart brute-force approach (e.g. a bitmask dynamic program over subsets of vertices) is the best efficiency one can get for this type of problem.
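For example, a subset-DP sketch along those lines (all names are illustrative, and it is only feasible for very small graphs) might look like this:

```python
def longest_cycle_length(adj, start):
    """Bitmask DP sketch for the longest simple cycle through `start`.

    `adj` is an adjacency matrix (adj[u][v] truthy iff u and v are
    connected) for a small graph, n <= ~20. Runs in O(2^n * n^2) time,
    which is the realistic ceiling for an NP-complete problem.
    """
    n = len(adj)
    # dp[mask][v] is True if some simple path starting at `start`
    # visits exactly the vertices in `mask` and currently ends at `v`.
    dp = [[False] * n for _ in range(1 << n)]
    dp[1 << start][start] = True
    best = 0
    for mask in range(1 << n):
        for v in range(n):
            if not dp[mask][v]:
                continue
            size = bin(mask).count("1")
            # Close the cycle back to `start` (3+ vertices, so no
            # edge is reused in the undirected case).
            if size >= 3 and adj[v][start]:
                best = max(best, size)
            # Extend the path to any unvisited neighbor.
            for w in range(n):
                if adj[v][w] and not mask & (1 << w):
                    dp[mask | (1 << w)][w] = True
    return best
```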
Are there any advantages to writing a BFS tree-traversal algorithm recursively vs. iteratively? It seems to me that iterative is the way to go, since it can be implemented with a simple loop:
1. Enqueue the root node
2. Dequeue a node and examine it
3. Enqueue its children
4. Go to step 2
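In code, that loop might look like this (a sketch; the Node class and visit callback are my own assumptions about the tree):

```python
from collections import deque

class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def bfs(root, visit=print):
    """The loop described above; `visit` stands in for 'examine'."""
    queue = deque([root])            # step 1: enqueue the root node
    while queue:                     # step 4: go back to step 2
        node = queue.popleft()       # step 2: dequeue a node...
        visit(node.value)            # ...and examine it
        queue.extend(node.children)  # step 3: enqueue its children
```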
Is there any advantage to recursion? It seems more complex, with no upside.
Thanks in advance...
When comparing algorithms, we mainly consider time complexity and space complexity.
The time complexity of iterative BFS is O(|V|+|E|), where |V| is the number of vertices and |E| is the number of edges in the graph. Recursive BFS has the same time complexity.
The space complexity of iterative BFS is O(|V|), and recursive BFS matches that as well.
From the perspective of time and space complexity there is no difference between the two algorithms. Since iterative BFS is easier to understand and to implement, it is not surprising that people prefer it.
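For concreteness, here is a sketch of what "recursive BFS" usually ends up looking like (recursing over levels, with `graph` assumed to be a dict of adjacency lists), which makes clear why the recursion buys nothing:

```python
def bfs_recursive(graph, frontier, visited):
    """BFS by recursing over levels. The total work is still
    O(|V| + |E|) and the auxiliary space is still O(|V|); the
    recursion just spends one stack frame per level for no benefit
    over the iterative loop."""
    if not frontier:
        return
    next_frontier = []
    for v in frontier:
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                next_frontier.append(w)
    bfs_recursive(graph, next_frontier, visited)

# Usage sketch: visited = {root}; bfs_recursive(graph, [root], visited)
```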
I have implemented the Dijkstra algorithm for the single-source shortest path problem from the pseudocode found in "Introduction to Algorithms", 3rd edition, by Cormen et al.
My implementation was made in Python using linked lists to represent graphs in an adjacency-list representation. This means that the list of nodes is a linked list, and each node has a linked list representing its edges. Furthermore, I didn't implement or use a binary heap or Fibonacci heap for the minimum priority queue that the algorithm needs, so when the procedure needs to extract the next node with the smallest distance from the source, I search for it in O(V) time inside the linked list of nodes.
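As an aside, a minimal sketch of that extract-min-by-scanning strategy (my own illustrative code, using a dict of lists rather than linked lists) is:

```python
def dijkstra_linear_scan(graph, source):
    """Dijkstra without a heap: extract-min is an O(V) scan, giving
    O(V^2) overall. `graph[u]` is assumed to be a list of
    (neighbor, weight) pairs with non-negative weights."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    unvisited = set(graph)
    while unvisited:
        # O(V) scan in place of a priority queue's extract-min.
        u = min(unvisited, key=dist.get)
        unvisited.remove(u)
        if dist[u] == float("inf"):
            break  # remaining vertices are unreachable
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```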
On the other hand, the reference also provides an algorithm for DAGs (which I have also implemented) that applies a topological sort before relaxing all the edges.
With all this context, I have a Dijkstra algorithm with a complexity of O(V^2) and a DAG shortest-path algorithm with a complexity of O(V+E).
Using the timeit.default_timer() function to measure the running times, I have found that the Dijkstra algorithm is faster than the DAG algorithm when applied to DAGs with positive edge weights, across different graph densities, for both 100 and 1000 nodes.
Shouldn't the DAG-shortest path algorithm be faster than Dijkstra for DAGs?
Your running-time analysis of both algorithms is correct, and it is true that the DAG shortest-path algorithm is asymptotically faster than Dijkstra's algorithm on DAGs.
However, there are three possible reasons for your testing results:
The graph you used for testing is very dense. When the graph is very dense, E ≈ V^2, so the running times of both algorithms approach O(V^2) and the comparison comes down to constant factors.
The number of vertices may still not be large enough. To check this, test with a much larger graph.
The initialization of the DAG algorithm (for example, the topological sort) may cost a significant amount of running time.
In any case, the DAG shortest-path algorithm should theoretically be faster than Dijkstra's algorithm on DAGs.
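For reference, a compact sketch of the DAG shortest-path algorithm being compared (topological sort via Kahn's algorithm, then one pass of relaxations; the (u, v, weight) edge-list input is just an assumption):

```python
from collections import defaultdict

def dag_shortest_paths(edges, source):
    """O(V + E) shortest paths on a DAG, CLRS-style sketch.
    `edges` is assumed to be a list of (u, v, weight) triples."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    vertices = set()
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        vertices.update((u, v))

    # Kahn's algorithm for a topological order.
    order = []
    stack = [v for v in vertices if indegree[v] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)

    # Relax each vertex's outgoing edges once, in topological order.
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for u in order:
        if dist[u] == float("inf"):
            continue  # unreachable from the source
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```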
I want to carry out graph clustering in a huge undirected graph with millions of edges and nodes. The graph is almost clustered, with different clusters joined together only by some nodes (ambiguous nodes, which can relate to multiple clusters). There will be very few or almost no edges between two clusters. This problem is almost like finding the vertex cut set of a graph, with one exception: the graph needs to be partitioned into many components, their number being unknown. (Refer to this picture: https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
It's almost like different strongly connected components sharing a couple of nodes between them, and I am supposed to remove those nodes to separate the components. Edges are weighted, but this problem is more about finding structure in the graph, so edge weights won't be relevant. (Another way to think about the problem is to visualize solid spheres touching each other at some points, with the spheres being the strongly connected components and the touching points being the ambiguous nodes.)
I am prototyping something, so I am quite short of time to study graph clustering algorithms myself and pick the best possible one. Plus, I need a solution that cuts nodes and not edges, since in my case different clusters share nodes, not edges.
Is there any research paper or blog that addresses this or a related problem? Or can anyone come up with a solution to this problem, however dirty?
Since millions of nodes and edges are involved, I would need a MapReduce implementation of the solution. Any inputs or links for that, too?
Is there any current open-source MapReduce implementation that I can use directly?
I think this problem is analogous to finding communities in online social networks by removing vertices.
Your problem is not so simple. I am afraid it is related to the clique problem, which is NP-complete, so unless you can quantify the statement "there are almost no edges between the clusters", your problem may still be very difficult. But what I would do in your shoes is try one dirty, greedy approach, namely regarding the nodes as the following kind of quasi-neural net:
I would consider each vertex to have inputs, outputs, and a sigmoid activation function that converts the input value (the sum of the inputs) into the output value. The output value, and I consider this important, would not be cloned and sent to all the neighbors, but rather divided evenly between them. In addition, I would define a logarithmic decay of activity in a neuron (self-suppression, a suppressive connection to itself), controlled by a decay parameter that is global for the net.
Now I would start the simulation with all neurons at activity 0.5 (the activity range is 0 to 1) and a very high decay parameter, which would lead all the neurons to quickly stabilize at state 0. I would then gradually decrease the decay parameter until the steady state yields the first clique with non-zero stable activity.
The question is what to do next. One possibility is to subtract the found clique from the graph and run the same process again until all the cliques are found. This greedy approach might succeed if your graph is indeed as well behaved (really almost clustered) as you say, but it might lead to unexpected results otherwise. Another possibility is to give the found clique a unique "clique smell" that is repulsive (mutual suppression) to other cliques and rerun the algorithm until the second clique is found, give that one a different smell repulsive to all the others, and so on, until every node has its own assigned smell.
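To make the idea concrete, here is a toy simulation of the scheme. Every modelling choice (the sigmoid, how the decay parameter enters the update) is my own guess at the informal description above, not a tested algorithm:

```python
import math

def simulate_activity(adj, decay, steps=200):
    """Toy simulation of the quasi-neural-net idea sketched above.
    `adj` maps each vertex to its neighbor list; `decay` is the
    global self-suppression parameter to be lowered gradually."""
    activity = {v: 0.5 for v in adj}  # all neurons start at 0.5
    for _ in range(steps):
        updated = {}
        for v, neighbors in adj.items():
            # Each neighbor's output is divided evenly among *its*
            # neighbors rather than broadcast to all of them.
            incoming = sum(activity[u] / len(adj[u]) for u in neighbors)
            # Sigmoid activation, with `decay` as self-suppression.
            updated[v] = 1.0 / (1.0 + math.exp(-(incoming - decay)))
        activity = updated
    # Vertices whose stable activity stays well above 0 would be
    # read off as the first densely connected cluster.
    return activity
```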
I think those are all the big ideas I have about this.
The key point is that, since it is probably not possible to solve this problem in the general case (it is likely NP-complete), you need to exploit whatever special properties your graph has. That means playing with the parameters for a while until the algorithm solves 99% of the cases you encounter. I don't think it is possible to give a numerically precise answer to your question without long experimentation on the actual datasets you encounter.
Since millions of nodes and edges are involved, I would need a MapReduce implementation of the solution. Any inputs or links for that, too?
In my experience, I doubt that using Map/Reduce here would be truly advantageous. First, nodes on the order of 10^6 aren't really that many (especially in a non-hyper-connected graph, since you are considering clustering), and the overhead of using Map/Reduce (unless you have already set up your hardware/software for it) will not be worth it for your problem.
Map/Reduce works much better once you have solved the clustering issue and then want to process each cluster with a similar analysis; basically, whenever you can break your task into relatively isolated sub-tasks that can be performed in parallel. This can of course be cascaded over several layers.
In a relatively similar situation, I personally first modelled my graph in a graph database (I used Neo4j, and would highly recommend it) and then ran my analytics and queries on it. You will be surprised how whiteboard-friendly this solution is, and even massively joined and connected queries execute near-instantaneously, especially at the scale of only a few million nodes. For example, you can do a filtered analysis based on degrees of separation, followed by a listing of common neighbors.
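For illustration, a degrees-of-separation query of that kind through the official neo4j Python driver might look like this (the Node label, id property, and connection details are all assumptions):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def within_two_hops(node_id):
    """Return ids of nodes within two degrees of separation of `node_id`."""
    query = (
        "MATCH (a:Node {id: $id})-[*1..2]-(b:Node) "
        "WHERE a <> b "
        "RETURN DISTINCT b.id AS id"
    )
    with driver.session() as session:
        return [record["id"] for record in session.run(query, id=node_id)]
```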