A Shortest Path Algorithm With Minimum Number Of Nodes Traversed - graph

I am looking for a Dijkstra's algorithm implementation, that also takes into consideration the number of nodes traversed.
What I mean is: a typical Dijkstra's algorithm takes into consideration the weight of the edges connecting the nodes while calculating the shortest route from node A to node B. I want to insert another parameter into this: I want the algorithm to also give some weightage to the number of nodes traversed.
So the route computed from A to B, for certain parameter values, may not necessarily be the shortest route, but the route with the least number of nodes traversed.
Any thoughts on this?
Cheers,
RD
Edit :
My apologies. I should have explained better. So, let's say the shortest route from
(A, B) is A -> C -> D -> E -> F -> B covering a total of 10 units
But I want the algorithm to come up with the route A -> M -> N -> B covering a total of 12 units.
So what I want is to be able to give some weightage to the number of nodes as well, not just the distance between the connected nodes.

Let me demonstrate that adding a constant value to all edges can change which route is "shortest" (least total weight of edges).
Here's the original graph (a triangle):
A-------B
 \  5  /
2 \   / 2
   \ /
    C
The shortest path from A to B is via C. Now add the constant 2 to all edges. The shortest path instead becomes the single step from A directly to B (owing to the "penalty" we've introduced for using additional edges).
Note that the number of edges used is the same as the number of nodes visited (excluding the node you start from).
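To make that concrete, here is a minimal Python sketch (my own illustration, not part of the original answer): an ordinary Dijkstra where a constant penalty is added to every edge weight, so routes with more hops get punished. The dict-of-dicts graph format and the helper name are assumptions made for the example:

import heapq

def dijkstra_with_hop_penalty(graph, start, goal, penalty=0):
    # graph: {node: {neighbor: weight, ...}, ...}
    # penalty: constant added to every edge, i.e. the per-hop cost
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in graph[u].items():
            nd = d + w + penalty          # the hop penalty is folded into the weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[goal]

# The triangle above: with penalty=0 the best route is A-C-B (cost 4);
# with penalty=2 the direct edge A-B wins (7 vs 8).
triangle = {"A": {"B": 5, "C": 2}, "B": {"A": 5, "C": 2}, "C": {"A": 2, "B": 2}}
print(dijkstra_with_hop_penalty(triangle, "A", "B", penalty=0))   # (['A', 'C', 'B'], 4)
print(dijkstra_with_hop_penalty(triangle, "A", "B", penalty=2))   # (['A', 'B'], 7)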

One way to do that is to adapt the weights of the edges to always be 1, so that if you traverse 5 nodes, you've gone a distance of "5". The algorithm is the same at that point, just optimizing for the number of nodes traversed rather than the distance traveled.
If you want some sort of hybrid, you need to determine how much importance to give to traversing a node and the distance. The weight used in calculations should look something like:
weight = node_importance * 1 + (1 - node_importance) * distance
Where node_importance is a percentage that gauges how much minimum node traversal matters versus how much distance matters. You may also have to normalize the distances so that they average 1.
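As a rough sketch of that hybrid (names such as node_importance and the averaging step are just my assumptions for illustration), you could reweight the graph once and then run any standard Dijkstra on it:

def reweight(graph, node_importance):
    # graph: {node: {neighbor: distance, ...}, ...}
    # node_importance = 1.0 -> optimize purely for fewest hops,
    # node_importance = 0.0 -> optimize purely for shortest distance
    all_distances = [w for nbrs in graph.values() for w in nbrs.values()]
    avg = sum(all_distances) / len(all_distances)   # normalize so distances average 1
    return {
        u: {v: node_importance * 1 + (1 - node_importance) * (w / avg)
            for v, w in nbrs.items()}
        for u, nbrs in graph.items()
    }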

I'm going to go out on a limb here, but have you tried the A* algorithm? I may have understood your question wrong, but it seems like A* would be a good starting point for what you want to do.
Check out: http://en.wikipedia.org/wiki/A*_search_algorithm
Some pseudo code there to help you out too :)
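In case it's useful as a starting point, here is a small Python sketch of A* (my own, not from the linked article); with heuristic(n) = 0 it degenerates to plain Dijkstra, and the dict-of-dicts graph format is an assumption:

import heapq

def a_star(graph, start, goal, heuristic):
    # graph: {node: {neighbor: cost, ...}}; heuristic(n): admissible estimate of n -> goal
    g = {start: 0}                       # best known cost from start to each node
    prev = {}
    open_heap = [(heuristic(start), start)]
    closed = set()
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u == goal:                    # reconstruct and return the path
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return list(reversed(path)), g[goal]
        if u in closed:
            continue
        closed.add(u)
        for v, cost in graph[u].items():
            tentative = g[u] + cost
            if tentative < g.get(v, float("inf")):
                g[v] = tentative
                prev[v] = u
                heapq.heappush(open_heap, (tentative + heuristic(v), v))
    return None, float("inf")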

If I understood the question correctly, the best analogy would be the one used to find the best network path.
In network communication, a path may not be selected just because it is shortest: if it has many hops (nodes), it may suffer from distortion, interference and noise at each node connection.
So the best-path calculation minimizes a function of several variables, in your case distance and hop count (nodes).
You have to derive a function that relates distance and node count to quality.
So, for example, suppose
1 hop count change = 5 units of distance (which means the impact of 1 node change is the same as 5 units of distance)
so to minimize the loss you can use them in a linear equation:
minimize(distance + hopcount);
where hopcount is expressed as distance.
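As a tiny illustration of that conversion (the 5-units-per-hop figure is just the example number above, and the dict-of-dicts layout is my assumption), expressing the hop count in distance units amounts to adding the conversion factor to every edge weight; any ordinary shortest-path routine run on the reweighted graph then minimizes distance + hopcount in a single pass:

UNITS_PER_HOP = 5   # example conversion factor: 1 hop "costs" 5 units of distance

def combined_weights(graph):
    # graph: {node: {neighbor: distance, ...}, ...}
    return {u: {v: w + UNITS_PER_HOP for v, w in nbrs.items()}
            for u, nbrs in graph.items()}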

Related

Spanning tree with shortest path between two points

I have a weighted undirected graph. I need to find a spanning tree with minimal possible cost, so that the distance between points A and B is as low as possible.
For example, I have this graph: graph.
Minimal distance between A and B is 2.
Minimal spanning tree would look like this. But that would make distance between A and B = 3.
Right now I am doing this:
Find the distance between A and B in the graph, using BFS.
Find all paths between A and B with the length from step 1, using DFS.
Generate a spanning tree from every path from step 2.
Compare them and take the minimal one.
Everything is OK until I get a graph with A-B distance = 12.
The second step then takes too much time. Is there any faster way of doing this? Thanks.
The fastest/most efficient way to solve this problem is to use Dijkstra's Shortest Path algorithm. This is a greedy algorithm that has the following basic structure:
1 - All nodes in the graph start at "infinity" distance.
2 - Start with your first node (node A in your example) and keep track of the edge weight to get from node A to each of its neighbors.
3 - Choose the shortest current edge and follow it to your next node, let's call it node C for now.
4 - Now, for each of C's neighbors, compare the current distance (including infinity if applicable) with the sum of the distance from A to C and the edge from C to that neighbor. If it is shorter than the current distance, update it to the new distance.
5 - Continue this process until all nodes have been visited and you reach the node you were looking for the shortest path to (i.e. B in your example).
This is a pretty efficient way of finding the shortest path between two nodes, with a running time of O(V^2), not O(n log n) as mentioned above. As you can see, by making this a greedy algorithm, we are constantly choosing the optimal solution based on the available local information, and therefore we never have to go back and change our decisions.
This also eliminates the need for both a BFS and a DFS as in your approach. Hope this helps!
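For reference, here is a compact Python sketch of the structure described above (my own illustration, using the simple scan-for-the-minimum form that gives the quoted O(V^2) running time; the dict-of-dicts graph format is an assumption):

def dijkstra_simple(graph, a, b):
    dist = {v: float("inf") for v in graph}       # step 1: everything starts at infinity
    dist[a] = 0
    unsettled = set(graph)
    while unsettled:
        u = min(unsettled, key=lambda v: dist[v]) # steps 2-3: settle the closest node
        unsettled.remove(u)
        if u == b:                                # step 5: stop once B is settled
            return dist[b]
        for v, w in graph[u].items():             # step 4: relax u's neighbours
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist[b]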
While your step two is correct, I think the problem is that you are doing too many operations.
You are doing both a BFS and a DFS, which is going to be very costly. Instead, I would recommend using a traversal technique that minimizes the computational cost.
This is the common problem of finding a shortest path, and one of the popular solutions is Dijkstra's algorithm. Here is an article that expounds on this topic: https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/
In short, what this algorithm does is take the starting point A and grow a shortest-path tree from it until point B is hit; there is then a single path by which A can reach B, and that is the shortest path.
Both this and your algorithm run in O(n log n), but in practice this solution can be thought of as running a single BFS instead of both a BFS and a DFS.

Graph theory, all paths with given distance

So I found a problem where a traveller can travel a certain distance in a graph, and all bidirectional edges have some length (distance). When travelling along an edge (in either direction) you collect some money/gift (given in the question for every edge), so you have to find the maximum money you can collect for the given distance you can travel. The basic problem is how to find all possible paths with the given distance (there might be loops in the graph); after finding all possible paths, the path with the maximum money collected is simply the answer. Note: any possible path you come up with should not contain a loop (it must be a simple path).
You are given an undirected connected graph with double weight on the edges (distance and reward).
You are given a fixed number d corresponding to a possible distance.
For each pair of nodes (u, v), u not equal to v, you are looking for:
All the paths {P_j} connecting u and v, with no repeating nodes, whose total distance is d.
The paths {P_hat(j)}, a subset of {P_j}, whose reward is maximal.
To get the first, I would try to use a modified version of the Floyd-Warshall algorithm, where you do not look for the shortest, but for any path.
Floyd-Warshall uses a strategy based on considering a "middle node" w between u and v and recursively finds the path minimising the distance between u and v.
You can do the same, but keep every path instead of minimising: take care to set to infinity, in the distance matrix, the nodes you have already visited, and exclude at runtime every partial path in the recursion whose distance is already longer than d, as well as every completed path (one that connects u and v) whose distance is shorter than d.
This can be generalised if an interval of possible distances [d, D] is given instead of a single value d; with a single exact value you would probably get the empty set all the time.
For the second step, you simply compare the rewards of the paths found in the first step and take the best one.
This is more a suggested direction than a complete answer, but I hope it helps!
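If it helps to make those two steps concrete, here is a small brute-force Python sketch (my own, not the Floyd-Warshall variant described above): a DFS over simple paths from u to v that prunes as soon as the partial distance exceeds d, keeping the path of total distance exactly d with the largest reward. The per-edge (distance, reward) layout is an assumption:

def best_reward_path(graph, u, v, d):
    # graph: {node: {neighbor: (distance, reward), ...}, ...}
    best = (float("-inf"), None)   # (max reward, corresponding path)

    def dfs(node, dist_left, reward, path, visited):
        nonlocal best
        if node == v:
            if dist_left == 0 and reward > best[0]:
                best = (reward, list(path))
            return                                      # a simple path must stop at v
        for nxt, (w, r) in graph[node].items():
            if nxt not in visited and w <= dist_left:   # prune paths longer than d
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, dist_left - w, reward + r, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(u, d, 0, [u], {u})
    return best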

Multi-goal path-finding [duplicate]

I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there are about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: find all-pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j] = (i == j ? 0 : INF)
for each edge (i,j): d[i][j] = d[j][i] = weight(i,j)
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, the running time is O(n^3) for the Floyd-Warshall part and O(k! n) for the all-permutations part, and 100^3 + (12!)(100) is practically peanuts unless you have some really restrictive constraints.]
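For completeness, a runnable Python version of the idea (the graph representation and the function name are my own; the logic is just the Floyd-Warshall precomputation followed by brute force over permutations):

from itertools import permutations

def shortest_mustpass_walk(nodes, edges, start, end, mustpass):
    # edges: iterable of (u, v, weight) for an undirected graph
    INF = float("inf")
    d = {u: {v: 0 if u == v else INF for v in nodes} for u in nodes}
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in nodes:                        # Floyd-Warshall: all-pairs shortest distances
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    shortest = INF
    for perm in permutations(mustpass):    # try every visiting order of the mustpass nodes
        order = [start, *perm, end]
        shortest = min(shortest, sum(d[a][b] for a, b in zip(order, order[1:])))
    return shortest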
Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass); then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end.
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest-cost path between any two points. Since you need to get from point A to point B and visit about 12 points in between, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This coming from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard, because the original question does not require that must-pass nodes be visited only once. This makes the answer much, much simpler: just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go, but a simple one would be to work through the permutations. Imagine a list of nodes [start, a, b, c, end]. Sum the simple distances [start->a->b->c->end]; this is your new target distance to beat. Now try [start->a->c->b->end], and if that's better, set that as the target (and remember that it came from that pattern of nodes). Work backwards over the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care)
Andrew Top has the right idea:
1) Dijkstra's Algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known for any NP-complete problem. The only other thing to remember is that after you expand the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering the number of nodes and edges is relatively small, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, and it has a non-deterministic polynomial runtime, no matter what algorithm you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about must-pass nodes in ANY order. I have been trying to find a solution for a defined order of must-pass nodes. I found my answer, and since no question on StackOverflow was similar, I'm posting it here so that the maximum number of people can benefit from it.
If the order of the must-pass nodes is defined, then you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order) and stop at e. Then what you can do is run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path are given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
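A small sketch of that chaining in Python (dijkstra() here stands for any implementation that returns (path, cost) between two nodes; it is not a specific library function):

def ordered_mustpass(graph, waypoints, dijkstra):
    # waypoints: e.g. [s, k1, k2, k3, e], to be visited in exactly this order
    full_path, total = [waypoints[0]], 0
    for a, b in zip(waypoints, waypoints[1:]):
        path, cost = dijkstra(graph, a, b)
        full_path += path[1:]              # drop the repeated joining node
        total += cost
    return full_path, total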
How about using brute force on the dozen 'must visit' nodes? You can cover all the possible orderings of the 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this:
Do an A* search from the start node to every point on the circuit.
For each of these, do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction).
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere is whether it is OK for the same vertex to be visited more than once in the path. Most of the answers here assume that it's OK to visit the same edge multiple times, but my take, given the question (a path should not visit the same vertex more than once!), is that it is not OK to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.

Cormen's "Introduction to algorithms" 3rd Edition - Edmonds-karps-Algorithm - Lemma 26.7

Since I think many of us don't have the same edition of "Introduction to Algorithms" by Prof. Cormen et al., I'm gonna write out the Lemma (and my question) in the following.
Edmonds-Karp-Algorithm
Lemma 26.7 (in the 3rd edition; in the 2nd it may be Lemma 26.8):
If the Edmonds-Karp algorithm is run on a flow network G = (V, E) with source s and sink t, then for all vertices v in V \ {s, t}, the shortest-path distance df(s,v) in the residual network Gf increases monotonically with each flow augmentation.
Proof:
First, suppose that for some vertex v in V \ {s, t}, there is a flow augmentation that causes the shortest-path distance from s to v to decrease; then we will derive a contradiction.
Let f be the flow just before the first augmentation that decreases some shortest-path distance, and let f' be the flow just afterward. Let v be the vertex with the minimum df'(s,v) whose distance was decreased by the augmentation, so that df'(s,v) < df(s,v). Let p = s ~~> u -> v be a shortest path from s to v in Gf', so that (u,v) in Ef' and
df'(s,u) = df'(s,v) - 1. (26.12)
Because of how we chose v, we know that the distance of vertex u from source s did not decrease, i.e.
df'(s,u) >= df(s,u). (26.13)
...
My question is: I don't really understand the phrase
"Because of how we chose v, we know that the distance of vertex u from source s did not decrease, i.e.
df'(s,u) >= df(s,u). (26.13)"
How does the way we chose v give us the property that "the distance of vertex u from s did not decrease"? How can I derive equation (26.13)?
We know that u is a vertex on the path from s to v and that (u,v) is part of that path. Why can the distance from s to u not decrease as well?
Thank you all for your help.
My answer may be drawn out, but hopefully it helps for an all around understanding.
For some history, note that the Ford-Fulkerson algorithm came first. Ford-Fulkerson simply selects any augmenting path from the source to the sink, pushes flow along it, and then updates the residual graph accordingly. Since the path that is selected could hypothetically be anything, there are scenarios where this approach takes 'forever' (figuratively and literally speaking, if the edge weights are allowed to be irrational) to actually terminate.
Edmonds-Karp does the same thing as the Ford-Fulkerson, only it chooses the 'shortest' path, which can be found via a breadth-first search (BFS).
BFS guarantees a certain (partial) ordering among the traversed vertices. For example, consider the following graph:
A -> B -> C,
BFS guarantees that B will be traversed before C. (You should be able to generalize this argument with more sophisticated graphs, an exercise I leave to you.) For the remainder of this post, let "n" denote the number of levels it takes in BFS to reach the target node. So if we were searching for node C in the example above, n = 2.
Edmonds-Karp behaves similarly to Ford-Fulkerson, only it guarantees that the shortest paths are chosen first. When Edmonds-Karp updates the residual graph, we know that only nodes at a level equal to or smaller than n have actually been traversed. Similarly, only edges between nodes for the first n levels could have possibly been updated in the residual graph.
I'm pretty sure that the 'how we chose v' reflects the ordering that BFS guarantees, since the added residual edges necessarily flow in the opposite direction of any selected path. If the residual edges were to create a shorter path, then it would have been possible to find a shorter path than n in the first place, because the residual edges are only created when a path to the target node has been found and BFS guarantees that the shortest such path has already been found.
Hope this helps and at least gives some insight.
I don't quite understand it either. But I think that "how we chose v" here means that v is the first node, i.e. the one closest to s, whose path from s becomes shorter because of the augmentation; thus node u, which comes before v on that path, cannot have had its distance from s become shorter.

finding the total number of distinct shortest paths between 2 nodes in undirected weighted graph in linear time?

I was wondering: if there is a weighted graph G(V,E) and I need to find a single shortest path between two vertices S and T in it, I could use Dijkstra's algorithm. But I am not sure how this can be done when we need to find all the distinct shortest paths from S to T. Is it solvable in O(n) time? I had one more question: if we assume that the weights of the edges can only take values in a certain range, let's say 1 <= w(e) <= 2, will this affect the time complexity?
You can do it using Dijkstra. In Dijkstra's algorithm, you assign labels to each node based on the distance it has from your source and settle nodes according to this distance (smallest first). Once the target node is settled, you have the length of the shortest path. To retrieve the path (=the sequence of edges), you can keep track of the parent of each node you settle. To retrieve all possible paths, you have to account for multiple parents of each node.
A small example:
Suppose you have a graph which looks like this (all edges weight 1 for simplicity):
  B
 / \
A   C - D
 \ /
  E
When you run Dijkstra to find the distance A->D, you would get 3. You would first settle node A with distance 0, then nodes B and E with distance 1, node C with distance 2 and finally node D with distance 3. To get the paths, you would remember the parents of the nodes when you settle them. In this case you would first set the parents of B and E to node A. You then need to set the parents of node C to both B and E, as they both supply the same length. Node D would get node C as a parent.
When you need the paths, you could just follow the parent pointers, starting at D and branching each time a node has multiple parents. This would give you a branch at node C.
The post mentioned in the comment above uses the same principles, although it does not allow you to actually extract the paths, it only gives you the count.
One final remark: Dijkstra is not a linear-time algorithm, as you need to do a lot of operations to maintain your queue and node sets.
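A Python sketch of the multiple-parents idea (the graph format and names are my assumptions): a standard Dijkstra that records an extra parent whenever a node is reached at the same minimal distance via another neighbour, then walks the parent lists backwards to enumerate every shortest path:

import heapq

def all_shortest_paths(graph, s, t):
    # graph: {node: {neighbor: weight, ...}, ...}
    dist = {s: 0}
    parents = {s: []}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parents[v] = [u]              # strictly better: replace the parents
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                parents[v].append(u)          # equally good: record one more parent

    def walk(v):                              # expand the parent lists into full paths
        if v == s:
            yield [s]
        for p in parents.get(v, []):
            for path in walk(p):
                yield path + [v]

    return list(walk(t))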
(a) Use BFS from s to mark nodes visited as usual; in the meantime, change the following things:
(b) for each node v, keep a record L(v) representing the layer that v is on and a record f(v) representing the number of incoming shortest paths (with f(s) = 1 initially);
(c) every time a node v1 is found among another node v2's neighbors:
(d) if v1 is not in the queue, add it to the queue, set L(v1) = L(v2) + 1, and f(v1) += f(v2);
(e) if v1 is already in the queue and L(v1) equals L(v2), do nothing;
(f) if v1 is already in the queue but L(v1) does not equal L(v2), then f(v1) += f(v2);
(g) continue this modified BFS until t pops from the queue for the first time; we then have f(t),
(h) and this f(t) represents the number of different shortest paths from s to t.
This might solve your problem.
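A direct transcription of that modified BFS into Python (variable names follow the description above; the adjacency-list format is assumed, and the layer test is written out explicitly as L(v1) == L(v2) + 1):

from collections import deque

def count_shortest_paths(graph, s, t):
    L = {s: 0}            # layer, i.e. BFS distance from s
    f = {s: 1}            # number of shortest paths from s
    queue = deque([s])
    while queue:
        v2 = queue.popleft()
        if v2 == t:                          # t popped for the first time
            return f[t]
        for v1 in graph[v2]:
            if v1 not in L:                  # first time seen: it is on the next layer
                L[v1] = L[v2] + 1
                f[v1] = f[v2]
                queue.append(v1)
            elif L[v1] == L[v2] + 1:         # reached again from the previous layer
                f[v1] += f[v2]
            # same or earlier layer: do nothing
    return 0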

Resources