I have a weighted undirected graph. I need to find a spanning tree with the minimal possible cost, such that the distance between points A and B is as low as possible.
For example, I have this graph: graph.
The minimal distance between A and B is 2.
The minimal spanning tree would look like this, but that would make the distance between A and B equal to 3.
Right now I am doing this:
Find the distance between A and B in the graph, using BFS.
Find all paths between A and B with the length from step 1, using DFS.
Generate a spanning tree from every path from step 2.
Compare them and keep the minimal one.
Everything is OK until I get a graph with A-B distance = 12.
The second step then takes too much time. Is there any faster way of doing this? Thanks.
The fastest/most efficient way to solve this problem is to use Dijkstra's Shortest Path algorithm. This is a greedy algorithm that has the following basic structure:
1. All nodes in the graph start at "infinity" distance.
2. Start with your first node (node A in your example) and record the edge weight from A to each of its neighbors.
3. Choose the shortest current edge and follow it to the next node; let's call it node C for now.
4. Now, for each of C's neighbors, compare its current distance (including infinity if applicable) with the sum of the distance to C and the edge weight from C to that neighbor. If that sum is smaller, update the neighbor's distance.
5. Continue this process until all nodes have been visited and you reach the node you were looking for the shortest path to (i.e. B in your example).
This is a pretty efficient way of finding the shortest path between two nodes: a simple array-based implementation runs in O(V^2), not O(n log n) as mentioned elsewhere, and a binary-heap priority queue brings it down to O((V + E) log V). As you can see, by making this a greedy algorithm, we constantly choose the best option based on the available local information, and therefore never have to go back and change our decisions.
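A minimal sketch of the heap-based version in Python (the names and the adjacency format {node: {neighbor: weight}} are just illustrative assumptions):

import heapq

def dijkstra(graph, source, target):
    # graph: {node: {neighbor: weight}}; returns the shortest distance source -> target
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    pq = [(0, source)]                      # (distance so far, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:                     # target settled: its distance is final
            return d
        if d > dist[u]:                     # stale queue entry, skip it
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:             # relax edge u -> v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return float('inf')                     # target unreachable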
This also eliminates the need for a BFS and a DFS like in your example. Hope this helps!
While your step 2 is correct, I think the problem is that you are doing too many operations.
You are doing both a BFS and a DFS, which is going to be very costly. Instead, I would recommend using one of several traversal techniques that minimize the computational cost.
Finding the shortest path is a common problem, and one of the popular solutions is Dijkstra's algorithm. Here is an article that expounds on the topic: https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/
In short, the algorithm takes the starting point A and grows a shortest-path tree outward until point B is reached; at that point there is a single path in the tree from A to B, and that is the shortest path.
Both this and your algorithm run in roughly O(n log n), but in practice this solution can be thought of as running a single BFS-like search instead of both a BFS and a DFS.
I have an undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there are about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes and take the best one.
Something like this:
//Precomputation: find all-pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j] = (i==j) ? 0 : INF
for each edge (i,j) of weight w: d[i][j] = d[j][i] = w
for k=1 to n:
    for i=1 to n:
        for j=1 to n:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
    shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds in any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, the running time is O(n^3) for the Floyd-Warshall part and O(k!n) for the all-permutations part, and 100^3 + (12!)(100) is practically peanuts unless you have some really restrictive constraints.]
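If it helps, here is a self-contained Python version of the same idea (just a sketch; the dict-of-dicts adjacency format and the tiny example graph are illustrative assumptions):

from itertools import permutations

INF = float('inf')

def shortest_route(adj, start, end, mustpass):
    # adj: {node: {neighbor: weight}} for an undirected graph
    nodes = list(adj)
    # all-pairs shortest paths via Floyd-Warshall
    d = {u: {v: (0 if u == v else adj[u].get(v, INF)) for v in nodes} for u in nodes}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # try every visiting order of the 'mustpass' nodes
    best = INF
    for p in permutations(mustpass):
        order = [start, *p, end]
        best = min(best, sum(d[a][b] for a, b in zip(order, order[1:])))
    return best

# tiny illustrative graph: edges s-a (1), s-b (4), a-b (1), a-e (5), b-e (1)
adj = {'s': {'a': 1, 'b': 4}, 'a': {'s': 1, 'b': 1, 'e': 5},
       'b': {'s': 4, 'a': 1, 'e': 1}, 'e': {'a': 5, 'b': 1}}
print(shortest_route(adj, 's', 'e', ['a', 'b']))   # -> 3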
Run Dijkstra's algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... must-pass ... end.
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
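A quick sketch of that reduction using networkx (the graph G, the critical list, and the 'weight' attribute name are assumptions about how the maze data is stored):

import itertools
import networkx as nx

def reduced_graph(G, critical):
    # complete graph on the critical nodes; each edge weight is the
    # shortest-path distance between its endpoints in the original graph G
    H = nx.Graph()
    for u, v in itertools.combinations(critical, 2):
        H.add_edge(u, v, weight=nx.dijkstra_path_length(G, u, v, weight='weight'))
    return H

Any TSP-style search can then run on the much smaller graph H instead of the full maze.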
Actually, the problem you posted is similar to the traveling salesman problem, but I think it is closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest-cost path between any two points. Since you need to get from point A to point B and visit about 12 in between, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that it's only overcomplicating things. ^_^ This is coming from the mind of a video game programmer who's dealt with these kinds of things before.
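For reference, a compact A* sketch in Python; the adjacency format {node: {neighbor: cost}} and the heuristic(u, goal) estimate are assumptions, since they depend on how the maze is encoded (with no useful heuristic you can pass lambda u, g: 0, and this degenerates into Dijkstra):

import heapq

def a_star(graph, start, goal, heuristic):
    # graph: {node: {neighbor: cost}}; heuristic(u, goal) must never overestimate
    g = {start: 0}                                # best known cost from start
    came_from = {}
    open_heap = [(heuristic(start, goal), start)] # (estimated total cost, node)
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u == goal:                             # reconstruct the path
            path = [u]
            while u in came_from:
                u = came_from[u]
                path.append(u)
            return list(reversed(path))
        for v, cost in graph[u].items():
            tentative = g[u] + cost
            if tentative < g.get(v, float('inf')):
                g[v] = tentative
                came_from[v] = u
                heapq.heappush(open_heap, (tentative + heuristic(v, goal), v))
    return None                                   # goal unreachable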
This is not a TSP problem and not NP-hard, because the original question does not require that must-pass nodes are visited only once. That makes the answer much, much simpler: just brute-force it after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way, but a simple one is to work through the permutations. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end]; this is your new target distance to beat. Now try [start->a->c->b->end], and if that's better, set it as the target (and remember that it came from that pattern of nodes). Work through the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(Where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path precomputation step. The shortest path between a and b may contain c or even the end point. You don't need to care.)
Andrew Top has the right idea:
1) Dijkstra's algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known for any NP-complete problem. The only other thing to remember is that after you expand the graph back out after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short-circuiting. After all, this problem looks a LOT like the Steiner Tree problem: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's, for example.
Considering the number of nodes and edges is relatively small, you can probably calculate every possible path and take the shortest one.
Generally this is known as the travelling salesman problem, which is NP-hard, no matter what algorithm you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about visiting the must-pass nodes in ANY order. I have been searching for a solution for the case where the order of the must-pass nodes is defined. I found my answer, but since no question on Stack Overflow was similar, I'm posting it here so that as many people as possible can benefit from it.
If the order of the must-pass nodes is defined, then you can run Dijkstra's algorithm multiple times. For instance, let's assume you have to start from s, pass through k1, k2 and k3 (in that order), and stop at e. Then you run Dijkstra's algorithm between each consecutive pair of nodes. The cost and path are given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, e)
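For example, with networkx (assuming the graph G keeps its edge weights in a 'weight' attribute; the names are illustrative):

import networkx as nx

def ordered_route(G, stops):
    # stops = [s, k1, k2, k3, e], i.e. the must-pass nodes in their required order
    total_cost, full_path = 0, [stops[0]]
    for a, b in zip(stops, stops[1:]):
        total_cost += nx.dijkstra_path_length(G, a, b, weight='weight')
        full_path += nx.dijkstra_path(G, a, b, weight='weight')[1:]
    return total_cost, full_path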
How about using brute force on the dozen 'must visit' nodes? You can cover all the possible orderings of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to finding optimal routes from the start node to the circuit, which you then follow around until you've covered the must-visit nodes, and then finding the route from there to the end.
The final path is composed of:
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this:
Do an A* search from the start node to every point on the circuit.
For each of these, do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction).
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must-visit circuit within the search.
One thing that is not mentioned anywhere is whether it is OK for the same vertex to be visited more than once in the path. Most of the answers here assume that it's OK to visit the same edge multiple times, but my take, given the question (a path should not visit the same vertex more than once!), is that it is not OK to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.
So let's say I have a matrix with two types of cells, 0 and 1, where 1 is not passable.
I want to find a point from which I can run paths (say, A*) to a bunch of destinations (I don't expect more than 4). And I want the lengths of these paths to be such that l1/l2/l3/l4 = 1, or as close to 1 as possible.
For two destinations it is simple: run a path between them and take the midpoint. For more destinations, I imagine I can run paths between each pair, and they will then create a sort of polygon, so I could grab the centroid (or the average of all path point coordinates)? Or would it be better to take all midpoints of the paths between each pair and then use them as vertices of a polygon which will contain my desired point?
It seems you want to find the point with best access to multiple endpoints. For other readers, this is like trying to found an ideal settlement to trade with nearby cities; you want them to be as accessible as possible. It appears to be a variant of the Weber Problem applied to pathfinding.
The best solution, as you can no longer rely on exploiting geometry (imagine a mountain path or two blocking the way), is going to be an iterative approach. I don't imagine it will be easy to find an optimal solution because you'll need to check every square; you can't guess by pathing between endpoints anymore. In nearly any large problem space, you will need to path from each possible centroid to all endpoints. A suboptimal solution will be fairly fast. I recommend these steps:
Try to estimate the centroid using geometry, forming a search area.
Use a modified A* algorithm from each point S in the search area to all your target points T to generate a perfect path from S to each T.
Add the lengths of the paths S -> T together to get the Cost (probably stored in a matrix for all sample points).
Select the lowest Cost from all your samples in the matrix (or the entire population if you didn't cull the search space).
The algorithm above can also work without estimating a centroid and limiting solutions. If you choose to search the entire space, the search will be much longer, but you can find a perfect solution even in a labyrinth. If you estimate the centroid and start the search near it, you'll find good answers faster.
I mentioned earlier that you should use a modified A* algorithm... Rather than repeating a generic A* search S->Tn for every T, code A* so that it seeks multiple target locations, storing the paths to each one and stopping when it has found them all.
If you really want a perfect solution to the problem, you'll be waiting a long time, so I recommend that you use any exploit you can to reduce wasteful calculations. Even go so far as to store found paths in a lookup table for each T, and see if a point already exists along any of those paths.
To put it simply, finding the point is easy. Finding it fast enough might take lots of clever heuristics (cost-saving measures) and stored data.
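As a concrete (unoptimized) illustration of the exhaustive version: because the grid has uniform step cost, a plain BFS from each target gives the same distances A* would, and it is equivalent (and cheaper) to search once from each target rather than from every candidate cell. The sketch below scores cells by the sum of path lengths, as in the steps above; to match the l1 ≈ l2 ≈ ... criterion exactly you could swap the score for, say, the spread between the longest and shortest path. All names are illustrative:

from collections import deque

def distance_map(grid, start):
    # BFS over a 0/1 grid (1 = wall); returns {cell: steps from start}
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def best_meeting_point(grid, targets):
    maps = [distance_map(grid, t) for t in targets]
    best_cost, best_cell = float('inf'), None
    for cell in maps[0]:
        if all(cell in m for m in maps):          # reachable from every target
            cost = sum(m[cell] for m in maps)     # sum-of-lengths score
            if cost < best_cost:
                best_cost, best_cell = cost, cell
    return best_cell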
I have a directed acyclic weighted graph which I want to traverse.
The constraints for a valid solution route are:
The sum of the weights of all edges traversed in the route must be the highest possible in the graph, subject to the second constraint.
Exactly N vertices must have been visited in the chosen route (including the start and end vertex).
Typically the graph will have a large number of vertices and edges, so trying all possibilities is not an option; quite an efficient algorithm is required.
I'm looking for some pointers or a suitable algorithm for this problem. I know the first condition is easily fulfilled using Dijkstra's algorithm, but I am not sure how to incorporate the second condition, or even where to begin to look.
Please let me know if any additional information is needed.
I'm not sure if you are interested in any path of length N in the graph or just a path between two specific vertices; I suspect the latter, but you did not mention that constraint in your question.
If the former, the solution should be a trivial Dijkstra-like algorithm where you sort all edges by their potential path value, which starts at the edge weight and gets adjusted by already-built adjacent paths. In each iteration, take the node with the best potential path value and add it to an adjacent path. Stop when you get a path of length N (or longer, which you cut off at the ends). There are some other technical details, especially with regard to creating long paths, but I won't go into them as I suspect this is not what you are interested in. :-)
If you have a fixed source and sink, I think there is no deep magic involved - just run a basic Dijkstra-like search where a path is associated with each vertex added to the queue, but do not insert vertices whose path length is >= N into the queue, and do not insert the sink unless its path length is N.
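Since the graph is acyclic, another concrete way to carry that extra per-vertex state is a dynamic program over (vertex, number of vertices used) in topological order. This is just a sketch of that alternative, not the queue-based approach described above; all names are illustrative and a topological order is assumed to be available:

from collections import defaultdict

NEG_INF = float('-inf')

def best_route(edges, topo_order, src, dst, n_vertices):
    # edges: iterable of (u, v, w); topo_order: vertices in topological order
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    # best[v][k] = maximum total weight of a path from src to v using exactly k vertices
    best = {v: defaultdict(lambda: NEG_INF) for v in topo_order}
    best[src][1] = 0
    for u in topo_order:
        for k, cost in list(best[u].items()):
            for v, w in adj[u]:
                if k < n_vertices and cost + w > best[v][k + 1]:
                    best[v][k + 1] = cost + w
    return best[dst][n_vertices]      # NEG_INF means no such route exists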
I was thinking about an extension to the Shortest Hamiltonian Path (SHP) problem, and I couldn't find a way of solving it. I know it is NP-complete, but I figured I'd ask here for ideas, since I do not want to simply brute force the problem.
The extension is fairly simple: given an undirected, complete, weighted graph with n vertices, find the shortest Hamiltonian path with end vertices v and u.
So brute force would still take O(n!) time, since the remaining n-2 vertices can be visited in (n-2)! ways. I was trying to find a way to solve this slightly faster, but my efforts to find a better approach have so far been fruitless.
Would anyone have an idea how to exploit the knowledge of the end vertices? Preferably explained alongside some pseudocode. The solution found is required to be optimal.
I guess it could be solved by integer programming, since the knowledge of the end nodes is fairly limiting and makes it easy to avoid cycles, but that wouldn't really exploit the structure of the problem.
If you want to find the shortest path to connect all nodes, then you should look at travelling salesman algorithms. I don't exactly see why you approach it as an SHP. The only reason I can think of is that you want to specify your start and end cities, but it should be easy to fix that (if you need it I can post how) by changing your graph a bit.
edit: adding how to tweak your graph
Add one node (call it E) and connect it only to your start and end nodes. A TSP will compute a solution to your problem by connecting all your nodes. As E is only connected to start and end, any tour must enter and leave it through them, so the solution (a cycle, yes) will contain the segment start - E - end. Then you remove the two edges to and from E and you have the optimal solution: within the TSP tour, the path from start to end is optimal.
You can use a metaheuristic algorithm. There are many kinds of them (local search, constructive search, etc.). One of them could be based on the following (a concrete sketch follows the outline):
For each x belonging to the set of vertices X:
- Find the set C of vertices closest to x.
- Filter the set C in order to include one of them in the path P.
Repeat until all vertices of X have been included in the path P.
End.
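For instance, a minimal sketch of one such constructive heuristic (plain nearest-neighbour between the two fixed end vertices; the dict-of-dicts dist structure is an assumption about how the complete graph's weights are stored):

def nearest_neighbour_path(dist, u, v):
    # Greedy constructive heuristic: repeatedly extend the path with the closest
    # unvisited vertex, starting at u and finishing at v. Not optimal, but a
    # reasonable starting solution for a metaheuristic to improve on.
    remaining = set(dist) - {u, v}
    path, current = [u], u
    while remaining:
        nxt = min(remaining, key=lambda x: dist[current][x])
        path.append(nxt)
        remaining.remove(nxt)
        current = nxt
    path.append(v)
    return path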