general idea and maybe example on graph - graph

I'm looking for a general idea (and maybe some code example or at least pseudocode)
Now, this is from a problem that someone gave me, or rather showed me, I don't have to solve it, but I did most of the questions anyway, the problem that I'm having is this:
Let's say you have a directed weighted graph with the following nodes:
AB5, BC4, CD8, DC8, DE6, AD5, CE2, EB3, AE7
and the question is:
how many different routes from C to C with a distance of less than x. (say, 10, 20, 30, 40)
The answer of different trips is: CDC, CEBC, CEBCDC, CDCEBC, CDEBC, CEBCEBC, CEBCEBCEBC.
The main problem I'm having with it is that when I do DFS or BFS, my implementation first chooses the node and marks it as visited therefore I'm only able to find 2 paths which are CDC and CEBC and then my algorithm quits. If I don't mark it as visited then on the next iteration (or recursive call) it will choose the same node and not next available route, so I have to always mark them as visited however by doing that how can I get for example CEBCEBCEBC, which is pretty much bouncing between nodes.
I've looked at all the different algorithms books that I have at home and while every algorithm describes how to do DFS, BFS and find shortest paths (all the good stufF), none show how to iterate indefinitively and stop only when one reaches certain weight of the graph or hits certain vertex number of times.

So why not just keep branching and branching; at each node you will evaluate two things; has this particular path exceeded the weight limit (if so, terminate the branch) and is this node where I started (in which case log my path history to an 'acceptable solutions' list); then make new branches which each take a step in each possible direction.

You should not mark nodes as visited; as MikeB points out, CDCDC is a valid solution and yet it revisits D.
I'd do it lke this:
Start with two lists of paths:
Solutions (empty) and
ActivePaths (containing one path, "C").
While ActivePaths is not empty,
Take a path out of ActivePaths (suppose it's "CD"[8]).
If its distance is not over the limit,
see where you are by looking at the last node in the path ("D").
If you're at "C", add a copy of this path to Solutions.
Now for each possible next destination ("C", "E")
make a copy of this path, ("CD"[8])
append the destination, ("CDC"[8])
add the weight, ("CDC"[16])
and put it in ActivePaths
Discard the path.
Whether this turns out to be a DFS, a BFS or something else depends on where in ActivePaths you insert and remove paths.
No offense, but this is pretty simple and you're talking about consulting a lot of books for the answer. I'd suggest playing around with the simple examples until they become more obvious.

In fact you have two different problems:
Find all distinct cycles from C to C, we will call them C_1, C_2, ..., C_n (done with a DFS)
Each C_i has a weight w_i, then you want every combination of cycles with a total weight less than N. This is a combinatorial problem (and seems to be easily solvable with dynamic programming).

Related

Multi-goal path-finding [duplicate]

I have a undirected graph with about 100 nodes and about 200 edges. One node is labelled 'start', one is 'end', and there's about a dozen labelled 'mustpass'.
I need to find the shortest path through this graph that starts at 'start', ends at 'end', and passes through all of the 'mustpass' nodes (in any order).
( http://3e.org/local/maize-graph.png / http://3e.org/local/maize-graph.dot.txt is the graph in question - it represents a corn maze in Lancaster, PA)
Everyone else comparing this to the Travelling Salesman Problem probably hasn't read your question carefully. In TSP, the objective is to find the shortest cycle that visits all the vertices (a Hamiltonian cycle) -- it corresponds to having every node labelled 'mustpass'.
In your case, given that you have only about a dozen labelled 'mustpass', and given that 12! is rather small (479001600), you can simply try all permutations of only the 'mustpass' nodes, and look at the shortest path from 'start' to 'end' that visits the 'mustpass' nodes in that order -- it will simply be the concatenation of the shortest paths between every two consecutive nodes in that list.
In other words, first find the shortest distance between each pair of vertices (you can use Dijkstra's algorithm or others, but with those small numbers (100 nodes), even the simplest-to-code Floyd-Warshall algorithm will run in time). Then, once you have this in a table, try all permutations of your 'mustpass' nodes, and the rest.
Something like this:
//Precomputation: Find all pairs shortest paths, e.g. using Floyd-Warshall
n = number of nodes
for i=1 to n: for j=1 to n: d[i][j]=INF
for k=1 to n:
for i=1 to n:
for j=1 to n:
d[i][j] = min(d[i][j], d[i][k] + d[k][j])
//That *really* gives the shortest distance between every pair of nodes! :-)
//Now try all permutations
shortest = INF
for each permutation a[1],a[2],...a[k] of the 'mustpass' nodes:
shortest = min(shortest, d['start'][a[1]]+d[a[1]][a[2]]+...+d[a[k]]['end'])
print shortest
(Of course that's not real code, and if you want the actual path you'll have to keep track of which permutation gives the shortest distance, and also what the all-pairs shortest paths are, but you get the idea.)
It will run in at most a few seconds on any reasonable language :)
[If you have n nodes and k 'mustpass' nodes, its running time is O(n3) for the Floyd-Warshall part, and O(k!n) for the all permutations part, and 100^3+(12!)(100) is practically peanuts unless you have some really restrictive constraints.]
run Djikstra's Algorithm to find the shortest paths between all of the critical nodes (start, end, and must-pass), then a depth-first traversal should tell you the shortest path through the resulting subgraph that touches all of the nodes start ... mustpasses ... end
This is two problems... Steven Lowe pointed this out, but didn't give enough respect to the second half of the problem.
You should first discover the shortest paths between all of your critical nodes (start, end, mustpass). Once these paths are discovered, you can construct a simplified graph, where each edge in the new graph is a path from one critical node to another in the original graph. There are many pathfinding algorithms that you can use to find the shortest path here.
Once you have this new graph, though, you have exactly the Traveling Salesperson problem (well, almost... No need to return to your starting point). Any of the posts concerning this, mentioned above, will apply.
Actually, the problem you posted is similar to the traveling salesman, but I think closer to a simple pathfinding problem. Rather than needing to visit each and every node, you simply need to visit a particular set of nodes in the shortest time (distance) possible.
The reason for this is that, unlike the traveling salesman problem, a corn maze will not allow you to travel directly from any one point to any other point on the map without needing to pass through other nodes to get there.
I would actually recommend A* pathfinding as a technique to consider. You set this up by deciding which nodes have access to which other nodes directly, and what the "cost" of each hop from a particular node is. In this case, it looks like each "hop" could be of equal cost, since your nodes seem relatively closely spaced. A* can use this information to find the lowest cost path between any two points. Since you need to get from point A to point B and visit about 12 inbetween, even a brute force approach using pathfinding wouldn't hurt at all.
Just an alternative to consider. It does look remarkably like the traveling salesman problem, and those are good papers to read up on, but look closer and you'll see that its only overcomplicating things. ^_^ This coming from the mind of a video game programmer who's dealt with these kinds of things before.
This is not a TSP problem and not NP-hard because the original question does not require that must-pass nodes are visited only once. This makes the answer much, much simpler to just brute-force after compiling a list of shortest paths between all must-pass nodes via Dijkstra's algorithm. There may be a better way to go but a simple one would be to simply work a binary tree backwards. Imagine a list of nodes [start,a,b,c,end]. Sum the simple distances [start->a->b->c->end] this is your new target distance to beat. Now try [start->a->c->b->end] and if that's better set that as the target (and remember that it came from that pattern of nodes). Work backwards over the permutations:
[start->a->b->c->end]
[start->a->c->b->end]
[start->b->a->c->end]
[start->b->c->a->end]
[start->c->a->b->end]
[start->c->b->a->end]
One of those will be shortest.
(where are the 'visited multiple times' nodes, if any? They're just hidden in the shortest-path initialization step. The shortest path between a and b may contain c or even the end point. You don't need to care)
Andrew Top has the right idea:
1) Djikstra's Algorithm
2) Some TSP heuristic.
I recommend the Lin-Kernighan heuristic: it's one of the best known for any NP Complete problem. The only other thing to remember is that after you expanded out the graph again after step 2, you may have loops in your expanded path, so you should go around short-circuiting those (look at the degree of vertices along your path).
I'm actually not sure how good this solution will be relative to the optimum. There are probably some pathological cases to do with short circuiting. After all, this problem looks a LOT like Steiner Tree: http://en.wikipedia.org/wiki/Steiner_tree and you definitely can't approximate Steiner Tree by just contracting your graph and running Kruskal's for example.
Considering the amount of nodes and edges is relatively finite, you can probably calculate every possible path and take the shortest one.
Generally this known as the travelling salesman problem, and has a non-deterministic polynomial runtime, no matter what the algorithm you use.
http://en.wikipedia.org/wiki/Traveling_salesman_problem
The question talks about must-pass in ANY order. I have been trying to search for a solution about the defined order of must-pass nodes. I found my answer but since no question on StackOverflow had a similar question I'm posting here to let maximum people benefit from it.
If the order or must-pass is defined then you could run dijkstra's algorithm multiple times. For instance let's assume you have to start from s pass through k1, k2 and k3 (in respective order) and stop at e. Then what you could do is run dijkstra's algorithm between each consecutive pair of nodes. The cost and path would be given by:
dijkstras(s, k1) + dijkstras(k1, k2) + dijkstras(k2, k3) + dijkstras(k3, 3)
How about using brute force on the dozen 'must visit' nodes. You can cover all the possible combinations of 12 nodes easily enough, and this leaves you with an optimal circuit you can follow to cover them.
Now your problem is simplified to one of finding optimal routes from the start node to the circuit, which you then follow around until you've covered them, and then find the route from that to the end.
Final path is composed of :
start -> path to circuit* -> circuit of must visit nodes -> path to end* -> end
You find the paths I marked with * like this
Do an A* search from the start node to every point on the circuit
for each of these do an A* search from the next and previous node on the circuit to the end (because you can follow the circuit round in either direction)
What you end up with is a lot of search paths, and you can choose the one with the lowest cost.
There's lots of room for optimization by caching the searches, but I think this will generate good solutions.
It doesn't go anywhere near looking for an optimal solution though, because that could involve leaving the must visit circuit within the search.
One thing that is not mentioned anywhere, is whether it is ok for the same vertex to be visited more than once in the path. Most of the answers here assume that it's ok to visit the same edge multiple times, but my take given the question (a path should not visit the same vertex more than once!) is that it is not ok to visit the same vertex twice.
So a brute force approach would still apply, but you'd have to remove vertices already used when you attempt to calculate each subset of the path.

Find solution minimum spanning tree (with conditions) when extending graph

I have a logic question, therefore chose from two explanations:
Mathematical:
I have a undirected weighted complete graph over 2-14 nodes. The nodes always come in pairs (startpoint to endpoint). For this I already have the minimum spanning tree, which considers that the pairs startpoint always comes before his endpoint. Now I want to add another pair of nodes.
Real life explanation:
I already have a optimal taxi route for 1-7 people. Each joins (startpoint) and leaves (endpoint) at different places. Now I want to find the optimal route when I add another person to the taxi. I have already the calculated subpaths from each point to each point in my database (therefore this is a weighted graph). All calculated paths are real value, not heuristics.
Now I try to find the most performant solution to solve this. My current idea:
Find the point nearest to the new startpoint. Add it a) before and b) after this point. Choose the faster one.
Find the point nearest to the new endpoint. Add it a) before and b) after this point. Choose the faster one.
Ignoring the case that the new endpoint comes before the new start point, this seams feasible.
I expect that the general direction of the taxi is one direction, this eliminates the following edge case.
Is there any case I'm missing in which this algorithm wouldn't calculate the optimal solution?
There are definitely many cases were this algorithm (which is a First Fit construction heuristic) won't find the optimal solution. Given a reasonable sized dataset, in my experience, I would guess to get improvements of 10-20% by simply taking that result and adding metaheuristics (or other optimization algo's).
Explanation:
If you have multiple taxis with a limited person capacity, it has an inherit bin packing problem, which is NP-complete (which is proven to be suboptimally solved by all known construction heuristics in P).
But even if you have just 1 taxi, it is similar to TSP: if you have the optimal solution for 10 locations and add 1 location, it can create a snowball effect in the optimal solution to make the optimal solution look completely different. (sorry, no visual image of this yet)
And if you need to any additional constraints on top of that later on, you need to be aware of these false assumptions.

Traversal of directed acyclic weighted graph with constraints

I have a directed acyclic weighted graph which I want to traverse.
The constraints for a valid solution route are:
The sum of the weights of all edges traversed in the route must be the highest possible in the graph, taking in mind the second constraint.
Exactly N vertices must have been visited in the chosen route (including the start and end vertex).
Typically the graph will have a high amount of vertices and edges, so trying all possibilities is not an option, and requires quite an efficient algorithm.
Looking for some pointers or a suitable algorithm for this problem. I know the first condition is easily fulfilled using Dijkstra's algorithm, but I am not sure how to incorporate the second condition, or even where to begin to look.
Please let me know if any additional information is needed.
I'm not sure if you are interested in any path of length N in the graph or just path between two specific vertices; I suspect the latter, but you did not mention that constraint in your question.
If the former, the solution should be a trivial Dijkstra-like algorithm where you sort all edges by their potential path value that starts at the edge weight and gets adjusted by already built adjecent paths. In each iteration, take the node with the best potential path value and add it to an adjecent path. Stop when you get a path of length N (or longer that you cut off at the sides). There are some other technical details esp. wrt. creating long paths, but I won't go into details as I suspect this is not what you are interested in. :-)
If you have fixed source and sink, I think there is no deep magic involved - just run a basic Dijkstra where a path will be associated with each vertex added to the queue, but do not insert vertices with path length >= N into the queue and do not insert sink into the queue unless its path length is N.

Find all possible paths from one vertex in a directed cyclic graph in Erlang

I would like to implement a function which finds all possible paths to all possible vertices from a source vertex V in a directed cyclic graph G.
The performance doesn't matter now, I just would like to understand the algorithm. I have read the definition of the Depth-first search algorithm, but I don't have full comprehension of what to do.
I don't have any completed piece of code to provide here, because I am not sure how to:
store the results (along with A->B->C-> we should also store A->B and A->B->C);
represent the graph (digraph? list of tuples?);
how many recursions to use (work with each adjacent vertex?).
How can I find all possible paths form one given source vertex in a directed cyclic graph in Erlang?
UPD: Based on the answers so far I have to redefine the graph definition: it is a non-acyclic graph. I know that if my recursive function hits a cycle it is an indefinite loop. To avoid that, I can just check if a current vertex is in the list of the resulting path - if yes, I stop traversing and return the path.
UPD2: Thanks for thought provoking comments! Yes, I need to find all simple paths that do not have loops from one source vertex to all the others.
In a graph like this:
with the source vertex A the algorithm should find the following paths:
A,B
A,B,C
A,B,C,D
A,D
A,D,C
A,D,C,B
The following code does the job, but it is unusable with graphs that have more that 20 vertices (I guess it is something wrong with recursion - takes too much memory, never ends):
dfs(Graph,Source) ->
?DBG("Started to traverse graph~n", []),
Neighbours = digraph:out_neighbours(Graph,Source),
?DBG("Entering recursion for source vertex ~w~n", [Source]),
dfs(Neighbours,[Source],[],Graph,Source),
ok.
dfs([],Paths,Result,_Graph,Source) ->
?DBG("There are no more neighbours left for vertex ~w~n", [Source]),
Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source) ->
?DBG("///The neighbour to check is ~w, other neighbours are: ~w~n",[Neighbour,Other_neighbours]),
?DBG("***Current result: ~w~n",[Result]),
New_result = relax_neighbours(Neighbour,Paths,Result,Graph,Source),
dfs(Other_neighbours,Paths,New_result,Graph,Source).
relax_neighbours(Neighbour,Paths,Result,Graph,Source) ->
case lists:member(Neighbour,Paths) of
false ->
?DBG("Found an unvisited neighbour ~w, path is: ~w~n",[Neighbour,Paths]),
Neighbours = digraph:out_neighbours(Graph,Neighbour),
?DBG("The neighbours of the unvisited vertex ~w are ~w, path is:
~w~n",[Neighbour,Neighbours,[Neighbour|Paths]]),
dfs(Neighbours,[Neighbour|Paths],Result,Graph,Source);
true ->
[Paths|Result]
end.
UPD3:
The problem is that the regular depth-first search algorithm will go one of the to paths first: (A,B,C,D) or (A,D,C,B) and will never go the second path.
In either case it will be the only path - for example, when the regular DFS backtracks from (A,B,C,D) it goes back up to A and checks if D (the second neighbour of A) is visited. And since the regular DFS maintains a global state for each vertex, D would have 'visited' state.
So, we have to introduce a recursion-dependent state - if we backtrack from (A,B,C,D) up to A, we should have (A,B,C,D) in the list of the results and we should have D marked as unvisited as at the very beginning of the algorithm.
I have tried to optimize the solution to tail-recursive one, but still the running time of the algorithm is unfeasible - it takes about 4 seconds to traverse a tiny graph of 16 vertices with 3 edges per vertex:
dfs(Graph,Source) ->
?DBG("Started to traverse graph~n", []),
Neighbours = digraph:out_neighbours(Graph,Source),
?DBG("Entering recursion for source vertex ~w~n", [Source]),
Result = ets:new(resulting_paths, [bag]),
Root = Source,
dfs(Neighbours,[Source],Result,Graph,Source,[],Root).
dfs([],Paths,Result,_Graph,Source,_,_) ->
?DBG("There are no more neighbours left for vertex ~w, paths are ~w, result is ~w~n", [Source,Paths,Result]),
Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source,Recursion_list,Root) ->
?DBG("~w *Current source is ~w~n",[Recursion_list,Source]),
?DBG("~w Checking neighbour _~w_ of _~w_, other neighbours are: ~w~n",[Recursion_list,Neighbour,Source,Other_neighbours]),
? DBG("~w Ready to check for visited: ~w~n",[Recursion_list,Neighbour]),
case lists:member(Neighbour,Paths) of
false ->
?DBG("~w Found an unvisited neighbour ~w, path is: ~w~n",[Recursion_list,Neighbour,Paths]),
New_paths = [Neighbour|Paths],
?DBG("~w Added neighbour to paths: ~w~n",[Recursion_list,New_paths]),
ets:insert(Result,{Root,Paths}),
Neighbours = digraph:out_neighbours(Graph,Neighbour),
?DBG("~w The neighbours of the unvisited vertex ~w are ~w, path is: ~w, recursion:~n",[Recursion_list,Neighbour,Neighbours,[Neighbour|Paths]]),
dfs(Neighbours,New_paths,Result,Graph,Neighbour,[[[]]|Recursion_list],Root);
true ->
?DBG("~w The neighbour ~w is: already visited, paths: ~w, backtracking to other neighbours:~n",[Recursion_list,Neighbour,Paths]),
ets:insert(Result,{Root,Paths})
end,
dfs(Other_neighbours,Paths,Result,Graph,Source,Recursion_list,Root).
Any ideas to run this in acceptable time?
Edit:
Okay I understand now, you want to find all simple paths from a vertex in a directed graph. So a depth-first search with backtracking would be suitable, as you have realised. The general idea is to go to a neighbour, then go to another one (not one which you've visited), and keep going until you hit a dead end. Then backtrack to the last vertex you were at and pick a different neighbour, etc.
You need to get the fiddly bits right, but it shouldn't be too hard. E.g. at every step you need to label the vertices 'explored' or 'unexplored' depending on whether you've already visited them before. The performance shouldn't be an issue, a properly implemented algorithm should take maybe O(n^2) time. So I don't know what you are doing wrong, perhaps you are visiting too many neighbours? E.g. maybe you are revisiting neighbours that you've already visited, and going round in loops or something.
I haven't really read your program, but the Wiki page on Depth-first Search has a short, simple pseudocode program which you can try to copy in your language. Store the graphs as Adjacency Lists to make it easier.
Edit:
Yes, sorry, you are right, the standard DFS search won't work as it stands, you need to adjust it slightly so that does revisit vertices it has visited before. So you are allowed to visit any vertices except the ones you have already stored in your current path.
This of course means my running time was completely wrong, the complexity of your algorithm will be through the roof. If the average complexity of your graph is d+1, then there will be approximately d*d*d*...*d = d^n possible paths.
So even if every vertex has only 3 neighbours, there's still quite a few paths when you get above 20 vertices.
There's no way around that really, because if you want your program to output all possible paths then indeed you will have to output all d^n of them.
I'm interested to know whether you need this for a specific task, or are just trying to program this out of interest. If the latter, you will just have to be happy with small, sparsely connected graphs.
I don't understand question. If I have graph G = (V, E) = ({A,B}, {(A,B),(B,A)}), there is infinite paths from A to B {[A,B], [A,B,A,B], [A,B,A,B,A,B], ...}. How I can find all possible paths to any vertex in cyclic graph?
Edit:
Did you even tried compute or guess growing of possible paths for some graphs? If you have fully connected graph you will get
2 - 1
3 - 4
4 - 15
5 - 64
6 - 325
7 - 1956
8 - 13699
9 - 109600
10 - 986409
11 - 9864100
12 - 108505111
13 - 1302061344
14 - 16926797485
15 - 236975164804
16 - 3554627472075
17 - 56874039553216
18 - 966858672404689
19 - 17403456103284420
20 - 330665665962403999
Are you sure you would like find all paths for all nodes? It means if you compute one milion paths in one second it would take 10750 years to compute all paths to all nodes in fully connected graph with 20 nodes. It is upper bound for your task so I think you don't would like do it. I think you want something else.
Not an improved algorithmic solution by any means, but you can often improve performance by spawning multiple worker threads, potentially here one for each first level node and then aggregating the results. This can often improve naive brute force algorithms relatively easily.
You can see an example here: Some Erlang Matrix Functions, in the maximise_assignment function (comments starting on line 191 as of today). Again, the underlying algorithm there is fairly naive and brute force, but the parallelisation speeds it up quite well for many forms of matrices.
I have used a similar approach in the past to find the number of Hamiltonian Paths in a graph.

Directed Acyclical Graph Traversal... help?

a little out of my depth here and need to phone a friend. I've got a directed acyclical graph I need to traverse and I'm stumbling into to graph theory for the first time. I've been reading a lot about it lately but unfortunately I don't have time to figure this out academically. Can someone give me a kick with some help as to how to process this tree?
Here are the rules:
there are n root nodes (I call them "sources")
there are n end nodes
source nodes carry a numeric value
downstream nodes (I call them "worker" nodes) preform various operations on the incoming values like Add, Mult, etc.
As you can see from the graph below, nodes a, b, and c need to be processed before d, e, or f.
What's the proper order to walk this tree?
I would look into linearization of DAGs which should be achievable through Topological sorts.
Linearization, from what I remember, basically sorts in an order which holds to the invariant that for all nodes (Node_X) that have an outdegree to any other given node NodeA, NodeX appears before NodeA.
This would mean that, from your example, nodes a,b, and d would be processed first. Node c second. Nodes e and f, last.
http://en.wikipedia.org/wiki/Topological_sorting
You need to process the nodes via a Topological sort. The sort is not necessarily unique so there might be more than one available order (not that this should matter anyway).
The linked wikipedia page should have concrete algorithms to help you.

Resources