Search a DAG with boolean constraints on reachability - graph

The queries are something like
Return all vertices such that
(reachable from (A and (B or C))) and (not reachable from (D and E)).
The query can be formed with any kind of boolean constraints on reachability.
Are there efficient methods to do this query fast? Other than actually find the set of all item reachable vertex of concern, then do unions, intersections and set minus on those sets?

I think the only way to make the search faster than the default you suggest (calculate the sets reachable from each of the vertices in the predicate, then do set arithmetic), is to so the boolean math during the search, and use it to abort branches.
For example, suppose we're trying to calculate the set reachable from ((A or B) but not C). When searching the nodes, if we're on a node marked (B), and two of the "next" node are already marked (A) and (C) respectively. On these three nodes, the predicate reduces to:
(B): P = NOT(C)
(A): P = NOT(C)
(C): P = FALSE
So there's no reason to continue the search to either of these nodes.
(Naturally, if we're going to calculate more than one set on the same DAG, it behooves us to retain these marks.)

Related

Using a recursive / fixed point / iterative structure in a Neo4j Cypher query

My Neo4j database contains relationships that may have a special property:
(a) -[{sustains:true}]-> (b)
This means that a sustains b: when the last node that sustains b is deleted, b itself should be deleted. I'm trying to write a Cypher statement that deletes a given node PLUS all nodes that now become unsustained as a result. This may set off a chain reaction, and I don't know how to encode this in Cypher. Is Cypher expressive enough?
In any other language, I could come up with a number of ways to implement this. A recursive algorithm for this would be something like:
delete(a) :=
MATCH (a) -[{sustains:true}]-> (b)
REMOVE a
WITH b
MATCH (aa) -[{sustains:true}]-> (b)
WHERE count(aa) = 0
delete(b)
Another way to describe the additional set of nodes to delete would be with a fixed point function:
setOfNodesToDelete(Set) :=
RETURN Set' ⊆ Set such that for all n ∈ Set'
there is no (m) -[{sustains:true}]-> (n) with m ∉ Set
We would start with the set of all z such that (a) -[{sustains:true}*1..]-> (z), then delete a, run setOfNodesToDelete on the set until it doesn't change anymore, then delete the nodes specified by the set. This requires an unspecified number of iterations.
Any way to accomplish my goal in Cypher?

neo4j logic gate simulation, how to?

I would like to create a bunch of "and" and "or" and "not" gates in a directed graph.
And then traverse from the inputs to see what they results are.
I assume there is a ready made traversal that would do that but I don't see it.
I don't know what the name of such a traversal would be.
Certainly breadth first will not do the job.
I need to get ALL the leaves, and go up toward the root.
In other words
A = (B & (C & Z))
I need to resolve C # Z first.
I need to put this type of thing in a graph and to traverse up.
You would probably create each of the operations as a node which has N incoming and one outgoing connection. You can of course also have more complex operations encapsuled as a node.
With Neo4j 2.0 I would use Labels for the 3 types of operations.
I assume your leaves would then be boolean values? Actually I think you have many roots and just a single leaf (the result expression)
(input1)-->(:AND {id:1})-->(:OR {id:2})-->(output)
(input2)-->(:AND {id:1})
(input3)------------------>(:OR {id:2})
Then you can use CASE when for decisions on the label type and use the collection predicates (ALL, ANY) for the computation
See: http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
Predicates: http://docs.neo4j.org/chunked/milestone/query-function.html
Labels: http://docs.neo4j.org/chunked/milestone/query-match.html#match-get-all-nodes-with-a-label

Finding all paths in directed graph with specific cost

Suppose we have the directed, weighted graph. Our task is to find all paths beetween two vertices (source and destination) which cost is less or equal =< N. We visit every vertex only once. In later version I'd like to add a condition that the source can be the destination (we just make a loop).
I think it can be done with modified Dijkstra's algorithm, but I have no idea how implement such thing. Thanks for any help.
You could use recursive backtracking to solve this problem. Terminate your recursion when:
You get to the destination
You visit a node that was already visited
Your path length exceeds N.
Pseudocode:
list curpath := {}
int dest, maxlen
def findPaths (curNode, dist):
if curNode = dest:
print curpath
return
if curNode is marked:
return
if dist > maxlen:
return
add curNode to curpath
mark curNode
for nextNode, edgeDist adjacent to curNode:
findPaths(nextNode, dist + edgeDist)
remove last element of curpath
You want to find all the paths from point A to point B in a directed graph, such as the distance from A to B is smaller than N, and allowing the possibility that A = B.
Dijkstra's algorithm is taylored to find the smallest path from one point to another in a graph, and drops many all the others along the way, so to speak. Because of this, it cannot be used to find all the paths, if we include paths which overlaps.
You can achieve your goal by doing a breadth first search in the graph, keeping each branch of the covering tree in its on stack (you will get an enormous amount of them if the nodes are very well connected), and stop at depth N. All the branches which have reached B are kept aside. Once depth N has been covered, you drop all the paths which didn't reach B. The remaining ones, as well as the one kept aside put together becomes your solutions.
You may choose to add the restriction of not having cycles in your paths, in which case you would have check at each step of the search if the newly reached node is already in the path covered so far, and prune that path if it is the case.
Here is some pseudo code:
function find_paths(graph G, node A):
list<path> L, L';
L := empty list;
push path(A) in L;
for i = 2 to N begin
L' := empty list;
for each path P in L begin
if last node of P = B then push P in L'
else
for each successor S of last node in P begin
if S not in P then
path P' := P;
push S in P';
push P' in L';
endif
end
endif
end
L := L';
end
for each path P in L begin
if last node of P != B
then remove P from L
endif
end
return L;
I think a possible improvement (depending on the size of the problem and the maximum cost N) to the recursive backtracking algorithm suggested by jma127 would be to pre-compute the minimum distance of each node from the destination (shortest path tree), then append the following to the conditions tested to terminate your recursion:
You get to the a node whose minimum distance from the destination is greater than the maximum cost N minus the distance travelled to reach the current node.
If one needs to run the algorithm several times for different sources and destinations, one could run, e.g., Johnson's algorithm at the beginning to create a matrix of the shortest paths between all pairs of nodes.

Find all possible paths from one vertex in a directed cyclic graph in Erlang

I would like to implement a function which finds all possible paths to all possible vertices from a source vertex V in a directed cyclic graph G.
The performance doesn't matter now, I just would like to understand the algorithm. I have read the definition of the Depth-first search algorithm, but I don't have full comprehension of what to do.
I don't have any completed piece of code to provide here, because I am not sure how to:
store the results (along with A->B->C-> we should also store A->B and A->B->C);
represent the graph (digraph? list of tuples?);
how many recursions to use (work with each adjacent vertex?).
How can I find all possible paths form one given source vertex in a directed cyclic graph in Erlang?
UPD: Based on the answers so far I have to redefine the graph definition: it is a non-acyclic graph. I know that if my recursive function hits a cycle it is an indefinite loop. To avoid that, I can just check if a current vertex is in the list of the resulting path - if yes, I stop traversing and return the path.
UPD2: Thanks for thought provoking comments! Yes, I need to find all simple paths that do not have loops from one source vertex to all the others.
In a graph like this:
with the source vertex A the algorithm should find the following paths:
A,B
A,B,C
A,B,C,D
A,D
A,D,C
A,D,C,B
The following code does the job, but it is unusable with graphs that have more that 20 vertices (I guess it is something wrong with recursion - takes too much memory, never ends):
dfs(Graph,Source) ->
?DBG("Started to traverse graph~n", []),
Neighbours = digraph:out_neighbours(Graph,Source),
?DBG("Entering recursion for source vertex ~w~n", [Source]),
dfs(Neighbours,[Source],[],Graph,Source),
ok.
dfs([],Paths,Result,_Graph,Source) ->
?DBG("There are no more neighbours left for vertex ~w~n", [Source]),
Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source) ->
?DBG("///The neighbour to check is ~w, other neighbours are: ~w~n",[Neighbour,Other_neighbours]),
?DBG("***Current result: ~w~n",[Result]),
New_result = relax_neighbours(Neighbour,Paths,Result,Graph,Source),
dfs(Other_neighbours,Paths,New_result,Graph,Source).
relax_neighbours(Neighbour,Paths,Result,Graph,Source) ->
case lists:member(Neighbour,Paths) of
false ->
?DBG("Found an unvisited neighbour ~w, path is: ~w~n",[Neighbour,Paths]),
Neighbours = digraph:out_neighbours(Graph,Neighbour),
?DBG("The neighbours of the unvisited vertex ~w are ~w, path is:
~w~n",[Neighbour,Neighbours,[Neighbour|Paths]]),
dfs(Neighbours,[Neighbour|Paths],Result,Graph,Source);
true ->
[Paths|Result]
end.
UPD3:
The problem is that the regular depth-first search algorithm will go one of the to paths first: (A,B,C,D) or (A,D,C,B) and will never go the second path.
In either case it will be the only path - for example, when the regular DFS backtracks from (A,B,C,D) it goes back up to A and checks if D (the second neighbour of A) is visited. And since the regular DFS maintains a global state for each vertex, D would have 'visited' state.
So, we have to introduce a recursion-dependent state - if we backtrack from (A,B,C,D) up to A, we should have (A,B,C,D) in the list of the results and we should have D marked as unvisited as at the very beginning of the algorithm.
I have tried to optimize the solution to tail-recursive one, but still the running time of the algorithm is unfeasible - it takes about 4 seconds to traverse a tiny graph of 16 vertices with 3 edges per vertex:
dfs(Graph,Source) ->
?DBG("Started to traverse graph~n", []),
Neighbours = digraph:out_neighbours(Graph,Source),
?DBG("Entering recursion for source vertex ~w~n", [Source]),
Result = ets:new(resulting_paths, [bag]),
Root = Source,
dfs(Neighbours,[Source],Result,Graph,Source,[],Root).
dfs([],Paths,Result,_Graph,Source,_,_) ->
?DBG("There are no more neighbours left for vertex ~w, paths are ~w, result is ~w~n", [Source,Paths,Result]),
Result;
dfs([Neighbour|Other_neighbours],Paths,Result,Graph,Source,Recursion_list,Root) ->
?DBG("~w *Current source is ~w~n",[Recursion_list,Source]),
?DBG("~w Checking neighbour _~w_ of _~w_, other neighbours are: ~w~n",[Recursion_list,Neighbour,Source,Other_neighbours]),
? DBG("~w Ready to check for visited: ~w~n",[Recursion_list,Neighbour]),
case lists:member(Neighbour,Paths) of
false ->
?DBG("~w Found an unvisited neighbour ~w, path is: ~w~n",[Recursion_list,Neighbour,Paths]),
New_paths = [Neighbour|Paths],
?DBG("~w Added neighbour to paths: ~w~n",[Recursion_list,New_paths]),
ets:insert(Result,{Root,Paths}),
Neighbours = digraph:out_neighbours(Graph,Neighbour),
?DBG("~w The neighbours of the unvisited vertex ~w are ~w, path is: ~w, recursion:~n",[Recursion_list,Neighbour,Neighbours,[Neighbour|Paths]]),
dfs(Neighbours,New_paths,Result,Graph,Neighbour,[[[]]|Recursion_list],Root);
true ->
?DBG("~w The neighbour ~w is: already visited, paths: ~w, backtracking to other neighbours:~n",[Recursion_list,Neighbour,Paths]),
ets:insert(Result,{Root,Paths})
end,
dfs(Other_neighbours,Paths,Result,Graph,Source,Recursion_list,Root).
Any ideas to run this in acceptable time?
Edit:
Okay I understand now, you want to find all simple paths from a vertex in a directed graph. So a depth-first search with backtracking would be suitable, as you have realised. The general idea is to go to a neighbour, then go to another one (not one which you've visited), and keep going until you hit a dead end. Then backtrack to the last vertex you were at and pick a different neighbour, etc.
You need to get the fiddly bits right, but it shouldn't be too hard. E.g. at every step you need to label the vertices 'explored' or 'unexplored' depending on whether you've already visited them before. The performance shouldn't be an issue, a properly implemented algorithm should take maybe O(n^2) time. So I don't know what you are doing wrong, perhaps you are visiting too many neighbours? E.g. maybe you are revisiting neighbours that you've already visited, and going round in loops or something.
I haven't really read your program, but the Wiki page on Depth-first Search has a short, simple pseudocode program which you can try to copy in your language. Store the graphs as Adjacency Lists to make it easier.
Edit:
Yes, sorry, you are right, the standard DFS search won't work as it stands, you need to adjust it slightly so that does revisit vertices it has visited before. So you are allowed to visit any vertices except the ones you have already stored in your current path.
This of course means my running time was completely wrong, the complexity of your algorithm will be through the roof. If the average complexity of your graph is d+1, then there will be approximately d*d*d*...*d = d^n possible paths.
So even if every vertex has only 3 neighbours, there's still quite a few paths when you get above 20 vertices.
There's no way around that really, because if you want your program to output all possible paths then indeed you will have to output all d^n of them.
I'm interested to know whether you need this for a specific task, or are just trying to program this out of interest. If the latter, you will just have to be happy with small, sparsely connected graphs.
I don't understand question. If I have graph G = (V, E) = ({A,B}, {(A,B),(B,A)}), there is infinite paths from A to B {[A,B], [A,B,A,B], [A,B,A,B,A,B], ...}. How I can find all possible paths to any vertex in cyclic graph?
Edit:
Did you even tried compute or guess growing of possible paths for some graphs? If you have fully connected graph you will get
2 - 1
3 - 4
4 - 15
5 - 64
6 - 325
7 - 1956
8 - 13699
9 - 109600
10 - 986409
11 - 9864100
12 - 108505111
13 - 1302061344
14 - 16926797485
15 - 236975164804
16 - 3554627472075
17 - 56874039553216
18 - 966858672404689
19 - 17403456103284420
20 - 330665665962403999
Are you sure you would like find all paths for all nodes? It means if you compute one milion paths in one second it would take 10750 years to compute all paths to all nodes in fully connected graph with 20 nodes. It is upper bound for your task so I think you don't would like do it. I think you want something else.
Not an improved algorithmic solution by any means, but you can often improve performance by spawning multiple worker threads, potentially here one for each first level node and then aggregating the results. This can often improve naive brute force algorithms relatively easily.
You can see an example here: Some Erlang Matrix Functions, in the maximise_assignment function (comments starting on line 191 as of today). Again, the underlying algorithm there is fairly naive and brute force, but the parallelisation speeds it up quite well for many forms of matrices.
I have used a similar approach in the past to find the number of Hamiltonian Paths in a graph.

Directed Acyclical Graph Traversal... help?

a little out of my depth here and need to phone a friend. I've got a directed acyclical graph I need to traverse and I'm stumbling into to graph theory for the first time. I've been reading a lot about it lately but unfortunately I don't have time to figure this out academically. Can someone give me a kick with some help as to how to process this tree?
Here are the rules:
there are n root nodes (I call them "sources")
there are n end nodes
source nodes carry a numeric value
downstream nodes (I call them "worker" nodes) preform various operations on the incoming values like Add, Mult, etc.
As you can see from the graph below, nodes a, b, and c need to be processed before d, e, or f.
What's the proper order to walk this tree?
I would look into linearization of DAGs which should be achievable through Topological sorts.
Linearization, from what I remember, basically sorts in an order which holds to the invariant that for all nodes (Node_X) that have an outdegree to any other given node NodeA, NodeX appears before NodeA.
This would mean that, from your example, nodes a,b, and d would be processed first. Node c second. Nodes e and f, last.
http://en.wikipedia.org/wiki/Topological_sorting
You need to process the nodes via a Topological sort. The sort is not necessarily unique so there might be more than one available order (not that this should matter anyway).
The linked wikipedia page should have concrete algorithms to help you.

Resources