Using a recursive / fixed point / iterative structure in a Neo4j Cypher query - recursion

My Neo4j database contains relationships that may have a special property:
(a) -[{sustains:true}]-> (b)
This means that a sustains b: when the last node that sustains b is deleted, b itself should be deleted. I'm trying to write a Cypher statement that deletes a given node PLUS all nodes that now become unsustained as a result. This may set off a chain reaction, and I don't know how to encode this in Cypher. Is Cypher expressive enough?
In any other language, I could come up with a number of ways to implement this. A recursive algorithm for this would be something like:
delete(a) :=
MATCH (a) -[{sustains:true}]-> (b)
REMOVE a
WITH b
MATCH (aa) -[{sustains:true}]-> (b)
WHERE count(aa) = 0
delete(b)
Another way to describe the additional set of nodes to delete would be with a fixed point function:
setOfNodesToDelete(Set) :=
RETURN Set' ⊆ Set such that for all n ∈ Set'
there is no (m) -[{sustains:true}]-> (n) with m ∉ Set
We would start with the set of all z such that (a) -[{sustains:true}*1..]-> (z), then delete a, run setOfNodesToDelete on the set until it doesn't change anymore, then delete the nodes specified by the set. This requires an unspecified number of iterations.
Any way to accomplish my goal in Cypher?

Related

Zero or more length of path in Cypher

For example I have a path:
1-[:A]->2-[:B]->3
And we can use the * operator to define if a particular edge can be repeated. I would like to use the * operator on the entire path, or both edges combined. I would like to follow: (A AND B) zero or more times.
Example:
1-[:A]->2-[:B]->3-[:A]->4-[:B]->5...
I am not sure how to apply the * operator for the entire path in Cypher. My intent is to express a pattern that allows a specific path to be repeated 0 or more times.
This is something variable-length patterns cannot do in Cypher. However, because of this, we added repeating sequences functionality to path expander procs in APOC Procedures.
As an example:
MATCH (n)
WHERE id(n) = 123
CALL apoc.path.expandConfig(n, {relationshipFilter:'A>, B>'}) YIELD path
RETURN path
This expands from a start node (n) expanding out only a repeating sequence of outgoing :A and :B relationships. No minLevel or maxLevel properties were provided, so this has a minimum of 0 length and no bounds on max length.

prolog in math - searching the level of node in prolog

Assuming here is a binary search tree, and given the rule that above(X,Y) - X is directly above Y. Also I created the rule root(X) - X has no parent.
Then, I was trying to figure out what the depth of node in this tree.
Assume the root node of tree is "r" So I got fact level(r,0). In order to implement the rule level(N,D) :-, what I was thinking is it should be have a recursion here.
Thus, I tried
level(N,D): \+ root(N), above(X,N), D is D+1, level(X,D).
So if N is not a root, there has a node X above N and level D plus one, then recursion. But when I tested this, it just works for the root condition. When I created more facts, such as node "s" is leftchild of node "r", My query is level(s,D). It returns me "no". I traced the query, it shows me
1 1 Call: level(s,_16) ?
1 1 Fail: level(s,_16) ?
I just confusing why it fails when I call level(s,D)?
There are some problems with your query:
In Prolog you cannot write something like D is D+1, because a variable can only be assigned one value;
at the moment you call D is D+1, D is not yet instantiated, so it will probably cause an error; and
You never state (at least not in the visible code) that the level/2 of the root is 0.
A solution is to first state that the level of any root is 0:
level(N,0) :-
root(N).
Now we have to define the inductive case: first we indeed look for a parent using the above/2 predicate. Performing a check that N is no root/1 is not necessary strictly speaking, because it would conflict with the fact that there is an above/2. Next we determine the level of that parent LP and finally we calculate the level of our node by stating that L is LP+1 where L is the level of N and LP the level op P:
level(N,L) :-
above(P,N),
level(P,LP),
L is LP+1.
Or putting it all together:
level(N,0) :-
root(N).
level(N,L) :-
above(P,N),
level(P,LP),
L is LP+1.
Since you did not provide a sample tree, I have no means to test whether this predicate behaves as you expect it to.
About root/1
Note that by writing root/1, you introduce data duplication: you can simply write:
root(R) :-
\+ above(_,R).

Traverse Graph With Directed Cycles using Relationship Properties as Filters

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint. Only follow relationships that have a greater movement_time than the movement_time of the relationship that brought us to this node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
Here is what the graph looks like:
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,length(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all the paths between any of your nodes (beware cartesian products get very expensive on non-trivial datasets). In the WITH we're building a collection delta's that holds the difference between two subsequent movement_time properties.
The WHERE applies an ALL predicate to filter out those having any non-positive value - aka we guarantee increasing values of movement_time along the path.
The RETURN then just assembles the results - but not as a map, instead one collection for the reachable nodes and the last value of movement_time.
The current issue is that we have duplicates since e.g. there are multiple paths from B to A.
As a general notice: this problem is much more elegantly and more performant solvable by using Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). Here you would have a PathExpander that skips paths with decreasing movement_time early instead of collection all and filter out (as Cypher does).

Neo4J - Extracting graph as a list based on relationship strength

I have a typical friend of friend graph database i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically its asking a graph to be represented as a list and I'm not sure if we can really do that. For e.g. if A is related to B with strength 10, B is related to C with strength 80, A to C is 20
then how to place this in a list ?
A, B, C - no because then A is distant from C relatively more than B which is not the case
A, C, B - yes because A and B are less related that A,C and C,B.
With 3 nodes its very simple but with lot of nodes - is it possible to put them in a list based on relationship strength ?
Ok, I think this is maybe what you want. An inverse of the shortestPath traversal with weights. If not, tell me how the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE length([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable--probably needs using the traversal/graphalgo API shortest path functionality if you need speed on a large graph.

Search a DAG with boolean constraints on reachability

The queries are something like
Return all vertices such that
(reachable from (A and (B or C))) and (not reachable from (D and E)).
The query can be formed with any kind of boolean constraints on reachability.
Are there efficient methods to do this query fast? Other than actually find the set of all item reachable vertex of concern, then do unions, intersections and set minus on those sets?
I think the only way to make the search faster than the default you suggest (calculate the sets reachable from each of the vertices in the predicate, then do set arithmetic), is to so the boolean math during the search, and use it to abort branches.
For example, suppose we're trying to calculate the set reachable from ((A or B) but not C). When searching the nodes, if we're on a node marked (B), and two of the "next" node are already marked (A) and (C) respectively. On these three nodes, the predicate reduces to:
(B): P = NOT(C)
(A): P = NOT(C)
(C): P = FALSE
So there's no reason to continue the search to either of these nodes.
(Naturally, if we're going to calculate more than one set on the same DAG, it behooves us to retain these marks.)

Resources