MATCH on nodes in a two-dimensional COLLECTION in Neo4j / Cypher - collections

To limit traversal through a Neo4j graph database, I am collecting scores from a subgraph. Imagine this (simplified):
MATCH (a)-[r:r1 {prop1:123}]->()-[]->()-[]->()-[]->(b {prop2:456})
WITH b, b.prop2*r.prop1 AS score ORDER BY score DESC LIMIT 10
WITH COLLECT([b,score]) AS bscore
So far, so good. To avoid the long traversal, I want to limit the next match to the nodes b stored in bscore and sum the scores in bscore[1], but I couldn't find the correct syntax. I even wonder whether it is possible in Cypher. Conceptually I'd like to do this:
MATCH bscore[0]-[:r2]->(c)
RETURN c, SUM(bscore[1])
Any hints/pointers highly appreciated.

Could you do something like this perhaps?
MATCH (a)-[r:r1 {prop1:123}]->()-[]->()-[]->()-[]->(b {prop2:456})
WITH b, b.prop2*r.prop1 AS score ORDER BY score DESC LIMIT 10
MATCH (b)-[:r2]->(c)
RETURN c, sum(score)

Repeat in gremlin

Two queries related to gremlin are as follows:
I want to stop the traversal when a condition is satisfied during the repeated condition check.
g.V().has('label_','A').emit().repeat(inE().outV()).until(has('stop',1)).project('depth','values').by(valueMap('label_','stop'))
I want the query to stop returning further values once stop equals 1 for a node encountered during the repeat step. But the query doesn't stop and returns all the records.
Output required:
=>{label_='A',stop=0}
=>{label_='B',stop=0}
=>{label_='C',stop=1}
A query to return traversal values in the following format whenever an edge exists between two nodes. Given the graph A->E1->B->E2->C, the output must be as follows:
=> A,E1,B
=> B,E2,C
A, B, C, E1, E2 represent the respective properties, where A is the starting node.
For the first part, it seems you are traversing the in edges rather than the out edges. Is this on purpose? If so, replace the out() in the repeat with in().
g.V().has(label, 'A').emit().
repeat(out()).until(has('stop', 1)).
project('label', 'stop').
by(label).
by(values('stop'))
example: https://gremlify.com/ma2xkkszkzr/1
For the second part, I'm still not sure what you meant. If you just want to get all edges with their out and in vertices, you can use elementMap:
g.E().elementMap()
example: https://gremlify.com/ma2xkkszkzr/4
If elementMap is not supported, you can maybe do something like this:
g.E().local(union(
outV(),
identity(),
inV()
).label().fold())
example: https://gremlify.com/ma2xkkszkzr/2

Creating edges in neo4j based on query results

I'm modelling a search term transition graph in an e-commerce application as a graph of nodes (terms) and edges (transitions). If a user types e.g. iphone in the search bar and then refines the query to iphone 6s, this is modeled as two nodes and an edge between those nodes. The same transition made by different users results in several edges between the nodes.
I'd now like to create an edge with a cumulated weight of 4 to represent that 4 users made this specific transition. How can I combine the results of a count(*) query with a create query to produce an edge with a property weight = 4?
My count(*) query is:
MATCH (n:Term)-[r]->(n1:Term)
RETURN type(r), count(*)
I'd expect the combined query to look like this, but this kind of SQL-like composition does not seem to be possible in Cypher:
MATCH (n:Term), (n1:Term)
WHERE (n)-[tr:TRANSITION]->(n1)
CREATE (n)-[actr:ACC_TRANSITION {count:
MATCH (n:Term)-[r]->(n1:Term) RETURN
count(*)}
]->(n1)
RETURN n, n1
A non-generic query that works and produces the accumulated transition is:
MATCH (n:Term), (n1:Term)
WHERE n.term = 'iphone' AND n1.term ='iphone 6s'
CREATE (n)-[actr:ACC_TRANSITION {count: 4}]->(n1)
RETURN n, n1
Any other ideas on how to approach and model this problem?
Use WITH like this:
MATCH (n:Term)-[r]->(n1:Term)
WITH n as n, count(*) as rel_count, n1
CREATE (n)-[:ACC_TRANSITION {count:rel_count}]->(n1)
RETURN n, n1
If you match the nodes and relationships first and then use SET, you will not produce duplicate nodes or relationships:
MATCH (n:Term)-[r]->(n1:Term)
WITH n AS nn, n1 AS nn1, count(r) AS rel_count, collect(r) AS rels
UNWIND rels AS r
SET r.ACC_TRANSITION = rel_count
RETURN nn, nn1, r
CREATE, by contrast, will add duplicate relationships if the query is run more than once.
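The grouping that WITH n, count(*) AS rel_count, n1 performs can also be pictured outside the database. The following is a minimal Python sketch, not part of either answer; the sample transition list and the function name accumulate_transitions are made up for illustration.

```python
from collections import Counter

# Hypothetical raw data: one (from_term, to_term) tuple per observed transition.
TRANSITIONS = [
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone case"),
]


def accumulate_transitions(transitions):
    """Count how often each (from_term, to_term) pair occurs."""
    return Counter(transitions)


if __name__ == "__main__":
    # Each distinct pair becomes one accumulated edge carrying its count.
    for (src, dst), weight in accumulate_transitions(TRANSITIONS).items():
        print(f"{src} -> {dst}: count={weight}")
```

Each key of the resulting counter corresponds to one ACC_TRANSITION relationship, and its value to the count property.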

Traverse Graph With Directed Cycles using Relationship Properties as Filters

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint: only follow relationships that have a greater movement_time than the relationship that brought us to the current node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,size(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all the paths between any pair of your nodes (beware, cartesian products get very expensive on non-trivial datasets). In the WITH we build a collection, deltas, that holds the difference between each two consecutive movement_time properties along the path.
The WHERE applies an ALL predicate to filter out paths with any non-positive delta, so we guarantee strictly increasing values of movement_time along the path.
The RETURN then just assembles the results, not as a map but as one collection of the reachable nodes and one of the last movement_time values.
The current issue is that we get duplicates, since e.g. there are multiple paths from B to A.
As a general note: this problem can be solved more elegantly and with better performance using the Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). There you would have a PathExpander that skips paths with decreasing movement_time early, instead of collecting all paths and filtering afterwards (as the Cypher query does).
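The early-pruning idea can be sketched outside Neo4j. The following is a minimal Python illustration on the sample CSV data, not the Java traversal API itself; the function names are made up for this sketch. It extends a path only when the next movement_time is strictly greater than the previous one, and records each (descendant, movement_time) pair once, which also sidesteps the duplicates.

```python
from collections import defaultdict

# Sample data from the question: (from, to, movement_time).
EDGES = [
    ("A", "B", 0), ("B", "C", 1), ("B", "D", 1), ("B", "E", 1),
    ("B", "X", 2), ("E", "A", 3), ("Z", "B", 5), ("C", "X", 6),
    ("X", "A", 7), ("D", "A", 7),
]


def build_adjacency(edges):
    """Map each source node to its outgoing (target, movement_time) pairs."""
    adjacency = defaultdict(list)
    for source, target, time in edges:
        adjacency[source].append((target, time))
    return adjacency


def time_respecting_descendants(adjacency, start):
    """Return all (node, movement_time) pairs reachable from start along
    paths whose movement_time strictly increases at every hop."""
    found = set()
    stack = [(start, None)]  # (current node, time of the edge that led here)
    while stack:
        node, last_time = stack.pop()
        for target, time in adjacency.get(node, ()):
            # Prune early: never follow an edge that is not strictly later.
            if last_time is not None and time <= last_time:
                continue
            if (target, time) not in found:
                found.add((target, time))
                stack.append((target, time))
    return found


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    for name in sorted({e[0] for e in EDGES} | {e[1] for e in EDGES}):
        print(name, sorted(time_respecting_descendants(adjacency, name)))
```

For Z this yields only (B, 5), and for C it yields (X, 6) and (A, 7), in line with the desired output in the question. For other start nodes it reports every time-respecting arrival, e.g. both (A, 3) and (A, 7) from B, so a further reduction step would be needed to pick one time per descendant.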

How to write this in Cypher

I have around 644 nodes in my graph database (Neo4j). I need to compute the distances between all these 644 nodes and display them visually in the GUI. I want to pre-compute and store the distance between every pair of nodes in the database itself, rather than retrieving the nodes to the server, computing the distances on the fly, and then showing them in the GUI.
I want to understand how to write such a query in Cypher. Please let me know.
I think this can work:
// half cross product
match (a),(b)
where id(a) < id(b)
match p=shortestPath((a)-[*]-(b))
with a,b,length(p) as l
create (a)-[:DISTANCE {distance:l}]->(b)
Set 4950 properties, created 4950 relationships, returned 0 rows in 4328 ms
But be aware that the browser visualization will blow up with this many relationships.
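To make the half cross product concrete, here is a minimal Python sketch of the same computation on an in-memory graph. It is an illustration only, not what Neo4j does internally; the small undirected sample graph and the function names are assumptions made for this example. It runs a breadth-first search from every node and keeps one hop-count distance per unordered pair.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected sample graph as an adjacency list: 1 - 2 - 3 - 4.
GRAPH = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}


def bfs_distances(adjacency, source):
    """Hop-count distance from source to every node reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def all_pair_distances(adjacency):
    """One distance per unordered pair (a, b) with a < b, mirroring
    the id(a) < id(b) filter; unreachable pairs are left out."""
    nodes = sorted(adjacency)
    per_source = {node: bfs_distances(adjacency, node) for node in nodes}
    return {
        (a, b): per_source[a][b]
        for a, b in combinations(nodes, 2)
        if b in per_source[a]
    }


if __name__ == "__main__":
    for pair, hops in all_pair_distances(GRAPH).items():
        print(pair, hops)
```

With n connected nodes this produces n*(n-1)/2 entries, which is why the number of stored DISTANCE relationships grows quadratically.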
Regarding your distance measure (it won't be that fast but should work):
MATCH (a:User)-[:READ]->(book)<-[:READ]-(b:User)
WITH a,b,count(*) as common,
length(a-[:READ]->()) as a_read,
length(b-[:READ]->()) as b_read
// toFloat avoids integer division, which would truncate the ratio to 0 or 1
CREATE UNIQUE (a)-[:DISTANCE {distance:toFloat(common)/(a_read+b_read-common)}]-(b)
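The ratio in the query, common/(a_read+b_read-common), is the size of the intersection divided by the size of the union of the two users' books. A minimal Python sketch of that arithmetic follows; the function name and the sample book sets are made up for illustration.

```python
def read_overlap(a_books, b_books):
    """Shared books divided by all distinct books read by either user."""
    a, b = set(a_books), set(b_books)
    common = len(a & b)
    union = len(a) + len(b) - common  # same denominator as in the query
    # Guard against two users who have read nothing at all.
    return common / union if union else 0.0


if __name__ == "__main__":
    # Hypothetical reading lists: two shared books, four distinct books.
    print(read_overlap({"x", "y", "z"}, {"y", "z", "w"}))
```

Note that Python's / is true division, so no explicit float conversion is needed here.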

Neo4J - Extracting graph as a list based on relationship strength

I have a typical friend-of-friend graph database, i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically it's asking for a graph to be represented as a list, and I'm not sure we can really do that. For example, if A is related to B with strength 10, B is related to C with strength 80, and A to C is 20,
then how should these be placed in a list?
A, B, C - no, because then A is relatively more distant from C than from B, which is not the case
A, C, B - yes, because A and B are less related than A,C and C,B.
With 3 nodes it's very simple, but with a lot of nodes, is it possible to put them in a list based on relationship strength?
Ok, I think this is maybe what you want. An inverse of the shortestPath traversal with weights. If not, tell me how the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE size([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable. If you need speed on a large graph, you would probably have to use the traversal/graphalgo API shortest path functionality.
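What the query selects can be checked by brute force on the three-node example from the question. The following is a minimal Python sketch, assuming every pair of nodes has a strength (so the longest simple path visits all nodes); the function names are made up for illustration. It tries every ordering and keeps the one with the lowest total strength between neighbours.

```python
from itertools import permutations

# Strengths from the question, keyed by unordered node pair.
STRENGTH = {
    frozenset("AB"): 10,
    frozenset("BC"): 80,
    frozenset("AC"): 20,
}


def path_strength(order, strength):
    """Sum of the strengths between consecutive nodes in the ordering."""
    return sum(strength[frozenset(pair)] for pair in zip(order, order[1:]))


def weakest_full_path(nodes, strength):
    """Ordering of all nodes with the minimum total strength.
    Brute force over n! orderings, so only usable for tiny graphs."""
    return min(permutations(nodes), key=lambda order: path_strength(order, strength))


if __name__ == "__main__":
    best = weakest_full_path("ABC", STRENGTH)
    print(best, path_strength(best, STRENGTH))
```

On this example the result is B, A, C with a total strength of 30, which keeps the most strongly related pair (B and C) at opposite ends. Whether that is the intended ordering depends on how the requirement in the question is read.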
