Cypher query to stop graph traversal when reaching a hub - graph

I have a graph database that contains highly connected nodes (hubs). These nodes can have more than 40000 relationships.
When I want to traverse the graph starting from a node, I would like to stop traversal at these hubs not to retrieve too many nodes.
I think I should use aggregation function and conditional stop based on the count of relationship for each node, but I didn't manage to write the good cypher query.
I tried:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND ALL (x IN nodes(p) WHERE count(x) < 10)
RETURN p;
and also:
MATCH (n)-[r*..10]-(m) WHERE n.name='MyNodeName' AND COUNT(r) < 10 RETURN p;

I think you can't stop the query at some node if you MATCH a path of length 10. You could count the number of relationships for all nodes in the path, but only after the path is matched.
You could solve this by adding an additional label to the hub nodes and filter that in your query:
MATCH (a:YourLabel)
OPTIONAL MATCH (a)-[r]-()
WITH a, count(r) as count_rels
CASE
WHEN count_rels > 20000
THEN SET a :Hub
END
Your query:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND NONE (x IN nodes(p) WHERE x:Hub)
RETURN p
I used this approach in a similar case.

Since Neo4j 2.2 there is a cool trick to use the internal getDegree() function to determine if a node is a dense node.
You also forgot the label (and probably index) for n
For your case that would mean:
MATCH p=(n:Label)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND size((m)--()) < 10
RETURN p;

Related

ArangoDB: Find last node in path

I'm pretty new to Arangodb and I'm trying to get the last/leaf node (I guess vertex) in a graph. So given I've the following graph:
Now I want start the traversal with 6010142. The query should return 6010625 because it is the last node that can be reached via 6010145. But how does the query looks like?
I already tried:
FOR v, e, p IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' RETURN v
But it also returns 6010145. Furthermore it is limited to a maxDepth of 5 but my graph can exceed the limit. So I also need a solution that works for any depth. Hopefully anyone can help me :-)
I'm also just starting out with AQL but maybe this can help.
FOR v IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' OPTIONS {uniqueVertices: 'global', bfs: true}
FILTER LENGTH(FOR vv IN OUTBOUND v GRAPH 'test' LIMIT 1 RETURN 1) == 0
RETURN v
This approach follows an older ArangoDB cook book (p. 39) for finding leaf nodes. The filter line takes the connected nodes found by the first line and does a second traversal to check if this is actually a leaf.
The OPTIONS {uniqueVertices: 'global', bfs: true} part is an optimization if you are only interested in unique leaf nodes and not all the different paths to those nodes.
Regarding maxDepth I would just use a sufficiently high number. The worst case would be the number of nodes in your graph.
(The graph you posted and your description seem to disagree about the direction of the edges. Maybe you need to use INBOUND.)

Repeat in gremlin

Two queries related to gremlin are as follows:
Want to stop the traversal when a condition is satisfied during repeated condition check.
g.V().has('label_','A')).emit().repeat(inE().outV()).until(has('stop',1)).project('depth','values').by(valueMap('label_','stop'))
I want the query to stop returning further values when the stop is equal to 1 for the node encountered during the repeat statement. But the query doesn't stop and return all the records.
Output required:
=>{label_='A',stop=0}
=>{label_='B',stop=0}
=>{label_='C',stop=1}
Query to return traversal values in the following format considering if edge exists between them. Considering the graph as A->E1->B->E2->C. The output must be as follows
=> A,E1,B
=> B,E2,C
A, B, C, E1, E2 represents properties respectively where is the starting node
For the first part, it seems you traversing on the in edges and not on the out is this on purpose? if so replace the out() in the repeat to in
g.V().has(label, 'A').emit().
repeat(out()).until(has('stop', 1)).
project('label', 'stop').
by(label).
by(values('stop'))
example: https://gremlify.com/ma2xkkszkzr/1
for the second part, I'm still not sure what you meant if you just want to get all edges with their out and in you can use elementMap:
g.E().elementMap()
example: https://gremlify.com/ma2xkkszkzr/4
and if not supported you can maybe do something like this:
g.E().local(union(
outV(),
identity(),
inV()
).label().fold())
example: https://gremlify.com/ma2xkkszkzr/2

Creating edges in neo4j based on query results

I'm modelling a search term transition graph in a e-commerce software as a graph of nodes (terms) and edges (transitions). If a user types e.g. iphone in the search bar and then refines the query to iphone 6s this will be modeled as two nodes and a edge between those nodes. The same transition of terms of the different users will result in several edges between the nodes.
I'd now like to create an edge with a cumulated weight of 4 to represent that 4 users did this specific transition. How can I combine the results of a count(*) query with a create query to produce an edge with a property weight = 4
My count(*) query is:
MATCH (n:Term)-[r]->(n1:Term)
RETURN type(r), count(*)
I'd expect the combined query to look like this, but this kind of sql like composition seems not to be possible in cypher:
MATCH (n:Term), (n1:Term)
WHERE (n)-[tr:TRANSITION]->(n1)
CREATE (n)-[actr:ACC_TRANSITION {count:
MATCH (n:Term)-[r]->(n1:Term) RETURN
count(*)}
]->(n1)
RETURN n, n1
A non generic query to produce the accumulated transition that works is:
MATCH (n:Term), (n1:Term)
WHERE n.term = 'iphone' AND n1.term ='iphone 6s'
CREATE (n)-[actr:ACC_TRANSITION {count: 4}]->(n1)
RETURN n, n1
Any other ideas on how to approach and model this problem?
Use WITH like this:
MATCH (n:Term)-[r]->(n1:Term)
WITH n as n, count(*) as rel_count, n1
CREATE (n)-[:ACC_TRANSITION {count:rel_count}]->(n1)
RETURN n, n1
If you match the nodes and relationship first and then use set, you will not produce duplicate nodes or relationships
Match (n:Term)-[r]->(n1.Term)
with n as nn,count(r) as rel_count,n1 as nn1
set r.ACC_TRANSITION=rel_count
return nn,nn1,r
The create function will create duplicates.

Neo4J averaging numeric property over several nodes

I have a large graph and exactly 2 days of Neo4J under my belt, i.e. zilch.
I want to compute an average over all nodes of a certain numerical properties, say n.prop_01 whenever it occurs in nodes (n). Then I want to get the nodes whose property n.prop_01 is below a certain threshold depending on the average.
I could not find an answer or something inspiring in the documentation or on online forums. I tried what follows and more...
MATCH (n) WHERE exists(n.prop_01)
WITH n, collect(n.prop_01) AS all_prop_01
UNWIND all_prop_01 as all_prop
WITH n,prop_01,avg(all_prop) AS avg_prop
WHERE n.prop_01 < (1.1*avg_prop)
RETURN n.name,n.prop_01 LIMIT 20;
Result:(no changes, no records).
Thanks for any pointers as to why this does not work and how I could make it work.
You need to first calculate the average, and then filter, otherwise when you unwind, the average is equal to the property of each node:
MATCH (n) WHERE exists(n.prop_01)
WITH avg(n.prop_01) as avg_prop
MATCH (n) WHERE exists(n.prop_01) AND n.prop_01 < 1.1 * avg_prop
RETURN count(n), avg_prop
You could compute the average upstream when you are matching your nodes with prop_01.
You should also consider adding a label to the match to reduce the number of nodes your query needs to look through.
You can also collect the n nodes rather than just prop_01 and use that in your unwind later.
MATCH (n:Add_A_Node_Label_Here)
WHERE exists(n.prop_01)
WITH collect(n) AS all_n, avg(n.prop_01) as avg_prop
UNWIND all_n as n
WITH n, avg_prop
WHERE n.prop_01 < (1.1 * avg_prop)
RETURN n.name, n.prop_01
LIMIT 20;

Path properties in Cypher

I have a following graph in Neo4j
(id:5,t:e)<--(id:4,t:w)<--(id:0;t:s)-->(id:1,t:w)-->(id:2,t:b)-->(id:3,t:e)
now I search paths from nodes with t:s to nodes with t:e such that only white-listed nodes with t:w are in-between.
So ideally i need a query to return only (0)-->(4)-->(5) but not (0)-->(1)-->(2)-->(3).
EDIT: i have forgotten to mention that paths may have variable length: from 0 to potentially infinity. It means that I may have an arbitrary number of "t:w" nodes
Best regards
Working just with the information that you have provided above you could use
MATCH p=({t:'s'})-->({t:'w'})-->({t:'e'}) RETURN p
Of course if an s could link directly to an e you will need to use variable length relationships matches.
MATCH p=({t:'s'})-[*0..1]->({t:'w'})-[]->({t:'e'})
RETURN DISTINCT p
EDIT - Paths of any length
MATCH p=({t:'s'})-[*0..1]->({t:'w'})-[*]->({t:'e'})
RETURN DISTINCT p
To match a path of any length use the * operator in the relationship path match. It is usually best to put some bounds on that match, an example of which is the *0..1 (length 0 to 1). You can leave either end open *..6 (length 1 to 6) or *2.. (length 2 to whatever).
The problem with this is that now you cannot guarantee the node types in the intervening nodes (so t:"b" will be matched). To avoid that I think you'll have to filter.
MATCH p=({t:'s'})-[*]->({t:'e'})
WHERE ALL (node IN NODES(p)
WHERE node.t = 's' OR node.t = 'w' OR node.t = 'e' )
RETURN p
End Edit
You should introduce labels to your nodes and use relationship types for traversal though as that is where Neo/Cypher is going to be able to help you out. You should also make sure that if you are matching on properties that they are indexed correctly.

Resources