Neo4J averaging numeric property over several nodes - graph

I have a large graph and exactly 2 days of Neo4J under my belt, i.e. zilch.
I want to compute an average over all nodes of a certain numerical property, say n.prop_01, whenever it occurs in a node (n). Then I want to get the nodes whose property n.prop_01 is below a certain threshold that depends on the average.
I could not find an answer or something inspiring in the documentation or on online forums. I tried what follows and more...
MATCH (n) WHERE exists(n.prop_01)
WITH n, collect(n.prop_01) AS all_prop_01
UNWIND all_prop_01 as all_prop
WITH n,prop_01,avg(all_prop) AS avg_prop
WHERE n.prop_01 < (1.1*avg_prop)
RETURN n.name,n.prop_01 LIMIT 20;
Result:(no changes, no records).
Thanks for any pointers as to why this does not work and how I could make it work.

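You need to calculate the average first and then filter. In your query n is carried along in the same WITH as the aggregation, so it acts as a grouping key and, when you unwind, the average is simply equal to the property of each node: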
MATCH (n) WHERE exists(n.prop_01)
WITH avg(n.prop_01) as avg_prop
MATCH (n) WHERE exists(n.prop_01) AND n.prop_01 < 1.1 * avg_prop
RETURN count(n), avg_prop
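
To see the grouping effect outside the database, here is a minimal sketch in plain Python (not Cypher). The sample nodes and the function names are made up for illustration; only the 1.1 factor comes from the question.

```python
from statistics import mean

# Hypothetical sample data standing in for nodes that carry prop_01.
nodes = [
    {"name": "a", "prop_01": 1.0},
    {"name": "b", "prop_01": 2.0},
    {"name": "c", "prop_01": 9.0},
]

def per_node_average(nodes):
    # Mirrors `WITH n, collect(n.prop_01)`: n is a grouping key, so every
    # group holds a single value and its "average" is the value itself.
    return {n["name"]: mean([n["prop_01"]]) for n in nodes}

def below_threshold(nodes, factor=1.1):
    # Mirrors `WITH avg(n.prop_01) AS avg_prop`: one average over all nodes,
    # followed by a second pass that filters against it.
    avg_prop = mean(n["prop_01"] for n in nodes)
    return [n["name"] for n in nodes if n["prop_01"] < factor * avg_prop]
```

The second function is the shape of the query above: aggregate once without n in scope, then match again and compare each node against the single average.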

You could compute the average upstream when you are matching your nodes with prop_01.
You should also consider adding a label to the match to reduce the number of nodes your query needs to look through.
You can also collect the n nodes rather than just prop_01 and use that in your unwind later.
MATCH (n:Add_A_Node_Label_Here)
WHERE exists(n.prop_01)
WITH collect(n) AS all_n, avg(n.prop_01) as avg_prop
UNWIND all_n as n
WITH n, avg_prop
WHERE n.prop_01 < (1.1 * avg_prop)
RETURN n.name, n.prop_01
LIMIT 20;

Related

Creating edges in neo4j based on query results

I'm modelling a search term transition graph in an e-commerce application as a graph of nodes (terms) and edges (transitions). If a user types e.g. iphone in the search bar and then refines the query to iphone 6s, this is modelled as two nodes and an edge between them. The same transition made by different users results in several edges between the same two nodes.
I'd now like to create an edge with a cumulated weight of 4 to represent that 4 users made this specific transition. How can I combine the result of a count(*) query with a create query to produce an edge with a property weight = 4?
My count(*) query is:
MATCH (n:Term)-[r]->(n1:Term)
RETURN type(r), count(*)
I'd expect the combined query to look like this, but this kind of sql like composition seems not to be possible in cypher:
MATCH (n:Term), (n1:Term)
WHERE (n)-[tr:TRANSITION]->(n1)
CREATE (n)-[actr:ACC_TRANSITION {count:
MATCH (n:Term)-[r]->(n1:Term) RETURN
count(*)}
]->(n1)
RETURN n, n1
A non generic query to produce the accumulated transition that works is:
MATCH (n:Term), (n1:Term)
WHERE n.term = 'iphone' AND n1.term ='iphone 6s'
CREATE (n)-[actr:ACC_TRANSITION {count: 4}]->(n1)
RETURN n, n1
Any other ideas on how to approach and model this problem?
Use WITH like this:
MATCH (n:Term)-[r]->(n1:Term)
WITH n as n, count(*) as rel_count, n1
CREATE (n)-[:ACC_TRANSITION {count:rel_count}]->(n1)
RETURN n, n1
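
What the WITH line does is group by the (n, n1) pair and count the rows in each group. A rough plain-Python sketch of that grouping, with a made-up transition list and a hypothetical accumulate helper:

```python
from collections import Counter

# Hypothetical log of observed transitions, one tuple per existing edge.
transitions = [
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone 6s"),
    ("iphone", "iphone case"),
]

def accumulate(transitions):
    # Group by (source, target), like `WITH n, n1, count(*) AS rel_count`.
    counts = Counter(transitions)
    # One accumulated edge per pair, carrying the count as its weight.
    return [(src, dst, weight) for (src, dst), weight in counts.items()]
```

Each tuple returned corresponds to one ACC_TRANSITION edge that the CREATE would write.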
If you match the nodes and relationship first and then use set, you will not produce duplicate nodes or relationships
MATCH (n:Term)-[r]->(n1:Term)
WITH n, n1, collect(r) AS rels, count(r) AS rel_count
UNWIND rels AS r
SET r.ACC_TRANSITION = rel_count
RETURN n, n1, r
The CREATE clause will create duplicates.

How to determine if a vertex has odd or even number of outE()?

I want to get the vertices that have an odd number of outgoing edges. Something like this:
g.V().where(out().count() % 2 != 0)
Of course, % cannot be used here. Is there an alternative way?
There is no mod operator for sack but there are div, mult and minus.
g.withSack(0).V().as('a').where(outE().count().sack(assign).sack(div).by(constant(2)).sack(mult).by(constant(2)).sack(minus).sack().is(0)) // even
g.withSack(0).V().as('a').where(outE().count().sack(assign).sack(div).by(constant(2)).sack(mult).by(constant(2)).sack(minus).sack().is(neq(0))) // odd
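
The arithmetic behind those two traversals, written out as a plain-Python sketch. It assumes the division is integer division, which the queries above rely on; the function names are hypothetical.

```python
def remainder_via_sack(count):
    # Same arithmetic as the sack steps: start from the count, integer-divide
    # by 2, multiply by 2, then subtract the original count.
    sack = count
    sack = sack // 2
    sack = sack * 2
    sack = sack - count
    return sack  # 0 for even counts, -1 for odd counts

def is_odd(count):
    # Matches the `is(neq(0))` variant of the traversal.
    return remainder_via_sack(count) != 0
```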
There is no step for division and, to my knowledge, none for modulo either, but you can use a lambda for that. Put the count inside where() so that it is evaluated per vertex rather than once over all edges:
g.V().where(outE().count().filter{ it.get() % 2 == 1 })
(Note that this query requires a scan of the complete graph in most systems as no index is used.)
This post in the Gremlin-users group contains more information about mathematical operations with Gremlin.

Cypher query to stop graph traversal when reaching a hub

I have a graph database that contains highly connected nodes (hubs). These nodes can have more than 40000 relationships.
When I want to traverse the graph starting from a node, I would like to stop traversal at these hubs not to retrieve too many nodes.
I think I should use an aggregation function and a conditional stop based on the number of relationships of each node, but I didn't manage to write the right Cypher query.
I tried:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND ALL (x IN nodes(p) WHERE count(x) < 10)
RETURN p;
and also:
MATCH (n)-[r*..10]-(m) WHERE n.name='MyNodeName' AND COUNT(r) < 10 RETURN p;
I think you can't stop the query at some node if you MATCH a path of length 10. You could count the number of relationships for all nodes in the path, but only after the path is matched.
You could solve this by adding an additional label to the hub nodes and filter that in your query:
MATCH (a:YourLabel)
OPTIONAL MATCH (a)-[r]-()
WITH a, count(r) AS count_rels
WHERE count_rels > 20000
SET a:Hub
Your query:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND NONE (x IN nodes(p) WHERE x:Hub)
RETURN p
I used this approach in a similar case.
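
For comparison, this is roughly what "do not traverse through hubs" looks like as an explicit breadth-first expansion. It is a plain-Python sketch over an in-memory adjacency dict, not something Neo4j runs; reachable_without_hubs, adj and hub_degree are hypothetical names.

```python
from collections import deque

def reachable_without_hubs(adj, start, max_depth=10, hub_degree=10):
    # adj: dict mapping node -> set of neighbours (undirected graph).
    # Nodes with hub_degree or more neighbours are treated as hubs: they are
    # neither returned nor expanded. The start node itself is not checked.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for nxt in adj.get(node, ()):
            if nxt in seen or len(adj.get(nxt, ())) >= hub_degree:
                continue  # already visited, or a hub: stop here
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return seen
```

Unlike the Cypher query, which filters paths after they are matched, this sketch never expands a hub in the first place, which is the behaviour the question asks for.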
Since Neo4j 2.2 there is a cool trick to use the internal getDegree() function to determine if a node is a dense node.
You also forgot the label (and probably an index) for n.
For your case that would mean:
MATCH p=(n:Label)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND size((m)--()) < 10
RETURN p;

Path properties in Cypher

I have the following graph in Neo4j:
(id:5,t:e)<--(id:4,t:w)<--(id:0,t:s)-->(id:1,t:w)-->(id:2,t:b)-->(id:3,t:e)
Now I want to find paths from nodes with t:s to nodes with t:e such that only white-listed nodes with t:w are in between.
So ideally I need a query that returns only (0)-->(4)-->(5) but not (0)-->(1)-->(2)-->(3).
EDIT: I forgot to mention that paths may have variable length: from 0 to potentially infinity. It means that I may have an arbitrary number of "t:w" nodes.
Best regards
Working just with the information that you have provided above you could use
MATCH p=({t:'s'})-->({t:'w'})-->({t:'e'}) RETURN p
Of course if an s could link directly to an e you will need to use variable length relationships matches.
MATCH p=({t:'s'})-[*0..1]->({t:'w'})-[]->({t:'e'})
RETURN DISTINCT p
EDIT - Paths of any length
MATCH p=({t:'s'})-[*0..1]->({t:'w'})-[*]->({t:'e'})
RETURN DISTINCT p
To match a path of any length use the * operator in the relationship path match. It is usually best to put some bounds on that match, an example of which is the *0..1 (length 0 to 1). You can leave either end open *..6 (length 1 to 6) or *2.. (length 2 to whatever).
The problem with this is that now you cannot guarantee the node types in the intervening nodes (so t:"b" will be matched). To avoid that I think you'll have to filter.
MATCH p=({t:'s'})-[*]->({t:'e'})
WHERE ALL (node IN NODES(p)
WHERE node.t = 's' OR node.t = 'w' OR node.t = 'e' )
RETURN p
End Edit
You should introduce labels to your nodes and use relationship types for traversal though as that is where Neo/Cypher is going to be able to help you out. You should also make sure that if you are matching on properties that they are indexed correctly.
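
As a cross-check of what the filtered query is expected to return, here is a small plain-Python sketch that walks the example graph from the question and only continues through white-listed node types. The dict layout and the whitelisted_paths helper are made up for illustration, and the sketch avoids revisiting nodes, which is a simplification of how Cypher matches paths.

```python
# Example graph from the question: node id -> (t, successors).
graph = {
    0: ("s", [4, 1]),
    4: ("w", [5]),
    5: ("e", []),
    1: ("w", [2]),
    2: ("b", [3]),
    3: ("e", []),
}

def whitelisted_paths(graph, allowed_between=("w",)):
    paths = []

    def walk(path):
        _, successors = graph[path[-1]]
        for nxt in successors:
            if nxt in path:
                continue  # never revisit a node within one path
            nxt_t = graph[nxt][0]
            if nxt_t == "e":
                paths.append(path + [nxt])  # reached an end node
            elif nxt_t in allowed_between:
                walk(path + [nxt])  # only continue through allowed types

    for node, (t, _) in graph.items():
        if t == "s":
            walk([node])
    return paths
```

On the example graph this yields the single path 0, 4, 5; the branch through node 2 (t:b) is dropped.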

MATCH on nodes in a two-dimensional COLLECTION in Neo4j / Cypher

In order to limit traversal through a Neo4j graph db, I am collecting scores from a subgraph. Imagine this (simplified)
MATCH (a)-[:r1 {prop1:123}]->()-[]->()-[]->()-[]->(b {prop2:456})
WITH b,b.prop2*r1.prop1 as score ORDER BY score DESC LIMIT 10
WITH COLLECT ([b,score]) AS bscore
so far, so good. To avoid the long traversal, I want to limit the next match to the nodes b stored in bscore and sum the scores in bscore[1], but I couldn't find the correct syntax. Even wondering whether it's possible in cypher. Conceptually I'd like to do this:
MATCH bscore[0]-[:r2]->(c)
RETURN c, SUM(bscore[1])
Any hints/ pointers highly appreciated.
Could you do something like this perhaps?
MATCH (a)-[r:r1 {prop1:123}]->()-[]->()-[]->()-[]->(b {prop2:456})
WITH b, b.prop2*r.prop1 AS score ORDER BY score DESC LIMIT 10
MATCH (b)-[:r2]->(c)
RETURN c, sum(score)
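
The idea is that b and score stay in scope after the WITH, so there is no need to collect them into a list first. A plain-Python sketch of the same flow, with hypothetical inputs (scored_b as (b, score) pairs, r2_edges as a dict from b to its c nodes):

```python
from collections import defaultdict
import heapq

def sum_top_scores(scored_b, r2_edges, k=10):
    # Keep the k highest-scoring b nodes, like ORDER BY score DESC LIMIT k.
    top = heapq.nlargest(k, scored_b, key=lambda pair: pair[1])
    totals = defaultdict(int)
    for b, score in top:
        # Follow the r2 edges of each kept b and sum the scores per c.
        for c in r2_edges.get(b, ()):
            totals[c] += score
    return dict(totals)
```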
