Cypher query - Combine 2 queries by piping the result of one into the other - graph

I am a beginner with Cypher and want to write a query that finds all nodes connected to specific nodes that another node is also connected to;
see the example.
I need to get all the brown nodes that connect to the red nodes that the blue node connects to.
For this example I want to get the brown nodes with Ids 2, 3 and 1 (1 because it does not need any red nodes).
For now I do it with 2 separate queries and use Python to combine them, but now I need to do this in 1 query.
MATCH (r:R)-[]-(a:A) RETURN collect(a.Id)
and the second query:
MATCH (b:B) OPTIONAL MATCH (b)-[]-(a:A) RETURN b.Id, collect(a.Id)
In my script I check whether every record from the second query is a subset of the first list, i.e. of all a.Id values that connect to R.
Can I do it in 1 query?
Thanks!

Improved answer:
Start with the :B nodes and check whether all of their :A neighbours have a link to :R.
The ALL() function also returns true if a :B node does not have any neighbours.
MATCH (b:B)
WHERE ALL(n IN [(b)--(a:A) | a] WHERE EXISTS ((n)--(:R)) )
RETURN b
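The logic of that single query can be mirrored in plain Python, which is roughly what the two-query script in the question did. This is only a minimal sketch: the toy adjacency data and the function name qualifying_b are made up for illustration and are not taken from the asker's graph. It shows why a :B node without neighbours still qualifies, since all() over an empty list is true.

```python
# Hypothetical toy data: the :A neighbours of each :B node,
# and the set of :A nodes that have a link to some :R node.
b_neighbours = {1: [], 2: [10], 3: [10, 11], 4: [12]}
a_linked_to_r = {10, 11}


def qualifying_b(b_neighbours, a_linked_to_r):
    # Mirrors ALL(n IN [...] WHERE EXISTS((n)--(:R))): every :A neighbour
    # must be linked to :R. all() over an empty list is True, so a :B node
    # with no neighbours is kept.
    return sorted(
        b for b, neighbours in b_neighbours.items()
        if all(a in a_linked_to_r for a in neighbours)
    )


result = qualifying_b(b_neighbours, a_linked_to_r)
print(result)  # [1, 2, 3]
```

Node 4 is dropped because its only :A neighbour (12) has no link to :R.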

Related

How to explain the execution path of this Cypher query?

Imagine the following graph:
And this query:
MATCH(p:Person {id:1})
MATCH (p)-[:KNOWS]-(s)
CREATE (p)-[:LIVE_IN]->(:Place {name: 'Some Place'})
Now, why are five LIVE_IN relationships and Place nodes created even though s is not involved in the CREATE statement? Is there any place in the docs that explains this behavior?
Note: this is not about MERGE vs CREATE, although MERGE can solve it.
EDIT: In response to @Tomaz's answer: I have deliberately placed MATCH (p)-[:KNOWS]-(s) in the query and I know how it will behave. I am asking for a well-founded explanation. For example, will CREATE execute for each path or row in the matched patterns regardless of the nodes involved in the CREATE? What if you have complex matched patterns such as disconnected graphs, trees, etc.?
Also note that the direction of the KNOWS relationship (- vs ->) will affect the number of returned rows (9 vs 1), but CREATE will execute five times regardless of the direction.
Update:
I have added 3 more nodes labelled Office and issued the following query:
MATCH(p:Person {id:1})
MATCH (p)-[:KNOWS]-(s)
MATCH (o:Office)
CREATE (p)-[:LOVE]->(:Place {name: 'Any Place'})
As a result, 15 LOVE relationships and Place nodes were created, so it seems to me that Cypher performs a Cartesian product between all matched nodes:
p refers to 1 node, s refers to 5 nodes, o refers to 3 nodes => 1 * 5 * 3 = 15
But unfortunately I cannot confirm this from the Neo4j docs.
This is because the Person with id 1 has five neighbors.
In your query you start with:
MATCH(p:Person {id:1})
This produces a single row, where it finds the node you are looking for.
The next step is:
MATCH (p)-[:KNOWS]-(s)
This statement finds 5 neighbors, so your cardinality (the number of rows) increases to five. The CREATE statement then runs once for each row, which in turn creates five Places. You could, for example, lower the cardinality back to 1 before doing the CREATE, and you would create only a single Place:
MATCH(p:Person {id:1})
MATCH (p)-[:KNOWS]-(s)
// aggregation reduces cardinality to 1
WITH p, collect(s) as neighbors
CREATE (p)-[:LIVE_IN]->(:Place {name: 'Some Place'})
When writing a Cypher query, always keep in mind the cardinality you are operating on.
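The row-by-row behaviour can be modelled outside the database. The following is a rough Python sketch, not how Neo4j is implemented: the node names and helper functions are hypothetical, and it treats each MATCH as multiplying the current rows, which coincides with a Cartesian product here only because there is a single p. It reproduces the counts from the question: 5 rows after the KNOWS match, 15 after adding the Office match, and 1 after aggregating with collect().

```python
from itertools import product


def rows_after_matches(*bindings):
    # Each MATCH multiplies the current rows by its own matches. With a
    # single p this is the same as a plain Cartesian product.
    return list(product(*bindings))


def count_creates(rows):
    # CREATE runs once per incoming row, whether or not it uses every variable.
    return sum(1 for _ in rows)


persons = ["p1"]                              # MATCH (p:Person {id:1})
neighbours = ["s1", "s2", "s3", "s4", "s5"]   # MATCH (p)-[:KNOWS]-(s)
offices = ["o1", "o2", "o3"]                  # MATCH (o:Office)

five = count_creates(rows_after_matches(persons, neighbours))
fifteen = count_creates(rows_after_matches(persons, neighbours, offices))

# WITH p, collect(s) groups the rows by p, leaving one row per distinct p.
collapsed = {p for p, _ in rows_after_matches(persons, neighbours)}
one = count_creates(collapsed)

print(five, fifteen, one)  # 5 15 1
```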

How to return top n biggest cluster in Neo4j?

In my database, the graph looks something like this:
I want to find the top 3 biggest clusters in my data. A cluster is a collection of nodes connected to each other; the direction of the connection is not important. As can be seen from the picture, the expected result should have 3 clusters with sizes 3, 2 and 2 respectively.
Here is what I came up with so far:
MATCH (n)
RETURN n, size((n)-[*]-()) AS cluster_size
ORDER BY cluster_size DESC
LIMIT 100
However, it has 2 problems:
I think the query is wrong because the size() function does not return the number of nodes in a cluster as I want, but the number of subgraphs matching the pattern instead.
The LIMIT clause limits the number of nodes to return, not taking the top result. That's why I put 100 there.
What should I do now? I'm stuck :( Thank you for your help.
UPDATE
Thanks to Bruno Peres' answer, I was able to try the algo.unionFind procedure from Neo4j Graph Algorithms. I can find the size of my connected components using this query:
CALL algo.unionFind.stream()
YIELD nodeId,setId
RETURN setId,count(*) as size_of_component
ORDER BY size_of_component DESC LIMIT 20;
And here is the result:
But that's all I know. I cannot get any information about the nodes in each component to visualize them. The collect(nodeId) takes forever because the top 2 components are too large. And I know it doesn't make sense to visualize those large components, but how about the third one? 235 nodes are fine to render.
I think you are looking for Connected Components. The section about connected components of the Neo4j Graph Algorithms User Guide says:
Connected Components or UnionFind basically finds sets of connected
nodes where each node is reachable from any other node in the same
set. In graph theory, a connected component of an undirected graph is
a subgraph in which any two vertices are connected to each other by
paths, and which is connected to no additional vertices in the graph.
If this is your case you can install Neo4j Graph Algorithms and use algo.unionFind. I reproduced your scenario with this sample data set:
create (x), (y),
(a), (b), (c),
(d), (e),
(f), (g),
(a)-[:type]->(b), (b)-[:type]->(c), (c)-[:type]->(a),
(d)-[:type]->(e),
(f)-[:type]->(g)
Then running algo.unionFind:
// call unionFind procedure
CALL algo.unionFind.stream('', ':type', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
WITH setId, collect(nodeId) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 3 // limiting to 3
RETURN setId, nodes
The result will be:
╒═══════╤══════════╕
│"setId"│"nodes"   │
╞═══════╪══════════╡
│2      │[11,12,13]│
├───────┼──────────┤
│5      │[14,15]   │
├───────┼──────────┤
│7      │[16,17]   │
└───────┴──────────┘
EDIT
From comments:
how can I get all nodeId of a specific setId? For example, from my
screenshot above, how can I get all nodeId of the setId 17506? That
setId has 235 nodes and I want to visualize them.
Run CALL algo.unionFind('', ':type', {write:true, partitionProperty:"partition"}) YIELD nodes RETURN *. This statement will create a partition property for each node, containing the ID of the partition the node is part of.
Run this statement to get the top 3 partitions:
MATCH (node)
WITH node.partition AS partition, count(node) AS ct ORDER BY ct DESC
LIMIT 3 RETURN partition, ct
Now you can get all nodes of each of the top 3 partitions individually with MATCH (node {partition: 17506}) RETURN node, using a partition ID returned by the second query.
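For readers who want to see what the union-find idea does, here is a small stand-alone Python sketch run on the sample data set created above. It is an illustration of the general technique only, not the code behind algo.unionFind; the function names find and top_components are made up for this example, and the nodes are given letter names instead of Neo4j ids.

```python
from collections import defaultdict


def find(parent, x):
    # Follow parent pointers to the root, shortening the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def top_components(nodes, edges, n):
    # Every node starts in its own set; each edge merges two sets.
    parent = {node: node for node in nodes}
    for u, v in edges:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:
            parent[root_u] = root_v
    # Group nodes by their root, then keep the n largest groups.
    groups = defaultdict(list)
    for node in nodes:
        groups[find(parent, node)].append(node)
    return sorted(groups.values(), key=len, reverse=True)[:n]


nodes = ["x", "y", "a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]
top3 = top_components(nodes, edges, 3)
print([len(component) for component in top3])  # [3, 2, 2]
```

Edge direction is ignored, which matches the definition of a cluster in the question. The isolated nodes x and y form components of size 1 and fall outside the top 3.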

Neo4j Cypher Query: Finding all nodes, that are connected to a node, that has more than 3 other relationships

I have a problem with my Cypher query. I have some nodes labelled :SENTENCE and some others labelled :WORD. :SENTENCE nodes have :CONTAINS relationships to :WORD nodes.
I want to find :SENTENCE nodes that are connected to :WORD nodes that are used by more than 3 other :SENTENCE nodes. All :WORD nodes of a sentence have to meet this criterion.
I tried something like this:
MATCH p=(s1:SENTENCE)-[:CONTAINS]-(w:WORD)-[:CONTAINS]-(s2:SENTENCE)
WITH s1,w, COUNT(s2) as num
WHERE num > 3
RETURN s1
LIMIT 25
But the result contains :SENTENCE nodes where only one, and not all, of the :WORD nodes have a degree of at least 3.
Some other try:
MATCH p=(s1:SENTENCE)-[:CONTAINS]-(w:WORD)-[:CONTAINS]-(s2:SENTENCE)
WHERE SIZE((:SENTENCE)-[:CONTAINS]-(w:WORD)) > 3
RETURN s1
LIMIT 25
But this does not hold for every :WORD node that is contained in a sentence. It only holds for one.
So my question is: how can I write a query in which the condition holds for all nodes and not only for one?
This kind of requirement usually requires collecting nodes and using the all() function to ensure some predicate holds true for all elements of the collection:
MATCH (s1:SENTENCE)-[:CONTAINS]-(w:WORD)
WITH s1, collect(w) as words
WHERE all(word in words WHERE size((word)-[:CONTAINS]-()) > 3)
RETURN s1
LIMIT 25

Gremlin Match Traversal that contains a repetitive structure

Hi I am trying to match a subgraph that may have a path of Extends edges.
The known parts are the vertices with ids 1, 2, 3 and 6 and their edges. What is not known is the number of vertices, and their ids, between 1 and 6. The match starts from the vertex with id=1. The match traversal needs to match the whole subgraph with a limit of, let's say, 10 steps between 4 and 6. In the trivial case the vertex with id 6 is directly connected to the vertex with id = 1 through the edge ContainsB.
Any help is appreciated!
I think this seems to work the way I wanted:
g.V().match(
__.as("s").hasId("1").outE("ContainsB").inV().until(hasId("6")).repeat(out("Extends")).limit(10),
__.as("s").hasId("1").outE("ContainsA").inV().hasId("2"),
__.as("s").hasId("1").outE("ContainsC").inV().hasId("3")
)

Cypher query to stop graph traversal when reaching a hub

I have a graph database that contains highly connected nodes (hubs). These nodes can have more than 40000 relationships.
When I want to traverse the graph starting from a node, I would like to stop traversal at these hubs not to retrieve too many nodes.
I think I should use an aggregation function and a conditional stop based on the relationship count of each node, but I didn't manage to write the right Cypher query.
I tried:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND ALL (x IN nodes(p) WHERE count(x) < 10)
RETURN p;
and also:
MATCH (n)-[r*..10]-(m) WHERE n.name='MyNodeName' AND COUNT(r) < 10 RETURN p;
I think you can't stop the query at some node if you MATCH a path of length 10. You could count the number of relationships for all nodes in the path, but only after the path is matched.
You could solve this by adding an additional label to the hub nodes and filter that in your query:
MATCH (a:YourLabel)
OPTIONAL MATCH (a)-[r]-()
WITH a, count(r) AS count_rels
// keep only the highly connected nodes and label them as hubs
WHERE count_rels > 20000
SET a:Hub
Your query:
MATCH p=(n)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND NONE (x IN nodes(p) WHERE x:Hub)
RETURN p
I used this approach in a similar case.
Since Neo4j 2.2 there is a cool trick: use the internal getDegree() function to determine whether a node is a dense node.
You also forgot the label (and probably an index) for n.
For your case that would mean:
MATCH p=(n:Label)-[r*..10]-(m)
WHERE n.name='MyNodeName' AND size((m)--()) < 10
RETURN p;
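The idea shared by both answers, never walking through a hub, can also be sketched as a depth-limited breadth-first search. This Python version is only an illustration under assumptions: the adjacency dictionary, the hub_degree threshold and the function name are hypothetical, the start node is assumed not to be a hub, and it collects reachable nodes rather than enumerating paths as the Cypher queries do.

```python
from collections import deque


def reachable_without_hubs(adjacency, start, max_depth, hub_degree):
    # Breadth-first walk that skips any neighbour whose degree marks it
    # as a hub, so nothing behind a hub is ever visited.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in adjacency[node]:
            if neighbour in seen or len(adjacency[neighbour]) >= hub_degree:
                continue
            seen.add(neighbour)
            queue.append((neighbour, depth + 1))
    return seen


# Hypothetical toy graph: "hub" has 4 relationships, everything else fewer.
adjacency = {
    "start": ["a", "hub"],
    "a": ["start", "b"],
    "b": ["a"],
    "hub": ["start", "h1", "h2", "h3"],
    "h1": ["hub"],
    "h2": ["hub"],
    "h3": ["hub"],
}
found = reachable_without_hubs(adjacency, "start", max_depth=10, hub_degree=4)
print(sorted(found))  # ['a', 'b', 'start']
```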
