Gremlin Match Traversal that contains a repetitive structure - gremlin

Hi, I am trying to match a subgraph that may contain a path of Extends edges.
The known parts are the vertices with ids 1, 2, 3 and 6 and their edges. What is not known is the number of vertices between 1 and 6, or their ids. The match starts from the vertex with id 1. The traversal needs to match the whole subgraph, with a limit of, say, 10 steps between 4 and 6. In the trivial case, the vertex with id 6 is directly connected to the vertex with id 1 through a ContainsB edge.
Any help is appreciated!

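I think this works the way I wanted. One correction: limit(10) only caps the number of results, so the bound of 10 Extends steps has to be expressed with loops() inside until(), followed by a final hasId("6") to drop traversers that stopped on the bound:
g.V().match(
__.as("s").hasId("1").outE("ContainsB").inV().until(hasId("6").or().loops().is(10)).repeat(out("Extends")).hasId("6"),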
__.as("s").hasId("1").outE("ContainsA").inV().hasId("2"),
__.as("s").hasId("1").outE("ContainsC").inV().hasId("3")
)
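
The "reach vertex 6 within a bounded number of Extends steps" requirement can be sketched outside Gremlin. This is a minimal plain-Python model, not TinkerPop code: the `extends` adjacency dict, the function name `reaches_within`, and the intermediate ids "4" and "5" are all made up for illustration.

```python
def reaches_within(extends, start, target, max_hops):
    """Return True if `target` is reachable from `start` by following
    at most `max_hops` Extends edges (0 hops means start == target)."""
    frontier = {start}
    for _ in range(max_hops + 1):
        if target in frontier:
            return True
        # Advance every vertex in the frontier by one Extends edge.
        frontier = {nxt for v in frontier for nxt in extends.get(v, ())}
        if not frontier:
            return False
    return False


# Hypothetical chain 4 -Extends-> 5 -Extends-> 6
extends = {"4": ["5"], "5": ["6"]}
print(reaches_within(extends, "4", "6", 10))  # True
print(reaches_within(extends, "4", "6", 1))   # False
```

The zero-hop case corresponds to the trivial case in the question, where ContainsB leads straight to the vertex with id 6.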

Related

In NebulaGraph Database, why is the number of hops of GetSubGraph in the returned result greater than step_count?

The number of hops in the returned result is not the same as step_count. That is by design.
To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions.
The returned paths of GET SUBGRAPH 1 STEPS FROM "A"; are A->B, B->A, and A->C. To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions, namely B->C.
The returned path of GET SUBGRAPH 1 STEPS FROM "A" IN follow; is B->A. To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions, namely A->B.
If you only want the paths or vertices that meet the conditions, we suggest you use MATCH or GO instead. For example:
nebula> MATCH p= (v:player) -- (v2) WHERE id(v)=="A" RETURN p;
nebula> GO 1 STEPS FROM "A" OVER follow YIELD src(edge),dst(edge);
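
The extra hop can be illustrated with a toy model. The sketch below is not NebulaGraph's implementation. It assumes the behaviour described above can be approximated as "collect the vertices reachable within step_count hops in either direction, then return every edge whose two endpoints are both in that set". The function name `get_subgraph` and the extra edge C->D are hypothetical.

```python
def get_subgraph(edges, start, steps):
    """Toy model: expand `steps` hops in both directions, then return all
    edges between the collected vertices (the "additional hop")."""
    reached = {start}
    frontier = {start}
    for _ in range(steps):
        nxt = set()
        for src, dst in edges:
            if src in frontier:
                nxt.add(dst)
            if dst in frontier:
                nxt.add(src)
        frontier = nxt - reached
        reached |= frontier
    return [(s, d) for s, d in edges if s in reached and d in reached]


# Edges from the example above, plus a hypothetical C->D that lies outside 1 step.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C"), ("C", "D")]
result = get_subgraph(edges, "A", 1)
print(result)
```

In this model B->C is returned even though it is not incident to A, while C->D is left out because D was never reached.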

How to return top n biggest cluster in Neo4j?

In my database, the graph looks something like this:
I want to find the 3 biggest clusters in my data. A cluster is a collection of nodes connected to each other; the direction of the connections is not important. As can be seen from the picture, the expected result should have 3 clusters, with sizes 3, 2 and 2 respectively.
Here is what I came up with so far:
MATCH (n)
RETURN n, size((n)-[*]-()) AS cluster_size
ORDER BY cluster_size DESC
LIMIT 100
However, it has 2 problems:
1. I think the query is wrong, because the size() function does not return the number of nodes in a cluster as I want, but the number of subgraphs matching the pattern.
2. The LIMIT clause limits the number of nodes returned rather than selecting the top clusters. That's why I put 100 there.
What should I do now? I'm stuck :( Thank you for your help.
UPDATE
Thanks to Bruno Peres' answer, I'm able to try algo.unionFind query in Neo4j Graph Algorithm. I can find the size of my connected components using this query:
CALL algo.unionFind.stream()
YIELD nodeId,setId
RETURN setId,count(*) as size_of_component
ORDER BY size_of_component DESC LIMIT 20;
And here is the result:
But that's all I know. I cannot get any information about the nodes in each component to visualize them. The collect(nodeId) takes forever because the top 2 components are too large. And I know it doesn't make sense to visualize those large components, but how about the third one? 235 nodes are fine to render.
I think you are looking for Connected Components. The section about connected components of the Neo4j Graph Algorithms User Guide says:
Connected Components or UnionFind basically finds sets of connected
nodes where each node is reachable from any other node in the same
set. In graph theory, a connected component of an undirected graph is
a subgraph in which any two vertices are connected to each other by
paths, and which is connected to no additional vertices in the graph.
If this is your case you can install Neo4j Graph Algorithms and use algo.unionFind. I reproduced your scenario with this sample data set:
create (x), (y),
(a), (b), (c),
(d), (e),
(f), (g),
(a)-[:type]->(b), (b)-[:type]->(c), (c)-[:type]->(a),
(d)-[:type]->(e),
(f)-[:type]->(g)
Then running algo.unionFind:
// call unionFind procedure
CALL algo.unionFind.stream('', ':type', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
WITH setId, collect(nodeId) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 3 // limiting to 3
RETURN setId, nodes
The result will be:
╒═══════╤══════════╕
│"setId"│"nodes" │
╞═══════╪══════════╡
│2 │[11,12,13]│
├───────┼──────────┤
│5 │[14,15] │
├───────┼──────────┤
│7 │[16,17] │
└───────┴──────────┘
EDIT
From comments:
how can I get all nodeId of a specific setId? For example, from my
screenshot above, how can I get all nodeId of the setId 17506? That
setId has 235 nodes and I want to visualize them.
1. Run CALL algo.unionFind('', ':type', {write:true, partitionProperty:"partition"}) YIELD nodes RETURN *. This statement creates a partition property on each node, containing the ID of the partition the node belongs to.
2. Run this statement to get the top 3 partitions: match (node) with node.partition as partition, count(node) as ct order by ct desc limit 3 return partition, ct.
3. Now you can get all nodes of each of the top 3 partitions individually with match (node {partition : 17506}) return node, using a partition ID returned by the second query.
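
For readers who want to see what the union-find step computes, here is a minimal sketch in plain Python, independent of Neo4j. It uses the sample data set from the answer; the function name `top_components` is made up for illustration, and the node names stand in for the node ids Neo4j would assign.

```python
from collections import defaultdict


def top_components(nodes, edges, n):
    """Group nodes into connected components (edge direction ignored)
    and return the n largest components, biggest first."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the set representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for v in nodes:
        groups[find(v)].append(v)
    return sorted(groups.values(), key=len, reverse=True)[:n]


nodes = list("xyabcdefg")
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]
top3 = top_components(nodes, edges, 3)
print(top3)
```

Each returned list plays the role of one row in the result table above: the members of one set, ordered by set size.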

Neo4j Cypher Query: Finding all nodes, that are connected to a node, that has more than 3 other relationships

I have a problem with my Cypher query. I have some nodes called :SENTENCE and some other called :WORD. :SENTENCE nodes have relationships :CONTAINS to :WORD nodes.
I want to find :SENTENCE nodes that are connected only to :WORD nodes that are used by more than 3 other :SENTENCE nodes. All :WORD nodes of a sentence have to comply with this criterion.
I tried something like this:
MATCH p=(s1:SENTENCE)-[:CONTAINS]-(w:WORD)-[:CONTAINS]-(s2:SENTENCE)
WITH s1,w, COUNT(s2) as num
WHERE num > 3
RETURN s1
LIMIT 25
But the result contains :SENTENCE nodes where only one of the :WORD nodes, not all of them, meets the threshold.
Some other try:
MATCH p=(s1:SENTENCE)-[:CONTAINS]-(w:WORD)-[:CONTAINS]-(s2:SENTENCE)
WHERE SIZE((:SENTENCE)-[:CONTAINS]-(w:WORD)) > 3
RETURN s1
LIMIT 25
But the condition is not enforced for every :WORD node contained in a sentence. It only has to hold for one.
So my question is: how can I write a query where the condition holds for all :WORD nodes and not only for one?
This kind of requirement usually requires collecting nodes and using the all() function to ensure some predicate holds true for all elements of the collection:
MATCH (s1:SENTENCE)-[:CONTAINS]-(w:WORD)
WITH s1, collect(w) as words
WHERE all(word in words WHERE size((word)-[:CONTAINS]-()) > 3)
RETURN s1
LIMIT 25
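
The collect-then-all() pattern can be sketched in plain Python to make the logic explicit. This is an illustration only: the `contains` dict (sentence id to set of words), the function name, and the sample data are made up. Like the Cypher above, it counts every sentence that uses a word, including the sentence being tested.

```python
from collections import Counter


def sentences_with_only_common_words(contains, threshold=3):
    """Return the sentences whose words are ALL used by more than
    `threshold` sentences (the sentence itself is counted too)."""
    usage = Counter(w for words in contains.values() for w in set(words))
    return [
        s
        for s, words in contains.items()
        if words and all(usage[w] > threshold for w in words)
    ]


# Hypothetical data: "the" is used by 4 sentences, "cat" by only 2.
contains = {
    "s1": {"the", "cat"},
    "s2": {"the", "cat"},
    "s3": {"the"},
    "s4": {"the"},
}
print(sentences_with_only_common_words(contains))  # ['s3', 's4']
```

s1 and s2 are rejected because one of their words ("cat") falls below the threshold, which is exactly the case the first two query attempts let through.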

Find all BFS/DFS traversals

Given an undirected cyclic graph, I want to find all possible traversals with Breadth-First search or Depth-First search. That is given a graph as an adjacency-list:
A-BC
B-A
C-ADE
D-C
E-C
So all BFS paths from root A would be:
{ABCDE,ABCED,ACBDE,ACBED}
and for DFS:
{ABCDE,ABCED,ACDEB,ACEDB}
How would I generate those traversals algorithmically in a meaningful way? I suppose one could generate all permutations of letters and check their validity, but that seems like last-resort to me.
Any help would be appreciated.
Apart from the obvious way where you actually perform all possible DFS and BFS traversals you could try this approach:
Step 1.
In a dfs traversal starting from the root A transform the adjacency list of the currently visited node like so: First remove the parent of the node from the list. Second generate all permutations of the remaining nodes in the adj list.
So if you are at node C having come from node A you will do:
C -> ADE transform into C -> DE transform into C -> [DE, ED]
Step 2.
After step 1 you have the following transformed adj list:
A -> [CB, BC]
B -> []
C -> [DE, ED]
D -> []
E -> []
Now you start processing from (A,0), where the first item in the pair is the traversal path and the second is an index. Let's assume we have two queues: a BFS queue and a DFS queue. We put this pair into both queues.
Now we repeat the following, first for one queue until it is empty and then for the other queue.
We pop the first pair off the queue. We get (A,0). The node A maps to [BC, CB]. So we generate two new paths (ACB,1) and (ABC,1). Put these new paths in the queue.
Take the first one of these off the queue to get (ACB,1). The index is 1 so we look at the second character in the path string. This is C. Node C maps to [DE, ED].
The BFS children of this path would be (ACBDE,2) and (ACBED,2) which we obtained by appending the child permutation.
The DFS children of this path would be (ACDEB,2) and (ACEDB,2) which we obtained by inserting the child permutation right after C into the path string.
We generate the new paths according to which queue we are working on, based on the above and put them in the queue. So if we are working on the BFS queue we put in (ACBDE,2) and (ACBED,2). The contents of our queue are now : (ABC,1) , (ACBDE,2), (ACBED,2).
We pop (ABC,1) off the queue and generate (ABC,2), since B has no children. The queue is now: (ACBDE,2), (ACBED,2), (ABC,2), and so on. At some point we will end up with a bunch of pairs where the index points past the end of the path. For example, if we get (ACBED,5) we know this is a finished path.
BFS should be quite simple: each node has a certain depth at which it will be found. In your example you find A at depth 0, B and C at depth 1, and D and E at depth 2. In each BFS path, you will have the element with depth 0 (A) as the first element, followed by a permutation of the elements at depth 1 (B and C), followed by a permutation of the elements at depth 2 (E and D), and so on.
If you look at your example, your 4 BFS paths match that pattern: A is always the first element, followed by BC or CB, followed by DE or ED. You can generalize this to graphs with deeper levels, with one caveat: within a level, nodes are discovered in the order their parents were dequeued, so when the nodes of a level have different parents, only the permutations consistent with the order of the previous level are valid.
To find the depths, a single breadth-first pass from the root (or one Dijkstra search with unit edge weights) is enough, which is quite cheap.
In DFS, you don't have the nice separation by depth which makes BFS straightforward. I don't immediately see an algorithm that is as efficient as the one above. You could set up a graph structure and build up your paths by traversing your graph and backtracking. There are some cases in which this would not be very efficient but it might be enough for your application.
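
The traverse-and-backtrack idea can be sketched in Python for both search orders. This is one possible way to enumerate the orderings, not the exact queue-based procedure described earlier in the thread; the function names are made up, and the graph is the adjacency list from the question.

```python
from itertools import permutations


def all_bfs_orders(graph, root):
    """Enumerate every order in which BFS can visit the nodes, trying each
    permutation of a node's undiscovered neighbours when it is dequeued."""
    results = set()

    def expand(order, head, seen):
        # order[head:] is the current BFS queue.
        if head == len(order):
            results.add("".join(order))
            return
        fresh = [n for n in graph[order[head]] if n not in seen]
        for perm in permutations(fresh):
            expand(order + list(perm), head + 1, seen | set(perm))

    expand([root], 0, {root})
    return results


def all_dfs_orders(graph, root):
    """Enumerate every DFS preorder by branching on which unvisited
    neighbour of the current node is explored next."""
    results = set()

    def step(order, stack):
        # stack is the path from the root to the node being explored.
        if not stack:
            results.add("".join(order))
            return
        fresh = [n for n in graph[stack[-1]] if n not in order]
        if not fresh:
            step(order, stack[:-1])  # dead end: backtrack to the parent
            return
        for nxt in fresh:
            step(order + [nxt], stack + [nxt])

    step([root], [root])
    return results


graph = {"A": "BC", "B": "A", "C": "ADE", "D": "C", "E": "C"}
print(sorted(all_bfs_orders(graph, "A")))  # ['ABCDE', 'ABCED', 'ACBDE', 'ACBED']
print(sorted(all_dfs_orders(graph, "A")))  # ['ABCDE', 'ABCED', 'ACDEB', 'ACEDB']
```

Both functions are exponential in the worst case, since the number of valid orderings itself can be exponential, but they avoid generating and validating every permutation of the node set.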

Pre-Requisite for Graphs with Unique Topological Sort

Let's assume that a graph in question is a DAG (directed acyclic graph).
Question: can I conclude that such graph will have a unique topological sort if, and only if, only one of its vertices has no incoming edges?
In other words, is having only one vertex with no incoming edges necessary (but not sufficient) to generate a unique topological sort?
A topological sort will be unique if and only if there is a directed edge between each pair of consecutive vertices in the topological order (i.e., the digraph has a Hamiltonian path). Source
A Hamiltonian path just means that a path between two vertices will only visit each vertex once, it does not mean though that one vertex must have no incoming edges. You can have a Hamiltonian path that is in fact a cycle. This would still generate a unique topological sort (of course it would be a cycle as well if that is important to you).
Hope this helps
Haaaaa, ok. sorry for the misunderstanding.
In this case I assume that you are right, here is a sketch of proof:
We have a unique topological sort => We have only one vertex that it is legal to put in the first place => For every vertex, except one, it is not legal to put in the first place => For every vertex, except one, we have incoming edges.
Hope that now I answered your question....
No! The graph below has only one vertex with no incoming edges, and it has 2 possible solutions.
1 -> 2
3 -> 4
3 -> 1
4 -> 2
The two solutions are:
3 1 4 2
3 4 1 2
Yes, you can state it as a necessary condition: if there are multiple nodes with in-degree 0, there is no Hamiltonian path and hence no unique topological order. Only the starting node (in-degree 0) has no incoming edge; every other vertex MUST have an incoming edge from its topological predecessor, that is, the node just before it in the topological ordering. If any pair of consecutive nodes in the topological ordering is not joined by an edge, the DAG does NOT have a unique order.
Someone has already given the answer. Here I just want to give you a counterexample: G = {(1,2), (1,3)}. In this case there are 2 valid topological sorts: 1,2,3 and 1,3,2.
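
The counterexamples in this thread can be checked mechanically. The sketch below assumes the input is a DAG; the function names are made up for illustration. The first function uses Kahn's algorithm and reports a unique order only if exactly one vertex is ready at every step, and the second enumerates all topological orders by backtracking.

```python
from collections import defaultdict


def unique_topological_order(vertices, edges):
    """Return the topological order if it is unique, otherwise None."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    order = []
    while ready:
        if len(ready) > 1:  # two candidates for the next slot: not unique
            return None
        v = ready.pop()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order if len(order) == len(vertices) else None


def all_topological_orders(vertices, edges):
    """Enumerate every topological order by backtracking."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    orders = []

    def extend(prefix, indeg):
        if len(prefix) == len(vertices):
            orders.append(prefix)
            return
        for v in vertices:
            if indeg[v] == 0 and v not in prefix:
                nxt = dict(indeg)
                for w in succ[v]:
                    nxt[w] -= 1
                extend(prefix + [v], nxt)

    extend([], indeg)
    return orders


four = [(1, 2), (3, 4), (3, 1), (4, 2)]
print(all_topological_orders([1, 2, 3, 4], four))  # [[3, 1, 4, 2], [3, 4, 1, 2]]
print(unique_topological_order([1, 2, 3, 4], four))  # None
print(unique_topological_order([1, 2, 3], [(1, 2), (2, 3)]))  # [1, 2, 3]
```

Both counterexamples have a single vertex with no incoming edges yet more than one order, which is why that condition is necessary but not sufficient.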

Resources