How to traverse a graph and pattern-match a subgraph in Gremlin

I have a graph which is made of many instances of the same pattern (or subgraph).
The subgraph of interest is pictured below.
The relationship cardinalities between the nodes are:
s -> c (one-many)
c -> p (many-many)
p -> aid (one-many)
p -> rA (one-one)
p -> rB (one-one)
p -> o (many-one)
The goal is to return a list of all instances of this subgraph or pattern, as shown below:
[
{
s-1,
c-1,
p-1,
aid-1,
o-1,
rA-1,
rB-1
},
{
s-2,
c-2,
p-2,
aid-2,
o-2,
rA-2,
rB-2
},
{
... so on and so forth
}
]
How do I query my graph to return this response?
I have tried using a combination of and() and or() as shown below, but that did not capture the entire subpattern as desired.
g.V().hasLabel('severity').as('s').out('severity').as('c').out('affecting').as('p')
.and(
out('ownedBy').as('o'),
out('rA').as('rA'),
out('rB').as('rB'),
out('package_to_aid').as('aid')
)
.project('p', 'c', 's', 'o', 'rA', 'rB', 'aid').
by(valueMap()).
by(__.in('affecting').values('cve_id')).
by(__.in('affecting').in('severity').values('severity')).
by(out('ownedBy').values('name')).
by(out('rA').valueMap()).
by(out('rB').valueMap()).
by(out('package_to_aid').values('aid'))
I know I can use a series of out() and in() steps to traverse a non-branching path (for example, the nodes s->c->p); however, I am struggling to capture paths that branch out (for example, the node p and its three child nodes rA, rB, and o).
I looked at union(), but I was unable to make it work either.
I am unable to find examples of similar queries online. Does Gremlin allow this sort of traversal, or do I have to remodel my graph as a linked list for this to work?
P.S. I am doing this on Cosmos DB, where the match() step is not supported.
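One way to reason about the branching part: every instance of the pattern is an s -> c -> p chain combined with one choice each of aid, o, rA and rB hanging off p. The Python sketch below models that on a tiny in-memory graph. The vertex names, the `edges` dictionary and the helper names are made up for illustration, and nothing here runs against Cosmos DB. In Gremlin the same walk can often be written by labelling each hop with as() and stepping back to the branching vertex with select('p') before following the next out() edge, though that mapping is an untested assumption here.

```python
from itertools import product

# Hypothetical toy graph: edges[label] maps a source vertex to its targets.
edges = {
    "severity": {"s-1": ["c-1"]},
    "affecting": {"c-1": ["p-1"]},
    "package_to_aid": {"p-1": ["aid-1", "aid-2"]},
    "ownedBy": {"p-1": ["o-1"]},
    "rA": {"p-1": ["rA-1"]},
    "rB": {"p-1": ["rB-1"]},
}

# (result key, edge label) for each branch leaving p.
BRANCHES = [("aid", "package_to_aid"), ("o", "ownedBy"), ("rA", "rA"), ("rB", "rB")]


def out(vertex, label):
    """Return the targets reached from `vertex` over edges with `label`."""
    return edges.get(label, {}).get(vertex, [])


def pattern_instances(severity_vertices):
    """Enumerate every complete s -> c -> p -> {aid, o, rA, rB} instance."""
    results = []
    for s in severity_vertices:
        for c in out(s, "severity"):
            for p in out(c, "affecting"):
                targets = [out(p, label) for _, label in BRANCHES]
                # product() yields nothing if any branch is empty, so a p
                # that lacks one of the four children is skipped entirely.
                for combo in product(*targets):
                    row = {"s": s, "c": c, "p": p}
                    row.update(zip((key for key, _ in BRANCHES), combo))
                    results.append(row)
    return results


instances = pattern_instances(["s-1"])
```

Because p -> aid is one-many, a single p contributes one row per aid; grouping the rows by p afterwards would collapse them if one record per package is wanted.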

Related

Write an algorithm to find a path that traverses all edges of directed graph G exactly once

...You may visit nodes multiple times, if necessary. Show the run time complexity of your algorithm. This graph is not necessarily strongly connected, but starting from a node there should exist such a path.
My approach so far was to repeatedly run DFS from unvisited nodes to find the node with the highest post number, as this will be part of the source node in a meta graph.
Then repeatedly run DFS on the reverse graph to find the sink node of the meta graph.
Then run the Eulerian path algorithm until we exhaust all of the edges, backtracking if a path leads to a dead end.
I can't figure out how to proceed from here.
Hi, I came up with this. Can someone please verify it?
function edge_dfs(vertex u, graph, path, num_edges) {
    if num_edges == 0 {  // Every edge has been used: found a path.
        return true
    }
    for all neighbors n of u {
        if exists((u, n)) == true {
            num_edges--
            exists((u, n)) = false  // Mark the edge as used.
            path.append(n)
            if edge_dfs(n, graph, path, num_edges) == true {
                return true
            } else {
                num_edges++  // Backtrack if this edge was unsuccessful.
                exists((u, n)) = true
                path.pop()
            }
        }
    }
    return false  // No neighbors or no valid paths from this vertex.
}
Repeatedly run DFS to find a node in a source component; call it s.
path = array{s}
exists((u, v)) = true for all edges (u, v) in graph
num_edges = number of edges in graph
if edge_dfs(s, graph, path, num_edges) == true:
    The path is the elements of array 'path' in order.
else:
    Such a path does not exist.
And this is O(|E| + |V|) as it is just a DFS of all of the edges.
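For comparison, here is a minimal Python sketch of Hierholzer's algorithm, a standard way to find an Eulerian path in a directed graph. It is a different technique from the backtracking search above, not a verification of it: a backtracking search can undo and retry the same edges many times, so it is not obviously linear, whereas this version pops each edge exactly once and so runs in O(|E| + |V|). The function name and the edge-list input format are choices made for this example.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return a vertex list that uses every directed edge once, or None."""
    if not edges:
        return []
    out_edges = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, v in edges:
        out_edges[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
    vertices = set(out_deg) | set(in_deg)

    # Degree condition: at most one vertex with one extra outgoing edge
    # (the start) and at most one with one extra incoming edge (the end).
    starts = [v for v in vertices if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in vertices if in_deg[v] - out_deg[v] == 1]
    balanced = all(abs(out_deg[v] - in_deg[v]) <= 1 for v in vertices)
    if not balanced or len(starts) > 1 or len(starts) != len(ends):
        return None

    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out_edges[u]:
            stack.append(out_edges[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: vertex is finished
    path.reverse()

    # If some edges were unreachable from the start, the walk is too short.
    return path if len(path) == len(edges) + 1 else None


example = eulerian_path([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")])
```

The final length check covers the case where the degrees are fine but the edges fall into separate pieces that no single walk can cover.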

Get relationships and nodes between two nodes

Given the following data set in Neo4j:
(A)-flows->(B)-flows->(C)-flows->(D)-flows->(Z)
(A)-flows->(E)-flows->(F)-flows->(Z)
(A)-flows->(G)-flows->(Z)
How can I return the subgraph (nodes B, C, D, E, F, G, their relationships to each other, and their relationships to A and Z) with a Cypher query when only A and Z are known?
Pseudo code:
Match(a)-[rels*](nodes*)-(z)
where a.Id = '123' and z.Id = '456'
return a,rels,nodes,z
Save the (A)->...->(Z) subgraph to a named path then use the nodes and relationships functions to extract a list of nodes and relationships:
MATCH p=(a {Id: '123'})-[:flows*]->(z {Id: '456'})
RETURN a, nodes(p), relationships(p), z
As pointed out in the comments, nodes(p) also returns a and z. If you do not want those nodes to be returned, omit the first and last elements of the list. Thanks to Bruno Peres and cybersam for their inputs.
MATCH p=(a {Id: '123'})-[:flows*]->(z {Id: '456'})
RETURN a, nodes(p)[1..-1], relationships(p), z
Remark #1. It is also possible to UNWIND these lists to process their contents one by one.
Remark #2. Depending on the driver you're using, you can simply return p and process it in the client's code. For example, the Java driver allows you to return a Path object that has nodes() and relationships() methods returning Iterables.
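To make the shape of the result concrete, here is a small Python sketch that walks the example data set from the question and collects the same pieces the query returns per path. It is an in-memory illustration only: the `flows` dictionary and `all_paths` helper are invented for this example, and the sketch enumerates simple paths (no repeated nodes), which is an assumption rather than a statement of Cypher's exact matching rules.

```python
# Adjacency list for the "flows" relationships given in the question.
flows = {
    "A": ["B", "E", "G"],
    "B": ["C"],
    "C": ["D"],
    "D": ["Z"],
    "E": ["F"],
    "F": ["Z"],
    "G": ["Z"],
}


def all_paths(graph, start, end, path=None):
    """Yield every simple directed path from start to end as a node list."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid cycles
            yield from all_paths(graph, nxt, end, path)


paths = list(all_paths(flows, "A", "Z"))

# Equivalent in spirit to nodes(p)[1..-1]: drop the known endpoints.
inner_nodes = sorted({node for p in paths for node in p[1:-1]})

# Equivalent in spirit to relationships(p): consecutive node pairs.
relationships = sorted({(u, v) for p in paths for u, v in zip(p, p[1:])})
```

Merging the per-path lists into sets, as done here, is what a client would do to rebuild one subgraph from the several rows the query returns.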

DSE graph Batch write with ifnotexist on edges

I am loading data from an Excel file into DSE Graph: Java code prepares addE Gremlin queries and then executes them against DSE Graph.
In my current test I need to run 400,000 addE Gremlin queries with two edge labels.
1) What is the best practice to finish this execution in a few minutes?
Right now I am passing the Gremlin queries in batches of 1,000 to dseSession.executeGraph(new SimpleGraphStatement("")), which leads to the exception: Method code too large! at groovyjarjarasm.asm.MethodWriter
2) For the edge labels in this use case, my schema is defined with single cardinality.
I am also using custom vertex IDs for the vertices.
So if an edge already exists, should DSE just ignore it without raising an exception?
The query parameter should be a simple array that looks like this:
[[from1, to1, label1], [from2, to2, label2], ...]
Then your script should look like this:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    def v2 = graph.vertices(id2).next()
    if (!g.V(v1).outE(lbl).filter(inV().is(v2)).hasNext()) {
        v1.addEdge(lbl, v2)
    }
}
Alternatively:
for (def triple in arg) {
    def (id1, id2, lbl) = triple
    def v1 = graph.vertices(id1).next()
    if (!g.V(v1).outE(lbl).filter(inV().hasId(id2)).hasNext()) {
        v1.addEdge(lbl, graph.vertices(id2).next())
    }
}
Try both variants; at least one of them should outperform any other solution.
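A minimal Python sketch of preparing the parameter array described above, assuming the spreadsheet rows have already been read into (from_id, to_id, label) tuples. The helper name, the chunk size and the client-side de-duplication are assumptions added for illustration, and the actual driver call is left out because it depends on the client in use. The idea is that each chunk is bound as the `arg` parameter of one fixed script, instead of concatenating thousands of addE statements into the script text, which is likely what triggers "Method code too large".

```python
def to_batches(rows, batch_size=1000):
    """Turn (from_id, to_id, label) rows into chunks of [from, to, label] triples.

    Exact duplicates within the input are dropped up front; edges that
    already exist in the graph are still handled by the server-side check.
    """
    seen = set()
    triples = []
    for from_id, to_id, label in rows:
        key = (from_id, to_id, label)
        if key not in seen:
            seen.add(key)
            triples.append([from_id, to_id, label])
    # Slice the de-duplicated list into fixed-size parameter arrays.
    return [triples[i:i + batch_size] for i in range(0, len(triples), batch_size)]


rows = [("a", "b", "knows"), ("a", "b", "knows"), ("b", "c", "likes")]
batches = to_batches(rows, batch_size=2)
```

Each element of `batches` has the `[[from1, to1, label1], ...]` shape the scripts expect.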

Get relationship between nodes

I have the following type of graph, which I get using this query:
MATCH (p:Person)-[:REPORTS_TO *]->(c:Person) WHERE p.name="F"
WITH COLLECT (c) + p AS all
UNWIND all as p MATCH (p)-[:REPORTS_TO]-(c)
RETURN p,c;
Use-Case:
1. I want to find what level a node is at with respect to node F.
Example:
Nodes `D` and `E` are direct children of `F`, hence they are at level 1
Nodes `A`, `B`, `C` are children of `D` (which is a child of `F`), hence level 2
Node `X` is a child of `A` (which is at level 2), hence level 3
and so on.
I tried to solve this by introducing a variable i and incrementing it with each iteration (but it didn't work).
MATCH (p:Person)-[:REPORTS_TO *]->(c:Person) WHERE p.name="F"
WITH COLLECT (c) + p AS all ,i:int=0
UNWIND all as p MATCH (p)-[:REPORTS_TO]->(c)
RETURN p,c, i=i+1;
2. Given two nodes, find the relation between them,
e.g. find the relation between F and X.
Expected answer = 3 (as it is at level 3)
How should I proceed to solve these use cases?
Note: A graphical response from the Neo4j server isn't necessarily needed; a JSON response will also be fine.
UC1: Use a path and the length(p) function:
MATCH p=(root:Person)-[:REPORTS_TO *]->(child:Person)
WHERE root.name="F"
RETURN nodes(p)[-2], nodes(p)[-1],length(p)
This finds all paths from the root node and returns, for each path, the second-to-last node, the last node, and the level you want.
nodes(p) - list of nodes on path p
[-2] - second node from the end of the list
UC2: Use the shortestPath function:
MATCH (p1:Person),(p2:Person)
WHERE p1.name = '..' AND p2.name = '...'
MATCH p=shortestPath((p1)-[:REPORTS_TO*]->(p2))
RETURN length(p)
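The notion of "level" in both use cases is just the number of hops from F. A minimal Python sketch of that idea, using breadth-first search over the example hierarchy from the question (the `children` dictionary and the function name are assumptions made for this illustration, not part of any Neo4j API):

```python
from collections import deque

# Example hierarchy from the question: F -> D, E; D -> A, B, C; A -> X.
children = {
    "F": ["D", "E"],
    "D": ["A", "B", "C"],
    "A": ["X"],
}


def levels_from(root, graph):
    """Return {node: number of hops from root} using breadth-first search."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in level:  # first visit is the shortest distance
                level[child] = level[node] + 1
                queue.append(child)
    return level


levels = levels_from("F", children)
```

Looking up `levels["X"]` gives the answer to the second use case, which is the same number length(p) reports for the shortest path from F to X.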

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately as r is a Collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1Name), an ending point (n2Name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns the nodes along the path:
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, this returns the nodes Point5, Point6, Point7 and Point8.
I guess in your case that using a common relationship type name for connecting your Point nodes would be more efficient.
If the purpose of having Path1, Path2, ... is to know the distance between two points, you can get the distance by asking for the length of the path, as in this query based on your console link:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[*]->(n2))
RETURN length(p)
If you need to return only paths of a given length, you can do so without shortestPath by specifying a strict depth:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT 1
As you can see here, specifying the relationship type is not mandatory; you can omit it, or add the :NEXT type if you have other relationship types in your graph.
If you need to match on the type, e.g. for the path from Point5 to Point8 in your console link, where the path can only have PATH_TWO relationships, then you can do this:
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE type(r[0])= 'PATH_TWO'
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, maybe a short explanation of the need could help us find a more appropriate query.
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p
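Since the relationship type cannot be passed as a query parameter, the query above only works if the type name is substituted into the query text before it is sent. A minimal Python sketch of doing that substitution safely, assuming a known set of relationship types: the allowlist (PATH_ONE is made up, PATH_TWO appears in the console example) and the function name are hypothetical. The sketch uses the `$name` parameter form; older Neo4j versions use `{name}` as in the question.

```python
import re

# Hypothetical allowlist of relationship types that may be interpolated.
ALLOWED_REL_TYPES = {"PATH_ONE", "PATH_TWO"}

# Plain identifier check as a second line of defence.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_path_query(rel_type):
    """Return Cypher text with the relationship type spliced in.

    Node names stay as real query parameters ($n1name, $n2name);
    only the relationship type is placed into the query string.
    """
    if rel_type not in ALLOWED_REL_TYPES or not IDENTIFIER.fullmatch(rel_type):
        raise ValueError(f"unsupported relationship type: {rel_type!r}")
    return (
        "MATCH p = shortestPath((n1:Point {name: $n1name})"
        f"-[:{rel_type}*]->(n2:Point {{name: $n2name}})) "
        "RETURN nodes(p)"
    )


query = build_path_query("PATH_TWO")
```

Checking against a fixed set before building the string keeps user input out of the query text, which matters because string-built Cypher is otherwise open to injection.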

Resources