Can someone explain what this graph travesal in Gremlin is doing? - gremlin

I'm having a bit of trouble understand these Gremlin queries:
from os import getenv
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
pmap = g.V().has(name, value) \
.union(__.hasLabel('UID'),
__.hasLabel('OID').outE('attached').inV()) \
.union(__.propertyMap(),
__.inE('attached').outV().hasLabel('OID') \
.propertyMap()).toList()
So I understand g.V().has(name, value) is looking for a vertex with the keyname = value. What is the union doing here? Is it unioning vertices with a label "OID" with edges that go outward with a label "attached"?
What is theinV()` and why are the two arguments for union?

The union() step just merges the child traversal streams that are provided to it as arguments. Take a more simple example:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('person','name','marko').union(has('age',29),bothE())
==>v[1]
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').union(has('age',30),bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
In the first example, we get union() takes in a "marko" vertex as the starting point for both has('age',29) and bothE(). As v[1] also has a "age" property with a value of "29" we see v[1] in the output. We also see all the edges of v[1] in merged into that stream of output. In the second traversal, we see v[1] being filtered out as the "age" is not equal to "30" so all we get are the edges.
With that explanation in mind, consider what the traversal you've included in your question is doing. It finds a vertex with a "name" and some value for that key. That becomes the start point for the first union(). If the vertex has a label of "UID" then it pass through. If the vertex has a label of "OID" then it traverses the outgoing "attached" edges to the adjacent vertex and returns those.
What's odd about that is the fact that a Vertex can only have one label (at least by TinkerPop's definition - some graphs support multiple element labels). So, assuming one label, you really only get one or the other stream. Personally, I don't think the use of union() is a good choice there. I think it would be more intuitive to use coalesce since only one stream can be returned, thus expanding my example from above:
gremlin> g.V().has('person','name','marko').coalesce(has('age',30),has('age',29).bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').coalesce(has('age',29),has('age',29).bothE())
==>v[1]
The use of coalesce() makes the intent much more clear in my opinion. Following on further with the original code to the second union() - at this point, you either have the original Vertex or one or more "attached" vertices for which the traversal combines a propertyMap() and/or a propertyMap() of an additional "attached" vertices that have an "OID" label.
It's really hard to say exactly what the intent of this traversal is given the information provided. Depending on what the data structure is and what the intent is, I imagine that things could be simplified. Hopefully, I've at least explained what union() is doing and clarified that for you as that seemed to be the core of your question.

Related

Tinkerpop Gremlin Get Edges that go to vertices within a list

I'm trying to query for edges that go to vertices within an aggregated list. It sounds quite simple, and it should be, but I seem to be writing my queries wrong, and I just can't figure out why. Anyway, I'll use the Modern Toy Graph to make an example, that won't necessarily make much sense in this context, but still illustrates what I wish to do:
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().
hasLabel('person').
aggregate('x').
outE().
where(inV().is(within('x')))
What I'm doing is traversing to all 'person' vertices, aggregating them, then trying to get all the outgoing edges that lead to another vertex within that aggregated list. I expect the above query to return the edge labelled "knows" that goes between vertex 1 and 2, and the one between 1 and 4, however nothing is returned. If i simple want to get the vertices on the other end of those edges, rather than the edges themselves, the following works fine, returning vertex 2 and 4:
g.V().
hasLabel('person').
aggregate('x').
out().
where(within('x'))
So how can I get edges that lead to vertices already aggregated in a list?
(Once again, I'm aware this example doesn't make much sense within this particular graph, and I could easily query outE('knows'), but this query is relevant to a different graph.)
Thanks.
You can't use is() quite that way. An easy fix would be to just combine your "working" traversal with the one that doesn't:
gremlin> g.V().hasLabel('person').
......1> aggregate('x').
......2> outE().
......3> where(inV().where(within('x')))
==>e[7][1-knows->2]
==>e[8][1-knows->4]

Query to retrieve all paths traversable from a given vertex

I am trying to write a query that retrieves all paths that are reachable from a specified vertex. In other words I am trying to retrieve the entire cluster/sub-graph that the vertex is connected to. A couple more constraints on the query are:
inward edges should be traversed and included in the result (I am looking for all paths that are in any way connected to the root vertex.
the search must stop at a specified depth of, say, 10 hops from the
root vertex.
Bonus constraint: I would prefer the result not to include paths which are complete sub-paths of other paths returned in the result.
I currently have the following two queries which appear to work expected on small, toy graphs I have tested them on. However, there seem to be some edge cases in our large, production graph that does not return all the paths/edges/vertices I would expect it to, but I cannot explain as to why this happens. The two queries also sometimes return some different vertices than each other.
I would prefer a fresh view on how to approach this query, rather than trying to adjust what I currently have, so please try to provide a solution before looking at my current solution below.
Query 1:
g.V(uid).repeat(bothE().bothV().simplePath()).until(loops().is_(10)).emit().dedup().path().by(valueMap(True))
Query 2:
g.V(uid).repeat(bothE().bothV().simplePath()).until(bothE().simplePath().count().is_(0).or_().loops().is_(10)).dedup().path().by(valueMap(True))
Using this simple binary tree as a test graph
g.addV('root').property('data',9).as('root').
addV('node').property('data',5).as('b').
addV('node').property('data',2).as('c').
addV('node').property('data',11).as('d').
addV('node').property('data',15).as('e').
addV('node').property('data',10).as('f').
addV('node').property('data',1).as('g').
addV('node').property('data',8).as('h').
addV('node').property('data',22).as('i').
addV('node').property('data',16).as('j').
addV('node').property('data',7).as('k').
addV('node').property('data',51).as('l').
addV('node').property('data',13).as('m').
addV('node').property('data',4).as('n').
addE('left').from('root').to('b').
addE('left').from('b').to('c').
addE('right').from('root').to('d').
addE('right').from('d').to('e').
addE('right').from('e').to('i').
addE('left').from('i').to('j').
addE('left').from('d').to('f').
addE('right').from('b').to('h').
addE('left').from('h').to('k').
addE('right').from('i').to('l').
addE('left').from('e').to('m').
addE('right').from('c').to('n').
addE('left').from('c').to('g').iterate()
We could find all the paths using
gremlin> g.V().hasLabel('root').
......1> repeat(bothE().otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> path().
......4> by('data').
......5> by(label)
==>[9,right,11,left,10]
==>[9,left,5,left,2,left,1]
==>[9,left,5,left,2,right,4]
==>[9,left,5,right,8,left,7]
==>[9,right,11,right,15,left,13]
==>[9,right,11,right,15,right,22,left,16]
==>[9,right,11,right,15,right,22,right,51]
Note that I used bothE().otherV() as you said in your case you may have some incoming edges as well as outgoing ones.
We could also use the subgraph step to return the whole sub graph containing both vertices and edges. This example finds the subtree that starts at the vertex for the value 5.
gremlin> g.V().has('data',5).
......1> repeat(bothE().subgraph('sg').otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> cap('sg')
==>tinkergraph[vertices:14 edges:13]
Note that both of these approaches assumes that all paths end at leaf nodes. I left out the loops() test but you can add that in as needed.

What is the Gremlin query that can get me all the vertices either directly or indirectly connected to one specific vertex

I need help with a Gremlin query that can output all the vertices related to one specific vertex A and their cascading related vertices (which means all the vertices related directly or indirectly to A).
For example, in a graph
A -> B -> C
D
Running this query on A will give me B and C.
The solution I have right now is an ugly one:
g.V('A').both(); g.V('A').both().both();
etc
Any help would be really appreciated.
Your solution isn't ugly; it only lacks a bit of iteration and an exit condition.
Do you require a maximum depth? Depending on the shape of your graph, the query you want to execute could be returning all vertices of that graph.
Assuming a toy modern TinkerGraph created in the Gremlin console:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
This query could be helpful:
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup()
==>v[3]
==>v[2]
==>v[4]
==>v[6]
==>v[5]
"Starting from vertex with id=1, traverse the graph in all directions up to a maximum depth of 3 while discarding previously visited paths. The emit() step ensures that all traversed vertices found along the way, and not just leaves, are returned."
Chances are high that you want to figure out which vertices are linked to that vertex only via specific edges. In such case, you could be passing label(s) to the both() step, and/or maybe chain a few filters.
When developing your query, feel free to chain the path() step to better understand the output.
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup().path()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3],v[6]]
==>[v[1],v[4],v[5]]
There are other ways to solve this, but this query should get you started and familiarize yourself with basic Gremlin steps and concepts.

TinkerPop: Filter by Edge count

Sample data: TinkerPop Modern
Summary: I want to find People who have created 2 softwares.
I started with the basics, and got the count properly
g.V().hasLabel("Person").as("from" ,"to1" )
.repeat(bothE().as("e1").otherV().as("to1").dedup("from", "to1")).times(1)
.emit(filter(hasLabel("Software"))).hasLabel("Software")
.group().by(select("from").by("name")).by(count()).as("c")
Result:
>> {'Marko': 1, 'Peter': 1, 'Josh': 2}
So I tried to apply a filter but its not working (ie. Result is incorrect), what I tried:
g.V().hasLabel("Person").as("from")
.repeat(bothE().as("e1").otherV().as("to1").dedup("from", "to1")).times(1)
.filter(bothE().otherV().hasLabel("Software").count(local).is(eq(1)))
.dedup()
.values("name")
Any idea what am I doing wrong?
Sample data:
If you just need "person" vertices by edge count I don't really see why you need all that repeat() infrastructure. It's just:
gremlin> g.V().hasLabel('person').
......1> filter(outE('created').limit(2).count().is(2))
==>v[4]
You only need to count outgoing edges because the schema is such that the "created" label only connects to "software", so you don't need to check the
"software" vertex label. You limit(2) to exit the edge iteration as soon as possible but not before you have the 2 edges you are trying to count.

Finding out which Vertex and edges were traversed

I have the following Graph
If I write a Query g.V('A').Out(), how can I get that the values of the Edges which were traversed and the vertex which were encounterned in the traveral ?
You would need to tell Gremlin to not skip the edges. g.V().out() is shorthand for g.V().outE().inV(). In this case, you can interact with the value of the edges as you've explicitly told Gremlin to traverse them. I'll demonstrate with a few examples using the "modern" toy graph:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
To start, you might want to filter on a particular edge property and then traverse to adjacent vertices:
gremlin> g.V().outE().has('weight',gt(0.5)).inV()
==>v[4]
==>v[5]
You mentioned in your question that you might want to see the values of edges and the vertices you encountered. One way would be to use path():
gremlin> g.V().outE().has('weight',gt(0.5)).inV().path()
==>[v[1],e[8][1-knows->4],v[4]]
==>[v[4],e[10][4-created->5],v[5]]
You might also get more explicit to get specific properties from the edge:
gremlin> g.V().outE().has('weight',gt(0.5)).inV().path().by().by('weight')
==>[v[1],1.0,v[4]]
==>[v[4],1.0,v[5]]

Resources