Gremlin recursive graph traversal with parent and child relationship

Gremlin recursive graph traversal with parent and child relationship - gremlin

I want to traverse a tree and aggregate the parent and its immediate children only. How would I do this using Gremlin and aggregate this into a structure list arrayOf({parent1,child},{child, child1}...}
In this case I want to output [{0,1}, {0,2}, {1,8} {1,6}, {2,7},{2,9}, {8,16},{8,14},{8,15},{7,17}}
The order isnt important. Also, note I want to avoid any circular edges which can exist on the same node only (no circular loop possible from a child vertex to a parent)
Each vertex has a label city and each edge has a label highway
g.V().hasLabel("city").toList().map(x->x.id()+x.edges(Direction.OUT,"highway").collect(Collectors.toList())
My query is timing out and I was wondering if there is a faster way to do this. I have abt 5000 vertices and two vertices are connected with only one edge.

You can get close to what you are looking for using the Gremlin tree step while also avoiding Groovy closures. Assuming the following setup:
gremlin> g = traversal().withGraph(TinkerGraph.open())
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
g.addV('0').as('0').
addV('1').as('1').
addV('2').as('2').
addV('6').as('6').
addV('7').as('7').
addV('8').as('8').
addV('9').as('9').
addV('14').as('14').
addV('15').as('15').
addV('16').as('16').
addV('17').as('17').
addE('route').from('0').to('1').
addE('route').from('0').to('2').
addE('route').from('1').to('6').
addE('route').from('1').to('8').
addE('route').from('2').to('2').
addE('route').from('2').to('9').
addE('route').from('2').to('7').
addE('route').from('7').to('17').
addE('route').from('8').to('14').
addE('route').from('8').to('15').
addE('route').from('8').to('16').iterate()
A query can be written to return the tree (minus cycles) as follows:
gremlin> g.V().hasLabel('0').
......1> repeat(out().simplePath()).
......2> until(__.not(out())).
......3> tree().
......4> by(label)
==>[0:[1:[6:[],8:[14:[],15:[],16:[]]],2:[7:[17:[]],9:[]]]]
An alternative approach, that also avoids using closures:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold())
==>[17]
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[6]
==>[7,17]
==>[8,14,15,16]
==>[9]
==>[14]
==>[15]
==>[16]
Which can be further refined to avoid leaf only nodes using:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold()).where(count(local).is(gt(1)))
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[7,17]
==>[8,14,15,16]
In your code you can then create the final pairs or perhaps extend the Gremlin to break up the result even more. Hopefully these approaches will prove more efficient than falling back onto closures (which are not going to be very portable to other TinkerPop implementations that do not support in-line code).

Related

Gremlin- unable to select 2 variables together when using coalesce

I am using gremlin query language with Neptune gdb, and experienced a weird behavior when using select:
Let's say my graph is a single node
g.addV("test").property(id,"v1")
and I try this query:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("b")
The response is v[v1] as expected.
If I do the same with select("a") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a")
I get the same result, again- as expected.
the weird behavior is when I try to use select("a","b") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a","b")
For some reason I get an empty response. Any idea why?
(I did find out that replacing the first as with store works, but I don't understand why)

I don't quite get the same results as you do for that second traversal and I would not expect to. Here is what I would expect to see:
gremlin> g.addV("test").property(id,"v1")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("b")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a")
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a","b")
gremlin>
Note that the last two traversal do not return results.
When you fold() you lose the path history to "a" so the traversal can't select() that step label any more. In general, you can't reference back to step labels that are on the opposite side of a reducing barrier step (like fold()). Other examples of reducing barriers would be steps like sum(), max(), min(), etc - where you have a number of traversers that reduce to a single one.

Gremlin query with variable edge depth without using Union

I'm trying to put together a Gremlin query that returns results for 1 to n depth of a certain edge type - without having to resort to using multiple queries stitched together with .union().
I have some test data that simulates the structure of sales offices and people that work in them, including who manages which offices and which offices "roll up" under the jurisdiction of which higher level offices. The following screen shot (from Neo4j, actually) shows a subset of the graph that I'm going to reference.
The graph can be created with the following:
g.
addV('Office').as('O_111').property('code','111').
addV('Office').as('O_356').property('code','356').
addV('Office').as('O_279').property('code','279').
addV('Office').as('O_KC5').property('code','KC5').
addE('MERGES_INTO').from('O_356').to('O_111').
addE('MERGES_INTO').from('O_279').to('O_356').
addE('MERGES_INTO').from('O_KC5').to('O_279').
addV('Person').as('Bob').property('name','Bob').
addE('MANAGES').from('Bob').to('O_111').addE('WORKS_WITH').from('Bob').to('O_111').
addV('Person').as('Michael').property('name','Michael').addE('WORKS_WITH').from('Michael').to('O_111').
addV('Person').as('John').property('name','John').addE('WORKS_WITH').from('John').to('O_111').
addV('Person').as('Rich').property('name','Rich').addE('WORKS_WITH').from('Rich').to('O_111').
addV('Person').as('Matt').property('name','Matt').
addE('WORKS_WITH').from('Matt').to('O_279').addE('MANAGES').from('Matt').to('O_279').
addV('Person').as('Judy').property('name','Judy').addE('WORKS_WITH').from('Judy').to('O_279').
addV('Person').as('Joe').property('name','Joe'). addE('WORKS_WITH').from('Joe').to('O_279').
addV('Person').as('Ben').property('name','Ben').addE('WORKS_WITH').from('Ben').to('O_279').
addV('Person').as('Ron').property('name','Ron').addE('WORKS_WITH').from('Ron').to('O_KC5').
If I want to see which people (orange) that work with an office (pink) that Bob directly or indirectly manages (because, for example, offices KC5, 279, and 356 roll up to Bob's 111 office), I can use .union() and something like the following to get the proper results:
gremlin> g.V().has('Person','name','Bob').
......1> out('MANAGES').
......2> union(
......3> __.in('WORKS_WITH'),
......4> __.in('MERGES_INTO').in('WORKS_WITH'),
......5> __.in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH'),
......6> __.in('MERGES_INTO').in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH')
......7> ).
......8> values('name').fold()
==>[Bob, Michael, John, Rich, Matt, Judy, Joe, Ben, Ron]
That seems super verbose and awkward. Is that my only choice? Is there a better way that doesn't seem so redundant like .union()?
Coming from a Neo4j world, I'd just do something with a ranged depth of "0 or more" using *0.., like this:
MATCH (manager:Person {name:'Bob'})
OPTIONAL MATCH (manager)-[:MANAGES]->(:Office)<-[:MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(p:Person)
RETURN p
How do I achieve the same sort of thing in Gremlin? Even if I can't do open ended, but could do 1 to some arbitrary limit (say, 1 to 10), that would work. It probably wouldn't matter, but I will be using AWS Neptune for the actual Graph database.

When asking questions about Gremlin, a picture of your graph is nice, but a script that provides some sample data is even better - like this:
g.addV('person').property('name','michael').as('mi').
addV('person').property('name','john').as('jo').
addV('person').property('name','rich').as('ri').
addV('person').property('name','bob').as('bo').
addV('person').property('name','matt').as('ma').
addV('person').property('name','ron').as('ro').
addV('person').property('name','joe').as('joe').
addV('person').property('name','ben').as('be').
addV('person').property('name','judy').as('ju').
addV('office').property('name','111').as('111').
addV('office').property('name','356').as('356').
addV('office').property('name','279').as('279').
addV('office').property('name','kc5').as('kc5').
addE('mergesInto').from('kc5').to('279').
addE('mergesInto').from('279').to('356').
addE('mergesInto').from('356').to('111').
addE('worksWith').from('mi').to('111').
addE('worksWith').from('jo').to('111').
addE('worksWith').from('ri').to('111').
addE('worksWith').from('bo').to('111').
addE('manages').from('bo').to('111').
addE('worksWith').from('ma').to('279').
addE('manages').from('ma').to('279').
addE('worksWith').from('joe').to('279').
addE('worksWith').from('be').to('279').
addE('worksWith').from('ju').to('279').
addE('worksWith').from('ro').to('kc5').iterate()
Your instincts are correct where union() isn't quite right for what you want to do. I would prefer repeat():
gremlin> g.V().has('person','name','bob').
......1> out('manages').
......2> repeat(__.in('worksWith','mergesInto')).
......3> emit(hasLabel('person')).
......4> values('name')
==>bob
==>michael
==>john
==>rich
==>matt
==>joe
==>ben
==>judy
==>ron
In this way it traverses to arbitrary depth (though we tend to recommend setting some kind of sensible limit to avoid problems if you run into some unexpected cycle somewhere) and is much more succinct. Note the use of emit() which controls which types of vertices are returned from the repeat() - if you do not include that filter you will also return "office" vertices.

TinkerPop gremlin count vertices only in a path()

When I make a query of a path e.g.:
g.V(1).inE().outV().inE().outV().inE().outV().path()
There are both vertices and edges in the path(), is there any way to count the number of vertices in the path only and ignore edges?

Gremlin is missing something important to make this really easy to do - it doesn't discern types very well for purposes of filtering, thus TINKERPOP-2234. I've altered your example a bit so that we could have something a little trickier to work with:
gremlin> g.V(1).repeat(outE().inV()).emit().path()
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[7][1-knows->2],v[2]]
==>[v[1],e[8][1-knows->4],v[4]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
With repeat() we get variable length Path instances so dynamic counting of the vertices is a bit trickier than the fixed example you have in your question where the pattern of the path is known and a count is easy to discern just from the Gremlin itself. So, with a dynamic number of vertices and without TINKERPOP-2234 you have to get creative. A typical strategy is to just filter away the edges by way of some label or property value that is unique to vertices:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold()).count(local)
==>2
==>2
==>2
==>3
==>3
Or perhaps use an property unique to all edges:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold()).count(local)
==>2
==>2
==>2
==>3
==>3
If you don't have these properties or labels in your schema that allows for this you could probably use your traversal pattern to come up with some math to figure it out. In my case, i know that my Path will always be (pathLength + 1) / 2 so:
gremlin> g.V(1).repeat(outE().inV()).emit().path().as('p').math('(p + 1) / 2').by(count(local))
==>2.0
==>2.0
==>2.0
==>3.0
==>3.0
Hopefully, one of those ways will inspire you to a solution.

+1 for typeOf predicate support in Gremlin (TINKERPOP-2234).
In addition to #stephan's answer, you can also mark and select only vertices:
g.V().repeat(outE().inV().as('v')).times(3).select(all,'v')
Also, if the graph provider support it, you can also use {it.class}:
g.V().repeat(outE().inV().as('v')).times(3).path()
.map(unfold().groupCount().by({it.class}))

Gremlin Distance matrix with multiple edge properties of a path

I'm new to gremlin, please help me with a query for below graph data.
Gremlin sample graph
graph = TinkerGraph.open()
g = graph.traversal()
v1 = g.addV('4630').property('loc','B_1_1').next()
v2 = g.addV('4630').property('loc','C_1_1').next()
e1 = g.addE('sp').from(v1).to(v2).property('dist',1).property('anglein',90).property('angleout',45).next()
e2 = g.addE('sp').from(v2).to(v1).property('dist',2).property('anglein',190).property('angleout',145)
Expected result:
source destination dist angein angleout
B_1_1 C_1_1 1 90 145
C_1_1 B_1_1 2 190 145
Query that I'm trying is:
g.V().has('4630','loc',within('B_1_1','C_1_1')).
outE('sp').
inV().has('4630','loc',within('B_1_1','C_1_1')).
path().
by('loc').
by(valueMap().select(values)).
by('loc')
With below result
==>[B_1_1,[90,1,45],C_1_1]
==>[C_1_1,[190,2,145],B_1_1]
Want to have all the path edge properties in the result without any inner result. Please help how can I achieve the expected result?

It sounds like you just want to flatten your result.
gremlin> g.V().has('4630','loc',within('B_1_1','C_1_1')).
......1> outE('sp').
......2> inV().has('4630','loc',within('B_1_1','C_1_1')).
......3> path().
......4> by('loc').
......5> by(valueMap().select(values)).
......6> by('loc').
......7> map(unfold().unfold().fold())
==>[B_1_1,90,1,45,C_1_1]
==>[C_1_1,190,2,145,B_1_1]
Each path will need to be flattened so you want to apply that operation with map(). To flatten you need to first unfold() the path and then unfold() each item in the path. Since the map() operation will only next() that child traversal you need to include a final fold() to convert that flattened stream of objects back to a List.

Adding to what Stephen already said, you could also get rid of the by() modulation in your path step and instead use the path elements to collect all the values you need afterward. This will save you a few traversers and thus it should be slightly faster.
g.V().has('4630','loc',within('B_1_1','C_1_1')).
outE('sp').inV().has('4630','loc',within('B_1_1','C_1_1')).
path().
map(unfold().values('loc','dist','anglein','angleout').fold())
Also, note that even if you prefer the other query, you shouldn't use valueMap. valueMap().select(values) is just a waste of resources in my opinion.

Traverse implied edge through property match?

I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.has('spouse') and create edges individually?

You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this is type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal note that depending on the size of your data this could be an expensive traversal as I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find that it's faster to take approaches that ensure that a use of an index is in place which might mean issuing multiple queries to solve the problem.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Gremlin recursive graph traversal with parent and child relationship - gremlin

Related

Gremlin- unable to select 2 variables together when using coalesce

Gremlin query with variable edge depth without using Union

TinkerPop gremlin count vertices only in a path()

Gremlin Distance matrix with multiple edge properties of a path

Traverse implied edge through property match?

Categories

Resources