How to limit the number of times a branch is traversed - gremlin

Starting with the toy graph, I can find which vertices are creators by looking for vertices that have 'created' out edges:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> graph.traversal().V().as('a').out('created').select('a').values('name')
==>marko
==>josh
==>josh
==>peter
I can filter out the duplicates with the dedup step...
gremlin> graph.traversal().V().as('a').out('created').select('a').dedup().values('name')
==>marko
==>josh
==>peter
...but this only alters the output, not the path followed by the Gremlin. If creators can be supernodes I'd like to tell the query to output 'a' once it finds its first 'created' edge and to then stop traversing the out step for the current 'a' and proceed to the next 'a'. Can this be done?
These two queries produce the desired output. Do they behave as I intend?
graph.traversal().V().where(out('created').count().is(gt(0))).values('name')
graph.traversal().V().where(out('created').limit(1).count().is(gt(0))).values('name')
Is there a better recipe?
EDIT: I just found an example in the where doc (example 2) that shows the presence of a link being evaluated as truth (may not be wording this correctly):
graph.traversal().V().where(out('created')).values('name')
There's a warning about the star-graph problem, which I think doesn't apply here because, and I'm guessing, there is only one where step that tests a branch?

Your last example is the way to go.
g.V().where(out('created')).values('name')
Strategies will optimize that for you and turn it into:
g.V().where(outE('created')).values('name')
Also, .where(outE('created')) will not iterate through all the out-edges; it behaves like a .hasNext() check, so there is no supernode problem.
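The hasNext()-style behavior can be illustrated outside Gremlin. The following is a minimal Python sketch, not TinkerPop code: out_edges, has_any and count_is_positive are hypothetical helpers, and the counter only records how many edges a lazy iterator actually hands out. It models a naive full count; a graph's strategies may well optimize the real count().is(gt(0)) traversal differently.

```python
# Plain-Python model of an existence check versus a naive full count.
# Nothing here is TinkerPop API; every name is a hypothetical stand-in.

_MISSING = object()  # sentinel so that any edge value counts as "present"


def out_edges(n_edges, counter):
    """Lazily yield edge ids, recording how many were actually pulled."""
    for edge_id in range(n_edges):
        counter["pulled"] += 1
        yield edge_id


def has_any(edges):
    """Existence test: pull at most one element, like a hasNext() check."""
    return next(iter(edges), _MISSING) is not _MISSING


def count_is_positive(edges):
    """Counting test: drains the whole iterator before comparing."""
    return sum(1 for _ in edges) > 0


SUPERNODE_DEGREE = 100_000  # assumed degree of a "supernode"

exists_counter = {"pulled": 0}
exists_result = has_any(out_edges(SUPERNODE_DEGREE, exists_counter))

count_counter = {"pulled": 0}
count_result = count_is_positive(out_edges(SUPERNODE_DEGREE, count_counter))

print(exists_result, exists_counter["pulled"])
print(count_result, count_counter["pulled"])
```

Both tests agree on the answer, but the existence test stops after the first edge while the naive count touches every edge of the supernode.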

Related

Gremlin - Showing the shortest (lowest cost) path in with meaningful information

I am trying to make Gremlin show me the shortest path (in terms of cost, not the number of vertices traversed) with meaningful information. There is a similar example in Gremlin's Recipes (http://tinkerpop.apache.org/docs/3.2.1-SNAPSHOT/recipes/#shortest-path) showing how to get all the paths and their respective costs from one vertex to another, but I cannot find a way to get Gremlin to display meaningful information such as the names or ages of vertices and the weights of edges. For example, one cannot tell who v[1] is from the result below.
gremlin> g.V(1).repeat(outE().inV().simplePath()).until(hasId(5)).
           path().as('p').
           map(unfold().coalesce(values('weight'),
                                 constant(0.0)).sum()).as('cost').
           select('cost','p')
==>[cost:3.00, p:[v[1], e[0][1-knows->2], v[2], e[1][2-knows->4], v[4], e[2][4-knows->5], v[5]]]
==>[cost:2.00, p:[v[1], e[0][1-knows->2], v[2], e[3][2-knows->3], v[3], e[4][3-knows->4], v[4], e[2][4-knows->5], v[5]]]
I know Gremlin supports a by() step modulator for such tasks, as shown here:
gremlin> g.V().out().out().path().by('name').by('age')
==>[marko,32,ripple]
==>[marko,32,lop]
However, I could not figure out how to combine the two solutions. Ideally, the result I am looking for would look like this:
==>[duration:2, path:[Chicago, supertrain, New York]]
Any suggestions? Many thanks in advance!
You can add a by() modulator after the path() step and replace values() with select():
g.V().hasLabel('A').repeat(outE().inV().
simplePath()).
until(hasLabel('C')).path().
by(valueMap().with(WithOptions.tokens)).as('p').
map(unfold().
coalesce(
select('distance'),
constant(0.0)
).sum()).
as('cost').
select('cost', 'p')
example: https://gremlify.com/2wk6e3d03fe
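To make the shape of the desired result concrete, here is a minimal plain-Python sketch of the same idea: enumerate simple paths, let vertices contribute 0.0 to the cost (as constant(0.0) does in the traversal), and report names instead of ids. The city names, edge names and weights are made-up illustration data, and simple_paths and describe are hypothetical helpers, not part of any Gremlin API.

```python
# Hypothetical toy data; names and weights are invented for illustration.
EDGES = [
    # (from vertex, to vertex, edge name, weight)
    ("Chicago", "New York", "supertrain", 2.0),
    ("Chicago", "Boston", "bus", 5.0),
    ("Boston", "New York", "ferry", 1.5),
]


def simple_paths(source, target, visited=None):
    """Yield alternating vertex/edge paths that never revisit a vertex.

    Each path element is a (name, weight) pair; vertices carry weight 0.0.
    """
    visited = (visited or []) + [source]
    if source == target:
        yield [(source, 0.0)]
        return
    for start, end, name, weight in EDGES:
        if start == source and end not in visited:
            for rest in simple_paths(end, target, visited):
                yield [(source, 0.0), (name, weight)] + rest


def describe(path):
    """Pair the summed weights with readable names instead of raw ids."""
    return {
        "cost": sum(weight for _, weight in path),
        "path": [name for name, _ in path],
    }


results = sorted(
    (describe(p) for p in simple_paths("Chicago", "New York")),
    key=lambda r: r["cost"],
)
for result in results:
    print(result)
```

Sorting by cost puts the cheapest path first, which is the one the question is after.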

TinkerPop gremlin count vertices only in a path()

When I make a query of a path e.g.:
g.V(1).inE().outV().inE().outV().inE().outV().path()
There are both vertices and edges in the path(), is there any way to count the number of vertices in the path only and ignore edges?
Gremlin is missing something important to make this really easy to do - it doesn't discern types very well for purposes of filtering, thus TINKERPOP-2234. I've altered your example a bit so that we could have something a little trickier to work with:
gremlin> g.V(1).repeat(outE().inV()).emit().path()
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[7][1-knows->2],v[2]]
==>[v[1],e[8][1-knows->4],v[4]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
With repeat() we get variable length Path instances so dynamic counting of the vertices is a bit trickier than the fixed example you have in your question where the pattern of the path is known and a count is easy to discern just from the Gremlin itself. So, with a dynamic number of vertices and without TINKERPOP-2234 you have to get creative. A typical strategy is to just filter away the edges by way of some label or property value that is unique to vertices:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold()).count(local)
==>2
==>2
==>2
==>3
==>3
Or perhaps use a property unique to all edges:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold()).count(local)
==>2
==>2
==>2
==>3
==>3
If your schema has no properties or labels that allow for this, you could probably use your traversal pattern to come up with some math to figure it out. In my case, I know that the number of vertices in my Path will always be (pathLength + 1) / 2, so:
gremlin> g.V(1).repeat(outE().inV()).emit().path().as('p').math('(p + 1) / 2').by(count(local))
==>2.0
==>2.0
==>2.0
==>3.0
==>3.0
Hopefully, one of those ways will inspire you to a solution.
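The type-filter and arithmetic approaches can be compared in a small plain-Python sketch. Vertex and Edge here are hypothetical stand-ins, not TinkerPop classes, and the path mirrors the shape [v[1], e[8], v[4], e[10], v[5]] from the output above.

```python
# Hypothetical stand-ins for graph elements; not TinkerPop classes.
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    id: int


@dataclass(frozen=True)
class Edge:
    id: int
    label: str


# Same shape as [v[1], e[8][1-knows->4], v[4], e[10][4-created->5], v[5]]
path = [Vertex(1), Edge(8, "knows"), Vertex(4), Edge(10, "created"), Vertex(5)]

# Approach 1: filter by element type (what a typeOf predicate would allow).
by_type = sum(1 for element in path if isinstance(element, Vertex))

# Approach 2: rely on the strictly alternating vertex/edge pattern.
by_formula = (len(path) + 1) // 2

print(by_type, by_formula)
```

The formula only holds when the path strictly alternates vertex, edge, vertex; the type filter makes no such assumption.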
+1 for typeOf predicate support in Gremlin (TINKERPOP-2234).
In addition to @stephan's answer, you can also mark and select only vertices:
g.V().repeat(outE().inV().as('v')).times(3).select(all,'v')
If the graph provider supports it, you can also use {it.class}:
g.V().repeat(outE().inV().as('v')).times(3).path().
  map(unfold().groupCount().by({it.class}))

Traverse implied edge through property match?

I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.has('spouse') and create edges individually?
You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal, note that depending on the size of your data it could be expensive, as I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find it faster to take an approach that ensures an index is used, which might mean issuing multiple queries to solve the problem.
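One way to picture the index-backed alternative is a lookup table keyed by name, so that each spouse value is resolved directly instead of being compared against every vertex. This is a plain-Python sketch over a hypothetical in-memory copy of the four person vertices, not a Gremlin traversal; by_name merely plays the role that an index on 'name' would play in a real graph.

```python
# Hypothetical in-memory copy of the four person vertices from the question.
people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]

# Build the lookup once; this plays the role of an index on 'name'.
by_name = {person["name"]: person for person in people}

# Only people with a 'spouse' value that resolves to a known name
# produce an edge, which is the "inner join" the question intended.
spouse_edges = [
    (person["name"], "spouse_of", by_name[person["spouse"]]["name"])
    for person in people
    if "spouse" in person and person["spouse"] in by_name
]

print(spouse_edges)
```

Each person with a spouse property costs one lookup, rather than one comparison per vertex in the graph.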

Gremlin What is the difference between transform and sideEffect

total n00b here when it comes to Gremlin, but would someone mind telling me the difference between a "transform" and "sideEffect"? I read the Gremlin Docs and both seem to "at times" take inputs, massage the data, and produce outputs.
I thought I was on to something when it looked like "transforms" were mostly "getters", but then why would it be called "transform"? And then I saw that they put functions under transform that did much more than just get data.
Any help is appreciated, thanks in advance!
A transform behaves as a map function, where the value in the Gremlin Pipeline is literally transformed to a different value:
gremlin> g.V.transform{it.name}
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
Note that in the example above, we've gone from a Vertex before the transform step to the string value of the name property on that Vertex. Now watch what happens with sideEffect:
gremlin> g.V.sideEffect{it.name}
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
The Vertex passes through sideEffect unchanged (i.e. it exits the sideEffect as a Vertex, just as it went in). The point of a sideEffect is that you can do something at that step of the pipeline that does "something" unrelated to the processing of the pipeline itself. In other words, the return value of the sideEffect closure is basically ignored.
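The same distinction can be sketched in plain Python with generators. The transform and side_effect functions below are hypothetical analogies, not the Gremlin steps themselves, and the vertices are simple dictionaries.

```python
# Plain-Python analogy; transform and side_effect are hypothetical helpers.
def transform(items, fn):
    """Emit fn(item): the value moving down the pipeline changes."""
    for item in items:
        yield fn(item)


def side_effect(items, fn):
    """Call fn(item) for its effect, ignore its result, emit item unchanged."""
    for item in items:
        fn(item)
        yield item


vertices = [{"id": 1, "name": "marko"}, {"id": 2, "name": "vadas"}]

# transform: vertices go in, names come out.
names = list(transform(vertices, lambda v: v["name"]))

# side_effect: vertices go in, the same vertices come out,
# while the closure records names somewhere else.
seen = []
passed_through = list(side_effect(vertices, lambda v: seen.append(v["name"])))

print(names)
print(passed_through)
print(seen)
```

The closure passed to side_effect returns None here, yet the vertices still flow through untouched, which is the sense in which its return value is ignored.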

Use Gremlin to find the shortest path in a graph avoiding a given list of vertices?

I need to use Gremlin to find the shortest path between two nodes (vertices) while avoiding a list of given vertices.
I already have:
v.bothE.bothV.loop(2){!it.object.equals(y)}.paths>>1
To get my shortest path.
I was attempting something like:
v.bothE.bothV.filter{it.name!="ignored"}.loop(3){!it.object.equals(y)}.paths>>1
but it does not seem to work.
Please HELP!!!
The second solution you have looks correct. However, to be clear about what you are trying to accomplish: if x and y are the vertices you want to find the shortest path between, and a vertex should be ignored during the traversal if it has the property name:"ignored", then the query is:
x.both.filter{it.name!="ignored"}.loop(2){!it.object.equals(y)}.paths>>1
If the "list of given vertices" you want filtered is actually a list, then the traversal is described as such:
list = [ ... ] // construct some list
x.both.except(list).loop(2){!it.object.equals(y)}.paths>>1
Moreover, I tend to use a range filter just to be safe as this will go into an infinite loop if you forget the >>1 :)
x.both.except(list).loop(2){!it.object.equals(y)}[1].paths>>1
Also, if there is a potential for no path, then to avoid an infinitely long search, you can do a loop limit (e.g. no more than 4 steps):
x.both.except(list).loop(2){!it.object.equals(y) & it.loops < 5}.filter{it.object.equals(y)}.paths>>1
Note why the last filter step before paths is needed: there are two ways the loop can exit, so you might not be at y when it does (you may have left the loop because it.loops < 5 stopped being true rather than because you reached y).
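The combination of an avoid list, a hop limit, and a final check that the goal was actually reached can be sketched in plain Python. This is not a Gremlin pipeline: it uses an ordinary breadth-first search with a visited set, and the toy graph, shortest_path and its parameters are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical undirected toy graph as an adjacency map.
GRAPH = {
    "x": ["a", "b"],
    "a": ["x", "y"],
    "b": ["x", "c"],
    "c": ["b", "y"],
    "y": ["a", "c"],
}


def shortest_path(graph, start, goal, avoid=(), max_steps=4):
    """Breadth-first search that skips avoided vertices.

    Returns the first path that reaches goal, or None if no path exists
    within max_steps hops.
    """
    avoid = set(avoid)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path  # only paths that end at the goal are returned
        if len(path) - 1 >= max_steps:
            continue  # hop limit reached on this branch
        for neighbor in graph[path[-1]]:
            if neighbor not in avoid and neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # search exhausted without reaching the goal


print(shortest_path(GRAPH, "x", "y"))
print(shortest_path(GRAPH, "x", "y", avoid=["a"]))
print(shortest_path(GRAPH, "x", "y", avoid=["a"], max_steps=2))
```

Returning None when the hop limit is hit plays the same role as the final filter step: a branch that stopped because of the limit is never reported as a path to y.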
Here is your solution implemented over the Grateful Dead graph distributed with Gremlin. First some setup code, where we load the graph and define two vertices x and y:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('data/graph-example-2.xml')
==>null
gremlin> x = g.v(89)
==>v[89]
gremlin> y = g.v(100)
==>v[100]
gremlin> x.name
==>DARK STAR
gremlin> y.name
==>BROWN EYED WOMEN
Now your traversal. Note that there is no name:"ignored" property, so instead I altered it to filter on the number of performances of each song along the path. Thus, this finds the shortest path through songs played more than 10 times in concert:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths>>1
==>v[89]
==>v[26]
==>v[100]
If you use Gremlin 1.2+, then you can use a path closure to provide the names of those vertices (for example) instead of just the raw vertex objects:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths{it.name}>>1
==>DARK STAR
==>PROMISED LAND
==>BROWN EYED WOMEN
I hope that helps.
Good luck!
Marko.
