When I query a path, e.g.:
g.V(1).inE().outV().inE().outV().inE().outV().path()
The path() contains both vertices and edges. Is there any way to count only the vertices in the path and ignore the edges?
Gremlin is missing something important to make this really easy to do: it doesn't discern types very well for purposes of filtering, hence TINKERPOP-2234. I've altered your example a bit so that we have something a little trickier to work with:
gremlin> g.V(1).repeat(outE().inV()).emit().path()
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[7][1-knows->2],v[2]]
==>[v[1],e[8][1-knows->4],v[4]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
With repeat() we get variable-length Path instances, so dynamically counting the vertices is a bit trickier than in the fixed example from your question. There, the pattern of the path is known and the count is easy to read straight from the Gremlin itself. With a dynamic number of vertices and without TINKERPOP-2234, you have to get creative. A typical strategy is to filter away the edges by way of some label or property value that is unique to vertices:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold()).count(local)
==>2
==>2
==>2
==>3
==>3
Or perhaps use a property that only edges have, and keep the elements that lack it:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold()).count(local)
==>2
==>2
==>2
==>3
==>3
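Both filters can be sketched outside Gremlin in plain Python to show what map(unfold().<filter>.fold()).count(local) computes. This is only an analogy. It assumes a path is modelled as a list of hypothetical Element tuples (a label plus a property dict), not real TinkerPop Vertex/Edge objects:

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Element(NamedTuple):
    """Hypothetical stand-in for a path element (not a TinkerPop type)."""
    label: str
    properties: dict


def count_matching(path: list[Element], keep: Callable[[Element], bool]) -> int:
    # Roughly map(unfold().<filter>.fold()).count(local): keep matching elements, count them
    return sum(1 for element in path if keep(element))


# A path shaped like [v[1], e[8], v[4], e[10], v[5]] from the modern toy graph
path = [
    Element("person", {"name": "marko"}),
    Element("knows", {"weight": 1.0}),
    Element("person", {"name": "josh"}),
    Element("created", {"weight": 1.0}),
    Element("software", {"name": "ripple"}),
]

# Label-based filter, like hasLabel('person','software')
by_label = count_matching(path, lambda e: e.label in {"person", "software"})
# Property-based filter, like not(has('weight'))
by_missing_weight = count_matching(path, lambda e: "weight" not in e.properties)
print(by_label, by_missing_weight)  # 3 3
```

Both approaches only work if the schema guarantees that the chosen labels or properties really separate vertices from edges.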
If your schema doesn't have properties or labels that allow for this, you could probably use your traversal pattern to come up with some math to figure it out. In my case, I know that the number of vertices in my Path will always be (pathLength + 1) / 2, so:
gremlin> g.V(1).repeat(outE().inV()).emit().path().as('p').math('(p + 1) / 2').by(count(local))
==>2.0
==>2.0
==>2.0
==>3.0
==>3.0
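The arithmetic behind that math() step can be checked with a tiny Python function. It assumes, as repeat(outE().inV()) guarantees here, that every path alternates vertex, edge, vertex, ... and starts and ends on a vertex. The function name is just for illustration:

```python
def vertex_count(path_length: int) -> float:
    """Vertices in a path that alternates vertex/edge and starts and ends on a vertex.

    A path of n elements then holds (n + 1) / 2 vertices and (n - 1) / 2 edges.
    """
    return (path_length + 1) / 2


# Path lengths produced by the repeat(outE().inV()).emit() example: 3, 3, 3, 5, 5
print([vertex_count(n) for n in (3, 3, 3, 5, 5)])  # [2.0, 2.0, 2.0, 3.0, 3.0]
```

If the traversal pattern changes (for example, out() instead of outE().inV(), which leaves edges out of the path), the formula no longer applies.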
Hopefully, one of those ways will inspire you to a solution.
+1 for typeOf predicate support in Gremlin (TINKERPOP-2234).
In addition to Stephan's answer, you can also label the vertices with as() and select only those:
g.V().repeat(outE().inV().as('v')).times(3).select(all,'v')
Also, if the graph provider supports lambdas, you can use {it.class}:
g.V().repeat(outE().inV().as('v')).times(3).path()
.map(unfold().groupCount().by({it.class}))
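The groupCount().by({it.class}) idea amounts to grouping path elements by their runtime type and counting each group. A rough Python analogy, where Vertex and Edge are hypothetical placeholder classes rather than TinkerPop types:

```python
from collections import Counter


class Vertex:  # hypothetical placeholder, not a TinkerPop class
    pass


class Edge:  # hypothetical placeholder, not a TinkerPop class
    pass


# times(3) over outE().inV() yields 4 vertices and 3 edges per path
path = [Vertex(), Edge(), Vertex(), Edge(), Vertex(), Edge(), Vertex()]

# Group by the element's class name and count, like groupCount().by({it.class})
counts = Counter(type(element).__name__ for element in path)
print(counts["Vertex"], counts["Edge"])  # 4 3
```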
I want to traverse a tree and aggregate each parent with its immediate children only. How would I do this using Gremlin and aggregate the result into a list of pairs such as [{parent1, child}, {child, child1}, ...]?
In this case I want to output [{0,1}, {0,2}, {1,8}, {1,6}, {2,7}, {2,9}, {8,16}, {8,14}, {8,15}, {7,17}]
The order isn't important. Also, note that I want to avoid circular edges, which can only exist on a single node (a self-loop); no cycle is possible from a child vertex back to a parent.
Each vertex has the label city and each edge has the label highway.
g.V().hasLabel("city").toList().collect { x -> [x.id(), x.edges(Direction.OUT, "highway").toList()] }
My query is timing out, and I was wondering if there is a faster way to do this. I have about 5000 vertices, and each pair of connected vertices shares only one edge.
You can get close to what you are looking for using the Gremlin tree step while also avoiding Groovy closures. Assuming the following setup:
gremlin> g = traversal().withGraph(TinkerGraph.open())
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
g.addV('0').as('0').
addV('1').as('1').
addV('2').as('2').
addV('6').as('6').
addV('7').as('7').
addV('8').as('8').
addV('9').as('9').
addV('14').as('14').
addV('15').as('15').
addV('16').as('16').
addV('17').as('17').
addE('route').from('0').to('1').
addE('route').from('0').to('2').
addE('route').from('1').to('6').
addE('route').from('1').to('8').
addE('route').from('2').to('2').
addE('route').from('2').to('9').
addE('route').from('2').to('7').
addE('route').from('7').to('17').
addE('route').from('8').to('14').
addE('route').from('8').to('15').
addE('route').from('8').to('16').iterate()
A query can be written to return the tree (minus cycles) as follows:
gremlin> g.V().hasLabel('0').
......1> repeat(out().simplePath()).
......2> until(__.not(out())).
......3> tree().
......4> by(label)
==>[0:[1:[6:[],8:[14:[],15:[],16:[]]],2:[7:[17:[]],9:[]]]]
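If the tree() result reaches your application as nested maps keyed by label, turning it into the requested parent/child pairs is a short recursion. A sketch in Python follows. It assumes the tree has been converted to nested dicts, and tree_to_pairs is a hypothetical helper, not part of any Gremlin driver:

```python
def tree_to_pairs(tree: dict) -> list:
    """Flatten a nested {parent: {child: {...}}} map into (parent, child) pairs."""
    pairs = []
    for parent, children in tree.items():
        # Each immediate child of this parent forms one pair
        pairs.extend((parent, child) for child in children)
        # Then treat every child as a parent one level down
        pairs.extend(tree_to_pairs(children))
    return pairs


# The tree() output above, written as nested dicts keyed by vertex label
tree = {"0": {"1": {"6": {}, "8": {"14": {}, "15": {}, "16": {}}},
              "2": {"7": {"17": {}}, "9": {}}}}

pairs = tree_to_pairs(tree)
print(len(pairs))  # 10
```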
An alternative approach, that also avoids using closures:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold())
==>[17]
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[6]
==>[7,17]
==>[8,14,15,16]
==>[9]
==>[14]
==>[15]
==>[16]
This can be further refined to drop leaf nodes (rows with no children):
gremlin> g.V().local(union(label(),out().simplePath().label()).fold()).where(count(local).is(gt(1)))
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[7,17]
==>[8,14,15,16]
In your code you can then create the final pairs, or perhaps extend the Gremlin to break up the result even more. Hopefully these approaches will prove more efficient than falling back on closures, which are not very portable to other TinkerPop implementations that do not support in-line code.
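Creating the final pairs from the refined rows is straightforward on the client side. The first element of each row is the parent and the rest are its children. A Python sketch, assuming the rows have arrived as plain lists of labels; rows_to_pairs is a hypothetical helper:

```python
def rows_to_pairs(rows: list) -> list:
    # First element of each row is the parent; the remaining elements are its children
    return [(row[0], child) for row in rows for child in row[1:]]


# The output of the where(count(local).is(gt(1))) query above
rows = [["0", "1", "2"], ["1", "6", "8"], ["2", "9", "7"], ["7", "17"], ["8", "14", "15", "16"]]

pairs = rows_to_pairs(rows)
print(len(pairs))  # 10
```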
I want to do the following:
Traverse part of the graph
Take a property from the first traversal
Put it into another traversal as a filter
Get the filtered value
When I run the following in the Gremlin console:
g = TinkerGraph.open().traversal()
g.addV('a').property(id, 1).property('b',2)
g.addV('a').property(id, 2).property('b',2).property('c',3)
g.V(2).properties().key().limit(1).as('q').select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
I get:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('a').property(id, 1).property('b',2)
==>v[1]
gremlin> g.addV('a').property(id, 2).property('b',2).property('c',3)
==>v[2]
gremlin> g.V(2).properties().key().limit(1).as('q').select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
gremlin>
So I can see that:
My first traversal gets the property key 'b'
Filtering with the literal 'b' works
Using select('q') to filter by 'b' does not work.
So the question is: how do I use a value from one part of a traversal as a filter in another part, as in the case above?
My use case is that I have a prototype vertex. I want to grab all its properties (and maybe values) and find all vertices that are similar to that prototype.
Another alternative is to store a query inside a property of the prototype, read it, and evaluate it to get the vertices it filters.
I know that I can join strings on the application side, but I want to stay within the lambda-free part of Gremlin for proper provider portability.
UPDATE:
Example from official documentation:
gremlin> firstYear = g.V().hasLabel('person').
local(properties('location').values('startTime').min()).
max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
valueMap().as('times').
select('person','location','times').by('name').by(value).by().toList()
How can I use firstYear without holding it in a console variable, and instead reference it from within the query?
I see your question was answered on the Gremlin Users list. [1] Copying the answer here for others that may search for the same question.
What you're looking for is:
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(eq('q'))
See documentation for the Where Step to learn about the different usage patterns of where.
[1] https://groups.google.com/forum/#!topic/gremlin-users/f1NfwUw9ZVI
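Roughly, where(eq('q')) keeps the current object only if it equals whatever was stored under the step label 'q'. A loose Python analogy, where lists stand in for the traversal (this does not model Gremlin's full semantics) and the key lists are hypothetical stand-ins for the question's data:

```python
# Hypothetical stand-ins for the property keys of v[2] and v[1] in the question
keys_v2 = ["b", "c"]
keys_v1 = ["b"]

q = keys_v2[0]                                  # ~ properties().key().limit(1).as('q')
matches = [key for key in keys_v1 if key == q]  # ~ where(eq('q'))
print(matches)  # ['b']
```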
I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following:
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.V().has('spouse') and create edges individually?
You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal, depending on the size of your data it could be expensive, because I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find it faster to take an approach that ensures an index is used, which might mean issuing multiple queries to solve the problem.
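The cost difference can be illustrated in plain Python. The first function has the same shape as the single traversal: every person with a spouse is compared against every person. The second builds a name lookup once, which is loosely what an index gives a graph database. The dict-based people list and both function names are hypothetical illustrations, not Gremlin:

```python
people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]


def spouse_pairs_scan(people):
    # Same shape as V().has('spouse').as('s').V().as('x').where('x', eq('s'))...:
    # each person with a 'spouse' is compared against every person (quadratic work)
    return [
        (s["name"], x["name"])
        for s in people
        if "spouse" in s
        for x in people
        if x["name"] == s["spouse"]
    ]


def spouse_pairs_lookup(people):
    # Build a name -> person map once, then resolve each spouse with one lookup
    by_name = {p["name"]: p for p in people}
    return [
        (p["name"], by_name[p["spouse"]]["name"])
        for p in people
        if "spouse" in p and p["spouse"] in by_name
    ]


print(spouse_pairs_scan(people))    # [('bob', 'carol'), ('dave', 'alice')]
print(spouse_pairs_lookup(people))  # [('bob', 'carol'), ('dave', 'alice')]
```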
For example, using TinkerPop's toy graph data (graph = TinkerFactory.createModern()), I want to do something like the following:
g.V().hasLabel('person').has('name', 'marko').project('a', 'b').by().by(...)
I want to use a property of the vertices from the first traversal in the query inside the second by().
Something like this pseudocode:
by(__.V().has(hasLabel('person').has('name', [property-from-first-traversal])))
This might be easier to do in separate queries, but I want to do it in one query - something like a Subquery in SQL.
You're probably looking for something like this:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.addV('person').property('name','marko')
==>v[13]
gremlin> g.V().has('person','name', 'marko').
project('a', 'b').
by().
by(__.as('x').V().hasLabel('person').where(eq('x')).by('name').count())
==>[a:v[1],b:2]
==>[a:v[13],b:2]
However, be careful with where() filters: thus far no provider (that I am aware of) will turn this into an index lookup, so it will be a scan over all person vertices in your graph.
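To see why the scan matters, compare it with counting every name once up front, which is roughly what an index-backed lookup would spare you. A Python sketch: the (id, name) pairs are hypothetical stand-ins for the person vertices of the modern graph plus the extra marko added above, and the dict output only imitates the shape of project('a','b'):

```python
from collections import Counter

# Hypothetical (id, name) pairs: modern-graph persons plus the extra "marko" (v[13])
persons = [(1, "marko"), (2, "vadas"), (4, "josh"), (6, "peter"), (13, "marko")]

# Count each name once, instead of rescanning all persons for every vertex
name_counts = Counter(name for _, name in persons)

result = [{"a": vid, "b": name_counts[name]} for vid, name in persons if name == "marko"]
print(result)  # [{'a': 1, 'b': 2}, {'a': 13, 'b': 2}]
```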
Total n00b here when it comes to Gremlin, but would someone mind telling me the difference between a "transform" and a "sideEffect"? I read the Gremlin docs, and both seem to "at times" take inputs, massage the data, and produce outputs.
I thought I was on to something when it looked like "transforms" were mostly "getters", but then why would it be called "transform"? And then I saw that they put functions under transform that did much more than just get data.
Any help is appreciated, thanks in advance!
A transform behaves as a map function, where the value in the Gremlin pipeline is literally transformed into a different value:
gremlin> g.V.transform{it.name}
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
Note that in the example above, we've gone from a Vertex before the transform step to the string value of the name property on that Vertex. Now watch what happens with sideEffect:
gremlin> g.V.sideEffect{it.name}
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
The Vertex passes through sideEffect unchanged (i.e. it exits the sideEffect as a Vertex, just as it went in). The point of a sideEffect is that you can do something at that step of the pipeline that does "something" unrelated to the processing of the pipeline itself. In other words, the return value of the sideEffect closure is basically ignored.
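The same distinction can be sketched with two plain Python helpers. Both names are hypothetical and only mimic the behaviour described above. In transform, the function's return value replaces each item. In side_effect, the function runs for each item but its return value is discarded, and the original item passes through unchanged:

```python
def transform(items, fn):
    # Each item is replaced by fn(item), like a map step
    return [fn(item) for item in items]


def side_effect(items, fn):
    # fn runs for each item, but its return value is ignored;
    # the original item is what continues down the pipeline
    passed = []
    for item in items:
        fn(item)
        passed.append(item)
    return passed


vertices = [{"id": 1, "name": "marko"}, {"id": 2, "name": "vadas"}]

names = transform(vertices, lambda v: v["name"])
seen = []  # the "something" a side effect might do, e.g. collecting values elsewhere
passed = side_effect(vertices, lambda v: seen.append(v["name"]))
print(names)             # ['marko', 'vadas']
print(passed == vertices, seen)  # True ['marko', 'vadas']
```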