How can I write a query in the Gremlin console that returns the pairs of vertices that have parallel edges between them?

I would like to translate this Cypher query to Gremlin:
(n:Person)-[:friend]->(t:Person)-[:friend]->(n:Person)
Thanks

Using the air-routes data set, one way to do this is to use the cyclicPath step as follows.
gremlin> g.V('44').outE().inV().outE().inV().cyclicPath().path()
==>[v[44],e[5019][44-route->8],v[8],e[3975][8-route->44],v[44]]
==>[v[44],e[5020][44-route->13],v[13],e[4158][13-route->44],v[44]]
==>[v[44],e[5021][44-route->20],v[20],e[4387][20-route->44],v[44]]
==>[v[44],e[5022][44-route->31],v[31],e[4736][31-route->44],v[44]]
gremlin> g.V('44').outE().inV().outE().inV().cyclicPath().path().by('code').by()
==>[SAF,e[5019][44-route->8],DFW,e[3975][8-route->44],SAF]
==>[SAF,e[5020][44-route->13],LAX,e[4158][13-route->44],SAF]
==>[SAF,e[5021][44-route->20],PHX,e[4387][20-route->44],SAF]
==>[SAF,e[5022][44-route->31],DEN,e[4736][31-route->44],SAF]
Or, if you just want the edge IDs:
gremlin> g.V('44').outE().inV().outE().inV().cyclicPath().path().by('code').by(id)
==>[SAF,5019,DFW,3975,SAF]
==>[SAF,5020,LAX,4158,SAF]
==>[SAF,5021,PHX,4387,SAF]
==>[SAF,5022,DEN,4736,SAF]
Another way to write this query involves a where step:
gremlin> g.V('44').as('a').outE().inV().outE().inV().where(eq('a')).path().by('code').by()
==>[SAF,e[5019][44-route->8],DFW,e[3975][8-route->44],SAF]
==>[SAF,e[5020][44-route->13],LAX,e[4158][13-route->44],SAF]
==>[SAF,e[5021][44-route->20],PHX,e[4387][20-route->44],SAF]
==>[SAF,e[5022][44-route->31],DEN,e[4736][31-route->44],SAF]
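To make explicit what both traversals compute, here is a minimal Python sketch that works on a plain edge list instead of a graph database. The vertex and edge IDs are copied from the console output above; the function name reciprocal_pairs is hypothetical. It only illustrates the two-hop "out and back" logic and is not how TinkerPop implements cyclicPath() or where().

```python
# Edges copied from the console output above: (edge_id, out_vertex, in_vertex).
edges = [
    (5019, 44, 8), (3975, 8, 44),
    (5020, 44, 13), (4158, 13, 44),
    (5021, 44, 20), (4387, 20, 44),
    (5022, 44, 31), (4736, 31, 44),
]

def reciprocal_pairs(edges, start):
    """Return (start, e1, mid, e2, start) for every two-hop walk start -> mid -> start."""
    # Index outgoing edges by their source vertex.
    out = {}
    for eid, src, dst in edges:
        out.setdefault(src, []).append((eid, dst))
    results = []
    for e1, mid in out.get(start, []):
        for e2, end in out.get(mid, []):
            # Keep only walks that come back to the vertex they started from.
            if end == start:
                results.append((start, e1, mid, e2, start))
    return results

paths = reciprocal_pairs(edges, 44)
for p in paths:
    print(p)
```

Each tuple has the same shape as a row of the path() output: start vertex, outgoing edge, neighbour, returning edge, start vertex.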


Gremlin recursive graph traversal with parent and child relationship

I want to traverse a tree and aggregate each parent with its immediate children only. How would I do this using Gremlin and aggregate the result into a list structure such as arrayOf({parent1,child},{child, child1}...)?
In this case I want to output [{0,1}, {0,2}, {1,8}, {1,6}, {2,7}, {2,9}, {8,16}, {8,14}, {8,15}, {7,17}].
The order isn't important. Also, note that I want to avoid any circular edges, which can exist only on the same node (no circular loop is possible from a child vertex to a parent).
Each vertex has the label city and each edge has the label highway.
g.V().hasLabel("city").toList().map(x->x.id()+x.edges(Direction.OUT,"highway").collect(Collectors.toList())
My query is timing out and I was wondering if there is a faster way to do this. I have about 5000 vertices, and any two vertices are connected by only one edge.
You can get close to what you are looking for using the Gremlin tree step while also avoiding Groovy closures. Assuming the following setup:
gremlin> g = traversal().withGraph(TinkerGraph.open())
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
g.addV('0').as('0').
addV('1').as('1').
addV('2').as('2').
addV('6').as('6').
addV('7').as('7').
addV('8').as('8').
addV('9').as('9').
addV('14').as('14').
addV('15').as('15').
addV('16').as('16').
addV('17').as('17').
addE('route').from('0').to('1').
addE('route').from('0').to('2').
addE('route').from('1').to('6').
addE('route').from('1').to('8').
addE('route').from('2').to('2').
addE('route').from('2').to('9').
addE('route').from('2').to('7').
addE('route').from('7').to('17').
addE('route').from('8').to('14').
addE('route').from('8').to('15').
addE('route').from('8').to('16').iterate()
A query can be written to return the tree (minus cycles) as follows:
gremlin> g.V().hasLabel('0').
......1> repeat(out().simplePath()).
......2> until(__.not(out())).
......3> tree().
......4> by(label)
==>[0:[1:[6:[],8:[14:[],15:[],16:[]]],2:[7:[17:[]],9:[]]]]
An alternative approach, that also avoids using closures:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold())
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[6]
==>[7,17]
==>[8,14,15,16]
==>[9]
==>[14]
==>[15]
==>[16]
==>[17]
This can be further refined to skip leaf-only nodes:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold()).where(count(local).is(gt(1)))
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[7,17]
==>[8,14,15,16]
In your code you can then create the final pairs or perhaps extend the Gremlin to break up the result even more. Hopefully these approaches will prove more efficient than falling back onto closures (which are not going to be very portable to other TinkerPop implementations that do not support in-line code).
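As a rough illustration of creating the final pairs in application code, here is a minimal Python sketch. The rows are copied from the refined query output above, where the first element of each list is the parent and the rest are its children; the function name to_pairs is hypothetical.

```python
# Folded results copied from the refined query above: [parent, child, child, ...].
rows = [
    ["0", "1", "2"],
    ["1", "6", "8"],
    ["2", "9", "7"],
    ["7", "17"],
    ["8", "14", "15", "16"],
]

def to_pairs(rows):
    """Expand each [parent, child, ...] row into (parent, child) tuples."""
    pairs = []
    for parent, *children in rows:
        for child in children:
            pairs.append((parent, child))
    return pairs

pairs = to_pairs(rows)
print(pairs)
```

The self-loop on vertex 2 never appears as a pair because simplePath() already removed it in the traversal, so the client code does not need to check for it.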

Why is this gremlin query with select not returning any result but without select it works?

I have a linked list B -> C -> G
It's created in the TinkerPop Console with
graph = TinkerGraph.open()
g = traversal().withEmbedded(graph)
g.addV('TreeNodeEntity').property(single, 'Name', 'B').as('l1')
g.addV('TreeNodeEntity').property(single, 'Name', 'C').as('l1').addE('PreviousSiblingEntity').to(__.V().has('Name', 'B'))
g.addV('TreeNodeEntity').property(single, 'Name', 'G').as('l1').addE('PreviousSiblingEntity').to(__.V().has('Name', 'C'))
I am trying to get all siblings of B.
The following Gremlin script returns C and G as I expected:
g.V().
  has('Name', 'B').
  as('l1').
  repeat(__.
    bothE('PreviousSiblingEntity').
    otherV().
    simplePath()
  ).
  emit().
  valueMap()
But the following script doesn't give me any value.
g.V().
  has('Name', 'B').
  as('l1').
  select('l1').
  repeat(__.
    bothE('PreviousSiblingEntity').
    otherV().
    simplePath()
  ).
  emit().
  valueMap()
Background: I want to do a .inE('ParentEntity').otherV().as('l2') between as('l1') and select('l1').
Can you give me a hint why the second script doesn't give me a result?
There is a difference between a Gremlin Path (i.e. TinkerPop's Path object) and the path in the graph that one traverses. I think you're expecting simplePath() to work on the latter, when it is operating on the former.
The traverser is transformed as it passes through the steps of your traversal. The history of those transformations is the Path. You can see that history with the path() step:
gremlin> g.V().out().path()
==>[v[2],v[0]]
==>[v[5],v[2]]
The Path is more than just graph elements though and can contain other things:
gremlin> g.V().out().values('Name').path()
==>[v[2],v[0],B]
==>[v[5],v[2],C]
Given that definition, simplePath() looks for situations where an element in that Path is repeated and filters those traversers away. By doing g.V().has('Name', 'B').as('l1').select('l1') you immediately create such a repetition, and therefore the traverser is filtered away:
gremlin> g.V().has('Name', 'B').as('l1').select('l1').path()
==>[v[0],v[0]]
gremlin> g.V().has('Name', 'B').as('l1').select('l1').simplePath()
gremlin>
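The rule can be modelled in a few lines of Python. This is only a sketch of the idea described above (a traverser survives when no object occurs twice in its history), not TinkerPop's implementation; the function name is_simple is hypothetical and the histories are written as plain strings copied from the console output.

```python
def is_simple(path):
    """True when no object appears more than once in the traverser's history."""
    return len(set(path)) == len(path)

# Histories copied from the console output above, written as strings.
after_select = ["v[0]", "v[0]"]  # as('l1').select('l1') records v[0] twice
after_out = ["v[2]", "v[0]"]     # a row of g.V().out().path()

print(is_simple(after_select))  # False: simplePath() would drop this traverser
print(is_simple(after_out))     # True: this traverser would survive
```

Because select('l1') adds B to the history a second time before repeat() even starts, every traverser entering the loop already fails this check.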

Use property of one part of graph traverse as filter for other

I want to do the following:
Traverse part of the graph
Take a property from the first traversal
Use it as a filter in another traversal
Get the filtered value
When I run the following in the Gremlin console:
g = TinkerGraph.open().traversal()
g.addV('a').property(id, 1).property('b',2)
g.addV('a').property(id, 2).property('b',2).property('c',3)
g.V(2).properties().key().limit(1).as('q').select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
I get:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('a').property(id, 1).property('b',2)
==>v[1]
gremlin> g.addV('a').property(id, 2).property('b',2).property('c',3)
==>v[2]
gremlin> g.V(2).properties().key().limit(1).as('q').select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
gremlin>
So I can see that:
My first traversal gets the property key 'b'
Filtering with the literal 'b' directly works
Filtering with select('q') does not work
So the question is: how do I use a value from one part of a traversal as a filter in another part, in the case described above?
My use case is that I have a prototype vertex. I want to grab all its properties (and maybe their values) and find all vertices that are similar to that prototype.
Another alternative is to store a query inside a property of the prototype, then read and evaluate it to get the vertices it matches.
I know that I can join strings on the application side, but I want to stay within the code-free part of Gremlin for proper provider portability.
UPDATE:
Example from official documentation:
gremlin> firstYear = g.V().hasLabel('person').
local(properties('location').values('startTime').min()).
max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
valueMap().as('times').
select('person','location','times').by('name').by(value).by().toList()
How can I use firstYear without a console variable, referencing it from within the query instead?
I see your question was answered on the Gremlin Users list. [1] Copying the answer here for others that may search for the same question.
What you're looking for is:
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(eq('q'))
See documentation for the Where Step to learn about the different usage patterns of where.
[1] https://groups.google.com/forum/#!topic/gremlin-users/f1NfwUw9ZVI

How do I make the following subquery in Gremlin?

For example, using TinkerPop's toy graph data (graph = TinkerFactory.createModern()), I want to do something like the following:
g.V().hasLabel('person').has('name', 'marko').project('a', 'b').by().by(...)
I want to use a property of the vertices from the first traversal and use that in the query inside second by().
Something like this pseudocode:
by(__.V().has(hasLabel('person').has('name', [property-from-first-traversal])))
This might be easier to do in separate queries, but I want to do it in one query - something like a Subquery in SQL.
You're probably looking for something like this:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.addV('person').property('name','marko')
==>v[13]
gremlin> g.V().has('person','name', 'marko').
project('a', 'b').
by().
by(__.as('x').V().hasLabel('person').where(eq('x')).by('name').count())
==>[a:v[1],b:2]
==>[a:v[13],b:2]
However, be careful with where() filters: thus far no provider (that I am aware of) will turn this into an index lookup, so it will be a scan over all person vertices in your graph.
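To show what the second by() computes and why it amounts to a scan, here is a minimal Python sketch over a plain list. It assumes the four person vertices of the modern toy graph (marko, vadas, josh, peter with IDs 1, 2, 4, 6) plus the extra marko added above as v[13]; the function name same_name_counts is hypothetical.

```python
# Person vertices as (id, name): the modern toy graph's persons plus v[13].
persons = [(1, "marko"), (2, "vadas"), (4, "josh"), (6, "peter"), (13, "marko")]

def same_name_counts(persons, name):
    """For each person called `name`, count all persons sharing that name."""
    results = []
    for vid, vname in persons:
        if vname != name:
            continue
        # Inner full scan over every person, comparable to
        # V().hasLabel('person').where(eq('x')).by('name').count()
        count = sum(1 for _, other in persons if other == vname)
        results.append({"a": vid, "b": count})
    return results

results = same_name_counts(persons, "marko")
print(results)
```

The inner loop visits every person for each matching vertex, which is the cost to expect when the where() filter is not backed by an index.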

Number of nodes/edges in a large graph via Gremlin?

What is the easiest & most efficient way to count the number of nodes/edges in a large graph via Gremlin? The best I have found is using the V iterator:
gremlin> g.V.gather{it.size()}
However, this is not a viable option for large graphs, per the documentation for V:
The vertex iterator for the graph. Utilize this to iterate through all
the vertices in the graph. Use with care on large graphs unless used
in combination with a key index lookup.
I think the preferred way to do a count of all vertices would be:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.V.count()
==>6
gremlin> g.E.count()
==>6
That said, I think that on a very large graph g.V/g.E just breaks down no matter what you do. On a very large graph the best option for doing a count is to use a tool like Faunus (http://thinkaurelius.github.io/faunus/) so that you can leverage the power of Hadoop to do the counts in parallel.
UPDATE: The original answer above was for TinkerPop 2.x. For TinkerPop 3.x the answer is largely the same and implies use of Gremlin Spark or some provider specific tooling (like DSE GraphFrames for DataStax Graph) that is optimized to do those kinds of large scale traversals.
I tried the above and it didn't work for me. For some of you, this may work:
gremlin> g.V.count()
{"detailedMessage":"Query parsing failed at line 1, character position at 3, error message : no viable alternative at input 'g.V.'","code":"MalformedQueryException","requestId":"99f749db-c240-9834-aa12-e17bb21e598e"}
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> g.V().count()
==>37
gremlin> g.E().count()
==>45
gremlin>
Use g.V().count() instead of g.V.count() (for those for whom the other command errors out).
Via Python:
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
# Connect to a Gremlin Server; adjust the URI for your deployment.
graph = Graph()
graph_db_uri = 'ws://localhost/gremlin'
g = graph.traversal().withRemote(DriverRemoteConnection(graph_db_uri, 'g'))
# Count the vertices and edges that carry a given label.
count = g.V().hasLabel('node_label').count().next()
print("vertex count: ", count)
count = g.E().hasLabel('edge_label').count().next()
print("edge count: ", count)
