My graph contains some "Person" nodes that "ContributedTo" some "Conversation" nodes. I want to write a Gremlin query that will create "TalkedWith" edges directly between "Person" nodes. Each edge should have a property "countConversations" that shows how many conversations both persons contributed to.
Is it possible to do this with one Gremlin query for all "Person" nodes at once?
Here's my graph setup (using Gremlin console):
g = TinkerGraph.open().traversal()
g.addV("Person").as("p1").
addV("Person").as("p2").
addV("Person").as("p3").
addV("Person").as("p4").
addV("Person").as("p5").
addV("Conversation").as("c1").
addV("Conversation").as("c2").
addV("Conversation").as("c3").
addE("ContributedTo").from("p1").to("c1").
addE("ContributedTo").from("p2").to("c1").
addE("ContributedTo").from("p3").to("c1").
addE("ContributedTo").from("p1").to("c2").
addE("ContributedTo").from("p2").to("c2").
addE("ContributedTo").from("p3").to("c2").
addE("ContributedTo").from("p4").to("c2").
addE("ContributedTo").from("p5").to("c2").
addE("ContributedTo").from("p1").to("c3").
addE("ContributedTo").from("p3").to("c2")
What I want to do is create "TalkedWith" edges like this:
addE("TalkedWith").from("p1").to("p2").property("countConversations",2)
I wrote a query to count how many conversations a specific person had with other persons:
g.V(0L).out("ContributedTo").in("ContributedTo").
hasId(without(0L)).groupCount().order(local).by(values,desc).next()
Now I want to run this calculation for each person and create "TalkedWith" edges.
Here's one way to do it:
gremlin> g.V(0L).hasLabel('Person').
......1> store('p').
......2> out('ContributedTo').
......3> in('ContributedTo').
......4> where(without('p')).
......5> groupCount().
......6> unfold().
......7> addE('TalkedWith').from(select('p').unfold()).to(select(keys)).
......8> property('countConversations',select(values))
==>e[18][0-TalkedWith->1]
==>e[19][0-TalkedWith->2]
==>e[20][0-TalkedWith->3]
==>e[21][0-TalkedWith->4]
gremlin> g.E().hasLabel('TalkedWith').valueMap()
==>[countConversations:2]
==>[countConversations:3]
==>[countConversations:1]
==>[countConversations:1]
Given the progress you showed in your question, I'll assume you follow everything up to the groupCount() at line 5. At that point, we have a Map of the people v[0] talked to and the number of times they spoke. The next line deconstructs that Map into its component entries and iterates over them, creating an edge for each with addE(). The from vertex is gathered from "p", where it was originally stored in a List, and the to vertex is extracted from the key of the current entry in the count map. The "countConversations" property then gets its value from the value of that entry.
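To make the counting logic concrete outside of Gremlin, here is a minimal plain-Python sketch of what out().in().groupCount() computes when it is run for every person rather than only v[0]. It is not Gremlin and does not touch a graph database; the function name count_shared_conversations and the tuple representation of the edges are assumptions made for illustration. The edge list is copied from the setup script in the question, including the repeated p3-to-c2 edge.

```python
from collections import Counter, defaultdict

# ContributedTo edges from the question's setup script, as (person, conversation).
# ("p3", "c2") appears twice because the script adds that edge twice.
contributed_to = [
    ("p1", "c1"), ("p2", "c1"), ("p3", "c1"),
    ("p1", "c2"), ("p2", "c2"), ("p3", "c2"), ("p4", "c2"), ("p5", "c2"),
    ("p1", "c3"), ("p3", "c2"),
]


def count_shared_conversations(edges):
    """Return {person: Counter({other: count})}, mimicking out().in().groupCount()."""
    by_conversation = defaultdict(list)
    for person, conversation in edges:
        by_conversation[conversation].append(person)

    counts = defaultdict(Counter)
    for person, conversation in edges:          # out('ContributedTo')
        for other in by_conversation[conversation]:  # in('ContributedTo')
            if other != person:                 # where(without('p'))
                counts[person][other] += 1      # groupCount()
    return counts


shared = count_shared_conversations(contributed_to)

# One (from, to, countConversations) triple per would-be TalkedWith edge.
# Note that this yields an edge in each direction for every pair.
talked_with = [(a, b, n) for a, others in shared.items() for b, n in others.items()]
```

For p1 this gives the same counts as the console output above (2, 3, 1 and 1 for p2 through p5).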
What would be the best query to get the heaviest path between two nodes in a directed graph in Gremlin?
There are multiple paths, and sometimes the longest path is not the heaviest.
Each edge (not node) has an integer attribute (weight), where 0 <= weight <= 12.
Thanks.
In general, the sack step can be used for such calculations. Using the air-routes data set, the query below finds the longest 3-hop routes between two airports, using the dist property on the edges to calculate the weights. Notice that I limit my query to only find a certain number of results and use loops to specify the maximum depth I am interested in searching. Without such constraints, queries like this can run for a very long time in a highly connected graph.
gremlin> g.withSack(0).
......1> V('3').
......2> repeat(outE().sack(sum).by('dist').inV().simplePath()).
......3> until(has('code','AGR').or().loops().is(4)).
......4> has('code','AGR').
......5> limit(5).
......6> order().
......7> by(sack(),desc).
......8> local(union(path().by('code').by('dist'),sack()).fold())
==>[[AUS,4901,LHR,4479,BOM,644,AGR],10024]
==>[[AUS,5294,FRA,4080,BOM,644,AGR],10018]
==>[[AUS,1520,JFK,7782,BOM,644,AGR],9946]
==>[[AUS,1500,EWR,7790,BOM,644,AGR],9934]
==>[[AUS,1357,YYZ,7758,BOM,644,AGR],9759]
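As a rough illustration of the same idea without Gremlin, the sketch below carries a running total along each simple path (the role the sack plays above) and stops at a maximum number of hops. The graph, its weights, and the function name heaviest_paths are all made up for this example; they are not part of the air-routes data set.

```python
def heaviest_paths(graph, start, goal, max_hops):
    """Return (total_weight, path) pairs for simple paths start -> goal, heaviest first."""
    results = []

    def walk(node, path, total):
        if node == goal:
            results.append((total, path))
            return
        if len(path) - 1 >= max_hops:   # depth limit, like loops().is(n)
            return
        for neighbour, weight in graph.get(node, []):
            if neighbour not in path:   # simplePath()
                walk(neighbour, path + [neighbour], total + weight)

    walk(start, [start], 0)
    return sorted(results, reverse=True)


# Hypothetical directed graph: node -> list of (neighbour, weight) pairs.
example_graph = {
    "A": [("B", 3), ("C", 12)],
    "B": [("C", 5), ("D", 12)],
    "C": [("D", 1)],
}

ranked = heaviest_paths(example_graph, "A", "D", max_hops=3)
```

In this toy graph the heaviest path is A, B, D with a total of 15, while the longest path A, B, C, D only totals 9, which is the situation described in the question.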
I am trying to write a query that retrieves all paths that are reachable from a specified vertex. In other words I am trying to retrieve the entire cluster/sub-graph that the vertex is connected to. A couple more constraints on the query are:
Inward edges should be traversed and included in the result (I am looking for all paths that are in any way connected to the root vertex).
The search must stop at a specified depth of, say, 10 hops from the root vertex.
Bonus constraint: I would prefer the result not to include paths which are complete sub-paths of other paths returned in the result.
I currently have the following two queries, which appear to work as expected on the small toy graphs I have tested them on. However, there seem to be some edge cases in our large production graph where they do not return all the paths/edges/vertices I would expect, and I cannot explain why. The two queries also sometimes return different vertices from each other.
I would prefer a fresh view on how to approach this query, rather than trying to adjust what I currently have, so please try to provide a solution before looking at my current solution below.
Query 1:
g.V(uid).repeat(bothE().bothV().simplePath()).until(loops().is_(10)).emit().dedup().path().by(valueMap(True))
Query 2:
g.V(uid).repeat(bothE().bothV().simplePath()).until(bothE().simplePath().count().is_(0).or_().loops().is_(10)).dedup().path().by(valueMap(True))
Using this simple binary tree as a test graph:
g.addV('root').property('data',9).as('root').
addV('node').property('data',5).as('b').
addV('node').property('data',2).as('c').
addV('node').property('data',11).as('d').
addV('node').property('data',15).as('e').
addV('node').property('data',10).as('f').
addV('node').property('data',1).as('g').
addV('node').property('data',8).as('h').
addV('node').property('data',22).as('i').
addV('node').property('data',16).as('j').
addV('node').property('data',7).as('k').
addV('node').property('data',51).as('l').
addV('node').property('data',13).as('m').
addV('node').property('data',4).as('n').
addE('left').from('root').to('b').
addE('left').from('b').to('c').
addE('right').from('root').to('d').
addE('right').from('d').to('e').
addE('right').from('e').to('i').
addE('left').from('i').to('j').
addE('left').from('d').to('f').
addE('right').from('b').to('h').
addE('left').from('h').to('k').
addE('right').from('i').to('l').
addE('left').from('e').to('m').
addE('right').from('c').to('n').
addE('left').from('c').to('g').iterate()
We could find all the paths using
gremlin> g.V().hasLabel('root').
......1> repeat(bothE().otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> path().
......4> by('data').
......5> by(label)
==>[9,right,11,left,10]
==>[9,left,5,left,2,left,1]
==>[9,left,5,left,2,right,4]
==>[9,left,5,right,8,left,7]
==>[9,right,11,right,15,left,13]
==>[9,right,11,right,15,right,22,left,16]
==>[9,right,11,right,15,right,22,right,51]
Note that I used bothE().otherV() because you said that in your case you may have incoming edges as well as outgoing ones.
We could also use the subgraph step to return the whole subgraph containing both vertices and edges. This example starts at the vertex with the value 5; because bothE() follows edges in both directions, the result covers the entire connected tree rather than only the subtree below that vertex.
gremlin> g.V().has('data',5).
......1> repeat(bothE().subgraph('sg').otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> cap('sg')
==>tinkergraph[vertices:14 edges:13]
Note that both of these approaches assume that all paths end at leaf nodes. I left out the loops() test but you can add that in as needed.
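For readers who want to check the expected result independently of a graph database, here is a small plain-Python sketch of the same idea with the depth limit added: follow edges in both directions, never revisit a vertex on the current path, and only record a path when it cannot be extended or the depth limit is reached. Recording only at those points means no returned path is a prefix of another. The function name maximal_paths is an assumption, the edge list is the binary tree above keyed by its data values, and the sketch assumes at most one edge between any two vertices.

```python
from collections import defaultdict

# The test tree above, with each vertex identified by its 'data' value.
tree_edges = [
    (9, 5), (5, 2), (9, 11), (11, 15), (15, 22), (22, 16), (11, 10),
    (5, 8), (8, 7), (22, 51), (15, 13), (2, 4), (2, 1),
]


def maximal_paths(edges, start, max_depth):
    """Return simple paths from start that end at a dead end or at max_depth hops."""
    neighbours = defaultdict(list)
    for a, b in edges:                  # treat every edge as traversable both ways
        neighbours[a].append(b)
        neighbours[b].append(a)

    found = []

    def walk(path):
        unvisited = [n for n in neighbours[path[-1]] if n not in path]
        if not unvisited or len(path) - 1 >= max_depth:
            found.append(path)          # cannot or must not go further
            return
        for n in unvisited:
            walk(path + [n])

    walk([start])
    return found


paths_from_root = maximal_paths(tree_edges, 9, max_depth=10)
```

With a depth of 10 this yields the same seven vertex sequences as the path() output above; lowering max_depth truncates the paths instead of dropping them.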
I'm traversing a graph and need to obtain all leaf nodes sorted by an order property present on the edges.
I was only able to get the leaf vertices ordered by the last edge with the following query:
g.V('n1').repeat(out()).emit().order().by(inE().values('order'))
Graph Example
In this example, I would like to get something like [n3, n4, n5, n3] or a map with the accumulated order for every vertex ([n3: 1.1, n3: 3, n4: 1.2, n5: 2])
Your picture was nice but when asking questions about Gremlin it is best to add a small sample data script like this one:
g.addV().property(id,'n1').as('n1').
addV().property(id,'n2').as('n2').
addV().property(id,'n3').as('n3').
addV().property(id,'n4').as('n4').
addV().property(id,'n5').as('n5').
addV().property(id,'n6').as('n6').
addE('next').from('n1').to('n3').property('order',3).
addE('next').from('n1').to('n2').property('order',1).
addE('next').from('n1').to('n6').property('order',2).
addE('next').from('n2').to('n3').property('order',1).
addE('next').from('n2').to('n4').property('order',2).
addE('next').from('n2').to('n5').property('order',3)
I opted to provide an answer that gives a "map with the accumulated order for every vertex". It is not quite a "map", because the "n3" key could not be used twice. Instead, it returns pairs where the first item is the leaf vertex and the second item is a list of the "order" values traversed along the way:
gremlin> g.V('n1').
......1> repeat(outE().inV()).
......2> emit(__.not(outE())).
......3> path().
......4> map(union(tail(local,1),
......5> unfold().values('order').fold()).
......6> fold())
==>[v[n3],[3]]
==>[v[n6],[2]]
==>[v[n3],[1,1]]
==>[v[n4],[1,2]]
==>[v[n5],[1,3]]
You can see I provided a condition to emit() so that it only produces the leaf vertices. I then transform the path() traversed using a map(), which grabs the leaf vertex with tail(local,1) for the first item in the pair and then unfolds the Path to collect the "order" values for the second item in the pair.
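If it helps to see the pairs built step by step, here is a plain-Python sketch that walks the sample data above, collects the "order" values along each route to a leaf, and then sorts the pairs by that list. The sorting step is my own addition to show one way the accumulated orders could be used for ordering; the function name leaves_with_orders is an assumption, and the sketch assumes the graph has no cycles.

```python
# The 'next' edges from the sample script, as (from, to, order) triples.
next_edges = [
    ("n1", "n3", 3), ("n1", "n2", 1), ("n1", "n6", 2),
    ("n2", "n3", 1), ("n2", "n4", 2), ("n2", "n5", 3),
]


def leaves_with_orders(edges, start):
    """Return (leaf, [order, ...]) pairs for every route from start to a leaf."""
    out_edges = {}
    for src, dst, order in edges:
        out_edges.setdefault(src, []).append((dst, order))

    results = []

    def walk(vertex, orders):
        if vertex not in out_edges:     # no outgoing edges, so this is a leaf
            results.append((vertex, orders))
            return
        for dst, order in out_edges[vertex]:
            walk(dst, orders + [order])

    walk(start, [])
    return results


pairs = leaves_with_orders(next_edges, "n1")

# Lists compare element by element, so [1, 3] sorts before [2] and [2] before [3].
ranked = sorted(pairs, key=lambda pair: pair[1])
leaf_order = [leaf for leaf, _ in ranked]
```

The pairs match the console output above, and sorting them puts the leaves in the order n3, n4, n5, n6, n3.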
Essentially, I'm trying to modify the following piece of Gremlin code such that instead of operating on a single vertex at a time - signified by g.V(1), it will work with multiple vertices at once (e.g. changing to g.V()), while still only limiting the number of returned results per vertex to one (see limit(1)).
g.V(1).repeat(out().simplePath()).until(has('color', 'red')).path().limit(1)
The above query will compute the shortest path from a given vertex to the closest vertex which has property(color)==red.
However, I want to compute the shortest path for multiple vertices passed in at the same time, while still only returning a single path per vertex.
I'm having difficulty modifying the query without it returning multiple paths for the same vertex.
Deduplicating the result by the start vertex should give you the expected result.
g.V().as('a').
repeat(out().simplePath()).
until(has('color', 'red')).
dedup('a').
path()
Example using the modern toy graph:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().
......1> repeat(out().simplePath()).
......2> until(hasLabel('software')).
......3> path()
==>[v[1],v[3]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
==>[v[4],v[5]]
==>[v[4],v[3]]
==>[v[6],v[3]]
gremlin> g.V().as('a').
......1> repeat(out().simplePath()).
......2> until(hasLabel('software')).
......3> dedup('a').path()
==>[v[1],v[3]]
==>[v[4],v[5]]
==>[v[6],v[3]]
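To spell out what "one shortest path per start vertex" means, here is a plain-Python sketch that runs a breadth-first search from each vertex and keeps the first path that reaches a target. It is only a model of the intended result, not of how Gremlin evaluates dedup('a'). The adjacency map mirrors the outgoing edges of the modern toy graph, and the function name first_shortest_path is an assumption.

```python
from collections import deque

# Outgoing edges of the modern toy graph (vertex id -> neighbour ids).
out_edges = {1: [3, 2, 4], 4: [5, 3], 6: [3]}
software = {3, 5}                       # ids of the 'software' vertices
all_vertices = [1, 2, 3, 4, 5, 6]


def first_shortest_path(out_edges, start, is_target):
    """Breadth-first search; returns the first, and so a shortest, path to a target."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbour in out_edges.get(path[-1], []):
            if neighbour in path:       # keep paths simple
                continue
            new_path = path + [neighbour]
            if is_target(neighbour):
                return new_path
            queue.append(new_path)
    return None                         # no target reachable from start


shortest = {}
for v in all_vertices:
    path = first_shortest_path(out_edges, v, lambda x: x in software)
    if path is not None:
        shortest[v] = path
```

As in the traversal, the start vertex itself is never tested against the target condition, so vertices 3 and 5 produce no path of their own.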
So let's say I have data like so.
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:6)->(id:9)->(id:10)
(id:5)->(id:7)->(id:8)->(id:6)
To be clear, Id 5 is the same node with 2 edges.
Here is a code sample:
g.addV('person').property('id',1).as('1').
addV('person').property('id',2).as('2').
addV('person').property('id',3).as('3').
addV('person').property('id',4).as('4').
addV('person').property('id',5).as('5').
addV('person').property('id',6).as('6').
addV('person').property('id',7).as('7').
addV('person').property('id',8).as('8').
addV('person').property('id',9).as('9').
addV('person').property('id',10).as('10').
addE('connection').from('1').to('2').
addE('connection').from('2').to('3').
addE('connection').from('3').to('4').
addE('connection').from('4').to('5').
addE('connection').from('5').to('6').
addE('connection').from('6').to('9').
addE('connection').from('9').to('10').
addE('connection').from('5').to('7').
addE('connection').from('7').to('8').
addE('connection').from('8').to('6').iterate()
I need to traverse the graph, and exclude any node where 6 has a connection, in any direction, to 5. So I would get back:
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:6)
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:7)->(id:8)->(id:6)
I'm not sure that I completely understand your question, but it sounds like you want to stop traversing as soon as you see a pattern where you encounter vertex 6 and vertex 6 has an edge to vertex 5 - if so then here's one way to do this:
gremlin> g.V().has('id',1).
......1> repeat(both().simplePath()).
......2> until(and(has('id',6),
......3> both().has('id',5))).
......4> path().by('id')
==>[1,2,3,4,5,6]
==>[1,2,3,4,5,7,8,6]
Note that I don't exactly match the output you described in your question, as the traversal starts from vertex 1, so the path taken will include that portion of the path in both cases.
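The stopping rule can also be checked with a short plain-Python sketch: walk simple paths in both directions and stop as soon as the current vertex is 6 and one of its neighbours is 5. Paths that never meet that condition are dropped, just as repeat()/until() without emit() drops them. The function name paths_until and its parameters are assumptions, and the edge list is copied from the sample script in the question.

```python
from collections import defaultdict

# The 'connection' edges from the sample script, keyed by the 'id' property.
connections = [
    (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 9), (9, 10),
    (5, 7), (7, 8), (8, 6),
]


def paths_until(edges, start, stop_at, must_touch):
    """Simple paths from start that end at stop_at, given stop_at neighbours must_touch."""
    neighbours = defaultdict(list)
    for a, b in edges:                  # both(): ignore edge direction
        neighbours[a].append(b)
        neighbours[b].append(a)

    found = []

    def walk(path):
        current = path[-1]
        # until(and(has('id',6), both().has('id',5))), tested after at least one hop
        if len(path) > 1 and current == stop_at and must_touch in neighbours[current]:
            found.append(path)
            return
        for n in neighbours[current]:
            if n not in path:           # simplePath()
                walk(path + [n])

    walk([start])
    return found


result = paths_until(connections, start=1, stop_at=6, must_touch=5)
```

This returns the same two paths as the traversal, and nothing beyond vertex 6 (such as 9 or 10) is ever visited.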