Filter nodes by source node in Gremlin

So let's say I have data like so.
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:6)->(id:9)->(id:10)
(id:5)->(id:7)->(id:8)->(id:6)
To be clear, Id 5 is the same node with 2 edges.
Here is a code sample:
g.addV('person').property('id',1).as('1').
addV('person').property('id',2).as('2').
addV('person').property('id',3).as('3').
addV('person').property('id',4).as('4').
addV('person').property('id',5).as('5').
addV('person').property('id',6).as('6').
addV('person').property('id',7).as('7').
addV('person').property('id',8).as('8').
addV('person').property('id',9).as('9').
addV('person').property('id',10).as('10').
addE('connection').from('1').to('2').
addE('connection').from('2').to('3').
addE('connection').from('3').to('4').
addE('connection').from('4').to('5').
addE('connection').from('5').to('6').
addE('connection').from('6').to('9').
addE('connection').from('9').to('10').
addE('connection').from('5').to('7').
addE('connection').from('7').to('8').
addE('connection').from('8').to('6').iterate()
I need to traverse the graph, and exclude any node where 6 has a connection, in any direction, to 5. So I would get back:
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:6)
(id:1)->(id:2)->(id:3)->(id:4)->(id:5)->(id:7)->(id:8)->(id:6)

I'm not sure that I completely understand your question, but it sounds like you want to stop traversing as soon as you reach vertex 6 and vertex 6 has an edge to vertex 5. If so, here's one way to do this:
gremlin> g.V().has('id',1).
......1> repeat(both().simplePath()).
......2> until(and(has('id',6),
......3> both().has('id',5))).
......4> path().by('id')
==>[1,2,3,4,5,6]
==>[1,2,3,4,5,7,8,6]
Note that I don't exactly match the output you described in your question, as the traversal starts from vertex 1, so the path taken will include that portion of the path in both cases.
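To make the stopping rule concrete, here is a minimal plain-Python sketch of what the traversal above does. It is not Gremlin, and the names EDGES and paths_until are made up for this illustration. Edges are treated as undirected (mirroring both()), a vertex is never revisited (mirroring simplePath()), and a path is emitted as soon as it reaches the stop vertex while that vertex is adjacent to the other one.

```python
from collections import deque

# Sample data from the question, as (from, to) pairs.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 9), (9, 10),
         (5, 7), (7, 8), (8, 6)]


def neighbours(edges):
    """Build an undirected adjacency map, like both()."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def paths_until(edges, start, stop_at, must_touch):
    """Walk outwards from start and emit each simple path that ends at
    stop_at, provided stop_at is adjacent to must_touch."""
    adj = neighbours(edges)
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in adj.get(path[-1], []):
            if nxt in path:  # simplePath(): never revisit a vertex
                continue
            candidate = path + [nxt]
            if nxt == stop_at and must_touch in adj[nxt]:
                found.append(candidate)  # until(...) matched, stop here
            else:
                queue.append(candidate)  # keep repeating
    return found


print(paths_until(EDGES, 1, 6, 5))
# [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7, 8, 6]]
```

Because a matched path is never extended, vertices 9 and 10 do not appear in the result.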

Related

Tinkerpop Gremlin Get Edges that go to vertices within a list

I'm trying to query for edges that go to vertices within an aggregated list. It sounds quite simple, and it should be, but I seem to be writing my queries wrong, and I just can't figure out why. Anyway, I'll use the Modern Toy Graph to make an example, that won't necessarily make much sense in this context, but still illustrates what I wish to do:
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().
hasLabel('person').
aggregate('x').
outE().
where(inV().is(within('x')))
What I'm doing is traversing to all 'person' vertices, aggregating them, then trying to get all the outgoing edges that lead to another vertex within that aggregated list. I expect the above query to return the edge labelled "knows" that goes between vertex 1 and 2, and the one between 1 and 4; however, nothing is returned. If I simply want to get the vertices on the other end of those edges, rather than the edges themselves, the following works fine, returning vertex 2 and 4:
g.V().
hasLabel('person').
aggregate('x').
out().
where(within('x'))
So how can I get edges that lead to vertices already aggregated in a list?
(Once again, I'm aware this example doesn't make much sense within this particular graph, and I could easily query outE('knows'), but this query is relevant to a different graph.)
Thanks.
You can't use is() quite that way. An easy fix would be to just combine your "working" traversal with the one that doesn't:
gremlin> g.V().hasLabel('person').
......1> aggregate('x').
......2> outE().
......3> where(inV().where(within('x')))
==>e[7][1-knows->2]
==>e[8][1-knows->4]
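For anyone who wants to check the expected result outside Gremlin, this is a small plain-Python sketch of what the corrected traversal computes on the Modern toy graph. The names VERTICES, EDGES and edges_within are hypothetical and exist only for this illustration; the ids and labels are those of TinkerFactory.createModern().

```python
# Vertex id -> label for the Modern toy graph.
VERTICES = {1: "person", 2: "person", 3: "software",
            4: "person", 5: "software", 6: "person"}

# (edge id, out vertex, label, in vertex)
EDGES = [(7, 1, "knows", 2), (8, 1, "knows", 4), (9, 1, "created", 3),
         (10, 4, "created", 5), (11, 4, "created", 3), (12, 6, "created", 3)]


def edges_within(vertices, edges, label):
    """Return the edges that start at a vertex with the given label and
    end at a vertex that is also in that aggregated set."""
    # aggregate('x'): collect every matching vertex before moving on
    x = {vid for vid, vlabel in vertices.items() if vlabel == label}
    # outE().where(inV().where(within('x')))
    return [e for e in edges if e[1] in x and e[3] in x]


print(edges_within(VERTICES, EDGES, "person"))
# [(7, 1, 'knows', 2), (8, 1, 'knows', 4)]
```

The set is built in full before any edge is tested, which is the role aggregate() plays in the traversal.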

Query to retrieve all paths traversable from a given vertex

I am trying to write a query that retrieves all paths that are reachable from a specified vertex. In other words I am trying to retrieve the entire cluster/sub-graph that the vertex is connected to. A couple more constraints on the query are:
Inward edges should be traversed and included in the result (I am looking for all paths that are in any way connected to the root vertex).
The search must stop at a specified depth of, say, 10 hops from the root vertex.
Bonus constraint: I would prefer the result not to include paths which are complete sub-paths of other paths returned in the result.
I currently have the following two queries, which appear to work as expected on the small toy graphs I have tested them on. However, there seem to be some edge cases in our large production graph where they do not return all the paths/edges/vertices I would expect, and I cannot explain why this happens. The two queries also sometimes return different vertices from each other.
I would prefer a fresh view on how to approach this query, rather than trying to adjust what I currently have, so please try to provide a solution before looking at my current solution below.
Query 1:
g.V(uid).repeat(bothE().bothV().simplePath()).until(loops().is_(10)).emit().dedup().path().by(valueMap(True))
Query 2:
g.V(uid).repeat(bothE().bothV().simplePath()).until(bothE().simplePath().count().is_(0).or_().loops().is_(10)).dedup().path().by(valueMap(True))
Using this simple binary tree as a test graph
g.addV('root').property('data',9).as('root').
addV('node').property('data',5).as('b').
addV('node').property('data',2).as('c').
addV('node').property('data',11).as('d').
addV('node').property('data',15).as('e').
addV('node').property('data',10).as('f').
addV('node').property('data',1).as('g').
addV('node').property('data',8).as('h').
addV('node').property('data',22).as('i').
addV('node').property('data',16).as('j').
addV('node').property('data',7).as('k').
addV('node').property('data',51).as('l').
addV('node').property('data',13).as('m').
addV('node').property('data',4).as('n').
addE('left').from('root').to('b').
addE('left').from('b').to('c').
addE('right').from('root').to('d').
addE('right').from('d').to('e').
addE('right').from('e').to('i').
addE('left').from('i').to('j').
addE('left').from('d').to('f').
addE('right').from('b').to('h').
addE('left').from('h').to('k').
addE('right').from('i').to('l').
addE('left').from('e').to('m').
addE('right').from('c').to('n').
addE('left').from('c').to('g').iterate()
We could find all the paths using
gremlin> g.V().hasLabel('root').
......1> repeat(bothE().otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> path().
......4> by('data').
......5> by(label)
==>[9,right,11,left,10]
==>[9,left,5,left,2,left,1]
==>[9,left,5,left,2,right,4]
==>[9,left,5,right,8,left,7]
==>[9,right,11,right,15,left,13]
==>[9,right,11,right,15,right,22,left,16]
==>[9,right,11,right,15,right,22,right,51]
Note that I used bothE().otherV() as you said in your case you may have some incoming edges as well as outgoing ones.
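As a cross-check on the path listing, here is a rough plain-Python model of the same idea; it is a sketch, not the Gremlin implementation. TREE, reachable_paths and the max_hops parameter are names I introduced for this illustration. max_hops stands in for the depth limit from the question, and vertices are identified by their 'data' value, which happens to be unique in this test graph.

```python
# (edge label, from, to), using each vertex's 'data' value as its id.
TREE = [("left", 9, 5), ("left", 5, 2), ("right", 9, 11), ("right", 11, 15),
        ("right", 15, 22), ("left", 22, 16), ("left", 11, 10), ("right", 5, 8),
        ("left", 8, 7), ("right", 22, 51), ("left", 15, 13), ("right", 2, 4),
        ("left", 2, 1)]


def reachable_paths(edges, start, max_hops=None):
    """Enumerate simple paths from start, following edges in both
    directions, and emit a path only where it cannot be extended
    (or where the optional hop limit is reached)."""
    incident = {}
    for label, src, dst in edges:
        incident.setdefault(src, []).append((label, dst))  # outgoing
        incident.setdefault(dst, []).append((label, src))  # incoming
    results = []

    def walk(path, seen, hops):
        steps = [(l, v) for l, v in incident.get(path[-1], []) if v not in seen]
        if not steps or (max_hops is not None and hops == max_hops):
            results.append(path)
            return
        for label, vertex in steps:
            walk(path + [label, vertex], seen | {vertex}, hops + 1)

    walk([start], {start}, 0)
    return results


for p in reachable_paths(TREE, 9):
    print(p)
```

Emitting only at the point where a path ends means no returned path is a prefix of another, which is what the bonus constraint asks for.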
We could also use the subgraph step to return the whole subgraph containing both vertices and edges. This example starts at the vertex with the value 5; because edges are followed in both directions, the result covers the entire tree (14 vertices and 13 edges).
gremlin> g.V().has('data',5).
......1> repeat(bothE().subgraph('sg').otherV().simplePath()).
......2> until(__.not(bothE().simplePath())).
......3> cap('sg')
==>tinkergraph[vertices:14 edges:13]
Note that both of these approaches assume that all paths end at leaf nodes. I left out the loops() test, but you can add that in as needed.
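The subgraph variant can be modelled the same way. The following is only a sketch under my own assumptions: TREE repeats the test data, while connected_subgraph and max_hops are hypothetical names, with max_hops playing the part of the omitted loops() test.

```python
from collections import deque

# (edge label, from, to), using each vertex's 'data' value as its id.
TREE = [("left", 9, 5), ("left", 5, 2), ("right", 9, 11), ("right", 11, 15),
        ("right", 15, 22), ("left", 22, 16), ("left", 11, 10), ("right", 5, 8),
        ("left", 8, 7), ("right", 22, 51), ("left", 15, 13), ("right", 2, 4),
        ("left", 2, 1)]


def connected_subgraph(edges, start, max_hops=None):
    """Collect every vertex and edge reachable from start in either
    direction, optionally stopping max_hops away from start."""
    incident = {}
    for edge in edges:
        _, src, dst = edge
        incident.setdefault(src, []).append((edge, dst))
        incident.setdefault(dst, []).append((edge, src))
    vertices, kept_edges = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if max_hops is not None and hops == max_hops:
            continue  # depth limit reached, do not expand further
        for edge, other in incident.get(vertex, []):
            kept_edges.add(edge)
            if other not in vertices:
                vertices.add(other)
                frontier.append((other, hops + 1))
    return vertices, kept_edges


vertices, kept = connected_subgraph(TREE, 5)
print(len(vertices), len(kept))  # 14 13
```

Starting from 5 still yields all 14 vertices and 13 edges, since the edge up to the root is followed as well; passing max_hops=1 keeps only 5 and its three direct neighbours.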

Gremlin: Get leaf vertexes sorted by cumulative edges property obtained on traversal

I'm traversing a graph and need to obtain all leaf nodes sorted by an order property present on the edges.
I was only able to get the leaf vertices ordered by the last edge with the following query:
g.V('n1').repeat(out()).emit().order().by(inE().values('order'))
Graph Example
In this example I would like to get something like [n3, n4, n5, n3] or a map with the accumulated order for every vertex ([n3: 1.1, n3: 3, n4: 1.2, n5: 2])
Your picture was nice but when asking questions about Gremlin it is best to add a small sample data script like this one:
g.addV().property(id,'n1').as('n1').
addV().property(id,'n2').as('n2').
addV().property(id,'n3').as('n3').
addV().property(id,'n4').as('n4').
addV().property(id,'n5').as('n5').
addV().property(id,'n6').as('n6').
addE('next').from('n1').to('n3').property('order',3).
addE('next').from('n1').to('n2').property('order',1).
addE('next').from('n1').to('n6').property('order',2).
addE('next').from('n2').to('n3').property('order',1).
addE('next').from('n2').to('n4').property('order',2).
addE('next').from('n2').to('n5').property('order',3)
I opted to provide an answer that gives a "map with the accumulated order for every vertex" - well, not quite a "map", because the "n3" key could not be used twice. It returns pairs where the first item is the leaf vertex and the second item is a list of the "order" values traversed along the way:
gremlin> g.V('n1').
......1> repeat(outE().inV()).
......2> emit(__.not(outE())).
......3> path().
......4> map(union(tail(local,1),
......5> unfold().values('order').fold()).
......6> fold())
==>[v[n3],[3]]
==>[v[n6],[2]]
==>[v[n3],[1,1]]
==>[v[n4],[1,2]]
==>[v[n5],[1,3]]
You can see I provided a condition to emit() so that it only produces the leaf vertices. I then transform the path() traversed using a map(), which grabs the leaf vertex with tail(local,1) for the first item in the pair and then unfolds the Path for "order" values for the second item in the pair.
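If it helps to see the pairing logic without Gremlin, here is a minimal plain-Python sketch of it. EDGES and leaves_with_orders are names invented for this illustration, and the final sort is my own assumption about what "sorted by cumulative order" should mean (lexicographic comparison of the collected order lists). The sketch also assumes the graph has no cycles.

```python
from collections import deque

# (from, to, order) taken from the sample data script.
EDGES = [("n1", "n3", 3), ("n1", "n2", 1), ("n1", "n6", 2),
         ("n2", "n3", 1), ("n2", "n4", 2), ("n2", "n5", 3)]


def leaves_with_orders(edges, start):
    """Return (leaf, [order values along the way]) for every path
    from start to a vertex with no outgoing edges."""
    out = {}
    for src, dst, order in edges:
        out.setdefault(src, []).append((dst, order))
    pairs = []
    queue = deque([(start, [])])
    while queue:
        vertex, orders = queue.popleft()
        for nxt, order in out.get(vertex, []):
            trail = orders + [order]
            if nxt not in out:  # emit(__.not(outE())): leaf reached
                pairs.append((nxt, trail))
            else:
                queue.append((nxt, trail))
    return pairs


pairs = leaves_with_orders(EDGES, "n1")
print(pairs)
# [('n3', [3]), ('n6', [2]), ('n3', [1, 1]), ('n4', [1, 2]), ('n5', [1, 3])]

# Assumed ordering: compare the order lists element by element.
print([leaf for leaf, trail in sorted(pairs, key=lambda p: p[1])])
# ['n3', 'n4', 'n5', 'n6', 'n3']
```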

Tinkerpop, "Multiple" Query

I am using this Query
g.E().as('ID').select('ID').properties().as('PROP').select('PROP','ID')
It finds all edges that have properties but ignores every edge that has no properties.
How can I improve this query so that it returns every edge, with or without properties, together with all of the edge's data (for example ID, source vertex, target vertex, label and the properties)?
Looking at all the edges in a graph can be expensive if you have a lot of them, so it is generally not a good idea on large graphs. I used a limit step in my example below, which is one way to look at just some edges. That said, you can see all of the properties on an edge using valueMap. For example (from a graph I have that tracks soccer matches):
gremlin> g.E().valueMap().with(WithOptions.tokens).limit(5)
==>[id:400,label:played,date:12 Apr 2014,result:1-0]
==>[id:401,label:played,date:12 Apr 2014,result:1-0]
==>[id:402,label:played,date:12 Apr 2014,result:0-1]
==>[id:403,label:played,date:12 Apr 2014,result:1-0]
==>[id:404,label:played,date:12 Apr 2014,result:0-1]
EDITED to add:
If you want to include the adjacent vertices in the result, and the graph database you are using supports Apache TinkerPop 3.4.4 or higher, you may be able to use the elementMap step. An example is shown below.
gremlin> g.E().limit(5).elementMap()
==>[id:34,label:member,IN:[id:1,label:EPL],OUT:[id:2,label:Team],years:22]
==>[id:35,label:member,IN:[id:1,label:EPL],OUT:[id:3,label:Team],years:22]
==>[id:36,label:member,IN:[id:1,label:EPL],OUT:[id:4,label:Team],years:22]
==>[id:37,label:member,IN:[id:1,label:EPL],OUT:[id:5,label:Team],years:22]
==>[id:38,label:member,IN:[id:1,label:EPL],OUT:[id:6,label:Team],years:22]
If the database you are using does not support elementMap you would need to do something like:
gremlin> g.E().limit(5).
......1> project('EDGE','IN','OUT').
......2> by(valueMap().with(WithOptions.tokens)).
......3> by(inV().union(id(),label()).fold()).
......4> by(outV().union(id(),label()).fold())
==>[EDGE:[id:34,label:member,years:22],IN:[1,EPL],OUT:[2,Team]]
==>[EDGE:[id:35,label:member,years:22],IN:[1,EPL],OUT:[3,Team]]
==>[EDGE:[id:36,label:member,years:22],IN:[1,EPL],OUT:[4,Team]]
==>[EDGE:[id:37,label:member,years:22],IN:[1,EPL],OUT:[5,Team]]
==>[EDGE:[id:38,label:member,years:22],IN:[1,EPL],OUT:[6,Team]]
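To illustrate why the original query skips edges without properties, here is a small plain-Python sketch. It is only a model: EDGES, property_rows and element_maps are hypothetical names, and the edge with id 99 is invented so that there is one edge with no properties. Expanding edges into one row per property yields nothing for such an edge, whereas building one map per edge always yields a row.

```python
EDGES = [
    {"id": 34, "label": "member", "out": (2, "Team"), "in": (1, "EPL"),
     "props": {"years": 22}},
    # Hypothetical edge with no properties, added only for this illustration.
    {"id": 99, "label": "member", "out": (3, "Team"), "in": (1, "EPL"),
     "props": {}},
]


def property_rows(edges):
    """One row per (edge, property), similar in spirit to properties():
    an edge with no properties contributes no rows at all."""
    return [(e["id"], key, value)
            for e in edges
            for key, value in e["props"].items()]


def element_maps(edges):
    """One map per edge, whether or not it has properties."""
    rows = []
    for e in edges:
        row = {"id": e["id"], "label": e["label"],
               "IN": {"id": e["in"][0], "label": e["in"][1]},
               "OUT": {"id": e["out"][0], "label": e["out"][1]}}
        row.update(e["props"])
        rows.append(row)
    return rows


print(property_rows(EDGES))                    # only edge 34 appears
print([m["id"] for m in element_maps(EDGES)])  # [34, 99]
```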

Gremlin query: how to add multiple edges that contain aggregated information?

My graph contains some "Person" nodes that "ContributedTo" some "Conversation" nodes. I want to write a Gremlin query that will create "TalkedWith" edges directly between "Person" nodes. Each edge should contain a property "countConversations" that shows how many conversations both of these persons contributed to.
Is it possible to do this using one Gremlin query for all "Person" nodes at once?
Here's my graph setup (using Gremlin console):
g = TinkerGraph.open().traversal()
g.addV("Person").as("p1").
addV("Person").as("p2").
addV("Person").as("p3").
addV("Person").as("p4").
addV("Person").as("p5").
addV("Conversation").as("c1").
addV("Conversation").as("c2").
addV("Conversation").as("c3").
addE("ContributedTo").from("p1").to("c1").
addE("ContributedTo").from("p2").to("c1").
addE("ContributedTo").from("p3").to("c1").
addE("ContributedTo").from("p1").to("c2").
addE("ContributedTo").from("p2").to("c2").
addE("ContributedTo").from("p3").to("c2").
addE("ContributedTo").from("p4").to("c2").
addE("ContributedTo").from("p5").to("c2").
addE("ContributedTo").from("p1").to("c3").
addE("ContributedTo").from("p3").to("c2")
What I want to do is create "TalkedWith" edges like this:
addE("TalkedWith").from("p1").to("p2").property("countConversations",2)
I wrote a query to count how many conversations a specific person had with other persons
g.V(0L).out("ContributedTo").in("ContributedTo")
.hasId(without(0L)).groupCount().order(local).by(values,desc).next()
Now I want to run this calculation for each person and create "TalkedWith" edges.
Here's one way to do it:
gremlin> g.V(0L).hasLabel('Person').
......1> store('p').
......2> out('ContributedTo').
......3> in('ContributedTo').
......4> where(without('p')).
......5> groupCount().
......6> unfold().
......7> addE('TalkedWith').from(select('p').unfold()).to(select(keys)).
......8> property('countConversations',select(values))
==>e[18][0-TalkedWith->1]
==>e[19][0-TalkedWith->2]
==>e[20][0-TalkedWith->3]
==>e[21][0-TalkedWith->4]
gremlin> g.E().hasLabel('TalkedWith').valueMap()
==>[countConversations:2]
==>[countConversations:3]
==>[countConversations:1]
==>[countConversations:1]
Given the progress you showed in your question, I'll assume you follow everything up to the groupCount() at line 5. At that point, we have a Map of the people v[0] talked to and the number of times they spoke. The next line deconstructs that Map into its component entries and iterates them, creating an edge for each with addE(). The from vertex is gathered from "p", where it was originally stored in a List, and the to vertex is extracted from the current key in the count map. The "countConversations" property then gets its value from the current value of the count map.
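The counting step can be checked with a few lines of plain Python. This is a sketch rather than a translation of the traversal: CONTRIBUTED, talked_with and all_talked_with are names made up for the illustration, and p1 corresponds to v[0] in the console output. The data is copied from the setup script as written, including its second p3 to c2 edge, which is why p1 and p3 end up with a count of 3.

```python
from collections import Counter

# (person, conversation) pairs exactly as listed in the setup script.
CONTRIBUTED = [("p1", "c1"), ("p2", "c1"), ("p3", "c1"),
               ("p1", "c2"), ("p2", "c2"), ("p3", "c2"),
               ("p4", "c2"), ("p5", "c2"), ("p1", "c3"),
               ("p3", "c2")]


def talked_with(contributed, person):
    """Return (from, to, countConversations) triples for one person,
    counting one hit per out('ContributedTo').in('ContributedTo') path."""
    counts = Counter()
    for who, conversation in contributed:
        if who != person:
            continue
        for other, shared in contributed:
            if shared == conversation and other != person:
                counts[other] += 1  # groupCount()
    return [(person, other, n) for other, n in counts.items()]


def all_talked_with(contributed):
    """Repeat the calculation for every person, in first-seen order."""
    people = list(dict.fromkeys(who for who, _ in contributed))
    return [triple for p in people for triple in talked_with(contributed, p)]


print(talked_with(CONTRIBUTED, "p1"))
# [('p1', 'p2', 2), ('p1', 'p3', 3), ('p1', 'p4', 1), ('p1', 'p5', 1)]
```

Running it for every person produces a triple in each direction for every pair (p1 to p2 as well as p2 to p1), so if a single edge per pair is wanted, the pairs would need to be deduplicated first.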
