Given this graph:
g.addV("1").property(id,'a').as('a').
addV("2").property(id,'b').as('b').
addV("3").property(id,'c').as('c').
addE('related').from('a').to('b').
addE('related').from('a').to('c').
addE('related').from('b').to('c')
Suppose I select vertices 1 and 2:
g.V().hasLabel("1", "2").has("id", within("a","b"))
and I want to get the vertex they are both ("commonly") related to. Simply doing:
g.V().hasLabel("1", "2").has("id", within("a","b")).out()
won't cut it, because it gives me both 2 and 3 (when only 3 was intended).
Furthermore, if 2 and 3 were not related, 3 should not be returned, because not every vertex in the "selection" relates to 3.
Is there a good way to accomplish this?
Here is an example that uses groupCount(). Note that this assumes there are no parallel edges in the same direction between any two adjacent vertices. The 2 passed to is(2) is the number of vertices in the selection.
gremlin> g.addV("1").property(id,'a').as('a').
......1> addV("2").property(id,'b').as('b').
......2> addV("3").property(id,'c').as('c').
......3> addE('related').from('a').to('b').
......4> addE('related').from('a').to('c').
......5> addE('related').from('b').to('c')
==>e[2][b-related->c]
gremlin> g.V().hasLabel('1','2').out()
==>v[b]
==>v[c]
==>v[c]
gremlin> g.V().hasLabel('1','2').out().groupCount()
==>[v[b]:1,v[c]:2]
gremlin> g.V().hasLabel('1','2').out().groupCount().unfold().where(select(values).is(2))
==>v[c]=2
gremlin> g.V().hasLabel('1','2').out().groupCount().unfold().where(select(values).is(2)).select(keys)
==>v[c]
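To make the rule behind the groupCount() answer explicit, here is a plain-Python model of it. This is not Gremlin. A vertex counts as "commonly" related when the number of selected vertices with an out-edge to it equals the size of the selection. That is what is(2) checks for a two-vertex selection. The adjacency dict and the common_out_neighbors name are hypothetical. Like the answer, the model assumes no parallel edges.

```python
from collections import Counter


def common_out_neighbors(adjacency, selection):
    """Return the vertices reached by an out-edge from every vertex in `selection`.

    `adjacency` maps a vertex id to the ids on its outgoing edges.
    Mirrors out().groupCount() followed by keeping entries whose count equals
    len(selection); assumes no parallel edges, which would inflate the counts.
    """
    counts = Counter(v for s in selection for v in adjacency.get(s, []))
    return sorted(v for v, n in counts.items() if n == len(selection))


# The graph from the question: a->b, a->c, b->c
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(common_out_neighbors(graph, ["a", "b"]))  # ['c']
```

If the b->c edge is removed, c is reached only once, so nothing is returned. This matches the second requirement in the question.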
Problem
I am running a query that finds duplicate vertices by their name property. I would like to get the IDs of all the matching vertices.
Currently, only the ID of one vertex from each matching pair is returned.
Example graph
Here is a toy example graph. There are two vertices with the same name ex.
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('X').property('name', 'ex')
==>v[0]
gremlin> g.addV('Y').property('name', 'why')
==>v[2]
gremlin> g.addV('Y').property('name', 'ex')
==>v[4]
gremlin> g.V().elementMap()
==>[id:0,label:X,name:ex]
==>[id:2,label:Y,name:why]
==>[id:4,label:Y,name:ex]
Detecting duplicates
When I find the duplicates and call elementMap(), I only get the vertex that the traverser is on after the where() filter (the 'y' vertex):
gremlin> g.V().hasLabel('X').as('x').V().hasLabel('Y').as('y').where('x', P.eq('y')).by('name').elementMap()
==>[id:4,label:Y,name:ex]
Instead, I would like to see the IDs of both vertices, which would be id:0 and id:4.
I would like something like:
==>[[id:0,label:X,name:ex], [id:4,label:Y,name:ex]]
You were actually very close. Use select('x','y') to keep both sides of each match:
gremlin> g.V().hasLabel('X').as('x').
......1> V().hasLabel('Y').as('y').
......2> where(eq('x')).by('name').
......3> select('x','y').
......4> by(valueMap().by(unfold()).with(WithOptions.tokens)).
......5> select(values)
==>[[id:0,label:X,name:ex],[id:4,label:Y,name:ex]]
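The key point is that select('x','y') keeps both labeled vertices of each match. The original query only returned the vertex the traverser ended on. A rough plain-Python model of the same pairing, using the toy graph's element maps, makes this visible. It is not Gremlin, and the duplicate_pairs name is hypothetical.

```python
from itertools import product


def duplicate_pairs(vertices, left_label, right_label, key="name"):
    """Pair each left_label vertex with each right_label vertex sharing `key`.

    Roughly models hasLabel(left).as('x').V().hasLabel(right).as('y')
    .where(eq('x')).by(key).select('x','y'): both sides of every match are
    returned, not only the vertex the traverser ends on.
    """
    left = [v for v in vertices if v["label"] == left_label]
    right = [v for v in vertices if v["label"] == right_label]
    return [
        [x, y]
        for x, y in product(left, right)
        # skip vertices that lack the key, so two missing values never "match"
        if x.get(key) is not None and x.get(key) == y.get(key)
    ]


# Element maps of the toy graph from the question
vertices = [
    {"id": 0, "label": "X", "name": "ex"},
    {"id": 2, "label": "Y", "name": "why"},
    {"id": 4, "label": "Y", "name": "ex"},
]
print(duplicate_pairs(vertices, "X", "Y"))
# [[{'id': 0, 'label': 'X', 'name': 'ex'}, {'id': 4, 'label': 'Y', 'name': 'ex'}]]
```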
Sample data:
I have two vertex labels, User and Points.
First, add the User vertices:
g.addV('User').property('id',1).
addV('User').property('id',2).
addV('User').property('id',3).iterate()
Next, add one Points vertex per user and connect each User to it with an addingPoints edge:
g.V().hasLabel('User').has('id',1).as('curUser1').
V().hasLabel('User').has('id',2).as('curUser2').
V().hasLabel('User').has('id',3).as('curUser3').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560316666).property('name','user1').
addE('addingPoints').from('curUser1').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560318666).property('name','user2').
addE('addingPoints').from('curUser2').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560318657).property('name','user3').
addE('addingPoints').from('curUser3').iterate()
Now each User has at least one Points vertex.
Next, I want to add 10, 20, or 30 points (chosen at random) to the totalPoints property of the user with id 1.
While adding the points, I have three cases:
1. If totalPoints is less than 500, I just need to update the totalPoints property of that user's Points vertex.
2. If totalPoints equals 500, I should create a new Points vertex for the user and add the points to its totalPoints property.
3. If totalPoints is 490 (less than 500) and I need to add 30 points, I need to add 10 points to the old Points vertex and the remaining 20 points to a new Points vertex for the user.
How can I achieve this?
Thank you.
1. Pick the user's Points vertex with the lowest totalPoints value.
2. Add the new number of points to that totalPoints value.
3. If the sum exceeds 500, set the totalPoints property value to 500 and add a new Points vertex with a totalPoints value of sum-500.
4. If the sum doesn't exceed 500, set it as the new totalPoints property value.
These 4 steps translated into a traversal (user, points and maxPoints are parameters; maxPoints is 500 here):
g.withSack(points).
V().has('User','id',user).as('u').
out('addingPoints').
order().
by('totalPoints').
limit(1).
sack(sum).
by('totalPoints').
choose(sack().is(gt(maxPoints)),
sack(minus).
by(constant(maxPoints)).
property('totalPoints', maxPoints).
addV('Points').
sideEffect(addE('addingPoints').
from('u'))).
property('totalPoints', sack())
And a small console example (I initialized the first Points vertex with totalPoints=400 and the second Points vertex with totalPoints=480):
gremlin> showUserPoints = {
......1> g.V().as('u').out('addingPoints').
......2> group().
......3> by(select('u').by('id')).
......4> by('totalPoints').next()
......5> }
==>groovysh_evaluate$_run_closure1@7c2b58c0
gremlin> addPoints = { user, points, maxPoints = 500 ->
......1> g.withSack(points).
......2> V().has('User','id',user).as('u').
......3> out('addingPoints').
......4> order().
......5> by('totalPoints').
......6> limit(1).
......7> sack(sum).
......8> by('totalPoints').
......9> choose(sack().is(gt(maxPoints)),
.....10> sack(minus).
.....11> by(constant(maxPoints)).
.....12> property('totalPoints', maxPoints).
.....13> addV('Points').
.....14> sideEffect(addE('addingPoints').
.....15> from('u'))).
.....16> property('totalPoints', sack()).iterate()
.....17>
.....17> showUserPoints()
.....18> }
==>groovysh_evaluate$_run_closure1@31d6f3fe
gremlin> showUserPoints()
==>1=[400]
==>2=[480]
==>3=[0]
gremlin> addPoints(1, 10)
==>1=[410]
==>2=[480]
==>3=[0]
gremlin> addPoints(1, 90)
==>1=[500]
==>2=[480]
==>3=[0]
gremlin> addPoints(2, 30)
==>1=[500]
==>2=[500, 10]
==>3=[0]
gremlin> addPoints(2, 40)
==>1=[500]
==>2=[500, 50]
==>3=[0]
gremlin> addPoints(3, 100)
==>1=[500]
==>2=[500, 50]
==>3=[100]
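To check the arithmetic of the four steps without a graph, here is a plain-Python model. It is not the traversal itself. Each user's Points vertices are represented as a list of totalPoints values. The add_points name and the list representation are assumptions made for illustration. The calls replay the console session above, starting from the same 400 / 480 / 0 values.

```python
def add_points(totals, points, max_points=500):
    """Apply the four steps to one user's list of totalPoints values (in place)."""
    i = min(range(len(totals)), key=lambda k: totals[k])  # 1. lowest totalPoints
    total = totals[i] + points                              # 2. add the new points
    if total > max_points:                                  # 3. cap it and put the
        totals[i] = max_points                              #    overflow on a new
        totals.append(total - max_points)                   #    Points "vertex"
    else:                                                   # 4. just store the sum
        totals[i] = total
    return totals


users = {1: [400], 2: [480], 3: [0]}
add_points(users[1], 10)
add_points(users[1], 90)
add_points(users[2], 30)
add_points(users[2], 40)
add_points(users[3], 100)
print(users)  # {1: [500], 2: [500, 50], 3: [100]}
```

Like the traversal, this model creates at most one new Points vertex per call. A single addition that overflows by more than maxPoints would therefore leave the new vertex above the cap.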
I want to do a join between two vertex types using Gremlin, equivalent to this SQL:
select * from type1 inner join type2 on type2.id = type1.type2_id
The following works when using type1 and type2 as vertex labels:
g.V()
.hasLabel("type2").as("t2")
.inE("hasJoin")
.hasLabel("type1").as("t1")
.select("t1", "t2")
However, my graph does not use the vertex label to represent the type. Instead, each instance vertex is connected to a type vertex via a "hasType" edge:
g.V()//
.addV("instance1").as("instance1")//
.addV("instance2").as("instance2")//
.addV("type1").as("type1")//
.addV("type2").as("type2")//
.addE("hasType").from("instance1").to("type1")//
.addE("hasType").from("instance2").to("type2")//
.addE("hasJoin").from("instance1").to("instance2")//
.iterate();
I would need to do something like replacing
hasLabel("type2").as("t2")
with
hasLabel("type2").inE("hasType").outV().as("t2"):
which would result in
g.V()
.hasLabel("type2").inE("hasType").outV().as("t2")
.inE("hasJoin")
.hasLabel("type1").inE("hasType").outV().as("t1")
.select("t1", "t2")
This works for "t2", but not for "t1", as .inE("hasJoin").hasLabel("type1") is just wrong. What function do I need to use to join "t1" and "t2"?
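All you need is a filter that checks the adjacent type vertex. Here's your sample graph. Your script doesn't quite work, because it starts with g.V(), which produces no traversers on an empty graph, so none of the addV() steps run: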
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("instance1").property("name","instance1").as("instance1").
......1> addV("instance2").property("name","instance2").as("instance2").
......2> addV("type1").as("type1").
......3> addV("type2").as("type2").
......4> addE("hasType").from("instance1").to("type1").
......5> addE("hasType").from("instance2").to("type2").
......6> addE("hasJoin").from("instance1").to("instance2").
......7> iterate()
And the query you're looking for should be something like this:
gremlin> g.V().hasLabel("type2").in("hasType").as("t2").
......1> both("hasJoin").
......2> filter(out("hasType").hasLabel("type1")).as("t1").
......3> select("t1", "t2").
......4> by("name")
==>[t1:instance1,t2:instance2]
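The query behaves like a join in which the type filters play the role of the SQL join condition. A small plain-Python model shows this. It is not Gremlin, and join_by_type plus the dict/list representation are hypothetical. Each hasJoin edge is checked in both directions, as both("hasJoin") does, and a pair is kept when its ends have the requested types.

```python
def join_by_type(has_type, has_join, left_type, right_type):
    """Return (left, right) instance pairs linked by a hasJoin edge.

    `has_type` maps an instance vertex to its type vertex (the hasType edge);
    `has_join` lists (out_vertex, in_vertex) hasJoin edges. Direction is
    ignored, mirroring both('hasJoin') in the answer.
    """
    pairs = []
    for a, b in has_join:
        for left, right in ((a, b), (b, a)):
            if has_type.get(left) == left_type and has_type.get(right) == right_type:
                pairs.append((left, right))
    return pairs


# The sample graph from the question
has_type = {"instance1": "type1", "instance2": "type2"}
has_join = [("instance1", "instance2")]
print(join_by_type(has_type, has_join, "type1", "type2"))  # [('instance1', 'instance2')]
```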
I have a graph with reified relations, which hold useful information, but for visualization purposes I need to create a subgraph without these intermediary nodes.
Example:
[A:Person] <--AFFILIATE-- [B:Affiliation] --COMPANY--> [C:Org]
And I want to produce a subgraph like this:
[A:Person] --AFFILIATED_TO--> [C:Org]
Is there a simple way to do that with Gremlin?
I think your best option might be to use the subgraph() step, as you normally would, to extract the edge-induced subgraph, and then run some Gremlin on that subgraph to add the visualization edges and remove the elements you don't want.
I can demonstrate with the modern toy graph packaged with TinkerPop:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> sg = g.V().outE('created').subgraph('sg').cap('sg').next() // subgraph creation
==>tinkergraph[vertices:5 edges:4]
gremlin> g = sg.traversal()
==>graphtraversalsource[tinkergraph[vertices:5 edges:4], standard]
gremlin> g.V().as('a'). // add special subgraph edge
......1> out('created').as('software').
......2> in('created').where(neq('a')).
......3> addE('co-developer').from('a').
......4> property('project',select('software').by('name'))
==>e[0][1-co-developer->4]
==>e[1][1-co-developer->6]
==>e[2][4-co-developer->1]
==>e[3][4-co-developer->6]
==>e[4][6-co-developer->1]
==>e[5][6-co-developer->4]
gremlin> g.V().hasLabel('software').drop() //remove junk from subgraph
gremlin> g.E()
==>e[0][1-co-developer->4]
==>e[1][1-co-developer->6]
==>e[2][4-co-developer->1]
==>e[3][4-co-developer->6]
==>e[4][6-co-developer->1]
==>e[5][6-co-developer->4]
gremlin> g.V().has('name','marko').outE('co-developer').valueMap(true)
==>[label:co-developer,project:lop,id:0]
==>[label:co-developer,project:lop,id:1]
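Applied to the schema in the question, the rewiring step replaces each Affiliation vertex with a direct AFFILIATED_TO edge from the Person to the Org. A plain-Python model of that step follows. It is not Gremlin. The collapse_reified name and the vertex ids aff1, alice, acme and bob are hypothetical. Edges are (source, label, target) triples, so the question's [A] <--AFFILIATE-- [B] becomes ("aff1", "AFFILIATE", "alice").

```python
from collections import defaultdict


def collapse_reified(edges, from_label, to_label, new_label):
    """Collapse reified relations into direct edges.

    For every intermediary vertex B with B --from_label--> A and
    B --to_label--> C, emit (A, new_label, C) and drop B's from_label and
    to_label edges. All other edges are kept unchanged.
    """
    outgoing = defaultdict(lambda: defaultdict(list))
    for src, label, dst in edges:
        outgoing[src][label].append(dst)
    hubs = [b for b, by_label in outgoing.items()
            if from_label in by_label and to_label in by_label]
    hub_set = set(hubs)
    collapsed = [(a, new_label, c)
                 for b in hubs
                 for a in outgoing[b][from_label]
                 for c in outgoing[b][to_label]]
    kept = [e for e in edges if e[0] not in hub_set or e[1] not in (from_label, to_label)]
    return kept + collapsed


edges = [
    ("aff1", "AFFILIATE", "alice"),
    ("aff1", "COMPANY", "acme"),
    ("alice", "knows", "bob"),
]
print(collapse_reified(edges, "AFFILIATE", "COMPANY", "AFFILIATED_TO"))
# [('alice', 'knows', 'bob'), ('alice', 'AFFILIATED_TO', 'acme')]
```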
In the example graph (the TinkerPop "modern" toy graph used in the answers below), using Gremlin syntax, I want to get vertex 1, knowing attributes of vertices 3 and 4.
So, verbally: who is connected by a 'created' edge to the vertex with name=lop AND by a 'knows' edge to the vertex with name=josh?
I want to specify the edge labels exactly, so v.out.name.filter{it.matches('lop|josh')} is not good, as it takes all out edges of 1.
This works fine:
gremlin> Gremlin.version()
==>3.2.4
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name', 'lop').in('created').as('x').
......1> out('knows').has('name', 'josh').select('x')
==>v[1]
The syntax you used in your question looks more like TinkerPop 2, which is out of support. You should be using TinkerPop 3.
You could use match for this (TinkerPop 3.x):
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().
......1> match(
......2> __.as('a').out('created').has('name','lop'),
......3> __.as('a').out('knows').has('name','josh')).
......4> select('a')
==>v[1]
It reads quite similarly to your English version, though unlike Jason Plurad's answer it does not start with an index lookup. It could be inverted to start from the lookup and get the same answer:
gremlin> g.V().
......1> has('name','lop').
......2> match(
......3> __.as('a').in('created').as('b'),
......4> __.as('b').out('knows').has('name','josh')).
......5> select('b')
==>v[1]
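The match() clauses must all hold for the same 'a' vertex, so the query is an AND over edge patterns. A plain-Python model over the modern graph's vertex names and six edges shows this. It is not Gremlin, and the matching name and pattern tuples are hypothetical.

```python
# TinkerPop "modern" toy graph: vertex id -> name, and (out, label, in) edges
names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
edges = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def matching(names, edges, patterns):
    """Return ids of vertices that satisfy every (edge_label, target_name) pattern.

    Each pattern corresponds to one __.as('a').out(label).has('name', target)
    clause; all clauses must hold for the same vertex.
    """
    def satisfies(v, label, target):
        return any(s == v and l == label and names[d] == target for s, l, d in edges)

    return [v for v in sorted(names) if all(satisfies(v, l, t) for l, t in patterns)]


print(matching(names, edges, [("created", "lop"), ("knows", "josh")]))  # [1]
```

Dropping the knows pattern returns 1, 4 and 6, the three creators of lop. That shows why the second clause is needed to single out vertex 1.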