I have the graph shown in the figure below. The numbers represent the level of each node, which I need as output from the Gremlin query along with the node's properties.
The graph structure can change, that is, more nodes can be added to the graph. The level must be returned in the same format for the additional nodes.
For example, the children of 1.1.1 should return the levels 1.1.1.1, 1.1.1.2, ...
The following query works, but it returns the level as a plain depth counter: 1, 2, 3, ...
g.withSack(0).
V().
hasLabel('A').
has('label_A','A').
emit().
repeat(sack(sum).by(constant(1)).out()).
project('depth', 'properties').
by(sack()).
by(valueMap())
A is the starting root node.
I know this is complicated. If it is not possible, can we at least get the sub-depth along with the depth, using multiple sack variables? The following is an example:
depth:0 sub-depth:0
depth:1 sub-depth:1.1
depth:1 sub-depth:1.2
depth:2 sub-depth:2.1
depth:2 sub-depth:2.2
depth:2 sub-depth:2.3
depth:2 sub-depth:2.4
A simple way to do what you are looking for is to take advantage of the index step. If we create a simple binary tree as follows:
g.addV('root').property('data',9).as('root').
addV('node').property('data',5).as('b').
addV('node').property('data',2).as('c').
addV('node').property('data',11).as('d').
addV('node').property('data',15).as('e').
addV('node').property('data',10).as('f').
addV('node').property('data',1).as('g').
addV('node').property('data',8).as('h').
addV('node').property('data',22).as('i').
addV('node').property('data',16).as('j').
addE('left').from('root').to('b').
addE('left').from('b').to('c').
addE('right').from('root').to('d').
addE('right').from('d').to('e').
addE('right').from('e').to('i').
addE('left').from('i').to('j').
addE('left').from('d').to('f').
addE('right').from('b').to('h').
addE('left').from('c').to('g').iterate()
We can combine loops and index as follows (I added the unfold to improve readability):
gremlin> g.V().hasLabel('root').
......1> emit().
......2> repeat(group('x').by(loops()).by(values('data').fold().index()).out()).
......3> cap('x').unfold()
==>0=[[9, 0]]
==>1=[[5, 0], [11, 1]]
==>2=[[2, 0], [8, 1], [10, 2], [15, 3]]
==>3=[[1, 0], [22, 1]]
==>4=[[16, 0]]
Given your comment about a simpler form being acceptable I think the above gets pretty close. You should be able to tweak this query to make any changes in the output formatting that you require.
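If the final formatting is easier to do in client code, the per-depth indexing shown above can also be reproduced outside Gremlin. The following is a minimal Python sketch, not part of the original answer. The TREE dictionary is a hypothetical in-memory copy of the sample tree (keyed by 'data' value, children listed in the order the console output shows them), and the "depth.position" label with a 1-based position is an assumption taken from the example in the question.

```python
from collections import deque

# Hypothetical adjacency list mirroring the sample binary tree,
# keyed by the 'data' value of each vertex.
TREE = {
    9: [5, 11],
    5: [2, 8],
    11: [10, 15],
    2: [1],
    15: [22],
    22: [16],
}


def depth_and_index(tree, root):
    """Return {depth: [(value, index_within_depth), ...]} using breadth-first search."""
    levels = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        row = levels.setdefault(depth, [])
        row.append((node, len(row)))  # index is the position within this depth
        for child in tree.get(node, []):
            queue.append((child, depth + 1))
    return levels


def sub_depth_labels(levels):
    """Format each value as 'depth.position' (1-based); the root is just '0'."""
    return {
        value: "0" if depth == 0 else f"{depth}.{index + 1}"
        for depth, row in levels.items()
        for value, index in row
    }


if __name__ == "__main__":
    labels = sub_depth_labels(depth_and_index(TREE, 9))
    for value, label in labels.items():
        print(value, label)
```

The breadth-first order plays the role of loops(), and the running position inside each depth plays the role of index().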
You can go one step further and group using the parent vertex as follows. From this you can build whatever projections of the final results you require.
gremlin> g.V().hasLabel('root').
......1> repeat(outE().group('x').
......2> by(loops()).
......3> by(group().
......4> by(outV()).
......5> by(inV().values('data').fold().index())).
......6> inV()).
......7> times(4).
......8> cap('x').
......9> unfold()
==>0={v[0]=[[5, 0], [11, 1]]}
==>1={v[2]=[[2, 0], [8, 1]], v[6]=[[10, 0], [15, 1]]}
==>2={v[4]=[[1, 0]], v[8]=[[22, 0]]}
==>3={v[16]=[[16, 0]]}
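Grouping by parent gives each child its position among its siblings, which is the piece needed for the dotted labels (1.1, 1.1.1, ...) asked for in the question. As a rough sketch of that last formatting step, here is a small Python example that is not part of the original answer. TREE is a hypothetical in-memory copy of the sample tree keyed by 'data' value, and labelling the root "1" is an assumption based on the question's description of the figure.

```python
# Hypothetical adjacency list mirroring the sample binary tree,
# keyed by the 'data' value of each vertex.
TREE = {
    9: [5, 11],
    5: [2, 8],
    11: [10, 15],
    2: [1],
    15: [22],
    22: [16],
}


def dotted_levels(tree, root, prefix="1"):
    """Label every node with its parent's label plus its 1-based sibling position."""
    labels = {root: prefix}
    for position, child in enumerate(tree.get(root, []), start=1):
        labels.update(dotted_levels(tree, child, f"{prefix}.{position}"))
    return labels


if __name__ == "__main__":
    for value, label in dotted_levels(TREE, 9).items():
        print(value, label)
```

Because each label is derived from the parent's label, nodes added later pick up the same format automatically. The recursion is fine for small trees; a very deep tree would need an iterative version.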
I'm trying to return the vertices and edges reached from a specific vertex by traversing edges with a specific label.
The result I'm looking for is the vertices and edges that were traversed, plus all the edges of the leaf nodes.
[example graph]
https://i.stack.imgur.com/5yQTW.png
Expected result:
ap1 contained p1
ap2 contained p1
p1 uses
f1 contained ap1
f2 contained f1
f3 contained f2
f3 uses
f4 contained ap2
f5 contained f4
f6 contained f5
f6 uses
Script that generates the graph
g.addV('project').property('name','p1').as('p1').
addV('api').property('name','ap1').as('ap1').
addV('api').property('name','ap2').as('ap2').
addV('field').property('name','f1').as('f1').
addV('field').property('name','f2').as('f2').
addV('field').property('name','f3').as('f3').
addV('field').property('name','f4').as('f4').
addV('field').property('name','f5').as('f5').
addV('field').property('name','f6').as('f6').
addV('table').property('name','t1').as('t1').
addV('column').property('name','c1').as('c1').
addV('schema').property('name','s1').as('s1').
addE('contained').from('f3').to('f2').
addE('contained').from('f2').to('f1').
addE('contained').from('f1').to('ap1').
addE('contained').from('ap1').to('p1').
addE('contained').from('f6').to('f5').
addE('contained').from('f5').to('f4').
addE('contained').from('f4').to('ap2').
addE('contained').from('ap2').to('p1').
addE('uses').from('f3').to('t1').
addE('uses').from('f6').to('c1').
addE('uses').from('p1').to('s1')
It seems I'm hitting a cycle problem (OrientDB stack overflow)
when using the following query:
g.V().
has("name", "p1").
repeat(
bothE().
or(hasLabel("contained"), hasLabel("uses")).
dedup().
bothV().
bothE()).
times(1).
emit().
path().
unfold().
dedup().
aggregate('r').
sideEffect(cap('r').unfold().hasLabel("api").aggregate('c0')).
sideEffect(
cap('c0').
unfold().
repeat(inE("contained").dedup().outV()).
until(inE("contained").count().is(0)).
emit().
path().
unfold().
dedup().
aggregate('c0')).
cap('c0').
unfold().
aggregate('r').
cap('r').
unfold().
dedup().
limit(1000)
The following query does the job.
I changed the strategy so that the traversal is folded into a global list 'r',
and the intermediate computations (c0, c1) are then concatenated onto 'r'.
g.V().
has("name", "p1").emit().
repeat(
bothE().or(hasLabel("contained"), hasLabel("uses")).dedup().otherV()).
times(1).
as('v').
aggregate('c0').
by(select(all, 'v').unfold().union(identity(), bothE()).fold()).
aggregate('c1').
by(
select(all, 'v').
unfold().
hasLabel("api").
repeat(inE("contained").dedup().outV()).
until(inE("contained").count().is(0)).
emit().
union(identity(), bothE()).
fold()).
fold().
store('r').by(cap('c0')).
store('r').by(cap('c1')).
cap('r').
unfold().
unfold().
unfold().
dedup().
limit(1000)
Given this graph
g.addV("1").property(id,'a').as('a').
addV("2").property(id,'b').as('b').
addV("3").property(id,'c').as('c').
addE('related').from('a').to('b').
addE('related').from('a').to('c').
addE('related').from('b').to('c')
Suppose a selection of 1 and 2 is made:
g.V().hasLabel("1", "2").has("id", within("a","b"))
I want to get the node that both of them are related to (the "common" one). Just doing:
g.V().hasLabel("1", "2").has("id", within("a","b")).out()
won't cut it, because it returns both 2 and 3 when only 3 was intended.
Furthermore, if 2 and 3 were not related, 3 should not be returned either, because not every node in the selection relates to 3.
Is there any good way of accomplishing this?
Here is an example that uses groupCount. Note that this assumes there are no parallel edges between any two adjacent vertices in the same direction.
gremlin> g.addV("1").property(id,'a').as('a').
......1> addV("2").property(id,'b').as('b').
......2> addV("3").property(id,'c').as('c').
......3> addE('related').from('a').to('b').
......4> addE('related').from('a').to('c').
......5> addE('related').from('b').to('c')
==>e[2][b-related->c]
gremlin> g.V().hasLabel('1','2').out()
==>v[b]
==>v[c]
==>v[c]
gremlin> g.V().hasLabel('1','2').out().groupCount()
==>[v[b]:1,v[c]:2]
gremlin> g.V().hasLabel('1','2').out().groupCount().unfold().where(select(values).is(2))
==>v[c]=2
gremlin> g.V().hasLabel('1','2').out().groupCount().unfold().where(select(values).is(2)).select(keys)
==>v[c]
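The literal 2 in is(2) is the size of the selection: a vertex is "common" only when every selected vertex has an edge to it. To make that counting logic explicit, here is a small Python sketch that is not part of the original answer. EDGES is a hypothetical in-memory copy of the sample graph, and the function name is made up for illustration. Like the traversal, it assumes there are no parallel edges.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample graph: (source, target) pairs
# for the 'related' edges.
EDGES = [("a", "b"), ("a", "c"), ("b", "c")]


def common_out_neighbors(edges, selection):
    """Return the targets that every vertex in the selection points to.

    Assumes no parallel edges, so a count equal to the selection size means
    each selected vertex contributed exactly one edge to that target.
    """
    selected = set(selection)
    counts = Counter(target for source, target in edges if source in selected)
    return sorted(t for t, n in counts.items() if n == len(selected))


if __name__ == "__main__":
    print(common_out_neighbors(EDGES, ["a", "b"]))
```

If the edge from b to c is removed, c is reached only once and no longer qualifies, which matches the requirement in the question.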
Sample data:
I have two vertex labels, User and Points.
First, adding the data for the User vertices:
g.addV('User').property('id',1).
addV('User').property('id',2).
addV('User').property('id',3).iterate()
Now adding the Points vertices and connecting each User to its Points vertex with an addingPoints edge:
g.V().hasLabel('User').has('id',1).as('curUser1').
V().hasLabel('User').has('id',2).as('curUser2').
V().hasLabel('User').has('id',3).as('curUser3').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560316666).property('name','user1').
addE('addingPoints').from('curUser1').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560318666).property('name','user2').
addE('addingPoints').from('curUser2').
addV('Points').property('totalPoints',0).property('userPoints',0).
property('createDate',1560318657).property('name','user3').
addE('addingPoints').from('curUser3').iterate()
Now each User has at least one Points vertex.
Now I want to add 10, 20, or 30 points (chosen at random) to the totalPoints property of the user with id 1.
While adding the points, I have three cases:
1. If totalPoints is less than 500, I just need to update the totalPoints property of the Points vertex of the user with id 1.
2. If totalPoints equals 500, I should create a new Points vertex for the user with id 1 and add the points to its totalPoints property.
3. If totalPoints is, say, 490 (less than 500) and I need to add 30 points, I need to add 10 points to the old Points vertex of the user with id 1 and the remaining 20 points to a new Points vertex for the same user.
How can I achieve this?
Thank you.
1. Pick the user's Points vertex with the lowest totalPoints value.
2. Sum the totalPoints with the new number of points.
3. If the sum exceeds 500, set the totalPoints property value to 500 and add a new Points vertex with a totalPoints value of sum-500.
4. If the sum doesn't exceed 500, set it as the new totalPoints property value.
These 4 steps translated into a traversal:
g.withSack(points).
V().has('User','id',user).as('u').
out('addingPoints').
order().
by('totalPoints').
limit(1).
sack(sum).
by('totalPoints').
choose(sack().is(gt(maxPoints)),
sack(minus).
by(constant(maxPoints)).
property('totalPoints', maxPoints).
addV('Points').
sideEffect(addE('addingPoints').
from('u'))).
property('totalPoints', sack())
And a small console example (I initialized the first Points vertex with totalPoints=400 and the second Points vertex with totalPoints=480):
gremlin> showUserPoints = {
......1> g.V().as('u').out('addingPoints').
......2> group().
......3> by(select('u').by('id')).
......4> by('totalPoints').next()
......5> }
==>groovysh_evaluate$_run_closure1@7c2b58c0
gremlin> addPoints = { user, points, maxPoints = 500 ->
......1> g.withSack(points).
......2> V().has('User','id',user).as('u').
......3> out('addingPoints').
......4> order().
......5> by('totalPoints').
......6> limit(1).
......7> sack(sum).
......8> by('totalPoints').
......9> choose(sack().is(gt(maxPoints)),
.....10> sack(minus).
.....11> by(constant(maxPoints)).
.....12> property('totalPoints', maxPoints).
.....13> addV('Points').
.....14> sideEffect(addE('addingPoints').
.....15> from('u'))).
.....16> property('totalPoints', sack()).iterate()
.....17>
.....17> showUserPoints()
.....18> }
==>groovysh_evaluate$_run_closure1@31d6f3fe
gremlin> showUserPoints()
==>1=[400]
==>2=[480]
==>3=[0]
gremlin> addPoints(1, 10)
==>1=[410]
==>2=[480]
==>3=[0]
gremlin> addPoints(1, 90)
==>1=[500]
==>2=[480]
==>3=[0]
gremlin> addPoints(2, 30)
==>1=[500]
==>2=[500, 10]
==>3=[0]
gremlin> addPoints(2, 40)
==>1=[500]
==>2=[500, 50]
==>3=[0]
gremlin> addPoints(3, 100)
==>1=[500]
==>2=[500, 50]
==>3=[100]
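To check the arithmetic behind those results without a graph, the four steps can be sketched in plain Python. This is only an illustration, not part of the original answer: a user's Points vertices are modelled as a hypothetical list of totalPoints values, and add_points is a made-up helper name. Like the traversal, it creates at most one new bucket per call, so it assumes a single addition never overflows by more than max_points, and it assumes the user already has at least one Points vertex.

```python
def add_points(buckets, points, max_points=500):
    """Add points to the bucket with the lowest total, spilling overflow into a new one.

    buckets is a non-empty list of totalPoints values for one user; it is
    modified in place and returned.
    """
    # Step 1: pick the bucket with the lowest totalPoints.
    lowest = min(range(len(buckets)), key=buckets.__getitem__)
    # Step 2: sum its total with the new points.
    total = buckets[lowest] + points
    if total > max_points:
        # Step 3: cap the old bucket and put the remainder in a new one.
        buckets[lowest] = max_points
        buckets.append(total - max_points)
    else:
        # Step 4: the sum fits, so it becomes the new total.
        buckets[lowest] = total
    return buckets


if __name__ == "__main__":
    user2 = [480]
    print(add_points(user2, 30))
    print(add_points(user2, 40))
```

The case from the question (490 points plus 30) ends with the old bucket at 500 and a new bucket holding the remaining 20.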
I want to do a join between two vertex types using Gremlin, the equivalent of this SQL:
select * from type1 inner join type2 on type2.id = type1.type2_id
The following works when using type1 and type2 as vertex labels:
g.V()
.hasLabel("type2").as("t2")
.inE("hasJoin")
.hasLabel("type1").as("t1")
.select("t1", "t2")
However, my graph does not use the vertex label to represent the type, but uses another vertex connected via the "hasType" edge instead.
g.V()//
.addV("instance1").as("instance1")//
.addV("instance2").as("instance2")//
.addV("type1").as("type1")//
.addV("type2").as("type2")//
.addE("hasType").from("instance1").to("type1")//
.addE("hasType").from("instance2").to("type2")//
.addE("hasJoin").from("instance1").to("instance2")//
.iterate();
I would need to do something like replacing
hasLabel("type2").as("t2")
with
hasLabel("type2").inE("hasType").outV().as("t2")
which would result in
g.V()
.hasLabel("type2").inE("hasType").outV().as("t2")
.inE("hasJoin")
.hasLabel("type1").inE("hasType").outV().as("t1")
.select("t1", "t2")
This works for "t2", but not for "t1", as .inE("hasJoin").hasLabel("type1") is just wrong. What function do I need to use to join "t1" and "t2"?
All you need is a filter that checks the adjacent type vertex. Here's your sample graph (your script doesn't quite work):
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("instance1").property("name","instance1").as("instance1").
......1> addV("instance2").property("name","instance2").as("instance2").
......2> addV("type1").as("type1").
......3> addV("type2").as("type2").
......4> addE("hasType").from("instance1").to("type1").
......5> addE("hasType").from("instance2").to("type2").
......6> addE("hasJoin").from("instance1").to("instance2").
......7> iterate()
And the query you're looking for should be something like this:
gremlin> g.V().hasLabel("type2").in("hasType").as("t2").
both("hasJoin").
filter(out("hasType").hasLabel("type1")).as("t1").
select("t1", "t2").
by("name")
==>[t1:instance1,t2:instance2]
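The idea is that the type check becomes a filter on the neighboring type vertex instead of a label test on the instance itself. As a rough illustration of that join, here is a plain Python sketch that is not part of the original answer. EDGES is a hypothetical in-memory copy of the sample graph as (source, label, target) triples, and the function names are made up.

```python
# Hypothetical in-memory copy of the sample graph: (source, label, target).
EDGES = [
    ("instance1", "hasType", "type1"),
    ("instance2", "hasType", "type2"),
    ("instance1", "hasJoin", "instance2"),
]


def instances_of(edges, type_name):
    """Return the vertices that have a hasType edge to the given type vertex."""
    return {src for src, label, dst in edges if label == "hasType" and dst == type_name}


def join(edges, left_type, right_type):
    """Pair instances of left_type with instances of right_type linked by hasJoin."""
    left = instances_of(edges, left_type)
    right = instances_of(edges, right_type)
    pairs = []
    for src, label, dst in edges:
        if label != "hasJoin":
            continue
        # both("hasJoin") ignores edge direction, so test both orientations.
        if src in left and dst in right:
            pairs.append({"t1": src, "t2": dst})
        if dst in left and src in right:
            pairs.append({"t1": dst, "t2": src})
    return pairs


if __name__ == "__main__":
    print(join(EDGES, "type1", "type2"))
```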
I have created a Titan graph of a hierarchical tree in Java.
How can I find the total number of child nodes in the hierarchy below a specified node with Gremlin?
Please suggest a Gremlin query for the count; it should be fast.
The basic pattern for traversing a tree lies in the repeat() step. As an example, I use the graph depicted in the Tree Recipes section of the TinkerPop documentation:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV(id, 'A').as('a').
......1> addV(id, 'B').as('b').
......2> addV(id, 'C').as('c').
......3> addV(id, 'D').as('d').
......4> addV(id, 'E').as('e').
......5> addV(id, 'F').as('f').
......6> addV(id, 'G').as('g').
......7> addE('hasParent').from('a').to('b').
......8> addE('hasParent').from('b').to('c').
......9> addE('hasParent').from('d').to('c').
.....10> addE('hasParent').from('c').to('e').
.....11> addE('hasParent').from('e').to('f').
.....12> addE('hasParent').from('g').to('f').iterate()
gremlin> g.V('F').repeat(__.in('hasParent')).emit().count()
==>6
gremlin> g.V('C').repeat(__.in('hasParent')).emit().count()
==>3
gremlin> g.V('A').repeat(__.in('hasParent')).emit().count()
==>0
The key to getting the count is in the use of emit() which allows all the traversers encountered in the repeat() to be counted.
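To see why the counts come out as 6, 3 and 0, the same walk can be sketched in plain Python. This is only an illustration, not part of the original answer: HAS_PARENT is a hypothetical in-memory copy of the sample edges as (child, parent) pairs, and the sketch assumes the data really is a tree (no cycles).

```python
from collections import deque

# Hypothetical in-memory copy of the sample 'hasParent' edges: (child, parent).
HAS_PARENT = [
    ("A", "B"), ("B", "C"), ("D", "C"),
    ("C", "E"), ("E", "F"), ("G", "F"),
]


def count_descendants(edges, start):
    """Count every vertex reachable from start by following hasParent edges backwards."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    count = 0
    queue = deque(children.get(start, []))
    while queue:
        node = queue.popleft()
        count += 1  # counting each visited vertex is what emit() makes possible
        queue.extend(children.get(node, []))
    return count


if __name__ == "__main__":
    for vertex in ("F", "C", "A"):
        print(vertex, count_descendants(HAS_PARENT, vertex))
```

The loop is iterative rather than recursive, so tree depth does not limit it.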
Just for comparison, to show what kind of speed you can get with TinkerGraph (in-memory), I generated a tree 400,000 vertices deep:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> lastV = g.addV().next()
==>v[0]
gremlin> (0..<400000).each{lastV=g.V(lastV).as('f').addV().as('t').addE('next').from('f').to('t').select('t').next()}
==>0
==>1
==>2
==>3
...
gremlin> graph
==>tinkergraph[vertices:400001 edges:400000]
gremlin> clockWithResult{ g.V(0L).repeat(__.out('next')).emit().count().next() }
==>171.44102253
==>400000
That completed in about 171 ms. TinkerGraph is obviously fast here because it holds its data purely in memory; Titan/JanusGraph and other disk-backed graphs have to read from disk and will be slower.