How can I use until in JanusGraph? - gremlin

gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",9)
g.V().has('name', 'alice').repeat(out()).times(6).cyclicPath().path().by('name')
I want the traversal to end at the alice vertex, and I want to repeat the step without hard-coding times(6). The goal is to get all loops that start from alice, or all loops in the graph.

You can refer to the Cycle Detection section in the TinkerPop Recipes - it adapts fairly easily to your sample graph:
gremlin> g.V().has('name', 'alice').as('a').
......1> repeat(out().simplePath()).
......2> emit(loops().is(gt(1))).
......3> both().where(eq('a')).
......4> path().
......5> by('name').
......6> dedup().
......7> by(unfold().order().dedup().fold())
==>[alice,bobby,cindy,david,eliza,alice]
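For intuition, the same search can be sketched outside Gremlin. The plain-Python snippet below is only an illustration, not JanusGraph code: it hard-codes the five rates edges from the question in a hypothetical EDGES map and walks outgoing edges depth-first, never revisiting a vertex except to close the loop at the start. That plays roughly the role of simplePath() plus the final check against 'a'. Unlike the query, it imposes no minimum loop length.

```python
# Illustrative only: the sample graph's directed "rates" edges as an adjacency map.
EDGES = {
    "alice": ["bobby"],
    "bobby": ["cindy"],
    "cindy": ["david"],
    "david": ["eliza"],
    "eliza": ["alice"],
}


def cycles_from(start, edges):
    """Return every simple cycle that starts and ends at `start`."""
    found = []

    def walk(path):
        for nxt in edges.get(path[-1], []):
            if nxt == start:
                found.append(path + [nxt])   # loop closed at the start vertex
            elif nxt not in path:            # never revisit, like simplePath()
                walk(path + [nxt])

    walk([start])
    return found


print(cycles_from("alice", EDGES))
```

Calling cycles_from once per vertex would list the loops for the whole graph, with each loop reported once per vertex that lies on it.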

Related

Why can't I drop vertex properties?

I can't drop any properties; they stubbornly remain. What am I doing wrong?
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}
gremlin> g.V().properties('prop').drop()
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}
gremlin> g.V().properties('prop').drop().iterate()
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}
gremlin> g.V(28712).properties('prop').drop()
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}
gremlin> g.V(28712).properties('prop').drop().iterate()
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}
What's doubly strange is that the drop seems to have succeeded (there appear to be no vertices with the property prop), but calling valueMap on a specific vertex shows otherwise.
gremlin> g.V().properties('prop').drop()
gremlin> g.V().has('prop').count()
==>0
gremlin> g.V().has('prop', 0).count()
==>0
gremlin> g.V(28712).valueMap()
==>{entity_id=[256a631b-3c19-49b7-84f3-01911d66c744], prop=[0]}

Grouping on properties from different vertices

I have a graph that looks something like the following:
pathway -> pathway_component -> gene -> organism
You can make an example graph like so:
m1 = g.addV('pathway').property('pathway_name', 'M00002').next()
m2 = g.addV('pathway').property('pathway_name', 'M00527').next()
c1 = g.addV('pathway_component').property('name', 'K00001').next()
c2 = g.addV('pathway_component').property('name', 'K00002').next()
c3 = g.addV('pathway_component').property('name', 'K00003').next()
g.addE('partof').from(c1).to(m1).iterate()
g.addE('partof').from(c2).to(m1).iterate()
g.addE('partof').from(c3).to(m2).iterate()
g1 = g.addV('gene').property('name', 'G00001').next()
g2 = g.addV('gene').property('name', 'G00002').next()
g3 = g.addV('gene').property('name', 'G00003').next()
g4 = g.addV('gene').property('name', 'G00004').next()
g5 = g.addV('gene').property('name', 'G00005').next()
g6 = g.addV('gene').property('name', 'G00006').next()
g7 = g.addV('gene').property('name', 'G00007').next()
g8 = g.addV('gene').property('name', 'G00008').next()
g.addE('isa').from(g1).to(c1).iterate()
g.addE('isa').from(g2).to(c3).iterate()
g.addE('isa').from(g3).to(c1).iterate()
g.addE('isa').from(g4).to(c2).iterate()
g.addE('isa').from(g5).to(c3).iterate()
g.addE('isa').from(g6).to(c1).iterate()
g.addE('isa').from(g7).to(c1).iterate()
g.addE('isa').from(g8).to(c2).iterate()
o1 = g.addV('organism').property('name', 'O000001').next()
o2 = g.addV('organism').property('name', 'O000002').next()
o3 = g.addV('organism').property('name', 'O000003').next()
o4 = g.addV('organism').property('name', 'O000004').next()
g.addE('partof').from(g1).to(o1).iterate()
g.addE('partof').from(g2).to(o1).iterate()
g.addE('partof').from(g3).to(o2).iterate()
g.addE('partof').from(g4).to(o2).iterate()
g.addE('partof').from(g5).to(o3).iterate()
g.addE('partof').from(g6).to(o3).iterate()
g.addE('partof').from(g7).to(o4).iterate()
g.addE('partof').from(g8).to(o4).iterate()
I'd like to count the genes per pathway per organism, so that the results look something like:
organism_1 pathway_1 gene_count
organism_1 pathway_2 gene_count
organism_2 pathway_1 gene_count
organism_2 pathway_2 gene_count
But so far I haven't figured it out. I tried the following:
g.V().has('pathway', 'pathway_name', within('M00002', 'M00527')).project('organism', 'pathway', 'count').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
out().hasLabel('organism').
values('name')).
by('pathway_name').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
count())
But it looks like the grouping is wrong:
==>[organism:O000001,pathway:M00002,count:6]
==>[organism:O000001,pathway:M00527,count:2]
In this case it seems like all of the organisms and their counts are being grouped together (there are four organisms) for the two pathways listed. I'd expect to see something like:
O000001 M00002 1
O000001 M00527 1
O000002 M00002 2
O000002 M00527 0
O000003 M00002 1
O000003 M00527 1
O000004 M00002 2
O000004 M00527 0
How can I split out the results by both different organisms and different pathways?
Hopefully the final query below helps. I showed the steps I used to get there, part of which was making sure I understood the structure of your data.
First I wanted to see the shape of the graph.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').
......4> path().
......5> by('pathway_name').
......6> by('name').
......7> by('name').
......8> by('name')
==>[M00002,K00001,G00006,O000003]
==>[M00002,K00001,G00007,O000004]
==>[M00002,K00001,G00001,O000001]
==>[M00002,K00001,G00003,O000002]
==>[M00002,K00002,G00004,O000002]
==>[M00002,K00002,G00008,O000004]
==>[M00527,K00003,G00005,O000003]
==>[M00527,K00003,G00002,O000001]
Then I used path and group to learn a bit more about these relationship groupings.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> path().
......8> by('pathway_name').
......9> by('name').
.....10> by('name').
.....11> by('name').fold()).
.....12> unfold()
==>O000004=[path[M00002, K00001, G00007, O000004], path[M00002, K00002, G00008, O000004]]
==>O000003=[path[M00002, K00001, G00006, O000003], path[M00527, K00003, G00005, O000003]]
==>O000002=[path[M00002, K00001, G00003, O000002], path[M00002, K00002, G00004, O000002]]
==>O000001=[path[M00002, K00001, G00001, O000001], path[M00527, K00003, G00002, O000001]]
Next, I changed the above query to nest two groups:
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold())).
.....10> unfold()
==>O000004={M00002=[G00007, G00008]}
==>O000003={M00002=[G00006], M00527=[G00005]}
==>O000002={M00002=[G00003, G00004]}
==>O000001={M00002=[G00001], M00527=[G00002]}
This yields the organism, the pathway name and the genes.
Building on that I changed the query again to generate the counts. I hope this is close to what you needed.
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold().count(local))).
.....10> unfold()
==>O000004={M00002=2}
==>O000003={M00002=1, M00527=1}
==>O000002={M00002=2}
==>O000001={M00002=1, M00527=1}
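The nested group() can also be read as a two-level dictionary build. The Python sketch below is an illustration only: it takes the eight (pathway, component, gene, organism) rows printed by the first path() query, hard-coded here as a hypothetical PATHS list, and groups them by organism and then by pathway, counting one gene per row.

```python
from collections import defaultdict

# (pathway, component, gene, organism) rows copied from the path() output above.
PATHS = [
    ("M00002", "K00001", "G00006", "O000003"),
    ("M00002", "K00001", "G00007", "O000004"),
    ("M00002", "K00001", "G00001", "O000001"),
    ("M00002", "K00001", "G00003", "O000002"),
    ("M00002", "K00002", "G00004", "O000002"),
    ("M00002", "K00002", "G00008", "O000004"),
    ("M00527", "K00003", "G00005", "O000003"),
    ("M00527", "K00003", "G00002", "O000001"),
]


def gene_counts(paths):
    """Group by organism, then by pathway, counting one gene per row."""
    counts = defaultdict(lambda: defaultdict(int))
    for pathway, _component, _gene, organism in paths:
        counts[organism][pathway] += 1
    return {org: dict(by_pathway) for org, by_pathway in counts.items()}


for organism, by_pathway in sorted(gene_counts(PATHS).items()):
    print(organism, by_pathway)
```

As with the Gremlin result, organism and pathway pairs with no genes (for example O000002 and M00527) simply do not appear; the zero rows from the question would have to be filled in separately.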

TinkerPop: Generic Query to combine and filter multiple traversals

Sample data: TinkerPop Modern Graph
Conditions:
Is vadas connected to lop within 2 hops
Is vadas connected to peter within 3 hops
Is vadas connected to does-not-exist within 1 hop (a search that won't give any results)
Dummy searches with expected results
Conditions 1 AND 2
=> [vadas-marko-lop, vadas-marko-lop-peter]
Conditions 1 OR 3
=> [vadas-marko-lop]
What I was able to get
Conditions 1 AND 2
gremlin> g.V().has("person", "name", "vadas").as("from")
.select("from").as("to1").repeat(both().as("to1")).times(2).emit().has("software", "name", "lop")
.select("from").as("to2").repeat(both().as("to2")).times(3).emit().has("person", "name", "peter")
.project("a", "b")
.by(select(all, "to1").unfold().values("name").fold())
.by(select(all, "to2").unfold().values("name").fold())
==>[a:[vadas,marko,lop],b:[vadas,marko,lop,peter]]
Conditions 1 OR 3
gremlin> g.V().has("person", "name", "vadas").as("nodes")
.union(repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
out().has("x", "y", "does-not-exist").as("nodes"))
.project("a")
.by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas,marko,lop]]
I currently need two different query formats to achieve this. Is there a way to write a single query format that can do both?
I also tried the following, but it did not work: it does not return the nodes that were traversed. Is anything wrong here?
g.V().has("person", "name", "vadas").as("nodes")
.or(
repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
repeat(both().as("nodes")).times(3).emit().has("person", "name", "peter")
)
.project("a").by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas]]
// Expect paths to be printed here vadas..lop, vadas...peter
I don't know if I understand what you're after, but if you just need something like a query template, then maybe this will help:
gremlin> conditions = [
......1> [filter: {has("software", "name", "lop")}, distance: 2],
......2> [filter: {has("person", "name", "peter")}, distance: 3],
......3> [filter: {has("x", "y", "does-not-exist")}, distance: 1]]
==>[filter:groovysh_evaluate$_run_closure1@378bd86d,distance:2]
==>[filter:groovysh_evaluate$_run_closure2@2189e7a7,distance:3]
==>[filter:groovysh_evaluate$_run_closure3@69b2f8e5,distance:1]
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[1].distance).
......7> emit().
......8> filter(conditions[1].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> and(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[1].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
==>[vadas,marko,lop,peter]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[2].distance).
......7> emit().
......8> filter(conditions[2].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> or(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[2].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
And a little more abstraction should make it clearer that the two queries only differ in 1 step (and vs or):
apply = { condition ->
repeat(both().simplePath()).
times(condition.distance).
emit().
filter(condition.filter()).store("x")
}
verify = { condition ->
unfold().filter(condition.filter())
}
// condition 1 AND 2
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[1])).
barrier().
filter(select("x").
and(verify(conditions[0]),
verify(conditions[1]))).
path().
by("name")
// condition 1 OR 3
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[2])).
barrier().
filter(select("x").
or(verify(conditions[0]),
verify(conditions[2]))).
path().
by("name")
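The shape of this template (run each condition, then combine the outcomes with and/or) can be sketched in plain Python. Everything below is illustrative and not TinkerPop API: the modern graph is hard-coded as undirected name pairs, conditions are reduced to a hypothetical (target name, distance) tuple with the label checks dropped, and Python's built-in all and any stand in for the and()/or() steps.

```python
# Illustrative only: the TinkerPop "modern" toy graph as undirected name pairs.
EDGES = [
    ("marko", "vadas"), ("marko", "josh"), ("marko", "lop"),
    ("josh", "ripple"), ("josh", "lop"), ("peter", "lop"),
]
NEIGHBORS = {}
for a, b in EDGES:
    NEIGHBORS.setdefault(a, set()).add(b)
    NEIGHBORS.setdefault(b, set()).add(a)


def apply(start, target, distance):
    """Simple paths from `start` that reach `target` within `distance` hops."""
    hits, frontier = [], [[start]]
    for _ in range(distance):
        # Extend every path by one hop without revisiting a vertex.
        frontier = [p + [n] for p in frontier
                    for n in sorted(NEIGHBORS.get(p[-1], ())) if n not in p]
        hits += [p for p in frontier if p[-1] == target]
    return hits


def search(start, conditions, combine):
    """Run every condition; keep the paths only if `combine` (all or any) holds."""
    per_condition = [apply(start, target, dist) for target, dist in conditions]
    if not combine(bool(paths) for paths in per_condition):
        return []
    return [p for paths in per_condition for p in paths]


print(search("vadas", [("lop", 2), ("peter", 3)], all))           # conditions 1 AND 2
print(search("vadas", [("lop", 2), ("does-not-exist", 1)], any))  # conditions 1 OR 3
```

Only the combine argument changes between the two calls, which is the same point the apply/verify closures make: the AND and OR variants differ in a single step.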

How can I get the combined script result for janusgraph?

Graph is below:
gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",10)
I have 3 scripts below:
Script #1
gremlin> g.V().has('name','alice').
repeat(out()).
until(has('name','alice')).
cyclicPath().
path().by('name')
==>[alice,bobby,cindy,david,eliza,alice]
Script #2
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next()
==>alice=[9, 8, 7, 6, 10]
Script #3
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next().collect { k, v ->
k + '=' + v.withIndex().collect { Integer it, Integer idx ->
return it * (1/(idx + 1))
}.inject(0.0) { acc,i -> acc + i }
}
==>alice=18.8333333331
My question is: how can I get a single result like the one below, combining the output of all three scripts?
alice=[alice,bobby,cindy,david,eliza,alice]=[9, 8, 7, 6, 10]=18.8333333331
It's probably much easier or at least more maintainable to execute 3 queries and then merge the results as suggested by David. However, if you want to do it all in a single query, you can:
g.V().has('name','alice').as('v').
repeat(outE().as('e').inV().as('v')).
until(has('name','alice')).
store('a').
by('name').
store('a').
by(select(all, 'v').unfold().values('name').fold()).
store('a').
by(select(all, 'e').unfold().
store('x').
by(union(values('value'),
select('x').count(local)).fold()).
cap('x').
store('a').
by(unfold().limit(local, 1).fold()).unfold().
sack(assign).
by(constant(1d)).
sack(div).
by(union(constant(1d),
tail(local, 1)).sum()).
sack(mult).
by(limit(local, 1)).
sack().sum()).
cap('a')
Using your sample graph:
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).
......3> store('a').
......4> by('name').
......5> store('a').
......6> by(select(all, 'v').unfold().values('name').fold()).
......7> store('a').
......8> by(select(all, 'e').unfold().
......9> store('x').
.....10> by(union(values('value'),
.....11> select('x').count(local)).fold()).
.....12> cap('x').
.....13> store('a').
.....14> by(unfold().limit(local, 1).fold()).unfold().
.....15> sack(assign).
.....16> by(constant(1d)).
.....17> sack(div).
.....18> by(union(constant(1d),
.....19> tail(local, 1)).sum()).
.....20> sack(mult).
.....21> by(limit(local, 1)).
.....22> sack().sum()).
.....23> cap('a')
==>[alice,[alice,bobby,cindy,david,eliza,alice],[9,8,7,6,10],18.833333333333332]
Doing it all in a single query has some benefits, especially since you don't have to traverse the same path over and over again, but such complex queries are hard to maintain. It's probably better to just return the full path and then build the expected result on the client side.
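A minimal sketch of that client-side step, in Python and assuming the client has already received the vertex names and edge values of one cycle (the combine helper and its argument names are hypothetical, not part of any driver API). The weighting follows Script #3: each edge value is divided by its 1-based position on the path.

```python
def combine(names, values):
    """Build [start, path names, edge values, weighted sum] from one cycle."""
    # Each edge value is divided by its 1-based position, as in Script #3.
    weighted = sum(value / position
                   for position, value in enumerate(values, start=1))
    return [names[0], names, values, weighted]


names = ["alice", "bobby", "cindy", "david", "eliza", "alice"]
values = [9, 8, 7, 6, 10]
print(combine(names, values))
```

For the sample cycle the sum is 9/1 + 8/2 + 7/3 + 6/4 + 10/5, roughly 18.83, in line with the single-query result.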
Gremlin code is executed in a Groovy executor, so all Groovy operators are valid here. You can add your results to a list and return the list, for example: def l = []; l << result1; l << result2; l

How can we get the path between 2 vertices?

I have the query below. I would like to get the edges along each path as well as the vertices (the full path detail), and I am also curious why the same path is returned twice. How can I do this?
Vertex fromNode = g.V().has('name', 'alice').next()
Vertex toNode = g.V().has('name', 'bobby').next()
g.V(fromNode).repeat(both().simplePath()).until(is(toNode)).path()
==>[v[4224],v[40964296]]
==>[v[4224],v[40964296]]
==>[v[4224],v[4144],v[40964256],v[4096],v[40964296]]
We have the Graph below.
gremlin> a = graph.addVertex("name", "alice")
==>v[4208]
gremlin> b = graph.addVertex("name", "bobby")
==>v[40968424]
gremlin> c = graph.addVertex("name", "cindy")
==>v[4192]
gremlin> d = graph.addVertex("name", "david")
==>v[40972520]
gremlin> e = graph.addVertex("name", "eliza")
==>v[40964272]
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
==>e[2ry-38w-azv9-oe3fs][4208-rates->40968424]
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
==>e[odzq5-oe3fs-azv9-38g][40968424-rates->4192]
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
==>e[170-38g-azv9-oe6lk][4192-rates->40972520]
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
==>e[oe04d-oe6lk-azv9-oe08g][40972520-rates->40964272]
gremlin> a.addEdge("rates",e,"tag","java","value",9)
==>e[366-38w-azv9-oe08g][4208-rates->40964272]
gremlin> g.E().values("tag")
==>ruby
==>ruby
==>ruby
==>ruby
==>java
gremlin> graph.tx().commit()
I would like to get the path detail like below:
==>bobby=[v[0], e[10][0-rates->2], v[2]]
==>cindy=[v[0], e[10][0-rates->2], v[2], e[11][2-rates->4], v[4]]
==>david=[v[0], e[10][0-rates->2], v[2], e[11][2-rates->4], v[4], e[12][4-rates->6], v[6]]
You just need to specify that you want the edges as well by traversing over them explicitly:
gremlin> g.V(fromNode).repeat(bothE().otherV().simplePath()).until(is(toNode)).path()
==>[v[0],e[10][0-rates->2],v[2]]
==>[v[0],e[14][0-rates->8],v[8],e[13][6-rates->8],v[6],e[12][4-rates->6],v[4],e[11][2-rates->4],v[2]]
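To see why stepping over the edges explicitly changes the path contents, here is a plain-Python sketch of the same idea. It is illustrative only: the five rates edges from the question are hard-coded as (from, to) name pairs, direction is ignored when walking (like bothE()), and every edge crossed is appended to the path before the vertex on its far side (like otherV()).

```python
# Illustrative only: the question's directed "rates" edges as (from, to) pairs.
EDGES = [
    ("alice", "bobby"), ("bobby", "cindy"), ("cindy", "david"),
    ("david", "eliza"), ("alice", "eliza"),
]


def paths_with_edges(start, goal, edges):
    """Simple paths from start to goal, ignoring direction, edges interleaved."""
    results = []

    def walk(path, seen):
        here = path[-1]
        if here == goal:
            results.append(path)
            return
        for src, dst in edges:
            if here in (src, dst):
                other = dst if here == src else src   # same idea as otherV()
                if other not in seen:
                    walk(path + [f"{src}-rates->{dst}", other], seen | {other})

    walk([start], {start})
    return results


for p in paths_with_edges("alice", "bobby", EDGES):
    print(p)
```

Each edge keeps its stored direction in the output even when it is walked backwards, just as e[13][6-rates->8] does in the console result.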
